However well we protect our systems, there is always a chance they will be penetrated. Constructing practical distributed systems that achieve their goals even after being penetrated is a challenge. During the last few years, there has been considerable progress in the design of intrusion-tolerant (Byzantine) replication systems. The current state of the art performs well on small-scale systems that are usually confined to local area networks. This talk presents recent progress in scaling Byzantine replication to wide area networks.
We present Steward, the first hierarchical Byzantine replication architecture tailored to systems that span multiple wide area sites, each consisting of several replicas. Steward provides excellent performance, including latency and throughput not far from those of standard (non-Byzantine) replication systems, and dramatically improves availability and manageability. However, Steward suffers from drawbacks commonly exhibited by "first of its kind" systems: it is over-engineered to achieve high performance and is not customizable.
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. Based on the Steward work and the logical machine concept, we present a much simpler, customizable architecture for scalable Byzantine replication, paying the price of a small reduction in performance.
In this customizable architecture, the physical machines in each site implement a logical machine by running a local state machine replication protocol, and a wide-area replication protocol runs among the logical machines. This affords free substitution of the fault tolerance method used in each site (Byzantine or benign) and in the wide-area replication protocol, allowing one to balance performance and fault tolerance based on perceived risk.
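As a rough illustration of the trade-off described above, the following sketch counts how many physical machines a deployment needs when each site's logical machine uses either a benign or a Byzantine fault model. It is not taken from the talk: the names (FaultModel, LogicalMachine, min_replicas) are hypothetical, and the replica counts are the standard textbook bounds for state machine replication (2f+1 replicas to tolerate f benign faults, 3f+1 to tolerate f Byzantine faults), which the actual protocols may or may not match.

```python
from dataclasses import dataclass
from enum import Enum


class FaultModel(Enum):
    BENIGN = "benign"        # crash faults only
    BYZANTINE = "byzantine"  # arbitrary, possibly malicious faults


def min_replicas(model: FaultModel, f: int) -> int:
    """Textbook lower bound on replicas needed to tolerate f faults.

    Assumption: 2f+1 for benign faults, 3f+1 for Byzantine faults.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if model is FaultModel.BENIGN else 3 * f + 1


@dataclass(frozen=True)
class LogicalMachine:
    """One wide-area site, implemented by several physical machines."""
    site: str
    model: FaultModel  # fault tolerance method chosen for this site
    f: int             # number of faulty physical machines tolerated

    @property
    def physical_replicas(self) -> int:
        return min_replicas(self.model, self.f)


def total_physical_machines(sites: list[LogicalMachine]) -> int:
    """Sum of physical machines over all logical machines."""
    return sum(s.physical_replicas for s in sites)


# Hypothetical deployment: two Byzantine sites and one benign site,
# each tolerating a single faulty machine.
sites = [
    LogicalMachine("site-a", FaultModel.BYZANTINE, 1),
    LogicalMachine("site-b", FaultModel.BYZANTINE, 1),
    LogicalMachine("site-c", FaultModel.BENIGN, 1),
]
total = total_physical_machines(sites)
```

Switching a site from Byzantine to benign in this sketch lowers its machine count at the cost of weaker fault tolerance, which is the kind of risk-based balancing the architecture is meant to allow.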
Joint work with Brian Coan, Claudiu Danilov, Danny Dolev, Jonathan Kirsch, John Lane, Cristina Nita-Rotaru, Josh Olsen, and David Zage.
Yair Amir is a professor of computer science at Johns Hopkins University, heading the Distributed Systems and Networks lab (www.dsn.jhu.edu). His research goal is to understand the challenges, invent algorithms, and construct software tools that enable high-performance, robust, secure, and survivable distributed systems. Yair was the initiator of the Spread group communication toolkit, which is used in thousands of installations around the world in commercial, academic, and government settings. He also led the development of Secure Spread, including the first robust key agreement protocols, as well as the SMesh wireless mesh network (www.smesh.org), the first seamless 802.11 mesh with fast lossless handoff, the Spines overlay platform (www.spines.org), and the Wackamole and Backhand N-way failover and load-balancing projects (www.backhand.org).
Yair has been a member of the program committees of the IEEE International Conference on Distributed Computing Systems (1999, 2002, 2005-07), the ACM Conference on Principles of Distributed Computing (2001), and the International Conference on Dependable Systems and Networks (2001, 2003, 2005). Over the years, Yair headed several DARPA projects, winning the Dynamic Coalitions program "Bytes for Buck" trophy from DARPA in 2002, and was nominated for the agency-wide "Performer with Significant Technical Achievement" DARPA award in 2004.
His current research on the analysis, design, and construction of systems and networks that can survive insider attacks is funded by the CyberTrust program of NSF.