If an off-the-shelf software product exhibits poor dependability due to design faults, software fault tolerance is often the only means available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, a scheme formerly reserved for a few highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers, basing the study on the bug reports available for four off-the-shelf SQL servers, plus later releases of two of them. We found that many of these faults cause systematic, non-crash failures, a category ignored by most studies and by standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products: only in very few cases would a demand that triggered a bug in one server cause a failure in another, and there were no coincident failures in more than two of the servers. Using different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties they may present.
We developed a protocol for diverse database replication that can tolerate non-crash failures. Using it, we measured the performance implications of diversity by comparing diverse and non-diverse replication solutions on TPC-C, a database benchmark for online transaction processing. We found that the performance penalty for using diversity may be significant when full dependability assurance is required. We explored various ways of reducing this penalty by trading off dependability assurance for performance, and in the talk we report some initial results.
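The abstract does not describe the replication protocol itself. As a rough illustration of why diversity helps with non-crash failures, here is a minimal Python sketch (not the authors' protocol) of one ingredient any such scheme needs: running the same statement on several diverse replicas and comparing their answers. The names `Replica`, `canonical`, and `adjudicate` are hypothetical, replicas are modelled as plain callables rather than real database drivers, and comparing result sets while ignoring row order is an assumption.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical interface: a replica is any callable that runs one SQL
# statement and returns its result rows. Real servers would sit behind a
# database driver; these stand-ins are for illustration only.
Replica = Callable[[str], list]


def canonical(rows: list) -> tuple:
    # Compare result sets ignoring row order (an assumption: diverse
    # servers may legitimately return rows in different orders).
    return tuple(sorted(rows, key=repr))


def adjudicate(sql: str, replicas: list) -> tuple[Optional[list], bool]:
    """Run `sql` on every replica and compare the answers.

    Returns (result, unanimous). A replica that raises is treated as a
    crash failure and its answer is dropped. Differing answers among the
    surviving replicas reveal a non-crash failure (a silently wrong
    result). `result` is the majority answer, or None when no answer is
    given by more than half of the surviving replicas.
    """
    answers = []
    for replica in replicas:
        try:
            answers.append(replica(sql))
        except Exception:
            continue  # crash failure: ignore this replica's answer
    if not answers:
        return None, False  # every replica crashed
    votes = Counter(canonical(rows) for rows in answers)
    winner, count = votes.most_common(1)[0]
    unanimous = len(votes) == 1
    if count * 2 > len(answers):
        return list(winner), unanimous
    return None, unanimous  # disagreement detected, but no majority


# Usage with stand-in replicas:
def good(sql): return [(1, "a"), (2, "b")]
def reordered(sql): return [(2, "b"), (1, "a")]
def wrong(sql): return [(1, "a")]  # non-crash failure: silently wrong

print(adjudicate("SELECT ...", [good, reordered, wrong]))
# ([(1, 'a'), (2, 'b')], False)
```

With only two diverse servers, a disagreement can be detected but not resolved (the function returns `None`); three or more allow a majority answer. Waiting for every replica before answering, as this sketch does, corresponds to the full-assurance end of the trade-off mentioned above; returning the first answer would be faster but would forgo the comparison.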
Peter Popov holds a PhD in Computer Science from the National Technical University "KPI," Ukraine (1989), having graduated from the same university in 1982. He is currently Reader in Systems Dependability at the Centre for Software Reliability (CSR), City University, London. Before joining CSR in 1997, Peter was an Associate Professor at the Institute of Computer and Communication Systems of the Bulgarian Academy of Sciences, Sofia; before that, he spent six years in industry.
His main research interests are in methods for assessing software reliability, both through probabilistic modeling and experimentally, in particular for software systems built using design diversity. Peter has been involved in various national (UK, Bulgarian, and Ukrainian) and European projects on the dependability of computer systems. He has published numerous papers and served on the program committees of several international conferences on computer dependability and software engineering.
Peter is currently on sabbatical leave, part of which he is spending at CSL.