Abstract: Phylogenetic trees represent the evolutionary history of a set of species (or genes), and are important for much biological research. Large-scale phylogenetic estimation (whether of many taxa, or of fewer taxa with full genomes) presents a wide range of "Big Data" issues, including data heterogeneity, missing data, and hard optimization problems. In my lab, we have developed a number of new algorithmic approaches for handling large datasets, including novel divide-and-conquer strategies for co-estimating alignments, for estimating trees without alignments, for estimating species trees from multiple gene trees, and for taxon identification of metagenomic data. Some of these methods have strong statistical guarantees, while others show empirical advantages over existing methods. I will present these methods and then discuss the empirical challenges in analyzing real biological datasets.
Bio: Tandy Warnow is the David Bruton Jr. Centennial Professor of Computer Sciences at the University of Texas at Austin. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in biology and historical linguistics. Tandy received her PhD in Mathematics at UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavaré and Michael Waterman at USC. Her awards include the NSF Young Investigator Award (1994), the David and Lucile Packard Foundation Award (1996), a Radcliffe Institute Fellowship (2006), and a Guggenheim Fellowship (2011). She served as the Chair of the BDMA Study Section at NIH (2010-2012), and was the lead program director for BIG DATA at NSF (2012-2013).