research overview | Statistical Diversity Lab

for statisticians

The Statistical Diversity Lab (est. 2018) is my research group in the Department of Biostatistics at the University of Washington. We apply our statistical training to develop methods for the analysis of biodiversity data, with a particular emphasis on the microbiome. Microbes play a critical role in a wide variety of human diseases and environmental outcomes, and manipulating the microbiome can be as easy as washing your hands or taking a probiotic. We believe that the microbiome is the most exciting frontier of human and environmental health research, and that's why it is the focus of our methods, theory and collaborative work.

My research background is in statistical inference on non-Euclidean metric spaces, discrete data models, applied probability, and hierarchical modelling. The more recent work of my group focuses on modelling relative abundance, diversity, phylogenetic uncertainty and batch effects, and developing statistical inference procedures for amplicon, whole-genome and metabolomic datasets. We believe that there are many unused or underused data structures in microbial sequencing studies that have the potential to greatly improve microbiome modelling. We develop and sustain long-term collaborative relationships with outstanding microbial ecologists to keep our methodological and theoretical work relevant. Methodological development for microbial ecology demands expertise in compositional data analysis, networked data, high-dimensional data, constrained estimation, and missing data, and we love learning from our colleagues in statistics, probability and applied math to expand our skills in these areas.

We care about producing high-quality software and making our papers reproducible. Please check out our GitHub page to find data and code for our papers. If you cannot find the resources you need to reproduce our results, contact us!

~ Amy D. Willis, Ph.D., Associate Professor

for prospective students

Unfortunately I cannot admit students directly to my research group.

If you are interested in joining the StatDivLab, that's wonderful! We have many projects available for motivated and curious students.

To join the lab, please apply to the PhD program in Biostatistics or to the PhD program in Statistics at the University of Washington. When you are invited for an interview or visit, please let me know and we can set up a time to meet and discuss projects.

for biologists

The Statistical Diversity Lab (est. 2018) is the research group of Amy Willis, Ph.D., an Associate Professor in the Department of Biostatistics at the University of Washington. We aim to bring our skills in statistics and computational biology to improve the way that microbial ecologists analyse microbiome data. We want to develop tools that enable biological discovery while maintaining the highest standards of statistical rigour and ethics.

The methods that are typically used to analyse microbiome data were borrowed from ecology. In classical ecology experiments, a scientist would cordon off a section of a rainforest and count the number of (for example) frogs in the section. The species of each frog could be easily identified, most species of frog were probably represented in the section, the scientist could return the next day and find a fairly similar result, and if a new frog showed up it could be brought back to the lab to be studied more closely. In contrast, a microbial ecologist cannot directly observe the microscopic organisms in an ecosystem. She has to take a tiny sample, break apart the cells (thus changing the sample), sequence the genetic information (incurring errors along the way), determine which microbes probably contributed that genetic information (possibly getting it wrong)... and that's just to construct the data for analysis! Furthermore, most microbes cannot be grown in the lab, and so validating surprising findings can be challenging or even impossible.

Classical ecology tools (such as analysing biodiversity and community composition) didn't require complex error modelling. However, using these tools for microbiome data is fraught with problems. Instead of samples of 100 or 200 specimens, high-throughput sequencing generates hundreds of thousands or millions of sequences. Because the most commonly used models are underdispersed relative to the data, the standard errors they report are almost always close to zero. The result is that ecologists see incredibly small p-values, but results rarely replicate.
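To make the point about underdispersed models concrete, here is a minimal simulation sketch. It is an illustration only, not one of our methods: the beta-binomial data generating model, the function names and every parameter value (30 samples, 20,000 reads per sample, a mean relative abundance of 0.1, a concentration of 20) are assumptions chosen for the example. It compares the standard error obtained by treating every read as an independent binomial draw with the standard error obtained from the sample-to-sample spread of the observed relative abundances.

```python
import math
import random
import statistics


def simulate_counts(n_samples, depth, mean, concentration, rng):
    """Simulate read counts for one taxon under a beta-binomial model.

    Each sample gets its own true relative abundance (a Beta draw), and
    `depth` reads are then drawn given that abundance. All parameter
    values used with this function are illustrative assumptions.
    """
    a = mean * concentration
    b = (1.0 - mean) * concentration
    counts = []
    for _ in range(n_samples):
        p = rng.betavariate(a, b)  # sample-specific relative abundance
        counts.append(sum(rng.random() < p for _ in range(depth)))
    return counts


def pooled_binomial_se(counts, depth):
    """Standard error if all reads are treated as independent draws."""
    total_reads = depth * len(counts)
    p_hat = sum(counts) / total_reads
    return math.sqrt(p_hat * (1.0 - p_hat) / total_reads)


def between_sample_se(counts, depth):
    """Standard error of the mean proportion from sample-to-sample spread."""
    proportions = [c / depth for c in counts]
    return statistics.stdev(proportions) / math.sqrt(len(proportions))


if __name__ == "__main__":
    rng = random.Random(1)
    depth = 20_000
    counts = simulate_counts(n_samples=30, depth=depth, mean=0.1,
                             concentration=20.0, rng=rng)
    print("pooled binomial SE:", pooled_binomial_se(counts, depth))
    print("between-sample SE: ", between_sample_se(counts, depth))
```

Under these assumptions the pooled binomial standard error shrinks as sequencing depth grows, while the between-sample standard error does not, because it is dominated by biological variability between samples. Sequencing more deeply makes the binomial model more confident without making the conclusion more replicable.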

We therefore see a pressing need for new statistical methods to answer the questions asked by biologists and clinicians studying microbiomes. By understanding the data generating mechanism and building this understanding into statistical methods, we aim to make the results of microbiome experiments consistent across independent experiments. Coupled with methodological development, we see outreach and collaboration as an important part of this mission.

We care about producing high-quality software and making our papers reproducible. We generally post data and code on our GitHub page, but if you cannot find the resources you need to reproduce our results, please contact us.

for the public

Science moves forward through an iterative process of positing hypotheses and using data to evaluate them. Evaluating the evidence for a hypothesis is often extremely complex, and specialised or custom methods are needed. The Statistical Diversity Lab develops specialised methods for the analysis of data collected for studying biodiversity.

In the 21st century, many scientists studying biodiversity are not directly observing the creatures they study. Often, we will be looking for their molecular signatures - inferring their presence and features, rather than directly observing them. These inferences about life contain both random and systematic error. As a result, much of our work focuses on mathematically describing the probability of these errors. These descriptions are called statistical models. We propose, construct and evaluate statistical models for biodiversity data, and apply them to data - ultimately allowing us to evaluate scientific hypotheses. A particular focus of our lab is the unseen world of bacteria. Communities of bacteria are everywhere, and bacteria play key roles in the human digestive and immune systems, as well as in food production and global climate cycles. Modern microbiological data is extremely complex, and that's why we find statistical models for this data so interesting and challenging to work with.

We have incredible collaborators, including in human health (with the Minot Lab), toxicology (with the Cui Lab), soil science (with the Whitman Lab), computational biology (with the Meren Lab), animal migration (with the SAiVE Lab), marine science (with the Kelly Lab), human-animal-microbe interactions (with the One Health Center), as well as with other statisticians (such as the Witten Lab).

We are grateful for the support of the National Institutes of Health through grant R35GM1334120, which funds much of our methods development.

If you have questions about our work, I would love to hear from you. Please reach out to me via email.