The purpose of this post is to answer some questions that I often receive by email about my paper "Rarefaction, Alpha Diversity, and Statistics" (Willis, 2019, Frontiers in Microbiology). If I have responded to you by email with this link, thanks for understanding that people from all over the world reach out to me on a daily basis with questions about microbiome data analysis, statistics, and diversity estimation, and I unfortunately can't respond to everyone individually. I will try to update this post periodically, though.
Where's the code to reproduce the figures and analyses?
https://github.com/adw96/antirarefaction
Unfortunately I'm not maintaining this repository, but it's there as a reference for you! As you will see below, I maintain other software that implements the tools that I used in the paper.
I want to estimate total diversity and its uncertainty. How?
There are lots of estimators out there, and many of them are very good! I wrote an R package that implements some of my favorites: the package is called breakaway. My personal favorite methods are breakaway::breakaway, breakaway::richness_chao_bunge, and breakaway::objective_bayes_negbin. Usually, when I need to estimate species richness, I use breakaway::breakaway, but you should experiment and see which estimator is best adapted to your data structure. The vignettes in the breakaway package are a good resource for building this knowledge and intuition.
The citation for the package breakaway is Willis & Bunge, 2015, Biometrics, doi.org/10.1111/biom.12332. Depending on what function(s) in the package you use, you may need to cite and credit other work as well.
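As a rough sketch of what this can look like in practice, the snippet below runs breakaway::breakaway on apples, an example frequency count table that I am assuming is bundled with your installed version of the package. The object name richness is just illustrative, and the estimate and error fields are what I would expect the result to contain; check the package documentation for your version before relying on them.

```r
# A minimal sketch, assuming breakaway is already installed and that the
# example dataset `apples` is available in your version of the package.
library(breakaway)

# A frequency count table has two columns: an index j, and the number of
# taxa that were observed exactly j times.
data(apples)
head(apples)

# Estimate total richness (observed plus unobserved taxa).
richness <- breakaway(apples)

richness$estimate  # point estimate of total richness
richness$error     # standard error of that estimate
```

The same call pattern should apply to your own frequency count tables, and swapping in one of the other estimators mentioned above is a reasonable way to see how sensitive the answer is to the choice of method.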
I want to compare estimates of total diversity including their uncertainty. How?
There are actually relatively few ways of doing this. To fill this gap, I wrote a method called betta, which I implemented in the R package breakaway. I recommend that you check out breakaway::betta. Again, the vignettes in the breakaway package are a good resource for building this knowledge and intuition.
The citation for the function betta is Willis, Bunge & Whitman, 2017, JRSS-C, doi.org/10.1111/rssc.12206.
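To make this a little more concrete, here is a hedged sketch of calling breakaway::betta. The richness estimates, standard errors, and the two-group design below are all made up for illustration; in a real analysis they would come from an estimator such as breakaway::breakaway and from your own sample metadata. I am assuming the chats, ses, and X arguments and the table element of the result; check the documentation for your installed version.

```r
# A minimal sketch, assuming breakaway is already installed.
library(breakaway)

# Made-up richness estimates and their standard errors for six samples.
# These numbers are purely illustrative and are not real data.
chats <- c(2100, 1950, 2300, 1400, 1550, 1300)
ses   <- c(150, 120, 200, 90, 110, 100)

# Design matrix: an intercept plus a hypothetical 0/1 group indicator
# (for example, control versus treated).
group <- c(0, 0, 0, 1, 1, 1)
X <- cbind(intercept = 1, group = group)

# Model richness as a function of group, accounting for the
# uncertainty (ses) in each estimate.
fit <- betta(chats = chats, ses = ses, X = X)

# Coefficient estimates, standard errors, and p-values.
fit$table
```

Under these assumptions, the row for group describes the estimated difference in richness between the two groups, with the estimation error in each sample carried through rather than ignored.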
Why didn't you put all of this in the paper?!
It was years ago and I can't remember why. Potential reasons include that I often err; that journals have space limits; and that I didn't want to look like an egomaniac by constantly recommending my own tools (I don't always think they're the best, so I don't always recommend them!).
Why do you always tell me to use your software and methods? What sort of egomaniac are you?!
Fair point! But if I didn't think my tools were worth using, I wouldn't develop them. I won't be offended if you don't use them!
A reviewer/advisor/supervisor told me I have to rarefy! What do I do?
That's a really tough situation to be in! If you don't believe that you should be rarefying, you could consider sharing some of the resources linked above, or some lecture notes on the topic (such as here and here), the original critique of rarefying by statisticians, or some of the alternatives. I've heard from many junior folx who have convinced senior or reviewing folx this way. However, the particulars of your situation will dictate whether this is feasible. Good luck!
Will all future blogposts be written in this fake question-answer style?
In drafting this post, I was imagining that this was a fun conversation over coffee at a wonderful conference in a fabulous location, rather than me alone in my home office in a pandemic. So... probably!
Be safe and well!
Dr. Amy D Willis, Ph.D.
24 June 2021