Topic modeling for identifying variation in microbial communities in plants and soil environment
Next-Generation Sequencing (NGS)-based technology now enables the high-throughput quantification of non-culturable microbial organisms in all environments. The use of microbial sequences has enhanced our understanding of microbiome interactions at the interface of the host plant and the soil environment, and of their roles in microbial ecology. These insights can inform plant microbiome-based agro-management to improve agricultural production, promote plant growth and health, and maintain resistance to disease and environmental stress. Nevertheless, plant microbiome data pose statistical challenges such as 1) co-amplification of host contaminants such as plastid and mitochondrial sequences, 2) heterogeneous microbial communities across host plants and the soil environment, 3) unequal sequencing depth across samples, and 4) latent microbial communities that are involved in interactions. We propose using Latent Dirichlet Allocation (LDA) to identify microbial communities in the plant root rhizosphere and endosphere, and their interactions at the interface to exchange resources. We illustrate our method by reusing 16S rRNA gene sequencing data from the rhizosphere and endosphere of nine different plant species, together with an approach that limits the co-amplification of host-plant contamination across plant species.
- January 9-10, 2020: International Conference on Environmental and Medical Statistics: Keynote Speaker.
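As a rough illustration of the idea in the abstract above, the sketch below treats each sample as a "document", each taxon as a "word", and each latent microbial community as a "topic". It is a minimal collapsed Gibbs sampler written for this page, not the implementation used in the talk. The taxon names, the read counts, the host-label set, and the hyperparameters (alpha, beta, number of topics) are all hypothetical, and real host-read removal depends on taxonomy assignment rather than on simple name matching.

```python
import random

# Hypothetical toy data: rows are samples, columns are taxa (read counts).
taxa = ["Chloroplast", "Mitochondria", "Pseudomonas",
        "Bacillus", "Streptomyces", "Rhizobium"]
counts = [
    [30, 5, 20, 18, 1, 0],   # rhizosphere-like sample, deeper sequencing
    [12, 2, 9, 11, 0, 1],    # rhizosphere-like sample, shallower sequencing
    [40, 8, 1, 0, 15, 17],   # endosphere-like sample
    [25, 3, 0, 2, 8, 6],     # endosphere-like sample
]

# Step 1: drop host-derived taxa (assumed labels) before modeling.
HOST_LABELS = {"Chloroplast", "Mitochondria"}
keep = [i for i, name in enumerate(taxa) if name not in HOST_LABELS]
kept_taxa = [taxa[i] for i in keep]
filtered = [[row[i] for i in keep] for row in counts]


def fit_lda(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a sample-by-taxon count matrix.

    Returns (theta, phi): per-sample topic proportions and
    per-topic taxon proportions.
    """
    rng = random.Random(seed)
    n_samples, n_taxa = len(counts), len(counts[0])
    # Expand counts into one token (a taxon index) per read.
    tokens = [[t for t, c in enumerate(row) for _ in range(c)]
              for row in counts]
    sample_topic = [[0] * n_topics for _ in range(n_samples)]
    topic_taxon = [[0] * n_taxa for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every read.
    for s, toks in enumerate(tokens):
        z = []
        for t in toks:
            k = rng.randrange(n_topics)
            z.append(k)
            sample_topic[s][k] += 1
            topic_taxon[k][t] += 1
            topic_total[k] += 1
        assign.append(z)
    # Resample each read's topic conditional on all other assignments.
    for _ in range(n_iter):
        for s, toks in enumerate(tokens):
            for i, t in enumerate(toks):
                k = assign[s][i]
                sample_topic[s][k] -= 1
                topic_taxon[k][t] -= 1
                topic_total[k] -= 1
                weights = [
                    (sample_topic[s][j] + alpha)
                    * (topic_taxon[j][t] + beta)
                    / (topic_total[j] + n_taxa * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[s][i] = k
                sample_topic[s][k] += 1
                topic_taxon[k][t] += 1
                topic_total[k] += 1
    # Smoothed point estimates from the final assignment counts.
    theta = [[(sample_topic[s][k] + alpha)
              / (len(tokens[s]) + n_topics * alpha)
              for k in range(n_topics)] for s in range(n_samples)]
    phi = [[(topic_taxon[k][t] + beta)
            / (topic_total[k] + n_taxa * beta)
            for t in range(n_taxa)] for k in range(n_topics)]
    return theta, phi


theta, phi = fit_lda(filtered, n_topics=2)
for s, row in enumerate(theta):
    print("sample", s, [round(p, 2) for p in row])
```

Because theta is a proportion within each sample, samples sequenced to different depths can be compared without rarefying the counts first, which is one reason a mixed-membership model suits challenge 3) in the abstract. In practice the number of topics would need to be chosen by some model-selection criterion, which this sketch does not attempt.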
Statistical Challenges in Microbiome Genome-level Data
- We are working on establishing a workflow for analyzing microbiome genome-level data: 16S rRNA gene sequencing data, metagenomics, metatranscriptomics, and metabolomics.