Multi-Coverage Metagenome Binning Greatly Outperforms Single-Coverage Binning

A recent research paper demonstrates that multi-coverage binning methods can produce more bins of higher quality than single-coverage binning methods, with a greater proportion of archaea and increased diversity of taxa.

Key Takeaways
  • Forty-two rumen microbiome samples were assembled and binned. Of the single-coverage bins, 22.5% had a contamination score of 5 or higher, versus only 3.5% of the multi-coverage bins.

  • One phylum, 2 classes, 3 orders, 9 families, 35 genera, and 96 species were found only in multi-coverage bins.

  • Single-coverage bins contain a large number of hidden contaminants, whereas multi-coverage bins perform much better.

  • Metagenomic binning should be performed using multi-coverage data whenever possible, and significant effort must always be put into quality control and filtering.

Title and Copyright Information

Attention: all answers provided below are based solely on this research paper, so please be mindful of any potential biases.

Mattock, J., Watson, M. A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nat Methods (2023). https://doi.org/10.1038/s41592-023-01934-8

What is Metagenome Binning?

A metagenome is the collection of all DNA from all organisms in a sample. Metagenome binning is the process of grouping DNA reads or contigs from a metagenomic sample into bins that each represent an individual microbial genome. This is typically done with statistical methods that group reads or contigs likely to come from the same organism, using signals such as sequence composition and coverage (abundance); some approaches also compare the sequences to known genomes.

Metagenome binning is extremely useful for probing the function of uncultured microbes in a sample. As sequencing technology continues to improve, metagenome binning is likely to become an even more important tool for understanding the ecology of microbes and their role in human and planetary health.

But challenges exist in applying metagenome binning:

  • Sequencing errors can make it difficult to identify the correct sequences for each organism.
  • Organisms that are present in low abundance in a sample may not be represented in the metagenome reads or contigs.
  • Metagenomic samples often contain a wide variety of organisms, which can make it difficult to identify the correct bins.

Q&A
Q1: What is the hypothesis of this study?

Metagenomic binning has become an essential tool for exploring the composition and function of microorganisms. In this study, the authors hypothesize that single-coverage binning may group certain contigs together incorrectly, because their coverage is measured in only a single sample. They propose that these errors amount to contamination that goes undetected by standard checks but can be identified using multi-coverage data.

Single-sample assembly provides coverage information that is insufficient to differentiate between conspecific microorganisms. On the other hand, multi-sample co-assembly (multi-coverage) improves binning accuracy. However, utilizing abundance information from multi-sample co-assembly requires more computational resources, resulting in higher time and monetary costs.
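The intuition behind the hypothesis can be illustrated with a toy example. The sketch below is not the authors' pipeline or any real binner: the contig names and coverage values are invented for illustration, and Pearson correlation is used only as a simple stand-in for the more elaborate statistics that binning tools apply to coverage profiles.

```python
import math

# Hypothetical mean read depth of three contigs across four samples.
# All values are invented for illustration.
coverage = {
    "contig_A": [20.0, 5.0, 40.0, 10.0],
    "contig_B": [21.0, 5.5, 39.0, 11.0],  # tracks contig_A in every sample
    "contig_C": [20.5, 30.0, 2.0, 25.0],  # resembles contig_A in sample 0 only
}


def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def single_coverage_gap(a, b, sample=0):
    """Absolute depth difference using one sample only (single-coverage view)."""
    return abs(coverage[a][sample] - coverage[b][sample])


def multi_coverage_similarity(a, b):
    """Correlation of depth across all samples (multi-coverage view)."""
    return pearson(coverage[a], coverage[b])


for other in ("contig_B", "contig_C"):
    gap = single_coverage_gap("contig_A", other)
    sim = multi_coverage_similarity("contig_A", other)
    print(f"contig_A vs {other}: single-sample gap={gap:.1f}, multi-sample r={sim:.3f}")
```

With only sample 0 available, contig_C looks at least as close to contig_A as contig_B does, so the two could plausibly be binned together. Across all four samples, contig_B follows contig_A closely while contig_C does not, which is the kind of signal multi-coverage binning can exploit.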

Q2: How much can multi-coverage binning reduce the contamination and increase the number of bins compared to single-coverage binning?

In their study, the authors assembled and binned 42 rumen microbiome samples using two different strategies: single-coverage and multi-coverage binning. They kept all other parameters constant for both approaches. The analysis revealed no significant difference in the distributions of completeness scores between the single-coverage and multi-coverage bins.

Contamination, however, differed sharply: the single-coverage bins showed a much higher level of contamination. Specifically, 22.5% (1,273 of 5,658) of the single-coverage bins had a contamination score of 5 or higher, while only 3.5% (293 of 8,420) of the multi-coverage bins exhibited the same level of contamination. After filtering, the single-coverage approach yielded 931 bins and the multi-coverage approach yielded 1,660, a 78% increase.
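The percentages quoted above follow directly from the reported bin counts. As a quick check, the arithmetic can be reproduced as below; the variable and function names are chosen here for illustration only.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Bin counts as reported in the text above.
single_contaminated = percent(1273, 5658)     # single-coverage bins with contamination >= 5
multi_contaminated = percent(293, 8420)       # multi-coverage bins with contamination >= 5
filtered_increase = percent(1660 - 931, 931)  # gain in filtered bins, relative to single-coverage

print(f"Single-coverage contaminated bins: {single_contaminated:.1f}%")
print(f"Multi-coverage contaminated bins:  {multi_contaminated:.1f}%")
print(f"Increase in filtered bins:         {filtered_increase:.0f}%")
```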

These results suggest that multi-coverage binning is more effective at reducing the number of contaminated bins and increasing the total number of high-quality bins, potentially leading to more accurate and reliable results in the analysis of rumen microbiome samples.

Q3: What taxa are missing in single-coverage binning?

In multi-coverage binning, the proportion of Archaea (4.3%) was higher than in single-coverage binning (3.1%). Notably, the multi-coverage bins revealed one phylum (Patescibacteria), two classes (Endomicrobia and Saccharimonadia), three orders, nine families, 35 genera, and 96 species that were not found in single-coverage bins. In contrast, only two genera and 11 species were unique to single-coverage bins. 

After the dereplication of the bins at the species and strain level, single-coverage bins identified 460 species and 573 strains, while multi-coverage bins found 682 species and 943 strains. Dereplicating all bins together resulted in 700 species, of which 240 were exclusively in multi-coverage bins and 18 solely in single-coverage bins. At the strain level, a total of 969 strains were detected, with 398 found exclusively by multi-coverage and 23 only by single-coverage. This highlights that leveraging coverage data from multiple samples aids in recovering missed species and strains.
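The species-level figures can be cross-checked with simple set arithmetic. The sketch below assumes that the 700 jointly dereplicated species split cleanly into three groups (shared, multi-coverage only, single-coverage only); that partition is an assumption made here for the check, not a statement from the paper.

```python
# Species counts as reported in the text above.
total_species = 700
multi_only = 240
single_only = 18

# Assumption: every species is either shared or exclusive to one approach.
shared = total_species - multi_only - single_only

single_total = shared + single_only  # species recovered by single-coverage binning
multi_total = shared + multi_only    # species recovered by multi-coverage binning

print(f"Shared species: {shared}")
print(f"Single-coverage total: {single_total}")
print(f"Multi-coverage total:  {multi_total}")
```

Under this assumption the totals come out to 460 and 682, matching the per-approach species counts given above.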

Q4: Any challenges in applying multi-coverage binning?

Implementing the method poses some challenges, including the computational burden, the selection of appropriate cutoff values, and the possible loss of mobile genetic elements. Nevertheless, the results of this study clearly suggest that metagenomic binning should be performed using multi-coverage data whenever possible. In all cases, significant effort must be put into quality control and filtering beyond existing methods (e.g., CheckM and GUNC), because neither single-copy core gene checks nor taxonomy-based checks are able to detect this hidden contamination.
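To make the "cutoff values" concrete, here is a minimal sketch of threshold-based bin filtering. The `BinQuality` class, the `filter_bins` function, the example bins, and the default thresholds (80% completeness, 10% contamination) are all hypothetical placeholders, not values or code taken from the paper or from CheckM.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    """Quality estimates for one bin (hypothetical structure)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def filter_bins(bins, min_completeness=80.0, max_contamination=10.0):
    """Keep bins that meet both cutoffs. Default thresholds are illustrative only."""
    return [
        b for b in bins
        if b.completeness >= min_completeness and b.contamination <= max_contamination
    ]


# Invented example bins.
bins = [
    BinQuality("bin_001", 95.2, 1.1),
    BinQuality("bin_002", 88.0, 12.4),  # fails the contamination cutoff
    BinQuality("bin_003", 61.5, 0.8),   # fails the completeness cutoff
]

kept = filter_bins(bins)
print([b.name for b in kept])
```

A filter like this only acts on the scores it is given. If the contamination estimate misses hidden contaminants, as the study reports for single-coverage bins, an affected bin would still pass, which is why the authors call for quality control beyond such scores.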
