Spike-in: An Internal Standard For Absolute Quantification In NGS Applications

Haiyuan & Yonas
September 8, 2023

Spike-ins are known quantities of DNA added to samples before sequencing. They can be used to quantify the total amount of DNA in a sample, normalize the data across samples, and assess the quality of the sequencing data.


Key Takeaways

Standard amplicon sequencing and shotgun metagenome sequencing only provide relative abundance data on bacterial or gene compositions.
DNA spike-ins enable absolute quantification, fostering a more comprehensive understanding of microbial community dynamics and their correlations with biogeochemical rates.
Utilizing the spike-in method with natural samples provides an accurate estimation of the absolute count of bacterial cells in intricate microbial communities.

Copyright Information

Attention: all information provided here is based solely on the following research papers, so please be mindful of any potential biases.

Hardwick et al., 2018. “Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis.” Nature Communications 9: 3096.
Zaramela et al., 2022. “synDNA—a Synthetic DNA Spike-in Method for Absolute Quantification of Shotgun Metagenomic Sequencing.” mSystems 7: 00447-22.
Lin et al., 2019. “Towards Quantitative Microbiome Community Profiling Using Internal Standards.” Applied and Environmental Microbiology 85: e02634-18.
Piwosz et al., 2018. “Determining lineage-specific bacterial growth curves with a novel approach based on amplicon reads normalization using internal standard (ARNIS).” The ISME Journal 12: 2640–2654.
Shen et al., 2022. “An improved workflow for accurate and robust healthcare environmental surveillance using metagenomics.” Microbiome 10(1): 1–18.

What is Spike-in?

In microbial genomics, a “spike-in” typically refers to a predetermined quantity of DNA added to an unknown DNA sample before sequencing. This spike-in DNA can originate from a single organism or a mixture of organisms. Adding a spike-in offers several advantages, enabling researchers to:

Determine the total DNA content in the sample.
Standardize data across multiple samples.
Evaluate the sequencing data’s quality.
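To make the first point concrete, here is a minimal sketch of how the total DNA content can be estimated from a spike-in of known mass. It assumes that reads are generated in proportion to DNA mass for both the spike-in and the native DNA, ignoring GC, length and extraction biases. The function name `estimate_sample_dna_ng` and the read counts are hypothetical.

```python
def estimate_sample_dna_ng(spike_ng: float, spike_reads: int, sample_reads: int) -> float:
    """Estimate the native (non-spike-in) DNA mass in a library.

    Assumption: reads are produced in proportion to DNA mass for both the
    spike-in and the sample DNA, so the mass ratio equals the read ratio.
    """
    if spike_reads <= 0:
        raise ValueError("spike_reads must be positive")
    return spike_ng * sample_reads / spike_reads


# Hypothetical run: 1 ng of spike-in DNA yields 20,000 reads and
# 1,980,000 reads map to the native community.
print(estimate_sample_dna_ng(1.0, 20_000, 1_980_000))  # 99.0
```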

Typically, the spike-in DNA is added to the sample at a defined proportion, such as 1% or 10%. The exact amount depends on the intended application.
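The text does not say what the 1% or 10% refers to. One possible reading is the spike-in's share of the total DNA after mixing, and this is an assumption. Under that reading, the required amount follows from f = s / (s + m), where s is the spike-in mass, m is the sample DNA mass and f is the target fraction. This rearranges to s = m * f / (1 - f). Below is a minimal sketch with a hypothetical helper name.

```python
def spike_in_amount(sample_dna_ng: float, target_fraction: float) -> float:
    """Spike-in mass s such that s / (s + sample_dna_ng) == target_fraction.

    Assumes the target percentage refers to the spike-in's share of the
    total DNA after mixing (sample + spike-in).
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return sample_dna_ng * target_fraction / (1.0 - target_fraction)


# Spike-in needed for 100 ng of (hypothetical) sample DNA
for fraction in (0.01, 0.10):
    print(f"{fraction:.0%}: {spike_in_amount(100.0, fraction):.2f} ng")
# 1%: 1.01 ng
# 10%: 11.11 ng
```

Adding exactly 1 ng to 100 ng would give 1/101 of the total, slightly under 1%. That is why the sketch returns 1.01 ng rather than 1 ng.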

Spike-ins are therefore a powerful tool in microbial genomics research. They facilitate DNA quantification, data normalization across samples, evaluation of sequencing data quality, and identification of contamination.

Background

Ribosomal RNA (rRNA) gene-based amplicon sequencing is a standard method for studying microbial diversity. It targets the 16S rRNA gene for prokaryotes or the 18S rRNA gene for microbial eukaryotes. However, this approach yields only compositional data, i.e., taxonomic profiles expressed as relative proportions. This limitation creates numerous statistical challenges and hinders cross-comparison of the rapidly growing body of shared rRNA gene datasets. Various transformations (e.g., the centered log-ratio transformation) and specialized data analysis methods have been introduced to address these challenges. However, these solutions often obscure the interpretation of the underlying biological and ecological dynamics.
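For reference, the centered log-ratio (CLR) transformation works on one sample of D features. It replaces each count x_i by ln(x_i) minus the mean of ln(x_j) over all D features. The plain-Python sketch below adds a pseudocount so that zero counts can be log-transformed; this is a common but arbitrary convention, not something the text prescribes. The sketch also shows why such transformations cannot recover absolute abundances: scaling every count by the same factor leaves the CLR values unchanged.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    A pseudocount is added so that zero counts can be log-transformed.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Two hypothetical samples: identical proportions, 10-fold different totals
small = clr([100, 300, 600], pseudocount=0)
large = clr([1000, 3000, 6000], pseudocount=0)
print([round(v, 3) for v in small])  # [-0.963, 0.135, 0.828]
print(all(abs(a - b) < 1e-9 for a, b in zip(small, large)))  # True
```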

Exogenous spike-in bacteria (whole cells) or DNA fragments with predetermined sequences and quantities have been proposed for both amplicon and shotgun metagenomic sequencing. However, to avoid complications in downstream analyses, these approaches require prior knowledge of the sample's community composition, so that the spike-in organisms are known to be absent from the sample. For instance, marine bacterial cells or genomes might serve as spike-in controls when sequencing freshwater or soil samples. Species that occur in both marine and freshwater habitats should not be used as internal standards.

Because DNA extraction efficiency can vary substantially among species, spike-in nucleic acids (rather than whole cells) have been advocated for absolute quantification in high-throughput sequencing. Notably, a series of synthetic DNA sequences with minimal similarity to sequences in the NCBI Nucleotide database has been designed. This innovation paves the way for a universal method that is independent of microbial composition and requires no prior knowledge of the sample.
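Candidate synthetic sequences are normally screened for similarity to known sequences with a BLAST search against the NCBI Nucleotide database. The sketch below is a self-contained stand-in that uses a different and much cruder technique. It scores a candidate by the largest fraction of its k-mers found in any single reference sequence, checking both strands. The function names, the k-mer size and the toy sequences are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def max_shared_kmer_fraction(candidate: str, references: list, k: int = 12) -> float:
    """Largest fraction of the candidate's k-mers present in any one reference (either strand)."""
    cand = kmers(candidate, k)
    if not cand:
        raise ValueError("candidate is shorter than k")
    best = 0.0
    for ref in references:
        ref_kmers = kmers(ref, k) | kmers(reverse_complement(ref), k)
        best = max(best, len(cand & ref_kmers) / len(cand))
    return best


# Toy reference "database" and two hypothetical candidates
refs = ["ACGTACGTTAGCCGATCGATCGGATCCA"]
print(max_shared_kmer_fraction(refs[0][2:24], refs, k=8))  # 1.0 (a copy of a known sequence)
print(max_shared_kmer_fraction("G" * 20, refs, k=8))       # 0.0 (no shared 8-mers)
```

A real screen would use BLAST or a comparable aligner against the full database. Exact k-mer matching misses diverged homologs, so a low score here does not prove a sequence is novel.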

Method Summary

Table 1: Technical Features of the Spike-in Methods for 16S or 18S rRNA Amplicon Sequencing

Reference	Type of spike-in	No. of species in spike-in	Amount of spike-in added
Piwosz et al., 2018	E. coli cells	1	5% of the natural microbial community
Shen et al., 2022	ZymoBIOMICS Microbial Community Standard (Listeria monocytogenes 12%, Pseudomonas aeruginosa 12%, Bacillus subtilis 12%, Escherichia coli 12%, Salmonella enterica 12%, Lactobacillus fermentum 12%, Enterococcus faecalis 12%, Staphylococcus aureus 12%, Saccharomyces cerevisiae 2%, Cryptococcus neoformans 2%)	10	6.50 μL spiked into 1 mL of sample, so that the DNA of the most abundant species made up approximately 1% of the total DNA
Lin et al., 2019	Bacterial genomic DNA	1	1% of the total genomic DNA in the sample
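The Shen et al. row in Table 1 implies more than its last column states. Take the nominal composition listed in Table 1, and assume that the listed percentages refer to DNA and that the most abundant members (12%) each make up 1% of the total DNA. Every other member, and the standard as a whole, can then be scaled accordingly. The dictionary and function names below are illustrative.

```python
# Nominal composition of the ZymoBIOMICS standard as listed in Table 1
# (assumed to be % of the standard's DNA)
ZYMO_COMPOSITION = {
    "Listeria monocytogenes": 12,
    "Pseudomonas aeruginosa": 12,
    "Bacillus subtilis": 12,
    "Escherichia coli": 12,
    "Salmonella enterica": 12,
    "Lactobacillus fermentum": 12,
    "Enterococcus faecalis": 12,
    "Staphylococcus aureus": 12,
    "Saccharomyces cerevisiae": 2,
    "Cryptococcus neoformans": 2,
}


def spike_fractions(composition, top_fraction_of_total=0.01):
    """Scale the composition so the most abundant member equals `top_fraction_of_total`.

    Returns each member's fraction of total DNA and the fraction for the whole standard.
    """
    scale = top_fraction_of_total / max(composition.values())
    per_member = {name: pct * scale for name, pct in composition.items()}
    return per_member, sum(per_member.values())


per_member, total = spike_fractions(ZYMO_COMPOSITION)
print(f"whole standard: {total:.2%} of total DNA")  # whole standard: 8.33% of total DNA
print(f"each yeast: {per_member['Cryptococcus neoformans']:.3%}")  # each yeast: 0.167%
```

Under these assumptions, the spike-in as a whole makes up roughly 8% of the total DNA, even though no single member exceeds 1%.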

Table 2: Technical Features of the Spike-in Methods for Metagenome Sequencing

Reference	Length of synthetic DNA	GC content of synthetic DNA	No. of synthetic DNA fragments	No. of DNA pools (mock microbial communities)	Percentage of synthetic DNA used in samples
Hardwick et al., 2018	~1–10 kb	20–71%	86	2	1%
Zaramela et al., 2022*	2 kb	26–66%	10	3	5%

*Plasmids are available from Addgene.
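Both designs in Table 2 span a broad GC range, which can help reveal GC-dependent coverage bias. The sketch below shows a quick way to check the GC content of a candidate pool. The pool names and the 10-bp toy sequences are hypothetical; real fragments are 1 to 10 kb long.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_range(pool: dict) -> tuple:
    """Lowest and highest GC content across a pool of sequences."""
    values = [gc_content(s) for s in pool.values()]
    return min(values), max(values)


pool = {"syn_low": "AATTATAGCA", "syn_mid": "ACGTACGTAC", "syn_high": "GGCCGCGATC"}
print(gc_range(pool))  # (0.2, 0.8)
```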

Result Summary

The relative abundance of species, calculated from read counts (whether amplicon or metagenome), does not adequately represent the true microbial community composition. Consequently, traditional statistical analyses that require precise quantification of each species in a given environment cannot be applied to such data. The spike-in nucleic acid approach enables absolute quantification of the total bacterial cell count with high precision and consistency. A clear advantage of spike-in nucleic acids over spike-in cells is the high reproducibility of results, regardless of the DNA extraction technique used.
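The difference between relative and absolute abundance can be illustrated with a single spike-in of known copy number, the simplest case in Table 1. This sketch assumes that the spike-in and the native taxa are extracted, amplified and sequenced with the same efficiency per copy. It reports copies; converting copies to cells would additionally require per-taxon copy numbers (e.g., of the 16S rRNA gene), which the sketch ignores. All names and numbers are hypothetical.

```python
def relative_abundance(taxon_reads):
    """Each taxon's share of the non-spike-in reads."""
    total = sum(taxon_reads.values())
    return {taxon: reads / total for taxon, reads in taxon_reads.items()}


def absolute_abundance(taxon_reads, spike_reads, spike_copies):
    """Convert reads to copies using the known spike-in: copies per read = spike_copies / spike_reads."""
    if spike_reads <= 0:
        raise ValueError("spike_reads must be positive")
    copies_per_read = spike_copies / spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


# Two hypothetical samples with identical taxon reads. The same spike-in
# (1e6 copies) received 1,000 reads in sample 1 but only 250 in sample 2,
# i.e. sample 2 contained more native DNA.
reads = {"taxon_A": 8000, "taxon_B": 2000}
abs1 = absolute_abundance(reads, spike_reads=1000, spike_copies=1e6)
abs2 = absolute_abundance(reads, spike_reads=250, spike_copies=1e6)
print(relative_abundance(reads))          # {'taxon_A': 0.8, 'taxon_B': 0.2}
print(abs2["taxon_A"] / abs1["taxon_A"])  # 4.0
```

The relative abundances of the two samples are identical. Yet taxon_A is four times more abundant in absolute terms in sample 2, a difference that read proportions alone cannot reveal.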

Synthetic DNA (synDNA) with minimal similarity to sequences in the NCBI Nucleotide database holds promise for a universal method that is independent of microbial composition and requires no prior knowledge of the sample. Because synDNA can be shared among laboratories, it also supports consistent results across labs. Absolute quantification is more precise when a pool of multiple synthetic DNA molecules serves as the reference rather than a single synthetic DNA molecule. Sequencing reads are aligned to each synDNA sequence. The resulting read counts, together with the known dilutions of the synDNAs, are then used to fit a linear model for every experiment. These linear models are used to estimate the total genome copies of each organism in the community. No linear model could be derived when only one synDNA sequence, or a single genomic DNA or cell spike-in, was used as the internal standard. The linear models built from the synDNA pool had high coefficients of determination (R² > 0.96) and were statistically significant (P < 0.01).
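The calibration step can be sketched as an ordinary least-squares fit. The sketch assumes a log-log linear relationship between the known amount of each synDNA and its mapped reads. The exact transformation and normalization used by Zaramela et al. may differ. A real analysis would also correct organism reads for genome length, which this sketch ignores. Function names and data are hypothetical. The fit needs at least two synDNAs at different dilutions, which is consistent with the observation that a single internal standard yields no linear model.

```python
import math


def fit_log_log(known_copies, mapped_reads):
    """Least-squares fit of log10(reads) = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in known_copies]
    ys = [math.log10(r) for r in mapped_reads]
    n = len(xs)
    if n != len(ys) or n < 2 or len(set(xs)) < 2:
        raise ValueError("need at least two synDNAs with different known amounts")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def predict_copies(reads, slope, intercept):
    """Invert the fitted model to estimate copies from an organism's mapped reads."""
    return 10 ** ((math.log10(reads) - intercept) / slope)


# Idealized, hypothetical pool: four synDNAs at 10-fold dilutions
known_copies = [1e3, 1e4, 1e5, 1e6]
mapped_reads = [10, 100, 1000, 10000]
slope, intercept, r2 = fit_log_log(known_copies, mapped_reads)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r2:.3f}")  # slope=1.00, intercept=-2.00, R2=1.000
print(round(predict_copies(500, slope, intercept)))  # 50000
```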

The correlation between the percentage of microbial species predicted with the spike-in method and the cell counts measured by flow cytometry was significant, exceeding 0.99 for all investigated natural samples. Similar significant correlations were observed between phytoplankton abundances estimated by 18S rRNA gene amplicon sequencing and those inferred from chlorophyll a (Chl a) concentrations.
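The agreement between spike-in estimates and flow cytometry counts is summarized by a correlation coefficient. A minimal Pearson correlation in plain Python is shown below. The text does not state which coefficient the studies used, so Pearson's r is an assumption. The paired values are hypothetical placeholders, not data from the cited papers.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired estimates (cells per mL) for four samples
spike_in_estimate = [1.1e6, 2.0e6, 3.2e6, 3.9e6]
flow_cytometry = [1.0e6, 2.1e6, 3.0e6, 4.0e6]
print(round(pearson_r(spike_in_estimate, flow_cytometry), 3))
```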

Conclusion

The spike-in method enables precise prediction of the absolute number of bacterial cells in complex microbial communities. It improves the statistical, and consequently the ecological, interpretation of rapidly growing microbiome datasets.
