16S rRNA Database

Which database do we use for 16S rRNA sequences?

16S-ID Database Contents

Standardized 16S rRNA gene sequences representing reference taxa

All sequences span end-to-end between the standard full-length 16S PCR primers (27F-1492R).
Each reference taxon is represented by a single 16S sequence (i.e., the database is non-redundant).
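As an illustration of what "spanning between 27F and 1492R" means in practice, here is a minimal Python sketch that cuts a longer sequence (for example, a 16S gene taken from a genome) down to the region between the two primer sites. It is not the pipeline used to build the database. The primer sequences are commonly cited variants and are an assumption, as is the choice to exclude the primer sites themselves; the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYMKSWN", "TGCAYRKMSWN")

# Commonly cited primer sequences (assumption: other variants exist).
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "TACGGYTACCTTGTTACGACTT"


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(primer):
    """Turn a primer with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def trim_between_primers(seq):
    """Return the region between the 27F and 1492R sites, or None.

    Assumption: the primer sites themselves are excluded and only
    exact (ambiguity-aware) primer matches are accepted.
    """
    seq = seq.upper()
    fwd = re.search(to_regex(PRIMER_27F), seq)
    if fwd is None:
        return None
    rest = seq[fwd.end():]
    # The reverse primer anneals to the opposite strand, so on the
    # forward strand we look for its reverse complement.
    rev = re.search(to_regex(reverse_complement(PRIMER_1492R)), rest)
    if rev is None:
        return None
    return rest[:rev.start()]
```

A real workflow would also need to tolerate mismatches in the primer sites and sequences that are truncated at either end, which this sketch does not attempt.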

Reference taxa include

Taxa with currently validly published names
Some taxa with names that are not validly published but that likely represent distinct species
Candidatus taxa
Unnamed phylotypes that do not belong to any of the above. These include 16S amplicons and genome sequences.

A complete taxonomic hierarchy (from species to phylum) is given for every 16S sequence. The hierarchy is based on a maximum-likelihood phylogenetic tree of 16S sequences, taking the currently accepted classification into account.

Sources of the 16S-ID database

We draw on multiple sources to collect representative 16S sequences of the highest available quality. The sources include:

NCBI 16S amplicon sequences of validly published taxa: e.g., AY692362 for Adiaceo aphidicola.
NCBI 16S amplicon sequences of phylotypes: e.g., AJ290038 for AJ290038_s (a phylotype corresponding to a species).
16S sequences extracted from NCBI genome assemblies: e.g., CP000238 for Baumannia cicadellinicola.
16S sequences extracted from JGI genome assemblies (these genome data may not be available in NCBI): e.g., jgi.1096475 for phylotype jgi.1096475_s in the genus Geodermatophilus.
16S sequences compiled from Pacific Biosciences full-length sequencing of microbiome samples. These are high-quality 16S sequences obtained with PacBio’s circular consensus sequencing (CCS) technology: e.g., PAC001304 for phylotype PAC001304_s.
16S sequences from genomospecies (e.g., CP014326_s). These are tentative new species supported by whole-genome sequences [Learn more].
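The naming patterns in the examples above can be sketched as a small Python helper. The rules are inferred only from those examples (a `jgi.` prefix, a `PAC` prefix followed by digits, and a trailing `_s` on unnamed species-level entries); they are assumptions, not a documented naming specification, and the function and field names are hypothetical.

```python
import re


def classify_identifier(identifier):
    """Guess the origin of a 16S-ID entry from its accession-like name.

    Assumption: the patterns below are inferred from a handful of
    examples and may not cover every identifier in the database.
    """
    # A trailing "_s" marks an unnamed species-level entry
    # (phylotype or genomospecies) in the examples.
    unnamed_species = identifier.endswith("_s")
    accession = identifier[:-2] if unnamed_species else identifier

    if accession.startswith("jgi."):
        source = "JGI genome assembly"
    elif re.fullmatch(r"PAC\d+", accession):
        source = "PacBio CCS"
    else:
        # Everything else is treated as an NCBI accession; the name
        # alone does not distinguish amplicons from genome assemblies.
        source = "NCBI"

    return {
        "accession": accession,
        "source": source,
        "unnamed_species": unnamed_species,
    }


for name in ["AY692362", "AJ290038_s", "jgi.1096475_s", "PAC001304_s"]:
    print(name, classify_identifier(name))
```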

Consequently, not all data are available in the NCBI database.

Why do we use 16S sequences extracted from genome assemblies instead of PCR-derived sequences in some cases?

Since typical WGS projects produce 50X or higher sequencing depth, the 16S sequences within genome assemblies are usually more accurate than those determined by PCR followed by Sanger sequencing, which yields only 1-5X depth.
When we include a genome-derived 16S sequence in the 16S-ID database, we always check its quality by manual alignment using secondary-structure information. In our experience, using genome sequences can improve the quality of 16S databases for reference purposes.
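To give a rough feel for why depth matters, the following Python sketch uses a deliberately simple model: each read is wrong at a given base independently with the same probability, and the consensus is wrong only if more than half of the reads are wrong. The per-read error rate is a hypothetical value chosen for illustration, and the model ignores systematic errors, PCR artifacts, and assembly problems, so the numbers should not be read as measured accuracies.

```python
from math import comb


def majority_error(depth, per_read_error):
    """Probability that more than half of `depth` reads are wrong at a base.

    Toy model: reads err independently with the same probability, and
    all erroneous reads are assumed to agree (a pessimistic case).
    Exact ties at even depth are counted as correct.
    """
    return sum(
        comb(depth, k)
        * per_read_error**k
        * (1 - per_read_error) ** (depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )


# Hypothetical per-read error rate, used only for illustration.
ERROR_RATE = 0.01

for depth in (1, 5, 50):
    print(f"{depth:>2}X: {majority_error(depth, ERROR_RATE):.3e}")
```

Under these assumptions the consensus error falls steeply as depth grows, which is the intuition behind preferring 50X genome-derived sequences over 1-5X Sanger reads.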

Database Citations

The database was introduced in the following three publications. Citation counts are as of Mar. 28, 2023:

Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H. & Chun, J. (2017). Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol 67, 1613-1617 [Learn more]. It was the No. 1 cited paper published in 2017 in the “Microbiology” & related category (by the Web of Science; 298 times cited out of 57,143 publications). Cited 5,315 times according to Google Scholar.

Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., Park, S. C., Jeon, Y. S., Lee, J. H. & other authors (2012). Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 62, 716-721 [Learn more]. It was the No. 1 cited paper published in 2012 in the “Microbiology” & related category (by the Web of Science; 3,045 times cited out of 25,438 publications). Cited 5,458 times according to Google Scholar.

Chun, J., Lee, J. H., Jung, Y., Kim, M., Kim, S., Kim, B. K. & Lim, Y. W. (2007). EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57, 2259-2261 [Learn more]. Cited 2,448 times according to Google Scholar.


Last updated 1 year ago
