16S rRNA Database
Which database do we use for 16S rRNA sequences?
All sequences span end to end between the standard full-length 16S PCR primers ().
Each reference taxon is represented by a single representative 16S sequence (i.e., the database is non-redundant). The reference taxa include:
Currently validly published taxonomic names
Some names that are not validly published but likely represent distinct species
Candidatus taxa
Unnamed phylotypes that do not belong to any of the above categories, derived from both 16S amplicon and genome sequences
A complete taxonomic hierarchy, from species up to phylum, is given for every 16S sequence. The hierarchy is based on a maximum-likelihood phylogenetic tree of the 16S sequences, taking the currently accepted classification into account.
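To illustrate what a complete species-to-phylum hierarchy means for a single record, here is a minimal Python sketch. The `Lineage` class, its field names, and the semicolon-separated input format are hypothetical assumptions made for illustration, not the database's actual export format. The Escherichia coli lineage is only an example; exact names depend on the classification in use.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Lineage:
    """Hypothetical taxonomy record for one reference 16S sequence.

    Field names and the "a;b;c" string format are illustrative assumptions.
    """
    phylum: str
    class_: str  # trailing underscore because "class" is a Python keyword
    order: str
    family: str
    genus: str
    species: str

    @classmethod
    def from_string(cls, text: str, sep: str = ";") -> "Lineage":
        # Split the lineage and strip whitespace around each rank name.
        parts = [p.strip() for p in text.split(sep)]
        # A complete hierarchy must fill every rank, phylum through species.
        if len(parts) != len(fields(cls)):
            raise ValueError(f"expected {len(fields(cls))} ranks, got {len(parts)}: {text!r}")
        if any(not p for p in parts):
            raise ValueError(f"empty rank in lineage: {text!r}")
        return cls(*parts)

    def ranks(self) -> dict[str, str]:
        # Map rank names (without the trailing underscore) to taxon names.
        return {f.name.rstrip("_"): getattr(self, f.name) for f in fields(self)}


if __name__ == "__main__":
    lineage = Lineage.from_string(
        "Proteobacteria;Gammaproteobacteria;Enterobacterales;"
        "Enterobacteriaceae;Escherichia;Escherichia coli"
    )
    print(lineage.ranks())
```

Rejecting records with missing or empty ranks is one simple way to enforce the "complete hierarchy for every sequence" property at load time.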
We use multiple sources to collect representative 16S sequences of the highest quality. The sources include:
NCBI 16S amplicon sequences of validly published taxa: e.g., for Adiaceo aphidicola.
NCBI 16S amplicon sequences of phylotypes: e.g., for AJ290038_s (a phylotype corresponding to a species).
16S sequences extracted from NCBI genome assemblies: e.g., for Baumannia cicadellinicola.
16S sequences extracted from JGI genome assemblies (these genome data may not be available in NCBI): e.g., for the phylotype jgi.1096475_s in the genus Geodermatophilus.
16S sequences compiled from Pacific Biosciences (PacBio) full-length sequencing of microbiome samples. These are high-quality 16S sequences obtained with PacBio’s circular consensus sequencing (CCS) technology: e.g., PAC001304 for phylotype.
16S sequences from genomospecies (e.g.). These are tentative new species supported by whole-genome sequences [].
Consequently, not all of these sequences are available in the NCBI database.
Because typical whole-genome sequencing (WGS) projects achieve a sequencing depth of 50X or more, 16S sequences taken from genome assemblies are usually more accurate than those determined by PCR and Sanger sequencing, which provides only 1-5X depth.
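The depth argument can be illustrated with a simple probability model. The Python sketch below assumes, purely for illustration, that each read covering a base is wrong independently with a hypothetical per-read error rate, that all wrong reads agree on the same wrong base, and that the consensus is a plain majority vote with ties counted as errors. Real assemblers and base callers use quality-aware models, so the output shows only the trend with depth, not the actual error rates of either technology.

```python
from math import comb


def majority_consensus_error(depth: int, read_error: float) -> float:
    """Probability that a majority-vote consensus base is wrong.

    Illustrative assumptions: reads err independently with probability
    read_error, every erroneous read reports the same wrong base (the worst
    case for voting), and a tie counts as an error.
    """
    if depth < 1 or not 0.0 <= read_error <= 1.0:
        raise ValueError("depth must be >= 1 and read_error must be in [0, 1]")
    # The vote fails when at least half of the reads are wrong (2k >= depth).
    min_wrong = (depth + 1) // 2
    # Sum the binomial probabilities of k wrong reads for every failing k.
    return sum(
        comb(depth, k) * read_error**k * (1.0 - read_error) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


if __name__ == "__main__":
    per_read_error = 0.01  # hypothetical per-base error rate of a single read
    gene_length = 1500     # approximate length of a full-length 16S rRNA gene
    for depth in (1, 3, 5, 50):
        p = majority_consensus_error(depth, per_read_error)
        print(f"{depth:>2}X: per-base error {p:.2e}, "
              f"expected errors per gene {p * gene_length:.2e}")
```

Under these assumptions, a single read at 1X keeps the full per-read error rate, while the consensus error drops sharply as depth grows toward WGS-like coverage.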
When we add genome-derived 16S sequences to the 16S-ID database, we always check their quality by manual alignment guided by secondary-structure information. In our experience, using genome sequences can improve the quality of 16S databases for reference purposes.
Our database was introduced in the following three publications (citation counts as of Mar. 28, 2023):
Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H. & Chun, J. (2017). Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol 67, 1613-1617 []. It was the No. 1 most cited paper published in 2017 in the “Microbiology” and related subject categories of the Web of Science (cited 298 times; ranked first out of 57,143 publications). Cited 5,315 times according to Google Scholar.
Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., Park, S. C., Jeon, Y. S., Lee, J. H. & other authors (2012). Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 62, 716-721 []. It was the No. 1 most cited paper published in 2012 in the “Microbiology” and related subject categories of the Web of Science (cited 3,045 times; ranked first out of 25,438 publications). Cited 5,458 times according to Google Scholar.
Chun, J., Lee, J. H., Jung, Y., Kim, M., Kim, S., Kim, B. K. & Lim, Y. W. (2007). EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57, 2259-2261 []. Cited 2,448 times according to Google Scholar.