# 16S rRNA Database

## 16S-ID Database Contents <a href="#span-classeztocsection-id16siddatabasecontentsspan16sid-database-contentsspan-classeztocsectionendsp" id="span-classeztocsection-id16siddatabasecontentsspan16sid-database-contentsspan-classeztocsectionendsp"></a>

### Standardized 16S rRNA gene sequences representing reference taxa <a href="#span-classeztocsection-idstandardized16srrnagenesequencerepresentingreferencetaxaspanstandardized-16" id="span-classeztocsection-idstandardized16srrnagenesequencerepresentingreferencetaxaspanstandardized-16"></a>

* All sequences span the full region between the standard full-length 16S PCR primers ([27F-1492R](https://help.ezbiocloud.net/16s-rrna-and-16s-rrna-gene/?ref=blog.ezbiocloudpro.com)).
* Each reference taxon is represented by a single 16S sequence, so the database is non-redundant.

### Reference taxa include <a href="#span-classeztocsection-idreferencetaxameanspanreference-taxa-meanspan-classeztocsectionendspan" id="span-classeztocsection-idreferencetaxameanspanreference-taxa-meanspan-classeztocsectionendspan"></a>

* Taxa with currently validly published names.
* Some taxa whose names are not validly published but that likely represent distinct species.
* Candidatus taxa.
* Unnamed phylotypes that do not belong to any of the above. These are derived from 16S amplicons and genome sequences.

A complete **taxonomic hierarchy** (from species to phylum) is given for every 16S sequence. The hierarchy is based on a maximum likelihood phylogenetic tree of 16S sequences, taking the currently accepted classification into account.

## Source of 16S-ID database <a href="#span-classeztocsection-idsourceof16siddatabasespansource-of-16sid-databasespan-classeztocsectionends" id="span-classeztocsection-idsourceof16siddatabasespansource-of-16sid-databasespan-classeztocsectionends"></a>

We draw on multiple sources to collect the highest-quality representative 16S sequences. The sources include:

* NCBI 16S amplicon sequences of validly published taxa: e.g., [AY692362](https://www.ezbiocloud.net/16SrRNA?ac=AY692362\&ref=blog.ezbiocloudpro.com) for Adiaceo aphidicola.
* NCBI 16S amplicon sequences of phylotypes: e.g., [AJ290038](https://www.ezbiocloud.net/16SrRNA?ac=AJ290038\&ref=blog.ezbiocloudpro.com) for AJ290038\_s (a phylotype corresponding to a species).
* 16S sequences extracted from NCBI genome assemblies: e.g., [CP000238](https://www.ezbiocloud.net/16SrRNA?ac=CP000238\&ref=blog.ezbiocloudpro.com) for Baumannia cicadellinicola.
* 16S sequences extracted from JGI genome assemblies (these genome data may not be available in NCBI): e.g., [jgi.1096475](https://www.ezbiocloud.net/16SrRNA?ac=jgi.1096475\&ref=blog.ezbiocloudpro.com) for phylotype jgi.1096475\_s in the genus Geodermatophilus.
* 16S sequences compiled from Pacific Biosciences full-length sequencing of microbiome samples. These are high-quality 16S sequences generated with PacBio’s circular consensus sequencing (CCS) technology: e.g., PAC001304 for phylotype [PAC001304\_s](https://www.ezbiocloud.net/taxon?tn=PAC001304_s\&ref=blog.ezbiocloudpro.com).
* 16S sequences from genomospecies (e.g., [CP014326\_s](https://www.ezbiocloud.net/taxon?tn=CP014326_s\&ref=blog.ezbiocloudpro.com)). These are tentative new species supported by whole genome sequences \[[Learn more](https://help.ezbiocloud.net/genomospecies/?ref=blog.ezbiocloudpro.com)].

Consequently, not all sequences in the 16S-ID database are available in the NCBI database.
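As a rough illustration of how the example identifiers above differ by source, the following Python sketch guesses a source category from an accession's prefix and splits a phylotype name such as `AJ290038_s` into its accession and suffix. The function names `guess_source` and `split_phylotype` are hypothetical, and the patterns are inferred only from the examples listed on this page. They are assumptions, not the database's actual naming rules, and an NCBI-style accession alone does not reveal whether a sequence came from an amplicon or a genome assembly.

```python
import re


def guess_source(accession: str) -> str:
    """Guess where a 16S-ID accession came from, using its prefix only.

    The patterns are assumptions based on the examples on this page;
    real accessions may follow formats that are not covered here.
    """
    if accession.startswith("jgi."):
        return "JGI genome assembly"
    if re.fullmatch(r"PAC\d+", accession):
        return "PacBio CCS"
    # Simple NCBI-style accession: one or two capital letters, then digits.
    if re.fullmatch(r"[A-Z]{1,2}\d+", accession):
        return "NCBI (amplicon or genome assembly)"
    return "unknown"


def split_phylotype(name: str):
    """Split a phylotype name such as 'AJ290038_s' into (accession, suffix).

    Names without an underscore are returned unchanged with suffix None.
    """
    accession, sep, suffix = name.rpartition("_")
    if not sep:
        return name, None
    return accession, suffix


if __name__ == "__main__":
    for name in ["AJ290038_s", "jgi.1096475_s", "PAC001304_s", "CP014326_s"]:
        accession, suffix = split_phylotype(name)
        print(name, "->", accession, suffix, "|", guess_source(accession))
```

A prefix check like this is only a convenience for sorting identifiers; the linked EzBioCloud record for each accession remains the place to confirm its source.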

## Why do we use 16S sequences extracted from genome assemblies instead of PCR-derived sequences in some cases? <a href="#span-classeztocsection-idwhydoweuse16ssequencesextractedfromgenomeassembliesinsteadofpcrderivedseque" id="span-classeztocsection-idwhydoweuse16ssequencesextractedfromgenomeassembliesinsteadofpcrderivedseque"></a>

* Typical WGS projects produce a sequencing depth of 50X or higher, so the sequences within genome assemblies are usually more accurate than those determined by PCR and Sanger sequencing, which typically yields a depth of 1-5X.
* When we include a genome-derived 16S sequence in the 16S-ID database, we always check its quality by manual alignment using secondary structure information. In our experience, using genome sequences can improve the quality of 16S databases for reference purposes.
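The depth argument above can be illustrated with a toy model, sketched here in Python. It assumes that every read covering a position has the same independent per-base error probability and that the consensus base is wrong only when more than half of the reads are wrong. The 1% error rate and the function name `consensus_error` are illustrative assumptions, not values or methods from EzBioCloud, and real Sanger and WGS reads have different, non-independent error profiles.

```python
from math import comb


def consensus_error(depth: int, per_read_error: float) -> float:
    """Probability that a majority-vote consensus base is wrong.

    Toy binomial model: each of `depth` reads is wrong independently with
    probability `per_read_error`, and the consensus is counted as wrong
    only when strictly more than half of the reads are wrong.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    first_losing_count = depth // 2 + 1  # smallest number of wrong reads that wins the vote
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(first_losing_count, depth + 1)
    )


if __name__ == "__main__":
    assumed_error = 0.01  # assumed per-read, per-base error rate (illustrative only)
    for depth in (1, 3, 5, 50):
        print(f"depth {depth:>2}X: {consensus_error(depth, assumed_error):.3e}")
```

Under these assumptions the consensus error probability falls steeply as depth grows, which is the intuition behind preferring genome-derived 16S sequences. The model says nothing about systematic errors such as misassembly of multiple rRNA operons, which is one reason a manual alignment check is still useful.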

## Database Citations <a href="#span-classeztocsection-iddatabasecitationsspandatabase-citationsspan-classeztocsectionendspan" id="span-classeztocsection-iddatabasecitationsspandatabase-citationsspan-classeztocsectionendspan"></a>

*Our database has been introduced in the following three publications. The numbers of citations are as of Mar. 28, 2023:*

Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H. & Chun, J. (2017).\
Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol 67, 1613-1617 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijsem.0.001755?ref=blog.ezbiocloudpro.com#tab2)]. According to the Web of Science, it was the most-cited paper published in 2017 in the “Microbiology” and related categories (cited 298 times, out of 57,143 publications). Cited 5,315 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=17895195279591305687\&hl=en\&ref=blog.ezbiocloudpro.com)

Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., Park, S. C., Jeon, Y. S., Lee, J. H. & other authors (2012).\
Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 62, 716-721 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.038075-0?ref=blog.ezbiocloudpro.com#tab2)]. According to the Web of Science, it was the most-cited paper published in 2012 in the “Microbiology” and related categories (cited 3,045 times, out of 25,438 publications). Cited 5,458 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=11184522402034558074\&hl=en\&ref=blog.ezbiocloudpro.com)

Chun, J., Lee, J. H., Jung, Y., Kim, M., Kim, S., Kim, B. K. & Lim, Y. W. (2007).\
EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57, 2259-2261 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.64915-0?ref=blog.ezbiocloudpro.com#tab2)]. Cited 2,448 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=14018266421910009886\&hl=en\&ref=blog.ezbiocloudpro.com)
