# 16S rRNA Database

## 16S-ID Database Contents <a href="#span-classeztocsection-id16siddatabasecontentsspan16sid-database-contentsspan-classeztocsectionendsp" id="span-classeztocsection-id16siddatabasecontentsspan16sid-database-contentsspan-classeztocsectionendsp"></a>

### Standardized 16S rRNA gene sequences representing reference taxa <a href="#span-classeztocsection-idstandardized16srrnagenesequencerepresentingreferencetaxaspanstandardized-16" id="span-classeztocsection-idstandardized16srrnagenesequencerepresentingreferencetaxaspanstandardized-16"></a>

* Every sequence spans the full region between the standard full-length 16S PCR primers ([27F-1492R](https://help.ezbiocloud.net/16s-rrna-and-16s-rrna-gene/?ref=blog.ezbiocloudpro.com)).
* Each reference taxon is represented by a single 16S sequence, so the database is non-redundant.
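As a rough illustration of what "spanning the region between 27F and 1492R" means in practice, the sketch below locates both primer sites in a longer sequence and returns the region between them. The primer sequences are commonly published variants and are an assumption here, since this page does not list them. The function names and the choice to exclude the primer sites themselves are also illustrative, and the sketch requires exact matches (no mismatches), which real primer sites do not always provide.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# ASSUMPTION: widely used primer variants, not taken from this page.
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "TACGGYTACCTTGTTACGACTT"


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex that matches every variant."""
    return "".join(IUPAC[base] for base in primer)


def trim_between_primers(seq):
    """Return the region between the 27F and 1492R sites, or None.

    The primer sites themselves are excluded (an illustrative choice).
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(PRIMER_27F), seq)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer site.
    downstream = seq[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(PRIMER_1492R)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


if __name__ == "__main__":
    toy = ("GG" + "AGAGTTTGATCCTGGCTCAG" + "ACGTACGT"
           + "AAGTCGTAACAAGGTAACCGTA" + "CC")
    print(trim_between_primers(toy))
```

The toy input is a synthetic string built only to exercise the function; it is not a real 16S sequence.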

### What the reference taxa include <a href="#span-classeztocsection-idreferencetaxameanspanreference-taxa-meanspan-classeztocsectionendspan" id="span-classeztocsection-idreferencetaxameanspanreference-taxa-meanspan-classeztocsectionendspan"></a>

* Taxonomic names that are currently validly published
* Some names that are not validly published but likely represent distinct species
* Candidatus taxa
* Unnamed phylotypes that do not belong to any of the above. These include 16S amplicons and genome sequences.

A complete **taxonomic hierarchy** (from species to phylum) is given for every 16S sequence. The hierarchy is based on a maximum-likelihood phylogenetic tree of 16S sequences, with consideration of the currently accepted classification.

## Source of 16S-ID database <a href="#span-classeztocsection-idsourceof16siddatabasespansource-of-16sid-databasespan-classeztocsectionends" id="span-classeztocsection-idsourceof16siddatabasespansource-of-16sid-databasespan-classeztocsectionends"></a>

We draw on multiple sources to collect representative 16S sequences of the highest quality. The sources include:

* NCBI 16S amplicon sequences of validly published taxa, e.g., [AY692362](https://www.ezbiocloud.net/16SrRNA?ac=AY692362\&ref=blog.ezbiocloudpro.com) for Adiaceo aphidicola.
* NCBI 16S amplicon sequences of phylotypes, e.g., [AJ290038](https://www.ezbiocloud.net/16SrRNA?ac=AJ290038\&ref=blog.ezbiocloudpro.com) for AJ290038\_s (a phylotype treated as a species-level taxon).
* 16S sequences extracted from NCBI genome assemblies, e.g., [CP000238](https://www.ezbiocloud.net/16SrRNA?ac=CP000238\&ref=blog.ezbiocloudpro.com) for Baumannia cicadellinicola.
* 16S sequences extracted from JGI genome assemblies (these genome data may not be available in NCBI), e.g., [jgi.1096475](https://www.ezbiocloud.net/16SrRNA?ac=jgi.1096475\&ref=blog.ezbiocloudpro.com) for phylotype jgi.1096475\_s in the genus Geodermatophilus.
* 16S sequences compiled from Pacific Biosciences (PacBio) full-length sequencing of microbiome samples. These are high-quality 16S sequences generated with PacBio’s circular consensus sequencing (CCS) technology, e.g., PAC001304 for phylotype [PAC001304\_s](https://www.ezbiocloud.net/taxon?tn=PAC001304_s\&ref=blog.ezbiocloudpro.com).
* 16S sequences from genomospecies, e.g., [CP014326\_s](https://www.ezbiocloud.net/taxon?tn=CP014326_s\&ref=blog.ezbiocloudpro.com). These are tentative new species supported by whole-genome sequences \[[Learn more](https://help.ezbiocloud.net/genomospecies/?ref=blog.ezbiocloudpro.com)].

Consequently, not all sequences in the database are available from NCBI.
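The example links in the list above follow two URL patterns: `16SrRNA?ac=<accession>` for a sequence record and `taxon?tn=<taxon name>` for a taxon page. The sketch below builds such links from an accession or taxon name. The function names are hypothetical, the tracking `ref` parameter is omitted, and `guess_source` is only a heuristic inferred from the accession prefixes in the examples on this page, not a documented naming rule.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ezbiocloud.net"


def sequence_url(accession: str) -> str:
    """Build the 16S sequence page URL for an accession such as 'AY692362'."""
    return f"{BASE_URL}/16SrRNA?{urlencode({'ac': accession})}"


def taxon_url(taxon_name: str) -> str:
    """Build the taxon page URL for a name such as 'PAC001304_s'."""
    return f"{BASE_URL}/taxon?{urlencode({'tn': taxon_name})}"


def guess_source(accession: str) -> str:
    """Heuristic guess based ONLY on the example accessions on this page."""
    if accession.startswith("jgi."):
        return "JGI genome assembly"
    if accession.startswith("PAC"):
        return "PacBio full-length sequencing"
    # Amplicon and genome-derived NCBI records cannot be told apart
    # from the accession string alone.
    return "NCBI (amplicon or genome assembly)"


if __name__ == "__main__":
    for acc in ("AY692362", "CP000238", "jgi.1096475", "PAC001304"):
        print(acc, "|", guess_source(acc), "|", sequence_url(acc))
    print(taxon_url("CP014326_s"))
```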

## Why do we use 16S sequences extracted from genome assemblies instead of PCR-derived sequences in some cases? <a href="#span-classeztocsection-idwhydoweuse16ssequencesextractedfromgenomeassembliesinsteadofpcrderivedseque" id="span-classeztocsection-idwhydoweuse16ssequencesextractedfromgenomeassembliesinsteadofpcrderivedseque"></a>

* Typical whole-genome sequencing (WGS) projects reach a sequencing depth of 50X or higher, so the 16S sequences within genome assemblies are usually more accurate than those determined by PCR and Sanger sequencing, which yields only 1-5X depth.
* Whenever we add a genome-derived 16S sequence to the 16S-ID database, we check its quality by manual alignment using secondary-structure information. In our experience, using genome sequences can improve the quality of 16S databases for reference purposes.
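The depth argument can be made concrete with a deliberately simple model. Assume each read miscalls a given base independently with probability `p`, the consensus base is chosen by majority vote, and (as a worst case) all erroneous reads agree on the same wrong base, with ties counted as errors. The value `p = 0.01` and the function name `consensus_error` are illustrative assumptions. Real Sanger and WGS error profiles differ from each other and are not independent per read, so this is a sketch of the trend rather than a measurement.

```python
from math import comb


def consensus_error(depth: int, p: float) -> float:
    """Upper-bound probability that a majority-vote consensus base is wrong.

    Model: `depth` independent reads, each wrong with probability `p`.
    The consensus is counted as wrong when the erroneous reads are at
    least as numerous as the correct ones (ties count as errors).
    """
    threshold = (depth + 1) // 2  # smallest k with k >= depth - k
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


if __name__ == "__main__":
    p = 0.01  # ASSUMED per-read, per-base error rate, for illustration only
    for depth in (1, 3, 5, 50):
        print(f"depth {depth:>2}X: consensus error <= {consensus_error(depth, p):.3e}")
```

Under these assumptions the bound falls steeply as depth increases, which is the intuition behind preferring genome-derived 16S sequences when their quality checks out.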

## Database Citations <a href="#span-classeztocsection-iddatabasecitationsspandatabase-citationsspan-classeztocsectionendspan" id="span-classeztocsection-iddatabasecitationsspandatabase-citationsspan-classeztocsectionendspan"></a>

*Our database was introduced in the following three publications. Citation counts are as of Mar. 28, 2023:*

Yoon, S. H., Ha, S. M., Kwon, S., Lim, J., Kim, Y., Seo, H. & Chun, J. (2017).\
Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol 67, 1613-1617 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijsem.0.001755?ref=blog.ezbiocloudpro.com#tab2)]. It was the most-cited paper published in 2017 in the “Microbiology” and related category (Web of Science: cited 298 times, out of 57,143 publications). Cited 5,315 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=17895195279591305687\&hl=en\&ref=blog.ezbiocloudpro.com)

Kim, O. S., Cho, Y. J., Lee, K., Yoon, S. H., Kim, M., Na, H., Park, S. C., Jeon, Y. S., Lee, J. H. & other authors (2012).\
Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 62, 716-721 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.038075-0?ref=blog.ezbiocloudpro.com#tab2)]. It was the most-cited paper published in 2012 in the “Microbiology” and related category (Web of Science: cited 3,045 times, out of 25,438 publications). Cited 5,458 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=11184522402034558074\&hl=en\&ref=blog.ezbiocloudpro.com)

Chun, J., Lee, J. H., Jung, Y., Kim, M., Kim, S., Kim, B. K. & Lim, Y. W. (2007).\
EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57, 2259-2261 \[[Learn more](https://www.microbiologyresearch.org/content/journal/ijsem/10.1099/ijs.0.64915-0?ref=blog.ezbiocloudpro.com#tab2)]. Cited 2,448 times according to Google Scholar. [Check out the publications citing this article.](https://scholar.google.co.kr/scholar?cites=14018266421910009886\&hl=en\&ref=blog.ezbiocloudpro.com)

