EzBioCloud © 2024. All Rights Reserved

16S Copy Number

Last updated 1 year ago

What is the 16S copy number and why does it matter?

The 16S rRNA gene (16S) is widely used as a phylogenetic marker and is particularly important for the taxonomic profiling of microbiome samples. Unlike most protein-coding genes, the 16S rRNA gene may be present in multiple copies in a single cell. A bacterial strain must have at least one gene encoding 16S, but the copy number can go up to 15 (see the chart below). There is a positive correlation between genome size and 16S copy number.

When we analyze microbiome data using 16S amplicon sequences, all quantitative measures are derived from NGS read counts assigned to known taxa. In other words, we are counting copies of a marker gene, typically 16S, present in a microbiome sample. However, what we ultimately want to know is not the number of 16S reads but the number of corresponding cells, or CFU (colony-forming units). Because the copy number differs among taxa, raw read counts over-represent taxa that carry many 16S copies.

How to correct taxonomic profile data using 16S copy numbers

The relative taxonomic composition of a microbiome sample can be corrected by a simple calculation once we know the 16S copy numbers of all species: divide each taxon's read count by its copy number, then renormalize. The problem is that we do not know these values for all species. An accurate value requires at least one complete genome sequence, because incomplete genome assemblies derived from short NGS reads contain either no 16S gene sequences or an inaccurate number of them. At present, 3,467 species are represented by complete genome sequences (as of Dec. 2017). The 16S copy numbers of the remaining species, including uncultured phylotypes, must be predicted from the existing data.
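The "simple calculation" can be sketched in a few lines of Python. This is a minimal illustration, not the EzBioCloud implementation: the function name `correct_by_copy_number`, the taxon names, and the numbers are all hypothetical, and it assumes a copy number is available for every taxon in the sample.

```python
def correct_by_copy_number(read_counts, copy_numbers):
    """Return copy-number-corrected relative abundances.

    read_counts:  {taxon: number of 16S reads assigned to it}
    copy_numbers: {taxon: 16S gene copies per genome}
    """
    missing = [taxon for taxon in read_counts if taxon not in copy_numbers]
    if missing:
        raise KeyError(f"no 16S copy number for: {missing}")

    # Reads divided by copies per genome approximates genome (cell) equivalents.
    cell_equivalents = {
        taxon: reads / copy_numbers[taxon] for taxon, reads in read_counts.items()
    }
    total = sum(cell_equivalents.values())
    if total == 0:
        raise ValueError("sample contains no reads")

    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in cell_equivalents.items()}


# Hypothetical taxa and values, for illustration only.
reads = {"taxon_A": 300, "taxon_B": 100}
copies = {"taxon_A": 3, "taxon_B": 1}
corrected = correct_by_copy_number(reads, copies)
print(corrected)
```

In this made-up sample, taxon_A accounts for 75% of the reads but only 50% of the corrected composition, because each of its genomes contributes three 16S copies.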

Implementation in EzBioCloud 16S-based MTP

EzBioCloud 16S-based MTP lets you instantly and interactively apply 16S copy number correction to the comparative analysis of multiple samples, the calculation of beta diversity, and biomarker discovery (e.g., LEfSe). Our database of 16S copy numbers is more comprehensive than any other database because it draws on an up-to-date version of the genome database (8,631 quality-controlled genomes of 3,302 species; as of March 2018).

Let's assume that we are analyzing a human fecal sample. After sequencing, we obtained 100 reads assigned to Bacteroides fragilis and 100 reads assigned to Prevotella copri. Both species are frequently found in the human gut. Should we say that the two species are present in equal numbers? According to the EzBioCloud database, which provides information about 16S copy numbers, B. fragilis has 6 copies whereas P. copri has 4 copies [Learn more about B. fragilis and P. copri]. Taking this into account, 100 reads correspond to about 16.7 genome equivalents of B. fragilis (100/6) and 25 of P. copri (100/4), so the corrected ratio between B. fragilis and P. copri is 2:3, not 1:1. The need for 16S copy number correction or normalization has been raised by several studies (Kembel et al., 2012; Angly et al., 2014; Vandeputte et al., 2017).
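The arithmetic of this example can be checked directly: dividing each read count by the species' copy number gives genome-equivalent counts, and their quotient is the corrected ratio. The snippet below is only a sketch that uses the read counts and copy numbers quoted above, with exact fractions to avoid rounding.

```python
from fractions import Fraction

# Read counts and 16S copy numbers taken from the example in the text.
reads = {"Bacteroides fragilis": 100, "Prevotella copri": 100}
copies = {"Bacteroides fragilis": 6, "Prevotella copri": 4}

# Genome equivalents = reads / copies per genome.
cells = {species: Fraction(reads[species], copies[species]) for species in reads}

# Corrected B. fragilis : P. copri ratio as an exact fraction.
ratio = cells["Bacteroides fragilis"] / cells["Prevotella copri"]
print(ratio)  # prints 2/3
```

With equal read counts, the species with more 16S copies per genome ends up with the smaller corrected share.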

A couple of algorithms have been proposed to predict the missing 16S copy numbers (Langille et al., 2013; Angly et al., 2014). In the EzBioCloud 16S-based MTP app, the PICRUSt algorithm (Langille et al., 2013) is used to generate the 16S copy number database for all species/phylotypes in the EzBioCloud 16S database (see the figure below).
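To make the idea of filling in missing copy numbers concrete, here is a deliberately simplified sketch. It is not PICRUSt: instead of a phylogeny-based prediction, it substitutes a plain taxonomic fallback (the mean of known species in the same genus, else the mean of all known species). The function `estimate_copy_number`, the species labels, and every value are hypothetical.

```python
from statistics import mean


def estimate_copy_number(species, known, genus_of):
    """Estimate a 16S copy number with a simple taxonomic fallback.

    known:    {species: copy number from a complete genome}
    genus_of: {species: genus name}
    NOTE: a stand-in heuristic for illustration, not the PICRUSt method.
    """
    if species in known:
        return known[species]

    # Fallback 1: average over known species in the same genus.
    genus = genus_of[species]
    same_genus = [cn for sp, cn in known.items() if genus_of.get(sp) == genus]
    if same_genus:
        return mean(same_genus)

    # Fallback 2: average over every species with a known value.
    return mean(known.values())


# Hypothetical data, for illustration only.
known = {"sp_a1": 4, "sp_a2": 6, "sp_b1": 2}
genus_of = {"sp_a1": "A", "sp_a2": "A", "sp_a3": "A", "sp_b1": "B", "sp_c1": "C"}

print(estimate_copy_number("sp_a3", known, genus_of))  # genus-level mean
print(estimate_copy_number("sp_c1", known, genus_of))  # global mean
```

A phylogeny-aware method can weight close relatives more heavily than distant ones, which this flat averaging does not attempt.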

Figure: 16S copy numbers of bacteria in EzBioCloud database (generated from only complete genomes)

Figure: Prediction of 16S copy numbers using PICRUSt algorithm

References

  • Kembel et al., 2012
  • Angly et al., 2014
  • Vandeputte et al., 2017
  • Langille et al., 2013