Tetra-Nucleotide Frequencies
How are Tetra-Nucleotide Frequencies used in microbiome profiling?
Last updated
A tetra-nucleotide is a fragment of DNA sequence four bases long (e.g. AGTC or TTGG). Pride et al. (2003) showed that the frequencies of tetra-nucleotides in bacterial genomes contain useful, albeit weak, phylogenetic signals. Although tetra-nucleotide analysis (TNA) uses information from the whole genome, it cannot replace alignment-based phylogenetic methods such as or 16S rRNA phylogeny. However, TNA is useful for phylogenetic characterization when whole-genome or 16S rRNA gene information is not available. For example, a partial genomic fragment obtained from a metagenome can be identified by TNA (Teeling et al., 2004). TNA is also fast enough to serve as a search engine against a large genome database.
The information contained in a genome sequence can be transformed into an array of tetra-nucleotide frequencies (see the figure below).
Each genome sequence is now represented by the counts of the 256 (4 × 4 × 4 × 4) possible tetra-nucleotides. The more similar two genome sequences are, the more strongly their tetra-nucleotide patterns correlate. Statistical measures of tetra-nucleotide frequency correlation can therefore serve as a rough indicator of the relatedness of two genomes.
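The transformation from a sequence to 256 counts can be illustrated with a minimal Python sketch. It assumes overlapping windows on a single strand and skips any window containing a character other than A, C, G or T. The names `TETRAMERS` and `tetra_counts` are hypothetical and chosen for illustration only. Real tools may also count the reverse-complement strand or normalize the counts.

```python
from itertools import product

# All 256 possible tetra-nucleotides in a fixed (alphabetical) order,
# so that every genome maps to a vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_counts(seq):
    """Count overlapping 4-base windows in seq (hypothetical helper).

    Windows containing anything other than A, C, G or T (for example N)
    are skipped. Only the given strand is counted (an assumption).
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[t] for t in TETRAMERS]


vec = tetra_counts("AGTCAGTCNTTGG")
```

Here `vec` always has 256 entries, one per tetra-nucleotide, regardless of the length of the input sequence.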
The tetra-nucleotide correlation coefficient ranges from 0 to 1, and two identical genomes produce a value of 1.0.
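As a rough illustration of how two genomes might be compared, the sketch below computes a plain Pearson correlation between their tetra-nucleotide frequency vectors. This is an assumption: the exact statistic behind the coefficient described above is not specified here, and a plain Pearson coefficient can in principle be negative for very dissimilar sequences. The function names `tetra_freqs`, `pearson` and `tetra_correlation` are hypothetical.

```python
from itertools import product
from math import sqrt

# Fixed ordering of the 256 tetra-nucleotides.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Relative frequencies of overlapping tetra-nucleotides (single strand)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on tiny input
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Correlation between the tetra-nucleotide profiles of two sequences."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))
```

Comparing a sequence with itself, for example `tetra_correlation(s, s)`, yields 1.0 up to floating-point rounding, matching the statement that identical genomes produce 1.0.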
Pride, D. T., Meinersmann, R. J., Wassenaar, T. M. & Blaser, M. J. (2003). Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res.
Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glockner, F. O. (2004). Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol.