# Tetra-Nucleotide Frequencies

## Tetra-Nucleotide Frequency Patterns

A tetra-nucleotide is a fragment of DNA sequence four bases long (e.g. AGTC or TTGG). Pride *et al.* (2003) showed that the frequencies of tetra-nucleotides in bacterial genomes contain useful, albeit weak, phylogenetic signals. Although tetra-nucleotide analysis (TNA) draws on information from the whole genome, it cannot replace alignment-based phylogenetic methods such as [OrthoANI](https://help.ezbiocloud.net/orthoani-genomic-similarity/) or 16S rRNA phylogeny. However, TNA can be useful for phylogenetic characterization when a whole genome or 16S rRNA gene sequence is not available. For example, a partial genomic fragment obtained from a metagenome can be identified by TNA (Teeling *et al.*, 2004). TNA is also fast enough to be used as a search engine against a large genome database.

## Algorithm <a href="#span-classeztocsection-idalgorithmspanalgorithmspan-classeztocsectionendspan" id="span-classeztocsection-idalgorithmspanalgorithmspan-classeztocsectionendspan"></a>

The information contained in a genome sequence can be transformed into an array of tetra-nucleotide frequencies (see the figure below).

<figure><img src="https://820779907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FDWKOAVP0eaMhg1acSkor%2Fuploads%2FFSVfW5KpqEHZgnzH8m9o%2Fimage.png?alt=media&#x26;token=9c128d69-7a14-4de0-8d87-77105404aacb" alt=""><figcaption></figcaption></figure>

Each genome sequence is now represented by the counts of its 256 (4^4) possible tetra-nucleotides. The more similar two genome sequences are, the more strongly their tetra-nucleotide patterns correlate. A statistical measure of the correlation between the tetra-nucleotide frequencies of two genome sequences can therefore serve as a rough estimate of their relatedness.
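The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool: the names `TETRAMERS`, `INDEX` and `tetra_counts` are hypothetical, and the sketch assumes overlapping windows read on a single strand, with windows containing ambiguous bases (such as N) skipped.

```python
from itertools import product

BASES = "ACGT"
# All 256 (4**4) tetra-nucleotides in a fixed, lexicographic order.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX = {tetramer: i for i, tetramer in enumerate(TETRAMERS)}


def tetra_counts(sequence):
    """Return a list of 256 counts, one per tetra-nucleotide."""
    seq = sequence.upper()
    counts = [0] * len(TETRAMERS)
    # Slide a window of width 4 along the sequence, one base at a time.
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:  # assumption: skip windows with ambiguous bases
            counts[idx] += 1
    return counts


counts = tetra_counts("AGTCAGTCNAGTC")
print(len(counts))            # 256
print(counts[INDEX["AGTC"]])  # 3
```

Keeping the tetra-nucleotides in a fixed order means that position *i* of the array refers to the same tetra-nucleotide for every genome, which is what makes two arrays directly comparable.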

The tetra-nucleotide correlation coefficient ranges from 0 to 1; two identical genomes produce a value of 1.0.
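As an illustration of how such a coefficient might be computed, the sketch below applies a plain Pearson correlation to two count arrays. The choice of Pearson's *r* is an assumption, since the exact statistic is not specified here, and the function name `pearson` is hypothetical. The toy arrays have four entries for readability; real profiles would have 256.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Toy profiles (assumption: shortened to 4 entries instead of 256).
profile_a = [10, 2, 7, 1]
profile_b = [20, 4, 14, 2]  # same pattern as profile_a, twice the counts
profile_c = [9, 3, 5, 3]    # similar, but not identical, pattern

print(pearson(profile_a, profile_b))            # 1.0
print(round(pearson(profile_a, profile_c), 3))  # 0.944
```

Because Pearson's *r* is unchanged when either array is multiplied by a positive constant, raw counts from sequences of different lengths can be compared without first converting them to relative frequencies.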

## References <a href="#span-classeztocsection-idreferencesspanreferencesspan-classeztocsectionendspan" id="span-classeztocsection-idreferencesspanreferencesspan-classeztocsectionendspan"></a>

1. Pride, D. T., Meinersmann, R. J., Wassenaar, T. M. & Blaser, M. J. Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res [13, 145-158 (2003)](http://genome.cshlp.org/content/13/2/145.long).
2. Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glöckner, F. O. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol [6, 938-947 (2004)](http://onlinelibrary.wiley.com/doi/10.1111/j.1462-2920.2004.00624.x/abstract;jsessionid=B9A8B9CFC9F73D55F2F18895A669BCD5.f01t04).
