
General Terminology

Understanding biological terminology, with all its acronyms and drawn-out Latin nomenclature, is like learning an entirely new language. This page explains some commonly used terms. Let's start with something fundamental to biology: 'species'.

Species

A species is a taxonomic unit that is registered and classified by global authorities. Identifying bacteria that are similar to a registered reference species indicates relatedness in lineage and/or function. For bacteria, ‘species’ is a fluid concept, as they are constantly mixing and evolving.

Genome

A genome is an organism’s entire genetic code. Bacterial genomes are usually circular and may include plasmids (smaller, separate DNA molecules). We compare genomes to see how genetically similar they are.

OrthoANI

ANI (Average Nucleotide Identity) involves comparing fragments of your genome against reference genomes from a registered database and calculating their similarity. If your sample has a high ANI (>95%) to a registered reference genome, it is classified as belonging to the same species. OrthoANI splits both genomes into fragments and calculates nucleotide identity only from the fragments that form orthologous reciprocal pairs.

OrthoANI Coverage

Here, ‘coverage’ is not sequencing depth (see Coverage depth). It is the proportion of a genome that was aligned and used in the OrthoANI calculation, expressed as a percentage. Because OrthoANI uses reciprocal pairing, query and reference coverages are reported separately.

16S

16S rRNA genes are short genes found in prokaryotic genomes. They are used to identify species or genera at a lower resolution than the entire genome. If your sample has a high 16S sequence similarity (>98.7%) to a registered reference 16S sequence, this suggests that your sample belongs to the same species. Often, there are multiple 16S rRNA gene copies within one genome.

Reference acc.

A reference accession number is a unique identifier assigned to a specific sequence or set of sequences in a database, such as GenBank.

Reference strain

A reference strain is a specific strain of an organism that serves as the reference for comparison to other strains of the same species.

Reference type

The reference type is the category of reference genome used in a particular analysis, such as a draft genome, high-quality genome, or closed genome.

Reference UBCG

UBCG (Up-to-date Bacterial Core Gene) is a set of single-copy core genes found in almost all bacteria. It is used as a reference in comparative genomic studies.

Reference 16S

The reference 16S rRNA gene sequence is used as a reference for comparison to 16S rRNA gene sequences from other bacteria and archaea.

Sequencing (types of technology)

Sequencing technology refers to the methods and instruments used to determine the sequence of nucleotides in DNA or RNA.

Reads (short SE, short PE)

Reads come in different sizes and forms, depending on the sequencer used. Short reads typically range from 100-500 bp, whereas long reads are around 10,000-100,000 bp. SE and PE stand for single-end reads and paired-end reads, respectively. Paired-end reads sequence each fragment from both ends, which makes alignment more accurate and can help identify insertions and deletions (indels).

N. reads

Number of reads is a measurement of how many individual DNA fragments (reads) were obtained from your sample.

Bases

Bases are the individual nucleotides that make up DNA: adenine (A), guanine (G), cytosine (C), and thymine (T).

Mean len.

Mean length is the average length of a set of sequencing reads.
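
As a rough illustration of how N. reads, Bases, and Mean len. relate to one another, here is a minimal Python sketch. The function name `read_stats` and the toy reads are made up for this example; the report's actual pipeline may count things differently (for example, after trimming).

```python
def read_stats(reads):
    """Return (number of reads, total bases, mean read length)."""
    n_reads = len(reads)
    total_bases = sum(len(read) for read in reads)
    # Guard against an empty input so we never divide by zero.
    mean_len = total_bases / n_reads if n_reads else 0.0
    return n_reads, total_bases, mean_len


# Toy reads for illustration only, not real sequencing data.
reads = ["ACGT", "ACGTAC", "AC"]
print(read_stats(reads))  # (3, 12, 4.0)
```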

Q20 rate

Q20 rate is a measure of sequencing accuracy, calculated as the percentage of bases with a quality score of 20 or higher. A score of Q20 corresponds to 99% base-call accuracy.

Q30 rate

Q30 rate is a measure of sequencing accuracy, calculated as the percentage of bases with a quality score of 30 or higher. A score of Q30 corresponds to 99.9% base-call accuracy.
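
The link between a quality score and accuracy comes from the standard Phred scale, where a score Q implies an error probability of 10^(-Q/10). The sketch below shows that conversion and how a Q20 or Q30 rate could be computed from a list of per-base scores. The function names and the example scores are illustrative assumptions, not part of the report.

```python
def phred_to_accuracy(q):
    """Base-call accuracy implied by Phred score q: 1 - 10^(-q/10)."""
    return 1 - 10 ** (-q / 10)


def q_rate(quality_scores, threshold):
    """Percentage of bases whose quality score is at least `threshold`."""
    if not quality_scores:
        return 0.0
    passing = sum(1 for q in quality_scores if q >= threshold)
    return 100 * passing / len(quality_scores)


# Made-up per-base quality scores for illustration.
scores = [35, 32, 28, 19, 40, 22, 30, 12]
print(q_rate(scores, 20))  # 75.0 (6 of 8 bases are Q20 or better)
print(q_rate(scores, 30))  # 50.0 (4 of 8 bases are Q30 or better)
```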

QC-pass reads

QC-pass reads are the sequencing reads that pass quality control (QC) checks and are used for further analysis.

Genome sequence quality

Genome sequence quality is a measure of the accuracy and completeness of a genome sequence.

Genome size

Genome size is the total amount of DNA in a genome, typically measured in base pairs.


Assembly statistics and quality

Assembler

An assembler is a software tool used to assemble the final genome sequence from sequencing reads.

UBCG recovery

UBCG recovery refers to the proportion of Up-to-date Bacterial Core Gene orthologs that have been successfully recovered and assembled in a genome sequencing project.

Normal range (genome BP and G+C percentage)

This is the expected range of genome size (in base pairs) and G+C content (as a percentage) for the identified species, according to a reference genome.

Coverage depth

Coverage depth refers to the average number of times a base in the genome has been sequenced. A high coverage depth makes it more likely that even rare mutations and variations are detected in the genome. The calculation assumes that the assembled query genome is complete.
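
A common way to estimate average coverage depth is to divide the total number of sequenced bases by the genome size. The sketch below assumes, as stated above, that the assembled genome is complete, so its length stands in for the true genome size. The function name and the numbers are illustrative only, and the report may compute depth differently (for example, from mapped reads).

```python
def coverage_depth(total_bases, genome_size):
    """Average depth: total sequenced bases divided by genome size (bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Example: 500 Mbp of reads over a 5 Mbp genome gives about 100x depth.
print(coverage_depth(500_000_000, 5_000_000))  # 100.0
```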

UBCG duplication

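UBCG duplication refers to the presence of multiple copies of Up-to-date Bacterial Core Gene orthologs in a genome. This can be an indicator of genome duplication events or horizontal gene transfer. Because these genes are normally present as a single copy, duplication can also point to contamination or assembly errors.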

G+C content

G+C content refers to the proportion of guanine (G) and cytosine (C) bases in a genome, usually expressed as a percentage. It is often used as a marker for the evolutionary relationships between genomes; closely related organisms tend to share similar G+C content.
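
As a simple illustration, G+C content can be computed by counting G and C bases and dividing by the number of unambiguous bases. Ignoring ambiguous characters such as N is an assumption of this sketch; the function name is also made up for the example.

```python
def gc_content(sequence):
    """G+C content as a percentage of the A, C, G, and T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases to measure
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / counted


print(gc_content("ATGC"))  # 50.0
```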

N. Contigs

Number of contigs is a measurement of how many contigs (contiguous sequences built from overlapping reads) were assembled from your sample.

RRS

Reference Representation Scores (RRS) are measures of chimerism and contamination. If genes consistently and closely map into the GUNC reference space, then the RRS are high (Orakov et al., 2021).

N50

N50 is a statistic that summarizes the contiguity of a genome assembly. It is defined as the contig length such that 50% of the total assembly length is contained in contigs of that length or longer.
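
The definition can be made concrete with a short sketch: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the assembly. The function name and the contig lengths are illustrative only.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Total is 300 bp, so half is 150 bp; 100 + 80 = 180 reaches it at the 80 bp contig.
print(n50([100, 80, 50, 30, 20, 20]))  # 80
```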

CSS

Clade Separation Scores (CSS) are high if gene classification to distinct lineages follows contig boundaries (Orakov et al., 2021).


Domain affiliation check

Domain

Domain refers to the highest taxonomic rank in the classification of life, which is divided into three domains: Bacteria, Archaea, and Eukarya. Viruses are not classified as living organisms but are also included here.


MLST

Multi-Locus Sequence Typing (MLST) is a method for subtyping bacteria based on the sequence of several housekeeping genes.

MLST scheme

MLST scheme refers to a specific set of genes used for MLST analysis. There are several available MLST schemes for different bacterial taxa.

ST

Sequence type (ST) refers to the specific allele combination at each locus used in an MLST scheme, which can be used to define a unique subtype of a bacterial species.

Allele in each locus

Allele in each locus refers to the specific variant of a gene used in an MLST scheme. The combination of alleles at each locus in an MLST scheme can define a unique ST.
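
To show how alleles combine into an ST, here is a minimal sketch of a profile lookup. The locus names (`geneA`, `geneB`, `geneC`) and the profile table are entirely hypothetical; real MLST schemes define their own loci and ST assignments.

```python
# Hypothetical three-locus scheme, for illustration only.
LOCI = ("geneA", "geneB", "geneC")

# Hypothetical table mapping an allele combination to a sequence type.
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 1): 3,
}


def sequence_type(alleles):
    """Return the ST for a dict of locus -> allele number, or None if unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


print(sequence_type({"geneA": 1, "geneB": 2, "geneC": 1}))  # 2
print(sequence_type({"geneA": 9, "geneB": 9, "geneC": 9}))  # None
```

A result of `None` here stands for an allele combination that is not in the table, such as a novel profile.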


Antibiotic resistance determinants summary

Antibiotic resistance determinants summary covers the presence and abundance of genes encoding resistance to antibiotics in a genome.

Antibiotic classes

Antibiotic classes are the different categories of antibiotics, such as beta-lactams, aminoglycosides, macrolides, etc.

Antibiotic subclasses

Antibiotic subclasses refer to specific subtypes of antibiotics within a larger class, such as penicillins, cephalosporins, etc.

Resistance gene families

Resistance gene families are groups of genes that encode resistance to antibiotics.


Pathogenicity marker summary

Pathogenicity markers

Pathogenicity markers are genes or markers that are associated with pathogenicity or virulence in bacteria.

Scheme

Typing Scheme is a specific method used to classify bacteria into subtypes, such as MLST or Whole Genome Sequencing-based methods.

Positive markers

Positive markers are specific genes or markers that are present in a genome and are associated with a particular phenotype or trait.

Negative markers

Negative markers are specific genes or markers that are absent in a genome and are associated with a particular phenotype or trait.

Decision

Decision indicates which pathotype the isolate belongs to.

ANI top hits

ANI top hits refer to the genomes that have the highest ANI values with a given genome, indicating a close evolutionary relationship.

Reference genome

A reference genome is a complete and annotated DNA sequence of an organism that serves as a basis for comparison and analysis of other genomes.

Strain

A strain is a subtype of a species, at a higher taxonomic resolution. Often, strains refer to a specific isolate or culture of a bacterial species that has been characterized in detail and is used as a reference or standard in research or diagnostic testing.

Genome Group

A grouping of genomes based on their evolutionary or phylogenetic relatedness.


Top ANI hits

ANI

ANI stands for "Average Nucleotide Identity", a method commonly used in prokaryotic identification and taxonomy. It is the average nucleotide similarity across the aligned regions of the two genomes being compared, expressed as a percentage.

ANI cov (Q)

The extent to which the genomes being compared are covered by the OrthoANI analysis. This is often expressed as a percentage of the genomes that have been aligned. ‘Q’ is for query to reference genome alignment.

ANI cov (R)

The extent to which the genomes being compared are covered by the OrthoANI analysis. This is often expressed as a percentage of the genomes that have been aligned. ‘R’ is for reference to query genome alignment.
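
Read as "the percentage of a genome that has been aligned", the two coverage values can be sketched as aligned length divided by genome length, computed once per genome. This is an assumption about how the figures are derived rather than the report's documented formula; the function name and the numbers are illustrative.

```python
def ani_coverage(aligned_bases, genome_length):
    """Percentage of a genome included in the aligned regions."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return 100 * aligned_bases / genome_length


# The same alignment can cover different fractions of each genome.
query_cov = ani_coverage(4_000_000, 5_000_000)      # coverage of the query (Q)
reference_cov = ani_coverage(3_000_000, 4_000_000)  # coverage of the reference (R)
print(query_cov, reference_cov)  # 80.0 75.0
```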


16S rRNA gene top hits

The 16S rRNA gene top hits are the genomes that have the highest similarity to a given genome when comparing only their 16S rRNA gene sequences (a small section of the whole genome).

16S completeness

A measure of how complete a genome's 16S rRNA gene is. A complete 16S rRNA gene is necessary for accurate classification of a bacterium.

Source profiles of the genome sequences

Information about the source of the genome sequences being analyzed, such as the type of sample, location, and date of collection.

Antibiotic resistance determinants

DNA sequences in a genome that confer resistance to antibiotics.


Pathogenicity scheme

Pathogenicity markers

DNA sequences in a genome that are associated with the ability of an organism to cause disease.


Virulence factor hits

Full list of virulence factor hits

Proteins or other molecules produced by pathogens that contribute to their ability to cause disease.

