Species Taxonomy
Last updated
EzBioCloud© 2024. All Rights Reserved
This post explains how a bacterial species is defined and outlines the essentials of currently accepted bacterial taxonomy. A bacterial species should have a designated type strain (the nomenclatural type), which is a living microorganism that is available to anyone who wants to study it. Type strains can usually be obtained from public or private (for-profit) culture collections.
The modern species concept tries to put genomics into practice. The taxonomically accepted measure of similarity between two genome sequences is Average Nucleotide Identity (ANI), which is calculated with a series of bioinformatic algorithms. See Chun et al. (2014, 2018) for how to apply ANI to bacterial taxonomy. The proposed ANI cutoff is 95 to 96%: if a bacterial strain shows an ANI at or above this cutoff to the type strain of species A, it is assigned to species A. If we had a reference genome database containing all species on Earth, the ANI-based approach would serve as a perfect platform. However, we do not have these data in hand: not all type strains have been sequenced, and so far there are more uncultured species than cultured ones.
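The ANI rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary of precomputed ANI values, and the choice of 95.0 (the lower end of the 95 to 96% range) as the default cutoff are all hypothetical, and computing ANI itself is outside the scope of the sketch.

```python
from typing import Dict, Optional

# Assumption: use the lower end of the proposed 95 to 96% range as the default.
ANI_CUTOFF = 95.0


def assign_species_by_ani(ani_to_type_strains: Dict[str, float],
                          cutoff: float = ANI_CUTOFF) -> Optional[str]:
    """Return the species whose type strain gives the highest ANI,
    provided that ANI is at or above the cutoff; otherwise return None.

    ani_to_type_strains maps a species name to the ANI (%) between the
    query genome and that species' type strain (values computed elsewhere).
    """
    if not ani_to_type_strains:
        return None
    best_species = max(ani_to_type_strains, key=ani_to_type_strains.get)
    if ani_to_type_strains[best_species] >= cutoff:
        return best_species
    return None  # no type strain is close enough: species not assigned


# Illustrative values only, not real measurements.
print(assign_species_by_ani({"species A": 97.2, "species B": 88.4}))
print(assign_species_by_ani({"species A": 93.0}))
```

A return value of None here only means that no sequenced type strain passed the cutoff, which matches the caveat above that the reference database is incomplete.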
The 16S rRNA gene sequence (16S) is still widely used in bacterial taxonomy, although it is used a little differently from genome sequences. A recent study showed that 98.7% similarity can be used as a cutoff for recognizing species. Again, the comparison should be made against the type strain for taxonomic purposes. If a strain shows a 16S similarity of 98.7% or lower to the type strain of species A, it does not belong to species A.
Otherwise, the strain may or may not belong to species A. The higher the similarity, the greater the chance that the strain is a member of species A. However, there are exceptions: even two strains with 100% identical 16S sequences can show <95% ANI, meaning that they belong to different species. The 98.7% cutoff should be used only when sequencing errors are minimal. Therefore, applying a 97% cutoff to define OTUs is reasonable when single-pass NGS reads, which carry more sequencing errors, are considered.
In conclusion, the combination of 16S and ANI similarities can be used for the classification and identification of bacteria.
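One way to picture how the two measures combine is a small decision function. The sketch below is an assumption-laden illustration, not EzBioCloud's implementation: the function name, the returned labels, and the use of 95.0 as the ANI cutoff are hypothetical. It encodes only what the text states: a 16S similarity of 98.7% or lower rules the species out, while a higher value is inconclusive until ANI is available.

```python
from typing import Optional

SSU_CUTOFF = 98.7  # 16S similarity cutoff (%) quoted in the text
ANI_CUTOFF = 95.0  # assumption: lower end of the 95 to 96% ANI range


def same_species_call(sim_16s: float, ani: Optional[float] = None) -> str:
    """Compare a strain with the type strain of a species.

    sim_16s: 16S similarity (%) to the type strain.
    ani:     ANI (%) to the type strain, or None if no genome is available.
    """
    if sim_16s <= SSU_CUTOFF:
        # 16S alone is enough to exclude the species.
        return "different species"
    if ani is None:
        # High 16S similarity cannot confirm membership on its own.
        return "undetermined"
    return "same species" if ani >= ANI_CUTOFF else "different species"


print(same_species_call(97.0))         # 16S excludes the species
print(same_species_call(99.5))         # needs ANI to decide
print(same_species_call(100.0, 94.0))  # identical 16S, but ANI below cutoff
print(same_species_call(99.5, 96.5))   # both measures agree
```

The third call mirrors the exception mentioned above, where identical 16S sequences still belong to different species.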
Several types of species or similar terms can be defined and used for microbiome analysis.
A species with a valid name is the standard type of species. Its description is published, and its type strain is deposited in one or more culture collections. At present, the only conditions for validating the name of a species are (i) publication in any journal and (ii) deposition of the type strain in two culture collections in two different countries. The scientific community regulates nomenclature (the process of naming), but not taxonomy itself. Therefore, it is the name that is validated, or said to be “valid”, not the species, and the term “valid species” is incorrect. If you are interested in how bacterial names are regulated, consult the International Code of Nomenclature of Prokaryotes.
A species with an invalid name is similar to a species with a valid name, except that its name is not listed on the Approved List. This list of formally recognized names is published in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) and comes in two types. The Notification List contains the approved names that were published in IJSEM itself; an IJSEM editor compiles it automatically. If a paper describing a new species or any other taxonomic change was published in a journal other than IJSEM, the authors should submit a reprint to IJSEM so that the name of the new or changed taxon is listed in the Validation List. The main reasons for a name being invalid are (i) the type strain was not deposited in two culture collections in two different countries, so the name does not meet the conditions for validation, and (ii) the (effective) publication has not yet been submitted to IJSEM for validation.
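The two reasons for an invalid name can be written as a simple checklist. The record layout, field names, and function below are hypothetical and exist only for illustration; they do not mirror any real database schema or the full rules of the nomenclature code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeciesNameRecord:
    """Hypothetical record describing a proposed species name."""
    name: str
    deposits: List[Tuple[str, str]]   # (culture collection, country) pairs
    published_in_ijsem: bool          # original paper appeared in IJSEM
    submitted_to_ijsem: bool = False  # paper from another journal sent for validation


def invalid_name_reasons(record: SpeciesNameRecord) -> List[str]:
    """Return the reasons a name would not be validated (empty list if none)."""
    reasons = []
    collections = {collection for collection, _ in record.deposits}
    countries = {country for _, country in record.deposits}
    if len(collections) < 2 or len(countries) < 2:
        reasons.append("type strain not deposited in two culture collections "
                       "in two different countries")
    if not (record.published_in_ijsem or record.submitted_to_ijsem):
        reasons.append("effective publication not yet submitted to IJSEM "
                       "for validation")
    return reasons


# Made-up example: both deposits are in the same country, paper published elsewhere.
record = SpeciesNameRecord(
    name="Examplea demonstrans",  # hypothetical name
    deposits=[("KCTC", "South Korea"), ("KACC", "South Korea")],
    published_in_ijsem=False,
)
for reason in invalid_name_reasons(record):
    print(reason)
```

An empty list means only that neither of the two reasons listed above applies.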
Candidatus means “candidate species”. The concept of Candidatus was first introduced by Murray & Stackebrandt (1995). It is not part of the formal nomenclature, so the name itself is not italicized: only the word Candidatus is set in italics, while the name that follows it (e.g. Carsonella ruddii in Candidatus Carsonella ruddii) is written in roman type. Candidatus names are usually given to candidate species that cannot be grown in pure culture. Typical cases are the obligate prokaryotic endosymbionts of animals and plants, such as Candidatus Carsonella ruddii.
In many cases, we know that a species exists but lack the supporting data to validate its name. We call such a species a “phylotype”. Typical kinds of phylotypes used in the EzBiome database are as follows.
Genomospecies: A genomospecies deserves species status and is supported by genomic data (e.g. ANI). However, it has not been named, so the EzBiome team assigned it a unique name, usually derived from its accession number in the INSDC databases. For example, the phylotype CP013274_s is represented by a genome sequence that was deposited in INSDC as a strain of Bacillus thuringiensis but showed <95% ANI to all known Bacillus species. It is therefore assigned to a new phylotype (equivalent to a species). A genomospecies in the EzBiome database is always represented by an accurate 16S sequence.
16S Phylotype: If a 16S sequence is accurate, full-length, and shows <98.7% similarity to those of all known species, we are fairly confident that it represents a species (= phylotype). There are >23 million 16S sequences in the INSDC database, but few meet these criteria. Phylotypes defined only by 16S sequences come from sequencing either pure cultures or metagenomic libraries. In the EzBiome database, reference sequences representing these phylotypes are selected with extreme care, using manual alignment and curation. In addition, over 2,000 phylotypes have now been added from >3 million reads generated by Pacific Biosciences (PacBio) long-read CCS sequencing. For example, PAC001304_s is a phylotype that belongs to the genus Prevotella and constitutes >30% of a deep-sequenced human fecal sample. In fact, four of the top ten species in this microbiome sample are phylotypes represented by EzBiome’s PacBio-based reference sequences.
Except for species with a valid name, none of the other cases (species with an invalid name, Candidatus, phylotype) has standing in formal nomenclature. However, assigning unique names or identifiers to all naturally existing species can greatly improve the taxonomic profiling of a microbiome, especially in large-scale comparisons.
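The genomospecies and 16S phylotype criteria described above can be sketched as two small checks. Everything here is illustrative: the function names are hypothetical, the "accession plus _s" label is inferred from the CP013274_s example rather than from a documented naming rule, the ANI and similarity values are made up, and the flags for "accurate" and "full-length" stand in for curation steps that the text says are done manually.

```python
from typing import Dict, Optional

ANI_CUTOFF = 95.0  # ANI (%) below which a genome is treated as a new species
SSU_CUTOFF = 98.7  # 16S similarity (%) below which a sequence is treated as new


def genomospecies_label(accession: str,
                        ani_to_known: Dict[str, float]) -> Optional[str]:
    """Return an identifier such as 'CP013274_s' if the genome shows
    ANI below the cutoff to every known species, otherwise None."""
    if any(ani >= ANI_CUTOFF for ani in ani_to_known.values()):
        return None  # the genome belongs to an already known species
    return f"{accession}_s"


def is_16s_phylotype(accurate: bool, full_length: bool,
                     sim_to_known: Dict[str, float]) -> bool:
    """Check the three 16S phylotype criteria from the text."""
    return (accurate and full_length
            and all(sim < SSU_CUTOFF for sim in sim_to_known.values()))


# Made-up ANI values for a genome deposited as Bacillus thuringiensis.
ani_values = {"Bacillus thuringiensis": 91.3, "Bacillus cereus": 90.8}
print(genomospecies_label("CP013274", ani_values))

# Made-up 16S similarities for an accurate, full-length sequence.
print(is_16s_phylotype(True, True, {"species A": 96.9, "species B": 95.2}))
```

Keeping the two checks separate reflects the distinction drawn above between phylotypes supported by genomic data and phylotypes defined by 16S sequences alone.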