Overview

We at the Génolevures Consortium have introduced a stable, unambiguous, and extensible nomenclature for unique chromosomal elements from genomic DNA. It is designed to be applied after initial identification of open reading frames and tRNAs, thus providing a set of stable anchors for annotation while allowing the insertion of a limited number of newly-discovered elements.

Reference: Durrens P, Sherman DJ, Yeast 22(5):337-42, 2005


Syntax

Definition  


The following syntax describes a systematic nomenclature for chromosomal elements.

name ::= species project chromosome serial type?
species ::= [A-Z]{4}
project ::= [0-9]
chromosome ::= [A-Z] | [A-Z]+
serial ::= [0-9]{5}
type ::= [a-z] except [cwoi]




species gives the name of the sequenced organism using a four-letter code: two letters for the genus, and two for the species. This is unique for eukaryotes (NCBI Taxonomy database, 2004).

project gives the number of the sequencing project for this genome, starting from 0. It distinguishes cases where several genomes are sequenced for the same species (different strains or subspecies) or where a genome is re-annotated.

The chromosome gives the number of the chromosome using a one-letter code, starting from A (see below for the case of partially-sequenced or unassembled chromosomes).

The serial number gives a unique ordered element number, increasing from left to right on the sequenced strand, from telomere to telomere.
  • The numbering proceeds by increments of 11, providing ipso facto a simple error-correcting code for the initial numbering and leaving room for insertions.
  • Inserted elements are given interpolated serial numbers.
  • The numbering starts at the tenth serial number (00110) in order to leave room for sub-telomeric elements.
  • In the case of a gap in the sequence, 5 serial numbers are skipped per estimated kilobase of gap.
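The numbering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's tooling: the function names are hypothetical, and choosing the midpoint for an inserted element is an assumption (the rules only say the number is interpolated). Gap skipping is left out because the rule does not say how the estimate is rounded.

```python
FIRST_SERIAL = 110  # tenth multiple of 11; lower numbers are left for sub-telomeric elements
STEP = 11           # increment between consecutive elements at initial numbering


def initial_serials(count):
    """Return the serial numbers given to `count` elements at initial numbering."""
    return [FIRST_SERIAL + STEP * k for k in range(count)]


def is_initial_serial(serial):
    """True if `serial` is a multiple of 11, as every initially assigned number is."""
    return serial % STEP == 0


def interpolated_serial(left, right):
    """Pick a serial strictly between two neighbours for an inserted element.

    Assumption: the midpoint is used; the rules only require an interpolated value.
    """
    if right - left < 2:
        raise ValueError("no free serial number between the two neighbours")
    return (left + right) // 2


def format_serial(serial):
    """Render a serial number as the five-digit field used in element names."""
    return f"{serial:05d}"
```

A serial that is not a multiple of 11 is therefore either an inserted element or a transcription error, which is what makes the increment useful as a check.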

The element type gives an optional indication of the kind of element, using a one-letter code from the list below, which users may extend. To avoid confusion, the letters “c” and “w” (Saccharomyces cerevisiae strand codes) and “o” and “i” (easily confused with “0” and “1”) may not be used.
  • “g”, for an element that has or may have a translation product: protein-coding gene, pseudogene, relic.
  • “r”, for an element that has or may have a transcription product only: RNA gene, tRNA, rRNA, etc.
  • “s”, for a cis-active element having neither translation nor transcription products.
  • “t”, for repeat regions.

All elements are numbered regardless of type; see the Example section.
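As an illustration, the syntax for complete genomes can be transcribed into a regular expression. The sketch below assumes Python's standard re module; the names NAME_RE and parse_name are hypothetical and are not part of any Génolevures software.

```python
import re

# Direct transcription of the grammar for complete genomes.
# The negative lookahead excludes the reserved type letters c, w, o and i.
NAME_RE = re.compile(
    r"(?P<species>[A-Z]{4})"
    r"(?P<project>[0-9])"
    r"(?P<chromosome>[A-Z]+)"
    r"(?P<serial>[0-9]{5})"
    r"(?P<type>(?![cwoi])[a-z])?"
)


def parse_name(name):
    """Split an element name into its fields, or return None if it is malformed."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["project"] = int(fields["project"])
    fields["serial"] = int(fields["serial"])
    return fields
```

For example, parse_name("CAGL0M14113g") yields species "CAGL", project 0, chromosome "M", serial 14113 and type "g". A name ending in a reserved letter such as "c" is rejected.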

Incomplete genomes  


The following rules apply when the genome sequence is incomplete.
  • When the contig is assigned to a chromosome but not ordered, the chromosome indication is followed by the supercontig number and an underscore, for example, CAGL0B125_00330.

  • When the contig is ambiguously assigned to several chromosomes, the possible choices are concatenated, for example, CAGL0AB136_00121.

  • When the contig is not assigned to a chromosome, the chromosome indication is replaced by a period (“.”), for example, CAGL0.732_00187.


The syntax for incomplete genomes (compatible with the preceding) is thus extended to:

name ::= species project chromosome contig “_” serial type?
chromosome ::= [A-Z]{1,} | “.”
contig ::= [0-9]+
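The extended syntax can be checked the same way. The following sketch assumes Python's standard re module; parse_contig_name is a hypothetical helper. Reading each letter of a multi-letter chromosome field as one candidate chromosome is an assumption drawn from the rule on ambiguous assignment.

```python
import re

# Transcription of the extended grammar for incomplete genomes.
EXTENDED_RE = re.compile(
    r"(?P<species>[A-Z]{4})"
    r"(?P<project>[0-9])"
    r"(?P<chromosome>[A-Z]+|\.)"    # "." means the contig is not assigned
    r"(?P<contig>[0-9]+)_"
    r"(?P<serial>[0-9]{5})"
    r"(?P<type>(?![cwoi])[a-z])?"   # c, w, o and i are reserved
)


def parse_contig_name(name):
    """Split a contig-based element name into fields, or return None if malformed."""
    match = EXTENDED_RE.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    chromosome = fields["chromosome"]
    # Assumption: each letter is one candidate chromosome; "." gives no candidates.
    fields["candidates"] = [] if chromosome == "." else list(chromosome)
    fields["serial"] = int(fields["serial"])
    return fields
```

With the three examples above, CAGL0B125_00330 gives the single candidate B, CAGL0AB136_00121 gives candidates A and B, and CAGL0.732_00187 gives none.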


Example  

The examples come from the four completely sequenced species of the Génolevures project:

  1. Candida glabrata: 13 chromosomes => elements: CAGL0A00110g-CAGL0M14113g
  2. Kluyveromyces lactis: 6 chromosomes => elements: KLLA0A00110g-KLLA0F28083g
  3. Debaryomyces hansenii: 7 chromosomes => elements: DEHA0A00110g-DEHA0G26851g
  4. Yarrowia lipolytica: 6 chromosomes => elements: YALI0A00110g-YALI0F32175g

  • Gene: CAGL0M14113g. On its elementary page, the gene is represented by a red arrow. A red arrow alone corresponds to a gene without introns.
  • Gene with intron: CAGL0M07832g. On its elementary page, the gene is represented by a red arrow connected to one or more red rectangles by a black line. Each red rectangle corresponds to an exon, and the red arrow corresponds to the terminal exon.
  • Pseudogene: CAGL0A00110g. On its elementary page, the pseudogene is represented by a green arrow.
  • Centromere: CAGL0A00759s. On the Genome Browser, the centromere is represented by a blue oval.
  • tRNA: CAGL0M07788r. On its elementary page, the tRNA is represented by a black arrow.
  • Frameshifted genes: CAGL0A00110g and CAGL0A00121g are two open reading frames (ORFs) that are separated by frameshifts but, taken together, resemble a full-length gene. Each element has its own name.


RST

CNS Nomenclature  


    The CNS (Génopole) established the following nomenclature for the Random Sequence Tags (RSTs).

    Field                  | Representation                     | Regexpr        | Examples  | Notes
    Project                | Alphabetical                       | [A-Z]+         | AA, K     | The letter X is reserved for preproduction tests.
    Rearrangement version  | Numeric                            | [0-9]+         | 0, 1      |
    Bank copy version      | Alphabetical, 1 character          | [A-Z]          | A, B      |
    Bank name              | Alphabetical                       | [A-Z]+         | BA, A     | Letters V, W, X, Y, Z are not to be used as the first letter of the bank name.
    Tray                   | Numeric                            | [0-9]+         | 002, 01   | Fixed size within the same project.
    Row                    | Alphabetical, 1 character          | [A-P]          | A         | Regexpr is [A-H] if the "cadran" field is mentioned.
    Column                 | Numeric, 2 characters              | from 01 to 24  | 08        |
    Primer                 | Alphabetical                       | [A-Z]+         | D, DP     |
    Primer (PCR walking)   | Alphanumeric with surrounding '_'  | _[A-Z0-9]+_    | _B128F1_  |
    Version                | Numeric                            | [1-9]+         | 1         |

    Note: The capital letter O is forbidden in the alphabetical fields.
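The field patterns listed above can be checked mechanically. The sketch below is only an illustration in Python: the dictionary keys and the invalid_fields helper are hypothetical names. It does not assemble a full RST name, since the order and separators of the fields are not given here. It also leaves out the ban on the letter O and the [A-H] row variant, because the text does not say precisely when each applies.

```python
import re

# One pattern per field, copied from the CNS nomenclature; key names are hypothetical.
RST_FIELD_PATTERNS = {
    "project": r"[A-Z]+",
    "rearrangement_version": r"[0-9]+",
    "bank_copy_version": r"[A-Z]",
    "bank_name": r"[A-Z]+",
    "tray": r"[0-9]+",
    "row": r"[A-P]",
    "column": r"0[1-9]|1[0-9]|2[0-4]",   # two digits, from 01 to 24
    "primer": r"[A-Z]+|_[A-Z0-9]+_",     # second form is used for PCR walking
    "version": r"[1-9]+",
}


def invalid_fields(fields):
    """Return the names of the supplied fields that break their pattern."""
    bad = []
    for name, value in fields.items():
        if re.fullmatch(RST_FIELD_PATTERNS[name], value) is None:
            bad.append(name)
    # Bank names may not start with V, W, X, Y or Z.
    bank = fields.get("bank_name", "")
    if bank and bank[0] in "VWXYZ" and "bank_name" not in bad:
        bad.append("bank_name")
    return bad
```

Calling invalid_fields with only some of the fields checks just those fields, which is convenient when a name is parsed piece by piece.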

    Species Code  


    Génolevures has given a unique code name (Tagid) to each RST, containing a code that identifies the species and coordinates that correspond to collection plates. This allows anyone to order any individual clone containing an insert of interest.

    Example of an RST code name:

    Species/project matching:

    Species                                  | Abbrev | Species Code
    Saccharomyces bayanus var. uvarum        | Sb     | AS or XAS
    Saccharomyces exiguus                    | Se     | AV or XAV
    Saccharomyces servazzii                  | Ss     | AT or XAT
    Zygosaccharomyces rouxii                 | Zr     | AR or XAR
    Saccharomyces kluyveri                   | Sk     | AU or XAU
    Kluyveromyces thermotolerans             | Kt     | AY or XAY
    Kluyveromyces lactis                     | Kl     | BA or XBA
    Kluyveromyces marxianus var. marxianus   | Km     | AZ or XAZ
    Pichia angusta                           | Pa     | BB or XBB
    Debaryomyces hansenii var. hansenii      | Dh     | BC or XBC
    Pichia sorbitophila                      | Ps     | AX or XAX
    Candida tropicalis                       | Ct     | BD or XBD
    Yarrowia lipolytica                      | Yl     | AW or XAW