Complete Genomes

Do not forget to cite Génolevures.

Important notice
The genome annotation files in EMBL format for the species presented on the Génolevures website represent the work of the members of the Génolevures Consortium. Some files available on the EBI or NCBI websites are the result of a reannotation process carried out without the consent of the original authors. The locus_tag values were changed and some features were incorrectly assigned. None of these changes is endorsed by the annotators of the Génolevures Consortium.
We therefore encourage our users to rely on the annotation files available on the Génolevures website.


Candida glabrata (strain CBS138)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Cagl0A CR380947 CR380947.2 2008/09/10 Cagl0A_contig.fasta
Cagl0B CR380948 CR380948.1 2008/09/10 Cagl0B_contig.fasta
Cagl0C CR380949 CR380949.1 2008/09/10 Cagl0C_contig.fasta
Cagl0D CR380950 CR380950.1 2008/09/10 Cagl0D_contig.fasta
Cagl0E CR380951 CR380951.2 2008/09/10 Cagl0E_contig.fasta
Cagl0F CR380952 CR380952.1 2008/09/10 Cagl0F_contig.fasta
Cagl0G CR380953 CR380953.1 2008/09/10 Cagl0G_contig.fasta
Cagl0H CR380954 CR380954.1 2008/09/10 Cagl0H_contig.fasta
Cagl0I CR380955 CR380955.2 2008/09/10 Cagl0I_contig.fasta
Cagl0J CR380956 CR380956.2 2008/09/10 Cagl0J_contig.fasta
Cagl0K CR380957 CR380957.2 2008/09/10 Cagl0K_contig.fasta
Cagl0L CR380958 CR380958.2 2008/09/10 Cagl0L_contig.fasta
Cagl0M CR380959 CR380959.2 2008/09/10 Cagl0M_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 4 2012/02/09 Cagl_ORF.nt, Cagl_ORF.ntwide Cagl_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Cagl.csv

All the files concerning annotations are in FASTA format.
File Cagl_ORF.nt contains all the coding sequences of the C. glabrata genome in nucleotides.
File Cagl_ORF.aa contains all the coding sequences of the C. glabrata genome translated into amino acids.
File Cagl_ORF.ntwide contains all the coding sequences of the C. glabrata genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Cagl.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.
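
The description-line layout listed above can be read programmatically. The following is a minimal Python sketch, not an official parser: it assumes the line starts with ">" followed by the locus name, then the genome name in square brackets, then free text. The names OrfHeader, HEADER_RE and parse_header are hypothetical, and aliases are left unsplit from the locus information because the separator between them is not documented here.

```python
import re
from typing import NamedTuple


class OrfHeader(NamedTuple):
    locus: str
    genome: str
    rest: str  # aliases and locus information, left unsplit


# Assumed layout: ">LOCUS [Genome name] aliases and free-text information".
HEADER_RE = re.compile(
    r"^>?\s*(?P<locus>\S+)\s+\[(?P<genome>[^\]]+)\]\s*(?P<rest>.*)$"
)


def parse_header(line: str) -> OrfHeader:
    """Split one FASTA description line into its assumed parts."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected description line: {line!r}")
    return OrfHeader(match["locus"], match["genome"], match["rest"])


# Made-up example line, not copied from Cagl_ORF.nt.
example = ">LOCUS0001g [Candida glabrata] ALIAS1 hypothetical protein"
print(parse_header(example))
```

Lines that do not fit the assumed pattern raise an error rather than being parsed silently, which makes it easier to notice if the real files use a different layout.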

Debaryomyces hansenii (strain CBS767)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Deha2A CR382133 CR382133.2 2008/09/10 Deha2A_contig.fasta
Deha2B CR382134 CR382134.2 2008/09/10 Deha2B_contig.fasta
Deha2C CR382135 CR382135.2 2008/09/10 Deha2C_contig.fasta
Deha2D CR382136 CR382136.2 2008/09/10 Deha2D_contig.fasta
Deha2E CR382137 CR382137.2 2008/09/10 Deha2E_contig.fasta
Deha2F CR382138 CR382138.2 2008/09/10 Deha2F_contig.fasta
Deha2G CR382139 CR382139.2 2008/09/10 Deha2G_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 2 2012/02/09 Deha_ORF.nt, Deha_ORF.ntwide Deha_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Deha.csv
Correspondence Release Release Date
DEHA0-DEHA2 1 2009/11/16 DEHA-correspondence.txt

All the files concerning annotations are in FASTA format.
File Deha_ORF.nt contains all the coding sequences of the D. hansenii genome in nucleotides.
File Deha_ORF.aa contains all the coding sequences of the D. hansenii genome translated into amino acids.
File Deha_ORF.ntwide contains all the coding sequences of the D. hansenii genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Deha.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.
File DEHA-correspondence.txt maps gene names between version 0 (outdated) and version 2 (current) of the D. hansenii genome.
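
Outside a spreadsheet, a tab-separated annotation file such as Deha.csv can be read with Python's standard csv module. This is a sketch under assumptions: the function name read_annotation_rows is hypothetical, the sample rows are invented, and the real column order and any header row are not documented here, so rows are returned as plain lists of fields. Quoting is disabled on the assumption that fields are not quoted.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_annotation_rows(handle: Iterable[str]) -> Iterator[List[str]]:
    """Yield each non-empty line of a tab-separated file as a list of fields."""
    # QUOTE_NONE keeps quote characters in free-text fields as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


# Made-up stand-in for a file such as Deha.csv; the real columns may differ.
sample = io.StringIO("ELEMENT1\tCDS\t100\t400\nELEMENT2\ttRNA\t900\t972\n")
rows = list(read_annotation_rows(sample))
print(len(rows), rows[0])
```

With a downloaded file, the same function can be given an open file handle, for example `open("Deha.csv", newline="")`, in place of the in-memory sample.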

Eremothecium (Ashbya) gossypii (strain ATCC10895)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Ergo0A AE016814 AE016814.2 2010/06/29 Ergo0A_contig.fasta
Ergo0B AE016815 AE016815.2 2010/10/21 Ergo0B_contig.fasta
Ergo0C AE016816 AE016816.2 2010/10/21 Ergo0C_contig.fasta
Ergo0D AE016817 AE016817.2 2010/10/21 Ergo0D_contig.fasta
Ergo0E AE016818 AE016818.2 2010/06/30 Ergo0E_contig.fasta
Ergo0F AE016819 AE016819.2 2011/11/24 Ergo0F_contig.fasta
Ergo0G AE016820 AE016820.2 2010/06/30 Ergo0G_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 2 2012/02/09 Ergo_ORF.nt, Ergo_ORF.ntwide Ergo_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Ergo.csv
Correspondence Release Release Date
ERGO0-AGD 1 2009/11/16 ERGO-correspondence.txt

All the files concerning annotations are in FASTA format.
File Ergo_ORF.nt contains all the coding sequences of the E. gossypii genome in nucleotides.
File Ergo_ORF.aa contains all the coding sequences of the E. gossypii genome translated into amino acids.
File Ergo_ORF.ntwide contains all the coding sequences of the E. gossypii genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Ergo.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.
File ERGO-correspondence.txt maps gene names between the Génolevures database and the Ashbya Genome Database.

Kluyveromyces lactis (strain CLIB210)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Klla0A CR382121 CR382121.1 2004/07/03 Klla0A_contig.fasta
Klla0B CR382122 CR382122.1 2004/07/03 Klla0B_contig.fasta
Klla0C CR382123 CR382123.1 2004/07/03 Klla0C_contig.fasta
Klla0D CR382124 CR382124.1 2004/07/03 Klla0D_contig.fasta
Klla0E CR382125 CR382125.1 2004/07/03 Klla0E_contig.fasta
Klla0F CR382126 CR382126.1 2004/07/03 Klla0F_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 3 2008/09/10 Klla_ORF.nt, Klla_ORF.ntwide Klla_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Klla.csv

All the files concerning annotations are in FASTA format.
File Klla_ORF.nt contains all the coding sequences of the K. lactis genome in nucleotides.
File Klla_ORF.aa contains all the coding sequences of the K. lactis genome translated into amino acids.
File Klla_ORF.ntwide contains all the coding sequences of the K. lactis genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Klla.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.
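
Since each entry in a file such as Klla_ORF.ntwide carries 1000 nt upstream and 300 nt downstream of the coding sequence, the three parts can be separated by position. The sketch below assumes both flanks are present at full length; entries near a chromosome end might have shorter flanks, in which case this slicing would be wrong. The function name split_wide and the toy sequence are hypothetical.

```python
UPSTREAM = 1000   # nt before the coding sequence, as stated above
DOWNSTREAM = 300  # nt after the coding sequence, as stated above


def split_wide(seq: str, upstream: int = UPSTREAM, downstream: int = DOWNSTREAM):
    """Split a flanked sequence into (upstream, coding, downstream) parts."""
    if len(seq) <= upstream + downstream:
        raise ValueError("sequence is too short to hold both full flanks")
    end = len(seq) - downstream
    return seq[:upstream], seq[upstream:end], seq[end:]


# Toy sequence: 1000 'a', a 9-nt coding sequence, then 300 't'.
wide = "a" * 1000 + "ATGGCCTAA" + "t" * 300
up, cds, down = split_wide(wide)
print(len(up), cds, len(down))  # prints: 1000 ATGGCCTAA 300
```

Comparing the extracted middle part with the matching entry in the .nt file is a simple way to check whether the full-flank assumption holds for a given locus.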

Kluyveromyces thermotolerans (strain CBS6340)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Klth0A CU928165 CU928165.2 2009/10/08 Klth0A_contig.fasta
Klth0B CU928166 CU928166.2 2009/10/08 Klth0B_contig.fasta
Klth0C CU928167 CU928167.2 2009/10/08 Klth0C_contig.fasta
Klth0D CU928168 CU928168.2 2009/10/08 Klth0D_contig.fasta
Klth0E CU928169 CU928169.2 2009/10/08 Klth0E_contig.fasta
Klth0F CU928170 CU928170.2 2009/10/08 Klth0F_contig.fasta
Klth0G CU928171 CU928171.2 2009/10/08 Klth0G_contig.fasta
Klth0H CU928180 CU928180.2 2009/10/08 Klth0H_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 1 2008/09/10 Klth_ORF.nt, Klth_ORF.ntwide Klth_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Klth.csv

All the files concerning annotations are in FASTA format.
File Klth_ORF.nt contains all the coding sequences of the K. thermotolerans genome in nucleotides.
File Klth_ORF.aa contains all the coding sequences of the K. thermotolerans genome translated into amino acids.
File Klth_ORF.ntwide contains all the coding sequences of the K. thermotolerans genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Klth.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.

Pichia sorbitophila (strain CBS7064)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Piso0A FO082059 FO082059.1 2011/12/09 Piso0A_contig.fasta
Piso0B FO082058 FO082058.1 2011/12/09 Piso0B_contig.fasta
Piso0C FO082057 FO082057.1 2011/12/09 Piso0C_contig.fasta
Piso0D FO082056 FO082056.1 2011/12/09 Piso0D_contig.fasta
Piso0E FO082055 FO082055.1 2011/12/09 Piso0E_contig.fasta
Piso0F FO082054 FO082054.1 2011/12/09 Piso0F_contig.fasta
Piso0G FO082053 FO082053.1 2011/12/09 Piso0G_contig.fasta
Piso0H FO082052 FO082052.1 2011/12/09 Piso0H_contig.fasta
Piso0I FO082051 FO082051.1 2011/12/09 Piso0I_contig.fasta
Piso0J FO082050 FO082050.1 2011/12/09 Piso0J_contig.fasta
Piso0K FO082049 FO082049.1 2011/12/09 Piso0K_contig.fasta
Piso0L FO082048 FO082048.1 2011/12/09 Piso0L_contig.fasta
Piso0M FO082047 FO082047.1 2011/12/09 Piso0M_contig.fasta
Piso0N FO082046 FO082046.1 2011/12/09 Piso0N_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 1 2008/09/10 Piso_ORF.nt, Piso_ORF.ntwide Piso_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Piso.csv

All the files concerning annotations are in FASTA format.
File Piso_ORF.nt contains all the coding sequences of the P. sorbitophila genome in nucleotides.
File Piso_ORF.aa contains all the coding sequences of the P. sorbitophila genome translated into amino acids.
File Piso_ORF.ntwide contains all the coding sequences of the P. sorbitophila genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Piso.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.

Saccharomyces cerevisiae (strain S288C)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Sace0A BK006935 BK006935.2 2011/04/19 Sace0A_contig.fasta
Sace0B BK006936 BK006936.3 2011/05/20 Sace0B_contig.fasta
Sace0C BK006937 BK006937.2 2011/04/19 Sace0C_contig.fasta
Sace0D BK006938 BK006938.2 2011/04/19 Sace0D_contig.fasta
Sace0E BK006939 BK006939.3 2011/04/19 Sace0E_contig.fasta
Sace0F BK006940 BK006940.2 2011/04/19 Sace0F_contig.fasta
Sace0G BK006941 BK006941.2 2011/04/19 Sace0G_contig.fasta
Sace0H BK006934 BK006934.3 2011/04/19 Sace0H_contig.fasta
Sace0I BK006942 BK006942.2 2011/04/19 Sace0I_contig.fasta
Sace0J BK006943 BK006943.2 2011/04/19 Sace0J_contig.fasta
Sace0K BK006944 BK006944.2 2011/04/19 Sace0K_contig.fasta
Sace0L BK006945 BK006945.2 2011/04/19 Sace0L_contig.fasta
Sace0M BK006946 BK006946.2 2011/04/19 Sace0M_contig.fasta
Sace0N BK006947 BK006947.2 2011/04/19 Sace0N_contig.fasta
Sace0O BK006948 BK006948.2 2011/04/19 Sace0O_contig.fasta
Sace0P BK006949 BK006949.2 2011/04/19 Sace0P_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 1 2008/09/10 Sace_ORF.nt, Sace_ORF.ntwide Sace_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Sace.csv
Correspondence Release Release Date
SACE0-SGD 1 2009/11/16 SACE-correspondence.txt

All the files concerning annotations are in FASTA format.
File Sace_ORF.nt contains all the coding sequences of the S. cerevisiae genome in nucleotides.
File Sace_ORF.aa contains all the coding sequences of the S. cerevisiae genome translated into amino acids.
File Sace_ORF.ntwide contains all the coding sequences of the S. cerevisiae genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Sace.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.
File SACE-correspondence.txt maps gene names between the Génolevures database and the Saccharomyces Genome Database.

Saccharomyces kluyveri (strain CBS3082)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Sakl0A CM000687 CM000687.1 2009/03/03 Sakl0A_contig.fasta
Sakl0B CM000688 CM000688.1 2009/03/03 Sakl0B_contig.fasta
Sakl0C CM000689 CM000689.1 2009/03/03 Sakl0C_contig.fasta
Sakl0D CM000690 CM000690.1 2009/03/03 Sakl0D_contig.fasta
Sakl0E CM000691 CM000691.1 2009/03/03 Sakl0E_contig.fasta
Sakl0F CM000692 CM000692.1 2009/03/03 Sakl0F_contig.fasta
Sakl0G CM000693 CM000693.1 2009/03/03 Sakl0G_contig.fasta
Sakl0H CM000694 CM000694.1 2009/03/03 Sakl0H_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 1 2008/09/10 Sakl_ORF.nt, Sakl_ORF.ntwide Sakl_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Sakl.csv

All the files concerning annotations are in FASTA format.
File Sakl_ORF.nt contains all the coding sequences of the S. kluyveri genome in nucleotides.
File Sakl_ORF.aa contains all the coding sequences of the S. kluyveri genome translated into amino acids.
File Sakl_ORF.ntwide contains all the coding sequences of the S. kluyveri genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Sakl.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.

Yarrowia lipolytica (strain CLIB122)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Yali0A CR382127 CR382127.12 2008/10/23 Yali0A_contig.fasta
Yali0B CR382128 CR382128.11 2008/09/10 Yali0B_contig.fasta
Yali0C CR382129 CR382129.11 2008/09/10 Yali0C_contig.fasta
Yali0D CR382130 CR382130.11 2008/09/10 Yali0D_contig.fasta
Yali0E CR382131 CR382131.12 2008/09/10 Yali0E_contig.fasta
Yali0F CR382132 CR382132.11 2008/10/23 Yali0F_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 3 2008/09/10 Yali_ORF.nt, Yali_ORF.ntwide Yali_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Yali.csv

All the files concerning annotations are in FASTA format.
File Yali_ORF.nt contains all the coding sequences of the Y. lipolytica genome in nucleotides.
File Yali_ORF.aa contains all the coding sequences of the Y. lipolytica genome translated into amino acids.
File Yali_ORF.ntwide contains all the coding sequences of the Y. lipolytica genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Yali.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.

Zygosaccharomyces rouxii (strain CBS732)  


Chromosome EMBL File Sequence version Sequence date FASTA format
Zyro0A CU928173 CU928173.3 2010/01/14 Zyro0A_contig.fasta
Zyro0B CU928174 CU928174.3 2010/01/14 Zyro0B_contig.fasta
Zyro0C CU928175 CU928175.3 2010/01/14 Zyro0C_contig.fasta
Zyro0D CU928176 CU928176.3 2010/01/14 Zyro0D_contig.fasta
Zyro0E CU928181 CU928181.3 2010/01/14 Zyro0E_contig.fasta
Zyro0F CU928178 CU928178.3 2010/01/14 Zyro0F_contig.fasta
Zyro0G CU928179 CU928179.3 2010/01/14 Zyro0G_contig.fasta
Coding sequences Release Release Date DNA Protein
Whole genome 1 2008/09/10 Zyro_ORF.nt, Zyro_ORF.ntwide Zyro_ORF.aa
Annotation Release Release Date Genetic elements
Whole genome 4 2012/02/09 Zyro.csv

All the files concerning annotations are in FASTA format.
File Zyro_ORF.nt contains all the coding sequences of the Z. rouxii genome in nucleotides.
File Zyro_ORF.aa contains all the coding sequences of the Z. rouxii genome translated into amino acids.
File Zyro_ORF.ntwide contains all the coding sequences of the Z. rouxii genome in nucleotides, each flanked by 1000 nt upstream and 300 nt downstream.
The description line of each coding sequence is composed of:
  • the locus name
  • the name of genome between square brackets
  • aliases of the locus name
  • information on the locus
File Zyro.csv contains the annotation of each genetic element in tab-separated fields, for use in spreadsheets.