Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.
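As an illustration of how a masked percentage like those reported here could be computed, the following is a minimal sketch. It assumes a soft-masked sequence in which masked bases are written in lowercase; that convention and the function name `masked_percentage` are assumptions made for this example and are not stated in the report.

```python
def masked_percentage(sequence: str) -> float:
    """Return the percentage of bases that are soft-masked.

    Assumption: masked bases are lowercase (soft-masking); the report
    does not say how the masks are represented.
    """
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / len(sequence)


# Example: half of this toy sequence is lowercase, i.e. masked.
print(masked_percentage("ACGTacgt"))
```

A hard-masked genome (masked bases replaced by N) would need a different test, such as counting N characters.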
The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
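The matching idea described above can be sketched as follows. This is not the pipeline's actual scoring: the Jaccard-style formulas, the half-open coordinate convention, and the names `overlap_score`, `boundary_score` and `reciprocal_best_matches` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# A feature is a list of exons; each exon is a half-open [start, end) interval.
Exons = List[Tuple[int, int]]


def exon_bases(exons: Exons) -> Set[int]:
    """Return the set of genomic positions covered by the exons."""
    return {pos for start, end in exons for pos in range(start, end)}


def overlap_score(a: Exons, b: Exons) -> float:
    """Shared exonic bases divided by total exonic bases (assumed formula)."""
    bases_a, bases_b = exon_bases(a), exon_bases(b)
    union = bases_a | bases_b
    return len(bases_a & bases_b) / len(union) if union else 0.0


def boundary_score(a: Exons, b: Exons) -> float:
    """Shared exon boundaries divided by all boundaries (assumed formula)."""
    bounds_a = {p for exon in a for p in exon}
    bounds_b = {p for exon in b for p in exon}
    union = bounds_a | bounds_b
    return len(bounds_a & bounds_b) / len(union) if union else 0.0


def best_match(query: Exons, targets: Dict[str, Exons]) -> Optional[str]:
    """Name of the target with the highest overlap score, or None if no overlap."""
    scored = [(overlap_score(query, exons), name) for name, exons in targets.items()]
    if not scored:
        return None
    score, name = max(scored)
    return name if score > 0 else None


def reciprocal_best_matches(
    current: Dict[str, Exons], previous: Dict[str, Exons]
) -> List[Tuple[str, str]]:
    """Pairs (current, previous) in which each feature is the other's best match."""
    pairs = []
    for cur_name, cur_exons in current.items():
        prev_name = best_match(cur_exons, previous)
        if prev_name is not None and best_match(previous[prev_name], current) == cur_name:
            pairs.append((cur_name, prev_name))
    return pairs
```

Building a set of positions per feature keeps the sketch short but is slow on real genes; an interval-based computation would be the practical choice. Features on different assemblies would first have to be brought into one coordinate system, which the report says is done through assembly-to-assembly alignments.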
The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign (transcripts) or ProSplign (proteins), and passed to Gnomon, NCBI’s gene prediction software.
The annotation products are available in the sequence databases and on the FTP site.
The RefSeq genome records for Camelina sativa were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.
The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:
Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins
This report provides:
- Annotation Release information: The name of the release, important dates, the software version
- Assemblies: A brief description of the annotated assembly or assemblies
- Gene and feature statistics: The counts and characteristics of the annotated features
- Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
- Masking of genomic sequence: How much of the genome was masked
- Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
- Comparison of the current and previous annotations: What proportion of the genes changed in this annotation
For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.