Important note: As of March 15th, 2013, this site is maintained at http://irefindex.org

This site is for archival purposes and may eventually be deleted.

Protein identifier mapping

From Donaldson Group

Last edited: 2011-11-18

We provide mapping files that relate iRefIndex identifiers to popular external identifiers. The current files contain all UniProt and RefSeq identifiers known to the current version of iRefIndex, as documented on the sources page. For source-specific documentation, see the sources for each released version.

Other database identifiers are provided as database/accession pairs only when the iRefIndex identifier (ROGID) does not have a corresponding UniProt or RefSeq record with an identical sequence.

File download location: ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/

The column descriptions:

Column number | Column name | Description
1 | db | Source of the external identifier (e.g. UniProt, RefSeq).
2 | acc | The external identifier (e.g. Q4U9M9).
3 | entrezGeneid | Entrez Gene ID. This is provided only for RefSeq identifiers; for all other identifiers the value of this field is -1. See note 1.
4 | irogid | Integer version of the redundant object group identifier (rogid), e.g. 3156116. The current maximum value is 14005379; this is a MySQL int(11) field.
5 | rogid | String version of the redundant object group identifier (base-64 version of the hash digest of the primary amino acid sequence, with the NCBI taxonomy identifier appended at the end). See note 2.
6 | icrogid | Integer version of the canonical redundant object group identifier (crogid): the irogid selected to represent the canonical group. See note 3.
7 | crogid | String version of the canonical(1) redundant object group identifier: the rogid selected to represent the canonical group. See note 3.
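
The seven columns described above can be read into typed records with a few lines of Python. This is a minimal sketch, not a reference parser: it assumes the mapping file is tab-delimited text and that any header line can be recognised by a non-numeric irogid field, neither of which is stated on this page. The names `MappingRow` and `read_mappings` are illustrative, and the sample line uses placeholder ROGID strings.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class MappingRow(NamedTuple):
    db: str
    acc: str
    entrez_gene_id: int  # -1 when no Entrez Gene ID is provided
    irogid: int
    rogid: str
    icrogid: int
    crogid: str


def read_mappings(handle: TextIO) -> Iterator[MappingRow]:
    """Yield one MappingRow per data line (assumes tab-delimited text)."""
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and any header line (irogid would not be numeric).
        if len(fields) != 7 or not fields[3].isdigit():
            continue
        db, acc, gene_id, irogid, rogid, icrogid, crogid = fields
        yield MappingRow(db, acc, int(gene_id), int(irogid),
                         rogid, int(icrogid), crogid)


# Invented sample: only Q4U9M9 and 3156116 come from the column descriptions;
# the header names and the ROGID strings are placeholders.
sample = io.StringIO(
    "db\tacc\tentrezGeneid\tirogid\trogid\ticrogid\tcrogid\n"
    "UniProt\tQ4U9M9\t-1\t3156116\tPLACEHOLDER_ROGID\t3156116\tPLACEHOLDER_ROGID\n"
)
rows = list(read_mappings(sample))
print(rows[0].acc, rows[0].irogid, rows[0].entrez_gene_id)
```

For a downloaded file, the same function can be given an open text handle, for example `open(path, newline="")`, where `path` is wherever the file was saved.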

Notes:

(1) Some protein sequence records can be mapped to an Entrez Gene record but will not have an entry in this column because they are not RefSeq records. In these cases, use the irogid (or icrogid) to retrieve all other entries in this table with an identical sequence (or belonging to the same canonical group); one of these may have an Entrez Gene ID in this column.
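
The lookup described in note 1 can be sketched as a two-step pass over the table: collect the Entrez Gene IDs seen for each irogid, then report the IDs collected for the group that contains the accession of interest. The sketch below is illustrative only. The function names are hypothetical, rows are reduced to (db, acc, entrezGeneid, irogid) tuples, and the RefSeq accession and gene ID in the sample are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (db, acc, entrezGeneid, irogid); entrezGeneid is -1 when not provided.
Row = Tuple[str, str, int, int]


def gene_ids_by_irogid(rows: Iterable[Row]) -> Dict[int, Set[int]]:
    """Collect every Entrez Gene ID recorded for each irogid."""
    groups: Dict[int, Set[int]] = defaultdict(set)
    for _db, _acc, gene_id, irogid in rows:
        if gene_id != -1:
            groups[irogid].add(gene_id)
    return dict(groups)


def lookup_gene_ids(rows: List[Row], acc: str) -> Set[int]:
    """Return Gene IDs found on any row sharing an irogid with `acc`."""
    index = gene_ids_by_irogid(rows)
    found: Set[int] = set()
    for _db, row_acc, _gene_id, irogid in rows:
        if row_acc == acc:
            found |= index.get(irogid, set())
    return found


# Invented sample: the RefSeq accession and the gene ID 12345 are made up.
table: List[Row] = [
    ("UniProt", "Q4U9M9", -1, 3156116),
    ("RefSeq", "XP_000001", 12345, 3156116),
    ("UniProt", "P00000", -1, 42),
]
print(lookup_gene_ids(table, "Q4U9M9"))
```

Grouping on the icrogid column instead of the irogid column would widen the search from identical sequences to the whole canonical group.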


(2) Please see http://www.ncbi.nlm.nih.gov/pubmed/18823568 for the algorithm that describes how to generate this key from a protein sequence.
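
As a rough illustration of how such a key could be computed, the sketch below assumes that the digest is SHA-1 over the upper-cased amino acid sequence, that it is base-64 encoded with the trailing padding removed, and that the NCBI taxonomy identifier is then appended as decimal text. These details are assumptions made for this example and should be checked against the cited paper; the function names are hypothetical.

```python
import base64
import hashlib


def sequence_digest(sequence: str) -> str:
    """Base-64 SHA-1 digest of a sequence (assumed normalisation and hash)."""
    # Assumption: whitespace is ignored and the sequence is upper-cased.
    normalised = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalised.encode("ascii")).digest()
    # A 20-byte digest encodes to 28 characters ending in one "=" pad.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def make_rogid(sequence: str, taxid: int) -> str:
    """Append the NCBI taxonomy identifier to the sequence digest."""
    return sequence_digest(sequence) + str(taxid)


# The sequence below is an arbitrary example, not a real record.
example = make_rogid("MKTAYIAKQR", 9606)
print(example)
```

Under these assumptions the digest part is always 27 characters long, so the taxonomy identifier can be recovered by taking everything after the 27th character.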


(3) Please refer to the following page for details on the canonicalization process: http://irefindex.uio.no/wiki/Canonicalization