Important: As of March 15th, 2013, this site is maintained at http://irefindex.org

This site is for archival purposes and may eventually be deleted.

Statistics iRefIndex 4.0

From Donaldson Group

Summary

  • Total distinct interactions : 369,457
  • Total distinct proteins : 83,388

This page lists statistics for our internal version of iRefIndex, which includes all of the data from the sources used for the current build (Sources_iRefIndex_4.0). This full build contains data from the DIP, HPRD and MPact databases that cannot be redistributed under the usage policies of those databases. Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/current/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_4.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Taxonomy                         Number of interactions
562 (Escherichia coli)           1243
4932 (Saccharomyces cerevisiae)  116321
6239 (Caenorhabditis elegans)    11912
7227 (Drosophila melanogaster)   46794
9606 (Homo sapiens)              117535
10090 (Mus musculus)             10098
10116 (Rattus norvegicus)        3372

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND     62921
BioGrid  20497     163891
DIP      25914     28969     56441
HPRD     2893      1958      839       37956
INTACT   24239     25653     24807     8075      111235
MINT     21991     34654     29988     6270      45260     76607
MPACT    6904      8489      6777      0         6087      6426      13321
MPPI     385       26        41        303       89        71        0         829
OPHID    2210      1333      887       17913     7196      6396      0         183       47297
CORUM    113       18        29        390       121       66        0         9         158       1919
         BIND      BioGrid   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM
         (25903)   (111633)  (13594)   (15201)   (55807)   (15712)   (1137)    (238)     (26571)   (1403)

Interactors

BIND     40801
BioGrid  14442     27471
DIP      15395     13084     20111
HPRD     3320      2472      1249      9539
INTACT   18121     16827     15687     5792      41587
MINT     16178     15064     14987     4620      23418     28428
MPACT    4638      4560      4632      0         4859      4756      4972
MPPI     671       212       292       429       575       504       0         862
OPHID    3242      2300      1241      7357      5747      4709      1         421       9629
CORUM    1551      746       670       1845      2293      1733      0         321       1849      3581
         BIND      BioGrid   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM
         (18591)   (8169)    (1838)    (1040)    (11893)   (3241)    (17)      (45)      (1280)    (626)

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source   Total records  Protein-only interactors  PPI Assigned to RIGID  Unique RIGIDs
BIND     193648         93957                     91291 (97.1625%)       62921 (68.9236%)
BioGrid  240501         240501                    240197 (99.8736%)      163891 (68.2319%)
dip      57675          57675                     56608 (98.1500%)       56441 (99.7050%)
intact   129092         128326                    127893 (99.6626%)      111235 (86.9750%)
mint     109412         109412                    107823 (98.5477%)      76607 (71.0488%)
HPRD     38037          38037                     38028 (99.9763%)       37956 (99.8107%)
ophid    73257          73257                     72907 (99.5222%)       47297 (64.8731%)
MPACT    16504          16504                     16286 (98.6791%)       13321 (81.7942%)
MPPI     1814           1814                      1697 (93.5502%)        829 (48.8509%)
CORUM    2104           2104                      2104 (100.0000%)       1919 (91.2072%)
ALL      862044         761587                    754834 (99.1133%)      369457 (48.9455%)
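
The two percentage columns can be reproduced from the counts in the same row. The sketch below is an inference from the figures rather than a method stated on this page: it assumes the first percentage is (PPI assigned to RIGID) / (protein-only interactors) and the second is (unique RIGIDs) / (PPI assigned to RIGID). The names `ROWS`, `percent` and `rig_percentages` are illustrative only.

```python
# Counts copied from the BIND and ALL rows of the table.
ROWS = {
    "BIND": {"protein_only": 93957, "assigned": 91291, "unique_rigids": 62921},
    "ALL": {"protein_only": 761587, "assigned": 754834, "unique_rigids": 369457},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to four decimal places."""
    return round(100.0 * part / whole, 4)


def rig_percentages(row):
    # Assumed denominators: assigned / protein-only, then unique / assigned.
    assigned_pct = percent(row["assigned"], row["protein_only"])
    unique_pct = percent(row["unique_rigids"], row["assigned"])
    return assigned_pct, unique_pct


for source, row in ROWS.items():
    assigned_pct, unique_pct = rig_percentages(row)
    print(f"{source}: assigned {assigned_pct:.4f}%, unique RIGIDs {unique_pct:.4f}%")
```

Under these assumptions the computed values agree with the percentages listed for both rows.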

Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source   Protein_Interactors  Assigned  %        Arbitrary  New    Unassigned  Unique proteins
BIND     285482               273646    95.8540  0          7942   3894        40801
CORUM    10316                10314     99.9806  0          2      0           3581
dip      20728                18513     89.3140  1261       479    475         20111
BioGrid  29318                19162     65.3592  10053      5      98          27471
HPRD     9565                 9493      99.2473  53         15     4           9539
intact   97988                94387     96.3251  18         3347   236         41587
mint     76908                73745     95.8873  2          2689   472         28428
MPACT    40349                40112     99.4126  0          0      237         4972
MPPI     3628                 3456      95.2591  0          39     133         862
ophid    146423               145362    99.2754  103        699    259         9629
All      720705               688190    95.4884  11490      15217  5808        83388
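
The columns of this table appear to partition the protein interactors of each source. The sketch below assumes each row reads as Protein_Interactors, Assigned, %, Arbitrary, New, Unassigned (a reading inferred from the figures, not stated on this page) and checks that Assigned + Arbitrary + New + Unassigned equals Protein_Interactors, and that % is Assigned / Protein_Interactors. `ROWS` and `check_row` are illustrative names.

```python
# Counts copied from the BIND and All rows of the table.
ROWS = {
    "BIND": dict(total=285482, assigned=273646, arbitrary=0, new=7942, unassigned=3894),
    "All": dict(total=720705, assigned=688190, arbitrary=11490, new=15217, unassigned=5808),
}


def check_row(row):
    """Return (partition_holds, assigned_percentage) for one table row."""
    parts = row["assigned"] + row["arbitrary"] + row["new"] + row["unassigned"]
    assigned_pct = round(100.0 * row["assigned"] / row["total"], 4)
    return parts == row["total"], assigned_pct


for source, row in ROWS.items():
    holds, assigned_pct = check_row(row)
    print(f"{source}: partition holds = {holds}, assigned = {assigned_pct:.4f}%")
```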

ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score  Binary_flag         String_score  Score_class  Proteins  Percentage  BIND    BioGrid  DIP    MINT   HPRD  OPHID   MPPI  MPACT  IntAct  CORUM
1              000000000000000001  P             1            562341    78.0265%    232685  7503     0      71616  0     125715  3023  30666  91133   0
130            000000000010000010  SM            1            551       0.0765%     0       551      0      0      0     0       0     0      0       0
66             000000000001000010  SD            1            2         0.0003%     0       2        0      0      0     0       0     0      0       0
65             000000000001000001  PD            1            9533      1.3227%     8084    1446     0      3      0     0       0     0      0       0
42             000000000000101010  SVG           1            149       0.0207%     0       0        0      0      149   0       0     0      0       0
8193           000010000000000001  PI            1            48        0.0067%     0       2        0      0      0     0       0     0      46      0
8194           000010000000000010  SI            1            12395     1.7198%     12336   59       0      0      0     0       0     0      0       0
129            000000000010000001  PM            1            523       0.0726%     473     1        0      0      0     0       32    0      17      0
554            000000001000101010  SVGO          1            611       0.0848%     0       0        0      0      611   0       0     0      0       0
10             000000000000001010  SV            1            13        0.0018%     0       0        2      4      0     0       0     0      7       0
2              000000000000000010  S             1            34755     4.8224%     0       7402     17432  242    2510  0       0     6927   242     0
773            000000001100000101  PUO+          2            9         0.0012%     0       0        0      3      0     0       0     0      6       0
774            000000001100000110  SUO+          2            1         0.0001%     0       0        0      0      0     0       0     0      1       0
778            000000001100001010  SVO+          2            1         0.0001%     0       0        0      0      0     0       0     0      1       0
5              000000000000000101  PU            2            22818     3.1661%     0       0        0      264    0     19520   320   2519   195     0
6              000000000000000110  SU            2            737       0.1023%     0       659      60     4      6     0       0     0      8       0
16385          000100000000000001  PE            2            189       0.0262%     0       0        0      0      0     0       0     0      189     0
16386          000100000000000010  SE            2            5405      0.7500%     5405    0        0      0      0     0       0     0      0       0
146            000000000010010010  STM           3            1         0.0001%     0       1        0      0      0     0       0     0      0       0
145            000000000010010001  PTM           3            170       0.0236%     132     0        0      0      0     0       35    0      3       0
8209           000010000000010001  PTI           3            13        0.0018%     0       0        0      0      0     0       0     0      13      0
81             000000000001010001  PTD           3            1487      0.2063%     1486    0        0      1      0     0       0     0      0       0
26             000000000000011010  SVT           3            1         0.0001%     0       0        0      0      0     0       0     0      1       0
18             000000000000010010  ST            3            8651      1.2004%     0       1487     1015   0      6146  0       0     0      3       0
17             000000000000010001  PT            3            26398     3.6628%     11876   0        0      1604   0     122     46    0      2460    10290
8210           000010000000010010  STI           3            903       0.1253%     855     48       0      0      0     0       0     0      0       0
16401          000100000000010001  PTE           4            3         0.0004%     0       0        0      0      0     0       0     0      3       0
16402          000100000000010010  STE           4            315       0.0437%     314     0        1      0      0     0       0     0      0       0
789            000000001100010101  PUTO+         4            14        0.0019%     0       0        0      0      0     0       0     0      14      0
790            000000001100010110  SUTO+         4            1         0.0001%     0       0        0      0      1     0       0     0      0       0
22             000000000000010110  SUT           4            15        0.0021%     0       1        3      0      11    0       0     0      0       0
4373           000001000100010101  PUTL+         5            9         0.0012%     0       0        0      1      0     0       0     0      8       0
12546          000011000100000010  SLI+          5            6660      0.9241%     0       6660     0      0      0     0       0     0      0       0
810            000000001100101010  SVGO+         5            59        0.0082%     0       0        0      0      59    0       0     0      0       0
21             000000000000010101  PUT           5            33        0.0046%     0       0        0      4      0     5       0     0      0       24
131073         100000000000000001  PQ            5            2         0.0003%     0       0        0      0      0     0       0     0      2       0
131077         100000000000000101  PUQ           5            1         0.0001%     0       0        0      0      0     0       0     0      1       0
131089         100000000000010001  PTQ           5            38        0.0053%     0       0        0      0      0     0       0     0      38      0
1802           000000011100001010  SVOX+         5            4         0.0006%     0       0        0      0      0     0       0     0      4       0
4354           000001000100000010  SL+           5            4193      0.5818%     0       2984     1209   0      0     0       0     0      0       0
5378           000001010100000010  SXL+          5            1         0.0001%     0       0        0      0      0     0       0     0      1       0
4482           000001000110000010  SML+          5            409       0.0567%     0       409      0      0      0     0       0     0      0       0
4394           000001000100101010  SVGL+         5            50        0.0069%     0       0        0      0      50    0       0     0      0       0
5381           000001010100000101  PUXL+         5            27        0.0037%     0       0        0      0      0     19      0     0      8       0
5382           000001010100000110  SUXL+         5            3         0.0004%     0       0        0      0      3     0       0     0      0       0
5386           000001010100001010  SVXL+         5            2         0.0003%     0       0        0      1      0     0       0     0      1       0
4374           000001000100010110  SUTL+         5            52        0.0072%     0       0        52     0      0     0       0     0      0       0
4357           000001000100000101  PUL+          5            84        0.0117%     0       0        0      0      0     84      0     0      0       0
86274          010101000100000010  SLEN+         6            3         0.0004%     0       2        1      0      0     0       0     0      0       0
82034          010100000001110010  STGDEN        6            2         0.0003%     0       0        0      0      2     0       0     0      0       0
81938          010100000000010010  STEN          6            24        0.0033%     24      0        0      0      0     0       0     0      0       0
81937          010100000000010001  PTEN          6            5         0.0007%     3       0        0      0      0     0       2     0      0       0
81922          010100000000000010  SEN           6            5823      0.8080%     5452    3        366    2      0     0       0     0      0       0
81921          010100000000000001  PEN           6            2869      0.3981%     2462    0        0      49     0     98      37    0      221     2
65601          010000000001000001  PDN           6            1         0.0001%     1       0        0      0      0     0       0     0      0       0
65553          010000000000010001  PTN           6            26        0.0036%     0       0        0      0      0     0       0     0      26      0
65537          010000000000000001  PN            6            6463      0.8968%     0       0        112    2638   13    601     0     0      3099    0
196625         110000000000010001  PTNQ          6            1         0.0001%     0       0        0      0      0     0       0     0      1       0

Scores (Corresponds to Table 2 in PMID 18823568)

Character  Description of feature (when the value is 1)  Frequency
D  The source database (D) listed in the interaction record is different from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made.  11025 (1.5422%)
E  The protein reference was a retired NCBI Identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence.  14638 (2.0476%)
G  The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.  871 (0.1218%)
L  More than one assignment is possible (see +). The assignment with the largest (L) SEGUID is arbitrarily chosen (see Methods).  11493 (1.6076%)
M  The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.  1654 (0.2314%)
+  More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L.  11582 (1.6201%)
N  The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.  15217 (2.1286%)
O  More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.  700 (0.0979%)
I  The protein reference used was an NCBI GenInfo Identifier (I).  20019 (2.8003%)
U  The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.  23804 (3.3297%)
T  The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.  38162 (5.3381%)
V  The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.  890 (0.1245%)
Q  The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.  42 (0.0059%)
P  The interaction record's primary (P) reference for the protein was used to make the assignment.  633105 (88.5589%)
S  One of the interaction record's secondary (S) references for the protein was used to make the assignment.  81792 (11.4411%)
X  More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.  37 (0.0052%)
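
The decimal score, binary flag and string score in the ROG summary are three views of the same bit field. The sketch below decodes a decimal score into the other two. The bit-to-character mapping in `BIT_TO_CHAR` is not documented on this page: it is inferred by comparing the decimal and string scores in the ROG summary rows, and bits 11 and 15 never occur there, so they are left unmapped. `decode_score` and `binary_flag` are illustrative names, not part of iRefIndex.

```python
# Bit position -> score character, inferred from the ROG summary rows
# (an assumption, not an official iRefIndex definition).
BIT_TO_CHAR = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 16: "N", 17: "Q",
}


def binary_flag(decimal_score, width=18):
    """Render the score as a zero-padded bit string, as in the Binary_flag column."""
    return format(decimal_score, f"0{width}b")


def decode_score(decimal_score):
    """Return the string score for a decimal score, lowest set bit first."""
    chars = [char for bit, char in sorted(BIT_TO_CHAR.items())
             if (decimal_score >> bit) & 1]
    # The table writes "+" last, regardless of its bit position.
    if "+" in chars:
        chars.remove("+")
        chars.append("+")
    return "".join(chars)


for score in (1, 130, 1802, 86274):
    print(score, binary_flag(score), decode_score(score))
```

With this mapping, the score 86274 decodes to SLEN+ and 82034 to STGDEN, matching the corresponding rows of the ROG summary.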