Important: As of March 15th, 2013, this site is maintained at http://irefindex.org

This site is for archival purposes and may eventually be deleted.

Statistics iRefIndex 7.0

From Donaldson Group


Summary

Last updated: 2010-05-21

  • Total interaction source records: 926,113
  • Total distinct interactions (based on RIGID): 433,617 (46.8% of total interaction source records)
  • Total distinct proteins (based on ROGID): 83,234

This page lists statistics for our internal version of iRefIndex, which includes all of the data from the sources used for the current build (Sources_iRefIndex_7.0). This full build of iRefIndex contains data that cannot be redistributed under the usage policies of the source databases (namely DIP, HPRD, CORUM and MPact). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/archive/release_7.0/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_7.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

NCBI taxonomy identifier  Scientific_name            Number_of_interactions
9606                      Homo sapiens               106497
7227                      Drosophila melanogaster    47149
40674                     Mammalia                   35023
10090                     Mus musculus               12974
6239                      Caenorhabditis elegans     12973
4896                      Schizosaccharomyces pombe  12250
562                       Escherichia coli           12117
197                       Campylobacter jejuni       12025
3702                      Arabidopsis thaliana       6291
10116                     Rattus norvegicus          4819
83333                     Escherichia coli K-12      3687
160                       Treponema pallidum         3647
1142                      Synechocystis              3065
36329                     Plasmodium falciparum 3D7  2758
  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

NCBI taxonomy identifier  Scientific_name                  Number_of_interactions
9606                      Homo sapiens                     114644
7227                      Drosophila melanogaster          47152
6239                      Caenorhabditis elegans           12973
4896                      Schizosaccharomyces pombe        12270
197                       Campylobacter jejuni             12025
83333                     Escherichia coli K-12            11179
10090                     Mus musculus                     11173
3702                      Arabidopsis thaliana             6297
155864                    Escherichia coli O157:H7 EDL933  4928
10116                     Rattus norvegicus                3738
160                       Treponema pallidum               3647
1148                      Synechocystis sp. PCC 6803       3177
36329                     Plasmodium falciparum 3D7        2758
83334                     Escherichia coli O157:H7         2560
  • Full list [[2]]

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND              62903
BIOGRID           23140     226496
DIP               26433     32865     61680
HPRD              3029      11612     843       39966
INTACT            24214     32627     25059     8652      117840
MINT              22100     37903     30107     6758      46689     79710
MPACT             6953      8271      6847      0         6147      6430      13321
MPPI              376       141       41        304       92        76        0         830
OPHID             2295      8356      882       18093     7330      6482      0         183       47530
CORUM             232       167       64        549       220       107       0         15        236       2607
BIND_TRANSLATION  42739     22497     18370     3980      19401     17755     2145      369       2928      193       0         48765
                  BIND      BIOGRID   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM     I2D       BIND_TRANSLATION
                  (10382)   (157990)  (16953)   (13960)   (58119)   (16958)   (1278)    (240)     (26312)   (1836)    (0)       (2633)
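
As a reading aid, the following Python sketch derives two overlap measures from three entries of the matrix above. It assumes, following the layout of Table 6 in PMID 18823568, that each diagonal cell is the number of distinct interactions in one source and each off-diagonal cell is the number of distinct interactions shared by the row and column sources. The function and variable names are illustrative only and are not part of iRefIndex.

```python
# Counts copied from the Interactions matrix above (BIND, BIOGRID, DIP only).
# Assumption: diagonal = distinct interactions (RIGIDs) in one source,
# off-diagonal = distinct interactions found in both sources.
DISTINCT = {"BIND": 62903, "BIOGRID": 226496, "DIP": 61680}
SHARED = {
    frozenset(("BIND", "BIOGRID")): 23140,
    frozenset(("BIND", "DIP")): 26433,
    frozenset(("BIOGRID", "DIP")): 32865,
}


def fraction_shared(source, other):
    """Fraction of `source`'s distinct interactions also present in `other`."""
    return SHARED[frozenset((source, other))] / DISTINCT[source]


def jaccard(a, b):
    """Shared interactions divided by the size of the union of both sources."""
    shared = SHARED[frozenset((a, b))]
    return shared / (DISTINCT[a] + DISTINCT[b] - shared)


if __name__ == "__main__":
    print(f"BIND in BIOGRID: {fraction_shared('BIND', 'BIOGRID'):.3f}")
    print(f"BIOGRID in BIND: {fraction_shared('BIOGRID', 'BIND'):.3f}")
    print(f"Jaccard(BIND, DIP): {jaccard('BIND', 'DIP'):.3f}")
```

The two directions of `fraction_shared` differ because the sources differ greatly in size, which is why a symmetric measure such as the Jaccard index can be easier to compare across pairs.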

Interactors

BIND              40790
BIOGRID           17313     29090
DIP               15526     14440     20171
HPRD              3407      6005      1236      9750
INTACT            18310     20511     15628     6189      44616
MINT              16422     17286     15000     4935      23979     29217
MPACT             4655      4547      4646      0         4877      4761      4972
MPPI              675       435       289       429       592       520       0         865
OPHID             3307      5444      1226      7453      6036      4918      1         421       9645
CORUM             2028      2064      842       2293      2908      2225      0         415       2245      4365
BIND_TRANSLATION  24977     15261     12008     3576      15673     13975     2370      680       3448      1958      0         26792
                  BIND      BIOGRID   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM     I2D       BIND_TRANSLATION
                  (10653)   (4470)    (1727)    (810)     (13535)   (3380)    (15)      (47)      (1212)    (682)     (0)       (617)

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source            Total records  Protein-only interactors  PPI Assigned to RIGID  Unique RIGIDs
bind              193648         93957                     91276 (97.1466%)       62903 (68.9152%)
grid              333977         329642                    329185 (99.8614%)      226496 (68.8051%)
dip               62903          62903                     61843 (98.3149%)       61680 (99.7364%)
intact            140723         139787                    139310 (99.6588%)      117840 (84.5883%)
mint              113258         113258                    112159 (99.0296%)      79710 (71.0688%)
HPRD              40075          40075                     40075 (100.0000%)      39966 (99.7280%)
ophid             73257          73257                     73160 (99.8676%)       47530 (64.9672%)
MPACT             16504          16504                     16286 (98.6791%)       13321 (81.7942%)
MPPI              1814           1814                      1701 (93.7707%)        830 (48.7948%)
CORUM             2844           2844                      2844 (100.0000%)       2607 (91.6667%)
BIND_Translation  140601         61585                     58274 (94.6237%)       48765 (83.6823%)
ALL               1119604        935626                    926113 (98.9832%)      433617 (46.8212%)
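
The percentages in this table appear to be ratios of adjacent columns: records assigned to a RIGID over protein-only records, and unique RIGIDs over assigned records. The short Python sketch below checks that reading against the ALL row; the helper name `pct` is illustrative and this is not how the table was originally generated.

```python
# Values copied from the ALL row of the table above.
protein_only_records = 935626
assigned_to_rigid = 926113
unique_rigids = 433617


def pct(part, whole):
    """Return part / whole as a percentage rounded to four decimals."""
    return round(100.0 * part / whole, 4)


# Assumed reading: each percentage is relative to the column to its left.
assigned_pct = pct(assigned_to_rigid, protein_only_records)
unique_pct = pct(unique_rigids, assigned_to_rigid)

if __name__ == "__main__":
    print(f"assigned to RIGID: {assigned_pct}%")
    print(f"unique RIGIDs:     {unique_pct}%")
```

The second ratio is the same 46.8% quoted in the Summary for distinct interactions.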


Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source            Protein_Interactors  Assigned  %         Arbitrary  N_and_Y  Unassigned  Unique proteins
bind              285482               272970    95.6172   0          8602     3905        40790
BIND_Translation  187563               173313    92.4026   9625       496      4129        26792
CORUM             12916                12916     100.0000  0          0        0           4365
dip               20785                18536     89.1797   1255       527      467         20171
grid              36333                30255     83.2714   5869       5        204         29090
HPRD              9773                 9663      98.8745   64         46       0           9750
intact            108593               105448    97.1039   19         2862     264         44616
mint              79057                75743     95.8081   2          3086     226         29217
MPACT             40349                40112     99.4126   0          0        237         4972
MPPI              3628                 3456      95.2591   0          45       125         865
ophid             146423               145330    99.2535   103        984      6           9645
All               930902               887749    95.3644   16937      16653    9563        83234


ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score  Binary_flag         String_score  Score_class  Proteins  Percentage  BIND    BioGrid  DIP    MINT   HPRD  OPHID   MPPI  MPACT  IntAct  CORUM  BIND_Translation
1              000000000000000001  P             1            607631    65.2734%    232081  28623    0      73314  0     125683  3023  30666  101306  12916  19
8193           000010000000000001  PI            1            152094    16.3383%    0       0        0      1      0     0       0     0      48      0      152045
2              000000000000000010  S             1            43100     4.6299%     0       44       17426  244    2772  0       0     6927   255     0      15432
8194           000010000000000010  SI            1            12336     1.3252%     12336   0        0      0      0     0       0     0      0       0      0
65             000000000001000001  PD            1            8076      0.8675%     8073    0        0      3      0     0       0     0      0       0      0
41             000000000000101001  PVG           1            1583      0.1701%     0       1583     0      0      0     0       0     0      0       0      0
10             000000000000001010  SV            1            1293      0.1389%     0       0        5      14     0     0       0     0      237     0      1037
129            000000000010000001  PM            1            528       0.0567%     473     0        0      0      0     0       32    0      23      0      0
554            000000001000101010  SVGO          1            477       0.0512%     0       0        0      0      477   0       0     0      0       0      0
42             000000000000101010  SVG           1            171       0.0184%     0       0        12     0      159   0       0     0      0       0      0
66             000000000001000010  SD            1            99        0.0106%     0       4        9      0      0     0       0     0      0       0      86
5              000000000000000101  PU            2            22826     2.4520%     0       0        0      289    0     19519   320   2519   179     0      0
16386          000100000000000010  SE            2            5429      0.5832%     5423    0        6      0      0     0       0     0      0       0      0
16385          000100000000000001  PE            2            1532      0.1646%     0       0        0      202    0     0       0     0      720     0      610
6              000000000000000110  SU            2            815       0.0875%     0       1        58     4      5     0       0     0      10      0      737
16449          000100000001000001  PDE           2            120       0.0129%     0       0        0      34     0     0       0     0      86      0      0
773            000000001100000101  PUO+          2            13        0.0014%     0       0        0      3      0     1       0     0      9       0      0
774            000000001100000110  SUO+          2            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
778            000000001100001010  SVO+          2            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
17             000000000000010001  PT            3            16060     1.7252%     11785   0        0      1632   0     122     46    0      2475    0      0
18             000000000000010010  ST            3            8104      0.8706%     0       0        1016   0      6042  0       0     0      3       0      1043
8209           000010000000010001  PTI           3            2052      0.2204%     0       0        0      0      0     0       0     0      13      0      2039
81             000000000001010001  PTD           3            1497      0.1608%     1496    0        0      1      0     0       0     0      0       0      0
8210           000010000000010010  STI           3            855       0.0918%     855     0        0      0      0     0       0     0      0       0      0
145            000000000010010001  PTM           3            184       0.0198%     132     0        0      0      0     0       35    0      17      0      0
26             000000000000011010  SVT           3            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
16402          000100000000010010  STE           4            317       0.0341%     316     0        1      0      0     0       0     0      0       0      0
16401          000100000000010001  PTE           4            269       0.0289%     0       0        0      0      0     0       0     0      4       0      265
22             000000000000010110  SUT           4            17        0.0018%     0       0        3      0      14    0       0     0      0       0      0
789            000000001100010101  PUTO+         4            14        0.0015%     0       0        0      0      0     0       0     0      14      0      0
790            000000001100010110  SUTO+         4            1         0.0001%     0       0        0      0      1     0       0     0      0       0      0
4393           000001000100101001  PVGL+         5            5854      0.6289%     0       5854     0      0      0     0       0     0      0       0      0
4362           000001000100001010  SVL+          5            5498      0.5906%     0       0        0      0      0     0       0     0      0       0      5498
4354           000001000100000010  SL+           5            5336      0.5732%     0       14       1195   0      0     0       0     0      0       0      4127
810            000000001100101010  SVGO+         5            193       0.0207%     0       0        0      0      193   0       0     0      0       0      0
4357           000001000100000101  PUL+          5            84        0.0090%     0       0        0      0      0     84      0     0      0       0      0
4394           000001000100101010  SVGL+         5            70        0.0075%     0       1        8      0      61    0       0     0      0       0      0
4374           000001000100010110  SUTL+         5            52        0.0056%     0       0        52     0      0     0       0     0      0       0      0
131089         100000000000010001  PTQ           5            39        0.0042%     0       0        0      0      0     0       0     0      39      0      0
5381           000001010100000101  PUXL+         5            29        0.0031%     0       0        0      0      0     19      0     0      10      0      0
4373           000001000100010101  PUTL+         5            9         0.0010%     0       0        0      1      0     0       0     0      8       0      0
21             000000000000010101  PUT           5            7         0.0008%     0       0        0      2      0     5       0     0      0       0      0
1802           000000011100001010  SVOX+         5            4         0.0004%     0       0        0      0      0     0       0     0      4       0      0
5382           000001010100000110  SUXL+         5            3         0.0003%     0       0        0      0      3     0       0     0      0       0      0
5386           000001010100001010  SVXL+         5            2         0.0002%     0       0        0      1      0     0       0     0      1       0      0
131073         100000000000000001  PQ            5            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
131077         100000000000000101  PUQ           5            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
32769          001000000000000001  PY            6            8873      0.9532%     3051    4        0      2785   0     731     5     0      2297    0      0
81922          010100000000000010  SEN           6            4524      0.4860%     4515    0        9      0      0     0       0     0      0       0      0
32833          001000000001000001  PDY           6            755       0.0811%     755     0        0      0      0     0       0     0      0       0      0
65537          010000000000000001  PN            6            608       0.0653%     178     1        112    26     46    0       11    0      234     0      0
32770          001000000000000010  SY            6            567       0.0609%     0       0        405    25     0     0       0     0      95      0      42
65601          010000000001000001  PDN           6            559       0.0600%     52      0        0      247    0     253     5     0      2       0      0
81921          010100000000000001  PEN           6            418       0.0449%     0       0        0      3      0     0       22    0      1       0      392
73729          010010000000000001  PIN           6            223       0.0240%     0       0        0      0      0     0       0     0      223     0      0
81937          010100000000010001  PTEN          6            64        0.0069%     0       0        0      0      0     0       2     0      0       0      62
32785          001000000000010001  PTY           6            32        0.0034%     32      0        0      0      0     0       0     0      0       0      0
81938          010100000000010010  STEN          6            19        0.0020%     19      0        0      0      0     0       0     0      0       0      0
65553          010000000000010001  PTN           6            8         0.0009%     0       0        0      0      0     0       0     0      8       0      0
147473         100100000000010001  PTEQ          6            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
40961          001010000000000001  PIY           6            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
32786          001000000000010010  STY           6            1         0.0001%     0       0        1      0      0     0       0     0      0       0      0
163841         101000000000000001  PYQ           6            1         0.0001%     0       0        0      0      0     0       0     0      1       0      0
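
The Decimal_score, Binary_flag and String_score columns encode the same set of features in three ways. The Python sketch below shows one way to convert between them. The bit assigned to each character is inferred from the rows of the table above and is an assumption rather than an official iRefIndex specification; `decode_score` and `binary_flag` are hypothetical helper names.

```python
# Bit position -> feature character, inferred from the Decimal_score and
# String_score columns above (an assumption, not an official specification).
# Bit 11 (2048) never occurs in the table, so it is left unmapped here.
FLAG_BITS = [
    (0, "P"), (1, "S"), (2, "U"), (3, "V"), (4, "T"), (5, "G"),
    (6, "D"), (7, "M"), (9, "O"), (10, "X"), (12, "L"), (13, "I"),
    (14, "E"), (15, "Y"), (16, "N"), (17, "Q"),
]
PLUS_BIT = 8  # the "+" feature; the table always prints it last


def decode_score(decimal_score):
    """Return the String_score for a Decimal_score, e.g. 5381 -> 'PUXL+'."""
    letters = [ch for bit, ch in FLAG_BITS if (decimal_score >> bit) & 1]
    if (decimal_score >> PLUS_BIT) & 1:
        letters.append("+")
    return "".join(letters)


def binary_flag(decimal_score):
    """Return the 18-character Binary_flag for a Decimal_score."""
    return format(decimal_score, "018b")


if __name__ == "__main__":
    for score in (1, 554, 5381, 147473):
        print(score, binary_flag(score), decode_score(score))
```

Characters are emitted in ascending bit order with "+" moved to the end, which matches the String_score values listed in the table.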


Scores (Corresponds to Table 2 in PMID 18823568)

Character    Description of feature (when the value is 1)    Frequency
D    The source database (D) listed in the interaction record is different from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made.    11106 (1.2054%)
E    The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.    12693 (1.3777%)
G    The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.    8348 (0.9061%)
L    More than one assignment is possible (see the + entry), e.g. isoforms for a gene identifier. In such a situation, references are picked using a ranking system (first look for RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this definition differs from the originally published one.)    16937 (1.8383%)
M    The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.    712 (0.0773%)
+    More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see those entries).    17164 (1.863%)
N    The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.    6423 (0.6971%)
O    More than one assignment is possible (see the + entry). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.    704 (0.0764%)
I    The protein reference used was an NCBI GenInfo Identifier (I).    167561 (18.1868%)
U    The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.    23872 (2.591%)
T    The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.    29603 (3.2131%)
V    The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.    15147 (1.644%)
Q    The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.    43 (0.0047%)
P    The interaction record's primary (P) reference for the protein was used to make the assignment.    832046 (90.309%)
S    One of the interaction record's secondary (S) references for the protein was used to make the assignment.    89286 (9.691%)
Y    The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).    10230 (1.1103%)
X    More than one assignment is possible (see the + entry). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.    38 (0.0041%)


