Important: As of March 15th, 2013, this site is maintained at http://irefindex.org

This site is for archival purposes and may eventually be deleted.

Statistics iRefIndex 8.0

From Donaldson Group


Summary

Last updated: 2011-01-12

  • Total interaction source records: 1,057,642
  • Total distinct interactions (based on RIGID): 480,368 (45.42% of total interactions)
  • Total distinct proteins (based on ROGID): 91,936
  • Distinct proteins used in consolidated interactions: 86,757
  • Although 91,936 proteins were mapped to a ROGID, not all of them appear in iRefIndex interactions. There are two possible reasons:
  1. When not all components of an interaction could be mapped to a ROGID, that interaction is excluded (it cannot be assigned a RIGID). A protein that occurs only in excluded interactions therefore does not appear in any iRefIndex interaction.
  2. Proteins that interact only with non-proteins (DNA, small molecules, etc.) do not appear in iRefIndex.
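
The summary figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are hypothetical and are not part of iRefIndex.

```python
# Figures copied from the summary list above (iRefIndex 8.0).
source_records = 1_057_642          # total interaction source records
distinct_interactions = 480_368     # distinct interactions (by RIGID)
distinct_proteins = 91_936          # distinct proteins (by ROGID)
proteins_in_interactions = 86_757   # proteins used in consolidated interactions

# Share of source records that remain distinct after grouping by RIGID.
distinct_share = 100 * distinct_interactions / source_records

# Proteins mapped to a ROGID that take part in no iRefIndex interaction.
unused_proteins = distinct_proteins - proteins_in_interactions

print(f"{distinct_share:.2f}% of source records are distinct interactions")
print(f"{unused_proteins} proteins have a ROGID but appear in no interaction")
```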

This page lists statistics for iRefIndex 8.0, which includes all of the data from the sources used for the current build (see Sources_iRefIndex_8.0).

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

NCBI taxonomy identifier | Scientific_name | Number_of_interactions
559292 | Saccharomyces cerevisiae S288c | 162083
9606 | Homo sapiens | 130732
4932 | Saccharomyces cerevisiae | 60164
7227 | Drosophila melanogaster | 46917
40674 | Mammalia | 36385
10090 | Mus musculus | 18085
83333 | Escherichia coli K-12 | 17224
6239 | Caenorhabditis elegans | 13831
4896 | Schizosaccharomyces pombe | 13460
197 | Campylobacter jejuni | 12025
3702 | Arabidopsis thaliana | 7994
10116 | Rattus norvegicus | 6688
562 | Escherichia coli | 5294
632 | Yersinia pestis | 3818
160 | Treponema pallidum | 3647
  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

NCBI taxonomy identifier | Scientific_name | Number_of_interactions
4932 | Saccharomyces cerevisiae | 186503
9606 | Homo sapiens | 138480
7227 | Drosophila melanogaster | 46921
83333 | Escherichia coli K-12 | 17008
10090 | Mus musculus | 14615
6239 | Caenorhabditis elegans | 13831
4896 | Schizosaccharomyces pombe | 13471
197 | Campylobacter jejuni | 12025
3702 | Arabidopsis thaliana | 7996
10116 | Rattus norvegicus | 5057
155864 | Escherichia coli O157:H7 str. EDL933 | 4924
632 | Yersinia pestis | 3822
160 | Treponema pallidum | 3647
1148 | Synechocystis sp. PCC 6803 | 3229
1392 | Bacillus anthracis | 3087
  • Full list [[2]]

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND | 62862
GRID | 24223 | 245516
DIP | 26437 | 38962 | 89630
INTACT | 24797 | 34903 | 37889 | 131339
MINT | 22660 | 41142 | 36873 | 48037 | 85802
HPRD | 2653 | 11892 | 1574 | 6777 | 5537 | 40569
OPHID | 2346 | 8736 | 1442 | 7366 | 6822 | 13071 | 47522
MPACT | 7084 | 8466 | 7002 | 6171 | 6470 | 0 | 0 | 13328
MPPI | 381 | 145 | 63 | 95 | 93 | 212 | 183 | 0 | 830
CORUM | 238 | 172 | 114 | 238 | 119 | 342 | 236 | 0 | 15 | 2607
BIND_TRANSLATION | 47304 | 22231 | 25048 | 22972 | 21585 | 2 | 0 | 6883 | 113 | 14 | 49527
 | BIND | GRID | DIP | INTACT | MINT | HPRD | OPHID | MPACT | MPPI | CORUM | BIND_TRANSLATION
 | (9440) | (170793) | (27921) | (61935) | (18079) | (19394) | (28503) | (1152) | (267) | (1918) | (1812)
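
A lower-triangular matrix like this one can be queried symmetrically. The sketch below assumes that each diagonal cell holds a source's distinct interactions (the diagonal values match the Unique RIGIDs column of the RIG mapping table) and that each off-diagonal cell holds the number shared by the row and column sources. This reading is inferred from the numbers and is not stated on this page. The names `SOURCES`, `ROWS` and `shared` are hypothetical, and only the first four sources are included.

```python
# First four rows of the interactions matrix above (lower triangle only).
SOURCES = ["BIND", "GRID", "DIP", "INTACT"]
ROWS = [
    [62862],
    [24223, 245516],
    [26437, 38962, 89630],
    [24797, 34903, 37889, 131339],
]


def shared(a: str, b: str) -> int:
    """Look up the cell for sources a and b, in either order."""
    i, j = SOURCES.index(a), SOURCES.index(b)
    if i < j:
        # Only the lower triangle is stored, so swap to keep row >= column.
        i, j = j, i
    return ROWS[i][j]


print(shared("DIP", "INTACT"))   # same cell as shared("INTACT", "DIP")
print(shared("GRID", "GRID"))    # diagonal cell for GRID
```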

Interactors

BIND | 40752
GRID | 17804 | 31832
DIP | 17676 | 18297 | 29980
INTACT | 18892 | 22275 | 24274 | 51140
MINT | 16979 | 18412 | 19687 | 25467 | 31660
HPRD | 3288 | 6324 | 3868 | 6321 | 4934 | 9851
OPHID | 3377 | 5783 | 4203 | 6770 | 5379 | 6712 | 9642
MPACT | 4718 | 4610 | 4725 | 4880 | 4796 | 0 | 1 | 4978
MPPI | 679 | 454 | 469 | 626 | 562 | 369 | 422 | 0 | 864
CORUM | 2049 | 2246 | 2208 | 3137 | 2528 | 2032 | 2244 | 0 | 416 | 4365
BIND_TRANSLATION | 28781 | 14239 | 14973 | 15386 | 14073 | 758 | 794 | 4303 | 292 | 659 | 30021
 | BIND | GRID | DIP | INTACT | MINT | HPRD | OPHID | MPACT | MPPI | CORUM | BIND_TRANSLATION
 | (6570) | (5419) | (2336) | (15150) | (3987) | (1185) | (1080) | (15) | (39) | (555) | (1003)


Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source | Total records | Protein-only interactors | PPI Assigned to RIGID | Unique RIGIDs
bind | 193648 | 93957 | 91228 (97.0955%) | 62862 (68.9065%)
grid | 362355 | 357976 | 357524 (99.8737%) | 245516 (68.6712%)
dip | 90994 | 90994 | 89911 (98.8098%) | 89630 (99.6875%)
intact | 156558 | 154962 | 154305 (99.5760%) | 131339 (85.1165%)
mint | 122775 | 122775 | 122298 (99.6115%) | 85802 (70.1581%)
HPRD | 83022 | 83022 | 83022 (100.0000%) | 40569 (48.8654%)
ophid | 73257 | 73257 | 73160 (99.8676%) | 47522 (64.9563%)
MPACT | 16504 | 16504 | 16293 (98.7215%) | 13328 (81.8020%)
MPPI | 1814 | 1814 | 1699 (93.6604%) | 830 (48.8523%)
CORUM | 2844 | 2844 | 2844 (100.0000%) | 2607 (91.6667%)
BIND_Translation | 149918 | 66583 | 65358 (98.1602%) | 49527 (75.7780%)
ALL | 1253689 | 1064688 | 1057642 (99.3382%) | 480368 (45.4188%)
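
The percentages in this table appear to be ratios of adjacent columns: records assigned to a RIGID over protein-only records, and unique RIGIDs over assigned records. That relationship is inferred from the numbers rather than stated on the page. The sketch below recomputes both ratios for three rows; the helper `pct` and the `rows` dictionary are hypothetical names.

```python
def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# source: (protein-only interactor records, PPI assigned to RIGID, unique RIGIDs)
rows = {
    "bind": (93957, 91228, 62862),
    "dip": (90994, 89911, 89630),
    "ALL": (1064688, 1057642, 480368),
}

for source, (protein_only, assigned, unique) in rows.items():
    assigned_pct = pct(assigned, protein_only)   # compare with "PPI Assigned to RIGID"
    unique_pct = pct(unique, assigned)           # compare with "Unique RIGIDs"
    print(f"{source}: {assigned_pct:.4f}% assigned, {unique_pct:.4f}% unique")
```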


Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

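Source | Protein_Interactors | Assigned | % | Arbitrary | N_and_Y | Unassigned | Unique proteins
bind | 285482 | 272804 | 95.5591 | 0 | 8705 | 3924 | 40752
BIND_Translation | 201856 | 186453 | 92.3693 | 61 | 13807 | 1497 | 30021
CORUM | 12916 | 12916 | 100.0000 | 0 | 0 | 0 | 4365
dip | 30978 | 29430 | 95.0029 | 641 | 425 | 478 | 29980
grid | 39352 | 32304 | 82.0899 | 6833 | 3 | 212 | 31832
HPRD | 123812 | 120541 | 97.3581 | 3115 | 156 | 0 | 9851
intact | 129043 | 124852 | 96.7522 | 37 | 3780 | 328 | 51140
mint | 87509 | 83727 | 95.6782 | 2 | 3639 | 138 | 31660
MPACT | 40349 | 40118 | 99.4275 | 0 | 1 | 230 | 4978
MPPI | 3628 | 3457 | 95.2867 | 0 | 41 | 130 | 864
ophid | 146423 | 145174 | 99.1470 | 241 | 1002 | 6 | 9642
All | 1101348 | 1051916 | 95.5117 | 10930 | 31559 | 6943 | 91936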
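
As a consistency check on the "All" row, the four outcome columns can be added up and compared with the interactor total. The sketch assumes the row reads as 1101348 interactors, 1051916 assigned, 10930 arbitrary, 31559 N_and_Y and 6943 unassigned; the variable names are hypothetical.

```python
# "All" row of the ROG assignment table (column split is an assumption).
total_interactors = 1_101_348
assigned = 1_051_916
arbitrary = 10_930
n_and_y = 31_559
unassigned = 6_943

# Every interactor should fall into exactly one of the four outcome columns.
accounted_for = assigned + arbitrary + n_and_y + unassigned

# The "%" column appears to be assigned / total.
assigned_pct = 100 * assigned / total_interactors

print("all interactors accounted for:", accounted_for == total_interactors)
print(f"assigned: {assigned_pct:.4f}%")
```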


ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_scoreBinary_flagString_scoreScore_classProteinsPercentagebindgriddipintactmintHPRDophidMPACTBIND_TranslationMPPICORUM
1000000000000000001P170861464.3406%2319261982201210768158601256823066681917302312916
2000000000000000010S1578585.2534%0528292361268214150693558200
554000000001000101010SVGO1336373.0542%002003363500000
131201100000000010000001PMQ1210141.9080%000000002101400
8194000010000000000010SI1123361.1201%123360000000000
65000000000001000001PD180750.7332%80730002000000
42000000000000101010SVG124220.2199%0010400231800000
41000000000000101001PVG118950.1721%01895000000000
129000000000010000001PM15480.0498%473004300000320
139265100010000000000001PIQ12340.0212%0000000023400
10000000000000001010SV11720.0156%00514423000000
8193000010000000000001PI1590.0054%000518000000
9000000000000001001PV160.0005%00015000000
66000000000001000010SD140.0004%04000000000
130000000000010000010SM110.0001%00000100000
5000000000000000101PU2232662.1125%00037211601935625175853200
16386000100000000000010SE254050.4908%54050000000000
147458100100000000000010SEQ218730.1701%00000000187300
16385000100000000000001PE21560.0142%0001479000000
6000000000000000110SU21290.0117%00100174800000
147457100100000000000001PEQ2440.0040%000000004400
773000000001100000101PUO+2180.0016%00090090000
770000000001100000010SO+260.0005%00060000000
16514000100000010000010SME230.0003%30000000000
1797000000011100000101PUOX+220.0002%00020000000
774000000001100000110SUO+210.0001%00010000000
778000000001100001010SVO+210.0001%00010000000
17000000000000010001PT3896408.1391%11776105580254317060122062888470
18000000000000010010ST3566325.1421%020925105568600000
131217100000000010010001PTMQ338560.3501%00000000385600
81000000000001010001PTD314960.1358%14960000000000
8210000010000000010010STI38550.0776%8550000000000
145000000000010010001PTM31890.0172%132002200000350
139281100010000000010001PTIQ3410.0037%000000004100
163985101000000010010001PTMYQ3270.0025%000000002700
8209000010000000010001PTI3130.0012%000130000000
16530000100000010010010STME3130.0012%130000000000
146000000000010010010STM330.0003%00000300000
26000000000000011010SVT310.0001%00010000000
147474100100000000010010STEQ45070.0460%0001000050600
16402000100000000010010STE43170.0288%3160100000000
789000000001100010101PUTO+4140.0013%000140000000
22000000000000010110SUT490.0008%00100800000
16401000100000000010001PTE420.0002%00020000000
131073100000000000000001PQ5121091.0995%000600001210300
810000000001100101010SVGO+574670.6780%00000746700000
4393000001000100101001PVGL+568260.6198%06826000000000
4394000001000100101010SVGL+532190.2923%0010400311500000
131089100000000000010001PTQ58160.0741%00012000080400
4354000001000100000010SL+55350.0486%0752620000000
4357000001000100000101PUL+52260.0205%0000002220400
4373000001000100010101PUTL+5650.0059%000800005700
5381000001010100000101PUXL+5290.0026%000910190000
5386000001010100001010SVXL+5180.0016%000171000000
21000000000000010101PUT5110.0010%00020050400
4374000001000100010110SUTL+5100.0009%001000000000
1802000000011100001010SVOX+540.0004%00040000000
4358000001000100000110SUL+510.0001%00100000000
5378000001010100000010SXL+510.0001%00010000000
32769001000000000000001PY6147281.3373%3325102548282407320529350
65601010000000001000001PDN681950.7441%5200224702530763650
81922010100000000000010SEN644220.4015%44220000000000
65537010000000000000001PN619730.1791%96018992352915317055110
32833001000000001000001PDY67550.0686%7550000000000
32770001000000000000010SY63860.0350%0223294250013200
212993110100000000000001PENQ62580.0234%0000000025800
163969101000000010000001PMYQ62110.0192%0000000021100
73729010010000000000001PIN62020.0183%0002020000000
163841101000000000000001PYQ61170.0106%0000000011700
32785001000000000010001PTY6810.0074%3500000004600
196737110000000010000001PMNQ6400.0036%000000004000
81921010100000000000001PEN6280.0025%0001110000160
196625110000000000010001PTNQ6230.0021%000100002200
65617010000000001010001PTDN6210.0019%000000002100
81938010100000000010010STEN6200.0018%190100000000
196609110000000000000001PNQ6170.0015%000100001600
163857101000000000010001PTYQ6150.0014%000000001500
213009110100000000010001PTENQ6140.0013%000000001400
65553010000000000010001PTN6120.0011%00180300000
81937010100000000010001PTEN650.0005%00003000020
196753110000000010010001PTMNQ620.0002%00000000200
32897001000000010000001PMY620.0002%00000000020
196673110000000001000001PDNQ620.0002%00000000200
32786001000000000010010STY620.0002%00200000000
147473100100000000010001PTEQ620.0002%00000000200
81986010100000001000010SDEN610.0001%10000000000
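
Each row's decimal score, 18-bit binary flag and string score encode the same set of features. The sketch below converts a decimal score into the other two forms. The bit-to-character mapping is an assumption inferred from the rows of this table (for example, 554 = 512 + 32 + 8 + 2 reads as SVGO); it is not taken from iRefIndex documentation, and `FEATURE_BITS`, `AMBIGUOUS_BIT` and `decode_score` are hypothetical names.

```python
# Bit value -> feature character, inferred from the table rows (assumption).
FEATURE_BITS = [
    (1, "P"), (2, "S"), (4, "U"), (8, "V"), (16, "T"), (32, "G"),
    (64, "D"), (128, "M"), (512, "O"), (1024, "X"), (4096, "L"),
    (8192, "I"), (16384, "E"), (32768, "Y"), (65536, "N"), (131072, "Q"),
]
AMBIGUOUS_BIT = 256  # "+", which the table prints at the end of the string score


def decode_score(decimal_score: int) -> tuple[str, str]:
    """Return (binary_flag, string_score) for a decimal ROG score."""
    binary_flag = format(decimal_score, "018b")  # zero-padded to 18 bits
    letters = "".join(ch for bit, ch in FEATURE_BITS if decimal_score & bit)
    if decimal_score & AMBIGUOUS_BIT:
        letters += "+"
    return binary_flag, letters


for score in (1, 554, 773, 163985):
    print(score, *decode_score(score))
```

Applied to the decimal scores 554, 773 and 163985, this reproduces the binary flags and string scores shown in the corresponding rows.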

Scores (Corresponds to Table 2 in PMID 18823568)

Character | Description of feature (when the value is 1) | Frequency
D | The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made. | 18549 (1.6951%)
E | The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt. | 13070 (1.1944%)
G | The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. | 55466 (5.0688%)
L | More than one assignment is possible (see + below), e.g. isoforms for a GeneID. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.) | 10930 (0.9988%)
M | The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. | 25909 (2.3677%)
+ | More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L. | 18443 (1.6854%)
N | The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. | 15235 (1.3923%)
O | More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. | 41150 (3.7605%)
I | The protein reference used was an NCBI GenInfo Identifier (I). | 13740 (1.2556%)
U | The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. | 23781 (2.1732%)
T | The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. | 154714 (14.1386%)
V | The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. | 55668 (5.0873%)
Q | The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. | 41222 (3.7671%)
P | The interaction record's primary (P) reference for the protein was used to make the assignment. | 905994 (82.7948%)
S | One of the interaction record's secondary (S) references for the protein was used to make the assignment. | 188271 (17.2052%)
Y | The accession referred to a record that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). | 16324 (1.4918%)
X | More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. | 54 (0.0049%)

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).