Help
InnateDB is designed for use with the Mozilla Firefox web browser.
Please see below for help and tutorials for InnateDB. If you have any questions about the database, please feel free to contact us.
A PowerPoint tutorial on using InnateDB is available here.
A guide to using Cerebral, for pathway and interaction network visualization in InnateDB, is available here.
A guide to how molecular interactions are manually curated in InnateDB is available here.
Notes on using Cerebral
Please note: Java Runtime Environment (JRE) version 1.5.0 or greater must be installed for Cerebral to work.
When using Firefox in a Linux environment, please ensure that the Cerebral .jnlp file is opened using javaws rather than Firefox itself.
Note that on Mac OS X machines the latest Java update (Java update 4 - 1.6.0_13) seems to have caused an issue with launching .jnlp files. Launching these files is necessary to launch the webstart version of Cytoscape/Cerebral and CyOog. To fix this, simply open the directory /System/Library/CoreServices. The issue should then be resolved.
Notes on linking to InnateDB
One can link to gene cards using an InnateDB or Ensembl Gene ID specified in the following URL format:
http://www.innatedb.ca/getGeneCard.do?id=ENSG00000136560
http://www.innatedb.ca/getGeneCard.do?id=IDBG-73552
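If links need to be generated for many genes, the URL format above can be assembled in a script. The following is a minimal sketch in Python; the function name gene_card_url is a made-up helper for illustration and is not part of InnateDB.

```python
from urllib.parse import urlencode

# Base address taken from the example links above.
GENE_CARD_BASE = "http://www.innatedb.ca/getGeneCard.do"


def gene_card_url(gene_id: str) -> str:
    """Build a gene card link from an InnateDB (IDBG-...) or Ensembl (ENSG...) gene ID.

    Hypothetical helper: it only assembles the URL and does not check
    that the ID exists in InnateDB.
    """
    return f"{GENE_CARD_BASE}?{urlencode({'id': gene_id.strip()})}"


if __name__ == "__main__":
    for gene_id in ("ENSG00000136560", "IDBG-73552"):
        print(gene_card_url(gene_id))
```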
Advanced Search for Genes and Proteins
This page provides the user with a number of options to search for
genes or proteins of interest.
Choose organism - InnateDB contains human and mouse genes and proteins.
The default on this page is to search for proteins/genes by the protein/gene
name. A search of InnateDB by the Protein/Gene name will automatically search
the full gene name, all encoded protein names, the gene symbol and all synonym
symbols (alternative names) for the gene. This option provides
users the flexibility to find a gene/protein without having to know
the HUGO name for it, which may be different from the common
name(s). An exact search will be much faster
and will return database entries that exactly match your query.
Several other options for searching the database are provided on this page:
- InnateDB molecule ID: stable identifiers for a gene/protein in InnateDB (e.g. IDBG-82738 is the ID for human TLR4).
- HUGO gene symbol/ID: Human Genome Organization Gene Nomenclature Committee official symbol or numerical ID. (For human genes only).
- Gene Ontology (GO) Accession: search for genes/proteins that are annotated by the GO consortium as being involved in a particular function, role or localization using the GO ID number (e.g. GO:0006954 is the ID for inflammatory response).
- Gene Ontology (GO) Term: as above but allows free-text search (e.g. inflammatory). Using exact searches is not recommended here.
- Ensembl ID: search using a human or mouse gene, protein or transcript accession number from the Ensembl database.
- UniProt Accession: search using an accession number from the UniProt protein database.
- Entrez Gene ID: search using an ID number from NCBI's Entrez Gene database.
- RefSeq Accession: search using an accession number from NCBI's RefSeq database.
- UniGene ID: search using an ID number from NCBI's UniGene database.
- OMIM ID: search using an ID number from the Online Mendelian Inheritance in Man database.
- EMBL Accession: search for a particular gene/protein based on the EMBL accession number.
- PFAM Accession: search for encoded proteins containing matching PFAM domains.
- InterPro Accession: search for encoded proteins containing matching InterPro domains.
Choose to return Genes (default) or proteins using the "Return a list of" menu.
Advanced Search for Interactions
InnateDB contains detailed information for more than 150,000 human and
mouse molecular interactions integrated from several of the major
public interaction databases along with several thousand manually-curated innate
immunity relevant interactions. See the statistics page for further details.
The Advanced Search for Interactions page allows users to search
InnateDB for molecular interactions of interest.
- Interaction Participant: interactions involving particular genes/proteins of interest may be searched in the same way as the gene/protein searches described above.
- Interaction Xref: search for interactions using an ID number for InnateDB (e.g. IDB-104135), by PubMed ID or an ID from one of the external interaction databases.
- Interaction Level: by default interactions involving molecules that directly interact with the genes/proteins of interest are returned. By choosing "Show direct and 2nd order interactions" both direct and secondary interactions are returned. Secondary interactors are molecules which interact with the direct interactors of the genes/proteins of interest.
- Host system: interaction searches can be limited to interactions detected in vitro, in vivo or ex vivo. The default is to return all.
- Cell Type: interaction searches can be limited to interactions annotated to occur in a particular cell type. Choose a cell type by browsing through the hierarchy of cell type terms or search for a cell type of interest by typing the name in the box provided and hitting enter (e.g. try typing 'neut' and hit enter). Open Biomedical Ontologies (OBO) cell type ontologies are used. Note: interactions with at least one piece of evidence matching the chosen cell type term will be returned. Checking the "extended search" checkbox will also include all child terms of the selected term in the search.
- Tissue Type: interaction searches can be limited to interactions annotated to occur in a particular tissue type. Choose a tissue type in the same way as described for cell type. OBO BRENDA tissue type terms are used. Note: interactions matching the chosen tissue type term AND all of that term's child terms will be returned.
- Interaction Type: interaction searches can be limited to interactions involved in a particular molecular function (e.g. phosphorylation, acetylation, etc).
- Molecule Type: search for interactions where at least one participant in the interaction is of the selected molecule type (protein, DNA or RNA).
- Interaction Detection Method: select a particular interaction detection method (e.g. coimmunoprecipitation) from the menu to return interactions detected using this method. Selection is done in the same way as in the cell and tissue type boxes. OBO PSI Molecular Interaction terms are used.
To reduce redundancy, interactions in InnateDB that have the same participants and interaction type are grouped together by default. Choose 'No' to return all redundant interactions separately.
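The difference between the two Interaction Level settings described above can be illustrated on a toy network. The sketch below is only an illustration of the definition given in the text, not InnateDB's own implementation; the function name and the example molecule names (A to F) are invented.

```python
def interactions_for(query, interactions, second_order=False):
    """Return interactions touching the query molecules.

    interactions: list of (molecule_a, molecule_b) pairs (toy data).
    second_order=False -> only interactions involving a query molecule.
    second_order=True  -> also interactions involving the direct
                          interactors of the query molecules.
    """
    query = set(query)
    direct = [pair for pair in interactions if query & set(pair)]
    if not second_order:
        return direct
    # Direct interactors are the partners of the query molecules.
    direct_interactors = {m for pair in direct for m in pair} - query
    seeds = query | direct_interactors
    return [pair for pair in interactions if seeds & set(pair)]


if __name__ == "__main__":
    network = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
    print(interactions_for({"A"}, network))
    print(interactions_for({"A"}, network, second_order=True))
```

In this toy network, a search for A returns only A-B by default, while the second-order setting also returns B-C, because B is a direct interactor of A.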
Search Interactions or Genes by Pathway
The pathway advanced search page allows users to search for genes or
interactions found within a particular pathway. More than 3,500 pathways
have been loaded in InnateDB from major public pathway databases.
Pathways in InnateDB and the molecules in them are species-specific.
Please ensure you have selected the correct species.
You can select an example pathway from the top
drop-down menu or search any of the 3,500+ pathways from all data
sources by typing the pathway name in the second box.
You can also search by the external database pathway ID (e.g. REACTOME: 166016).
By selecting "Return a list of Genes", a list of all genes annotated in the pathway will be returned.
By default, the list of molecular interactions returned is restricted to interactions only
between annotated members of the pathway. If "No" is selected, a more
comprehensive list of interactions is returned, displaying interactions
between pathway members and all other molecules with which they
interact.
Note: If a pathway member has no annotated interactions, then it will
not be returned using the "Return a list of interactions" function.
Data Analysis - Upload your own data
We have had reports of users having some issues uploading data to InnateDB while using the Google Chrome browser. InnateDB is best used with the Firefox browser and is also tested with current versions of IE and Safari.
1. Select a file to upload by clicking on the "Upload File" button - upload a tab-delimited file of protein/gene identifiers or accession numbers and obtain a list of all genes, proteins, pathways, interactors or interactions that they are associated with. Alternatively, click on the "Web Form" button and paste your tab-delimited data in the text box (max. 1000 lines).
Note: There should be only one accession number per row. Probes that map to multiple genes should be removed.
Accession numbers from the following databases are currently accepted:
- Ensembl
- RefSeq
- Entrez
- UniProt
- InnateDB (gene IDs only)
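Before uploading, it may help to check your data against the rules above (tab-delimited, one accession number per row, at most 1000 lines in the web form). The following Python sketch shows one way to do this. It is not an InnateDB tool: the function name is made up, it assumes the file has no header row, and it assumes that multiple accessions in one cell would be separated by a comma, semicolon or space.

```python
import csv
import io

MAX_WEB_FORM_LINES = 1000  # limit stated for the "Web Form" text box


def check_upload(text, id_column=0):
    """Return a list of problems found in tab-delimited upload text.

    Hypothetical pre-upload check; id_column is the zero-based index
    of the column holding the accession numbers.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    if len(rows) > MAX_WEB_FORM_LINES:
        problems.append(
            f"{len(rows)} lines exceeds the {MAX_WEB_FORM_LINES}-line web form limit"
        )
    for line_no, row in enumerate(rows, start=1):
        cell = row[id_column].strip() if id_column < len(row) else ""
        if not cell:
            problems.append(f"line {line_no}: no accession number")
        elif any(sep in cell for sep in ",; "):
            # Assumption: several IDs in one cell are separated by , ; or a space.
            problems.append(f"line {line_no}: more than one accession number")
    return problems


if __name__ == "__main__":
    data = "ENSG00000136560\t2.1\nENSG00000136869,ENSG00000137462\t-1.8\n"
    for problem in check_upload(data):
        print(problem)
```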
2. Click on the column headers to specify which column in your data file contains the identifiers/accession numbers for each gene (and which database they come from). This is called the "Cross-reference ID".
You can only specify one cross-reference ID column. Please note that when using identifiers from InnateDB, only gene IDs are allowed, not interaction IDs!
3. Specify the Cross-reference database. This is the database where
the identifiers in the cross-reference column come from.
4. If you have included gene expression data - identify which columns contain the gene expression values and their associated p-values.
You may also identify the column containing the probe IDs if you have included them in your file.
Including quantitative data such as gene expression data is optional, but it is a very useful way to investigate
quantitative data in a pathway and interaction network context and
to carry out subsequent analyses such as pathway
over-representation analysis. The gene expression values
in your file are mapped to the molecule cross-references.
Expression values must be in a format where a value of +2 represents a 2-fold increase
in expression and a value of -2 represents a 2-fold decrease in expression.
You can specify values from up to ten different conditions or
time-points. You can also specify a name for each condition.
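If your expression data are stored as plain ratios (treated/control) or as log2 ratios, they need to be converted to the signed fold-change format described above. The Python sketch below shows one common conversion. It is an assumption about how your own data might be stored, not a description of anything InnateDB does, and the function names are invented.

```python
def signed_fold_change(ratio: float) -> float:
    """Convert a plain expression ratio to a signed fold change.

    A ratio of 2.0 becomes +2.0 (2-fold increase) and a ratio of 0.5
    becomes -2.0 (2-fold decrease).
    """
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


def signed_fold_change_from_log2(log2_ratio: float) -> float:
    """Convert a log2 ratio to a signed fold change."""
    return signed_fold_change(2.0 ** log2_ratio)


if __name__ == "__main__":
    for ratio in (2.0, 0.5, 1.0):
        print(ratio, "->", signed_fold_change(ratio))
    for log2_ratio in (1.0, -1.0):
        print(log2_ratio, "->", signed_fold_change_from_log2(log2_ratio))
```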
5. Choose whether you want to return interactions,
interactors, genes or pathways associated with your list of genes or proteins.
- Returning a list of interactions allows one to identify all interactions in InnateDB in which the genes (or their encoded products) in the uploaded list are a participant and to construct a network of these interactions for visualization and further analysis. Detailed annotation and evidence is then available for each interaction. The resulting interaction network may then be downloaded in a variety of supported formats or interactively visualized using Cerebral.
- Returning a list of interactors allows one to identify all molecules in InnateDB which interact with the genes (or their encoded products) in the uploaded list.
- Returning a list of genes provides detailed annotation for each gene in the uploaded list and is a prerequisite to performing a Gene Ontology over-representation analysis.
- Returning a list of pathways provides pathway annotation for each gene in the uploaded list and is a prerequisite to performing a pathway over-representation analysis.
You can choose to filter the results by using one of the following methods:
- Genes - This will not return interactions involving any molecule other than those in the uploaded file. i.e. if molecule A interacts with B and C but only A and B are in your file, the interaction between A and C will not appear in the returned results. This is very useful to construct a network of interactions only between molecules in the uploaded list (e.g. differentially expressed genes).
- Pathways - This option limits the interactions returned to a particular pathway. You can search for any of the 3,500+ pathways from all data sources by typing the name of the pathway in the text box and by selecting one of the given choices.
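The effect of the Genes filter can be shown with the A, B and C example given above. This is a minimal Python sketch of the idea, using an invented function name and a toy list of interaction pairs; it does not reflect how InnateDB stores or queries interactions.

```python
def restrict_to_uploaded(interactions, uploaded):
    """Keep only interactions where both participants are in the uploaded list.

    interactions: list of (molecule_a, molecule_b) pairs (toy data).
    uploaded: identifiers present in the uploaded file.
    """
    uploaded = set(uploaded)
    return [(a, b) for a, b in interactions if a in uploaded and b in uploaded]


if __name__ == "__main__":
    # A interacts with B and C, but only A and B were uploaded.
    all_interactions = [("A", "B"), ("A", "C")]
    print(restrict_to_uploaded(all_interactions, ["A", "B"]))
```

Here only the A-B interaction is kept, because C is not in the uploaded list.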
Pathway Over-representation Analysis
To do pathway over-representation analysis (ORA) you first need to upload a
list of gene identifiers and associated fold-change in gene expression
values (and P values) as described above.
InnateDB recommends that you upload
all genes from your array dataset, not just
differentially expressed (DE) genes (probes mapping to multiple
different genes should be removed).
The pathway ORA tool uses the proportion of DE
genes on the whole array to determine if a particular pathway is
significant.
InnateDB also provides users with the option of
uploading a subset of genes and performing the pathway ORA
on it. This subset analysis uses a slightly different algorithm
that does not take gene expression values into account.
This is necessary as the algorithm does not know the proportion of DE genes on the array.
Therefore, this analysis cannot handle data from multiple conditions.
If you have multiple probes for the same gene these values will be
averaged for the purposes of the pathway ORA.
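As an illustration of what averaging multiple probes per gene means, here is a short Python sketch. It assumes a simple arithmetic mean of the values for each gene; the text above does not state exactly how InnateDB computes the average, and the function name is invented.

```python
from collections import defaultdict
from statistics import mean


def average_probes(rows):
    """Average expression values of probes that map to the same gene.

    rows: iterable of (gene_id, expression_value) pairs, one per probe.
    Assumption: a plain arithmetic mean is used.
    """
    values_by_gene = defaultdict(list)
    for gene_id, value in rows:
        values_by_gene[gene_id].append(value)
    return {gene_id: mean(values) for gene_id, values in values_by_gene.items()}


if __name__ == "__main__":
    probes = [
        ("ENSG00000136869", 2.0),
        ("ENSG00000136869", 3.0),
        ("ENSG00000137462", -1.5),
    ]
    print(average_probes(probes))
```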
Make sure you select "Return a list of Pathways" from the
menu in the data analysis section.
A list of pathways associated with the uploaded genes will be
returned.
To do the pathway ORA click on the red Pathway
ORA button at the top of the page.
This will take you to a page where you can choose the parameters for
the pathway over-representation analysis.
First you need to specify whether you are analyzing an entire
array dataset or just a subset of genes.
If you try to analyze a subset
of genes using the entire dataset algorithm or vice versa your results
will NOT be correct.
If you are analyzing a complete array dataset choose the
following parameters for the pathway over-representation analysis.
- Fold-Change Cutoff (+/-): choose what fold-change in gene expression threshold should be used to determine which genes are differentially expressed. Default = +/- 1.5.
- Expression P-Value Cutoff: choose what P value threshold associated with each fold-change in gene expression value should be used to determine which genes are differentially expressed. Default P < 0.05.
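The way these two cutoffs combine can be sketched as follows. This is an illustrative Python example using the default values given above; whether a fold change exactly equal to the cutoff counts as differentially expressed is an assumption here, and the function name is invented.

```python
def classify_gene(fold_change, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Label a gene as 'up', 'down' or 'unchanged'.

    fold_change uses the signed format (+2 = 2-fold increase,
    -2 = 2-fold decrease). Assumption: the fold-change cutoff is
    inclusive, while the P value must be strictly below its cutoff.
    """
    if p_value >= p_cutoff:
        return "unchanged"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= -fc_cutoff:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    examples = [(2.0, 0.01), (-1.8, 0.01), (2.0, 0.2), (1.2, 0.01)]
    for fold_change, p_value in examples:
        print(fold_change, p_value, classify_gene(fold_change, p_value))
```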
Now choose the analysis algorithm and multiple testing correction method.
- Choose algorithm: several different statistical methods are available to determine if pathways are significantly associated with DE genes - Hypergeometric, Fisher & Chi Square.
- Choose Correction Method: two options to correct for multiple testing are included - The Benjamini & Hochberg correction for the FDR and the more conservative Bonferroni correction.
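To give a feel for what these options compute, the sketch below shows a textbook one-sided hypergeometric test for a single pathway, together with the Benjamini & Hochberg and Bonferroni corrections. It is written in plain Python under the usual definitions of these methods and is not taken from InnateDB; the exact formulation used by the ORA tool may differ, and all function names are invented.

```python
from math import comb


def hypergeom_pvalue(n_array, n_de, n_pathway, n_de_in_pathway):
    """P(X >= n_de_in_pathway) under the hypergeometric distribution.

    n_array: genes on the whole array
    n_de: differentially expressed (DE) genes on the array
    n_pathway: genes on the array that belong to the pathway
    n_de_in_pathway: DE genes that belong to the pathway
    """
    upper = min(n_de, n_pathway)
    favourable = sum(
        comb(n_de, i) * comb(n_array - n_de, n_pathway - i)
        for i in range(n_de_in_pathway, upper + 1)
    )
    return favourable / comb(n_array, n_pathway)


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted P values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Return Bonferroni adjusted P values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    # Toy numbers: 10 genes on the array, 4 DE, pathway of 3 genes, 2 of them DE.
    print(hypergeom_pvalue(10, 4, 3, 2))
    raw = [0.01, 0.04, 0.03]
    print(benjamini_hochberg(raw))
    print(bonferroni(raw))
```

The Bonferroni values are never smaller than the Benjamini & Hochberg values for the same input, which is what is meant by Bonferroni being the more conservative correction.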
Hit submit. A new page will be returned showing the pathways that
are significantly associated with up-regulated genes.
Click the
green button to see pathways that
are significantly associated with down-regulated genes.
Click on
the 'summary' link to see information for all genes in the pathway.
The interactions in the pathway, along with overlaid gene
expression data can be visualized by clicking on the 'visualize' link.
Gene Ontology Over-representation Analysis
To do gene ontology over-representation analysis (ORA) you first need to upload a
list of gene identifiers and associated fold-change in gene expression
values (and P values) as described above.
Make sure you select "Return a list of Genes" from the 4th drop-down
menu in the data analysis section.
Gene annotation including gene ontology terms associated with the uploaded genes will be
returned. (To show associated gene ontology terms go into the
show/hide options at the top of the page and select the relevant
columns to display - this is not necessary for the ORA tool).
To do the gene ontology ORA click on the red Ontology
ORA button at the top of the page.
This will take you to a page where you can choose the parameters for
the over-representation analysis.
Please see the pathway ORA section above for further details
regarding these options.