This document is licensed under a Creative Commons License.


PaLS: Pathway and Literature Strainer


  1. Introduction and purpose
  2. Usage
  3. Examples
  4. Authors
  5. Terms of use
  6. Privacy and Security
  7. References
  8. Acknowledgements

Introduction and purpose

PaLS is a tool that analyzes either a single list of genes or a set of gene lists. It filters those genes/clones/proteins that are referenced by a given percentage of PubMed references, Gene Ontology terms, KEGG pathways or Reactome pathways.

The conversions from your input identifiers to PubMed, GO, KEGG and Reactome pathway IDs are taken from the same set of pregenerated tables used in IDconverter and IDClight. These tables are updated every two months (following the Ensembl release schedule), and the conversions used in PaLS follow the path shown in the figure.

Regardless of the caveats, please enjoy and let us know your suggestions, comments and complaints.

Usage

Input Data

The organism (human, mouse, or rat) has to be selected, as well as the type of identifier in the input file. The types of identifiers allowed are: Gene Name (HUGO), GenBank accession, UniGene cluster ID, Ensembl gene ID, Clone ID, Affymetrix ID, RefSeq_RNA, RefSeq_peptide, EntrezGene ID, and SwissProt name.

The input file can contain one list or several lists of identifiers in a single plain text file with the following structure:

#First_list_name
id1
id2
id3
#Second_list_name
id4
id1
id5

Human Ensembl Gene IDs example:

#First List
ENSG00000156006
ENSG00000170956
ENSG00000105352
ENSG00000142615
ENSG00000204496
ENSG00000124469
ENSG00000147224
ENSG00000115602
ENSG00000147168
ENSG00000141682
#Second List
ENSG00000206235
ENSG00000206299
ENSG00000016082
ENSG00000116703
ENSG00000145545
ENSG00000171873
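
As an illustration of the file structure described above, the following is a minimal Python sketch of how such a file could be read into named lists. It is not the parser used by PaLS: the function name parse_lists, the fallback name "unnamed" for identifiers that appear before any "#" header, and the removal of repeated IDs within a list are all assumptions made for the example.

```python
def parse_lists(text):
    """Parse list-style input: a line starting with '#' opens a new list,
    every other non-empty line is an identifier of the current list."""
    lists = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # drop trailing tabs and spaces
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:].strip()
            lists.setdefault(current, [])
            continue
        if current is None:
            # Assumption: IDs with no header form a single unnamed list.
            current = "unnamed"
            lists[current] = []
        if line not in lists[current]:  # assumption: repeats are dropped
            lists[current].append(line)
    return lists


example = "#First_list_name\nid1\nid2\nid3\n#Second_list_name\nid4\nid1\nid5\n"
parsed = parse_lists(example)
```

Note that the same identifier (id1 here) may appear in more than one list; only repeats inside a single list are dropped in this sketch.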

Comments:

Input from other Asterias applications

It is now possible to send results from the other Asterias applications to PaLS. In the following examples, scroll to the bottom of the results to get to the PaLS links: ADaCGH (analysis of aCGH data), Pomelo II (differential expression from simple and complex experimental designs and survival data), Tnasas (class prediction from expression data), GeneSrF (gene selection for class prediction using random forests), and SignS (molecular signatures with survival data).

Filtering criteria

For each of the four databases considered (PubMed references, Gene Ontology terms, KEGG pathways, and Reactome pathways) three different analysis/filtering methods are offered:

  1. Per list analysis. For each list in the input file, all records in the database that refer to more than the percentage of IDs entered by the user are shown.
  2. Global analysis. Same as the previous analysis, but grouping together all IDs in all lists (and removing the repeats).
  3. Global analysis. Common references to more than a given percentage of IDs in more than a given percentage of the lists.
If a percentage threshold is not entered, that specific analysis is not performed. For instance, for an input file with a single list we recommend leaving all thresholds but the first empty, since the 2nd and 3rd filtering methods make no sense in that case.
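
The first two filtering methods can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PaLS implementation: the function names, the toy identifiers (g1 to g4) and the toy references (ref_a to ref_c) are hypothetical, and the threshold is applied as "greater than or equal to", following the "≥" used in the output headings.

```python
def common_references(ids, annotations, min_percent):
    """Return {reference: (percent, members)} for every reference that
    annotates at least min_percent of the given IDs."""
    ids = list(dict.fromkeys(ids))  # remove repeats, keep input order
    members = {}
    for gene in ids:
        for ref in annotations.get(gene, ()):
            members.setdefault(ref, []).append(gene)
    result = {}
    for ref, genes in members.items():
        percent = 100.0 * len(genes) / len(ids)
        if percent >= min_percent:  # assumption: threshold is inclusive
            result[ref] = (percent, genes)
    return result


def per_list_analysis(lists, annotations, min_percent):
    """Analysis 1: filter each list separately."""
    return {name: common_references(ids, annotations, min_percent)
            for name, ids in lists.items()}


def global_analysis(lists, annotations, min_percent):
    """Analysis 2: pool the IDs of all lists, then filter once."""
    pooled = [gene for ids in lists.values() for gene in ids]
    return common_references(pooled, annotations, min_percent)


# Toy data (hypothetical): ID -> set of references annotating it.
annotations = {
    "g1": {"ref_a", "ref_b"},
    "g2": {"ref_a"},
    "g3": {"ref_a", "ref_c"},
    "g4": {"ref_c"},
}
lists = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4"]}

per_list = per_list_analysis(lists, annotations, 60)
pooled = global_analysis(lists, annotations, 50)
```

With these toy data, only ref_a passes the 60% threshold in list A and only ref_c passes it in list B, while both pass the 50% threshold once the four distinct IDs are pooled.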

Example: Given thresholds for PubMed references of 60% for the first analysis, 50% for the second, and 40% of IDs in 90% of the lists for the third, the expected output will be:
  1. For each list, those articles common to more than 60% of the IDs.
  2. For the whole set of non-repeated IDs, those articles common to more than 50% of the IDs.
  3. Those articles that are common to at least 40% of IDs in at least 90% of the lists.
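
The third filtering method combines two thresholds, which the following Python sketch makes concrete. It is an illustration only, not the PaLS code: the function name shared_across_lists and the toy data are hypothetical, both thresholds are assumed to be inclusive ("at least"), and the reported members are assumed to be the IDs annotated with the reference in the lists where the ID threshold is met.

```python
def shared_across_lists(lists, annotations, min_percent_ids, min_percent_lists):
    """Return {reference: members} for references that reach
    min_percent_ids of the IDs in at least min_percent_lists of the lists."""
    passing = {}  # reference -> number of lists where the ID threshold is met
    members = {}  # reference -> IDs annotated with it in those lists
    for ids in lists.values():
        unique_ids = set(ids)
        by_ref = {}
        for gene in unique_ids:
            for ref in annotations.get(gene, ()):
                by_ref.setdefault(ref, set()).add(gene)
        for ref, genes in by_ref.items():
            if 100.0 * len(genes) / len(unique_ids) >= min_percent_ids:
                passing[ref] = passing.get(ref, 0) + 1
                members.setdefault(ref, set()).update(genes)
    return {ref: sorted(members[ref])
            for ref, count in passing.items()
            if 100.0 * count / len(lists) >= min_percent_lists}


# Toy data (hypothetical): ID -> set of references annotating it.
annotations = {
    "g1": {"ref_a", "ref_b"},
    "g2": {"ref_a"},
    "g3": {"ref_a", "ref_c"},
    "g4": {"ref_c"},
}
lists = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4"]}

strict = shared_across_lists(lists, annotations, 40, 90)
relaxed = shared_across_lists(lists, annotations, 40, 50)
```

With two lists, a 90% list threshold means a reference must pass in both lists, so only ref_a survives; lowering the list threshold to 50% also lets ref_c through, since it passes in list B.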

Output

The results are shown in a different tab for each database (PubMed references, Gene Ontology terms, KEGG pathways, and Reactome pathways). For each of those, the results of the first analysis are shown on top, with the following structure for each list:

Per list, common references to ≥ X% of IDs
List name             Graph plot
  Linked reference 1  Percentage and members
  Linked reference 2  Percentage and members

Next is the second analysis, with a similar structure:

Global, common references to ≥ Y% of IDs
Graph plot
Linked reference 1    Percentage and members
Linked reference 2    Percentage and members

Finally, the output for the third analysis, with the structure:

Global, common references to ≥ Z1% of IDs in ≥ Z2% of lists
Linked reference 1    Members
Linked reference 2    Members

All members lists are enriched with links to IDClight, so more information about each ID can be obtained.

Graphs

The graph or network plots shown in the output are meant as a visual aid to the user. They are displayed to make the organization of the data evident. The distance between identifiers is related to the number of references they share (the more references they share, the closer they are). The graph plots are available in four different formats: png, postscript, pdf, and svg (compatible with some browsers).
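
To make the idea of "shared references" concrete, the following Python sketch counts, for every pair of identifiers, how many references they have in common. It is an assumption about how such a graph could be prepared, not a description of the PaLS code; the function name shared_reference_edges and the toy data are hypothetical.

```python
from itertools import combinations


def shared_reference_edges(annotations):
    """Return (id_a, id_b, n_shared) for every pair of IDs that share
    at least one reference."""
    edges = []
    for a, b in combinations(sorted(annotations), 2):
        n_shared = len(annotations[a] & annotations[b])
        if n_shared:
            edges.append((a, b, n_shared))
    return edges


# Toy data (hypothetical): ID -> set of references annotating it.
annotations = {
    "g1": {"ref_a", "ref_b"},
    "g2": {"ref_a", "ref_b"},
    "g3": {"ref_a"},
    "g4": {"ref_c"},
}
edges = shared_reference_edges(annotations)
```

Weighted edges in this form could be loaded into NetworkX with Graph.add_weighted_edges_from and drawn with a force-directed layout such as spring_layout, in which larger weights pull nodes closer together. Whether PaLS uses this particular layout is not stated here.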

Notes:

Examples

Use either of these two files as an example input file:

Authors

Andrés Cañada and Andreu Alibés created PaLS with help from Ramón Díaz-Uriarte.

This application is written entirely in Python and runs on a cluster of machines using Debian GNU/Linux as the operating system, Apache as the web server and MySQL as the database server.

The graph plots are created using the NetworkX Python package.

Terms of use


Privacy and Security

Uploaded data sets are saved in temporary directories on the server and are accessible through the web until they are erased after some time. Anybody can access those directories; however, the file names are not trivial, so it is not easy for a third party to access your data.

In any case, you should keep in mind that communications between the client (your computer) and the server are not encrypted at all, thus it is also possible for somebody else to look at your data while you are uploading or downloading them.


References

PaLS itself: Alibés, A, et al. (2008) PaLS: filtering common literature, biological terms and pathway information, Nucleic Acids Research 36:W364-W367

IDconverter: Alibés, A, et al. (2007) IDconverter and IDClight: Conversion and annotation of gene and protein IDs, BMC Bioinformatics 8:9

NCBI: D.L. Wheeler, et al. (2006) Database resources of the National Center for Biotechnology Information, Nucleic Acids Research 34:D173

KEGG: M. Kanehisa, et al. (2006) From genomics to chemical genomics: new developments in KEGG, Nucleic Acids Research 34:D354

Reactome: G. Joshi-Tope, et al. (2005) Reactome: a knowledgebase of biological pathways, Nucleic Acids Research 33:D428

Gene Ontology: M. Ashburner, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nature Genetics 25:25

Ensembl: E. Birney, et al. (2006) Ensembl 2006, Nucleic Acids Research 34:D556

Acknowledgements

We are grateful to Edward R. Morrissey for providing us with his knowledge of AJAX and his AJAX code implemented in Pomelo II.

We thank Alex Canada from designpeople.net for the web design.

We thank Jane L. Rosov for her assistance with the process of obtaining the NLM License Agreement.

Copyright

This document is copyrighted. Copyright © 2006-07 Andreu Alibés.

Last Update: December 26th, 2007.