
This document is licensed under a Creative Commons License.
PaLS: Pathway and Literature Strainer
- Introduction and purpose
- Usage
- Examples
- Authors
- Terms of use
- Privacy and Security
- References
- Acknowledgements
PaLS is a tool that analyzes sets of lists of genes or single lists of genes. It filters those genes/clones/proteins
that are referenced by a given percentage of PubMed references, Gene Ontology terms, KEGG pathways or Reactome pathways.
The conversions from your input identifiers to PubMed, GO, KEGG and Reactome pathway IDs are taken from the same set of
pregenerated tables used in IDconverter and
IDClight. These tables are updated every two months (following the Ensembl release
schedule), and the conversions used in PaLS follow the path shown in the figure.
Regardless of the caveats, please enjoy and let us know your suggestions, comments and complaints.
The organism (human, mouse, or rat) has to be selected, as well as the type of identifier in the input file. The types
of identifiers allowed are: Gene Name (HUGO), GenBank accession, UniGene cluster ID, Ensembl gene ID, Clone ID, Affymetrix ID,
RefSeq_RNA, RefSeq_peptide, EntrezGene ID, and SwissProt name.
The input is a single plain text file containing one list or several lists of identifiers, with the following structure:
#First_list_name
id1
id2
id3
#Second_list_name
id4
id1
id5
Human Ensembl Gene IDs example:
#First List
ENSG00000156006
ENSG00000170956
ENSG00000105352
ENSG00000142615
ENSG00000204496
ENSG00000124469
ENSG00000147224
ENSG00000115602
ENSG00000147168
ENSG00000141682
#Second List
ENSG00000206235
ENSG00000206299
ENSG00000016082
ENSG00000116703
ENSG00000145545
ENSG00000171873
Comments:
- There is no specific limit for the number of IDs or the number of lists that can be entered, but obviously the waiting time
increases for large files.
- If a given ID is not a correct ID or is not found in our database, it is excluded from the analysis.
- Repeated IDs inside a list are only considered once.
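As an illustration of the input format described above, the following minimal Python sketch reads such a file into named lists. It is not the PaLS parser: the function name `parse_gene_lists`, the handling of blank lines, and the choice to skip IDs that appear before the first `#` header are all assumptions made for this example. The sample text reuses the placeholder IDs from the structure shown earlier, with one extra repeated `id4` added to show that repeats inside a list are kept only once.

```python
def parse_gene_lists(text):
    """Parse '#name'-headed lists of IDs into {list_name: [ids]}.

    Hypothetical helper, not part of PaLS. Assumptions: blank lines are
    ignored, and IDs found before the first '#' header are skipped.
    """
    lists = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            current = line[1:].strip()  # a new list starts here
            lists.setdefault(current, [])
        elif current is not None:
            if line not in lists[current]:
                lists[current].append(line)  # a repeated ID counts once
    return lists


example = """#First_list_name
id1
id2
id3
#Second_list_name
id4
id1
id5
id4
"""

parsed = parse_gene_lists(example)
for name, ids in parsed.items():
    print(name, ids)
```

Note that `id1` appears in both lists and is kept in each, since de-duplication applies only inside a single list.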
It is now possible to send results from the rest of the
applications of Asterias to PaLS. In the following examples, scroll to
the bottom of the results to get to the PaLS links: ADaCGH
(analysis of aCGH data), Pomelo
II (differential expression from simple and complex experimental
designs and survival data), Tnasas
(class prediction from expression data), GeneSrF
(gene selection for class prediction using random forests), and SignS
(molecular signatures with survival data).
For each of the four databases considered (PubMed references, Gene Ontology terms, KEGG pathways, and Reactome pathways) three
different analysis/filtering methods are offered:
- Per list analysis. For each list in the input file, PaLS shows all records in the database that reference more than the user-entered percentage of the IDs in that list.
- Global analysis. Same as the previous analysis but grouping together all IDs in all lists (and removing the repeats).
- Global analysis. Common references to more than a given percentage of IDs in more than a given percentage of the lists.
If a percentage threshold is not entered, that specific analysis is not performed. For instance, it is recommended the user leaves all but the first type of analysis empty for an input file with a single list, as it makes no sense to perform the 2nd and 3rd filtering methods.
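The first two filtering methods can be sketched in a few lines of Python. This is an illustrative sketch, not the PaLS implementation: the function `common_references`, the `annotations` mapping, and the PMID and gene labels are all hypothetical. It assumes the threshold is inclusive (≥), as in the headers of the output tables, and that IDs missing from the database are left out of the percentage calculation.

```python
from collections import defaultdict


def common_references(ids, annotations, min_percent):
    """Return {reference: (percent, members)} for references shared by
    at least min_percent of the known, non-repeated IDs.

    Hypothetical helper; `annotations` maps each ID to its references.
    """
    # Drop repeats (keeping order) and IDs not found in the annotations.
    known = [i for i in dict.fromkeys(ids) if i in annotations]
    if not known:
        return {}
    members = defaultdict(list)
    for gene_id in known:
        for ref in set(annotations[gene_id]):
            members[ref].append(gene_id)
    result = {}
    for ref, genes in members.items():
        percent = 100.0 * len(genes) / len(known)
        if percent >= min_percent:  # assumed inclusive threshold
            result[ref] = (percent, genes)
    return result


# Made-up annotations and lists, for illustration only.
annotations = {
    "g1": {"PMID:1", "PMID:2"},
    "g2": {"PMID:1"},
    "g3": {"PMID:1", "PMID:3"},
    "g4": {"PMID:3"},
}
lists = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4", "g4"]}

# Per list analysis: apply the filter to each list separately.
per_list = {name: common_references(ids, annotations, 60)
            for name, ids in lists.items()}

# Global analysis: pool all IDs from all lists, then apply the same filter.
all_ids = [gene_id for ids in lists.values() for gene_id in ids]
global_hits = common_references(all_ids, annotations, 50)

print(per_list)
print(global_hits)
```

The global analysis reuses the same function on the pooled IDs, which matches the description of it as the per list analysis applied to all IDs with repeats removed.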
Example: Given thresholds for PubMed references of 60% (per list), 50% (global), and 40% of IDs in 90% of the lists, the expected output will be:
- For each list, those articles common to more than 60% of the IDs.
- For the whole set of non repeated IDs, those articles common to more than 50% of the IDs.
- Those articles that are common to at least 40% of IDs in at least 90% of the lists.
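The third item combines two thresholds, which is easier to follow in code. The sketch below is one possible reading of it, not the PaLS source: `references_across_lists`, the `annotations` mapping, and the sample data are hypothetical, both thresholds are assumed to be inclusive, and IDs absent from the annotations are assumed to be excluded before percentages are computed.

```python
def references_across_lists(lists, annotations, min_percent_ids, min_percent_lists):
    """Return {reference: {list_name: members}} for references shared by
    >= min_percent_ids of the IDs in >= min_percent_lists of the lists.

    Hypothetical helper, written only to illustrate the two-level filter.
    """
    hits = {}
    for name, ids in lists.items():
        # Non-repeated IDs of this list that exist in the annotations.
        known = [i for i in dict.fromkeys(ids) if i in annotations]
        if not known:
            continue
        members = {}
        for gene_id in known:
            for ref in set(annotations[gene_id]):
                members.setdefault(ref, []).append(gene_id)
        # First level: the reference must cover enough IDs of this list.
        for ref, genes in members.items():
            if 100.0 * len(genes) / len(known) >= min_percent_ids:
                hits.setdefault(ref, {})[name] = genes
    # Second level: the reference must pass in enough of the lists.
    n_lists = len(lists)
    return {ref: per_list for ref, per_list in hits.items()
            if 100.0 * len(per_list) / n_lists >= min_percent_lists}


# Made-up data, using the 40% / 90% thresholds from the example above.
annotations = {
    "g1": {"PMID:1", "PMID:2"},
    "g2": {"PMID:1"},
    "g3": {"PMID:1", "PMID:3"},
    "g4": {"PMID:3"},
}
lists = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4"]}

across = references_across_lists(lists, annotations, 40, 90)
print(across)
```

Here `PMID:3` covers every ID of list B but passes in only one of the two lists (50%), so it is dropped by the 90% list threshold, while `PMID:1` passes in both lists.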
The results are shown in a different tab for each database (PubMed references, Gene Ontology terms, KEGG pathways, and Reactome pathways). For each of those,
the results of the first analysis are shown on top, with the following structure for each list:
| Per list, common references to ≥ X% of IDs |
| List name | Graph plot |
|   | Linked reference 1 | Percentage and members |
|   | Linked reference 2 | Percentage and members |
Next is the second analysis, with a similar structure:
| Global, common references to ≥ Y% of IDs |
| Graph plot |
| Linked reference 1 | Percentage and members |
| Linked reference 2 | Percentage and members |
Finally, the output for the third analysis, with the structure:
| Global, common references to ≥ Z1% of IDs in ≥ Z2% of lists |
| Linked reference 1 | Members |
| Linked reference 2 | Members |
All members lists are enriched with links to IDClight, so more information about each ID can be obtained.
The graph or network plots shown in the output are meant as a visual aid to the user. They are displayed to make evident the organization of the data.
The distance among identifiers is related to the number of references they share (the more references they share, the closer they are). The graph plots
are available in four different formats: png, postscript, pdf, and svg (compatible with some browsers).
Notes:
- Identifiers without any link to the rest of the members of a given list are not displayed.
- For now, and due both to performance issues and clarity of the resulting graph, plots for graphs with more than 100 nodes are not created.
- The graph plots do not take into account the percentage thresholds selected by the user. They display all links found in the database.
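The data behind such a graph can be sketched as a set of weighted edges, one per pair of identifiers that share at least one reference. The following Python sketch uses only the standard library and is not the PaLS plotting code: `shared_reference_edges`, the `annotations` mapping, and the sample data are hypothetical, and how PaLS turns shared-reference counts into distances is not specified here.

```python
from itertools import combinations

MAX_NODES = 100  # plots with more nodes are not created, per the notes above


def shared_reference_edges(ids, annotations):
    """Return {(id_a, id_b): n_shared} for every pair of IDs sharing at
    least one reference, or None if more than MAX_NODES IDs are linked.

    Hypothetical helper; no percentage threshold is applied, so all
    links present in `annotations` are reported.
    """
    known = [i for i in dict.fromkeys(ids) if i in annotations]
    edges = {}
    for a, b in combinations(known, 2):
        shared = len(set(annotations[a]) & set(annotations[b]))
        if shared:
            edges[(a, b)] = shared  # edge weight = number of shared references
    # IDs without any link never appear in an edge, so they are not drawn.
    nodes = {node for pair in edges for node in pair}
    if len(nodes) > MAX_NODES:
        return None
    return edges


# Made-up annotations: g5 shares nothing with the other IDs.
annotations = {
    "g1": {"PMID:1", "PMID:2"},
    "g2": {"PMID:1"},
    "g3": {"PMID:1", "PMID:2", "PMID:3"},
    "g4": {"PMID:3"},
    "g5": {"PMID:9"},
}

edges = shared_reference_edges(["g1", "g2", "g3", "g4", "g5"], annotations)
print(edges)
```

With NetworkX, edges like these could be loaded with `Graph.add_weighted_edges_from` and drawn with a spring layout so that heavier edges pull nodes closer together; whether PaLS does exactly this is an assumption.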
Use any of these files as an example of an input file:
- Human RefSeq_RNA identifiers: File and Output example
- Mouse Ensembl Gene ID identifiers: File
- Human HUGO names related to cancer: File
Andrés Cañada and Andreu Alibés created PaLS with help from
Ramón Díaz-Uriarte.
This application is entirely written in Python and is running on a cluster of machines
using Debian GNU/Linux as operating system, Apache as web server and MySQL as database server.
The graph plots are created using the NetworkX Python package.
- You acknowledge that this Software is experimental in nature
and is supplied "AS IS", without obligation by the authors, the CNIO's
Statistical Computing Team or the CNIO to provide accompanying
services or support. The entire risk as to the quality and performance of the
Software is with you. The CNIO and the authors expressly disclaim any and all
warranties regarding the software, whether express or implied, including but
not limited to warranties pertaining to merchantability or fitness for a
particular purpose.
- If you use PaLS for any publication, we would appreciate it if you
could let us know and cite our program (you know, "credit where
credit is due"). Please provide the complete reference ("PaLS: filtering common literature, biological
terms and pathway information. Andreu Alibes, Andres Cañada, and Ramon
Diaz-Uriarte. Nucl. Acids Res. 2008 36: W364-W367.") and the main web site:
http://pals.bioinfo.cnio.es.
- We would appreciate it if you gave us feedback concerning bugs, errors or misconfigurations.
Complaints or suggestions are welcome.
- PaLS uses databases from
PubMed, from the US National Library of Medicine NLM, which have been
downloaded locally with the appropriate License
Agreement. Please beware of the following:
- NLM represents that the data provided under this Agreement
were formulated with a reasonable standard of care. Except for
this representation, and as otherwise specifically provided in
this Agreement, NLM makes no representation or warranties,
expressed or implied. This includes, but is not limited to,
any implied warranty of merchantability or fitness for a
particular purpose, with respect to the NLM databases, and NLM
specifically disclaims any such warranties and
representations.
- Duplication, resale, or redistribution of NLM data obtained from
PaLS must conform to fair use guidelines and U.S. and international
copyright law. Any duplication, resale, or redistribution must also
conform to NLM's quality assurance requirements listed in Paragraphs
D.2.a. through D.2.o of this
license agreement, copyright constraints listed in Paragraph F.,
and usage reports listed in Paragraph J. Written approval from NLM is
required before a non-U.S. licensee duplicates, resells, or
redistributes NLM data (except cataloging records) to others.
- Unless otherwise prohibited, organizations or institutions may
download small amounts of NLM-produced citations for
redistribution. For MEDLINE, this is about 1,000 per month or 12,000
records for each year of coverage. For other MEDLARS databases, it is
approximately 25% of the records in the file except for
AIDSLINE, AIDSTRIALS, and AIDSDRUGS which may be downloaded in their
entirety. Since NLM makes corrections and enhancements to and performs
maintenance on these records at least annually, you should plan to
replace or correct the records once a year to ensure that they are
still correct and searchable as a group.
- NLM databases are produced by a U.S. government agency and as such
the contents are not covered by copyright domestically. They may be
copyrighted outside the U.S. Some NLM produced data is from
copyrighted publications of the respective copyright claimants. Users
of the NLM databases are solely responsible for compliance with any
copyright restrictions and are referred to the publication data
appearing in the bibliographic citations, as well as to the copyright
notices appearing in the original publications, all of which are
incorporated by reference. Users should consult legal counsel before
using NLM-produced records to be certain that their plans are in
compliance with appropriate laws.
- All records must be identified as being derived from NLM databases.
- Some material in the NLM databases is from copyrighted publications of
the respective copyright claimants. Users of the NLM databases are
solely responsible for compliance with any copyright restrictions and
are referred to the publication data appearing in the bibliographic
citations, as well as to the copyright notices appearing in the
original publications, all of which are hereby incorporated by
reference.
- NLM assumes no responsibility or liability associated with PaLS
(or any of PaLS users') use and/or reproduction of copyrighted
material. Anyone contemplating reproduction of all or any portion of
any of the NLM databases should consult legal counsel.
- See the NCBI's Disclaimer and Copyright notice about
PubMed records.
- The databases PaLS is currently using were downloaded on 2007-01-02.
Uploaded data sets are saved in temporary directories on the server and are
accessible through the web until they are erased after some time. Anybody can
access those directories; nevertheless, the names of the files are not
trivial, so it is not easy for a third party to access your data.
In any case, you should keep in mind that communications between the client
(your computer) and the server are not encrypted at all, thus it is also
possible for somebody else to look at your data while you are uploading or
downloading them.
PaLS itself: A. Alibés, et al. (2008) PaLS: filtering common literature, biological terms and pathway information, Nucleic Acids Research 36:W364-W367
IDconverter: A. Alibés, et al. (2007) IDconverter and IDClight: Conversion and annotation of gene and protein IDs, BMC Bioinformatics 8:9
NCBI: D.L. Wheeler, et al. (2006) Database resources of the National Center for Biotechnology Information, Nucleic Acids Research 34:D173
KEGG: M. Kanehisa, et al. (2006) From genomics to chemical genomics: new developments in KEGG, Nucleic Acids Research 34:D354
Reactome: G. Joshi-Tope, et al. (2005) Reactome: a knowledgebase of biological pathways, Nucleic Acids Research 33:D428
Gene Ontology: M. Ashburner, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nature Genetics 25:25
Ensembl: E. Birney, et al. (2006) Ensembl 2006, Nucleic Acids Research 34:D556
We are grateful to Edward R. Morrissey for providing us with his knowledge of AJAX and his
AJAX code implemented in Pomelo II.
We thank Alex Canada from designpeople.net for the web design.
We thank Jane L. Rosov for her assistance with the process of obtaining the NLM License Agreement.
Copyright
This document is copyrighted. Copyright © 2006-07 Andreu Alibés.
Last Update: December 26th, 2007.