X!!TANDEM WITH PTMTREESEARCH
============================

read me: http://beetle/thegpm2/doc/ptmsearch_help.htm

CONTENTS of read_me_and_installation_guide.txt
==================================================
I.   PTMTreeSearch documentation
II.  PTMTreeSearch source files
III. INCLUDE PTMTreeSearch TO X!TANDEM (single thread version)
IV.  INCLUDE PTMTreeSearch TO X!!TANDEM (parallel version with MPI)
V.   READY TO USE SOURCE CODES OF X!TANDEM (and X!!TANDEM) with PTMTreeSearch

I. PTMTreeSearch documentation
=======================

Motivation: Tandem mass spectrometry has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs in tandem mass spectrometry (MS/MS) data tend to be hampered by noisy data as well as by a combinatorial explosion of the search space. This leads to high uncertainty and long search-execution times.

Results: To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree wherein each path from the root to the leaves is labeled with the amino acids of a peptide sequence. Branches then represent PTMs. Various empirical tree pruning rules have been designed to decrease the search-execution time by eliminating biologically unlikely solutions. PTMTreeSearch first identifies a relatively small set of high-confidence PTM types and, in a second stage, performs a more exhaustive search on this restricted set using relaxed search parameter settings. An analysis of experimental data shows that, using the same criteria for false discovery, PTMTreeSearch annotates more peptides than current state-of-the-art PTM identification methods, and achieves this at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine.
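The tree described above can be illustrated with a small depth-first enumeration. The following C++ sketch is not the PTMTreeSearch implementation: the three-entry PTM table, the mass-overshoot rule and the per-peptide modification cap are hypothetical stand-ins for the large PTM database and the empirical pruning rules of the article, and the two-stage search is not shown. Each recursion depth corresponds to one residue; the unmodified branch and one branch per applicable PTM are explored, and a leaf is reported when its total mass shift matches the observed precursor mass difference. The names enumerate_ptms, expand, Ptm, Path and kPtms are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature PTM table: residue, PTM name, monoisotopic mass shift (Da).
// PTMTreeSearch itself draws on a large database of known PTMs.
struct Ptm {
    char residue;
    std::string name;
    double delta;
};

const std::vector<Ptm> kPtms = {
    {'S', "Phospho", 79.96633},  {'T', "Phospho", 79.96633},
    {'Y', "Phospho", 79.96633},  {'M', "Oxidation", 15.99491},
    {'K', "Acetyl", 42.01057}};

// A root-to-leaf path: the (0-based position, PTM name) pairs chosen along the way.
using Path = std::vector<std::pair<std::size_t, std::string>>;

// Depth-first walk of the tree; depth `pos` corresponds to residue peptide[pos].
void expand(const std::string& peptide, std::size_t pos, double shift,
            double target, double tol, std::size_t maxMods,
            Path& path, std::vector<Path>& hits) {
    // Hypothetical pruning rule A: every delta in kPtms is positive, so once the
    // accumulated shift overshoots the target no descendant can match.
    if (shift > target + tol) return;
    if (pos == peptide.size()) {  // leaf: every residue has been labeled
        if (std::fabs(shift - target) <= tol) hits.push_back(path);
        return;
    }
    // Branch 1: residue stays unmodified.
    expand(peptide, pos + 1, shift, target, tol, maxMods, path, hits);
    // Hypothetical pruning rule B: stop branching once the modification budget is spent.
    if (path.size() >= maxMods) return;
    // Further branches: one per PTM that applies to this residue.
    for (const Ptm& p : kPtms) {
        if (p.residue != peptide[pos]) continue;
        path.emplace_back(pos, p.name);
        expand(peptide, pos + 1, shift + p.delta, target, tol, maxMods, path, hits);
        path.pop_back();
    }
}

// Returns every PTM assignment whose total mass shift matches `target` within `tol`.
std::vector<Path> enumerate_ptms(const std::string& peptide, double target,
                                 double tol, std::size_t maxMods) {
    assert(tol >= 0.0);
    Path path;
    std::vector<Path> hits;
    expand(peptide, 0, 0.0, target, tol, maxMods, path, hits);
    return hits;
}
```

For example, enumerate_ptms("MSTK", 79.96633 + 15.99491, 0.02, 2) returns two paths, Oxidation on M1 with Phospho on T3 and Oxidation on M1 with Phospho on S2, while a cap of one modification returns no match. Rule A is only safe because every mass shift in the table is positive; a table containing mass losses would need a different bound.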
Citation: If PTMTreeSearch is used in a study, please cite the following article:
   PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra.
   Attila Kertesz-Farkas, Beata Reiz, Roberto Vera, Michael P Myers, Sandor Pongor.
   Bioinformatics 30 (2), 234-241, 2014.

II. PTMTreeSearch source files
=======================
stack_PTMTreeSearch       //extended STL template library
PTMTreeSearch.h           //frame of the PTMTreeSearch algorithm
PTMTreeSearch.cpp         //frame of the PTMTreeSearch algorithm
PTMTreeSearchscore.h      //implementation of the PTMTreeSearch scoring algorithm
PTMTreeSearchscore.cpp    //implementation of the PTMTreeSearch scoring algorithm

III. INCLUDE PTMTreeSearch TO X!TANDEM (single thread version)
=====================================================
1. Download the X!Tandem source code from: http://thegpm.org/
2. Copy the PTMTreeSearch files into the X!Tandem source directory.
3. Undefine the macro USE_MPI in PTMTreeSearch.h by commenting out its definition on line 12 of PTMTreeSearch.h.
4. a) Add the line #include "PTMTreeSearch.h" to mrefine.cpp.
   b) In mscore.h, around line 452, change
        bool get_aa(vector &_m,const size_t _a,double &_d);
      into the virtual function
        virtual bool get_aa(vector &_m,const size_t _a,double &_d);
      NOTE: this function is a member of "class mscore".
   c) Insert the following code into mrefine.cpp, in the function bool mrefine::refine(), around line 347:
<-- from here
    /*
     * 6. new mrefine derived classes here
     */
    //PTMTreeSearch stuff begin
    iRound = 7;
    m_pProcess->set_round(iRound); // round 7
    strKey = "refine, PTMTreeSearch";
    m_pProcess->m_xmlValues.get(strKey,strValue);
    if(strValue == "yes") {
        PTMTreeSearch *m_pPTMTreeSearch; // the object that is used to process PTMTreeSearch
        m_pProcess->m_bSaps = false;
        m_pPTMTreeSearch = PTMTreeSearchmanager::create_PTMTreeSearch(m_pProcess->m_xmlValues);
        if (m_pPTMTreeSearch == NULL) {
            cout << "Failed to create PTMTreeSearch\n";
            return false;
        }
        m_pPTMTreeSearch->set_mprocess(m_pProcess);
        m_pPTMTreeSearch->refine();
    }
    //PTMTreeSearch stuff end
<-- till here
   d) In mprocess.h, around line 355, change the member variable visibility from 'protected' to 'public'. The variables starting with
        string m_strLastMods;
        int m_iCurrentRound;
        bool m_bPermute;
        bool m_bPermuteHigh;
        bool m_bCrcCheck;
        ... and so on
      should be public.
   e) In mdomains.h, declare the member variable 'double m_fPval;' in class maa, and add
        m_fPval = rhs.m_fPval;
      to the 'maa& operator=(const maa &rhs)' function (before the 'return' statement).
   f) In mscore.h, around line 388 (just before 'float m_fErr; // error for the fragment ions'), change the visibility from 'protected' to 'public' so that the variable m_fErr becomes public.
5. Compile the project.

PTMTreeSearch is implemented as a pluggable scoring function. To use PTMTreeSearch, add the following line to the input.xml file:
   ptmtreesearch-score

IV. INCLUDE PTMTreeSearch TO X!!TANDEM (parallel version with MPI)
==========================================================
1. Download the X!!Tandem source code from: ftp://maguro.cs.yale.edu/Projects/Tandem
   You will also need MPI and the Boost libraries installed.
2. Copy the PTMTreeSearch files into the X!!Tandem source directory.
3. Same as section III, step 4 (keep USE_MPI defined in PTMTreeSearch.h).
4. Compile the project.

PTMTreeSearch is implemented as a pluggable scoring function.
To use PTMTreeSearch, add the following line to the input.xml file:
   PTMTreeSearch-score

V. READY TO USE SOURCE CODES OF X!TANDEM (and X!!TANDEM) with PTMTreeSearch
========================================================================
The archive XTandem_PTMTreeSearch_src.zip contains the X!!Tandem source code with the PTMTreeSearch sources already incorporated. After unzipping, it is ready for 'make'.

NOTE:
1. PTMTreeSearch has been compiled and tested under the Linux operating system.
2. You might need to download protein sequence FASTA files.
3. X!Tandem prints the identified protein IDs to the standard output. This can be avoided by commenting out lines 283 and 284 in the mreport.cpp file.
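Step 4b of section III makes mscore::get_aa virtual, presumably so that the PTMTreeSearch scoring class can override it while X!Tandem keeps calling it through an mscore pointer. The self-contained sketch below illustrates that mechanism with hypothetical stand-in classes (ScoreBase, TreeSearchScore) and a hypothetical caller (count_modified_residues); it does not reproduce the real mscore interface or the exact get_aa signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for X!Tandem's scoring base class (not the real mscore).
class ScoreBase {
public:
    virtual ~ScoreBase() = default;
    // Reports the modifications at residue `pos`; `delta` receives their mass shift.
    // Declared virtual so that a derived scoring plugin can replace it.
    virtual bool get_aa(std::vector<std::string>& mods, std::size_t pos, double& delta) {
        (void)pos;
        mods.clear();
        delta = 0.0;
        return false;  // the base class knows of no modifications
    }
};

// Hypothetical plugin that remembers one modification site found by its own search.
class TreeSearchScore : public ScoreBase {
public:
    TreeSearchScore(std::size_t site, std::string name, double shift)
        : site_(site), name_(std::move(name)), shift_(shift) {}

    bool get_aa(std::vector<std::string>& mods, std::size_t pos, double& delta) override {
        mods.clear();
        delta = 0.0;
        if (pos != site_) return false;
        mods.push_back(name_);
        delta = shift_;
        return true;
    }

private:
    std::size_t site_;
    std::string name_;
    double shift_;
};

// Engine-side code that only sees the base class.
std::size_t count_modified_residues(ScoreBase& scorer, std::size_t length) {
    std::size_t n = 0;
    std::vector<std::string> mods;
    double delta = 0.0;
    for (std::size_t i = 0; i < length; ++i)
        if (scorer.get_aa(mods, i, delta)) ++n;  // dynamic dispatch picks the override
    return n;
}
```

With a TreeSearchScore(1, "Phospho", 79.96633) the caller counts one modified residue out of four. If `virtual` (and therefore `override`) were removed, the call inside count_modified_residues would bind statically to ScoreBase::get_aa and count zero.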