X!!TANDEM WITH PTMTREESEARCH
=========================

read me:
http://beetle/thegpm2/doc/ptmsearch_help.htm

CONTENT of this read_me_and_installation_guide.txt
==================================================
I.   PTMTreeSearch documentation
II.  PTMTreeSearch source files
III. INCLUDE PTMTreeSearch TO X!TANDEM (single thread version)
IV.  INCLUDE PTMTreeSearch TO X!!TANDEM (parallel version with MPI)
V.   READY TO USE SOURCE CODES OF X!TANDEM (and X!!TANDEM) with PTMTreeSearch


I.   PTMTreeSearch documentation
=======================
Motivation: Tandem mass spectrometry has become a standard tool for identifying post-translational
modifications (PTMs) of proteins. Algorithmic searches for PTMs from tandem mass spectrum
data (MS/MS) tend to be hampered by noisy data as well as by a combinatorial explosion
of search space. This leads to high uncertainty and long search-execution times.

Results: To address this issue, we present PTMTreeSearch, a new algorithm that
uses a large database of known PTMs to identify PTMs from MS/MS data. For a given
peptide sequence, PTMTreeSearch builds a computational tree wherein each path from
the root to the leaves is labeled with the amino acids of a peptide sequence.
Branches then represent PTMs. Various empirical tree pruning rules have been
designed to decrease the search-execution time by eliminating biologically unlikely
solutions. PTMTreeSearch first identifies a relatively small set of high confidence 
PTM types, and in a second stage, performs a more exhaustive search on this restricted
set using relaxed search parameter settings. An analysis of experimental data shows that
using the same criteria for false discovery, PTMTreeSearch annotates more peptides than
the current state-of-the-art methods and PTM identification algorithms, and achieves this
at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring
function in the X!Tandem search engine.
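
The tree described above can be pictured with a small sketch. This is not the PTMTreeSearch implementation: the names Variant, walk and enumerate_variants, the per-residue mass-shift table, and the single pruning rule (a cap on the number of modifications per peptide) are assumptions made only for illustration. Each level of the recursion corresponds to one amino acid, and each branch corresponds to leaving the residue unmodified or applying one allowed PTM.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration only; not taken from the PTMTreeSearch sources.
struct Variant {
    std::vector<double> shifts;  // mass shift per residue, 0.0 = unmodified
};

using ModTable = std::map<char, std::vector<double>>;  // residue -> allowed shifts

// Depth-first walk over the peptide. One tree level per residue; a leaf is
// reached when every residue has been assigned a shift (possibly 0.0).
void walk(const std::string& peptide, const ModTable& allowed,
          std::size_t max_mods, std::size_t pos, std::size_t used,
          Variant& current, std::vector<Variant>& out) {
    if (pos == peptide.size()) {
        out.push_back(current);
        return;
    }
    // Branch: leave this residue unmodified.
    current.shifts.push_back(0.0);
    walk(peptide, allowed, max_mods, pos + 1, used, current, out);
    current.shifts.pop_back();

    // Pruning rule (assumed): no further PTM branches once the cap is reached.
    if (used >= max_mods) return;

    // Branches: one per modification allowed on this residue.
    auto it = allowed.find(peptide[pos]);
    if (it == allowed.end()) return;
    for (double shift : it->second) {
        current.shifts.push_back(shift);
        walk(peptide, allowed, max_mods, pos + 1, used + 1, current, out);
        current.shifts.pop_back();
    }
}

// Collects every modified form of the peptide that survives pruning.
std::vector<Variant> enumerate_variants(const std::string& peptide,
                                        const ModTable& allowed,
                                        std::size_t max_mods) {
    std::vector<Variant> out;
    Variant current;
    walk(peptide, allowed, max_mods, 0, 0, current, out);
    return out;
}
```

With a three-residue peptide in which every residue has one allowed modification, a cap of one modification leaves 4 leaves instead of the 8 of the unpruned tree, which shows how a pruning rule shrinks the search space.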

Citation:
If PTMTreeSearch is used in a study please cite the following article:
PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra
Attila Kertesz-Farkas, Beata Reiz, Roberto Vera, Michael P Myers, Sandor Pongor
Bioinformatics 30 (2), 234-241, 2014


II.  PTMTreeSearch source files:
=======================
stack_PTMTreeSearch   	//extended stl template library
PTMTreeSearch.h			//frame of the PTMTreeSearch algorithm
PTMTreeSearch.cpp		//frame of the PTMTreeSearch algorithm
PTMTreeSearchscore.h	//implementation of the PTMTreeSearch scoring algorithm
PTMTreeSearchscore.cpp	//implementation of the PTMTreeSearch scoring algorithm

III. INCLUDE PTMTreeSearch TO X!TANDEM (single thread version)
=====================================================
1.
download the X!Tandem source code from:
http://thegpm.org/

2.
copy the PTMTreeSearch files into the X!Tandem source directory.

3.
undefine the macro USE_MPI in the header file PTMTreeSearch.h (comment out line 12 of PTMTreeSearch.h)
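
As a rough sketch of what this step amounts to, the fragment below shows a commented-out macro definition and how code guarded by it falls back to the non-MPI branch. The actual contents of line 12 of PTMTreeSearch.h may look different, and the constant kBuiltWithMpi is a hypothetical name used only to make the effect visible.

```cpp
// Assumed form of the macro definition, commented out for the single-thread build:
// #define USE_MPI

// Hypothetical flag; it only demonstrates which branch the preprocessor keeps.
#ifdef USE_MPI
constexpr bool kBuiltWithMpi = true;   // MPI-specific code would be compiled
#else
constexpr bool kBuiltWithMpi = false;  // single-thread code path is compiled
#endif
```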

4.
a) add the line #include "PTMTreeSearch.h" to the mrefine.cpp file.
b) in mscore.h, at line 452, change the declaration "bool get_aa(vector<maa> &_m,const size_t _a,double &_d);"
to a virtual function: 					"virtual bool get_aa(vector<maa> &_m,const size_t _a,double &_d);"
NOTE: This function is a member of the "class mscore".
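
The reason for the 'virtual' keyword is presumably that a scoring class derived from mscore overrides get_aa, and the engine calls it through the base class. The sketch below illustrates that mechanism with simplified stand-ins: the reduced maa struct, the class derived_score and the helper call_through_base are hypothetical and are not the real X!Tandem or PTMTreeSearch classes.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the X!Tandem type; only what the example needs.
struct maa {
    double m_dMod = 0.0;
};

class mscore {
public:
    virtual ~mscore() = default;
    // Because the function is virtual, a call made through an mscore
    // reference or pointer reaches the override of the derived class.
    virtual bool get_aa(std::vector<maa>& _m, const std::size_t _a, double& _d) {
        (void)_a;
        _m.clear();
        _d = 0.0;
        return false;
    }
};

// Hypothetical derived scoring class standing in for a plug-in scorer.
class derived_score : public mscore {
public:
    bool get_aa(std::vector<maa>& _m, const std::size_t _a, double& _d) override {
        (void)_a;
        _m.assign(1, maa{});
        _d = 1.0;
        return true;
    }
};

// Calls get_aa through the base-class interface only.
bool call_through_base(mscore& score) {
    std::vector<maa> mods;
    double d = 0.0;
    return score.get_aa(mods, 0, d);
}
```

Without 'virtual' in the base class, call_through_base would always run the mscore version, even when it is handed a derived_score object.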

c) insert the following code into the bool mrefine::refine() function in mrefine.cpp, around line 347.
<-- from here
/*
 * 6. new mrefine derived classes here
 */

//PTMTreeSearch stuff begin
	iRound = 7;
	m_pProcess->set_round(iRound); // round 7 (the value of iRound set above)
	strKey = "refine, PTMTreeSearch";
	m_pProcess->m_xmlValues.get(strKey,strValue);
	if(strValue == "yes")	{
		PTMTreeSearch *m_pPTMTreeSearch; //  the object that is used to process PTMTreeSearch
		m_pProcess->m_bSaps = false;
		m_pPTMTreeSearch = PTMTreeSearchmanager::create_PTMTreeSearch(m_pProcess->m_xmlValues);
		if (m_pPTMTreeSearch == NULL) {
			cout << "Failed to create PTMTreeSearch\n";
			return false;
		}
		m_pPTMTreeSearch->set_mprocess(m_pProcess);
		m_pPTMTreeSearch->refine();
	}

//PTMTreeSearch stuff end

<-- till here
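
Note that the code above runs the PTMTreeSearch refinement only when the parameter labelled "refine, PTMTreeSearch" has the value "yes". The following sketch isolates that check. A plain std::map stands in for the engine's parameter store (m_xmlValues), and the function name ptmtreesearch_enabled is hypothetical.

```cpp
#include <map>
#include <string>

// Assumption: a plain map replaces the parameter store of the search engine.
using Params = std::map<std::string, std::string>;

// Returns true only if the parameter is present and set exactly to "yes",
// mirroring the 'if(strValue == "yes")' test in the snippet above.
bool ptmtreesearch_enabled(const Params& params) {
    auto it = params.find("refine, PTMTreeSearch");
    return it != params.end() && it->second == "yes";
}
```

A missing parameter, or any value other than "yes", leaves the refinement step switched off.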

d) in the mprocess.h file, change the member visibility from 'protected' to 'public' around line 355
(so that the variables beginning with
	string m_strLastMods;
	int m_iCurrentRound;
	bool m_bPermute;
	bool m_bPermuteHigh;
	bool m_bCrcCheck;
	... and so on ...
become public)

e) in the mdomains.h file, declare the member variable 'double m_fPval;' in the class maa and add the line
		'm_fPval = rhs.m_fPval;' to the 'maa& operator=(const maa &rhs)' function (before the 'return' statement)
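
A minimal sketch of the result of step e) is shown below. The class is heavily reduced: m_dMod merely stands in for the existing members of maa, and initialising m_fPval in the constructor is an assumption that the instructions above do not require.

```cpp
// Reduced sketch of class maa; the real class in mdomains.h has more members.
class maa {
public:
    maa() : m_dMod(0.0), m_fPval(0.0) {}  // initialising m_fPval is an assumption

    double m_dMod;   // stands in for the existing members
    double m_fPval;  // new member added in step e)

    maa& operator=(const maa& rhs) {
        m_dMod = rhs.m_dMod;
        m_fPval = rhs.m_fPval;  // new line, placed before the return statement
        return *this;
    }
};
```

If the new line in operator= were left out, assigning one maa object to another would silently drop the value of m_fPval.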

f) in mscore.h, change the member visibility from 'protected' to 'public' around line 388
(before 	'float m_fErr; // error for the fragment ions', so that the variable m_fErr becomes public)


5.
compile the project.


PTMTreeSearch is implemented as a pluggable scoring function. To use PTMTreeSearch,
add the following line to the input.xml file:
<note label="scoring, algorithm" type="input">ptmtreesearch-score</note>


IV.  INCLUDE PTMTreeSearch TO X!!TANDEM (parallel version with MPI)
==========================================================
1.
download X!!Tandem source codes from:
ftp://maguro.cs.yale.edu/Projects/Tandem
You will also need the MPI and Boost packages installed.

2.
copy the PTMTreeSearch files into the X!!Tandem source directory.


3.
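Apply the changes described in step 4 of section III.
(Step 3 of section III is skipped here, so the USE_MPI macro stays defined.)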

4.
compile the project.


PTMTreeSearch is implemented as a pluggable scoring function. To use PTMTreeSearch,
add the following line to the input.xml file:
<note label="scoring, algorithm" type="input">PTMTreeSearch-score</note>


V.   READY TO USE SOURCE CODES OF X!TANDEM (and X!!TANDEM) with PTMTreeSearch
=============================================================================
The archive XTandem_PTMTreeSearch_src.zip contains the PTMTreeSearch source code already incorporated into X!!Tandem.
After unzipping, it is ready for 'make'.


NOTE:
1. PTMTreeSearch has been compiled and tested under the Linux operating system.
2. You might need to download protein sequence FASTA files.
3. X!Tandem prints the identified protein IDs to the standard output. This can be avoided by commenting out lines 283 and 284 in the mreport.cpp file.