
Mass Spec Benchmark

Benchmark for the sampling procedure used in the Mass Spec paper.

Verified to work with the latest stable IMP release (2.9.0). The files are also available at GitHub.
Additional software needed to use these files: IMP install instructions

To install the software needed to reproduce this system with the Anaconda Python command-line tool (conda), run the following commands:

    conda config --add channels salilab
    conda install imp

To set up the environment on the UCSF QB3 cluster to run this system, run:

    module load imp

The method uses a custom MS connectivity restraint, implemented in C++ within the open-source IMP program, to sample and evaluate candidate structures using connectivity information (e.g. from native MS or cross-linking MS). It combines a Monte Carlo search with a subsequent conjugate gradients (CG) optimization step. A large number of candidate model structures are generated by sampling the conformational space of the input structures (currently, two input coordinate files are allowed), and the algorithm assigns each one a score, the MS connectivity score, based on a minimum spanning tree approach.
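
To illustrate the minimum spanning tree idea, the sketch below scores one candidate model represented, as an assumption, by a single 3D centre per protein copy. It builds the minimum spanning tree over pairwise centre distances with Prim's algorithm and penalizes tree edges longer than a contact threshold, so a model whose subunits form one connected assembly scores zero. The function name, the centres-only representation, the `contact_distance` parameter and the squared-excess penalty are all hypothetical; this is not the C++ code of IMP.core.MSConnectivityRestraint.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mst_connectivity_score(centers: Sequence[Point], contact_distance: float) -> float:
    """Hypothetical MST-based connectivity score (not IMP's implementation).

    Builds a minimum spanning tree over the distances between subunit
    centres (Prim's algorithm, O(n^2)) and sums the squared amount by
    which each tree edge exceeds contact_distance.
    """
    n = len(centers)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    # best[i]: shortest known distance from node i to the growing tree.
    best = [math.inf] * n
    best[0] = 0.0  # node 0 is the root; its "edge" has length 0
    score = 0.0
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        # Penalize the tree edge that attached u, if it is too long.
        score += max(0.0, best[u] - contact_distance) ** 2
        # Update distances from the new tree node to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(centers[u], centers[v]))
    return score


print(mst_connectivity_score([(0, 0, 0), (5, 0, 0), (20, 0, 0)], 6.0))
```

Here the tree edges are 5 and 15, so only the second exceeds the threshold (by 9) and the score is 81.0. The real restraint additionally accounts for protein copies and the composites declared with add_composite, which this sketch ignores.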

Steps

Each benchmark case is placed in its own directory (e.g. tAABC). The procedure is the same for every case: a series of Python scripts is run in that directory. IMP is assumed to be already installed and available on the Python path.

1. Generation and optimization of structural models

Models can be generated by running ms_cg.py.

This script first sets up the proteins in the system by calling create_protein. This takes two arguments, the name of the protein and the number of residues. For example, create_protein("ProteinA", 100) creates a protein called ProteinA containing 100 residues.

The connectivity restraint (IMP.core.MSConnectivityRestraint) is then set up. The number of copies of each protein is specified by calling add_type with a list containing one entry per copy of that protein. For example:

    iA = r.add_type([rps[0], rps[1], rps[2]]) # three copies of protein A
    iB = r.add_type([rps[3]]) # one copy of protein B
    iC = r.add_type([rps[4]]) # one copy of protein C
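
Because each add_type call receives all copies of one protein, the order in which proteins are created determines which rps indices belong to which type. The helper below is a hypothetical plain-Python sketch (no IMP calls; the name `group_copies_by_type` and the dictionary input are assumptions) that expands a stoichiometry table into that ordering and returns the index groups one would pass to add_type.

```python
from typing import Dict, List


def group_copies_by_type(stoichiometry: Dict[str, int]) -> Dict[str, List[int]]:
    """Map each protein name to the indices of its copies in a flat list.

    Copies are laid out in insertion order, so {"A": 3, "B": 1, "C": 1}
    puts A at indices 0-2, B at 3 and C at 4, matching rps[0..4] above.
    """
    groups: Dict[str, List[int]] = {}
    next_index = 0
    for name, copies in stoichiometry.items():
        if copies < 1:
            raise ValueError(f"{name} must have at least one copy")
        groups[name] = list(range(next_index, next_index + copies))
        next_index += copies
    return groups


groups = group_copies_by_type({"A": 3, "B": 1, "C": 1})
print(groups)  # {'A': [0, 1, 2], 'B': [3], 'C': [4]}
```

With IMP available, each group would then be turned into a list for the restraint, e.g. `r.add_type([rps[i] for i in groups["A"]])`.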

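Next, the disassembly pathway of the protein complex is specified by calling add_composite. Each call takes the list of protein types in a (sub)complex and, optionally, the composite from which that subcomplex is formed on disassembly. The example below also uses a fourth protein type, iD, registered with add_type in the same way as iA, iB and iC: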

    n1 = r.add_composite([iA, iB, iC, iD])
    n2 = r.add_composite([iA, iB, iC], n1)
    n3 = r.add_composite([iA, iB], n2)
    n4 = r.add_composite([iA, iC], n2)
    n5 = r.add_composite([iA, iB, iD], n1)
    n6 = r.add_composite([iB, iD], n5)

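The above commands define the following disassembly pathway: the full complex ABCD gives rise to the subcomplexes ABC and ABD; ABC in turn gives rise to AB and AC, and ABD gives rise to BD.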
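
The composites form a tree in which the second argument of each add_composite call is the parent. The plain-Python sketch below is independent of IMP (the `Composite` class and the `add_subcomplex` and `outline` helpers are hypothetical); it records the same pathway, checks that every subcomplex contains only subunits present in its parent, and prints the tree as an indented outline.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Composite:
    subunits: FrozenSet[str]
    children: List["Composite"] = field(default_factory=list)


def add_subcomplex(subunits, parent: Optional[Composite] = None) -> Composite:
    """Hypothetical stand-in for add_composite: attach a subcomplex to its parent."""
    node = Composite(frozenset(subunits))
    if parent is not None:
        # A disassembly product cannot contain subunits its parent lacks.
        if not node.subunits <= parent.subunits:
            raise ValueError(
                f"{sorted(node.subunits)} is not a subcomplex of {sorted(parent.subunits)}")
        parent.children.append(node)
    return node


def outline(node: Composite, depth: int = 0) -> List[str]:
    """Return the pathway as indented lines, each parent before its children."""
    lines = ["  " * depth + "".join(sorted(node.subunits))]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


n1 = add_subcomplex("ABCD")
n2 = add_subcomplex("ABC", n1)
n3 = add_subcomplex("AB", n2)
n4 = add_subcomplex("AC", n2)
n5 = add_subcomplex("ABD", n1)
n6 = add_subcomplex("BD", n5)
print("\n".join(outline(n1)))
# ABCD
#   ABC
#     AB
#     AC
#   ABD
#     BD
```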

The optimization can be tuned by editing the Python script. Common optimization parameters include the number of Monte Carlo steps, the number of conjugate gradient steps, and the number of models to generate. Generally speaking, more models should be generated for more topologically complex systems, or those containing many subunits.
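
The parameters above plausibly control a loop of the following shape: for each model, a Monte Carlo search from a random start, then a conjugate gradient refinement of the best configuration found. The sketch below shows only that structure, on a toy two-dimensional energy rather than the MS connectivity score. The parameter names (`n_models`, `mc_steps`, `cg_steps`, `step_size`, `temperature`), the toy energy, and the choice of Polak-Ribiere (PR+) nonlinear CG with a backtracking line search are all assumptions, not the contents of ms_cg.py.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def energy(x: Vec) -> float:
    """Toy energy with its minimum at (1, -2); a stand-in for the restraint score."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2


def gradient(x: Vec) -> Vec:
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


def monte_carlo(x: Vec, mc_steps: int, step_size: float,
                temperature: float, rng: random.Random) -> Vec:
    """Metropolis Monte Carlo; returns the lowest-energy configuration visited."""
    current, best = list(x), list(x)
    for _ in range(mc_steps):
        trial = [xi + rng.uniform(-step_size, step_size) for xi in current]
        delta = energy(trial) - energy(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            if energy(current) < energy(best):
                best = list(current)
    return best


def conjugate_gradient(x: Vec, cg_steps: int) -> Vec:
    """PR+ nonlinear conjugate gradient with a backtracking (Armijo) line search."""
    g = gradient(x)
    d = [-gi for gi in g]
    for _ in range(cg_steps):
        gg = sum(gi * gi for gi in g)
        if gg < 1e-20:
            break  # gradient is essentially zero: converged
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0:  # not a descent direction: restart along -g
            d, slope = [-gi for gi in g], -gg
        alpha, e0 = 1.0, energy(x)
        # Halve the step until the sufficient-decrease condition holds.
        while alpha > 1e-12 and energy(
                [xi + alpha * di for xi, di in zip(x, d)]) > e0 + 1e-4 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = gradient(x)
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


def generate_models(n_models: int = 5, mc_steps: int = 200, cg_steps: int = 50,
                    step_size: float = 0.5, temperature: float = 1.0,
                    seed: int = 0) -> List[Tuple[float, Vec]]:
    """One model per random start: MC search, then CG refinement; best first."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        start = [rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)]
        refined = conjugate_gradient(
            monte_carlo(start, mc_steps, step_size, temperature, rng), cg_steps)
        models.append((energy(refined), refined))
    return sorted(models)


for score, x in generate_models(n_models=3):
    print(f"{score:.2e} at ({x[0]:.3f}, {x[1]:.3f})")
```

Raising `n_models` is the knob that matches the advice above for complex systems: more independent starts make it more likely that at least one model reaches the best-scoring region.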

2. Output and visualization

The ms_cg.py script generates a number of structural models, each assembled from the input proteins in their specified numbers of copies. Each model is written to output/model/configuration.<number>.pym and can be visualized in PyMOL. The MS connectivity score of each structure is written to output/scores.txt.

The script also clusters the models using a k-means approach and writes the clusters to output/model/cluster.<number>.pym. The number of clusters and the number of iterations used to generate them can be changed by editing the ms_cg.py script.
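
For reference, a minimal k-means (Lloyd's algorithm) sketch exposing the two settings mentioned above, the number of clusters and the number of iterations, is shown below. It assumes each model has already been reduced to a fixed-length feature vector (for example, flattened subunit coordinates after superposition); how ms_cg.py actually represents models is not shown here, and the `kmeans` function and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], n_clusters: int, n_iterations: int,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns (cluster label for each point, cluster centres)."""
    rng = random.Random(seed)
    # Initialize the centres with n_clusters randomly chosen input points.
    centres = [list(p) for p in rng.sample(list(points), n_clusters)]
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(n_clusters), key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centres


features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres = kmeans(features, n_clusters=2, n_iterations=10)
```

k-means only finds a local optimum that depends on the initial centres, which is one reason the iteration count (and, in practice, repeated runs) matters.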

3. Comparing the generated models with the known topologies

The algorithm can be benchmarked by measuring the closeness of fit between the structural models and the known topologies. The first step is to convert the *.pym files into *.mfj files by running the pym2tcl.py script followed by the tcl2mfj.py script; the resulting *.mfj files are written to the output/mfj directory.
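
The two conversion scripts must run in order, and the second should not run if the first fails. The sketch below drives them from Python with `subprocess.run`; the function name `convert_models` and the `dry_run` option are hypothetical, and it assumes (unverified) that neither script needs command-line arguments when run inside a benchmark directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Order from the text; the intermediate .tcl step is inferred from the script names.
CONVERSION_SCRIPTS = ["pym2tcl.py", "tcl2mfj.py"]


def convert_models(case_dir: str, dry_run: bool = False) -> List[List[str]]:
    """Run the conversion scripts in order inside one benchmark directory.

    Assumes neither script needs command-line arguments.
    """
    commands = [[sys.executable, script] for script in CONVERSION_SCRIPTS]
    for command in commands:
        if dry_run:
            print("would run", " ".join(command), "in", case_dir)
            continue
        # check=True raises CalledProcessError, so a failed step stops the chain.
        subprocess.run(command, cwd=Path(case_dir), check=True)
    return commands


convert_models("tAABC", dry_run=True)
```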

Next, run the score_connect.py script, which compares the topology of each generated model with the native-like topology (usually obtained from the reference structure) by scoring (a) interactions and (b) non-interactions. If an interaction that exists in the reference structure is not found in the model, a penalty of 2.0 is added to that model's score; if an interaction that does not exist in the reference structure is found in the model, a penalty of 1.0 is added. The scores are summarized in the output file output/score-connect.txt.
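
The penalty scheme can be written down directly. The sketch below represents each topology as a set of unordered subunit pairs and assumes, as one reading of the text, that the penalty is applied once per mismatched interaction. Copies are treated as distinct labels (A1, A2), which is a simplification; the function names and the pair representation are illustrative, not the code of score_connect.py.

```python
from typing import FrozenSet, Iterable, Set, Tuple

MISSING_PENALTY = 2.0   # interaction in the reference but absent from the model
SPURIOUS_PENALTY = 1.0  # interaction in the model but absent from the reference


def as_pairs(contacts: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Normalize (a, b) contacts to unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(c) for c in contacts}


def connectivity_penalty(model: Iterable[Tuple[str, str]],
                         reference: Iterable[Tuple[str, str]]) -> float:
    model_pairs, ref_pairs = as_pairs(model), as_pairs(reference)
    missing = ref_pairs - model_pairs   # reference interactions the model lacks
    spurious = model_pairs - ref_pairs  # model interactions the reference lacks
    return MISSING_PENALTY * len(missing) + SPURIOUS_PENALTY * len(spurious)


reference = [("A1", "B"), ("A1", "C"), ("A2", "B")]
model = [("B", "A1"), ("A2", "C")]
print(connectivity_penalty(model, reference))  # 5.0
```

Here two reference interactions are missing (2 x 2.0) and one spurious interaction is present (1 x 1.0). Weighting missing interactions more heavily means a model is punished more for losing a true contact than for predicting an extra one.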

Info

Author(s): Argyris Politis, email: argyris.politis@kcl.ac.uk

Version: 1.0

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Testable: Yes.

Parallelizable: No

Publications: