Localizing gdown1 on Pol II using integrative structure modeling
These scripts demonstrate the use of IMP and PMI in the modeling of the Pol II(G) complex using DSS chemical crosslinks.
Input data (directory data):
40 DSS crosslinks involving gdown1 were identified via mass spectrometry. 14 of these crosslinks were intramolecular and 26 were intermolecular with Pol II. The Pol II structure was obtained from the PDB (code 5FLM); it was determined primarily from a cryo-EM density map at 3.4 Å resolution (EMDB code: 3218; Bernecky et al., 2016, Nature).
Representation of gdown1 relied on secondary structure and disordered regions predicted by PSIPRED from the gdown1 sequence (Buchan et al., 2013; Jones, 1999).
Running the simulations (directory production_scripts)

sample.py: PMI modeling script for running the production simulations. The search for good-scoring models relied on Gibbs sampling, based on the Metropolis Monte Carlo algorithm. For adequate statistics, we suggest producing at least 5 million models from 100 independent runs, each starting from a different initial conformation of gdown1.
submit.sub: SGE cluster submission script that automatically launches the 100 independent runs.
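The accept/reject rule at the core of Metropolis Monte Carlo can be illustrated with a minimal sketch. This is not the PMI sampler used by sample.py: the scoring function, move set, temperature, and run count below (toy_score, toy_propose, 5 runs of 2000 steps) are hypothetical stand-ins, chosen only to show how independent runs started from different initial states explore a score landscape. Lower scores are assumed to be better.

```python
import math
import random


def metropolis_sample(score, propose, x0, n_steps, temperature=1.0, seed=0):
    """Run a Metropolis Monte Carlo chain; return a list of (state, score)."""
    rng = random.Random(seed)
    x, s = x0, score(x0)
    trajectory = [(x, s)]
    for _ in range(n_steps):
        x_new = propose(x, rng)
        s_new = score(x_new)
        delta = s_new - s
        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, s = x_new, s_new
        trajectory.append((x, s))
    return trajectory


# Hypothetical toy problem: one coordinate with a harmonic score.
def toy_score(x):
    return (x - 3.0) ** 2


def toy_propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)


# Independent runs, each starting from a different random initial state.
runs = []
for run_id in range(5):
    start = random.Random(1000 + run_id).uniform(-10.0, 10.0)
    runs.append(metropolis_sample(toy_score, toy_propose, start, 2000, seed=run_id))

best_scores = [min(s for _, s in traj) for traj in runs]
```

In the production setting each run would be a separate cluster job, and the per-frame scores would be written to output files rather than kept in memory.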
The compressed 100 independent trajectories are accessible at:
/salilab/park3/ganesans/gdown1_trajectories/gdown1.tar.gz
Analyzing the simulations (directory analysis_scripts)
Various scripts for analyzing the simulations. We describe in more detail the scripts that test for sampling convergence.

select_good_scoring_models.py: Python script that reads RMF files and selects those that satisfy the good-scoring model criteria explained in the methods section.
random_subsets_best_score_convergence.py: tests whether adding more models improves our sampling of top scores. The input to the script is a list of all the Total_Score values from the simulations. For each subset, we perform 100 subsamplings to compute error bars. The file is given twice to check how the error bars vary.
Kolmogorov_Smirnov_2Samples.py: tests whether the score distributions from two independent subsets of models are similar.
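The two score-based convergence checks can be sketched in plain Python. This is an illustration under stated assumptions, not the code in the scripts above: the function names, the subset fractions, and the toy Gaussian scores are hypothetical, and lower scores are assumed to be better. The first function draws random subsets of increasing size and reports the mean and standard deviation of the best score over 100 repeats; the second computes the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two empirical cumulative distributions.

```python
import random
import statistics


def best_score_vs_subset_size(scores, fractions, n_repeats=100, seed=0):
    """Return (subset_size, mean_best, std_best) for each subset fraction."""
    rng = random.Random(seed)
    results = []
    for frac in fractions:
        size = max(1, int(frac * len(scores)))
        # Best (lowest) score found in each random subset of this size.
        best = [min(rng.sample(scores, size)) for _ in range(n_repeats)]
        results.append((size, statistics.mean(best), statistics.stdev(best)))
    return results


def ks_two_sample_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical scores standing in for the Total_Score values.
toy_rng = random.Random(1)
toy_scores = [toy_rng.gauss(100.0, 10.0) for _ in range(2000)]

curve = best_score_vs_subset_size(toy_scores, [0.1, 0.2, 0.5, 1.0])
d_stat = ks_two_sample_statistic(toy_scores[:1000], toy_scores[1000:])
```

If the mean best score stops improving as the subset grows, and D between two independent halves is small, the score sampling is consistent with convergence. A p-value for D could be obtained from an existing routine such as scipy.stats.ks_2samp.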
The good-scoring models selected for precision-based clustering with the RMSD metric are located in results/Models. The file name format is ${trajectory_number}_${frame_number}.rmf3
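As a small convenience, a model file name in this format can be split back into its trajectory and frame numbers. The helper below is a hypothetical sketch, not part of the repository; it assumes both fields are non-negative integers, and the example file name is made up.

```python
import re
from pathlib import Path

# Matches '<trajectory_number>_<frame_number>.rmf3' (assumed integer fields).
_MODEL_NAME = re.compile(r"^(\d+)_(\d+)\.rmf3$")


def parse_model_name(path):
    """Return (trajectory_number, frame_number) parsed from a model file name."""
    match = _MODEL_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected model file name: {path}")
    return int(match.group(1)), int(match.group(2))


# Hypothetical example file name.
trajectory, frame = parse_model_name("results/Models/12_3400.rmf3")
```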

sample_precision.py: Given an RMSD matrix (results/Clustering/rmsd_matrix.tar.gz), we compute the χ² test for homogeneity of proportions (McDonald, 2014) and determine the best precision at which sampling has converged.
rmsf.py: Given a list of structures, computes the average RMSF, which indicates the precision of the structures in a cluster.
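Both calculations named above are compact enough to sketch in plain Python. The functions below are illustrative assumptions rather than the code in sample_precision.py or rmsf.py: chi2_homogeneity takes the cluster populations contributed by two independent samples of models (a 2 x k contingency table) and returns the χ² statistic, the degrees of freedom, and Cramér's V as an effect size; rmsf_per_particle assumes the structures are already superposed and given as lists of (x, y, z) tuples.

```python
import math


def chi2_homogeneity(counts_a, counts_b):
    """Return (chi2, dof, cramers_v) for a 2 x k table of cluster populations."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same clusters")
    # Ignore clusters that are empty in both samples.
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    if n_a == 0 or n_b == 0:
        raise ValueError("each sample must contain at least one model")
    n = n_a + n_b
    chi2 = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    dof = len(cols) - 1
    # For a 2 x k table with k >= 2, min(rows, cols) - 1 == 1.
    cramers_v = math.sqrt(chi2 / n)
    return chi2, dof, cramers_v


def rmsf_per_particle(structures):
    """Per-particle RMSF over pre-aligned structures (lists of (x, y, z))."""
    n_models = len(structures)
    rmsf = []
    for p in range(len(structures[0])):
        coords = [model[p] for model in structures]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msf = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        rmsf.append(math.sqrt(msf))
    return rmsf


# Hypothetical toy inputs.
chi2, dof, v = chi2_homogeneity([10, 0], [0, 10])
toy_models = [[(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
              [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]]
per_particle = rmsf_per_particle(toy_models)
average_rmsf = sum(per_particle) / len(per_particle)
```

A p-value for the χ² statistic could be obtained from the χ² distribution with dof degrees of freedom, for example with scipy.stats.chi2_contingency. A small Cramér's V indicates that the two samples populate the clusters in similar proportions.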
Plotting the results (directory plotting_scripts)
Python scripts for generating figures from the paper.
Information
Author(s): Sai J. Ganesan
Date: September 6th, 2018
License: CC BY-SA 4.0. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Testable: Yes
Parallelizable: Yes
Publications: Miki Jishage, Xiaodi Yu, Yi Shi, Sai J. Ganesan, Andrej Sali, Brian T. Chait, Francisco Asturias, and Robert G. Roeder. Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1. Nat Struct Mol Biol. (2018) 25(9):859-867. doi: https://doi.org/10.1038/s41594-018-0118-5