Stage 1 - Gathering of data {#rnapolii_1}
===========================

In this stage, we find all available experimental data that we wish to utilize in structural modeling. In theory, any method that provides absolute or relative structural information can be used.

## Data for yeast RNA Polymerase II
The `rnapolii/data` folder in the tutorial input files contains the data included in this example:

* Sequence information (FASTA files for each subunit)
* [Electron density maps](http:
* [High resolution structure from x-ray crystallography](http:
* Chemical crosslinking datasets (we use two data sets, one from [Al Burlingame's lab](http://www.mcponline.org/content/13/2/420.long), and another from [Juri Rappsilber's lab](http:

Each residue included in modeling must be explicitly defined in the FASTA text file. Each individual component (i.e., a protein chain) is identified by a string in the FASTA header line. From `1WCM.fasta.txt`:

    >1WCM:A
    MVGQQYSSAPLRTVKEVQFGLFSPEEVRAISVAKIRFPETMDETQTRAKIGGLNDPRLGSIDRNLKCQTCQEGMNECPGH
    FGHIDLAKPVFHVGFIAKIKKVCECVCMHCGKLLLDEHNELMRQALAIKDSKKRFAAIWTLCKTKMVCETDVPSEDDPTQ

    >1WCM:B
    MSDLANSEKYYDEDPYGFEDESAPITAEDSWAVISAFFREKGLVSQQLDSFNQFVDYTLQDIICEDSTLILEQLAQHTTE
    SDNISRKYEISFGKIYVTKPMVNESDGVTHALYPQEARLRNLTYSSGLFVDVKKRTYEAIDVPGRELKYELIAEESEDDS

defines two chains with unique IDs of 1WCM:A and 1WCM:B, respectively. The entire complex is 12 chains and 4582 residues.
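
To check that every chain and residue you intend to model is actually present, it can help to read the FASTA file and list each header ID with its sequence length. The following is a minimal sketch in plain Python, not the reader used by the modeling scripts; `parse_fasta` is a hypothetical helper, and the two short sequences are truncated stand-ins for the real entries.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header ID to its full sequence."""
    chains = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # The header line (minus the leading '>') identifies the chain.
            current = line[1:].strip()
            chains[current] = []
        elif current is not None:
            # Sequence lines are concatenated until the next header.
            chains[current].append(line)
    return {chain_id: "".join(parts) for chain_id, parts in chains.items()}


# Truncated stand-in for the contents of 1WCM.fasta.txt (illustration only).
example = """\
>1WCM:A
MVGQQYSSAPLRTVKEVQFGLFSPEEV
>1WCM:B
MSDLANSEKYYDEDPYG
"""

chains = parse_fasta(example)
for chain_id, sequence in chains.items():
    print(chain_id, len(sequence))
```

Run against the real file (for example, `parse_fasta(open("1WCM.fasta.txt").read())`), the number of keys and the summed sequence lengths can be compared with the 12 chains and 4582 residues quoted above.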

**Electron Density Map**
The electron density map of the entire RNA Pol II complex is at 20.9 angstrom resolution. The raw data file for this map is stored in `emd_1883.map.mrc`.

<img src="rnapolii_em_raw.png" width="300px" />
_Electron microscopy density map for yeast RNA Polymerase II_

**Electron Density as Gaussian Mixture Models**
Gaussian mixture models (GMMs) are used to greatly speed up scoring by approximating the electron density of both the individual subunits and the experimental EM map. A GMM has been created for the experimental density map and is stored in `emd_1883.map.mrc.gmm.50.mrc`. The weight, center, and covariance matrix of each Gaussian used to approximate the original EM density can be seen in the corresponding `.txt` file.

<img src="rnapolii_em_gmm_50.png" width="250px" />
_The EM data represented as a 50-Gaussian mixture model_
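
To make the role of the weight, center, and covariance matrix concrete, the sketch below evaluates a GMM density at a point as a weighted sum of 3D Gaussians. It is an illustration only: the function names are hypothetical and not part of the modeling software, the components are assumed to be already loaded as `(weight, center, covariance)` tuples, and the layout of the `.txt` file is not assumed or parsed here.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def gaussian_density(x, center, cov):
    """Value of a normalized 3D Gaussian with the given center and covariance."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    inv = inv3(cov)
    # Squared Mahalanobis distance: diff^T * cov^-1 * diff
    m2 = sum(diff[r] * inv[r][c] * diff[c] for r in range(3) for c in range(3))
    norm = math.sqrt((2.0 * math.pi) ** 3 * det3(cov))
    return math.exp(-0.5 * m2) / norm


def gmm_density(x, components):
    """Weighted sum of Gaussians; components is a list of (weight, center, cov)."""
    return sum(w * gaussian_density(x, mu, cov) for w, mu, cov in components)


# Two made-up components (not taken from the tutorial data).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
components = [
    (2.0, (0.0, 0.0, 0.0), identity),
    (1.0, (5.0, 0.0, 0.0), [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
]
print(gmm_density((1.0, 0.0, 0.0), components))
```

With only 50 components, evaluating or comparing densities this way involves far fewer terms than working voxel by voxel on the raw map, which is where the speed-up comes from.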

High resolution coordinates for all 12 chains of RNA Pol II are found in `1WCM.pdb`.

<img src="rnapolii_all_1wc4.png" width="300px" />
_Coordinates from PDBID [1WCM](http:

**Chemical Cross-Links**
All chemical cross-linking data are located in `polii_xlinks.csv` and `polii_juri.csv`. These files contain multiple comma-separated columns; four of these specify the protein and residue number for each of the two cross-linked residues.
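
As an illustration of how the four identifying columns can be pulled out of such a file, here is a minimal sketch using Python's standard `csv` module. The column names (`prot1`, `res1`, `prot2`, `res2`), the helper `read_crosslinks`, and the sample rows are all assumptions made for this example; check the header row of each real file, since the two data sets may name their columns differently.

```python
import csv
import io

# Made-up sample with hypothetical column names (not the tutorial data).
SAMPLE = """\
prot1,res1,prot2,res2
ProtA,10,ProtA,25
ProtA,40,ProtB,7
"""


def read_crosslinks(handle, prot1="prot1", res1="res1", prot2="prot2", res2="res2"):
    """Return a list of ((protein, residue), (protein, residue)) pairs.

    The keyword arguments name the four columns to read, so the same
    function can be pointed at files with different header rows.
    """
    links = []
    for row in csv.DictReader(handle):
        first = (row[prot1], int(row[res1]))
        second = (row[prot2], int(row[res2]))
        links.append((first, second))
    return links


links = read_crosslinks(io.StringIO(SAMPLE))
# Cross-links between two different proteins.
inter_protein = [pair for pair in links if pair[0][0] != pair[1][0]]
print(len(links), "cross-links,", len(inter_protein), "between different proteins")
```

For a real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("polii_xlinks.csv", newline="")`, and pass the actual column names.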

The length of the DSS/BS3 cross-linker reagent, 21 angstroms, will be specified later in the modeling script.
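
To give a feel for what this length means geometrically, the sketch below checks whether two points lie within 21 angstroms of each other. This is only a simple upper-bound check on made-up coordinates, not the cross-link scoring used in the modeling script; `within_crosslink_range` is a hypothetical helper.

```python
import math

CROSSLINKER_LENGTH = 21.0  # angstroms, the DSS/BS3 length quoted above


def within_crosslink_range(xyz1, xyz2, max_length=CROSSLINKER_LENGTH):
    """True if the Euclidean distance between two points is at most max_length."""
    return math.dist(xyz1, xyz2) <= max_length


# Made-up coordinates in angstroms (illustration only).
print(within_crosslink_range((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(within_crosslink_range((0.0, 0.0, 0.0), (0.0, 0.0, 30.0)))
```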

With the data gathered, we can now proceed to \ref rnapolii_2.