Generating a model using multiple templates with different ligands
Hi,
I have been trying to generate a model of the AhR using three different templates with different ligands in each of them. When I submit the script for the model building, I get the error below:
"_modeller.ModellerError: read_te_290E> Number of residues in the alignment and pdb files are different: 110 109 For alignment entry: 1 3f1o_pasA"
I tried changing the number highlighted in the alignment file to 110, but the same error appeared. Was I supposed to change anything in the PDB file? Also, since I would like to consider the three ligands, is it correct to add three dots at the end of the alignment and .pap files? The alignment file, .pap file, and model-building script are below.
Model building script:
# Comparative modeling with ligand transfer from the template
from modeller import *              # Load standard Modeller classes
from modeller.automodel import *    # Load the AutoModel class
import sys

log.verbose()    # request verbose output
env = Environ()  # create a new MODELLER environment to build this model in

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Read in HETATM records from template PDBs
env.io.hetatm = True

a = AutoModel(env,
              alnfile='ahr-mult.ali',
              knowns=('3f1o_pasA','3h7w_pasA','3h82_pasA'),
              sequence='AhR',
              assess_methods=(assess.DOPE))
a.starting_model = 1    # index of the first model
a.ending_model = 100    # index of the last model
                        # (determines how many models to calculate)
a.make()                # do the actual comparative modeling
Alignment file:
>P1;3f1o_pasA
structureX:3f1o_pas_fit.pdb:236:A:+109:A:MOL_ID 1; MOLECULE ENDOTHELIAL PAS DOMAIN-CONTAINING PROTEIN 1; CHAIN A; FRAGMENT HIF2 ALPHA C-TERMINAL PAS DOMAIN; SYNONYM EPAS-1, MEMBER OF PAS PROTEIN 2, BASIC-HELIX-LOOP- PROTEIN MOP2, HYPOXIA-INDUCIBLE FACTOR 2 ALPHA, HIF-2 ALPHA ALPHA, HIF-1 ALPHA-LIKE FACTOR, HLF; ENGINEERED YES; MUTATION YES; MOL_ID 2; MOLECULE ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR; CHAIN B; FRAGMENT ARNT C-TERMINAL PAS DOMAIN; SYNONYM ARNT PROTEIN, DIOXIN RECEPTOR, NUCLEAR TRANSLOCATO HYPOXIA-INDUCIBLE FACTOR 1 BETA, HIF-1 BETA; ENGINEERED YES; MUTATION YES:MOL_ID 1; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE EPAS1, HIF2A, HYPOXIA INDUCIBLE FACTOR 2 ALPHA, MOP2; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21; EXPRESSION_SYSTEM_VECTOR_TYPE PHIS-GB1-PARALLEL; MOL_ID 2; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE ARNT, ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21; EXPRESSION_SYSTEM_VECTOR_TYPE PHIS-PARALLEL: 1.60: 0.17
-FKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKGQVVSGQYR
MLAKHGGYVWLETQGTVIYN-----PQCIMCVNYVLSEIEK.*
>P1;3h7w_pasA
structureX:3h7w_pas_fit.pdb:236:A:+108:A:MOL_ID 1; MOLECULE ENDOTHELIAL PAS DOMAIN-CONTAINING PROTEIN 1; CHAIN A; FRAGMENT HIF2ALPHA C-TERMINAL PAS DOMAIN (UNP RESIDUES 239 SYNONYM EPAS-1, MEMBER OF PAS PROTEIN 2, BASIC-HELIX-LOOP- PROTEIN MOP2, HYPOXIA-INDUCIBLE FACTOR 2 ALPHA, HIF-2 ALPHA ALPHA, HIF-1 ALPHA-LIKE FACTOR, HLF; ENGINEERED YES; MUTATION YES; MOL_ID 2; MOLECULE ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR; CHAIN B; FRAGMENT ARNT C-TERMINAL PAS DOMAIN (UNP RESIDUES 356 TO 4 SYNONYM ARNT PROTEIN, CLASS E BASIC HELIX-LOOP-HELIX PROTE BHLHE2, DIOXIN RECEPTOR, NUCLEAR TRANSLOCATOR, HYPOXIA-INDU FACTOR 1 BETA, HIF-1 BETA; ENGINEERED YES; MUTATION YES:MOL_ID 1; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE EPAS1, HIF2A, HYPOXIA INDUCIBLE FACTOR 2 ALPHA, MOP2; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3); EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PHIS-GB1-HIF2APAS-B; MOL_ID 2; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE ARNT, ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR, EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3); EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PHIS-GB1-ARNT-PAS-B: 1.65: 0.20
-FKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKGQVVSGQYR
MLAKHGGYVWLETQGTVIY------PQCIMCVNYVLSEIEK.*
>P1;3h82_pasA
structureX:3h82_pas_fit.pdb:-1:A:+115:A:MOL_ID 1; MOLECULE ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR; CHAIN B; FRAGMENT ARNT C-TERMINAL PAS DOMAIN (UNP RESIDUES 356 TO 4 SYNONYM ARNT PROTEIN, CLASS E BASIC HELIX-LOOP-HELIX PROTE BHLHE2, DIOXIN RECEPTOR, NUCLEAR TRANSLOCATOR, HYPOXIA-INDU FACTOR 1 BETA, HIF-1 BETA; ENGINEERED YES; MUTATION YES; MOL_ID 2; MOLECULE ENDOTHELIAL PAS DOMAIN-CONTAINING PROTEIN 1; CHAIN A; FRAGMENT HIF2ALPHA C-TERMINAL PAS DOMAIN (UNP RESIDUES 239 SYNONYM EPAS-1, MEMBER OF PAS PROTEIN 2, BASIC-HELIX-LOOP- PROTEIN MOP2, HYPOXIA-INDUCIBLE FACTOR 2 ALPHA, HIF-2 ALPHA ALPHA, HIF-1 ALPHA-LIKE FACTOR, HLF; ENGINEERED YES; MUTATION YES:MOL_ID 1; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE ARNT, ARYL HYDROCARBON RECEPTOR NUCLEAR TRANSLOCATOR, EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3); EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PHIS-GB1-ARNT-PAS-B; MOL_ID 2; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE EPAS1, HIF2A, HYPOXIA INDUCIBLE FACTOR 2 ALPHA, MOP2; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3); EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PHIS-GB1-HIF2APAS-B: 1.50: 0.20
EFKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKGQVVSGQYR
MLAKHGGYVWLETQGTVIYNPRNLQPQCIMCVNYVLSEIEK.*
>P1;AhR
sequence:AhR:: :: :::-1.00:-1.00
EIR-TKNFIFRTKHKLDFTPIGCDAKGRIVLGYTEAELCTRGSGYQFIHAADMLYCAESHIRMIKTGESGMIVFR
LLTKNNRWTWVQSNARLLYK--NGRPDYIIVTQRPLTDEEG...*
pap file:
_aln.pos           10        20        30        40        50        60
3f1o_pasA  -FKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKG
3h7w_pasA  -FKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKG
3h82_pasA  EFKGLDSKTFLSEHSMDMKFTYCDDRITELIGYHPEELLGR-SAYEFYHALDSENMTKSHQNLCTKG
AhR        EIR-TKNFIFRTKHKLDFTPIGCDAKGRIVLGYTEAELCTRGSGYQFIHAADMLYCAESHIRMIKTG
_consrvd            *   *  *     **       **   **  * * * * ** *      **      *

_aln.pos    70        80        90       100       110
3f1o_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIYN-----PQCIMCVNYVLSEIEK/.--
3h7w_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIY------PQCIMCVNYVLSEIEK/.--
3h82_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIYNPRNLQPQCIMCVNYVLSEIEK/.--
AhR        ESGMIVFRLLTKNNRWTWVQSNARLLYK--NGRPDYIIVTQRPLTDEEG/...
_consrvd          * * *     *        *      *  *      *   *
Best Regards,
Amanda F. Ghilardi
On 9/20/22 12:54 PM, Franceschini Ghilardi, Amanda (BIDMC - Lijun Sun - General Surg SF) via modeller_usage wrote:
> I have been trying to generate a model of the AhR using three different
> templates with different ligands in each of them. When I submit the
> script for the model building, I get the error below:
>
> "_modeller.ModellerError: read_te_290E> Number of residues in the
> alignment and pdb files are different: 110 109 For alignment entry: 1 3f1o_pasA"
The sequence in your alignment file must match that in the PDB file exactly.
The "236:A:+109:A" syntax in your alignment file header instructs Modeller to read residues from the PDB file starting at residue 236 in chain A, and ending once it hits the end of the file or has read 109 residues, whichever happens first.
> I tried changing the number highlighted in the alignment file to 110,
> but the same error appeared.
In that case you must have reached the end of the file. See also https://salilab.org/modeller/FAQ.html#17
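To check how many residues the PDB file actually holds from the starting residue onward, a small standalone script can help. The sketch below is only a rough diagnostic under stated assumptions: `count_residues` is a hypothetical helper (not part of Modeller), it relies on the standard fixed-column PDB layout, it only looks at records in the requested chain, and it does not reproduce Modeller's own reader.

```python
def count_residues(pdb_lines, chain, first_resnum, include_hetatm=True):
    """Count residues in `chain` from residue `first_resnum` to the end of the file.

    Hypothetical diagnostic helper; assumes standard fixed-column PDB records.
    """
    count = 0
    last_key = None
    started = False
    for line in pdb_lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        if record == "HETATM" and not include_hetatm:
            continue
        if len(line) < 26 or line[21] != chain:
            continue
        resnum = int(line[22:26])
        if not started:
            # Skip everything before the requested starting residue
            if resnum != first_resnum:
                continue
            started = True
        # Residue name plus number and insertion code identify one residue
        key = (line[17:20], line[22:27])
        if key != last_key:
            count += 1
            last_key = key
    return count
```

For the first template this could be called as `count_residues(open('3f1o_pas_fit.pdb'), 'A', 236)`. The 3f1o_pasA entry in the alignment contains 109 amino acid letters plus one dot, so the file would need to supply 110 residues (including the ligand HETATM residue) for the two counts to agree.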
> Also, since I would like to consider the three ligands, is
> it correct to add three dots at the end of the alignment and pap file?
Yes, you would need three dots in your target sequence. Align each dot with a dot in the relevant template, and Modeller will copy the ligand from template to target.
> _aln.pos    70        80        90       100       110
> 3f1o_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIYN-----PQCIMCVNYVLSEIEK/.--
> 3h7w_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIY------PQCIMCVNYVLSEIEK/.--
> 3h82_pasA  QVVSGQYRMLAKHGGYVWLETQGTVIYNPRNLQPQCIMCVNYVLSEIEK/.--
> AhR        ESGMIVFRLLTKNNRWTWVQSNARLLYK--NGRPDYIIVTQRPLTDEEG/...
> _consrvd          * * *     *        *      *  *      *   *
This isn't right - you have asked Modeller to take the first ligand from all three templates (you will likely get a nonsensical ligand containing atoms from all three template ligands) and have not told it where to get the other two ligands from. You need to add gaps to your alignment so that the dots line up correctly, just as is done for amino acids. For example, if the first ligand should come from the first template, the second ligand from the second template, and the third ligand from the third template, your alignment would look like:
>P1;3f1o_pasA ... SEIEK/.--*
>P1;3h7w_pasA ... SEIEK/-.-*
>P1;3h82_pasA ... SEIEK/--.*
>P1;AhR ... TDEEG/...*
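The column rule above can be checked before running Modeller. The following is a minimal sketch, assuming the aligned rows have already been read into plain strings of equal length; `ligand_sources` is a hypothetical helper, and only the tail of each row is shown here. For each dot in the target row it lists the templates that have a dot in the same column.

```python
def ligand_sources(templates, target):
    """For each '.' in the target row, list the templates with '.' in that column."""
    sources = []
    for col, char in enumerate(target):
        if char == ".":
            sources.append([name for name, row in templates.items()
                            if col < len(row) and row[col] == "."])
    return sources


# Tails of the aligned rows only (illustrative, not the full sequences)
original = {"3f1o_pasA": "SEIEK/.--",
            "3h7w_pasA": "SEIEK/.--",
            "3h82_pasA": "SEIEK/.--"}
corrected = {"3f1o_pasA": "SEIEK/.--",
             "3h7w_pasA": "SEIEK/-.-",
             "3h82_pasA": "SEIEK/--."}
target = "TDEEG/..."

print(ligand_sources(original, target))
print(ligand_sources(corrected, target))
```

With the original rows, the first target dot is matched by all three templates and the other two by none; with the corrected rows, each target dot is matched by exactly one template.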
Ben Webb, Modeller Caretaker