Hello,

I am building a system using PMI and IMP.pmi.topology.TopologyReader. I have proteins, DNA and RNA. In my PDBs, the DNA is encoded as DA, DC, DT, DG and the RNA as A, C, U, G. In the topology file, I specify the molecules as DNA and RNA:

|DNA1N |dark magenta|myfasta.fasta|DNAN,DNA

If I have A, C, G, U and T in my fasta file, I get warnings like the following for DNA:

WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 1 "G" and PDB type "DG"
WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 2 "G" and PDB type "DG"
WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 3 "G" and PDB type "DG"

The script then continues and modeling proceeds. However, when I try to implement the "deposition" part covered in https://integrativemodeling.org/tutorials/deposition/ , several errors occur because the residue types are not recognized, since the ihm module uses the peptide alphabet by default. How should the DNA be encoded in the fasta file?
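To illustrate the mismatch, here is a minimal stand-alone sketch. It is not PMI code: the helper find_mismatches and the mapping table are hypothetical, and I am only assuming that the check compares the one-letter FASTA code directly with the PDB residue name.

```python
# Hypothetical helper, not part of IMP/PMI.
# Assumed mapping from PDB residue names to one-letter FASTA codes.
PDB_TO_ONE_LETTER = {
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",  # DNA as named in my PDBs
    "A": "A", "C": "C", "G": "G", "U": "U",      # RNA as named in my PDBs
}


def find_mismatches(fasta_seq, pdb_resnames, translate=True):
    """Return (position, fasta_code, pdb_name) for each disagreement."""
    mismatches = []
    for pos, (code, resname) in enumerate(zip(fasta_seq, pdb_resnames), start=1):
        # Without translation the raw PDB name is compared to the FASTA code.
        expected = PDB_TO_ONE_LETTER.get(resname, resname) if translate else resname
        if code != expected:
            mismatches.append((pos, code, resname))
    return mismatches


pdb_names = ["DG", "DG", "DG"]
# A direct string comparison flags every position, like the warnings above.
print(find_mismatches("GGG", pdb_names, translate=False))
# With the assumed mapping, the same FASTA sequence is consistent.
print(find_mismatches("GGG", pdb_names, translate=True))
```

If the comparison really works this way, the warnings alone would be harmless, and the open question is only which alphabet the deposition step expects.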

Additionally, we have two copies of one protein. As a result, the second copy is not created as an asymmetric unit in create_component in the protocol output of mmcif.py. We could fix this by changing the function as follows:

    def create_component(self, state, name, modeled, asym_name=None):
        if asym_name is None:
            asym_name = name
        new_comp = name not in self._all_components
        self._all_components[name] = None
        if modeled:
            state.all_modeled_components.append(name)
            # this was originally inside the "if new_comp" block below
            self.asym_units[asym_name] = None
            if new_comp:
                # assign asym once we get sequence
                self.all_modeled_components.append(name)
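The effect of the change can be shown with a small stand-alone sketch. ToyOutput is a simplified stand-in for the bookkeeping only, not the actual class in mmcif.py, and the names "ProtA", "ProtA.0" and "ProtA.1" are made up. I am assuming that both copies share one component name but arrive with different asym_name values.

```python
class ToyOutput:
    """Simplified stand-in for the bookkeeping done in create_component."""

    def __init__(self, always_register_asym):
        # False mimics the original behaviour, True mimics the proposed change.
        self.always_register_asym = always_register_asym
        self._all_components = {}
        self.asym_units = {}

    def create_component(self, name, modeled, asym_name=None):
        if asym_name is None:
            asym_name = name
        new_comp = name not in self._all_components
        self._all_components[name] = None
        # Original: the asym unit is registered only for a new component name.
        # Proposed: it is registered for every modeled copy.
        if modeled and (new_comp or self.always_register_asym):
            self.asym_units[asym_name] = None


def asym_names(always_register_asym):
    out = ToyOutput(always_register_asym)
    out.create_component("ProtA", True, asym_name="ProtA.0")
    out.create_component("ProtA", True, asym_name="ProtA.1")
    return list(out.asym_units)


print(asym_names(False))  # only the first copy is registered
print(asym_names(True))   # both copies are registered
```

Under these assumptions the first call lists only "ProtA.0", while the second lists both copies, which is the behaviour we need.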

Thanks,

Swantje

Swantje Lenz
M. Sc. Biotechnology

Fachgebiet Bioanalytik (TIB 4/4-3), Institut für Biotechnologie, Technische Universität Berlin
Technische Universität Berlin | Gustav-Meyer-Allee 25 | Gebäude 17a | 13355 Berlin
Tel +49 30 314-72906 | web: http://www.bioanalytik.tu-berlin.de/