Hello,
I am building a system using PMI and IMP.pmi.topology.TopologyReader. I have proteins, DNA and RNA. In my PDB files, the DNA residues are named DA, DC, DT, DG and the RNA residues A, C, U, G. In the topology file, I specify the molecules as DNA and RNA:
|DNA1N |dark magenta|myfasta.fasta|DNAN,DNA
|longRNAR1 |magenta |myfasta.fasta|longRNAR,RNA|
If I have A, C, G, U and T in my FASTA file, I get the following warnings and errors for DNA:
WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 1 "G" and PDB type "DG"
WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 2 "G" and PDB type "DG"
WARNING: Inconsistency between FASTA sequence and PDB sequence. FASTA type 3 "G" and PDB type "DG"
The script then continues and modeling proceeds. However, when I try to implement the "deposition" part covered in https://integrativemodeling.org/tutorials/deposition/, there are several errors due to residue types not being recognized, since the alphabet is defined as the peptide alphabet in the ihm module by default. How should the DNA be encoded in the FASTA file?
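To make the mismatch concrete, here is a minimal sketch of the comparison as I understand it. This is not IMP or ihm code: the table PDB_NAMES and the function find_mismatches are hypothetical names I made up for illustration, and the mapping only covers the residue names that appear in my PDB files. It shows that a literal comparison of the one-letter FASTA code against the PDB residue name flags every DNA residue, whereas translating the code through a DNA-specific table would not.

```python
# Hypothetical helper for illustration only; not part of IMP or python-ihm.
# Maps one-letter FASTA codes to the residue names used in my PDB files.
PDB_NAMES = {
    "DNA": {"A": "DA", "C": "DC", "G": "DG", "T": "DT"},
    "RNA": {"A": "A", "C": "C", "G": "G", "U": "U"},
}


def find_mismatches(fasta_seq, pdb_residues, molecule_type=None):
    """Return (position, fasta_code, pdb_name) for each inconsistent residue.

    With molecule_type=None the FASTA code is compared to the PDB name
    literally; with "DNA" or "RNA" it is translated through PDB_NAMES first.
    Positions are 1-based, as in the warnings above.
    """
    table = PDB_NAMES.get(molecule_type, {})
    mismatches = []
    for pos, (code, name) in enumerate(zip(fasta_seq, pdb_residues), start=1):
        expected = table.get(code, code)  # fall back to the raw code
        if expected != name:
            mismatches.append((pos, code, name))
    return mismatches


# Literal comparison: every DNA residue is reported as inconsistent.
literal = find_mismatches("GGG", ["DG", "DG", "DG"])

# Comparison through the DNA table: nothing is reported.
as_dna = find_mismatches("GGG", ["DG", "DG", "DG"], molecule_type="DNA")

# RNA residue names already match the one-letter codes.
as_rna = find_mismatches("ACUG", ["A", "C", "U", "G"], molecule_type="RNA")
```

With the literal comparison, the three G residues come back as (1, "G", "DG"), (2, "G", "DG") and (3, "G", "DG"), which matches the pattern of the warnings I see.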
Additionally, we have two copies of one protein. As a result, the second copy is not created as an asymmetric unit in create_component in the protocol output of mmcif.py. We could fix this by changing the function as follows: