File naming

There are several filename generating mechanisms that facilitate file handling. Not all of them apply to all file types.

Environment variables

Any input or output filename can contain UNIX shell environment variables, written as ${VARNAME} or $(VARNAME). In addition, two predefined macros are available for string variables:

${LIB} is expanded into the $LIB_MODELLER variable defined in modlib/libs.lib (equal to $MODINSTALL9v13/modlib);
${JOB} is expanded into the root of the script filename, or '(stdin)' if instructions are being read from standard input.
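The expansion rules above can be illustrated with a minimal sketch in plain Python. This is not MODELLER's own code: the function name expand_filename and its parameters are hypothetical, and two behaviors are assumptions made for the example (the predefined macros take priority over the environment, and unknown variables are left unexpanded).

```python
import re

# Matches ${NAME} or $(NAME); one of the two groups captures NAME.
_VARIABLE = re.compile(r"\$\{(\w+)\}|\$\((\w+)\)")


def expand_filename(name, env, lib_dir, job_root):
    """Expand ${VAR} and $(VAR) in name (hypothetical helper, not MODELLER API)."""
    # The two predefined macros described above.
    predefined = {"LIB": lib_dir, "JOB": job_root}

    def substitute(match):
        var = match.group(1) or match.group(2)
        if var in predefined:
            # Assumption: predefined macros win over environment variables.
            return predefined[var]
        # Assumption: unknown variables are left exactly as written.
        return env.get(var, match.group(0))

    return _VARIABLE.sub(substitute, name)


print(expand_filename("${JOB}.ali", {}, "/opt/modlib", "model-default"))
print(expand_filename("$(HOME)/1abc.pdb", {"HOME": "/home/user"},
                      "/opt/modlib", "model-default"))
```

Passing the environment as a dictionary (for example os.environ) keeps the sketch easy to test in isolation.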

Reading or writing files

Any input file for MODELLER (alignments, PDB files, etc.) can be compressed. If the name of an input file ends with a '.Z', '.gz', '.bz2', or '.7z' extension, or the specified input file cannot be found but a compressed version (with one of these extensions) exists, then MODELLER automatically uncompresses the file before reading it. (Note that it uses the gzip, bzip2 and 7za programs to do this, so they must be installed on your system for this to work. Also, any '.7z' archive must contain only a single member, which is the file to be uncompressed, just as with '.gz' or '.bz2' files.) The uncompressed copy of the file is created in the system temporary directory (deduced by checking the 'MODELLER_TMPDIR', 'TMPDIR', 'TMP' and 'TEMP' environment variables in that order, falling back to /tmp on Unix and C:\ on Windows), or in the current working directory if the temporary directory is read-only.
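The extension check and the temporary directory lookup described above can be sketched as follows. The helper names are hypothetical and the code only mirrors the documented rules; in particular, the order in which compressed variants are listed is an assumption, and the read-only fallback to the working directory is not modeled.

```python
COMPRESSED_EXTENSIONS = (".Z", ".gz", ".bz2", ".7z")
TMPDIR_VARIABLES = ("MODELLER_TMPDIR", "TMPDIR", "TMP", "TEMP")


def is_compressed(filename):
    """Return True if the name ends with a recognized compressed extension."""
    return filename.endswith(COMPRESSED_EXTENSIONS)


def compressed_variants(filename):
    """Names to look for when the plain file is missing.

    Assumption: the extensions are tried in the order listed in the text.
    """
    return [filename + ext for ext in COMPRESSED_EXTENSIONS]


def pick_temp_dir(env, on_windows=False):
    """Return the first non-empty temporary directory variable, else the default."""
    for variable in TMPDIR_VARIABLES:
        value = env.get(variable)
        if value:
            return value
    return "C:\\" if on_windows else "/tmp"


print(is_compressed("1abc.pdb.gz"))
print(compressed_variants("1abc.pdb"))
print(pick_temp_dir({"TMP": "/var/tmp", "TEMP": "/scratch"}))
```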

Any file written out by MODELLER can also be compressed. If the output file name ends with a '.gz' or '.bz2' extension, a temporary uncompressed copy is created in the same way as above, and when the file is closed, it is compressed with gzip or bzip2 and placed in the final location. (Writing out files in '.Z' or '.7z' format is not currently supported.)
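The write-then-compress sequence can be pictured with the sketch below. It is an illustration only: it uses Python's gzip module in place of the external gzip program that MODELLER runs, and write_compressed is a hypothetical name.

```python
import gzip
import os
import shutil
import tempfile


def write_compressed(final_path, text):
    """Write text to a temporary plain file, then gzip it into final_path."""
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp")
    try:
        # Step 1: the data first goes to an uncompressed temporary copy.
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # Step 2: on "close", the copy is compressed into the final location.
        with open(tmp_path, "rb") as src, gzip.open(final_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    finally:
        # The temporary copy is removed whether or not compression succeeded.
        os.remove(tmp_path)


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example.ali.gz")
write_compressed(out_path, ">P1;example\n")
with gzip.open(out_path, "rt", encoding="utf-8") as handle:
    print(handle.read(), end="")
```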

Many MODELLER functions that take file names can also be given file handles; these can either be modfile.File() objects or Python filelike objects such as sys.stdout.

Coordinate files and derivative data

When accessing an atom file, if MODELLER cannot find the specified filename or a compressed version of it (see above), it tries adding the extensions '.atm', '.pdb', '.ent', and '.crd' in this order, then also with the 'pdb' prefix. If the filename is not an absolute path (i.e., it does not start with '/'), this search is repeated through all the directories in io_data.atom_files_directory. PDB-style subdirectories (named after the first two characters after the digit in the PDB code) are also searched within each directory; for example, 1abc is searched for in the 'ab' subdirectory, and pdb4xyz.ent in the 'xy' subdirectory.
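The search described above can be sketched as a function that lists candidate paths. All names here are hypothetical, and the exact interleaving of candidates (for instance, whether the 'pdb' prefix is also tried without an extension) is an assumption of this sketch, not documented behavior. The posixpath module is used so that the result is the same on any platform.

```python
import posixpath
import re

ATOM_EXTENSIONS = (".atm", ".pdb", ".ent", ".crd")


def name_variants(filename):
    """The name as given, then with each extension, then again with a 'pdb' prefix."""
    directory, base = posixpath.split(filename)
    variants = []
    for prefix in ("", "pdb"):
        stem = posixpath.join(directory, prefix + base)
        variants.append(stem)
        variants.extend(stem + ext for ext in ATOM_EXTENSIONS)
    return variants


def pdb_subdir(filename):
    """The two characters after the first digit of the base name, or None."""
    match = re.search(r"\d(\w\w)", posixpath.basename(filename))
    return match.group(1) if match else None


def search_candidates(filename, directories):
    """Candidate paths for an atom file, in the order this sketch would try them."""
    candidates = name_variants(filename)
    if posixpath.isabs(filename):
        return candidates  # absolute paths are not searched in the directory list
    subdir = pdb_subdir(filename)
    for directory in directories:
        for variant in name_variants(filename):
            candidates.append(posixpath.join(directory, variant))
            if subdir:
                candidates.append(posixpath.join(directory, subdir, variant))
    return candidates


print(pdb_subdir("1abc"), pdb_subdir("pdb4xyz.ent"))
print(search_candidates("1abc", ["/data/pdb"])[:6])
```

The directory list passed in plays the role of io_data.atom_files_directory.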

Any derivative data that MODELLER may need, including residue solvent accessibilities, hydrogen bonding information, dihedral angles, residue neighbors, etc., are calculated on demand from the atomic coordinates. The most time consuming operation is calculating solvent accessibility, but even this calculation takes less than 1 sec for a 200 residue protein on a Pentium III workstation.

MODELLER stores the filenames of coordinate sets in the alignment arrays. These arrays are used by alignment.compare_structures(), Restraints.make(), alignment.malign3d(), alignment.align2d(), and several other commands. If these filenames have not changed when the structures are needed a second time, the coordinate files are not re-read because they should already be in memory. This creates a problem only when the contents of a structure file have changed since it was last read during the current job.
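The consequence of caching by filename can be shown with a toy cache. This is a conceptual sketch, not how MODELLER stores coordinates; StructureCache and demo are hypothetical names. A file that is rewritten under the same name during the job is not read again, so the stale contents are returned.

```python
import os
import tempfile


class StructureCache:
    """Toy cache keyed by filename; a name seen before is never re-read."""

    def __init__(self):
        self._by_name = {}
        self.reads = 0  # number of times a file was actually opened

    def get(self, filename):
        if filename not in self._by_name:
            with open(filename, encoding="utf-8") as handle:
                self._by_name[filename] = handle.read()
            self.reads += 1
        return self._by_name[filename]


def demo():
    """Read a file, change it on disk, and read it again through the cache."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "1abc.pdb")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("version 1\n")
        cache = StructureCache()
        first = cache.get(path)
        # The file changes on disk, but its name stays the same.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("version 2\n")
        second = cache.get(path)
        return first, second, cache.reads


print(demo())
```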

Unicode

MODELLER supports Unicode for file naming, so files named using non-English characters can be accessed. To access such a file, specify the file name in your data file (e.g. alignment file) or Python 2 script in UTF-8 encoding. MODELLER will raise a UnicodeError if your filenames are not valid UTF-8. (If using Python 3, you need do nothing special, since it already understands Unicode.) Since UTF-8 is a superset of ASCII, you also need do nothing special if you are using only English characters.
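The two encoding facts used above (invalid UTF-8 produces a UnicodeError, and ASCII names are already valid UTF-8) can be checked with standard Python 3. The helper is_valid_utf8 is a hypothetical name for this sketch and is not part of MODELLER.

```python
def is_valid_utf8(raw):
    """Return True if the bytes decode as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeError:  # UnicodeDecodeError is a subclass of UnicodeError
        return False
    return True


name = "mod\u00e8le.pdb"  # a filename containing a non-English character

print(is_valid_utf8(name.encode("utf-8")))    # UTF-8 bytes are accepted
print(is_valid_utf8(name.encode("latin-1")))  # Latin-1 bytes are rejected

# An ASCII-only name has identical bytes in ASCII and UTF-8.
print("model.pdb".encode("ascii") == "model.pdb".encode("utf-8"))
```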

MODELLER input files are assumed to be UTF-8 encoded. However, most of the data MODELLER handles is not Unicode-enabled (for example, PDB files and one letter residue types have to be ASCII, not Unicode), so you should not use non-English characters, except in filenames.

Automatic builds 2014-02-11