
File naming

There are several filename generating mechanisms that facilitate file handling. Not all of them apply to all file types.

Environment variables

Any input or output filename may contain UNIX shell environment variables, which must be written in the format ${VARNAME} or $(VARNAME). Four predefined macros are also available for string variables.
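As an illustration of the two accepted variable formats, the following Python sketch expands ${VARNAME} and $(VARNAME) references in a filename. It is not MODELLER's own code: the function name expand_env is hypothetical, and leaving unset variables untouched is an assumption, since the behavior for undefined variables is not stated here.

```python
import os
import re

# Matches ${VARNAME} (group 1) or $(VARNAME) (group 2).
_VAR = re.compile(r'\$\{(\w+)\}|\$\((\w+)\)')

def expand_env(filename, env=None):
    """Expand ${VAR} and $(VAR) references in filename.

    Assumption: a variable that is not set is left as written.
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1) or match.group(2)
        return env.get(name, match.group(0))

    return _VAR.sub(replace, filename)

print(expand_env('${HOME}/models/1abc.pdb', {'HOME': '/u/me'}))
print(expand_env('$(PDB)/1abc.pdb', {'PDB': '/pdb'}))
```

Python's standard os.path.expandvars handles the ${VARNAME} form but not $(VARNAME), which is why the sketch uses an explicit pattern.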

Automatic filename generation

For any filename, input or output, if the value of the variable is 'default' (case insensitive), the actual filename is constructed within the routine that uses the filename. The name is constructed by the same rule as that for the ${DEFAULT} environment variable (Section 2.1.4). The only difference between the two cases is timing. With SET FILE = 'default', the name is built when the filename is used, so the result may not be what you expect if the TOP variables defining the filename change between the SET command and the command that uses the filename. With SET FILE = '${DEFAULT}', the filename FILE is constructed during the SET command itself, so later changes to those variables have no effect.
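The timing difference can be modeled with a small Python sketch. Everything in it is hypothetical: the naming rule in build_default_name is a stand-in for the real rule of Section 2.1.4, and the keys 'root' and 'ext' are invented for the illustration. Only the contrast between deferred and immediate construction is taken from the text above.

```python
def build_default_name(top_vars):
    # Hypothetical stand-in for the real naming rule (Section 2.1.4).
    return top_vars['root'] + top_vars['ext']

def set_file(value, top_vars):
    """Model SET FILE = value.

    '${DEFAULT}' is expanded immediately; the literal 'default'
    is stored unchanged and resolved later.
    """
    return value.replace('${DEFAULT}', build_default_name(top_vars))

def use_file(stored, top_vars):
    """Model the command that finally uses FILE."""
    if stored.lower() == 'default':
        return build_default_name(top_vars)
    return stored

top_vars = {'root': 'a', 'ext': '.out'}
deferred = set_file('default', top_vars)       # stays 'default'
immediate = set_file('${DEFAULT}', top_vars)   # becomes 'a.out' now

top_vars['root'] = 'b'                         # variables change in between

print(use_file(deferred, top_vars))    # built at use time
print(use_file(immediate, top_vars))   # fixed at SET time
```

In this toy model the deferred form follows the change to 'root' while the immediate form keeps the name that was current at the SET command.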

Directory prefixes

Input

For many input filenames, the full filename is obtained by searching for the file in the list of directories specified in the TOP variable DIRECTORY. The directories in DIRECTORY are separated by colons (':'), e.g., 'dir1:dir2:dir3:...'. DIRECTORY can also contain the current directory (' ' or './').

The directory prefix for input atom coordinate filenames is obtained in a similar way, except that ATOM_FILES_DIRECTORY is used instead of DIRECTORY. In addition, an atom coordinate file can be read by specifying only the protein code (see the section on coordinate files and derivative data below).

The list of directories is not scanned for input filenames that start with '/'.

In contrast, the INCLUDE_FILE file is searched for in the distribution's $BIN_MODELLER7v7 directory (equal to the $MODINSTALL7v7/bin directory) in addition to the DIRECTORY directories. This makes it easy to include the predefined system '__*.top' files with the INCLUDE command.
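The directory search for ordinary input files can be sketched in Python as follows. This is only an approximation under stated assumptions: the function name find_input_file is hypothetical, a blank entry in the list is treated as the current directory, and the first directory containing the file is assumed to win. The extra location searched for INCLUDE_FILE is not modeled.

```python
import os

def find_input_file(filename, directory):
    """Return the first existing path for filename, or None.

    directory is a colon-separated list such as 'dir1:dir2:dir3'.
    """
    # Names starting with '/' are used as given; the list is not scanned.
    if filename.startswith('/'):
        return filename if os.path.exists(filename) else None

    for entry in directory.split(':'):
        # Assumption: a blank entry stands for the current directory.
        prefix = entry.strip() or '.'
        candidate = os.path.join(prefix, filename)
        if os.path.exists(candidate):
            return candidate
    return None
```

For example, find_input_file('align.ali', './:/data/alignments') would check the current directory first and then /data/alignments (both the filename and the second directory are made up for the example).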

Output

For all output filenames, except for those that start with '/', the full output filename is obtained by prefixing the filename with OUTPUT_DIRECTORY.
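A minimal Python sketch of the output rule is given below. The function name output_path is hypothetical, and joining with a path separator is an assumption: the text only says the filename is prefixed with OUTPUT_DIRECTORY, so the real behavior may be plain string concatenation.

```python
import os

def output_path(filename, output_directory):
    """Return the full output filename for a given OUTPUT_DIRECTORY."""
    # Names starting with '/' are left unchanged.
    if filename.startswith('/'):
        return filename
    # Assumption: the prefix is joined as a directory component.
    return os.path.join(output_directory, filename)

print(output_path('model.log', '/tmp/run1'))
print(output_path('/scratch/model.log', '/tmp/run1'))
```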


Coordinate files and derivative data

When accessing an atom file, MODELLER tries the specified filename first. If this is unsuccessful, it expands the original filename by adding the extension '.Z', which allows it to detect atom files compressed with the UNIX compress command. If the compressed file exists, MODELLER automatically uncompresses it, reads it, and restores it to its original compressed state after reading is finished. If the specified file is still not found, the extensions '.atm', '.pdb', '.ent', and '.crd' are tried in this order, without and with the extension '.Z', then also with the 'pdb' prefix.

This search for the atom file is repeated through all the directories in ATOM_FILES_DIRECTORY (directories are separated by ':'), unless the input atom filename starts with '/', in which case ATOM_FILES_DIRECTORY is ignored.

Finally, if the search is still unsuccessful and the file specified by the environment variable $PDBENT exists, the coordinate filename (e.g., the 4-character PDB code) is matched against the list of full PDB filenames in $PDBENT (compressed and uncompressed). For example, the $PDBENT file may contain:

/disk2/pdb/pdb.pdb.bnl.gov/all_entries/uncompressed_files/pdb1ema.ent
/disk2/pdb/pdb.pdb.bnl.gov/all_entries/uncompressed_files/pdb1hbp.ent
/disk2/pdb/pdb.pdb.bnl.gov/all_entries/uncompressed_files/pdb1gpy.ent
/disk2/pdb/pdb.pdb.bnl.gov/all_entries/uncompressed_files/pdb6gpb.ent
/disk2/pdb/pdb.pdb.bnl.gov/all_entries/uncompressed_files/pdb1fia.ent
etc.
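The search order can be sketched in Python as a generator of candidate paths. This is one reading of the description, not MODELLER's implementation: the exact interleaving of extensions, '.Z' variants, the 'pdb' prefix, and directories is an assumption, as is the exact-basename matching rule used for the $PDBENT list. All function names are hypothetical.

```python
import os

EXTENSIONS = ['', '.atm', '.pdb', '.ent', '.crd']

def candidate_names(name):
    """Yield name variants: plain first, then with the 'pdb' prefix."""
    head, base = os.path.split(name)
    for prefix in ('', 'pdb'):
        for ext in EXTENSIONS:
            for z in ('', '.Z'):  # uncompressed, then compressed
                yield os.path.join(head, prefix + base + ext + z)

def atom_file_candidates(name, atom_files_directory):
    """Yield full candidate paths for an atom file."""
    if name.startswith('/'):
        # ATOM_FILES_DIRECTORY is ignored for such names.
        yield from candidate_names(name)
        return
    for entry in atom_files_directory.split(':'):
        for candidate in candidate_names(name):
            yield os.path.join(entry or '.', candidate)

def lookup_in_pdbent(code, pdbent_lines):
    """Find code in a list of full PDB filenames.

    Assumption: a match means the basename is 'pdb<code>.ent',
    optionally followed by '.Z'.
    """
    wanted = 'pdb' + code.lower() + '.ent'
    for line in pdbent_lines:
        path = line.strip()
        base = os.path.basename(path)
        if base.endswith('.Z'):
            base = base[:-2]
        if base == wanted:
            return path
    return None

print(list(candidate_names('1abc'))[:4])
# ['1abc', '1abc.Z', '1abc.atm', '1abc.atm.Z']
```

A caller would test each candidate for existence in turn and fall back to lookup_in_pdbent only when none is found.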

Any derivative data that MODELLER may need, including residue solvent accessibilities, hydrogen bonding information, dihedral angles, residue neighbors, etc., are calculated on demand from the atomic coordinates. The most time-consuming operation is calculating solvent accessibility, but even this calculation takes less than 1 second for a 200-residue protein on a Pentium III workstation.

MODELLER stores the filenames of coordinate sets in the alignment arrays. These arrays are used by COMPARE, MAKE_RESTRAINTS, MALIGN3D, ALIGN2D, and several other commands. If these filenames do not change when the structures are needed for the second time, the coordinate files are not re-read because they should already be in memory. This creates a problem only when the contents of a structure file change after the file was last read during the current job.
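The consequence of keying on the filename can be shown with a toy Python cache. This is a model of the behavior described above, not MODELLER's internal data structure; the class name CoordinateCache and the reader callback are hypothetical.

```python
class CoordinateCache:
    """Toy cache that re-reads a file only when its name is new."""

    def __init__(self, reader):
        self._reader = reader   # function: filename -> coordinates
        self._loaded = {}       # filename -> coordinates already in memory

    def get(self, filename):
        # Only the filename is compared; the file contents are not checked.
        if filename not in self._loaded:
            self._loaded[filename] = self._reader(filename)
        return self._loaded[filename]

# Simulated file contents that change during the job.
contents = {'1abc.pdb': 'version 1'}
cache = CoordinateCache(lambda name: contents[name])

first = cache.get('1abc.pdb')
contents['1abc.pdb'] = 'version 2'   # file is modified on disk
second = cache.get('1abc.pdb')       # same name, so no re-read

print(first, '|', second)
```

In this model the second request still returns the data read the first time, which mirrors the stale-data situation noted in the paragraph.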


Ben Webb 2004-10-04