If the format is PIR or FASTA:
Make sure that the sequences appear in the same order in chains_list and in seq_database_file; this makes access faster. The sequence file can of course contain more sequences than there are sequence codes in the codes file (chains_list).
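The following is a minimal sketch of why matching order helps, not MODELLER's actual implementation. It assumes a simple single-pass reader: when the requested codes are listed in file order, every one can be found in one forward scan. The function names (fasta_records, select_in_order) and the example data are hypothetical.

```python
import io


def fasta_records(lines):
    """Yield (code, sequence) pairs from an iterable of FASTA lines."""
    code, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if code is not None:
                yield code, "".join(chunks)
            code, chunks = line[1:].strip(), []
        elif line and code is not None:
            chunks.append(line)
    if code is not None:
        yield code, "".join(chunks)


def select_in_order(lines, codes):
    """Pick the requested codes in one forward pass over the file.

    Assumes the codes are listed in the same order as in the file;
    a code that appears earlier in the file than the previous match
    is never found, because the scan does not rewind.
    """
    wanted = iter(codes)
    target = next(wanted, None)
    for code, seq in fasta_records(lines):
        if target is None:
            break  # all requested codes found; stop reading early
        if code == target:
            yield code, seq
            target = next(wanted, None)


# Hypothetical example data: three sequences, two requested codes.
EXAMPLE_FASTA = """>seqA
MKV
LLA
>seqB
GGS
>seqC
PTW
"""

selected = list(select_in_order(io.StringIO(EXAMPLE_FASTA), ["seqA", "seqC"]))
out_of_order = list(select_in_order(io.StringIO(EXAMPLE_FASTA), ["seqC", "seqA"]))
```

With the codes in file order, both requested sequences are found in one scan. With the order reversed, a single-pass reader of this kind finds only the first code, which is why a reader that must cope with arbitrary order needs extra work (rewinding or indexing).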
Additionally, if the sequences are in 'PIR' format, the protein type and resolution fields are also stored in the database (see Section 4.9.1 for a description of the 'PIR' fields).
The protein type field is encoded as a single letter: 'S' for sequences and 'X' for structures of any kind. This information is transferred to the profile arrays when using profile.build() (see also profile.read()).
The resolution field is used to pick representatives from the clusters in sequence_db.filter().
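As an illustration only, the sketch below shows how these two fields could be extracted from a PIR header line and used. It is not MODELLER's code. It assumes the colon-separated PIR header layout in which the first field is the protein type (e.g. 'sequence' or 'structureX') and the ninth field is the resolution. The rule used in pick_representative (lowest resolution value wins, entries without a resolution rank last) is an assumption, and all function names are hypothetical.

```python
def encode_protein_type(field):
    """Encode the PIR protein type field as a single letter.

    'X' for structures of any kind, 'S' for plain sequences.
    """
    return "X" if field.strip().lower().startswith("structure") else "S"


def parse_pir_header(line):
    """Return (type_letter, resolution) from a colon-separated PIR header.

    Resolution is None when the field is missing or not a number.
    """
    fields = line.split(":")
    type_letter = encode_protein_type(fields[0])
    try:
        resolution = float(fields[8])  # ninth field (assumed layout)
    except (IndexError, ValueError):
        resolution = None
    return type_letter, resolution


def pick_representative(cluster):
    """Pick one code from a cluster of (code, resolution) pairs.

    Assumed rule: the smallest resolution value (best resolved
    structure) wins; entries without a resolution rank last.
    """
    def sort_key(item):
        _, resolution = item
        return float("inf") if resolution is None else resolution

    return min(cluster, key=sort_key)[0]


# Hypothetical header lines for illustration.
structure_header = "structureX:1abc:1:A:100:A:protein:human:1.80:0.19"
sequence_header = "sequence:P12345::::::::"

structure_info = parse_pir_header(structure_header)
sequence_info = parse_pir_header(sequence_header)
representative = pick_representative([("a", 2.5), ("b", 1.8), ("c", None)])
```

For the actual selection criteria, refer to the description of sequence_db.filter().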
None of the options above applies to the BINARY format, which, in return, is very fast to read (e.g., 3 seconds for the 800,000 sequences, 300 MB, of the TrEMBL database).
Example: See profile.build() command.