Name | Last modified | Size
---|---|---
Homo_sapiens_2020_dssp.tar.xz | 2020-08-19 01:07 | 2.1G
Homo_sapiens_2020.tar | 2021-10-24 09:46 | 7.5G
Homo_sapiens_2020.summary.txt | 2020-08-18 04:44 | 79M
Homo_sapiens.GRCh38.pep.all.mapping | 2020-08-18 10:41 | 34M
Homo_sapiens.GRCh38.pep.all.fa.gz | 2020-08-18 10:26 | 14M
Homo_sapiens.GRCh38.pep.abinitio.mapping | 2020-08-18 10:41 | 8.6M
Homo_sapiens.GRCh38.pep.abinitio.fa.gz | 2020-08-18 10:26 | 13M
GCF_000001405.39_GRCh38.p13_protein.mapping | 2020-08-18 10:41 | 13M
GCF_000001405.39_GRCh38.p13_protein.faa.gz | 2020-08-18 10:26 | 25M
The sequences in the Homo_sapiens_2020 dataset come from both RefSeq and Ensembl.
All Models and Alignments: Homo_sapiens_2020.tar
Summary table: Homo_sapiens_2020.summary.txt
DSSP results for all models: Homo_sapiens_2020_dssp.tar.xz
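Because the archives are several gigabytes in size, it can be useful to look inside one before unpacking it. The following is a minimal Python sketch that lists the first few member names using the standard-library tarfile module. The function name first_member_names and the limit parameter are illustrative, and nothing is assumed here about how the files inside the archive are organized.

```python
import itertools
import tarfile
from typing import List


def first_member_names(archive_path: str, limit: int = 10) -> List[str]:
    """Return the names of the first `limit` archive members without extracting.

    Mode "r:*" lets tarfile detect the compression itself, so the same call
    should work for a plain .tar and for a .tar.xz file.
    """
    with tarfile.open(archive_path, mode="r:*") as archive:
        # Iterating a TarFile yields TarInfo objects lazily, so only the
        # start of the archive is read when `limit` is small.
        return [member.name for member in itertools.islice(archive, limit)]
```

For example, first_member_names("Homo_sapiens_2020.tar", limit=5) would return up to five member names from a locally downloaded copy of the archive.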
The summary table lists all models by ModBase seq_id (unique for a given sequence), ModBase model_id (unique for a given model) and database_id (the primary identifier from RefSeq or Ensembl for the sequence for which the model was built, with a suffix since multiple models may exist for a single sequence).
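Multiple database IDs may exist for a given model_id or seq_id (for example, the same sequence may appear in both RefSeq and Ensembl), but only one of them is listed in the summary table. To see the other database IDs, refer to the mapping files: Homo_sapiens.GRCh38.pep.all.mapping, Homo_sapiens.GRCh38.pep.abinitio.mapping and GCF_000001405.39_GRCh38.p13_protein.mapping.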
Each line in a mapping file gives a ModBase seq_id followed by the FASTA header from the original Ensembl or RefSeq file, which contains a database ID.
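The mapping-file layout described above can be read with a short Python sketch that groups FASTA headers by seq_id. It assumes the seq_id is the first whitespace-delimited token on each line and that the rest of the line is the FASTA header; the exact delimiter is not stated here, so check it against the real files. The function name parse_mapping and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_mapping(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group FASTA headers by ModBase seq_id.

    Assumption: each line is "<seq_id><whitespace><FASTA header>".
    A seq_id that appears on several lines collects all of its headers.
    """
    headers_by_seq = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace only
        seq_id = parts[0]
        header = parts[1] if len(parts) > 1 else ""
        headers_by_seq[seq_id].append(header)
    return dict(headers_by_seq)


# Made-up lines for illustration only; real seq_ids and headers will differ.
sample_lines = [
    "seqA >EXAMPLE_PROTEIN_1 some description",
    "seqA >EXAMPLE_PROTEIN_2",
    "seqB >EXAMPLE_PROTEIN_3 another description",
]
mapping = parse_mapping(sample_lines)
```

With a downloaded file, the same function can be fed an open file handle, for example parse_mapping(open("Homo_sapiens.GRCh38.pep.all.mapping")). Running it over all three mapping files and combining the results gives every known header for a seq_id, from which the additional database IDs can be read.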
Models are provided in PDB format. If you prefer mmCIF files, you can use the modbase_pdb_to_cif.py script from the modbase_utils GitHub repository to generate them. ModBase mmCIF files are compliant with the PDBx and ModelCIF mmCIF dictionaries.