Index of /ftp/databases/modbase/projects/genomes/H_sapiens/2020

Name	Last modified	Size

Parent Directory		-
GCF_000001405.39_GRCh38.p13_protein.faa.gz	2020-08-18 10:26	25M
GCF_000001405.39_GRCh38.p13_protein.mapping	2020-08-18 10:41	13M
Homo_sapiens.GRCh38.pep.abinitio.fa.gz	2020-08-18 10:26	13M
Homo_sapiens.GRCh38.pep.abinitio.mapping	2020-08-18 10:41	8.6M
Homo_sapiens.GRCh38.pep.all.fa.gz	2020-08-18 10:26	14M
Homo_sapiens.GRCh38.pep.all.mapping	2020-08-18 10:41	34M
Homo_sapiens_2020.summary.txt	2020-08-18 04:44	79M
Homo_sapiens_2020.tar	2021-10-24 09:46	7.5G
Homo_sapiens_2020_dssp.tar.xz	2020-08-19 01:07	2.1G

The sequences in the Homo_sapiens_2020 dataset come from both RefSeq and Ensembl.

All Models and Alignments: Homo_sapiens_2020.tar

Summary table: Homo_sapiens_2020.summary.txt

DSSP results for all models: Homo_sapiens_2020_dssp.tar.xz
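Because the archives are large (7.5G and 2.1G), it can help to peek at their contents before unpacking everything. The following is a minimal sketch using Python's standard tarfile module. The function name list_members and the limit parameter are illustrative only, and nothing is assumed about how files are laid out inside the archives.

```python
import tarfile


def list_members(path, limit=10):
    """Return the names of the first `limit` regular files in a tar archive.

    Illustrative helper, not part of ModBase. Mode "r:*" lets tarfile
    detect the compression itself (none for .tar, xz for .tar.xz).
    """
    names = []
    with tarfile.open(path, mode="r:*") as archive:
        for member in archive:
            if member.isfile():
                names.append(member.name)
                if len(names) >= limit:
                    break
    return names


# Example call, assuming the archive has been downloaded to the
# current directory:
#   for name in list_members("Homo_sapiens_2020_dssp.tar.xz"):
#       print(name)
```

Iterating over the archive reads the member headers one at a time, so the loop can stop early without extracting any files.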

The summary table lists all models by ModBase seq_id (unique for a given sequence), ModBase model_id (unique for a given model) and database_id (the primary identifier from RefSeq or Ensembl for the sequence for which the model was built, with a suffix since multiple models may exist for a single sequence).

Multiple database IDs may exist for a given model_id or seq_id (for example, the sequence may be in both RefSeq and Ensembl). Only one of the IDs is listed in the summary table. To see the other database IDs, refer to the mapping files:

Ensembl:
- Pep dataset: Homo_sapiens.GRCh38.pep.all.mapping
- Ab-initio dataset: Homo_sapiens.GRCh38.pep.abinitio.mapping
RefSeq:
- Sequences from GCF_000001405.39_GRCh38.p13_protein.faa.gz: GCF_000001405.39_GRCh38.p13_protein.mapping

Each line in a mapping file gives a ModBase seq_id followed by the FASTA header from the original Ensembl or RefSeq file, which contains a database ID.
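As a rough illustration of how a mapping file might be read, the sketch below splits each line into the seq_id and the FASTA header. It rests on two assumptions that are not stated above and should be checked against the real files: that the seq_id is separated from the header by whitespace, and that the database ID is the first whitespace-delimited token of the header after any leading ">". All function names are hypothetical.

```python
def parse_mapping_line(line):
    """Split one mapping line into (seq_id, fasta_header).

    Assumption: the seq_id is the first whitespace-delimited field and
    the rest of the line is the FASTA header. Returns None for blank lines.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    seq_id = parts[0]
    header = parts[1] if len(parts) > 1 else ""
    return seq_id, header


def database_id_from_header(header):
    """Return the first token of a FASTA header, without a leading '>'.

    Assumption: the database ID is the first token of the header.
    """
    tokens = header.lstrip(">").split()
    return tokens[0] if tokens else None


def build_index(lines):
    """Map each seq_id to the list of database IDs found for it."""
    index = {}
    for line in lines:
        parsed = parse_mapping_line(line)
        if parsed is None:
            continue
        seq_id, header = parsed
        db_id = database_id_from_header(header)
        if db_id is not None:
            index.setdefault(seq_id, []).append(db_id)
    return index


# Example call, assuming the mapping file is in the current directory:
#   with open("Homo_sapiens.GRCh38.pep.all.mapping") as handle:
#       index = build_index(handle)
```

Running build_index over all three mapping files and merging the results by seq_id would collect the database IDs that the summary table leaves out.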

Models are provided in PDB format. If you prefer mmCIF files, you can use the modbase_pdb_to_cif.py script from the modbase_utils GitHub repository to generate them. ModBase mmCIF files are compliant with the PDBx and ModelCIF mmCIF dictionaries.