Conformational ensembles of the human IDRome

  • Giulio Tesei (Creator)
  • Anna Ida Trolle (Creator)
  • Nicolas Jonsson (Creator)
  • Johannes Betz (Creator)
  • Frederik E. Knudsen (Creator)
  • Francesco Pesce (Creator)
  • Kristoffer E. Johansson (Creator)
  • Kresten Lindorff-Larsen (Creator)

Dataset

Description

This repository contains Python code, Jupyter Notebooks, and data for reproducing the results presented in the manuscript "Conformational ensembles of the human intrinsically disordered proteome" (DOI 10.1038/s41586-023-07004-5).

conformational_ensembles.zip contains simulation trajectories and time series of conformational properties for all 28,058 IDRs in the pLDDT-based set (also available at sid.erda.dk/sharelink/AVZAJvJnCO).

_2023_Tesei_IDRome-5.zip is a copy of github.com/KULL-Centre/_2023_Tesei_IDRome/tree/v5, which includes the following files and folders:

  • IDRome_DB.csv: CSV file listing amino acid sequences, sequence features, and conformational properties of all the 28,058 IDRs in the pLDDT-based set
  • IDRLab.ipynb: Notebook on Google Colab to generate conformational ensembles of user-supplied sequences using the CALVADOS model
  • IDR_SVR_predictor.ipynb: Notebook on Google Colab to predict scaling exponents and conformational entropies per residue using the SVR models
  • IDRome_DB_SPOT.csv: CSV file listing amino acid sequences, sequence features, and conformational properties of all the 28,058 IDRs in the SPOT-based set
  • seq_conf_prop.ipynb reproduces Fig. 1, 3, and Extended Data Fig. 2, 5, 6e-t, and 7
  • go_analysis.ipynb reproduces Fig. 2
  • conservation_analysis.ipynb reproduces Fig. 4
  • clinvar_fmug.ipynb reproduces Fig. 5 and Extended Data Fig. 9
  • uniprot_domains.ipynb reproduces Extended Data Fig. 1
  • svr_models.ipynb reproduces Extended Data Fig. 8
  • go_uniprot_calls.ipynb performs API calls to obtain gene ontology terms from UniProt
  • calc_seq_prop.ipynb and calc_seq_prop_SPOT.ipynb compute sequence descriptors and generate the IDRome_DB.csv and IDRome_DB_SPOT.csv files
  • CALVADOS_tests.ipynb reproduces Extended Data Fig. 3
  • AF2_PAEs.ipynb reproduces Extended Data Fig. 4
  • CD-CODE.ipynb reproduces Extended Data Fig. 6a-d
  • md_simulations/ contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins (see github.com/KULL-Centre/_2023_Tesei_IDRome/README.md for installation instructions)
  • idr_selection/ contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs
  • idr_orthologs/ contains code and data to generate the set of orthologs of human IDRs
  • svr_models/ contains scikit-learn SVR models generated in svr_models.ipynb
  • zscores/ contains code and data to calculate NARDINI z-scores
  • go_analyses/ contains input and output data related to the Gene Ontology analyses in go_analysis.ipynb
  • QCDPred/ contains code and data related to QCD calculations
  • clinvar_fmug_cdcode/ contains code and data related to the analysis of ClinVar, FMUG, and CD-CODE databases
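
Since the description does not list the column names of IDRome_DB.csv, a reasonable first step after extracting the archive is to inspect the header and row count before writing any analysis code. The following is a minimal sketch using only the Python standard library. The helper name summarize_csv and the assumption that the CSV has been extracted into the working directory are illustrative and not part of the repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file without assuming its schema."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        # The first record is treated as the header; an empty file yields [].
        header = next(reader, [])
        # Count the remaining records (csv.reader handles quoted multi-line fields).
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Assumed location: IDRome_DB.csv extracted from _2023_Tesei_IDRome-5.zip
    # into the current working directory.
    csv_path = Path("IDRome_DB.csv")
    if csv_path.exists():
        columns, n_rows = summarize_csv(csv_path)
        print(f"{n_rows} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{csv_path} not found; extract the archive first")
```

If the file matches the description above, the reported row count should correspond to the number of IDRs in the pLDDT-based set. The same helper can be pointed at IDRome_DB_SPOT.csv.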
Date made available: 2023
Publisher: Zenodo
