Description
This repository contains Python code, Jupyter Notebooks, and data for reproducing the results presented in the manuscript "Conformational ensembles of the human intrinsically disordered proteome" (DOI 10.1038/s41586-023-07004-5).

- `conformational_ensembles.zip` contains simulation trajectories and time series of conformational properties for all 28,058 IDRs in the pLDDT-based set (also available at sid.erda.dk/sharelink/AVZAJvJnCO).
- `_2023_Tesei_IDRome-5.zip` is a copy of github.com/KULL-Centre/_2023_Tesei_IDRome/tree/v5, which includes the following files and folders:
  - `IDRome_DB.csv` lists amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs in the pLDDT-based set.
  - `IDRLab.ipynb` is a Google Colab notebook for generating conformational ensembles of user-supplied sequences using the CALVADOS model.
  - `IDR_SVR_predictor.ipynb` is a Google Colab notebook for predicting scaling exponents and per-residue conformational entropies using the SVR models.
  - `IDRome_DB_SPOT.csv` lists amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs in the SPOT-based set.
  - `seq_conf_prop.ipynb` reproduces Figs. 1 and 3 and Extended Data Figs. 2, 5, 6e-t, and 7.
  - `go_analysis.ipynb` reproduces Fig. 2.
  - `conservation_analysis.ipynb` reproduces Fig. 4.
  - `clinvar_fmug.ipynb` reproduces Fig. 5 and Extended Data Fig. 9.
  - `uniprot_domains.ipynb` reproduces Extended Data Fig. 1.
  - `svr_models.ipynb` reproduces Extended Data Fig. 8.
  - `go_uniprot_calls.ipynb` performs API calls to obtain Gene Ontology terms from UniProt.
  - `calc_seq_prop.ipynb` and `calc_seq_prop_SPOT.ipynb` compute sequence descriptors and generate the `IDRome_DB.csv` and `IDRome_DB_SPOT.csv` files.
  - `CALVADOS_tests.ipynb` reproduces Extended Data Fig. 3.
  - `AF2_PAEs.ipynb` reproduces Extended Data Fig. 4.
  - `CD-CODE.ipynb` reproduces Extended Data Fig. 6a-d.
  - `md_simulations/` contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins (see github.com/KULL-Centre/_2023_Tesei_IDRome/README.md for installation instructions).
  - `idr_selection/` contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs.
  - `idr_orthologs/` contains code and data to generate the set of orthologs of human IDRs.
  - `svr_models/` contains the scikit-learn SVR models generated in `svr_models.ipynb`.
  - `zscores/` contains code and data to calculate NARDINI z-scores.
  - `go_analyses/` contains input and output data related to the Gene Ontology analyses in `go_analysis.ipynb`.
  - `QCDPred/` contains code and data related to QCD calculations.
  - `clinvar_fmug_cdcode/` contains code and data related to the analysis of the ClinVar, FMUG, and CD-CODE databases.
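As a starting point for working with the tabular data, the sketch below loads `IDRome_DB.csv` (or `IDRome_DB_SPOT.csv`) with pandas and summarizes its numeric columns. It assumes the file is a plain comma-separated table with a header row; the helper names `load_idrome` and `summarize` are hypothetical, and no specific column names are assumed because they are not listed in this record.

```python
import pandas as pd


def load_idrome(path="IDRome_DB.csv"):
    """Load an IDRome table such as IDRome_DB.csv or IDRome_DB_SPOT.csv.

    Assumption: the file is comma-separated with a header row.
    Check the actual file for an index column before relying on row labels.
    """
    return pd.read_csv(path)


def summarize(df):
    """Return count, mean, std, min, quartiles, and max for every numeric column.

    DataFrame.describe() skips non-numeric columns (e.g. sequences) by default.
    """
    return df.describe()


if __name__ == "__main__":
    # Hypothetical path: adjust to wherever _2023_Tesei_IDRome-5.zip was extracted.
    table = load_idrome("IDRome_DB.csv")
    # If each row is one IDR, this should print 28058 for the pLDDT-based set.
    print(len(table))
    print(summarize(table))
```

Printing `table.columns` after loading is a quick way to discover which sequence features and conformational properties are available before writing any further analysis.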
| Field | Value |
|---|---|
| Date of availability | 2023 |
| Publisher | Zenodo |