Estimating heritability and genetic correlations from large health datasets in the absence of genetic data

Gengjie Jia; Yu Li; Hanxin Zhang; Ishanu Chattopadhyay; Anders Boeck Jensen; David R. Blair; Lea Davis; Peter N. Robinson; Torsten Dahlen; Søren Brunak; Mikael Benson; Gustaf Edgren; Nancy J. Cox; Xin Gao; Andrey Rzhetsky

doi:10.1038/s41467-019-13455-0


Disease Systems Biology Program

4 Citations (Scopus)

Abstract

Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

Original language	English
Article number	5508
Journal	Nature Communications
Volume	10
Number of pages	11
ISSN	2041-1723
DOI	https://doi.org/10.1038/s41467-019-13455-0
Status	Published - 1 Dec 2019

Access to Document

10.1038/s41467-019-13455-0, License: CC BY

Estimating heritability and genetic correlations from large health datasets in the absence of genetic data: Publisher's published version, 3.22 MB, License: CC BY

Citation formats

Jia, G., Li, Y., Zhang, H., Chattopadhyay, I., Jensen, A. B., Blair, D. R., Davis, L., Robinson, P. N., Dahlen, T., Brunak, S., Benson, M., Edgren, G., Cox, N. J., Gao, X., & Rzhetsky, A. (2019). Estimating heritability and genetic correlations from large health datasets in the absence of genetic data. Nature Communications, 10, Article 5508. https://doi.org/10.1038/s41467-019-13455-0

@article{08db7d6b60d7471b966cec46cd0eaf6e,

title = "Estimating heritability and genetic correlations from large health datasets in the absence of genetic data",

abstract = "Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman{\textquoteright}s ρ = 0.32, p < $10^{-16}$); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < $10^{-16}$).",

author = "Gengjie Jia and Yu Li and Hanxin Zhang and Ishanu Chattopadhyay and Jensen, {Anders Boeck} and Blair, {David R.} and Lea Davis and Robinson, {Peter N.} and Torsten Dahlen and S{\o}ren Brunak and Mikael Benson and Gustaf Edgren and Cox, {Nancy J.} and Xin Gao and Andrey Rzhetsky",

year = "2019",

month = dec,

day = "1",

doi = "10.1038/s41467-019-13455-0",

language = "English",

volume = "10",

journal = "Nature Communications",

issn = "2041-1723",

publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Estimating heritability and genetic correlations from large health datasets in the absence of genetic data

AU - Jia, Gengjie

AU - Li, Yu

AU - Zhang, Hanxin

AU - Chattopadhyay, Ishanu

AU - Jensen, Anders Boeck

AU - Blair, David R.

AU - Davis, Lea

AU - Robinson, Peter N.

AU - Dahlen, Torsten

AU - Brunak, Søren

AU - Benson, Mikael

AU - Edgren, Gustaf

AU - Cox, Nancy J.

AU - Gao, Xin

AU - Rzhetsky, Andrey

PY - 2019/12/1

Y1 - 2019/12/1

N2 - Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

AB - Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

U2 - 10.1038/s41467-019-13455-0

DO - 10.1038/s41467-019-13455-0

M3 - Journal article

C2 - 31796735

SN - 2041-1723

VL - 10

JO - Nature Communications

JF - Nature Communications

M1 - 5508

ER -
