Estimating heritability and genetic correlations from large health datasets in the absence of genetic data

Gengjie Jia; Yu Li; Hanxin Zhang; Ishanu Chattopadhyay; Anders Boeck Jensen; David R. Blair; Lea Davis; Peter N. Robinson; Torsten Dahlen; Søren Brunak; Mikael Benson; Gustaf Edgren; Nancy J. Cox; Xin Gao; Andrey Rzhetsky

doi:10.1038/s41467-019-13455-0

Disease Systems Biology Program

4 Citations (Scopus)

Abstract

Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

Original language	English
Article number	5508
Journal	Nature Communications
Volume	10
Number of pages	11
ISSN	2041-1723
DOIs	https://doi.org/10.1038/s41467-019-13455-0
Publication status	Published - 1 Dec 2019

Access to Document

10.1038/s41467-019-13455-0 (Licence: CC BY)

Estimating heritability and genetic correlations from large health datasets in the absence of genetic data (Final published version, 3.22 MB; Licence: CC BY)

Cite this

Jia, G., Li, Y., Zhang, H., Chattopadhyay, I., Jensen, A. B., Blair, D. R., Davis, L., Robinson, P. N., Dahlen, T., Brunak, S., Benson, M., Edgren, G., Cox, N. J., Gao, X., & Rzhetsky, A. (2019). Estimating heritability and genetic correlations from large health datasets in the absence of genetic data. Nature Communications, 10, Article 5508. https://doi.org/10.1038/s41467-019-13455-0

@article{08db7d6b60d7471b966cec46cd0eaf6e,

title = "Estimating heritability and genetic correlations from large health datasets in the absence of genetic data",

abstract = "Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman{\textquoteright}s ρ = 0.32, p < $10^{-16}$); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < $10^{-16}$).",

author = "Gengjie Jia and Yu Li and Hanxin Zhang and Ishanu Chattopadhyay and Jensen, {Anders Boeck} and Blair, {David R.} and Lea Davis and Robinson, {Peter N.} and Torsten Dahlen and S{\o}ren Brunak and Mikael Benson and Gustaf Edgren and Cox, {Nancy J.} and Xin Gao and Andrey Rzhetsky",

year = "2019",

month = dec,

day = "1",

doi = "10.1038/s41467-019-13455-0",

language = "English",

volume = "10",

journal = "Nature Communications",

issn = "2041-1723",

publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Estimating heritability and genetic correlations from large health datasets in the absence of genetic data

AU - Jia, Gengjie

AU - Li, Yu

AU - Zhang, Hanxin

AU - Chattopadhyay, Ishanu

AU - Jensen, Anders Boeck

AU - Blair, David R.

AU - Davis, Lea

AU - Robinson, Peter N.

AU - Dahlen, Torsten

AU - Brunak, Søren

AU - Benson, Mikael

AU - Edgren, Gustaf

AU - Cox, Nancy J.

AU - Gao, Xin

AU - Rzhetsky, Andrey

PY - 2019/12/1

Y1 - 2019/12/1

N2 - Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

AB - Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p < 10^–16); and (3) the disease onset age and heritability are negatively correlated (ρ = −0.46, p < 10^–16).

U2 - 10.1038/s41467-019-13455-0

DO - 10.1038/s41467-019-13455-0

M3 - Journal article

C2 - 31796735

SN - 2041-1723

VL - 10

JO - Nature Communications

JF - Nature Communications

M1 - 5508

ER -
