A metric for cross-sample comparisons using logit and probit

Kristian Bernt Karlson

Abstract

The logit model is a widely used regression technique in social research. However, the use and interpretation of logit coefficients have proven contentious. Problems arise because the mean and the variance of a discrete variable cannot be separated. Logit coefficients are therefore identified only relative to an arbitrary scale, which makes them difficult both to interpret and to compare across groups or samples: do differences in coefficients reflect true differences or differences in scale? This cross-sample comparison problem raises concerns for comparative research. We suggest a new correlation metric, derived from logit models, that gives a new interpretation to the estimates of logit models (log odds-ratios). The metric points toward a reorientation in the use of logit models, because it helps to clarify what logit coefficients are and how and when they can (or cannot) be used in comparative research. The metric recovers the correlation between a predictor variable x and a continuous latent outcome variable y* assumed to underlie a binary observed outcome y. It is invariant to differences in the marginal distributions of x and y* across groups or samples, which makes it suitable for the situations encountered in applied comparative research. Our derivations also extend to the probit model and to ordered and multinomial models. The new metric is implemented in the Stata command nlcorr.
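To make the idea of a latent correlation concrete, the following is a minimal sketch in Python, not the nlcorr implementation. It assumes the textbook latent-variable formulation with a single predictor, y* = a + b*x + e, where e is a standard logistic error with variance pi^2/3 (or a standard normal error with variance 1 for probit). Under that assumption the correlation between x and y* is b*sd(x) / sqrt(b^2*var(x) + var(e)). The function name latent_correlation and the two example groups are hypothetical.

```python
import math

# Variance of a standard logistic error term (assumed logit formulation).
LOGISTIC_ERROR_VAR = math.pi ** 2 / 3


def latent_correlation(b, sd_x, error_var=LOGISTIC_ERROR_VAR):
    """Correlation between x and latent y*, assuming y* = a + b*x + e.

    b         -- estimated coefficient on x (log odds-ratio for logit)
    sd_x      -- standard deviation of x in the group or sample
    error_var -- variance of e: pi^2/3 for logit, 1.0 for probit
    """
    explained_var = (b * sd_x) ** 2
    return b * sd_x / math.sqrt(explained_var + error_var)


# Two hypothetical groups: the logit coefficients differ by a factor of two,
# but only because x is measured with a different spread in each group.
r_group_a = latent_correlation(b=1.0, sd_x=1.0)
r_group_b = latent_correlation(b=0.5, sd_x=2.0)

# The same function covers probit by swapping in the normal error variance.
r_probit = latent_correlation(b=1.0, sd_x=1.0, error_var=1.0)

print(r_group_a, r_group_b, r_probit)
```

In this sketch the two groups have different coefficients but the same implied correlation between x and y*, which is the kind of comparison the abstract argues raw logit coefficients cannot support on their own.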

Original language	English
Publication date	1 Jul 2011
Publication status	Published - 1 Jul 2011
Externally published	Yes
Event	2011 German Stata Users Group meeting - Universität Bamberg, Bamberg, Germany
Duration	1 Jul 2011 → 1 Jul 2011

Conference

Conference	2011 German Stata Users Group meeting
Location	Universität Bamberg
Country/Territory	Germany
City	Bamberg
Period	01/07/2011 → 01/07/2011

Cite this

@conference{54b99a7a55ea4307b00b3b50e3ebd67a,

title = "A metric for cross-sample comparisons using logit and probit",

author = "Karlson, {Kristian Bernt}",

note = "Invited speaker; 2011 German Stata Users Group meeting ; Conference date: 01-07-2011 Through 01-07-2011",

year = "2011",

month = jul,

day = "1",

language = "English",

}

TY - ABST

T1 - A metric for cross-sample comparisons using logit and probit

AU - Karlson, Kristian Bernt

N1 - Invited speaker

PY - 2011/7/1

Y1 - 2011/7/1

M3 - Conference abstract for conference

T2 - 2011 German Stata Users Group meeting

Y2 - 1 July 2011 through 1 July 2011

ER -
