Adversarial Removal of Demographic Attributes Revisited

Maria Barrett; Yova Kementchedjhieva; Yanai Elazar; Desmond Elliott; Anders Søgaard

doi:10.18653/v1/D19-1662

Abstract

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

Original language	Undefined/Unknown
Title	Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Number of pages	6
Place of publication	Hong Kong, China
Publisher	Association for Computational Linguistics (ACL)
Publication date	1 Nov. 2019
Pages	6329-6334
DOI	https://doi.org/10.18653/v1/D19-1662
Status	Published - 1 Nov. 2019

Access to document

10.18653/v1/D19-1662

https://www.aclweb.org/anthology/D19-1662

Citation formats

Barrett, M., Kementchedjhieva, Y., Elazar, Y., Elliott, D., & Søgaard, A. (2019). Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6329-6334). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/D19-1662

Adversarial Removal of Demographic Attributes Revisited. / Barrett, Maria; Kementchedjhieva, Yova; Elazar, Yanai et al.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics (ACL), 2019. p. 6329-6334.

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer review

Barrett, M, Kementchedjhieva, Y, Elazar, Y, Elliott, D & Søgaard, A 2019, Adversarial Removal of Demographic Attributes Revisited. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics (ACL), Hong Kong, China, pp. 6329-6334. https://doi.org/10.18653/v1/D19-1662

Barrett M, Kementchedjhieva Y, Elazar Y, Elliott D, Søgaard A. Adversarial Removal of Demographic Attributes Revisited. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics (ACL). 2019. p. 6329-6334. doi: 10.18653/v1/D19-1662

Barrett, Maria; Kementchedjhieva, Yova; Elazar, Yanai et al. / Adversarial Removal of Demographic Attributes Revisited. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics (ACL), 2019. p. 6329-6334.

@inproceedings{5cdb0c5bd11840d1a526082e35129172,

title = "Adversarial Removal of Demographic Attributes Revisited",

abstract = "Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.",

author = "Maria Barrett and Yova Kementchedjhieva and Yanai Elazar and Desmond Elliott and Anders S{\o}gaard",

year = "2019",

month = nov,

day = "1",

doi = "10.18653/v1/D19-1662",

language = "Undefined/Unknown",

pages = "6329--6334",

booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",

publisher = "Association for Computational Linguistics (ACL)",

address = "Hong Kong, China",

}

TY - GEN

T1 - Adversarial Removal of Demographic Attributes Revisited

AU - Barrett, Maria

AU - Kementchedjhieva, Yova

AU - Elazar, Yanai

AU - Elliott, Desmond

AU - Søgaard, Anders

PY - 2019/11/1

Y1 - 2019/11/1

AB - Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

U2 - 10.18653/v1/D19-1662

DO - 10.18653/v1/D19-1662

M3 - Conference contribution in proceedings

SP - 6329

EP - 6334

BT - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

PB - Association for Computational Linguistics (ACL)

CY - Hong Kong, China

ER -