Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego; Joachim Bingel; Gustavo H. Paetzold; Carolina Scarton; Lucia Specia

Department of Computer Science

Abstract

Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
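The abstract's two steps, identifying operations in a complex-simple pair and then learning them as per-token labels, can be illustrated with a small sketch. This is not the authors' procedure. The three-label set (KEEP, DELETE, REPLACE), the helper names `label_operations` and `to_tagged_sequence`, and the use of Python's `difflib.SequenceMatcher` for token alignment are all hypothetical choices made here for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical label set chosen for illustration; not the paper's inventory.
KEEP, DELETE, REPLACE = "KEEP", "DELETE", "REPLACE"


def label_operations(complex_tokens: List[str], simple_tokens: List[str]) -> List[str]:
    """Give every token of the complex sentence one operation label.

    Alignment is a plain difflib diff (an assumption), so it only sees
    exact token matches; paraphrased spans surface as REPLACE.
    """
    labels = [KEEP] * len(complex_tokens)
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            labels[i1:i2] = [DELETE] * (i2 - i1)
        elif tag == "replace":
            labels[i1:i2] = [REPLACE] * (i2 - i1)
        # "equal" spans keep their KEEP label; "insert" spans add tokens on
        # the simple side only, so there is no complex-side token to label.
    return labels


def to_tagged_sequence(
    complex_tokens: List[str], simple_tokens: List[str]
) -> List[Tuple[str, str]]:
    """Pair each complex token with its label: one training sequence for a tagger."""
    return list(zip(complex_tokens, label_operations(complex_tokens, simple_tokens)))


if __name__ == "__main__":
    complex_sent = "the committee subsequently approved the proposal".split()
    simple_sent = "the committee approved the plan".split()
    for token, label in to_tagged_sequence(complex_sent, simple_sent):
        print(f"{token}\t{label}")
```

For the pair "the committee subsequently approved the proposal" / "the committee approved the plan", `label_operations` returns `['KEEP', 'KEEP', 'DELETE', 'KEEP', 'KEEP', 'REPLACE']`. Labels are attached to the complex side so that a sequence labeler can predict one tag per input token. In this sketch, inserted material has no complex-side anchor and would need separate handling.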

Original language	English
Title of host publication	Proceedings of the 8th International Joint Conference on Natural Language Processing
Publisher	Asian Federation of Natural Language Processing
Publication date	2017
Pages	295–305
ISBN (Print)	978-1-948087-00-1
Publication status	Published - 2017
Event	8th International Joint Conference on Natural Language Processing - Taipei, Taiwan, Province of China
Duration	27 Nov 2017 → 1 Dec 2017

Conference

Conference	8th International Joint Conference on Natural Language Processing
Country/Territory	Taiwan, Province of China
City	Taipei
Period	27/11/2017 → 01/12/2017

Access to Document

http://aclweb.org/anthology/I17-1000

Cite this

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. / Alva-Manchego, Fernando; Bingel, Joachim; Paetzold, Gustavo H. et al.

Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, 2017. pp. 295–305.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Alva-Manchego, F, Bingel, J, Paetzold, GH, Scarton, C & Specia, L 2017, Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. in Proceedings of the 8th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, pp. 295–305, 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, Province of China, 27/11/2017. <http://aclweb.org/anthology/I17-1000>

@inproceedings{af42fe80f82f4611bf25f279e59df893,
title = "Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs",
abstract = "Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.",
author = "Fernando Alva-Manchego and Joachim Bingel and Gustavo H. Paetzold and Carolina Scarton and Lucia Specia",
year = "2017",
language = "English",
isbn = "978-1-948087-00-1",
pages = "295--305",
booktitle = "Proceedings of the 8th International Joint Conference on Natural Language Processing",
publisher = "Asian Federation of Natural Language Processing",
note = "8th International Joint Conference on Natural Language Processing; Conference date: 27-11-2017 through 01-12-2017",
}

TY  - GEN
T1  - Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
AU  - Alva-Manchego, Fernando
AU  - Bingel, Joachim
AU  - Paetzold, Gustavo H.
AU  - Scarton, Carolina
AU  - Specia, Lucia
PY  - 2017
Y1  - 2017
N2  - Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
AB  - Current research in text simplification (TS) has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
M3  - Article in proceedings
SN  - 978-1-948087-00-1
SP  - 295
EP  - 305
BT  - Proceedings of the 8th International Joint Conference on Natural Language Processing
PB  - Asian Federation of Natural Language Processing
T2  - 8th International Joint Conference on Natural Language Processing
Y2  - 27 November 2017 through 1 December 2017
ER  - 
