Abstract
Current research in text simplification has
been hampered by two central problems:
(i) the small amount of high-quality parallel
simplification data available, and (ii)
the lack of explicit annotations of simplification
operations, such as deletions or substitutions,
on existing data. While the recently
introduced Newsela corpus has alleviated
the first problem, simplifications
still need to be learned directly from parallel
text using black-box, end-to-end approaches
rather than from explicit annotations.
These complex-simple parallel
sentence pairs often differ to such a high
degree that generalization becomes difficult.
End-to-end models also make it hard
to interpret what is actually learned from
data. We propose a method that decomposes
the task of text simplification (TS) into its sub-problems.
We devise a way to automatically identify
operations in a parallel corpus and introduce
a sequence-labeling approach based
on these annotations. Finally, we provide
insights on the types of transformations
that different approaches can model.
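
To make the idea of automatically identifying operations in a parallel corpus more concrete, the sketch below labels each token of a complex sentence with a coarse operation by diffing it against its simplified counterpart. This is only an illustration under stated assumptions: the abstract does not describe the actual annotation algorithm, so the use of Python's `difflib.SequenceMatcher`, the function name `label_operations`, and the label set `KEEP` / `DELETE` / `REPLACE` are all hypothetical choices, not the authors' method.

```python
from difflib import SequenceMatcher


def label_operations(complex_tokens, simple_tokens):
    """Assign one coarse operation label to each token of the complex sentence.

    Hypothetical label set (an assumption, not taken from the paper):
    KEEP, DELETE, REPLACE.
    """
    labels = ["KEEP"] * len(complex_tokens)
    # autojunk=False disables difflib's popularity heuristic, which is
    # intended for long sequences and is not wanted for sentence-length input.
    matcher = SequenceMatcher(a=complex_tokens, b=simple_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            labels[i1:i2] = ["DELETE"] * (i2 - i1)
        elif tag == "replace":
            labels[i1:i2] = ["REPLACE"] * (i2 - i1)
        # "equal" spans stay KEEP; "insert" spans have no token on the
        # complex side, so this simple scheme does not label them.
    return labels


if __name__ == "__main__":
    complex_sentence = "he quickly left".split()
    simple_sentence = "he left".split()
    for token, label in zip(complex_sentence,
                            label_operations(complex_sentence, simple_sentence)):
        print(token, label)
```

Labels of this kind could then serve as the targets of a sequence labeler that predicts, for each input token, which operation to apply. A plain diff is only a rough stand-in here, since it cannot distinguish reordering or rewriting from deletion plus insertion.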
Original language | English |
---|---|
Title of host publication | Proceedings of the 8th International Joint Conference on Natural Language Processing |
Publisher | Asian Federation of Natural Language Processing |
Publication date | 2017 |
Pages | 295–305 |
ISBN (Print) | 978-1-948087-00-1 |
Publication status | Published - 2017 |
Event | 8th International Joint Conference on Natural Language Processing - Taipei, Taiwan, Province of China; Duration: 27 Nov 2017 → 1 Dec 2017 |
Conference
Conference | 8th International Joint Conference on Natural Language Processing |
---|---|
Country/Territory | Taiwan, Province of China |
City | Taipei |
Period | 27/11/2017 → 01/12/2017 |