TY - BOOK
T1 - Improving natural language processing with human data
T2 - Eye tracking and other data sources reflecting cognitive text processing
AU - Barrett, Maria
PY - 2018/10
Y1 - 2018/10
N2 - When humans perform everyday tasks like reading, speaking, and writing, they also cognitively complete many of the tasks that natural language processing strives to have computers replicate. The traces of human cognitive processing can be collected in various data sources, such as eye tracking during reading, keystroke logs from typing, and acoustic cues, where milliseconds matter. This thesis shows that there is untapped potential in using eye tracking and other data sources reflecting human cognitive processing of text for natural language processing. The thesis presents several studies where traces of human text processing can be used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection, and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction. Technology for recording keystroke logs and prosody features is already common, and recent advances in low-cost eye-tracking technology promise to make eye-tracking data available in larger quantities, including for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that, despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled with machine learning.
M3 - Ph.D. thesis
BT - Improving natural language processing with human data
PB - Det Humanistiske Fakultet, Københavns Universitet
ER -