Improving natural language processing with human data: Eye tracking and other data sources reflecting cognitive text processing

Maria Barrett

Institut for Nordiske Studier og Sprogvidenskab

Abstract

When humans perform everyday tasks like reading, speaking, and writing, they cognitively also complete many of the tasks that natural language processing strives for computers to replicate. The traces of human cognitive processing can be collected in various data sources such as eye tracking during reading, keystroke logs from typing and acoustic cues, where milliseconds matter.

This thesis shows that there is an unused potential for utilizing eye tracking and other data sources reflecting human cognitive processing of text for natural language processing.

This thesis presents several studies where traces of human text processing can be used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction.

Technology for recording keystroke logs and prosody features is already common. And the recent advancements of low-cost eye tracking technology promise eye-tracking data to be available in larger quantities, also for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled in machine learning models.

Original language	English

Publisher	Det Humanistiske Fakultet, Københavns Universitet
Number of pages	171
Status	Published - Oct. 2018

Access to the document

Ph.d. afhandling 2018 Barrett: Final published version, 1.81 MB, License: CC BY-NC-ND

Citation formats

@phdthesis{d79125254c3949b49aad66a84b899371,

title = "Improving natural language processing with human data: Eye tracking and other data sources reflecting cognitive text processing",

abstract = "When humans perform everyday tasks like reading, speaking, and writing, they cognitively also complete many of the tasks that natural language processing strives for computers to replicate. The traces of human cognitive processing can be collected in various data sources such as eye tracking during reading, keystroke logs from typing and acoustic cues, where milliseconds matter. This thesis shows that there is an unused potential for utilizing eye tracking and other data sources reflecting human cognitive processing of text for natural language processing. This thesis presents several studies where traces of human text processing can be used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction. Technology for recording keystroke logs and prosody features is already common. And the recent advancements of low-cost eye tracking technology promise eye-tracking data to be available in larger quantities, also for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled in machine learning models.",

author = "Maria Barrett",

year = "2018",

month = oct,

language = "English",

publisher = "Det Humanistiske Fakultet, K{\o}benhavns Universitet",

address = "Denmark",

}

TY - BOOK

T1 - Improving natural language processing with human data

T2 - Eye tracking and other data sources reflecting cognitive text processing

AU - Barrett, Maria

PY - 2018/10

Y1 - 2018/10

AB - When humans perform everyday tasks like reading, speaking, and writing, they cognitively also complete many of the tasks that natural language processing strives for computers to replicate. The traces of human cognitive processing can be collected in various data sources such as eye tracking during reading, keystroke logs from typing and acoustic cues, where milliseconds matter. This thesis shows that there is an unused potential for utilizing eye tracking and other data sources reflecting human cognitive processing of text for natural language processing. This thesis presents several studies where traces of human text processing can be used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction. Technology for recording keystroke logs and prosody features is already common. And the recent advancements of low-cost eye tracking technology promise eye-tracking data to be available in larger quantities, also for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled in machine learning models.

M3 - Ph.D. thesis

BT - Improving natural language processing with human data

PB - Det Humanistiske Fakultet, Københavns Universitet

ER -