LAIR: A Language for Automated Semantics-Aware Text Sanitization based on Frame Semantics

Steffen Hedegaard, Søren Houen, Jakob Grue Simonsen

Abstract

We present LAIR, a domain-specific language that enables users to specify actions to be taken when specific semantic frames are encountered in a text, in particular to rephrase and redact the textual content. While LAIR presupposes superficial knowledge of frames and frame semantics, it requires only limited prior programming experience. It contains neither scripting nor I/O primitives, has no general loop constructs, and is not Turing-complete. We have implemented a LAIR compiler and integrated it into a pipeline for automated redaction of web pages. We detail our experience with automated redaction of web pages for subjectively undesirable content; initial experiments suggest that a small language based on semantic recognition of undesirable terms can be highly useful as a supplement to traditional methods of text sanitization.

Original language: English
Title: Proceedings of the 3rd IEEE International Conference on Semantic Computing (ICSC 2009)
Number of pages: 6
Publisher: IEEE Computer Society Press
Publication date: 2009
Pages: 47-52
ISBN (Print): 978-0-7695-3800-6
Status: Published - 2009
Event: IEEE International Conference on Semantic Computing - Berkeley, USA
Duration: 14 Sep 2009 - 16 Sep 2009
Conference number: 3
