Abstract
Knowledge bases have the potential to advance artificial intelligence, but often suffer from recall problems, i.e., lack of knowledge of new entities and relations. On the contrary, social media such as Twitter provide abundance of data, in a timely manner: information spreads at an incredible pace and is posted long before it makes it into more commonly used resources for knowledge extraction. In this paper we address the question whether we can exploit social media to extract new facts, which may at first seem like finding needles in haystacks. We collect tweets about 60 entities in Freebase and compare four methods to extract binary relation candidates, based on syntactic and semantic parsing and simple mechanism for factuality scoring. The extracted facts are manually evaluated in terms of their correctness and relevance for search. We show that moving from bottom-up syntactic or semantic dependency parsing formalisms to top-down frame-semantic processing improves the robustness of knowledge extraction, producing more intelligible fact candidates of better quality. In order to evaluate the quality of frame semantic parsing on Twitter intrinsically, we make a multiply frame-annotated dataset of tweets publicly available.
Original language | English |
---|---|
Title of host publication | PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE : AAAI 2015 |
Number of pages | 6 |
Publisher | Association for the Advancement of Artificial Intelligence |
Publication date | 1 Jun 2015 |
Pages | 2447-52 |
Publication status | Published - 1 Jun 2015 |