Tutorial: Web-Based Open-Domain Information Extraction

Marius Pasca - Google

 

This tutorial provides an overview of extraction methods developed in the area of Web-based open-domain information extraction, whose purpose is the acquisition of open-domain classes, instances and relations from Web text. The extraction methods operate over unstructured or semi-structured text. They take advantage of weak supervision provided in the form of seed examples or small amounts of annotated data, or draw upon knowledge already encoded within resources created strictly by experts or collaboratively by users. The tutorial covers existing resources that include instances and relations; details of methods for extracting such data from structured and semi-structured text available on the Web; and the strengths and limitations of resources extracted from text, as reported in recent literature, with applications in knowledge discovery and information retrieval.

 

Speakers:

 

Marius Pasca - Google

Marius Pasca is a research scientist at Google. He graduated with a Ph.D. degree in Computer Science from Southern Methodist University, Dallas, Texas, and an M.Sc. degree in Computer Science from Joseph Fourier University, Grenoble, France. He is the author of the book "Open-domain question answering from large text collections". He has served on the program committees of ACL, IJCAI, WWW, SIGIR, HLT, EMNLP, NAACL and AAAI, including area co-chair positions at HLT-06, CIKM-08 and EMNLP-09. His current research interests include factual information extraction from unstructured text within documents and queries, and its applications to Web search.