A New Framework for Textual Information Mining Over Parse Trees

Sep 2011

Hamid Mousavi, Deirdre Kerr and Markus R. Iseli

This paper introduces a new text mining framework built on a tree-based Linguistic Query Language (LQL). The framework uses a probabilistic parser to generate more than one parse tree for each sentence and annotates each node of these parse trees with main-parts information. The main-parts can be specialized for different domains based on a user-generated list of concepts. Using the main-parts-annotated parse trees for a given textual dataset, the system can efficiently answer individual queries as well as mine the text for a given set of queries. To increase the quality of the extracted information, the framework also handles grammatical ambiguity through probabilistic rules and linguistic exceptions.
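
The pipeline outlined in the abstract (several candidate parses per sentence, a main-part annotation on every node, then queries over the annotated trees) can be illustrated with a small Python sketch. It is neither LQL nor the authors' implementation. The names Node, HEAD_RULES, annotate, subject_verb_pairs, and query_parses are hypothetical, the head rules are a toy stand-in, and the sketch assumes that a node's "main part" is the head word of its subtree and that each parse carries a probability assigned by the parser.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    label: str                                  # phrase or POS tag, e.g. "NP", "VBD"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    main_part: Optional[str] = None             # filled in by annotate()


# Hypothetical head rules: child labels that may supply a phrase's main part.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD", "VBZ", "VB"],
    "NP": ["NN", "NNS", "NNP"],
}


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


def annotate(node: Node) -> Optional[str]:
    """Set main_part on every node, bottom-up, and return the root's main part."""
    if not node.children:
        node.main_part = node.word
        return node.main_part
    for child in node.children:
        annotate(child)
    chosen = node.children[-1].main_part        # fallback: last child
    for wanted in HEAD_RULES.get(node.label, []):
        matches = [c for c in node.children if c.label == wanted]
        if matches:
            chosen = matches[-1].main_part
            break
    node.main_part = chosen
    return chosen


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


def subject_verb_pairs(root: Node) -> List[Tuple[str, str]]:
    """Toy query: for each S with an NP and a VP child, pair their main parts."""
    pairs = []
    for n in walk(root):
        if n.label == "S":
            nps = [c for c in n.children if c.label == "NP"]
            vps = [c for c in n.children if c.label == "VP"]
            if nps and vps:
                pairs.append((nps[0].main_part, vps[0].main_part))
    return pairs


def query_parses(parses: List[Tuple[float, Node]]) -> Dict[Tuple[str, str], float]:
    """Run the query on every candidate parse; sum parse probabilities per match."""
    scores: Dict[Tuple[str, str], float] = {}
    for prob, tree in parses:
        annotate(tree)
        for pair in set(subject_verb_pairs(tree)):
            scores[pair] = scores.get(pair, 0.0) + prob
    return scores


def example_parse() -> Node:
    """One hand-built parse of 'the dog chased the cat'."""
    return Node("S", [
        Node("NP", [leaf("DT", "the"), leaf("NN", "dog")]),
        Node("VP", [leaf("VBD", "chased"),
                    Node("NP", [leaf("DT", "the"), leaf("NN", "cat")])]),
    ])
```

Calling query_parses with a list of (probability, tree) pairs returns each matched pair with the total probability of the parses that support it, which is one simple way an answer could reflect agreement across ambiguous parses. A real system would take the trees and probabilities from a parser rather than building them by hand.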

Mousavi, H., Kerr, D., & Iseli, M. R. (2011). A new framework for textual information mining over parse trees (CRESST Report 805). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).