Data Extraction with NLP Techniques and Its Transformation to Linked Data
Speakers:
Martin Nečaský, Barbora Hladká, Vincent Kríž
Abstract:
According to statistics from the International Data Corporation, 90% of all available digital data is unstructured, and its volume is currently growing twice as fast as that of structured data. In many domains, large collections of unstructured documents are the main sources of information, so browsing and querying them efficiently is essential in many areas of human activity. The INTLIB (INTelligent LIBrary) project takes as input a collection of documents related to a particular problem domain. In the first phase, we extract a knowledge base from the collection using natural language processing tools. In the second phase, we address efficient and user-friendly visualization and querying of the extracted knowledge. We will present our results in both the legislative and the environmental domains.