Gathering Information on the Web by Consistent Entity Augmentation
Keywords: Web table, data integration, entity augmentation, consistency
Abstract
Users often want to gather information about the entities they are interested in, which can be achieved by entity augmentation over the vast number of tables on the web. Existing techniques assume that web tables are entity-attribute binary tables. Tables with multiple columns to be augmented are split into several entity-attribute binary relations, which causes semantic fragmentation. Furthermore, a result table consolidated from binary relations suffers from entity inconsistency and low precision. The objective of our research is to return a consistent result table for entity augmentation, given a set of entities and attribute names. In this paper we propose a web information gathering framework based on consistent entity augmentation. To ensure high consistency and precision of the result table, we require that the answer tables used to build it have consistent matching relationships with one another. Instead of splitting tables into pieces, we regard web tables as nodes and consistent matching relationships as edges, form a consistent clique, and expand it until its coverage of the augmentation query reaches a given threshold gamma. We prove that a consistent result table can be built by taking the tables in a consistent clique as the answer tables. We tested our method on four real-life datasets and compared it with different answer table selection methods, as well as with a state-of-the-art entity augmentation technique based on table fragmentation. The results of a comprehensive set of experiments indicate that our entity augmentation framework is more effective than the existing method at obtaining consistent entity augmentation results with high accuracy and reliability.
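The clique-expansion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the consistent matching relationship or the coverage measure, so the sketch assumes that two tables are consistent when they agree on every (entity, attribute) cell they share, and that coverage is the fraction of query cells that some table in the clique can fill. The function names, the greedy selection rule, and the example tables are all hypothetical.

```python
def coverage(clique, tables, entities, attrs):
    """Assumed coverage: fraction of queried (entity, attribute) cells
    that at least one table in the clique can fill."""
    wanted = {(e, a) for e in entities for a in attrs}
    filled = set()
    for name in clique:
        for entity, row in tables[name].items():
            for attr in row:
                if (entity, attr) in wanted:
                    filled.add((entity, attr))
    return len(filled) / len(wanted) if wanted else 1.0


def consistent(t1, t2):
    """Assumed consistency test: the two tables never disagree
    on a cell that both of them contain."""
    for entity in t1.keys() & t2.keys():
        for attr in t1[entity].keys() & t2[entity].keys():
            if t1[entity][attr] != t2[entity][attr]:
                return False
    return True


def grow_consistent_clique(tables, entities, attrs, gamma):
    """Greedily add tables that are consistent with every clique member
    until the coverage of the query reaches the threshold gamma."""
    clique = []
    remaining = set(tables)
    while remaining and coverage(clique, tables, entities, attrs) < gamma:
        # Only tables with an edge to every current member keep the clique consistent.
        candidates = sorted(
            n for n in remaining
            if all(consistent(tables[n], tables[m]) for m in clique)
        )
        if not candidates:
            break
        best = max(candidates,
                   key=lambda n: coverage(clique + [n], tables, entities, attrs))
        gain = (coverage(clique + [best], tables, entities, attrs)
                - coverage(clique, tables, entities, attrs))
        if gain <= 0:
            break  # no consistent table improves coverage any further
        clique.append(best)
        remaining.discard(best)
    return clique


# Hypothetical web tables: table name -> {entity: {attribute: value}}
tables = {
    "t1": {"Paris": {"country": "France", "population": "2.1M"},
           "Berlin": {"country": "Germany"}},
    "t2": {"Paris": {"country": "France"},
           "Berlin": {"population": "3.6M"}},
    "t3": {"Paris": {"country": "Italy"},
           "Berlin": {"country": "Germany", "population": "3.6M"}},
}
answer_tables = grow_consistent_clique(
    tables, ["Paris", "Berlin"], ["country", "population"], gamma=1.0)
```

In this toy input, `t3` disagrees with `t1` on the country of Paris, so it is never admitted once `t1` is in the clique, and the query is covered by `t1` and `t2` together without splitting either table into binary relations.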
How to Cite
Sun, W., & Wang, N. (2020). Gathering Information on the Web by Consistent Entity Augmentation. COMPUTING AND INFORMATICS, 38(5), 1039–1066. https://doi.org/10.31577/cai_2019_5_1039