Semantic Analysis Using Linked Open Data: An Information Content-Based Approach

Rouzbeh Meymandpour
May 2015
The University of Sydney
The Semantic Web is a collection of standards and technologies that makes Web documents ready to be consumed, reused and shared by applications. Linked Open Data (LOD) is a recent community-driven effort to provide access to a large amount of structured data in diverse domains using semantic technologies, open standards and liberal licences. This not only offers unprecedented opportunities for developing novel and innovative applications but also makes application development more efficient and cost-effective. LOD is a complex semantic network of information resources interlinked via meaningful, semantic relations. A wide range of entities, such as movies, artists and books, are represented as resources in LOD and are semantically linked to other related entities. These entities are described using billions of statements with varying levels of informativeness, which can significantly affect the quality of LOD-based semantic applications. A primary challenge in semantic analysis is to systematically define what is considered to be useful information. This thesis addresses the problem of reliable and valid measurement of LOD informativeness based on the concept of information content (IC): defining information as a measurable mathematical quantity. We extend the notion of IC measurement to LOD, and develop, evaluate and experiment with several measures of informativeness. By building on a valid mathematical definition of LOD which complies with accepted standards and principles, we ensure that our proposed measures are robust and reliable. This is supported by experimental evaluations using well-established benchmark data and evaluation metrics. These experiments also demonstrate the applicability and value of the proposed measures in diverse applications and domains. First, we propose partitioned information content (PIC), a measure of the information content of entities in LOD.
As a fundamental application, PIC is applied to the entity ranking problem. The PIC-based approach for ranking universities shows a high degree of correlation with well-established international ranking systems. Second, we develop the generated information content (GIC) measure, which assesses the informativeness of relations in LOD. It has a wide range of applications in semantic navigation, faceted browsing and visualisation. Third, this thesis presents a novel, PIC-based semantic similarity measure of resources, called PICSS. We apply PICSS to develop a hybrid recommender system. The experimental evaluation of the proposed approach shows that it outperforms comparable recommender systems, especially in situations where little information is available on newly added items. Finally, PICSS-based measures are applied to address the lack of diversity in recommendations, in order to better satisfy users’ requirements and to increase the average diversity of the recommendations while preserving overall accuracy.
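The idea of treating information as a measurable quantity can be illustrated with a small sketch. The code below is not the thesis's definition of PIC, which the abstract does not spell out. It is a generic Shannon-style illustration under stated assumptions: each resource is described by a set of (predicate, object) features, the information content of a feature is taken as the negative base-2 logarithm of the share of resources that carry it, and a resource is scored by summing the values of its features. The toy data and the function names `feature_ic` and `resource_ic` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data set: each resource maps to its set of
# (predicate, object) features. Not taken from the thesis.
resources = {
    "film_a": {("director", "dir_1"), ("genre", "drama"), ("type", "Film")},
    "film_b": {("director", "dir_2"), ("genre", "drama"), ("type", "Film")},
    "film_c": {("genre", "comedy"), ("type", "Film")},
    "film_d": {("type", "Film")},
}


def feature_ic(resources):
    """Return IC(f) = -log2(count(f) / N) for every feature f.

    Assumption: the probability of a feature is estimated as the
    fraction of resources that carry it.
    """
    n = len(resources)
    counts = Counter(f for feats in resources.values() for f in feats)
    return {f: -math.log2(c / n) for f, c in counts.items()}


def resource_ic(name, resources, ic):
    """Score a resource by summing the IC of its features (illustrative only)."""
    return sum(ic[f] for f in resources[name])


ic = feature_ic(resources)
scores = {name: resource_ic(name, resources, ic) for name in resources}

# Rank resources from most to least informative under this toy measure.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f} bits")
```

In this sketch a feature shared by every resource, such as ("type", "Film"), contributes zero bits, while a rare feature such as a specific director contributes more. This mirrors the abstract's point that statements differ in informativeness.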