A Semantic Similarity Measure for Linked Data: An Information Content-Based Approach

Journal Paper
R. Meymandpour
J. Davis
2016, October
Published in: Knowledge-Based Systems
Linked Data allows structured data to be published in a standard manner so that datasets from diverse domains can be interlinked. By leveraging Semantic Web standards and technologies, a growing amount of semantic content has been published on the Web as Linked Open Data (LOD). The LOD cloud has made available a large volume of structured data in a range of domains under liberal licenses. The semantic content of LOD, in conjunction with the advanced searching and querying mechanisms provided by SPARQL, has opened up unprecedented opportunities not only for enhancing existing applications but also for developing new and innovative semantic applications. However, SPARQL is inadequate for functions such as comparing, prioritizing, and ranking search results, which are fundamental to applications such as recommendation provision, matchmaking, social network analysis, visualization, and data clustering. This paper addresses this problem by developing a systematic measurement model of semantic similarity between resources in Linked Data. Drawing extensively on a feature-based definition of Linked Data, it proposes a generalized information content-based approach that improves on previous methods, which are typically restricted to specific knowledge representation models and are less relevant in the context of Linked Data. The proposed measure is validated and evaluated by measuring item similarity in recommender systems. The experimental evaluation shows that the approach can outperform comparable recommender systems that use conventional similarity measures.
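The general idea of an information content-based, feature-based similarity can be illustrated with a minimal sketch. It is not the measure defined in the paper: the toy resources, the treatment of each (predicate, object) pair as a feature, and the function names `information_content` and `ic_similarity` are all assumptions made for illustration. Each feature is weighted by its information content, the negative log of the fraction of resources that carry it, so rare features count for more than common ones. Similarity is then the weighted share of features two resources have in common.

```python
import math
from collections import Counter

# Hypothetical toy dataset: each resource is described by a set of
# (predicate, object) features. Names and values are illustrative only.
resources = {
    "film:A": {("director", "person:X"), ("genre", "Drama"), ("type", "Film")},
    "film:B": {("director", "person:X"), ("genre", "Comedy"), ("type", "Film")},
    "film:C": {("director", "person:Y"), ("genre", "Comedy"), ("type", "Film")},
}


def information_content(features_by_resource):
    """Map each feature to -log2(share of resources that have it)."""
    n = len(features_by_resource)
    counts = Counter(
        feature
        for features in features_by_resource.values()
        for feature in features
    )
    return {feature: -math.log2(count / n) for feature, count in counts.items()}


def ic_similarity(a, b, ic):
    """Information content of shared features over that of all features of a and b."""
    common = sum(ic[f] for f in a & b)
    distinct = sum(ic[f] for f in a ^ b)
    total = common + distinct
    # If every feature involved is carried by all resources, the total
    # weight is zero; this sketch returns 0.0 in that degenerate case.
    return common / total if total > 0 else 0.0


ic = information_content(resources)
sim_ab = ic_similarity(resources["film:A"], resources["film:B"], ic)
sim_ac = ic_similarity(resources["film:A"], resources["film:C"], ic)
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

In this toy data, film:A and film:C share only the `type` feature, which every resource has and which therefore carries no information, so they score lower than film:A and film:B, which also share a director. A ranking of this kind is what a plain SPARQL query does not provide on its own.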