Extracting Ontological Structures from Collaborative Tagging Systems

H. Lin
J. Davis
2012, August
Published in: The University of Sydney
The World Wide Web has evolved significantly in the past decade. The Web in its present form (often referred to as Web 2.0) is a major shift from the largely exposure-based features of Web 1.0. Also known as the social web or the read-write web, Web 2.0 introduced the critical feature of user contribution, which has driven the rise of a vast array of social media sites and applications. However, our ability to access and use such content is somewhat limited. There is a need for new and innovative approaches to organising and retrieving online information in general and user-contributed content in particular. Recently, folksonomies have emerged to help users share user-created web content by organising resources with their own tags. Yet searching for information through folksonomies remains limited, largely because of their flat, non-hierarchical structure and a tag vocabulary consisting mostly of terms not found in dictionaries or thesauri. A promising solution is to build ontologies from folksonomies, transforming a collection of tags into a queryable semantic web knowledge base. Our goal is to extract an ontological structure from a folksonomy and enable it to evolve automatically as usage patterns change. We demonstrate that the resulting structure is significantly more efficient at supporting semantic-based exploration and search of online resources. This thesis explores two questions. First, can knowledge be discovered in folksonomies and transferred into lightweight ontological structures using traditional automated computation? Second, how can ontological structures evolve and improve with end-user knowledge solicited through crowdsourcing activities? To address these two questions, we developed a new framework, termed "Ontological Structures Extraction 2.0".
We aim to merge the useful aspects of ontologies and folksonomies. By extracting an ontological structure from the tags collected in a folksonomy, we can add explicit semantics to Web 2.0 applications and use the knowledge of search engine users to help build semantic web structures. Specifically, our model performs an initial automated extraction using low-support association rule mining, supplemented by an upper ontology such as WordNet. It then uses crowdsourcing to integrate the knowledge of search engine users and evolve the extracted ontology. We implemented a semantic search application called SmartFolks to test semantic searches over the extracted ontological structure. We also developed and tested a prototype hybrid human-machine system, OntoAssist. With OntoAssist piggybacked on an existing search engine, users can refine their online searches by choosing the relationships between query keywords and relevant terms presented in the search results. This helps the initial ontology to evolve and also provides better search results. The automated algorithm returned promising initial results on two datasets, from Flickr and CiteULike. We evaluated SmartFolks with a test dataset of 25,000 images from MIR Flickr. Comparing SmartFolks with benchmarks from MIR shows that semantic web technology improves user search experience and information retrieval. Two important, labour-intensive tasks in ontology development are domain term selection and relationship assignment. We assessed the ability of non-experts to contribute to the ontology by engaging workers from Amazon Mechanical Turk (MTurk) to use our OntoAssist search tool. The experiments were completed in a short time at low cost with more than 90 percent accuracy. The OntoAssist tool is based on the Yahoo! Search BOSS API and is available at the demonstration website www.hahia.com.
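To illustrate the automated extraction step, the following is a minimal sketch of association rule mining over tag sets. It is not the thesis implementation. The function name `mine_tag_rules`, the absolute support threshold, the confidence threshold, the rule that the rarer tag is treated as the narrower term, and the sample data are all assumptions made for this example. The WordNet enrichment and the crowdsourced refinement are not shown.

```python
from collections import Counter
from itertools import permutations


def mine_tag_rules(tagged_resources, min_support=2, min_confidence=0.8):
    """Return candidate (narrower, broader, support, confidence) rules.

    tagged_resources: iterable of tag collections, one per resource.
    min_support: minimum number of resources in which both tags co-occur.
        It is kept deliberately low so that rare tags can still form rules
        (hypothetical default).
    min_confidence: minimum value of P(broader | narrower) (hypothetical default).
    """
    tag_counts = Counter()   # number of resources carrying each tag
    pair_counts = Counter()  # co-occurrence count for each ordered tag pair

    for tags in tagged_resources:
        unique = set(tags)
        tag_counts.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        confidence = support / tag_counts[a]  # confidence of the rule a -> b
        # Assumption: if a nearly always appears with b, and a is rarer
        # than b, then b is a candidate broader term for a.
        if confidence >= min_confidence and tag_counts[a] < tag_counts[b]:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical photo tags, invented for illustration only.
resources = [
    ["sydney", "australia", "harbour"],
    ["sydney", "australia"],
    ["melbourne", "australia"],
    ["melbourne", "australia", "tram"],
    ["australia", "beach"],
]

rules = mine_tag_rules(resources)
for narrower, broader, support, confidence in rules:
    print(f"{narrower} -> {broader} (support={support}, confidence={confidence:.2f})")
```

On this sample, the sketch proposes "australia" as a broader term for both "sydney" and "melbourne". Tags that co-occur only once, such as "harbour", fall below the support threshold and produce no rule. In the framework described above, candidate relationships of this kind would then be checked against an upper ontology and refined by users.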
The evidence we present indicates that knowledge from flat folksonomy structures can be extracted and enriched. This is a sound approach for solving the semantic search problems in collaborative tagging systems and for improving the precision and quality of information retrieved from the World Wide Web.