BEOMAPS: Ad-hoc topic maps for enhanced search in social network data.

Intelligent Web and Information Systems - Machine Intelligence, Computer Science, Aalborg University
Peter Dolog, Martin Leginus

Text Information Management and Analysis Group, University of Illinois at Urbana-Champaign, USA
ChengXiang Zhai

Early version of the prototype (because of the large volume of processed data and our limited hardware, the system is still slow) - please use only the Public tweets view

Description of research project:

Social media is ubiquitous. Almost 91% of online adults use social media regularly, and social networking is the most popular online activity (20% of their online time). For instance, 140 million active Twitter users send more than 340 million tweets daily. The boom of social media and the amount of information generated every day bring new opportunities for business, society and the world itself. Social media analytics help companies to better understand the market environment (suppliers, competitors and customers' opinions). Society might benefit from easier communication between governments and the public, and the monitoring of social media might provide valuable insights for medical, societal and disaster management decision making.

However, to fully exploit the potential of social media data, there is a need for social analytics tools that enable more effective ways to understand, analyze and exploit social media information. Existing systems facilitate opinion mining, topic mining or trend analysis combined with visual analytics. Despite these advanced techniques for social media analysis, current systems do not help users browse and navigate to relevant topics. For instance, when a Twitter user explores the Russian gas conflict with Ukraine, expressed as the query russia, gas, war, relevant tweets are retrieved and presented to the user. Several existing systems for social media data might provide sentiment, trend and visual analysis for this query. However, it is not obvious to the user that there are other relevant topics which should be considered during the analysis. Further, it is not obvious how to facilitate systematic, user-defined navigation to relevant topics.

Let us assume that a user is interested in the Russian gas conflict with Ukraine from the perspective of the Russian gas extraction company Gazprom. With the existing systems, the user would struggle to realize that fuelfreebies, naftogaz, energysecurity, eu2030, russianspring, gazprommt, eess are related relevant topics (see the screenshot of the system presented below). Similarly, it would be difficult to determine which relevant topics are the most recent, popular or positive for this information need.

The aim of this project is to develop a novel system - a proof of concept that will enable more effective search, exploration, analysis and browsing of social media data. The main novelty of the system is an ad-hoc multi-dimensional topic map. The ad-hoc topic map can be generated and visualized according to multiple predefined dimensions, e.g., recency, relevance, popularity or a location-based dimension. These dimensions will provide a better means of browsing, understanding and navigating to related relevant topics in the underlying social media data. The ad-hoc aspect of the topic map allows user-guided exploration and browsing of the underlying social media topic space. It enables the user to explore and navigate the topic space through user-chosen dimensions and ad-hoc user-defined queries. As in standard search engines, we consider freely defined ad-hoc queries that generate a topic map to be a possible paradigm for social media data exploration, navigation and browsing. An additional benefit of the novel system is enhanced query expansion, which lets users narrow their difficult queries with the terms suggested by an ad-hoc multi-dimensional topic map. Further, ad-hoc topic maps enable the exploration and analysis of relations between individual topics, which might lead to serendipitous discoveries.
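The idea of ranking related topics along a chosen dimension can be illustrated with a minimal Python sketch. It is not the implementation of the prototype: the Tweet fields, the matching rule (a tweet qualifies if it contains at least one term of the main query and at least one term of the ad-hoc query) and the three scoring rules are assumptions made for illustration, and the sample tweets are invented toy data.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical minimal record; the real Twitter payload is much richer.
    text: str
    hashtags: list
    created_at: datetime
    retweets: int = 0


def matches(tweet, terms):
    """True if the tweet text or its hashtags contain any of the terms."""
    words = set(re.findall(r"\w+", tweet.text.lower()))
    words |= {h.lower() for h in tweet.hashtags}
    return any(t.lower() in words for t in terms)


def topic_map(tweets, query, adhoc, dimension="relevance", k=10):
    """Rank hashtags of tweets matching both queries along one dimension.

    Assumed scoring rules (illustrative only):
      relevance  - number of matching tweets that carry the hashtag
      popularity - total retweets of those tweets
      recency    - timestamp of the newest such tweet
    """
    if dimension not in ("relevance", "popularity", "recency"):
        raise ValueError(f"unknown dimension: {dimension}")
    scores = {}
    for tw in tweets:
        if not (matches(tw, query) and matches(tw, adhoc)):
            continue
        for tag in {h.lower() for h in tw.hashtags}:
            if dimension == "relevance":
                scores[tag] = scores.get(tag, 0) + 1
            elif dimension == "popularity":
                scores[tag] = scores.get(tag, 0) + tw.retweets
            else:  # recency
                scores[tag] = max(scores.get(tag, tw.created_at), tw.created_at)
    # Alphabetical pre-sort plus a stable sort gives deterministic tie-breaking.
    ranked = sorted(sorted(scores), key=scores.get, reverse=True)
    return ranked[:k]


def expand_query(query, topic):
    """Narrow a query by appending a topic suggested by the topic map."""
    return list(query) + [topic]


# Invented toy data, for illustration only.
SAMPLE_TWEETS = [
    Tweet("Russia gas war: Gazprom cuts supply",
          ["naftogaz", "energysecurity"], datetime(2014, 6, 16), 12),
    Tweet("Gazprom and the gas dispute with Ukraine",
          ["naftogaz"], datetime(2014, 6, 17), 3),
    Tweet("gas pipeline news from Gazprom",
          ["southstream"], datetime(2014, 6, 20), 1),
    Tweet("Russia war headlines",
          ["russianspring"], datetime(2014, 6, 18), 50),
]

if __name__ == "__main__":
    query, adhoc = ["russia", "gas", "war"], ["gazprom"]
    for dim in ("relevance", "popularity", "recency"):
        print(dim, topic_map(SAMPLE_TWEETS, query, adhoc, dim))
```

Switching the dimension argument reorders the same candidate topics, and expand_query shows how a suggested topic could be fed back into the search to narrow it.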

The intention of this research project is to identify the advantages and limitations of ad-hoc topic maps for enhanced browsing of social media data. In the first phase of the research project, we will focus on completing and releasing the current proof-of-concept system for public use (by the end of July 2014). The proof-of-concept system is built on top of publicly available data from Twitter. Twitter users will then be able to browse and search the general public stream as well as their own personal stream of data. Non-Twitter users will be able to explore the general public stream. By the end of September 2014, we plan to perform a user evaluation of the system, which will consist of a comparison of user browsing behaviour with and without the ad-hoc topic map, i.e., A/B testing. Further, we will analyze the search patterns of users, which will enable a better understanding of the challenges of ad-hoc topic maps and of user needs when using such retrieval interfaces. The findings from this user evaluation will be published at ICWE2015. Based on the user feedback and interactions with the system, we will focus on further enhancements such as diversification, clustering of redundant topics or personalized generation of topic maps. The exact direction of further research will be decided according to the most acute needs of the system users.
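One possible way to compare the two groups in such an A/B test is a permutation test on a per-session metric. The sketch below is only an assumption about how the comparison could be done, not the planned evaluation protocol; the metric (for example, relevant tweets opened per session) and the numbers in the usage example are hypothetical and invented.

```python
import random
from statistics import mean


def permutation_p_value(control, treatment, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    control   - per-session metric values without the topic map
    treatment - per-session metric values with the topic map
    Returns the estimated probability of seeing a difference at least as
    large as the observed one if the group labels were assigned at random.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Invented values, for illustration only.
    without_map = [1, 1, 2, 1, 2, 1, 1, 2]
    with_map = [5, 6, 5, 7, 6, 5, 6, 7]
    print(permutation_p_value(without_map, with_map, n_iter=2000))
```

A permutation test is used here because it makes no distributional assumption about the browsing metric; any other standard two-sample test could be substituted.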

If successful, this project would open up a new paradigm of information interaction and lead to a useful tool for many users to interact with social media data. The developed system is based on open source technologies, and the source code will be publicly available to researchers and the general public. The findings from the research project will be disseminated through research publications (ICWE2015 and other information retrieval conferences) and public presentations, and the developed system might serve as an open source platform for other researchers.

The following screenshot of the system demonstrates the contributions of an ad-hoc topic map. The topic map generated for the initial query russia gas war and the ad-hoc query gazprom enables an exploration of the topic from the perspective of the Russian gas extraction company Gazprom. This ad-hoc dimension offers users a novel way of browsing and exploring social media data. The system allows users to freely explore the social media topic space through ad-hoc queries with predefined dimensions. In the presented example, it is clear that topics like fuelfreebies, naftogaz, energysecurity, eu2030, russianspring, gazprommt, eess are relevant and could be further explored by the user. If the user switched the dimension metric to recency, the most recent related topics, such as austria, southstream and baltics, would be suggested.

A screenshot of the ad-hoc topic map prototype search on Twitter.

An example of the query russia gas war with the generated topic map, where the second dimension corresponds to gazprom.
Terms like fuelfreebies, naftogaz, energysecurity, eu2030, russianspring, gazprommt, eess etc. might be used to further browse and explore the topic.
The hashtags presented in this example are ordered by relevance with respect to the main query and the ad-hoc dimension query.