Sunday, September 11, 2011

Chatter Mining

Figure 14: Global geocoded tone of all Summary of World Broadcasts content, January 1979–April 2011 mentioning “bin Laden”.

I read a fascinating article today on the BBC website about using supercomputers to predict revolutions and other social movements. The approach appears to work remarkably well: scientists are data mining millions of online articles to predict sentiment and trends on the street.

The paper, written by Kalev H. Leetaru, was published in the peer-reviewed online journal First Monday, and here is a link. Leetaru is from the University of Illinois' Institute for Computing in the Humanities, Arts and Social Science.

Apparently the computer predicted Bin Laden's whereabouts to within 200 km, the fall of Mubarak against conventional wisdom, and the stability of Saudi Arabia. The paper is titled “Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space.”

I reprint the abstract and a portion of the introduction below:

News is increasingly being produced and consumed online, supplanting print and broadcast to represent nearly half of the news monitored across the world today by Western intelligence agencies. Recent literature has suggested that computational analysis of large text archives can yield novel insights to the functioning of society, including predicting future economic events. Applying tone and geographic analysis to a 30–year worldwide news archive, global news tone is found to have forecasted the revolutions in Tunisia, Egypt, and Libya, including the removal of Egyptian President Mubarak, predicted the stability of Saudi Arabia (at least through May 2011), estimated Osama Bin Laden’s likely hiding place as a 200–kilometer radius in Northern Pakistan that includes Abbotabad, and offered a new look at the world’s cultural affiliations. Along the way, common assertions about the news, such as “news is becoming more negative” and “American news portrays a U.S.–centric view of the world” are found to have merit.



The emerging field of “Culturomics” seeks to explore broad cultural trends through the computerized analysis of vast digital book archives, offering novel insights into the functioning of human society (Michel, et al., 2011). Yet, books represent the “digested history” of humanity, written with the benefit of hindsight. People take action based on the imperfect information available to them at the time, and the news media captures a snapshot of the real–time public information environment (Stierholz, 2008). News contains far more than just factual details: an array of cultural and contextual influences strongly impact how events are framed for an outlet’s audience, offering a window into national consciousness (Gerbner and Marvanyi, 1977). A growing body of work has shown that measuring the “tone” of this real–time consciousness can accurately forecast many broad social behaviors, ranging from box office sales (Mishne and Glance, 2006) to the stock market itself (Bollen, et al., 2011).


Can the public tone of global news data forecast even broader behaviors, such as the stability of nations, the location of terrorist leaders, or even offer new insight on conflict and cooperation among countries, as accurately as it predicts movie sales or stock movements? This study makes use of a 30–year translated archive of news reports from nearly every country of the world, applying a range of computational content analysis approaches including tone mining, geocoding, and network analysis, to present “Culturomics 2.0.” The traditional Culturomics approach treats every word or phrase as a generic object with no associated meaning and measures only the change in the frequency of its usage over time. The Culturomics 2.0 approach introduced in this paper focuses on extending this model by imbuing the system with higher–level knowledge about each word, specifically focusing on “news tone” and geographic location, given their importance to the understanding of news coverage. Translating textual geographic references into mappable coordinates and quantifying the latent “tone” of news into computable numeric data permits an entirely new class of research questions to be explored via the news media not possible through the traditional frequency count approach.


This study will explore how the latent tone of a large digital news archive can be visualized to understand macro–level changes in global society in both time and space. Measuring the tone of news coverage about a single geography over time, a fundamentally new approach to conflict early warning is developed that “passively crowdsources” the global mood about each country in the world. This is found to offer highly accurate short–term forecasts of national stability. Focusing on the spatial dimension and moving from the country to the city level, the geographic framing of the news is found to offer significant insights into both nationalistic views of the world and the way in which cultures and “civilizations” are portrayed by the media. Finally, mapping the geographies most closely associated with Osama bin Laden by the news media prior to his capture is found to fairly accurately pinpoint his actual location. Global news media tone that is temporally and spatially aware is found to offer an intriguing new approach to modeling the behavior of global society itself.
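To make the idea a little more concrete, here is a toy sketch of my own of what "tone mining" plus "geocoding" might look like. It is not Leetaru's method or code: the word lists, the three-city gazetteer, the scoring formula (percent positive words minus percent negative words), and every function name below are assumptions made up purely for illustration. A real system would use far larger dictionaries and a proper geocoder.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon and gazetteer, invented for this illustration.
POSITIVE = {"peace", "agreement", "growth", "calm", "celebrate"}
NEGATIVE = {"protest", "violence", "crisis", "riot", "killed"}
GAZETTEER = {
    "cairo": (30.04, 31.24),   # approximate (latitude, longitude)
    "tunis": (36.81, 10.18),
    "riyadh": (24.71, 46.68),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tone(text):
    """Percent of positive words minus percent of negative words."""
    words = tokenize(text)
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 100.0 * (pos - neg) / len(words)


def geocode(text):
    """Map each gazetteer place mentioned in the text to its coordinates."""
    return {w: GAZETTEER[w] for w in tokenize(text) if w in GAZETTEER}


def mean_tone_by_place(articles):
    """Average the tone of articles mentioning a place, per (place, month)."""
    buckets = defaultdict(list)
    for month, text in articles:
        score = tone(text)
        for place in geocode(text):
            buckets[(place, month)].append(score)
    return {key: sum(v) / len(v) for key, v in buckets.items()}


# Made-up headlines, not real news data.
articles = [
    ("2011-01", "Protest and violence spread in Cairo"),
    ("2011-01", "Calm returns to Riyadh after agreement"),
    ("2011-02", "Crisis deepens in Cairo as riot continues"),
]
result = mean_tone_by_place(articles)
for (place, month), score in sorted(result.items()):
    print(place, month, round(score, 1))
```

Tracking how a place's average score moves from month to month is the crude analogue of the tone timelines the paper describes. Anything beyond that, such as forecasting, would need far more than this sketch.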
