[Research notes] The Wukan protests: just-in-time identification of international media events

Severo Marta – Université Lille 3 / Laboratoire GERiiCO
Giraud Timothée – CNRS / UMS RIATE
Douay Nicolas – Université Paris-Diderot / UMR Géographie-Cités 


In recent years, the emergence of vast amounts of digital traces of social phenomena has deeply affected research on them. Social scientists are trying to manage these new data and to understand how they can contribute to the study of their research objects. One of the most promising perspectives opened by digital traces is the chance to study a phenomenon "just in time", as it unfolds. This paper analyses how digital traces can affect research on international media events and asks whether this new kind of data makes it possible to identify such events as they unfold.
Within the limits of this short research note, we present the research questions and the preliminary results of a research project called GEOMEDIA, which aims to build a sensor of international media events based on the just-in-time analysis of newspapers' RSS feeds. The project is led by the International College for Territorial Sciences (www.gis-cist.fr) and funded by the French National Research Agency, ANR (www.geomediatic.net).

The identification of international media events

Over the last decades, several scholars have worked on the definition and identification of media events (Galtung and Ruge, 1965; McCombs and Shaw, 1972; Dayan and Katz, 1992). Some of them have investigated cross-national media coverage of different types of events (Herkenrath and Knoll, 2011; Koopmans and Vliegenthart, 2011) and focused on the mechanisms that may explain the diffusion of media attention. By creating the GEOMEDIA research group, which brings together specialists in geography, media studies and computer science, the International College for Territorial Sciences hopes to foster the interdisciplinary interactions needed to study international events (Wolton, 2003) from a multi-dimensional viewpoint (Steinberger et al., 2005), by means of a "just-in-time" sensor based on media digital traces. Among the many research issues this project addresses, this paper discusses two:

  1. The data. Where to find and how to collect media data useful for a “just-in-time” analysis?
  2. The spatio-temporal analysis of data. How can the diffusion of news about an event be studied, notably the interactions between its spatial and temporal dimensions?

These two issues will be developed through a case study: the analysis of the Wukan protests. Wukan is a small village of 20,000 inhabitants in southern China. On 23 September 2011, several newspapers reported that villagers had rioted over a land grab (Douay, 2011; Douay et al., 2012). Within a few months, about one thousand articles about the Wukan protests were published in foreign daily newspapers worldwide.

Media data: from commercial databases to RSS feeds

As is well known (Earl et al., 2004), using newspaper data to study events such as collective actions (as in our case) raises several criticisms concerning data collection and the selection and description biases related to article content (McCarthy and McPhail, 1996). Nevertheless, one of the motivations of our study was the opportunity to build a coherent and complete corpus of articles.
Another important issue with this type of data is that it can generally be retrieved only from commercial databases such as Dow Jones Factiva (used in this research), LexisNexis or Europresse. These databases are not only expensive; they also raise technical problems (e.g. it is not possible to extract more than 100 items at a time) and methodological ones (e.g. the lack of transparency concerning keywords and the inhomogeneous coverage of sources). This is why our use of these data has so far been limited to counting articles by period (day, week, month or year).
For these reasons, as a first step of our project, we are focusing our efforts on finding other kinds of media data suitable for building a "just-in-time" sensor of international events. We are testing the value of the RSS feeds provided by the online versions of newspapers worldwide. RSS feeds are expected to have three major advantages: they are free; they can be archived and tagged without limits; and they are generally published as soon as the news is ready (and consequently as the event unfolds), which makes them suitable for just-in-time analysis. We propose to build a database storing the RSS feeds associated with articles published by one hundred newspapers in different parts of the world, and to extract two types of information from it: flows among countries and international events. As a first step of this research, we are carrying out case studies, such as this one about Wukan, to test the validity of RSS items (compared to full articles) for identifying events, and the value of combining media and geographical data for studying international events.
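To make the idea of an archive concrete, here is a minimal Python sketch that stores RSS items in a single SQLite table and uses each item's link as a deduplication key, so that an item seen again on the next poll of a feed is stored only once. The table layout and the `open_archive` and `archive_items` helpers are hypothetical illustrations, not the schema of the Geomedia database.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Create (if needed) a table of RSS items keyed by their link."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               link TEXT PRIMARY KEY,   -- deduplication key
               feed TEXT NOT NULL,      -- URL of the source feed
               country TEXT,            -- country of the newspaper
               title TEXT,
               description TEXT,
               pub_date TEXT            -- ISO 8601 string
           )"""
    )
    return conn


def archive_items(conn, feed, country, items):
    """Insert items, skipping links already archived; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?, ?)",
            [(i["link"], feed, country, i["title"], i["description"], i["pub_date"])
             for i in items],
        )
    # Ignored (duplicate) rows do not count as changes
    return conn.total_changes - before


# Hypothetical poll result from a hypothetical feed
first_poll = [
    {"link": "http://example.com/1", "title": "Villagers in Wukan defy officials",
     "description": "Land dispute", "pub_date": "2011-12-14T08:00:00+00:00"},
]
conn = open_archive()
print(archive_items(conn, "http://example.com/rss", "United Kingdom", first_poll))  # 1
print(archive_items(conn, "http://example.com/rss", "United Kingdom", first_poll))  # 0
```

Polling the same feed twice stores the item once, which is what allows a feed to be polled frequently without inflating counts.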

A preliminary case study: the Wukan protests

To study the Wukan protests, we used two corpora. We started by analysing a traditional corpus of newspaper articles extracted from Factiva. We then compared the results obtained from this corpus with a corpus of RSS items. Since our project has only just started, the RSS database at our disposal is still incomplete, and it was not possible to perform the same type of quantitative analysis that we performed on Factiva. It was, however, possible to carry out a qualitative analysis highlighting the advantages and drawbacks of using RSS feeds as media sensors for the just-in-time identification of international events.
First, using Factiva, we collected 952 articles containing the search string "wukan" published in newspapers worldwide between August 2011 and May 2012. We focused on the geographical and chronological distribution of the articles that reported on the event. To process these data, we developed R scripts and packages that are available online (http://wukan.ums-riate.fr, in French) and can easily be reapplied to other datasets.
As a second step, we built a corpus of RSS items from newspaper websites worldwide containing the search string "wukan" in the title or in the description over the same period. To do so, we used the newspaper RSS feed archive that we are designing within the ANR Geomedia project, which is still in an alpha test phase. It currently archives 132 feeds from 41 countries. Although this database has the advantage of providing researchers with easily accessible, just-in-time and free data, it has an important limitation: we currently archive only newspapers in French and English. Our dataset is therefore smaller than Factiva's, which covers media in all languages. We believe, however, that these data are sufficient to identify international media events. Our RSS corpus about the Wukan protests consists of 128 items.
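Our processing relies on the R scripts mentioned above. Purely as an illustration of the simplest operation involved, counting articles by period (day, week, month or year), here is a minimal Python sketch. It is not a port of those scripts: it assumes each article has already been reduced to its publication date, and the function `count_by_period` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def count_by_period(dates, period="week"):
    """Count articles per period and return them in chronological order.

    period="day"   -> keys are dates
    period="week"  -> keys are (ISO year, ISO week number)
    period="month" -> keys are (year, month)
    period="year"  -> keys are years
    """
    keys = {
        "day": lambda d: d,
        "week": lambda d: tuple(d.isocalendar()[:2]),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: d.year,
    }
    if period not in keys:
        raise ValueError(f"unknown period: {period!r}")
    counts = Counter(keys[period](d) for d in dates)
    # Sorting the keys turns the counts into a chronological series
    return dict(sorted(counts.items()))


# Hypothetical publication dates
sample = [date(2011, 9, 23), date(2011, 9, 26), date(2011, 12, 14), date(2011, 12, 21)]
print(count_by_period(sample, "month"))
print(count_by_period(sample, "week"))
```

ISO weeks are used so that a week spanning two months is not split; monthly keys are coarser but easier to read on a timeline.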
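The selection rule described above (the string "wukan" in the title or in the description of an item) can be sketched in Python with the standard library alone. This sketch assumes RSS 2.0 feeds, whose `<item>` elements carry `<title>`, `<description>`, `<link>` and `<pubDate>` children, and a case-insensitive substring match; the function `matching_items` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def matching_items(rss_xml, term="wukan"):
    """Return RSS 2.0 items whose title or description contains `term`.

    Matching is a case-insensitive substring search.
    """
    root = ET.fromstring(rss_xml)
    term = term.lower()
    found = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        if term in title.lower() or term in description.lower():
            pub = item.findtext("pubDate")
            found.append({
                "title": title,
                "link": item.findtext("link", default=""),
                # RSS 2.0 dates follow the RFC 822 format
                "published": parsedate_to_datetime(pub) if pub else None,
            })
    return found


SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>World news</title>
  <item><title>Villagers in Wukan defy officials</title>
        <description>Land dispute in Guangdong</description>
        <link>http://example.com/1</link>
        <pubDate>Wed, 14 Dec 2011 08:00:00 GMT</pubDate></item>
  <item><title>Markets fall</title>
        <description>Stocks drop in Europe</description>
        <link>http://example.com/2</link>
        <pubDate>Wed, 14 Dec 2011 09:00:00 GMT</pubDate></item>
</channel></rss>"""

for hit in matching_items(SAMPLE_FEED):
    print(hit["published"].date(), hit["title"])
```

A plain substring match is the most transparent rule. Its limits are those noted in the text: it only finds items written in the languages and spellings it is given.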


Figure 1. Geographical distribution of articles about the Wukan protests published between September 2011 and May 2012. Source: Factiva.

As regards the geographical distribution (fig. 1), articles about the event were published in 41 countries, mostly in the United Kingdom (143), Hong Kong (141) and the United States (78). The interest of Hong Kong newspapers in these events is unsurprising, given Hong Kong's geographical proximity to Wukan and its newspapers' particular attention to events in mainland China. It is also worth recalling that several important English-language newspapers, both national (e.g. the South China Morning Post) and international (e.g. The Wall Street Journal, Asia edition), are based in Hong Kong.
As for the geographical distribution that emerges from the RSS corpus, most items come from the same countries identified with the Factiva corpus, namely the US (48 items) and the UK (38). Hong Kong items are fewer because most of them are in Chinese and are therefore not yet included in our database. Moreover, even for the Hong Kong English-language newspapers included in our database, such as the SCMP, it was not possible to identify all occurrences, because items about Wukan were published in national politics feeds that are not currently stored in our database. These data make it clear that using RSS feeds as media sensors for the spatial dimension of international events (that is, which countries are talking about an event or, more generally, which country is talking about which other country) requires an RSS database covering newspapers in all the languages used. Building such a database faces two main obstacles: multilingual text analysis is still an emerging research field; and, more importantly, the current supply of newspaper RSS feeds does not make it possible to cover all countries reliably. Even with these critical limits, further research is needed to verify whether an RSS newspaper database can be built to study the spatial dimension of international events (even without worldwide representativeness).
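Once each article has been attributed to the country where its newspaper is published, the geographical distribution reduces to a count per country. Here is a minimal sketch that assumes a hypothetical list of (newspaper, country) pairs rather than the actual Factiva metadata; the function name `country_distribution` is also hypothetical.

```python
from collections import Counter


def country_distribution(articles):
    """Rank countries by number of articles.

    `articles` is an iterable of (newspaper, country) pairs. Returns a list
    of (country, count, share of all articles), most frequent first.
    """
    counts = Counter(country for _newspaper, country in articles)
    total = sum(counts.values())
    return [(country, n, n / total) for country, n in counts.most_common()]


# Hypothetical toy data
articles = [
    ("Newspaper A", "United Kingdom"),
    ("Newspaper B", "Hong Kong"),
    ("Newspaper A", "United Kingdom"),
    ("Newspaper C", "United States"),
]
ranking = country_distribution(articles)
for country, n, share in ranking:
    print(f"{country}: {n} ({share:.0%})")
print(len(ranking), "countries")
```

Shares are reported alongside raw counts because the Factiva and RSS corpora differ greatly in size, so only relative weights can be compared between them.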
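The question of "which country is talking about which other country" can be sketched as counting (publishing country, mentioned country) pairs. The minimal sketch below makes this concrete and also shows why language coverage matters. It assumes a hypothetical, English-only gazetteer matched with whole-word regular expressions, so any item in another language is invisible to it. The gazetteer, the function `country_flows` and the sample items are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical English-only gazetteer: country -> name variants to look for.
# A real sensor would need variants in every language of the corpus.
GAZETTEER = {
    "China": ["china", "chinese"],
    "United States": ["united states", "american"],
}


def country_flows(items, gazetteer=GAZETTEER):
    """Count (publishing country, mentioned country) pairs.

    `items` is an iterable of (publishing_country, text) pairs, where text is
    for instance an RSS title plus description. Each mentioned country is
    counted at most once per item, however many of its variants appear.
    """
    patterns = {
        country: re.compile(
            r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
            re.IGNORECASE,
        )
        for country, names in gazetteer.items()
    }
    flows = Counter()
    for source, text in items:
        for target, pattern in patterns.items():
            if pattern.search(text):
                flows[(source, target)] += 1
    return flows


# Hypothetical items
items = [
    ("United Kingdom", "Chinese villagers in Wukan defy officials"),
    ("United States", "China: Wukan holds village election"),
    ("United Kingdom", "Markets fall in Europe"),
]
print(country_flows(items))
```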

Figure 2. Chronological distribution of articles about the Wukan protests published between August 2011 and May 2012. Source: Factiva.

Figure 3. Chronological distribution of items about the Wukan protests published between August 2011 and May 2012. Source: Geomedia RSS database.

Figure 4. Chronological distribution of articles and items about the Wukan protests published between August 2011 and May 2012 by The New York Times.

As regards the chronological distribution, the two corpora show quite similar distributions (figs. 2 and 3). We tested this result by comparing Factiva and RSS data for a single medium, The New York Times (fig. 4). Although the numbers of articles and items are too small to be statistically meaningful, the clear similarity between the distributions of the two sources encourages us to continue working on the validation of RSS feeds as media sensors, especially for studying the chronological dimension of international events.
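The similarity between the two chronological distributions is assessed here by comparing the figures. One simple way to put a number on it is to compute a Pearson correlation between the two weekly series aligned on the same weeks. This is offered only as an illustrative sketch, not as the method behind the figures. Pearson's coefficient is insensitive to scale, so the very different sizes of the two corpora do not affect it. The weekly counts below are made-up toy values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Invariant to rescaling either series, so raw counts from corpora of
    different sizes can be compared directly.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy weekly counts aligned on the same weeks (hypothetical values)
factiva_weekly = [5, 2, 0, 1, 40, 25, 3, 10]
rss_weekly = [1, 0, 0, 0, 6, 4, 1, 2]
print(round(pearson(factiva_weekly, rss_weekly), 2))
```

With series this short, a high coefficient remains indicative only, for the same reason the text gives for the small numbers of articles and items.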

The analysis of both corpora highlights the same key moments of the protest:

1. The beginning of the protest. In the first weeks (from 23 September to 4 October 2011), the Wukan protests clearly did not set the international media agenda (only 28 articles), yet they drew the attention of some international newspapers that decided to cover the story.

2. Explosion of the protest. In the following weeks, newspapers stopped covering the Wukan events until the end of November, when they reported an escalation of violence through strikes, demonstrations and riots (Koopmans, 2004). On 14 December, a thousand police laid siege to the village, preventing food and goods from entering. On 21 December, after several days of resistance, the villagers won their "small victory" (Financial Times, 21/12/2011). The main media peak falls in these days.

3. Elections. Another important peak in press coverage corresponds to the election period in February. On 3 March, the election designated a seven-member village committee, including a village chief and his two deputies, who would control local finances and the sale and apportioning of collectively owned village land.

4. Punishment of the officials involved. The last episode in the coverage of Wukan occurred in the last week of April, when the Chinese authorities punished 20 officials and former village leaders of Wukan and expelled them from the Party.
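A just-in-time sensor would need to flag moments like these automatically, as new items arrive. The sketch below is a hedged illustration of one simple rule among many. A week is flagged when its count is at least `min_count` and exceeds the mean of the preceding `window` weeks by more than `k` standard deviations. Only past values are used, so the rule could run as data come in. The window, threshold, minimum count and the toy series are arbitrary assumptions, not parameters of the GEOMEDIA project.

```python
from statistics import mean, pstdev


def flag_peaks(counts, window=4, k=2.0, min_count=5):
    """Return indices of periods whose count stands out from the recent past.

    Period i is flagged when counts[i] >= min_count and
    counts[i] > mean(previous `window` counts) + k * pstdev(previous counts).
    """
    peaks = []
    for i in range(window, len(counts)):
        past = counts[i - window:i]
        threshold = mean(past) + k * pstdev(past)
        if counts[i] >= min_count and counts[i] > threshold:
            peaks.append(i)
    return peaks


# Toy weekly counts (hypothetical): a quiet start, a burst, then a later rise
weekly = [3, 2, 1, 0, 0, 0, 1, 0, 30, 45, 12, 2, 1, 1, 2, 15, 3, 1]
print(flag_peaks(weekly))
```

Because the baseline only covers the last few weeks, a second rise shortly after a large burst can be masked while the burst is still inside the window. This is one reason the window length would need tuning against cases such as this one.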

Figure 5. Main events identified in the articles about the Wukan protests published between September 2011 and May 2012.


In this paper, we analysed a case study, the Wukan protests, using newspapers' RSS feeds, and compared the results with those obtained from a traditional corpus extracted from Factiva, in order to test the validity of RSS for studying international media events. On the one hand, as regards the geographical distribution, the Factiva data clearly highlighted the impressive media coverage that transformed this protest from a local movement into a global media event. The RSS data made it possible to identify the countries that published the most items on the event, but could not show its worldwide distribution. Further research is needed to verify whether an RSS database can be built to study the global spatial distribution of an international event. On the other hand, as regards the chronological distribution, the results are more encouraging. The RSS and Factiva data identified similar peaks, and in both corpora we could find and describe the same events. Even at the level of a single newspaper, articles and items are distributed similarly over time. Considering all this, we are encouraged to continue our research on RSS feeds as media sensors of international events.


Bandurski D., 2012, "Chinese-language coverage of Wukan", China Media Project, URL: http://cmp.hku.hk/2011/12/19/17650/ (retrieved on 8 June 2012).

Centre on Housing Rights and Evictions (COHRE), 2008, One World, Whose Dream? Housing Rights Violations and the Beijing Olympic Games, Geneva, Switzerland.

Dayan D. & Katz E., 1992, Media Events: The Live Broadcasting of History, Cambridge, Harvard University Press.

Douay N., Severo M. & Giraud T., 2012, “La carte du sang de l’immobilier chinois, un cas de cyber-activisme”, L’information géographique, Vol. 76, n. 1, pp. 74-88.

Douay N., 2011, "Urban planning and cyber-citizenry in China: How the 2.0 opposition organises itself", China Perspectives, n. 2011/1, Hong Kong, Centre d'études français sur la Chine contemporaine, pp. 77-79.

Earl J., Martin A., McCarthy J. D. & Soule S. A., 2004, "The Use of Newspaper Data in the Study of Collective Action", Annual Review of Sociology, Vol. 30, n. 1, pp. 65-80.

Galtung J. & Ruge M. H., 1965, "The structure of foreign news", Journal of Peace Research, Vol. 2, n. 1, pp. 64-91.

Koopmans R. & Vliegenthart R., 2011, "Media Attention as the Outcome of a Diffusion Process: A Theoretical Framework and Cross-National Evidence on Earthquake Coverage", European Sociological Review, Vol. 27, n. 5, pp. 636-653.

McCarthy J. & McPhail C., 1996, "Images of Protest: Dimensions of Selection Bias in Media Coverage of Washington, 1982 and 1991", American Sociological Review, Vol. 61, n. 3, pp. 478-499.

McCombs M. E. & Shaw D. L., 1972, "The Agenda-Setting Function of Mass Media", The Public Opinion Quarterly, Vol. 36, n. 2, pp. 176-187.

Rawnsley G. D., 2006, The media, internet and governance in China, URL: http://ics-www.leeds.ac.uk/papers/gdr/exhibits/5/Media_and_governance_in_China.pdf (retrieved on 8 June 2012).

Steinberger R., Pouliquen B. & Ignat C., 2005, "NewsExplorer: multilingual news analysis with cross-lingual linking", Proceedings of the 27th International Conference on Information Technology Interfaces.

Tong J., 2009, "Press self-censorship in China: a case study in the transformation of discourse", Discourse & Society, Vol. 20, n. 5, pp. 593-612.

Tong J. & Sparks C., 2009, "Investigative journalism in China today", Journalism Studies, Vol. 10, n. 3, pp. 337-352.

Wolton, D., 2003, L’autre mondialisation, Paris, Flammarion.

