Modelling memory and news trajectories

February 6, 2013

(This article was first published on Quantifying Memory, and kindly contributed to R-bloggers)

Modelling memory<a id="models"></a>

In the text below I present two models I've made to quantify and visualise the diverging trajectories of memory and news events, and conclude that linear regression may be used to test which model best describes the story. First, though, I contextualise this with an illustration from the Russian media landscape.

In recent years there has been a growing perception that the state-sanctioned media are an unreliable and biased news source, and that genuine information may be read online in independent publications and blogs. Such reader flight constitutes a threat both to traditional media and to those hoping to control what information is in the public domain, so it is perhaps unsurprising that there has been an effort to discredit blogs as an information source.

One typical allegation is that blogs, often hosted abroad (see a trend here, anyone?), may be used by unscrupulous individuals to artificially promote a story through a so-called 'information dump' [информационный вброс]. Igor' Ashmanov, in an article widely cited across the Runet, alleged that stories critical of the Orthodox Church tended to be artificially promoted online. These would be old stories, or as he put it: 'not natural events, that by their nature interest everyone, but rather “news”, artificially promoted [раскручено]'. How might one identify such a story? By its being present in blogs, but absent in 'real' media. Ashmanov does not so much deny the authenticity of the 'information attacks' as deny their legitimacy, and concludes there is evidence of an 'artificial' anti-Church campaign online.

As part of this, Ashmanov modelled the development of a 'natural' news story, where the number of articles is initially high, then gradually dies down unless new information emerges. In contrast to this is a pattern where information is pumped into online sources in an attempt to make a story go viral.

A few days later Rossiiskaia Gazeta republished most of his arguments, along with images virtually identical to his, but this time the purpose was not to spare Orthodox blushes, but rather to discredit the internet as an information source. The argument promoted was that the Russian protest movement in general, and Navalny in particular, made use of the internet to promote non-stories. The article concludes that the revelations published in blogs are almost certainly information dumps: 'Be alert, because nearly always such “dumps” are intended to manipulate public opinion. Do not let yourself be fooled.' [Будьте бдительны, ведь практически всегда такие “вбросы” создаются с целью манипуляции общественным мнением. Не дайте себя обмануть.] The argument becomes a tautological one: this story that you only see online because it is critical of the regime must be a fake, because it's not visible in 'reputable' sources. The irony, of course, is that much government-backed anti-protester propaganda took precisely the form alleged by Rossiiskaia Gazeta.

The elephant in the room is that all news stories are to a degree artificial, especially those that exhibit a cyclical pattern, and that this pattern is that of any 'memory event'. At various key moments certain events resurface as historical analogies (see e.g. Chernobyl). The question at stake here is: who gets to select the events that can legitimately be commemorated? The attack against online news stories implies a denial, or at least a monopoly, of memory by suggesting that only 'real' news stories are worthy of public attention; if a story resurfaces, it does so for sinister reasons. Denying the past any representation, reference or place in discussion encourages a short-sightedness familiar from dystopian novels.

Modelling memory

See the example of Chernobyl. Notice how every year there is a spike in references, with a larger spike in 2006 during the 20-year anniversary of the nuclear disaster. In 2011 there is a more protracted spike, when the 25-year anniversary coincided with an analogous crisis in Japan. This cyclical pattern is typical of a memory event, taking the form of:

  • a constant
  • annual commemoration
  • larger commemorations (usually multiples of 5)
  • a gradual change over time

After adding weights for the anniversaries the spike in March 2011 emerges as anomalous, and is easily explained by Chernobyl being invoked not in relation to the historical event, but as a prism through which to interpret the present tragedy.

Modelling news

Much as suggested above by Ashmanov, 'normal' news stories experience rapid decay: there will normally be considerable press hype, then residual stories as the narrative drags on and loses its news value, until the point where the story disappears or is re-invigorated as new, more exciting evidence emerges (for this, see for instance Lance Armstrong). The details of the model are outlined at the end, but in brief, it takes the number of articles in the first month and estimates the next month's figure. If the story re-emerges, the subsequent trajectory is adjusted, though the value causing the trajectory change is not 'predicted' by the model, only those that follow. For an illustration of this, see the trajectory for the Khodorkovsky trials, a series of high-profile cases which re-emerged in the news throughout the 2000s:

[Figure: news-model trajectory for the Khodorkovsky trials]

As you can see, the news model does not do a good job predicting stories that keep going and going. Nonetheless, the graph illustrates how the model adjusts the trajectory.

Comparing the models:

We can assess the degree to which a story is purely news, or whether it is mobilised as a memory event by formalising the models and running a regression analysis (for formulas and code see end). This approach allows us to determine whether an event has gone from being a news story, to being somehow remembered.
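As a rough illustration of what such a regression looks like in R, here is a minimal sketch on simulated monthly data. The variable names (date, a1, a2, e_news) mirror the coefficient names in the summaries further down, but the simulated values, the seed and the effect sizes are assumptions made purely for illustration and are not the data used in this post.

```r
# Simulated monthly series (illustrative only, NOT the data from the post)
set.seed(42)
n_months    <- 120
month_index <- seq_len(n_months)
year_index  <- (month_index - 1) %/% 12 + 1   # 1 = first year after the event

data <- data.frame(
  date   = month_index * 30,                  # approximate days since the event
  a1     = as.numeric(month_index %% 12 == 0),# month containing an annual anniversary
  a2     = as.numeric(year_index %% 5 == 0),  # year containing a five-year anniversary
  e_news = 100 / month_index                  # news-model expectation (n^-1 decay)
)

# Assumed "true" process: news decay plus anniversary bumps plus noise
data$count <- 2 + 0.9 * data$e_news + 5 * data$a1 + 3 * data$a2 +
  rnorm(n_months, sd = 1)

# Full model: anniversaries and news expectation together
fmla <- count ~ date + a1 + a2 + e_news
fit  <- lm(fmla, data = data)
summary(fit)

# Do the anniversary terms improve on a news-only model?
news_only <- lm(count ~ date + e_news, data = data)
comparison <- anova(news_only, fit)
comparison
```

The nested comparison with anova() is one way of asking the question posed above: if the anniversary terms add nothing to the news-only model, the story behaves like plain news; if they do, it is starting to behave like a memory event.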

Below I present three distinct scenarios:

  • Mainly memory: the signing of the December 1991 Belavezha Accords, which resulted in the dissolution of the USSR
  • Memory potential: the Beslan Hostage Crisis, a large news story at the time, which still is regularly invoked in media; increasingly it is taking on a cyclical pattern
  • Mainly news: the Kondopoga pogrom, where violent clashes between Russian nationalists and Chechens took place in 2006.

Mainly memory: Belavezha

The regression equation shows that central press coverage is best explained by an anniversaries model. The model only describes about 15% of the variance (see the adjusted r-squared score), so there are numerous and diverse occasions in which Belavezha is invoked, but nonetheless, there is a significant decline in references over time, along with highly significant five-year increases of attention.

[Figure: press coverage of Belavezha]


Call:
lm(formula = fmla, data = data)

Residuals:
   Min     1Q Median     3Q    Max
-1.836 -0.945 -0.341  0.442  6.173

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.53e+00   1.06e+00    3.33   0.0011 **
date        -1.99e-04   7.97e-05   -2.49   0.0138 *
a1           7.61e-01   4.47e-01    1.70   0.0905 .
a2           2.71e+00   8.66e-01    3.13   0.0021 **
e_news             NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.3 on 147 degrees of freedom
Multiple R-squared:  0.165,  Adjusted R-squared:  0.148
F-statistic: 9.67 on 3 and 147 DF,  p-value: 7.22e-06

Memory potential: the Beslan Hostage Crisis

The Beslan hostage crisis is an example of a news story which has continued to resonate, and the regression equation suggests that both the news model and the anniversaries model have explanatory potential. While references to Beslan will never reach the levels of 2004, there is statistical evidence that the event has become a memory marker.

[Figure: press coverage of the Beslan hostage crisis]


Call:
lm(formula = fmla, data = data)

Residuals:
   Min     1Q Median     3Q    Max
-21.21  -2.10  -0.02   1.87  40.74

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.63534   14.77815    2.01  0.04786 *
date        -0.00209    0.00103   -2.02  0.04598 *
a1          11.74556    2.89963    4.05  0.00011 ***
a2          -3.91319    8.06376   -0.49  0.62863
e_news       0.94116    0.03925   23.98  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.56 on 92 degrees of freedom
Multiple R-squared:  0.912,  Adjusted R-squared:  0.908
F-statistic: 237 on 4 and 92 DF,  p-value: <2e-16

Mainly news: the Kondopoga pogrom

In contrast to Beslan, references to the pogrom in Kondopoga almost entirely followed a typical news trajectory, suggesting the event has not been consistently commemorated. Indeed, over time the event has periodically disappeared from the central press (e.g. 2009). The regression summary shows that the anniversaries variables add nothing to the news model.

[Figure: press coverage of the Kondopoga pogrom]


Call:
lm(formula = fmla, data = data)

Residuals:
   Min     1Q Median     3Q    Max
-3.435 -0.592 -0.193  0.339  5.466

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.776700   4.539844    1.93    0.058 .
date        -0.000571   0.000313   -1.82    0.073 .
a1          -0.520545   0.657403   -0.79    0.431
a2           1.414971   1.502359    0.94    0.350
e_news       1.140368   0.069558   16.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.34 on 64 degrees of freedom
Multiple R-squared:  0.872,  Adjusted R-squared:  0.864
F-statistic: 109 on 4 and 64 DF,  p-value: <2e-16

Conclusions

In conclusion: memory and news models may help determine which events are likely to keep resonating, and which are likely to fade. The models, especially the anniversaries model, will never accurately predict variation, because as the event becomes a point of reference it will feature in increasingly diverse contexts; nonetheless there should be an increase in publications at anniversary time, as the marker is directly commemorated. This is clearly in evidence for older events, such as Belavezha, and is likely to be the case in the future for Beslan, though not for Kondopoga.

The dirty details:

anniversaries model:

  • annual anniversaries (a1): articles published within 12 days of the annual anniversary
  • five-year anniversaries (a2): articles published during the calendar year of a five-year anniversary. This controls for a general increase around the larger date, and will generally only be significant in conjunction with a1, which controls for a smaller interval.
  • date: the number of days since the original event. One might expect a regular increase or decrease in interest over time.
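The three predictors above could be coded along the following lines. This is a minimal sketch under my reading of the definitions, not the code used for the analysis: the function name anniversary_vars and its arguments are hypothetical, a2 is taken to mean "published in a calendar year that is a multiple of five years after the event year", and the example publication dates are made up.

```r
# Hypothetical helper: derive date, a1 and a2 for a vector of publication dates
anniversary_vars <- function(pub_dates, event_date, window = 12, big = 5) {
  pub_dates  <- as.Date(pub_dates)
  event_date <- as.Date(event_date)
  pub_year   <- as.integer(format(pub_dates, "%Y"))
  event_year <- as.integer(format(event_date, "%Y"))

  # Every annual anniversary up to one year past the last publication
  n_years       <- max(pub_year) - event_year + 1
  anniversaries <- seq(event_date, by = "year", length.out = n_years + 1)[-1]

  # Distance in days from each publication to its nearest anniversary
  days_to_nearest <- vapply(
    seq_along(pub_dates),
    function(i) min(abs(as.numeric(pub_dates[i] - anniversaries))),
    numeric(1)
  )

  years_since <- pub_year - event_year
  data.frame(
    date = as.numeric(pub_dates - event_date),            # days since the event
    a1   = as.integer(days_to_nearest <= window),         # near an annual anniversary
    a2   = as.integer(years_since > 0 & years_since %% big == 0)  # five-year year
  )
}

# Made-up publication dates around the Chernobyl anniversaries (26 April 1986)
vars <- anniversary_vars(
  c("2006-04-20", "2006-09-01", "2007-04-30", "2011-03-15"),
  event_date = "1986-04-26"
)
vars
```

Note that an article from March 2011 gets a2 = 1 but a1 = 0 under this coding, which is why a spike in that month is not absorbed by the annual anniversary term.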

The anniversaries model is then entirely predictive. That is not the case for the news model:

news model

News stories will by their nature be most prominent when the events they discuss are new, but that is not to say one would expect them to disappear immediately. Studies by Fan(?) et al. have attempted to measure the influence of a news story, and have estimated this to have a half-life of one day; that is, Fan suggests a story will be most powerful on the first day, have half the resonance on the second, a quarter on the third, etc. Possible memory events, though, are large stories which may develop over weeks and months; here I examine a whole decade. For this reason data is aggregated into months, quarters, and years. This study assumes that unless there is a significant new development in a story, it will disappear as it becomes old news.

I have found the use of half-life useful only for the first few time-aggregates of media coverage; more accurate is the prediction that on average stories diminish at a rate to the power of minus one (n^-1), that is, the original value divided by the number of the day counted from the event: if on day 1 a story is mentioned 100 times, one would expect 50 mentions on day two, followed by 33, 25, 20, 17, 14, etc., on the subsequent days. This quite accurately captures the persistence of large stories.
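The difference between the two decay rules is easy to see side by side. The snippet below is only a worked illustration of the arithmetic in the paragraph above; the variable names are mine.

```r
initial <- 100
day     <- 1:7

power_law <- initial / day             # n^-1 decay: original value / day number
half_life <- initial * 0.5^(day - 1)   # half-life of one period

round(power_law)   # 100 50 33 25 20 17 14
half_life          # halves each day, so it falls away much faster

data.frame(day, power_law = round(power_law), half_life = round(half_life, 1))
```

By day seven the half-life rule predicts fewer than two mentions, while the n^-1 rule still predicts fourteen, which is why the latter copes better with large, persistent stories.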

The news model is in part descriptive: given an initial point, it predicts how the story develops. In other words, including the news model provides only soft proof of a news trajectory, but it makes any evidence for the anniversaries model stronger, since the predictive model must improve on the descriptive one.

News has been coded as follows, for n (observed value) and p (predicted value):

  • p1 = n1
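  • Subsequent points are calculated: p[i] = n1 / i, where i is the number of the time period counted from the start of the story (so p2 = n1/2, p3 = n1/3, and so on)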
  • If there is a development in the story, the trajectory is adjusted to take account of the new peak, that is, subsequent predicted values are calculated from the new peak. However, the original trajectory remains a minimum value below which predictions will not drop.
  • Peaks coinciding with anniversaries are not considered
  • Peak values are not included in the predicted value; in other words, the model is descriptive for the first point only; subsequent peaks are only factored in to alter the predicted subsequent trajectory.
  • A peak is defined as being larger than the two previous observations (n[i] > n[i-1] & n[i] > n[i-2])
  • Peaks of less than 10 articles per month are not considered.
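The rules above can be sketched as a small function. This is my own minimal reconstruction from the list, not the original code: the name news_model, the is_anniversary argument, and the decision to require two earlier observations before a point can count as a peak are all assumptions, and the example counts are invented.

```r
# Hypothetical reconstruction of the news model from the rules listed above
news_model <- function(n, is_anniversary = rep(FALSE, length(n)), min_peak = 10) {
  len <- length(n)
  p   <- numeric(len)
  p[1] <- n[1]                       # the first point is taken from the data
  baseline   <- n[1] / seq_len(len)  # original trajectory, used as a floor
  peak_value <- n[1]
  peak_index <- 1

  for (i in seq_len(len)[-1]) {
    # Predict from the most recent peak seen BEFORE period i,
    # never dropping below the original trajectory
    steps <- i - peak_index + 1
    p[i]  <- max(peak_value / steps, baseline[i])

    # A new peak only changes the predictions that follow it
    is_peak <- i >= 3 &&
      n[i] > n[i - 1] && n[i] > n[i - 2] &&
      n[i] >= min_peak && !is_anniversary[i]
    if (is_peak) {
      peak_value <- n[i]
      peak_index <- i
    }
  }
  p
}

# Invented monthly counts with a renewed burst of coverage in month 5
observed  <- c(100, 40, 30, 20, 60, 25, 18, 12)
predicted <- news_model(observed)
round(predicted)   # 100 50 33 25 20 30 20 15
```

The burst in month 5 is not itself predicted (the model still expects 20), but the following months are predicted from the new peak (60/2, 60/3, 60/4), which stays above the original trajectory. The resulting vector is the kind of series that could serve as the e_news predictor in the regression.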
