How We Monitored Politico's Memory Hole
During the three weeks between June 26 and July 17, Slate monitored Politico for changes made to already-published articles. To do so, contributor Jeremy Singer-Vine wrote a series of computer scripts in the programming language Python. Here's a basic sketch of what the scripts did:
- Monitored for new articles. Every 15 minutes, the program scraped Politico to find new articles. If it found one, the program added the article's identification number to a database.
- Found updates. Another script, also running every 15 minutes, downloaded the current version of each article in the database at regular intervals: hourly for the first 24 hours and then daily for the next six days.
- Found changes. Each time an article was redownloaded, the newest version was compared with the previously saved one using Python's difflib (short for "difference library"), a module in the language's standard library that compares one version of a text to another. If the program found any differences between the two copies, no matter how minor, it added the altered content to a text file.
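For readers who want a feel for the last two steps, here is a minimal sketch in Python. It is not the code Slate ran. The function names (revisit_times, changed_lines) are made up for illustration, the schedule reflects one reading of "hourly for the first 24 hours and then daily for the next six days," and the comparison uses difflib's ndiff function, which may not be the difflib call the original scripts used.

```python
import difflib
from datetime import datetime, timedelta


def revisit_times(first_seen):
    """Return the times at which an article should be redownloaded.

    Assumption: 24 hourly visits, then one visit a day on days 2 through 7.
    """
    hourly = [first_seen + timedelta(hours=h) for h in range(1, 25)]
    daily = [first_seen + timedelta(days=d) for d in range(2, 8)]
    return hourly + daily


def changed_lines(old_text, new_text):
    """Return the lines that differ between two versions of an article.

    ndiff marks removed lines with "- " and added lines with "+ ".
    Unchanged lines ("  ") and hint lines ("? ") are dropped.
    """
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical article text, invented for this example.
    before = "Headline\nThe senator said no.\nMore to come."
    after = "Headline\nThe senator said yes.\nMore to come."
    for line in changed_lines(before, after):
        print(line)
```

Run as written, the example prints the removed sentence with a "- " prefix followed by its replacement with a "+ " prefix. A script along these lines could append those lines to a text file whenever the list is not empty.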
The system isn't foolproof. Interesting changes were often buried in a heap of standard updates and trivial edits. It's possible that we missed some number of noteworthy changes in this noise. If an article was modified and then returned to its original state within the 15-minute or 24-hour windows between successive visits by our software, we would have missed the change. We only monitored articles that made Politico's home page, and not articles published solely on Politico's blogs or subpages, such as politico.com/lobbying or politico.com/click. Click here for a spreadsheet listing all the articles we monitored.