Last year around November 20th, evidently already bored at the start of Thanksgiving break, I started taking a look at PRAW, a Python wrapper that lets you easily scrape data from Reddit. It's exceptionally easy to use (as you'd expect from a Python library) and its applications are extensive for anyone interested in politics and technology.
Having just started tinkering with it, I threw together a little script (available on my GitHub) that periodically tallied the number of posts mentioning Trump on a few subreddits (you could easily do the same for any keyword or list of keywords on any subreddit). The Reddit scraping part was trivial, but I, in my infinite wisdom, wanted to complicate things by throwing another API into the mix: gspread, which interfaces with Google Sheets.
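For a rough idea of what such a tally looks like, here is a minimal sketch (not the original script). It assumes a "mention" means the keyword appears in the post title, ignoring case, and that only the newest posts are checked. The function names `mentions_keyword` and `tally_mentions` are made up for illustration; `reddit.subreddit(...).new(limit=...)` and `submission.title` are PRAW calls.

```python
from typing import Iterable


def mentions_keyword(title: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword appears in the title, ignoring case.

    This is a plain substring match, so "trump" also matches "trumpet".
    """
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def tally_mentions(reddit, subreddit_name: str, keywords: Iterable[str],
                   limit: int = 100) -> int:
    """Count how many of the newest `limit` posts mention any keyword.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    """
    keywords = list(keywords)  # allow generators to be reused per post
    newest = reddit.subreddit(subreddit_name).new(limit=limit)
    return sum(
        1 for submission in newest
        if mentions_keyword(submission.title, keywords)
    )


# Example usage (needs your own Reddit API credentials; values are placeholders):
#
# import praw
# reddit = praw.Reddit(client_id="YOUR_ID",
#                      client_secret="YOUR_SECRET",
#                      user_agent="keyword-tally sketch")
# for name in ["politics", "news", "worldnews"]:
#     print(name, tally_mentions(reddit, name, ["Trump"]))
```

Running this on a timer (a loop with `time.sleep`, or a cron job) would give the periodic tally described above.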
Why? At the time, my excuse was wanting to be able to look at the results of my script from any device, anywhere. When it worked it was pretty neat, but funneling data from a script running on a Raspberry Pi through Google's API to a Google Sheet slowed things down and restricted how often I could write, which probably isn't worth the cool factor of watching the graph update from my phone. The API connection also timed out after a day or so of running for no apparent reason . . . not great.
For anyone wanting to use a script like this, I would recommend leaving gspread out and instead plotting within a Jupyter Notebook using something like Matplotlib's pyplot, since Google Sheets' plotting is extremely limited anyway. You may lose the cool factor, but your script won't break daily.
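One way that could look, as a minimal sketch: the script appends each tally to a local CSV file, and a notebook cell reads the file back and plots it with pyplot. The CSV layout (timestamp, subreddit, count) and the helper names `append_tally`, `load_series`, and `plot_series` are assumptions made for this example, not something taken from the original script.

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def append_tally(path, subreddit, count, when=None):
    """Append one (timestamp, subreddit, count) row to a local CSV file."""
    when = when or datetime.now()
    with Path(path).open("a", newline="") as handle:
        csv.writer(handle).writerow([when.isoformat(), subreddit, count])


def load_series(path):
    """Read the CSV back as {subreddit: ([timestamps], [counts])}."""
    series = defaultdict(lambda: ([], []))
    with Path(path).open(newline="") as handle:
        for stamp, subreddit, count in csv.reader(handle):
            times, counts = series[subreddit]
            times.append(datetime.fromisoformat(stamp))
            counts.append(int(count))
    return dict(series)


def plot_series(series):
    """Draw one line per subreddit; meant to be called from a notebook cell."""
    # Imported here so the logging half runs on machines without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for subreddit, (times, counts) in series.items():
        ax.plot(times, counts, label=subreddit)
    ax.set_xlabel("Time")
    ax.set_ylabel("Posts mentioning keyword")
    ax.legend()
    fig.autofmt_xdate()  # tilt the date labels so they don't overlap
    return fig
```

In the notebook, `plot_series(load_series("tallies.csv"))` would redraw the graph from whatever the script has logged so far. Writing to a local file avoids the network round trip and the write limits of a remote spreadsheet, at the cost of not being able to check it from a phone.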
Despite kneecapping myself by using Google Sheets, I did get about 24 hours of data for three subreddits: /r/Politics, /r/News, and /r/Worldnews. Here are the results:
Funnily, these graphs probably speak more to how these subreddits are moderated (/r/News doesn't much like political posts) than to the politics of their communities, but with further analysis and more data I'm sure something interesting could be dug up. This project was more of a proof of concept than any attempt at social science. I have some ideas for a more in-depth analysis of Twitter that I'm currently working on, but I hope to swing back to Reddit and dig deeper when I can. Stay tuned.