Collecting geocoded tweets with R and Java

[This article was first published on Random Thoughts on R, and kindly contributed to R-bloggers].

[Figure: Number of tweets in different languages posted around Germany]

There are many things one can do with tweets (sentiment analysis, maps, …). This entry shows how to access the publicly available Twitter API using Java and how to analyse the data using R. For my purpose I am collecting geocoded tweets around Germany. To collect the tweets I wrote a small Java program that uses the Twitter4J library. It stores the tweets to disk and can be run in the background on a machine. Expect about 1 GB of data per week.

Collecting the geocoded tweets

Using this program I collected approximately 1.3 million tweets in a week. The tweets are stored one line per tweet and one file per hour, e.g. 2013-05-21T19_51_03.json. The content of such a file looks like this:

{"created_at":"Tue May 21 17:51:09 +0000 2013","id":336901993555709952,"id_str":"336901993555709952","text":"@OmegaBlue69 ...
{"created_at":"Tue May 21 17:51:10 +0000 2013","id":336901996680450048,"id_str":"336901996680450048","text":"Sweet1 ....

Handling the json-file

The first task is to extract the relevant information from these files. This step uses the rjson package to read the JSON files line by line and to write the coordinates and language of each tweet to a text file, e.g. “2013-05-21T19_51_03.coords.txt”.
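The extraction step could be sketched roughly as follows. This is a minimal sketch, not the original script: the function names extract_tweet and convert_file are hypothetical, the field names (coordinates$coordinates holding longitude and latitude, and lang) are assumed from the Twitter API v1.1 tweet format, and the tab-separated output layout with a header row is also an assumption.

```r
library(rjson)

# Parse one raw JSON line. Returns NULL for blank or malformed lines and for
# tweets without a point coordinate.
# Assumption: Twitter API v1.1 field names, i.e. coordinates$coordinates is
# c(longitude, latitude) and lang is a top-level field.
extract_tweet <- function(line) {
  if (!nzchar(trimws(line))) return(NULL)
  tweet <- tryCatch(fromJSON(line), error = function(e) NULL)
  if (!is.list(tweet) || is.null(tweet$coordinates)) return(NULL)
  xy <- tweet$coordinates$coordinates
  lang <- if (is.null(tweet$lang)) NA_character_ else tweet$lang
  data.frame(lon = xy[[1]], lat = xy[[2]], lang = lang,
             stringsAsFactors = FALSE)
}

# Convert one hourly file, e.g. "x.json" -> "x.coords.txt".
# The output layout (tab-separated, with header) is an assumption.
convert_file <- function(json_file) {
  lines <- readLines(json_file, warn = FALSE, encoding = "UTF-8")
  rows <- Filter(Negate(is.null), lapply(lines, extract_tweet))
  out_file <- sub("\\.json$", ".coords.txt", json_file)
  if (length(rows) > 0) {
    write.table(do.call(rbind, rows), out_file, sep = "\t",
                quote = FALSE, row.names = FALSE)
  }
  invisible(out_file)
}
```

To process a whole directory one could call something like lapply(list.files(pattern = "\\.json$"), convert_file). Wrapping fromJSON in tryCatch means that a truncated last line, which can happen when the collector is interrupted, is skipped rather than aborting the whole file.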

Putting it all together

The final step picks up all text files with coordinate information, merges infrequent language levels and assigns the color coding. It then creates a simple barplot, storing the data in a data.frame and the colors in a vector cols.
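Merging the infrequent levels and assigning colors might look like the sketch below. It is only an illustration in base R: the helper merge_rare_levels, the threshold min_share, the label "other" and the rainbow palette are all assumptions, and the toy vector langs stands in for the language column read from the coordinate files.

```r
# Replace every level whose relative frequency is below min_share with
# `other`, then order the remaining levels by decreasing frequency.
# (Hypothetical helper; threshold and label are assumptions.)
merge_rare_levels <- function(x, min_share = 0.01, other = "other") {
  x <- as.character(x)
  share <- table(x) / length(x)
  x[x %in% names(share)[share < min_share]] <- other
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

# Toy language codes standing in for the real data.
langs <- c(rep("de", 6), rep("en", 3), "tr")
merged <- merge_rare_levels(langs, min_share = 0.2)

# One color per remaining level, then one color per tweet.
level_cols <- setNames(rainbow(nlevels(merged)), levels(merged))
cols <- unname(level_cols[as.character(merged)])

# Simple barplot of tweet counts per language.
barplot(table(merged), col = level_cols, las = 2, ylab = "Number of tweets")
```

Keeping one color per tweet in cols means the same vector can later be reused when plotting each tweet on a map.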
In another blog post I describe how this data can be used to create a zoomable map.
