Blog Archives

Personal data collection and analysis

April 16, 2017

Motivation behind this example: I was diagnosed with sleep apnea last year, and have to use a continuous positive airway pressure (CPAP) machine to sleep well enough to feel alert during the day. The machine uploads data (via cellular connection) to a website that will give me results for the last two weeks. This data includes both usage (time...

Read more »

Inauguration speeches

January 27, 2017

Acquiring inauguration speeches: Though not about Greenville specifically, it might be interesting to quantitatively analyze inauguration speeches. This analysis will be done using two paradigms: the tm package and the tidytext package. We will read the speeches in a form suited to the tidytext package; later on we will use some tools from that package to make analyses...

Read more »

What do they talk about on Greenville Reddit?

December 20, 2016

Reddit is a discussion forum website with many discussion rooms (“subreddits”) on different topics. Greenville, SC has its own subreddit. It might be of interest to see what kind of discussions take place. We can do this in a systematic way using t...

Read more »

Greenville on Twitter

December 20, 2016

In this blog post, we use R to analyze Twitter data on topics of interest to Greenville, SC. We will describe obtaining, manipulating, and summarizing the data. Twitter is a “microblogging” service where users can, usually publicly, share links, pictures, or short comments (up to 140 characters) onto a timeline. The public timeline consists of all public tweets, but people...

Read more »

Plotting GeoJSON polygons on a map with R

December 15, 2016

In a previous post we plotted some points, retrieved from a public dataset in GeoJSON format, on top of a Google Map of the area surrounding Greenville, SC. In this post we plot some public data in GeoJSON format as well, but instead of particular points, we plot polygons. Polygons describe an area rather than a single point. As...

Read more »

I set up a new data analysis blog

December 11, 2016

Well, I tried to write a blog post using the RStudio R Markdown system, and utterly failed. So I set up a GitHub Pages blog at randomjohn.github.io that I can write to from RStudio. There I can easily write and publish posts involving data analysis.

Read more »

Plotting GeoJSON data on a map with R

December 10, 2016

GeoJSON is a standard text-based data format for encoding geographical information, which relies on the JSON (JavaScript Object Notation) standard. There are a number of public datasets for Greenville, SC that use this format, and the R programming language makes working with these data easy. Install the rgeojson library, which is part of the rOpenSci family of packages. In this...

Read more »

Windows 10 anniversary update includes a whole Linux layer – this is good news for data scientists

September 24, 2016

If you are on Windows 10, no doubt you have heard that Microsoft included the bash shell in its 2016 Windows 10 anniversary update. What you may not know is that this is much, much more than just the bash shell. This is a whole Linux layer that enables you to use Linux tools, and does away with a...

Read more »

Which countries have Regrexit?

June 26, 2016

This doesn't have a lot to do with the bio part of biostatistics, but it is an interesting data analysis that I just started. In the wake of the Brexit vote, there is a petition for a redo. The data for the petition is here, in JSON format. Fortunately, in R, ...

Read more »

Simulating a Weibull conditional on time-to-event being greater than a given time

May 20, 2016

Recently, I had to simulate a time-to-event for subjects who have been on a study, are still ongoing at the time of a data cut, and are still at risk of an event (e.g. progressive disease, cardiac event, death). This requires the simulation of a con...

Read more »
