The Role of Small Data and Vacation Recap Example

July 5, 2017

(This article was first published on novyden, and kindly contributed to R-bloggers)

Wikipedia defines small data as data ‘small’ enough for human comprehension, but then goes further by qualifying it as data in a volume and format that makes it accessible, informative and actionable. I am not certain the latter is always true: a smaller footprint doesn’t automatically make data informative and actionable without more work. In my book, small data usually scales to kilobytes and has just a handful of dimensions. But its main feature remains human comprehension, which really means there is a simple story behind it.

In the grand scheme of big data things, the small data story is the last mile of data science analysis. It still requires interpretation (or representation) in the form of a visualization or an application.

A case in point is a Google spreadsheet I kept while on vacation in Italy, with daily recordings of miles and steps walked. Later I added the main attractions for each day. The result was my personal small data covering about 2 weeks of touring Italy, with bases in Rome and later in Sicily (this sentence was the story):

Google sheet of activities while on vacation in Italy


As is, this spreadsheet is destined for the Google archives, adding to the ever-growing collection of docs I created and happily forgot about. So I created this visualization, which represents both most of the data and the story:

Small data visualization


Before explaining how this visualization was created with R, I ought to acknowledge that Google spreadsheets offer a way to add a chart or graph to a document. But that functionality appears rather limited without resorting to the JavaScript API.

Using the R googlesheets package to source Google docs makes them an integral part of the data sources available from within R code:
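
A minimal sketch of such a read, not necessarily the exact code used for this post: it assumes the googlesheets functions gs_title() and gs_read(), and the helper name and sheet title argument are hypothetical placeholders.

```r
# Sketch only: wraps two googlesheets calls in a small helper.
# `sheet_title` is a hypothetical placeholder, not the author's document name.
read_vacation_log <- function(sheet_title) {
  # gs_title() looks up a registered sheet by its title; the first call
  # prompts for Google authentication in the browser.
  sheet <- googlesheets::gs_title(sheet_title)
  # gs_read() downloads a worksheet (the first one by default) as a data frame.
  googlesheets::gs_read(sheet)
}
```

Calling read_vacation_log() with the title of your own sheet would return its rows as a data frame. The googlesheets package has since been superseded by googlesheets4, so newer code would likely use that instead.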
For details on how the code above authenticates with Google servers and processes documents, see the package's very detailed vignettes.

Now we can get back to small data and its simple story, which means a single visualization may include most if not all of it. With small data, the goal is to design such a chart without sacrificing clarity.

The core attributes, days (Date) and miles walked (Distance; I chose miles over Steps for simplicity), suggest a line chart with the timeline along the x-axis and distance on the y-axis. But there are 2 more factors to incorporate: Place, indicating where the base city was each day, and Label, for major attractions.

The base city is identified by color, with deep red for Rome and olive green for Syracuse. The text for major attractions was attached to each point, with smart justification so that it fits inside the chart:
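
A minimal sketch of a chart along these lines, assuming ggplot2 (the post does not say which plotting package was used). The column names follow the sheet as described above, but every row in the data frame is an invented placeholder, and the color names are only approximations of "deep red" and "olive green".

```r
library(ggplot2)

# Placeholder rows only: these dates, distances and labels are invented to
# make the sketch runnable and are NOT the values from the author's sheet.
walks <- data.frame(
  Date     = as.Date("2017-06-01") + 0:5,
  Distance = c(6.5, 8.0, 5.5, 3.0, 4.5, 7.0),
  Place    = c("Rome", "Rome", "Rome", "Syracuse", "Syracuse", "Syracuse"),
  Label    = paste("Attraction", LETTERS[1:6])
)

p <- ggplot(walks, aes(x = Date, y = Distance, color = Place)) +
  # group = 1 keeps a single continuous line across both base cities
  geom_line(aes(group = 1)) +
  geom_point(size = 3) +
  # "inward" justification aligns each label toward the middle of the panel
  geom_text(aes(label = Label), hjust = "inward", vjust = "inward",
            show.legend = FALSE) +
  # assumed color names standing in for deep red and olive green
  scale_color_manual(values = c(Rome = "darkred", Syracuse = "darkolivegreen")) +
  labs(x = NULL, y = "Miles walked", color = "Base city") +
  theme_minimal()
```

Printing p (or passing it to ggsave()) renders the chart. The "inward" justification is one simple way to keep labels inside the plot area; the original chart may well have tuned each label by hand.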


Had I kept a more detailed log, I would have ended up with more dimensions to use: for example, miles traveled by car or train, time spent at leisure versus touring, number of cities and places visited, historical marker attributes and so on. But that moves us further away from the small data domain: as footprint and dimensions grow, the story becomes less comprehensible. One indicator of this is that it becomes harder to collect the data manually. Instead, there are apps that would do it for me, for example Life Cycle or Apple Health.

Ultimately, any big data problem is reduced to one or more small data ones by aggregation, regression, clustering or some other data science method. The path to big data insights is a journey from big to small data in search of a simple story. So learning how to deal with small data is where it all both ends and begins.
