mockaRoo – making realistic test data in R

[This article was first published on R – It's a Locke, and kindly contributed to R-bloggers].

When I’m building things in R, like packages and models, I find myself wishing for realistic-looking test data without having to pull data off my production server. To that end I’ve been on the hunt for a way of generating decent test data. A few months back I stumbled upon Mockaroo, a neat system that provides a GUI for building data that suits your needs.

Mockaroo is a really impressive service with a wide spread of different data types. They also have simple ways of adding things like within-group differences to data so that you can mock realistic class differences. They use the freemium model so you can get a thousand rows per download, which is pretty sweet. The big BUT you can feel coming on is this: it’s a GUI! I don’t want to have to spend time hand-cranking a data extract.

Thankfully, they have an API for getting data too, and it’s pretty simple to use, so I’ve started making a package for it.

I’ve started the package on GitHub and will be developing it over the next month or two. It’s up and working, but only in the most primitive way, as I’d like to get some feedback from folks who might find this useful on how the interface for generating your desired data schema should work.

The really nice thing about this is that I should also be able to include a Shiny gadget / RStudio add-in so there can also be a GUI for producing mock data.^

There will be some inherent limitations as the API (currently) does not possess the facility to create scenarios and other Mockaroo concepts – you can only use existing ones – but this should be a nice utility package.

Mockaroo requires a JSON representation of the desired schema and, aside from some common fields (i.e. name and type), there are a lot of optional ones. I’m most comfortable with tables so I’m inclined to have some sort of tabular interface that converts to JSON, but perhaps people want helper functions like mock_emails().
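
To make the tabular idea concrete, here is a rough sketch of how a table of fields could be turned into a JSON schema. It is not the package's actual interface: the `schema` data frame, the `min`/`max` columns standing in for optional fields, the type names and the body of `mock_emails()` are all assumptions for illustration, and the conversion leans on the jsonlite package.

```r
library(jsonlite)

# Hypothetical helper: returns one schema row describing an email field.
# The type label "Email Address" is an assumed value, not a confirmed one.
mock_emails <- function(name = "email") {
  data.frame(name = name, type = "Email Address",
             min = NA, max = NA, stringsAsFactors = FALSE)
}

# Hypothetical tabular schema: one row per field. `name` and `type` are the
# common fields; `min` and `max` stand in for optional ones.
schema <- rbind(
  data.frame(name = "id", type = "Row Number",
             min = NA, max = NA, stringsAsFactors = FALSE),
  mock_emails(),
  data.frame(name = "age", type = "Number",
             min = 18, max = 65, stringsAsFactors = FALSE)
)

# Convert each row to a JSON object. jsonlite leaves NA entries out of
# row-wise records, so optional fields only appear where they are set.
schema_json <- toJSON(schema, dataframe = "rows", auto_unbox = TRUE)
```

A table like this keeps the common fields in fixed columns while optional ones can be left as NA, and helpers such as `mock_emails()` would simply produce rows for it, so the two interface styles need not be mutually exclusive.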

Do you have a need for mock data? What have you found to be a challenge in the past? What sort of interface will make the mockaRoo package an intuitive one for you?

PS: I also made a very simple Docker container with the mockaroo npm module installed. So if you want to have a JavaScript playground, fill ya boots.

^ I am aware of the irony in this part of the proposed package!

The post mockaRoo – making realistic test data in R appeared first on It's a Locke.
