Simulate data with R

July 23, 2014

(This article was first published on The Beginner Programmer, and kindly contributed to R-bloggers)

Last semester I was attending a boring class. The professor was really clever, but he kept circling around the main theme and never got straight to the point. While thinking about everything but the class, I had an idea: given a set of data, say X and Y, you can easily fit a linear regression model (the regression line) and use it to learn about the data. The fit also gives you the residuals, that is, the errors the linear model makes when predicting each observation. If you estimate the distribution of those errors, you can simulate data similar to the original by adding a random error drawn from that distribution to each value predicted by the regression line.
Furthermore, when the regression line is fitted by least squares with an intercept, the residuals average to 0, so the expected error is 0 and only its spread has to be estimated.

Here is the code to implement this idea in R. You can get the data to work on at the bottom of the page.
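As a minimal sketch of the idea, the steps could look like the following. It assumes the errors are roughly normal and uses made-up stand-in data in place of the data set linked at the bottom of the page; the variable names (`res_sd`, `sim_error`, `y_sim`) are illustrative choices, not taken from the original listing.

```r
# Hypothetical stand-in data; replace x and y with the real X and Y columns.
set.seed(42)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 2)

# Fit the regression line y = a + b * x by least squares.
fit <- lm(y ~ x)

# Residuals: the error the linear model made on each observation.
res <- residuals(fit)

# With an intercept in the model the residuals average to zero,
# so only their spread has to be estimated.
res_sd <- sd(res)

# Assumption: the errors are roughly normal (check with hist(res) or qqnorm(res)).
sim_error <- rnorm(length(x), mean = 0, sd = res_sd)

# Simulated data = value predicted by the regression line + random error.
y_sim <- fitted(fit) + sim_error
```

If the residuals do not look normal, drawing the errors with `sample(res, replace = TRUE)` instead of `rnorm()` reuses the observed residuals and avoids the normality assumption.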

The result should look something like this, with the actual data in blue and the simulated data in red.
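A plot along these lines can be sketched with base R graphics. The helper name `plot_actual_vs_simulated`, the stand-in data, and the output file name are all hypothetical; the plot is written to a PDF in the session's temporary directory so the example also runs non-interactively.

```r
# Draw the actual points in blue and the simulated points in red
# on whichever graphics device is currently open.
plot_actual_vs_simulated <- function(x, y, y_sim) {
  plot(x, y, col = "blue", pch = 16,
       ylim = range(c(y, y_sim)),
       xlab = "X", ylab = "Y", main = "Actual vs simulated data")
  points(x, y_sim, col = "red", pch = 16)
  legend("topleft", legend = c("Actual", "Simulated"),
         col = c("blue", "red"), pch = 16)
  invisible(NULL)
}

# Hypothetical stand-in data and a simulation assuming normal errors.
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(50, mean = 0, sd = 1.5)
fit <- lm(y ~ x)
y_sim <- fitted(fit) + rnorm(length(x), mean = 0, sd = sd(residuals(fit)))

# Write the figure to a PDF file, adding the regression line for reference.
out_file <- file.path(tempdir(), "actual_vs_simulated.pdf")
pdf(out_file)
plot_actual_vs_simulated(x, y, y_sim)
abline(fit)
invisible(dev.off())
```

In an interactive session, calling `plot_actual_vs_simulated(x, y, y_sim)` without the `pdf()` and `dev.off()` lines shows the plot directly.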

I hope this was useful. If you know the name of this method, please leave a comment and let me know. Click here to get the data I used.
