# Simulating some synthetic data.

January 12, 2017

(This article was first published on Data R Value, and kindly contributed to R-bloggers)

We often need data with particular characteristics to develop a model, carry out research, test an algorithm, or simply practice.
Here I show an example of generating synthetic data that you can adapt to build your own.

We will need the ggplot2 library to display our data:

library(ggplot2)

Now we define the dimensions of the array we need:

lrows <- 3035
lcols <- 11

In this case, 3035 rows and 11 columns.

Now we create the array, initially filled with zeros in every entry:

syn_data <- array(data = 0, dim = c(lrows, lcols))

Our data look like this:
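A minimal sketch for inspecting the freshly created array, using the same dimensions as above. The calls to `dim` and `head` are base R and are only one of several ways to take a quick look:

```r
lrows <- 3035
lcols <- 11
syn_data <- array(data = 0, dim = c(lrows, lcols))

dim(syn_data)       # number of rows and columns
head(syn_data, 3)   # first three rows, all zeros at this point
```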

Now let’s name each field (column):

colnames(syn_data) <- c("ONE", "NUMBER", "R1", "R2", "R3",
                        "R4", "R5", "R6", "NINE", "TEN", "ELEVEN")

Now let’s assign values to some columns.

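syn_data[, 2]  <- seq(lrows, 1)              # NUMBER: countdown from lrows to 1
syn_data[, 1]  <- runif(lrows, 0.0, 7.5)     # ONE: uniform between 0 and 7.5
syn_data[, 9]  <- runif(lrows, 10, 100)      # NINE: uniform between 10 and 100
syn_data[, 10] <- runif(lrows, 5.0, 50)      # TEN: uniform between 5 and 50
syn_data[, 11] <- runif(lrows, 30.0, 60.0)   # ELEVEN: uniform between 30 and 60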
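Because `runif` draws new values on every run, fixing the random seed makes the synthetic data reproducible. A small sketch of the idea; the seed value 42 is an arbitrary choice and is not part of the original script:

```r
lrows <- 3035

set.seed(42)                    # arbitrary seed (assumption, not in the original)
a <- runif(lrows, 0.0, 7.5)

set.seed(42)                    # resetting the seed repeats the same draws
b <- runif(lrows, 0.0, 7.5)

identical(a, b)                 # TRUE: same seed, same values
range(a)                        # stays within the requested interval
```

Calling `set.seed` once at the top of the script is usually enough to make the whole data set repeatable.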

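Reading the script line by line shows what kind of value each column receives: column 2 (NUMBER) holds a descending sequence from 3035 to 1, while columns 1, 9, 10 and 11 hold uniform random numbers in the ranges given.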

For columns R1 to R6 I want a random integer between 1 and 56 in every entry, for which we use the following nested for and while loops:

for (i in 1:lrows) {
  j <- 1
  while (j <= 6) {
    # columns 3 to 8 hold R1 to R6
    syn_data[i, j + 2] <- sample(1:56, 1)
    j <- j + 1
  }
}
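The same fill can be done without explicit loops. The following is a sketch of a vectorized alternative, not the approach used in the original script: it draws all the integers in a single call to `sample` and assigns them to columns 3 to 8 at once.

```r
lrows <- 3035
lcols <- 11
syn_data <- array(data = 0, dim = c(lrows, lcols))

# Draw lrows * 6 integers in one call; replace = TRUE keeps every draw
# independent, just like calling sample(1:56, 1) once per entry
syn_data[, 3:8] <- sample(1:56, lrows * 6, replace = TRUE)

range(syn_data[, 3:8])   # all values fall between 1 and 56
```

Each entry is still an independent draw from 1 to 56, so the result has the same distribution as the loop version, though not the same values for a given seed.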

Now our data looks like this:

We now have our synthetic data. Let’s do some processing on it.

First we convert the array to a data frame:

syn_data <- as.data.frame(syn_data)

Now let us calculate the mean of each row from R1 to R6 and accumulate these means in a vector:

smeans <- numeric(lrows)   # preallocate one slot per row

for (i in 1:lrows) {
  # mean of columns 3 to 8 (R1 to R6) for row i
  smeans[i] <- sum(syn_data[i, 3:8]) / 6
}
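Base R also provides `rowMeans`, which computes the same quantity in one call. The sketch below compares it with the loop on a stand-in data frame; the name `demo` and the seed are illustrative and not part of the original script.

```r
set.seed(1)                      # arbitrary seed (assumption)
lrows <- 3035

# Stand-in for columns R1 to R6 of syn_data
demo <- as.data.frame(
  matrix(sample(1:56, lrows * 6, replace = TRUE), nrow = lrows)
)
colnames(demo) <- paste0("R", 1:6)

# Loop version, as in the article
smeans_loop <- numeric(lrows)
for (i in 1:lrows) {
  smeans_loop[i] <- sum(demo[i, ]) / 6
}

# Vectorized version; unname() drops the row names rowMeans attaches
smeans_vec <- unname(rowMeans(demo))

all.equal(smeans_loop, smeans_vec)
```

On the article's data frame the equivalent call would be `rowMeans(syn_data[, 3:8])`.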

Finally we perform a visualization of the vector of means:
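One possible way to draw such a plot with ggplot2 is a histogram of the means. This is only a sketch: the choice of a histogram, the 30 bins, the axis labels, the seed, and the helper names `r_values` and `means_df` are all assumptions rather than the original plotting code.

```r
library(ggplot2)

set.seed(1)                      # arbitrary seed (assumption)
lrows <- 3035

# Stand-in for columns R1 to R6: random integers between 1 and 56
r_values <- matrix(sample(1:56, lrows * 6, replace = TRUE), nrow = lrows)
smeans <- rowMeans(r_values)

# ggplot2 expects a data frame, so wrap the vector of means
means_df <- data.frame(smeans = smeans)

p <- ggplot(means_df, aes(x = smeans)) +
  geom_histogram(bins = 30) +
  labs(x = "Row mean of R1 to R6", y = "Count")

print(p)
```

Since each value is drawn uniformly from 1 to 56, the row means should cluster around 28.5.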

This is a very crude and rather inefficient example, but it is a start. It is up to you to improve it and adapt it to your needs.

You can download the script from this example in:
