A web interface for regression analysis: Walkthrough

June 18, 2016
By

(This article was first published on Antoine's data science views, and kindly contributed to R-bloggers)

After the quick overview, here is a short walkthrough of a regression analysis with a categorical variable.

Open the app: Here

1. Import the data:

Here is some simulated data, generated with the following R code:

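set.seed(3467)
# Predictor: 1..400 plus a little Gaussian noise
x <- 1:400 + rnorm(400, 0, 1)
# Two responses with different intercepts and slopes (noise sd = 50)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
# Stack both groups into a single 800-row data frame
group <- rep(c('G1', 'G2'), each = 400)
x <- c(x, x)
y <- c(y1, y2)
DF <- data.frame(x = x, y = y, group = group)
# write.csv stores the row names as the first column of the file
write.csv(DF, 'DF.csv')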

Click on import data, select your data and set rownames to first column. You should then get a quick overview of the data:
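
For readers who want to check the import outside the app, the same step can be sketched in plain R. This is only an approximation of what the app does on import, which the walkthrough does not show. The temporary file path is an assumption made so that the snippet is self-contained.

```r
# Recreate the walkthrough data
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Write to a temporary location (assumption: any writable path works)
csv_path <- file.path(tempdir(), "DF.csv")
write.csv(DF, csv_path)

# row.names = 1 treats the first column as row names,
# mirroring "set rownames to first column" in the app
imported <- read.csv(csv_path, row.names = 1)
str(imported)
summary(imported)
```

Without row.names = 1, the row names would come back as an extra, unnamed first column.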

2. Let’s take a closer look at our data:

Go to Data->View Data and choose x, y and group as the variables to display. We can see that we have two groups (G1 and G2). Let’s take a closer look at the distributions of x and y.

Now click on View boxplot:

Here are the distributions of our data. There don’t seem to be any gaps in them, so let’s do some regression!
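
The same check can be reproduced in base R. The sketch below assumes a simple boxplot per variable is what the app draws. Splitting each boxplot by group is an addition for illustration, not something the walkthrough confirms.

```r
# Recreate the walkthrough data
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Two panels side by side: x by group, then y by group
op <- par(mfrow = c(1, 2))
bp_x <- boxplot(x ~ group, data = DF, main = "x by group")
bp_y <- boxplot(y ~ group, data = DF, main = "y by group")
par(op)

# The five-number summaries behind each box (one column per group)
bp_y$stats
```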

3. Rename our variables:

x and y aren’t very explicit variable names, so let’s rename them to input and Response.
Go to Data->Data engineering, select y as the variable to modify, select rename as the operation to apply and enter Response as the name. Create the new variable!
Do the same with x, using input as the name.
Go to Data->View Data:
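
Outside the app, the renaming step above amounts to changing two column names. A minimal sketch in base R, assuming the data frame is called DF as in the generation code:

```r
# Recreate the walkthrough data
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(x = c(x, x), y = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Rename by matching on the old name rather than on column position
names(DF)[names(DF) == "y"] <- "Response"
names(DF)[names(DF) == "x"] <- "input"

head(DF)
```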


4. Run a first model

Click on the model tab and set up the model Response~input by selecting Response as the variable to predict and input as the predictor. Run the model!
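
In R terms, the formula Response~input corresponds to an ordinary least squares fit. The sketch below assumes the app fits it with something equivalent to lm(), which the walkthrough does not state explicitly.

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

# Simple linear regression: one intercept and one slope for all rows,
# ignoring the group each observation belongs to
fit1 <- lm(Response ~ input, data = DF)
coef(fit1)
```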

5. Model Summary

Go to summary. As we can see, the model fits reasonably well and is statistically significant. However, it seems we are missing some pattern. Let’s take a look at the plot:
Well, it looks like the two groups follow clearly different lines, so we should not have ignored the interaction.
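
To see the same thing outside the app, one can print the model summary and colour the scatterplot by group. This is a sketch under the assumption that the app's summary and plot correspond to summary() and a scatterplot with the fitted line; the colours are arbitrary choices.

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit1 <- lm(Response ~ input, data = DF)
s1 <- summary(fit1)
print(s1)  # coefficients, R-squared and overall F-test

# Scatterplot coloured by group, with the single fitted line on top
cols <- ifelse(DF$group == "G1", "steelblue", "tomato")
plot(DF$input, DF$Response, col = cols, pch = 16, cex = 0.6,
     xlab = "input", ylab = "Response")
abline(fit1, lwd = 2)
legend("topleft", legend = c("G1", "G2"),
       col = c("steelblue", "tomato"), pch = 16)
```

A single line running between two diverging clouds of points is the visual sign that the slope depends on the group.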

6. Interaction model

Go back to the model tab, add group to the predictors and set the interaction between group and input.
Check the summary again: the model performs far better. Now look at the graph:
That’s better, and the two groups are significantly different.
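
In formula notation, the interaction model described above lets both the intercept and the slope differ by group. A minimal sketch, again assuming an lm()-style fit behind the app:

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit1 <- lm(Response ~ input, data = DF)

# input * group expands to input + group + input:group
fit2 <- lm(Response ~ input * group, data = DF)
summary(fit2)

# Compare the share of variance explained by the two models
c(no_group = summary(fit1)$r.squared,
  interaction = summary(fit2)$r.squared)
```

The input:groupG2 coefficient estimates how much the slope of G2 differs from that of G1, and groupG2 does the same for the intercept.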

7. Outliers and assumptions

Since we created the data ourselves, we shouldn’t have issues with the regression assumptions.
Let’s go to diagnostic->normality. As expected, our residuals are normally distributed.
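
A comparable normality check can be sketched in base R with a Q-Q plot and a Shapiro-Wilk test. Which plot or test the app actually uses is not stated in the walkthrough, so both choices here are assumptions.

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit2 <- lm(Response ~ input * group, data = DF)
res <- residuals(fit2)

# Points close to the reference line suggest normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a large p-value gives no evidence against normality
sw <- shapiro.test(res)
print(sw)
```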

Let’s go to diagnostic->outliers:
On the summary tab, Cook’s D, internally studentised residuals and hat values are computed for each observation. Observations 28, 389, 407, 436 and 789 are outliers; let’s delete them and rerun the model. (You can also look at the other outliers tabs to visualise the different outlyingness measures.)
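
The three measures named above are all available in base R. The sketch below computes them and then drops the observations listed in the walkthrough. The 4/n cut-off for Cook's D is a common rule of thumb used here as an assumption; the rule the app applies is not documented in this article.

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

fit2 <- lm(Response ~ input * group, data = DF)

# One row of influence measures per observation
infl <- data.frame(
  cooks_d   = cooks.distance(fit2),
  rstandard = rstandard(fit2),   # internally studentised residuals
  hat       = hatvalues(fit2)
)

# Rule-of-thumb flag (assumption, not necessarily the app's rule)
flagged <- which(infl$cooks_d > 4 / nrow(DF))
head(infl[order(-infl$cooks_d), ])

# Drop the observations named in the walkthrough and refit
drop_rows <- c(28, 389, 407, 436, 789)
DF_clean <- DF[-drop_rows, ]
fit2_clean <- lm(Response ~ input * group, data = DF_clean)
```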

8. Save and compare models
Go back to the model tab and save the model as model1.
Rerun the model without the interaction between group and input, and save it as model2.
You can run a Lack-of-Fit analysis on the Model Comparison tab:
Judging by the different criteria, the interaction model is better (lower AIC and BIC, and a conclusive F-test), which is what we would expect given the way we created the data.
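
The comparison can be sketched in base R with an F-test between the nested models plus AIC and BIC. For brevity this sketch uses the full data set and skips the outlier removal from the previous step, so the numbers will not match the app's output exactly.

```r
# Recreate the walkthrough data with the renamed columns
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y1 <- x * 2.5 + 40 + rnorm(400, 0, 50)
y2 <- x * 4.5 + 80 + rnorm(400, 0, 50)
DF <- data.frame(input = c(x, x), Response = c(y1, y2),
                 group = rep(c("G1", "G2"), each = 400))

model1 <- lm(Response ~ input * group, data = DF)  # with interaction
model2 <- lm(Response ~ input + group, data = DF)  # without interaction

# F-test for nested models: does the interaction term improve the fit?
cmp <- anova(model2, model1)
print(cmp)

# Information criteria: lower is better
ic <- data.frame(model = c("model1", "model2"),
                 AIC = c(AIC(model1), AIC(model2)),
                 BIC = c(BIC(model1), BIC(model2)))
print(ic)
```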

Thanks for following this quick walkthrough. I hope you’ll like the app!
Antoine
