(This article was first published on **Antoine's data science views**, and kindly contributed to R-bloggers.)

After the quick overview, here is a quick walkthrough of a regression analysis with a categorical variable.

**Open the app:** Here

**1. Import the data:**

Here is some homemade data, generated with the following R code:

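```r
set.seed(3467)
# 400 input values, roughly 1 to 400 with a little noise
x=1:400+rnorm(400,0,1)
# Two groups with different intercepts and slopes, residual SD 50
y1=x*2.5+40+rnorm(400,0,50)
y2=x*4.5+80+rnorm(400,0,50)
group=rep(c('G1','G2'),each=400)
# Stack both groups into one data set of 800 rows
x=c(x,x)
y=c(y1,y2)
DF=data.frame(x=x,y=y,group=group)
# write.csv also writes the row names as the first column
write.csv(DF,'DF.csv')
```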

Click on *Import data*, select your file and set the row names to the first column. You should then get a quick overview of the data.
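For readers who want to cross-check the app in plain R, the import step roughly corresponds to `read.csv()` with `row.names = 1`. This is a minimal sketch assuming base R only; it is not how the app itself reads files, and it writes to a temporary file instead of `DF.csv` so that it runs anywhere.

```r
# Simulate the data as in step 1 and write it to a temporary CSV file
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(x = c(x, x), y = y, group = rep(c("G1", "G2"), each = 400))
path <- tempfile(fileext = ".csv")
write.csv(DF, path)

# row.names = 1 turns the first CSV column (written by write.csv) into row
# names, mirroring "set rownames to first column" in the import dialog
DF2 <- read.csv(path, row.names = 1)
str(DF2)
summary(DF2)
```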

**2. Let’s take a closer look at our data:**

*Go to Data->View Data* and choose x, y and group as the variables to display. We can see that we have two groups (G1 and G2). Let’s take a closer look at the distributions of x and y.

Now click on *View boxplot*.

Here is the distribution of our data. There don’t seem to be any gaps in either variable, so let’s do some regression!
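The same look at the distributions can be sketched in base R with per-group summaries and boxplots. This assumes nothing about how the app draws its plots; it simply recreates the step-1 data so the snippet runs on its own.

```r
# Recreate the simulated data from step 1 (same seed and formulas)
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(x = c(x, x), y = y, group = rep(c("G1", "G2"), each = 400))

# Five-number summaries (plus mean) of y within each group
print(tapply(DF$y, DF$group, summary))

# Side-by-side boxplots of x and of y, split by group
op <- par(mfrow = c(1, 2))
boxplot(x ~ group, data = DF, main = "x by group")
boxplot(y ~ group, data = DF, main = "y by group")
par(op)
```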

**3. Rename our variables:**

x and y aren’t very explicit variable names, so let’s rename them input and Response.

*Go to Data->Data engineering*, select y as the variable to modify, rename as the operation to apply and Response as the new name. Create the new variable!

Do the same with x.
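In plain R, the rename operation amounts to overwriting entries of `names()`. A minimal sketch, assuming the data frame is called `DF` as in step 1:

```r
# Recreate the simulated data from step 1 (same seed and formulas)
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(x = c(x, x), y = y, group = rep(c("G1", "G2"), each = 400))

# Rename y to Response and x to input, leaving group untouched
names(DF)[names(DF) == "y"] <- "Response"
names(DF)[names(DF) == "x"] <- "input"
head(DF)
```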

*Go to Data->View Data* to check the renamed variables.

**4. Run a first model**

Click on the model tab and set up the model `Response ~ input` by selecting Response as the variable to predict and input as the predictor. Run the model!
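We assume the app fits an ordinary least-squares model here (the post does not say which function it calls internally). Under that assumption, a base R equivalent of this first model is a single `lm()` call:

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))

# Simple regression of Response on input, ignoring the groups
fit0 <- lm(Response ~ input, data = DF)
summary(fit0)
```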

**5. Model Summary**

Go to the summary. As we can see, our model is a reasonable one and it is significant. However, it seems like we’re missing some pattern. Let’s take a look at the plot.

Well, it looks like the two groups follow really different lines, and we shouldn’t have ignored the interaction.
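One quick way to confirm the two lines differ is to fit a separate simple regression within each group and compare the slopes. This is our own check, not a feature of the app; given the simulation, the slopes should land close to the 2.5 and 4.5 used to generate the data.

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))

# One regression per group; collect the fitted slopes
fits <- lapply(split(DF, DF$group), function(d) lm(Response ~ input, data = d))
slopes <- sapply(fits, function(f) coef(f)[["input"]])
print(slopes)

# Scatter plot coloured by group, with the single pooled line from step 4
plot(Response ~ input, data = DF, col = as.integer(DF$group), pch = 16, cex = 0.5)
abline(lm(Response ~ input, data = DF), lwd = 2)
```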

**6. Interaction model**

Go back to the model tab, add group as an input and set the interaction between group and input.

Check the summary again: our model performs far better. Now let’s look at the graph.

That’s better, and our two groups are significantly different.
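In R formula syntax, `input * group` expands to `input + group + input:group`, that is, a separate intercept and slope for each group. A sketch of the interaction model, again assuming the app fits OLS, with the adjusted R² of both models side by side:

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))

fit0 <- lm(Response ~ input, data = DF)          # model from step 4
fit1 <- lm(Response ~ input * group, data = DF)  # interaction model
summary(fit1)

# Adjusted R-squared of the two models
c(no_group = summary(fit0)$adj.r.squared,
  interaction = summary(fit1)$adj.r.squared)
```

The `input:groupG2` coefficient estimates how much steeper the G2 line is than the G1 line.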

**7. Outliers and assumptions**

Since we simulated the data ourselves, we shouldn’t have issues with the regression assumptions.

Let’s go to diagnostic->normality. As expected, our residuals are normally distributed.
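We don’t know which checks the app’s normality tab runs, so the sketch below is our own choice: a Q-Q plot of the residuals plus a Shapiro-Wilk test (base R’s `shapiro.test()` accepts between 3 and 5000 values, so 800 residuals are fine).

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))

fit1 <- lm(Response ~ input * group, data = DF)
res <- residuals(fit1)

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(res)
print(sw)

# Q-Q plot: points close to the line suggest normal residuals
qqnorm(res)
qqline(res)
```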


Let’s go to diagnostic->outliers:

On the summary tab, Cook’s distance, internally studentised residuals and hat values are computed for each observation. Observations 28, 389, 407, 436 and 789 are outliers; let’s delete them and rerun the model. (You can also take a look at the other outliers tab for a visualisation of the different outlyingness measures.)
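The three measures are available in base R as `cooks.distance()`, `rstandard()` (internally studentised residuals) and `hatvalues()`. The post doesn’t state the app’s cut-offs, so the thresholds below (Cook’s D > 4/n, |residual| > 3, hat > 2p/n) are common rules of thumb we picked as an assumption; they may flag a different set than the five observations listed above. The refit then drops exactly those five rows, as the post does.

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))
fit1 <- lm(Response ~ input * group, data = DF)

# Influence measures for every observation
infl <- data.frame(cooks_d  = cooks.distance(fit1),
                   stud_res = rstandard(fit1),  # internally studentised
                   hat      = hatvalues(fit1))

# Rule-of-thumb cut-offs (assumption: the app's own thresholds are unknown)
n <- nrow(DF)
p <- length(coef(fit1))
flag <- with(infl, cooks_d > 4 / n | abs(stud_res) > 3 | hat > 2 * p / n)
print(head(infl[flag, ]))

# Drop the observations reported in the post and refit the model
drop_ids <- c(28, 389, 407, 436, 789)
fit1b <- update(fit1, data = DF[-drop_ids, ])
summary(fit1b)
```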

**8. Save and compare models**

Go back to the model tab and save the model as model1.

Rerun the model without the interaction between group and input, and save it as model2.

You can run a Lack-of-Fit analysis on the Model Comparison tab:

Using the different criteria, the interaction model seems better (lower AIC and BIC, and a conclusive F-test), which is what we would have expected given the way we created the data.
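Assuming the app’s lack-of-fit analysis is the usual F-test between nested linear models, the comparison can be sketched with `anova()`, `AIC()` and `BIC()`. The names `model1` and `model2` simply mirror the saved models in the app, and the five observations removed in step 7 are dropped first.

```r
# Recreate the step-1 data, using the renamed columns from step 3
set.seed(3467)
x <- 1:400 + rnorm(400, 0, 1)
y <- c(x * 2.5 + 40 + rnorm(400, 0, 50), x * 4.5 + 80 + rnorm(400, 0, 50))
DF <- data.frame(input = c(x, x), Response = y,
                 group = factor(rep(c("G1", "G2"), each = 400)))
DF_clean <- DF[-c(28, 389, 407, 436, 789), ]  # outliers removed in step 7

model1 <- lm(Response ~ input * group, data = DF_clean)  # with interaction
model2 <- lm(Response ~ input + group, data = DF_clean)  # without interaction

# F-test: does adding the interaction significantly reduce the residual sum of squares?
cmp <- anova(model2, model1)
print(cmp)

# Information criteria: lower is better
print(AIC(model1, model2))
print(BIC(model1, model2))
```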

Thanks for following this quick walkthrough; I hope you’ll like the app!

Antoine
