# Articles by George Pipis

### Linear Regression and Type I Error

December 7, 2020 |

Linear regression is a basic approach to modelling the linear relationship between a dependent variable y and one ...

December 6, 2020 |

The fair premium in lottery games can be defined as the expected pay-off. For example, consider the game where you ...

November 28, 2020 |

Back in 2001 when I entered university to study Statistics, our professor told us that "Statistics is a perfect way ...

### How to Test for Randomness

November 15, 2020 |

I have been contacted by many people asking me to predict the outcome of some events that in theory are ...

### How to Scrape Data from Euroleague

November 14, 2020 |

We will provide you with an example of how you can get the results of the Euroleague games in a structured ...

### How to Build a Predictive Soccer Model

November 14, 2020 |

We will provide you with an example of how you can start building your predictive sport model, specifically for soccer, but ...

### Skewness and Kurtosis in Statistics

November 9, 2020 |

Most commonly a distribution is described by its mean and variance which are the first and second moments respectively. Another ...

### Undersampling by Groups in R

November 6, 2020 |

When we are dealing with unbalanced classes in Machine Learning projects there are many approaches that you can follow. Just ...

### Excess Deaths during the 1st Wave of Covid-19

November 2, 2020 |

Abstract: Our goal is to provide some summary statistics of deaths across countries during the 1st Wave of Covid-19 and ...

### Tidyverse Tips

November 1, 2020 |

I have found the following commands quite useful during the EDA part of any Data Science project. We will work ...

### Hack: The ‘[’ in R lists

October 18, 2020 |

Assume that you have a list and you want to get the n-th element of each component or generally to ...

### Hack: The “count(case when … else … end)” in dplyr

October 18, 2020 |

When I run queries in SQL (or even HiveQL, Spark SQL and so on), it is quite common to use ...

### Hack: How to Install and Load Packages Dynamically

October 16, 2020 |

When we share an R script file with someone else, we assume that they have already installed the required R ...

### ANOVA vs Multiple Comparisons

October 15, 2020 |

When we run an ANOVA, we analyze the differences among group means in a sample. In its simplest form, ANOVA ...

### How to get Data from Different Sources in R

October 6, 2020 |

The data that we want to get could be in different places and in different formats. We will provide some ...

### Hack: How to Convert all Character Variables to Factors

October 6, 2020 |

Let's say that we want to convert all Character Variables to Factors and we are dealing with a large data ...

### How to Convert Continuous variables into Categorical by Creating Bins

September 29, 2020 |

A very common task in data processing is the transformation of the numeric variables (continuous, discrete etc) to categorical by ...

### The fastest way to Read and Write files in R

September 25, 2020 |

Comparing file read and write times: When we are dealing with large datasets, and we need to write many csv ...

### How to Connect R with SQL

September 24, 2020 |

Need to Connect R with SQL: It is common for Data Analysts/Scientists to connect R with SQL. For that reason, ...