May 2018

Building Business Operating System in R at Inspire

May 18, 2018 | Wei Lin

I joined Inspire as the first data scientist. I still remember Blake telling me that he had a team of 40 quantitative analysts at his former company but had never hired a data scientist before. To him, hiring me as a data scientist was an adventurous decision, while to me, as ...
[Read more...]

Testing Entry with R Rmarkdown File

May 18, 2018 | R on Chi's Impe[r]fect Blog

Hello! World! Just figuring out how the blog post works with this random set of coffee data!
Espresso Drinks Visualized with ggplot2 Pie Chart
A pie chart can be created using polar coordinates.
## Pie Chart
coffee_long  %>% ggplot() +
  geom_bar(aes(x=sqrt(total.amount)/2, y = amount, 
               fill=fct_rev(ingredient.f), width=sqrt(total.amount)), 
           stat="identity", position="fill") + 
  facet_wrap(~name2, ncol=4) +
  geom_text(aes(x=sqrt(total.amount), y=Inf, label=""), size=7) +
  theme_void(base_family="Roboto Condensed") +
  coord_polar(theta="y") +
  scale_fill_hue(name="Ingredient", l=80) +
  theme(legend.position="top")
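The polar-coordinate idea can be tried without the post's coffee data. The following is only a minimal sketch on a made-up data frame (the `drink` table and its values are assumptions, not the author's data): a single stacked bar is drawn first, and `coord_polar(theta = "y")` then bends it into a pie.

```r
library(ggplot2)

# Hypothetical ingredient amounts for one drink (illustrative values only).
drink <- data.frame(
  ingredient = c("espresso", "steamed milk", "milk foam"),
  amount     = c(1, 2, 1)
)

# One stacked bar at a single x position; mapping y to the angle turns
# each stacked segment into a pie slice.
p <- ggplot(drink, aes(x = "", y = amount, fill = ingredient)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

p
```

Dropping the `coord_polar()` line leaves the plain stacked bar, which is a quick way to see that the pie is the same geometry in a different coordinate system.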
Espresso Drinks Visualized with ggplot2 Bar Chart
## Bar Chart

coffee_long  %>% ggplot() +
  geom_bar(aes(x=3, y = amount, fill=fct_rev(ingredient.f), width=sqrt(total.amount)/2),
           stat="identity", position="stack") + 
  facet_wrap(~name2, ncol=4) +
  theme_void(base_family="Roboto Condensed") +
  scale_fill_hue(name="Ingredient", l=80) +
  theme(legend.position="top")
[Read more...]

Animating a Monte Carlo Simulation

May 18, 2018 | R on Thomas Roh

Introduction Oftentimes, I run into difficulty trying to explain some of the concepts of statistical sampling with audiences that either have very limited or no understanding of statistics. Given that the majority of communication of analysis has to be digested in a 1-2 hour meeting, data visualization typically is the ...
[Read more...]

My eRum 2018 biggest highlights

May 18, 2018 | Peter Laurinec

From 14 to 16 May 2018, the European R users meeting (eRum) was held in Budapest. I was there as an active participant, since I gave a presentation about time series data mining. eRum 2018 was a very successful event and I want to thank the organizers of this event ...
[Read more...]

What Makes a Song (More) Popular

May 18, 2018 |

Earlier this week, the Association for Psychological Science sent out a press release about a study examining what makes a song popular: Researchers Jonah Berger of the University of Pennsylvania and Grant Packard of Wilfrid Laurier University were interested in understanding the relationship between similarity and success. In a recent ...
[Read more...]

How To Plot With Dygraphs: Exercises

May 18, 2018 | Euthymios Kasvikis

INTRODUCTION The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides rich facilities for charting time-series data in R, including: 1. Automatically plots xts time-series objects (or any object convertible to xts.) 2. Highly configurable axis and series display (including optional second Y-axis.) 3. Rich interactive features, including ... [Read more...]
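As a rough illustration of the features listed in this excerpt, the sketch below plots two built-in monthly time series with dygraphs, moves one of them to a second Y-axis, and adds a range selector. The choice of the `mdeaths` and `fdeaths` datasets and the chart title are assumptions made for the example; they are not taken from the exercises themselves.

```r
library(dygraphs)

# Monthly UK lung-disease deaths; both series ship with R's datasets package.
lung_deaths <- cbind(mdeaths, fdeaths)

# ts objects are convertible to xts, so they can be passed in directly.
g <- dygraph(lung_deaths, main = "Deaths from lung disease (UK)")
g <- dySeries(g, "fdeaths", axis = "y2")  # put the female series on a second Y-axis
g <- dyRangeSelector(g)                   # add an interactive range selector below the chart

g
```

Printing `g` in an interactive session or an R Markdown document renders the chart as an HTML widget.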

R Improvements for Bio7 2.8

May 18, 2018 | R – Bio7 Website

18.05.2018 The next release of Bio7 adds a lot of new R features and improvements. One minor change is that the default perspective after the startup of Bio7 is now the R perspective, to emphasize the importance of R within this software. The R-Shell view has been simplified and the R ... [Read more...]

eRum (2018) Top Twenty

May 17, 2018 | R on datawookie

My Top 20 highlights about eRum (2018) in Budapest. In no particular order: Returning to my favourite European city after so many years. Discovering the cheap and efficient bus 100E, which shuttles back and forth between the airport and city. I have previously only made this trip by car. Partial support from ...
[Read more...]

NYC restaurants reviews and inspection scores

May 17, 2018 | Akshay Vaghani

  If you ever pass outside a restaurant in New York City, you’ll notice a prominently displayed letter grade. Since July 2010, the Health Department has required restaurants to post letter grades showing sanitary inspection results. An A grade attests to top marks for health and safety, so you can feel ...
[Read more...]

Maths in Sport Script

May 17, 2018 | Analysis of AFL

############################################################################
#BROWNLOW PREDICTION WITH FREE DATA!!!!

        ################
        ### 

##from fitzRoy figures
# ptm <- proc.time()
library(tidyverse)
df<-fitzRoy::get_afltables_stats(start_date = "1897-01-01", end_date = Sys.Date())
names(df)
# df<-afldata::afldata
team_stats<-df%>%
  dplyr::select(Date, First.name,Surname,Season, Round, Playing.for, Kicks:Goal.Assists)%>%
  group_by(Date, Season, Round, Playing.for)%>%
  summarise_if(is.numeric,funs(sum=c(sum(.))))

player_stats<-df%>%
  dplyr::select(Date, First.name,Surname,Season, Round, Playing.for, Kicks:Goal.Assists)

complete_df<-left_join(player_stats,team_stats, by=c("Date"="Date", "Season"="Season",  "Playing.for"="Playing.for"))

#but we also need margins as per honours stuff

dataset_scores<-fitzRoy::match_results
names(dataset_scores)
dataset_scores1<-dataset_scores%>%dplyr::select (Date, Round, Home.Team, Home.Points,Game)
dataset_scores2<-dplyr::select(dataset_scores, Date, Round, Away.Team, Away.Points,Game)

colnames(dataset_scores1)[3]<-"Team"
colnames(dataset_scores1)[4]<-"Points"
colnames(dataset_scores2)[3]<-"Team"
colnames(dataset_scores2)[4]<-"Points"

df5<-rbind(dataset_scores1,dataset_scores2)
dataset_margins<-df5%>%group_by(Game)%>%
  arrange(Game)%>%
  mutate(margin=c(-diff(Points),diff(Points)))
# View(dataset_margins)
dataset_margins$Date<-as.Date(dataset_margins$Date)
complete_df$Date<-as.Date(complete_df$Date)

complete_df<-left_join(complete_df,dataset_margins,by=c("Date"="Date",  "Playing.for"="Team"))


complete_df_ratio<-complete_df%>%
  mutate(kick.ratio=Kicks/Kicks_sum,
         Marks.ratio=Marks/Marks_sum,
         handball.ratio=Handballs/Handballs_sum,
         Goals.ratio=Goals/Goals_sum,
         behinds.ratio=Behinds/Behinds_sum,
         hitouts.ratio=Hit.Outs/Hit.Outs_sum,
         tackles.ratio=Tackles/Tackles_sum,
         rebounds.ratio=Rebounds/Rebounds_sum,
         inside50s.ratio=Inside.50s/Inside.50s_sum,
         clearances.ratio=Clearances/Clearances_sum,
         clangers.ratio=Clangers/Clangers_sum,
         freefors.ratio=Frees.For/Frees.For_sum,
         freesagainst.ratio=Frees.Against/Frees.Against_sum,
         Contested.Possessions.ratio=Contested.Possessions/Contested.Possessions_sum,
         Uncontested.Possessions.ratio=Uncontested.Possessions/Uncontested.Possessions_sum,
         contested.marks.ratio=Contested.Marks/Contested.Marks_sum,
         marksinside50.ratio=Marks.Inside.50/Marks.Inside.50_sum,
         one.percenters.ratio=One.Percenters/One.Percenters_sum,
         bounces.ratio=Bounces/Bounces_sum,
         goal.assists.ratio=Goal.Assists/Goal.Assists_sum,
         disposals.ratio=(Kicks+Handballs)/(Kicks_sum+Handballs_sum))
df<-complete_df_ratio%>%dplyr::select(Date, First.name, Surname, Season, Round.x, Playing.for,-Brownlow.Votes, Brownlow.Votes_sum,everything())
df<-df%>%dplyr::select(-Brownlow.Votes,everything())
df[is.na(df)] <- 0
in.sample  <- subset(df, Season %in% c(2013:2016))

in.sample$Brownlow.Votes <- factor(in.sample$Brownlow.Votes)

in.sample<-in.sample%>%filter(Round.x %in% c("1","2","3","4","5","6","7","8",
                                             "9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24"))


names(in.sample)

in.sample$Player<-paste(in.sample$First.name,in.sample$Surname)

in.sample<-in.sample%>%dplyr::select(Player, Date, Season, Round.x, Playing.for, margin:Brownlow.Votes)




library(ordinal)

fm1<-clm(Brownlow.Votes~ kick.ratio +  handball.ratio +  Marks.ratio +  
           disposals.ratio+  hitouts.ratio+
           freefors.ratio +  freesagainst.ratio +  tackles.ratio +  Goals.ratio +   behinds.ratio + Contested.Possessions.ratio+
           Uncontested.Possessions.ratio +  clangers.ratio +    contested.marks.ratio + marksinside50.ratio +
           clearances.ratio +   rebounds.ratio +    inside50s.ratio +   one.percenters.ratio +  bounces.ratio+
           goal.assists.ratio  +margin, 
         data = in.sample)

library(MASS)

fm2<- stepAIC(fm1, direction='backward',type=AIC)

                      ####################
                    ###Get the out.sample

### Example using data from footywire to show that in fitzRoy 
### fans have access to both popular websites

names(fitzRoy::player_stats)
df_2017<-fitzRoy::player_stats%>%
  filter(Season==2017)

team_stats_out<-df_2017%>%
  dplyr::select(Date, Player,Season, Round, Team, CP:T5)%>%
  group_by(Date,Season, Round, Team)%>%
  summarise_if(is.numeric,funs(sum=c(sum(.))))

player_stats_out<-df_2017%>%
  dplyr::select(Date, Player,Season, Round, Team, CP:T5)


complete_df_out<-left_join(player_stats_out,team_stats_out, by=c("Date"="Date", "Season"="Season",  "Team"="Team"))



dataset_scores<-fitzRoy::match_results
names(dataset_scores)
dataset_scores1<-dataset_scores%>%dplyr::select (Date, Round, Home.Team, Home.Points,Game)
dataset_scores2<-dplyr::select(dataset_scores, Date, Round, Away.Team, Away.Points,Game)

colnames(dataset_scores1)[3]<-"Team"
colnames(dataset_scores1)[4]<-"Points"
colnames(dataset_scores2)[3]<-"Team"
colnames(dataset_scores2)[4]<-"Points"


df5<-rbind(dataset_scores1,dataset_scores2)
dataset_margins<-df5%>%group_by(Game)%>%
  arrange(Game)%>%
  mutate(margin=c(-diff(Points),diff(Points)))
dataset_margins$Date<-as.Date(dataset_margins$Date)
complete_df_out$Date<-as.Date(complete_df_out$Date)

dataset_margins<-dataset_margins %>%mutate(Team = str_replace(Team, "Brisbane Lions", "Brisbane"))

dataset_margins<-dataset_margins %>%mutate(Team = str_replace(Team, "Footscray", "Western Bulldogs"))


complete_df_out<-left_join(complete_df_out,dataset_margins,by=c("Date"="Date",  "Team"="Team"))

names(complete_df_out)

####create the new ratios
complete_df_ratio_out<-complete_df_out%>%
  mutate(kick.ratio=K/K_sum,
         Marks.ratio=M/M_sum,
         handball.ratio=HB/HB_sum,
         Goals.ratio=G/G_sum,
         behinds.ratio=B/B_sum,
         hitouts.ratio=HO/HO_sum,
         tackles.ratio=T/T_sum,
         rebounds.ratio=R50/R50_sum,
         inside50s.ratio=I50/I50_sum,
         clearances.ratio=(CCL+SCL)/(CCL_sum+SCL_sum),
         clangers.ratio=CL/CL_sum,
         freefors.ratio=FF/FF_sum,
         freesagainst.ratio=FA/FA_sum,
         Contested.Possessions.ratio=CP/CP_sum,
         Uncontested.Possessions.ratio=UP/UP_sum,
         contested.marks.ratio=CM/CM_sum,
         marksinside50.ratio=MI5/MI5_sum,
         one.percenters.ratio=One.Percenters/One.Percenters_sum,
         bounces.ratio=BO/BO_sum,
         goal.assists.ratio=GA/GA_sum,
         disposals.ratio=D/D_sum)




conforming<-complete_df_ratio_out%>%
  dplyr::select(Player, Date, Season, Round.x, Team, margin, 
                kick.ratio:disposals.ratio)

conforming$Brownlow.Votes<-0
out.sample=conforming

newdata   <- out.sample[ , -ncol(out.sample)]


pre.dict    <- predict(fm2,newdata=newdata, type='prob')
pre.dict.m  <- data.frame(matrix(unlist(pre.dict), nrow= nrow(newdata)))
colnames(pre.dict.m) <- c("vote.0", "vote.1", "vote.2", "vote.3")

newdata.pred  <- cbind.data.frame(newdata, pre.dict.m)


#### Step 1: Get expected value on Votes
newdata.pred$expected.votes <- newdata.pred$vote.1 + 2*newdata.pred$vote.2 + 3*newdata.pred$vote.3

####Join back on matchID whoops!


get_match_ID<-fitzRoy::player_stats

xx<-get_match_ID%>%dplyr::select(Date, Player, Match_id)
newdata.pred<-left_join(newdata.pred, xx, by=c("Date"="Date",  "Player"="Player"))



newdata.pred<-filter(newdata.pred, Date<"2017-09-01")


#### Step 2: Sum each vote probability by Match_id
sum1 <- aggregate(vote.1~Match_id, data = newdata.pred, FUN = sum ); names(sum1) <- c("Match_id", "sum.vote.1");
sum2 <- aggregate(vote.2~Match_id, data = newdata.pred, FUN = sum ); names(sum2) <- c("Match_id", "sum.vote.2");
sum3 <- aggregate(vote.3~Match_id, data = newdata.pred, FUN = sum ); names(sum3) <- c("Match_id", "sum.vote.3");

#### Step 3: Add sum of each vote by matchId to big table
newdata.pred <- merge(newdata.pred, sum1, by = "Match_id")
newdata.pred <- merge(newdata.pred, sum2, by = "Match_id")
newdata.pred <- merge(newdata.pred, sum3, by = "Match_id")

#### Step 4: Add std1/2/3
newdata.pred$std.1  <- (newdata.pred$sum.vote.1/newdata.pred$vote.1)^-1
newdata.pred$std.2  <- (newdata.pred$sum.vote.2/newdata.pred$vote.2)^-1
newdata.pred$std.3  <- (newdata.pred$sum.vote.3/newdata.pred$vote.3)^-1


#### Step 5: Expected standard game vote
newdata.pred$exp_std_game_vote <- newdata.pred$std.1 + 2*newdata.pred$std.2 + 3*newdata.pred$std.3  


#### Step 6: List of winners

newdata.pred$PlayerName<-paste(newdata.pred$Player," ",newdata.pred$Team)
winners.stdgame   <- aggregate(exp_std_game_vote~PlayerName, data = newdata.pred, FUN = sum );
winners.stdgame   <- winners.stdgame[order(-winners.stdgame$exp_std_game_vote), ]
winners.stdgame[1:10, ]

# proc.time() - ptm
[Read more...]
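The standardisation in the last steps of the script above may be easier to follow on a toy table. The sketch below uses made-up probabilities for five hypothetical players in two matches (all values and the `pred` data frame are assumptions for illustration) and only base R. It divides each predicted vote probability by its match total, which is equivalent to the script's `(sum.vote/vote)^-1`, and then weights the shares by 1, 2 and 3.

```r
# Toy predicted vote probabilities; every number here is invented.
pred <- data.frame(
  Match_id = c(1, 1, 1, 2, 2),
  Player   = c("A", "B", "C", "D", "E"),
  vote.1   = c(0.10, 0.30, 0.20, 0.25, 0.25),
  vote.2   = c(0.20, 0.10, 0.10, 0.40, 0.10),
  vote.3   = c(0.40, 0.05, 0.05, 0.30, 0.10)
)

# Divide each probability by its match total, so that within a match
# the standardised 1-, 2- and 3-vote shares each sum to 1.
for (v in c("vote.1", "vote.2", "vote.3")) {
  std_name <- sub("vote", "std", v)  # "vote.1" becomes "std.1", and so on
  pred[[std_name]] <- pred[[v]] / ave(pred[[v]], pred$Match_id, FUN = sum)
}

# Expected standardised votes per player and match.
pred$exp_std_game_vote <- pred$std.1 + 2 * pred$std.2 + 3 * pred$std.3

# Total expected votes handed out in each match.
match_totals <- aggregate(exp_std_game_vote ~ Match_id, data = pred, FUN = sum)
match_totals
```

Because each standardised share sums to 1 within a match, every match distributes 1 + 2 + 3 = 6 expected votes in total, mirroring the 3-2-1 voting the model is predicting.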

Visualisation of Squiggle Tipsters

May 17, 2018 | Analysis of AFL

Something I thought would be interesting is trying to visualise how the different tipsters on squiggle rate match-ups. A simple way to do this would be to look at squiggle margins by tipster and visualise it on a plot. To hopefully encourage you to give it a go at home ...
[Read more...]

makemeauseR

May 17, 2018 | Analysis of AFL

The best way to learn things, in my opinion, is through examples. This part of the site aims to cover an introduction to statistics course one might take at university. But instead of boring examples, it is going to use much more exciting examples relating... [Read more...]

Analyzing Customer Data from Square

May 17, 2018 | Steven M. Mortimer

The Square Data Model Authenticating Pulling Transaction Data Spend by Customer Group Issues with the APIs Cannot Request Specific Fields Cannot Update Customer Groups Programmatically Customer ID Not On Transactions The Square Data Model Whether you own your own business or consult for a business using Square to capture payment data, ...
[Read more...]

Beautiful and Powerful Correlation Tables in R

May 17, 2018 | Dominique Makowski

Another correlation function?! A table A Plot A print Options Fun with p-hacking Credits Another correlation function?! Yes, the correlation function from the psycho package.
devtools::install_github("neuropsychology/psycho.R")  # Install the newest version

library(psycho)
library(tidyverse)

cor <- psycho::affective %>% 
  correlation()
This function automatically selects numeric variables and runs a correlation analysis. It returns a psychobject. A table We can then extract a formatted table ...
[Read more...]
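The "select numeric variables, then correlate" step described in this excerpt can be approximated in base R. The sketch below is not the psycho implementation; it is a plain illustration on the built-in `iris` data (an assumption chosen for the example) that keeps only numeric columns and computes Pearson correlations.

```r
# Flag the numeric columns; iris also has a factor column (Species).
num_cols <- vapply(iris, is.numeric, logical(1))

# Pearson correlation matrix over the numeric columns only.
cor_mat <- cor(iris[, num_cols], method = "pearson")

round(cor_mat, 2)
```

Unlike the psycho function, this returns a bare matrix with no p-values or formatting, so it only shows the selection-and-correlation idea.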