May 2018

LexisNexisTools. My first `R` package

May 18, 2018 | Johannes B. Gruber on Johannes B. Gruber

My PhD supervisor once told me that everyone doing newspaper analysis starts by writing code to read in files from the ‘LexisNexis’ newspaper archive. However, while I do recommend this exercise, not everyone has the time. These are the first words of the introduction to my first R package, LexisNexisTools. ...

Building Business Operating System in R at Inspire

May 18, 2018 | Wei Lin

I joined Inspire as its first data scientist. I still remember Blake telling me that he had a team of 40 quantitative analysts at his former company but had never hired a data scientist before. To him, hiring me as a data scientist was an adventurous decision, while to me, as ...

Testing Entry with R Rmarkdown File

May 18, 2018 | R on Chi's Impe[r]fect Blog

Hello, World! I'm just figuring out how blog posts work, using this random set of coffee data!

Espresso Drinks Visualized with ggplot2 Pie Chart

A pie chart can be created using polar coordinates.

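## Pie Chart
# Assumes a long-format data frame coffee_long (columns name2, ingredient.f,
# amount, total.amount); the post does not show how it was built.
library(dplyr)    # %>%
library(forcats)  # fct_rev()
library(ggplot2)

coffee_long  %>% ggplot() +
  # position = "fill" rescales each drink's stacked bar to height 1, and
  # coord_polar(theta = "y") then wraps that bar into a full pie
  geom_bar(aes(x = sqrt(total.amount)/2, y = amount,
               fill = fct_rev(ingredient.f), width = sqrt(total.amount)),
           stat = "identity", position = "fill") +
  facet_wrap(~name2, ncol = 4) +
  # Empty labels draw nothing; this layer only affects the x (radius) range
  geom_text(aes(x = sqrt(total.amount), y = Inf, label = ""), size = 7) +
  theme_void(base_family = "Roboto Condensed") +  # the font must be installed
  coord_polar(theta = "y") +
  scale_fill_hue(name = "Ingredient", l = 80) +
  theme(legend.position = "top")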

Espresso Drinks Visualized with ggplot2 Bar Chart

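## Bar Chart
# Same data as one stacked bar per drink: position = "stack" keeps the absolute
# amounts, so bar height shows each drink's total volume, and the bar width
# also grows with sqrt(total.amount)

coffee_long  %>% ggplot() +
  geom_bar(aes(x=3, y = amount, fill=fct_rev(ingredient.f), width=sqrt(total.amount)/2),
           stat="identity", position="stack") + 
  facet_wrap(~name2, ncol=4) +
  theme_void(base_family="Roboto Condensed") +
  scale_fill_hue(name="Ingredient", l=80) +
  theme(legend.position="top")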
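Both snippets assume a long-format `coffee_long` data frame (one row per drink and ingredient) that the post does not show. Below is a minimal self-contained sketch, assuming ggplot2 and forcats are installed; the drinks and millilitre amounts are made up for illustration and are not the post's data. It strips the pie chart down to the core idea: one stacked bar per facet, rescaled with `position = "fill"` and wrapped into a circle by `coord_polar(theta = "y")`.

```r
library(ggplot2)
library(forcats)  # fct_rev()

# Hypothetical recipes (ml per ingredient); NOT the post's actual coffee data
coffee_long <- data.frame(
  name2 = rep(c("Espresso", "Macchiato", "Cappuccino"), times = c(1, 2, 3)),
  ingredient.f = factor(
    c("espresso", "espresso", "milk foam", "espresso", "steamed milk", "milk foam"),
    levels = c("espresso", "steamed milk", "milk foam")
  ),
  amount = c(30, 30, 15, 30, 60, 60)
)
# Total volume per drink, repeated on each of that drink's rows
coffee_long$total.amount <- ave(coffee_long$amount, coffee_long$name2, FUN = sum)

# A pie chart is a single stacked bar drawn in polar coordinates:
# position = "fill" scales each bar to height 1, and coord_polar(theta = "y")
# maps the stacked y values to angles around the circle
p <- ggplot(coffee_long) +
  geom_bar(aes(x = 1, y = amount, fill = fct_rev(ingredient.f)),
           stat = "identity", position = "fill") +
  facet_wrap(~name2) +
  coord_polar(theta = "y") +
  scale_fill_hue(name = "Ingredient", l = 80) +
  theme_void()
```

Printing `p` draws one pie per drink. Unlike the post's version, every pie here has the same size; the post additionally sets the bar width from `sqrt(total.amount)`.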

Animating a Monte Carlo Simulation

May 18, 2018 | R on Thomas Roh

Oftentimes, I have difficulty explaining concepts of statistical sampling to audiences with very limited or no understanding of statistics. Given that most analysis has to be communicated and digested in a 1-2 hour meeting, data visualization typically is the ...
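The post's own simulation code is not part of this excerpt, so here is a minimal static sketch of the sampling idea it describes, in base R: draw many random samples from a hypothetical skewed population and look at how the sample means behave. The population (exponential, mean 10), the sample size of 30 and the 1,000 repetitions are arbitrary choices for illustration.

```r
set.seed(2018)

# Hypothetical "population": 10,000 skewed values (exponential, mean 10)
population <- rexp(10000, rate = 1 / 10)

# Monte Carlo: draw many samples of size n and record each sample mean
n_sims <- 1000
n <- 30
sample_means <- replicate(n_sims, mean(sample(population, size = n)))

# The sample means cluster around the population mean ...
c(population_mean = mean(population), mean_of_means = mean(sample_means))

# ... with a spread close to sd(population) / sqrt(n), the standard error
c(observed_se = sd(sample_means), theoretical_se = sd(population) / sqrt(n))
```

Passing `sample_means` to `hist()` shows a roughly bell-shaped distribution even though the population itself is skewed, which is the kind of picture an animation can build up one sample at a time.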

My eRum 2018 biggest highlights

May 18, 2018 | Peter Laurinec

From 14 to 16 May 2018, the European R Users Meeting (eRum) was held in Budapest. I attended as an active participant, giving a presentation on time series data mining. eRum 2018 was a very successful event, and I want to thank the organizers of this event ...

What Makes a Song (More) Popular

May 18, 2018 |

Earlier this week, the Association for Psychological Science sent out a press release about a study examining what makes a song popular: Researchers Jonah Berger of the University of Pennsylvania and Grant Packard of Wilfrid Laurier University were interested in understanding the relationship between similarity and success. In a recent ...

How To Plot With Dygraphs: Exercises

May 18, 2018 | Euthymios Kasvikis

The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides rich facilities for charting time-series data in R, including:

1. Automatically plotting xts time-series objects (or any object convertible to xts).
2. Highly configurable axis and series display (including an optional second Y-axis).
3. Rich interactive features, including ...
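As a minimal sketch of the first two features listed, assuming the dygraphs package is installed and R 4.1 or later (for the native `|>` pipe): `dygraph()` also accepts base `ts` objects such as the built-in `mdeaths` and `fdeaths` series, `dySeries(..., axis = "y2")` moves one series onto a second Y-axis, and `dyRangeSelector()` adds an interactive zoom control. The data set and options are chosen for illustration and are not taken from the exercises.

```r
library(dygraphs)

# mdeaths and fdeaths are monthly ts objects from base R's datasets package;
# cbind() combines them into one multi-series ts with columns "mdeaths", "fdeaths"
lung_deaths <- cbind(mdeaths, fdeaths)

g <- dygraph(lung_deaths, main = "Monthly UK lung disease deaths") |>
  dySeries("fdeaths", axis = "y2") |>  # plot fdeaths against a second Y-axis
  dyRangeSelector()                    # interactive range selector below the chart
```

Printing `g` in RStudio or in a knitted R Markdown document renders the interactive chart.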

‘LMX ot NOSJ!’ Interchanging Classic Data Formats With Single `blackmagic` Incantations

May 18, 2018 | hrbrmstr

The D.C. Universe magic hero Zatanna used spells (i.e. incantations) to battle foes and said spells were just sentences said backwards, hence the mixed up jumble in the title. But, now I’m regretting not naming the package zatanna and reversing the function names to help ensure they’...

R Improvements for Bio7 2.8

May 18, 2018 | R – Bio7 Website

The next release of Bio7 adds many new R features and improvements. One minor change is that the default perspective at startup is now the R perspective, to emphasize the importance of R within this software. The R-Shell view has been simplified and the R ...

Automated Feature Selection using bounceR

May 18, 2018 | Lukas Strömsdörfer

From a very philosophical point of view, as humans evolve we tend to automate repetitive tasks so that we can spend our time on more pleasant matters. The same holds true for the field of data science as a whole, and for many tasks at STATWORX. ...

eRum (2018) Top Twenty

May 17, 2018 | R on datawookie

My Top 20 highlights about eRum (2018) in Budapest. In no particular order: Returning to my favourite European city after so many years. Discovering the cheap and efficient bus 100E, which shuttles back and forth between the airport and city. I have previously only made this trip by car. Partial support from ...

NYC restaurants reviews and inspection scores

May 17, 2018 | Akshay Vaghani

If you ever walk past a restaurant in New York City, you'll notice a prominently displayed letter grade. Since July 2010, the Health Department has required restaurants to post letter grades showing sanitary inspection results. An A grade attests to top marks for health and safety, so you can feel ...

Maths in Sport Script

May 17, 2018 | Analysis of AFL

############################################################################
# BROWNLOW PREDICTION WITH FREE DATA!!!!
############################################################################

##from fitzRoy figures
# ptm <- proc.time()
library(tidyverse)
df<-fitzRoy::get_afltables_stats(start_date = "1897-01-01", end_date = Sys.Date())
names(df)
# df<-afldata::afldata
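# Team totals per match: sum every numeric stat and suffix the column names with "_sum"
team_stats<-df%>%
  dplyr::select(Date, First.name, Surname, Season, Round, Playing.for, Kicks:Goal.Assists)%>%
  group_by(Date, Season, Round, Playing.for)%>%
  summarise(across(where(is.numeric), sum, .names = "{.col}_sum"), .groups = "drop")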

player_stats<-df%>%
  dplyr::select(Date, First.name,Surname,Season, Round, Playing.for, Kicks:Goal.Assists)

complete_df<-left_join(player_stats,team_stats, by=c("Date"="Date", "Season"="Season",  "Playing.for"="Playing.for"))

#but we also need margins as per honours stuff

dataset_scores<-fitzRoy::match_results
names(dataset_scores)
dataset_scores1<-dataset_scores%>%dplyr::select (Date, Round, Home.Team, Home.Points,Game)
dataset_scores2<-dplyr::select(dataset_scores, Date, Round, Away.Team, Away.Points,Game)

colnames(dataset_scores1)[3]<-"Team"
colnames(dataset_scores1)[4]<-"Points"
colnames(dataset_scores2)[3]<-"Team"
colnames(dataset_scores2)[4]<-"Points"

df5<-rbind(dataset_scores1,dataset_scores2)
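# Each Game has two rows: rbind() put the home row before the away row and
# arrange() keeps that order, so diff(Points) = away - home. The home side
# therefore gets home - away and the away side gets away - home.
dataset_margins<-df5%>%group_by(Game)%>%
  arrange(Game)%>%
  mutate(margin=c(-diff(Points),diff(Points)))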
# View(dataset_margins)
dataset_margins$Date<-as.Date(dataset_margins$Date)
complete_df$Date<-as.Date(complete_df$Date)

complete_df<-left_join(complete_df,dataset_margins,by=c("Date"="Date",  "Playing.for"="Team"))


complete_df_ratio<-complete_df%>%
  mutate(kick.ratio=Kicks/Kicks_sum,
         Marks.ratio=Marks/Marks_sum,
         handball.ratio=Handballs/Handballs_sum,
         Goals.ratio=Goals/Goals_sum,
         behinds.ratio=Behinds/Behinds_sum,
         hitouts.ratio=Hit.Outs/Hit.Outs_sum,
         tackles.ratio=Tackles/Tackles_sum,
         rebounds.ratio=Rebounds/Rebounds_sum,
         inside50s.ratio=Inside.50s/Inside.50s_sum,
         clearances.ratio=Clearances/Clearances_sum,
         clangers.ratio=Clangers/Clangers_sum,
         freefors.ratio=Frees.For/Frees.For_sum,
         freesagainst.ratio=Frees.Against/Frees.Against_sum,
         Contested.Possessions.ratio=Contested.Possessions/Contested.Possessions_sum,
         Uncontested.Possessions.ratio=Uncontested.Possessions/Uncontested.Possessions_sum,
         contested.marks.ratio=Contested.Marks/Contested.Marks_sum,
         marksinside50.ratio=Marks.Inside.50/Marks.Inside.50_sum,
         one.percenters.ratio=One.Percenters/One.Percenters_sum,
         bounces.ratio=Bounces/Bounces_sum,
         goal.assists.ratio=Goal.Assists/Goal.Assists_sum,
         disposals.ratio=(Kicks+Handballs)/(Kicks_sum+Handballs_sum))
# Move the response, Brownlow.Votes, to the last column
df<-complete_df_ratio%>%dplyr::relocate(Brownlow.Votes, .after = last_col())
# Treat missing values (NA, and NaN from 0/0 ratios) in numeric columns as 0
df<-df%>%mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))
in.sample  <- subset(df, Season %in% c(2013:2016))

in.sample$Brownlow.Votes <- factor(in.sample$Brownlow.Votes)

# Home-and-away rounds only (Brownlow votes are not awarded in finals)
in.sample<-in.sample%>%filter(Round.x %in% as.character(1:24))


names(in.sample)

in.sample$Player<-paste(in.sample$First.name,in.sample$Surname)

in.sample<-in.sample%>%dplyr::select(Player, Date, Season, Round.x, Playing.for, margin:Brownlow.Votes)




library(ordinal)

fm1<-clm(Brownlow.Votes~ kick.ratio +  handball.ratio +  Marks.ratio +  
           disposals.ratio+  hitouts.ratio+
           freefors.ratio +  freesagainst.ratio +  tackles.ratio +  Goals.ratio +   behinds.ratio + Contested.Possessions.ratio+
           Uncontested.Possessions.ratio +  clangers.ratio +    contested.marks.ratio + marksinside50.ratio +
           clearances.ratio +   rebounds.ratio +    inside50s.ratio +   one.percenters.ratio +  bounces.ratio+
           goal.assists.ratio  +margin, 
         data = in.sample)

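library(MASS)  # MASS also exports select(), which masks dplyr::select(); hence the explicit dplyr::select() calls

# Backward elimination by AIC (stepAIC's default penalty k = 2 corresponds to AIC)
fm2 <- stepAIC(fm1, direction = "backward")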

#### Get the out-of-sample data (2017 season)

### This part uses Footywire data to show that fitzRoy gives
### fans access to both popular websites (AFL Tables and Footywire)

names(fitzRoy::player_stats)
df_2017<-fitzRoy::player_stats%>%
  filter(Season==2017)

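# Team totals per match, with "_sum"-suffixed column names as in the in-sample data
team_stats_out<-df_2017%>%
  dplyr::select(Date, Player, Season, Round, Team, CP:T5)%>%
  group_by(Date, Season, Round, Team)%>%
  summarise(across(where(is.numeric), sum, .names = "{.col}_sum"), .groups = "drop")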

player_stats_out<-df_2017%>%
  dplyr::select(Date, Player,Season, Round, Team, CP:T5)


complete_df_out<-left_join(player_stats_out,team_stats_out, by=c("Date"="Date", "Season"="Season",  "Team"="Team"))



dataset_scores<-fitzRoy::match_results
names(dataset_scores)
dataset_scores1<-dataset_scores%>%dplyr::select (Date, Round, Home.Team, Home.Points,Game)
dataset_scores2<-dplyr::select(dataset_scores, Date, Round, Away.Team, Away.Points,Game)

colnames(dataset_scores1)[3]<-"Team"
colnames(dataset_scores1)[4]<-"Points"
colnames(dataset_scores2)[3]<-"Team"
colnames(dataset_scores2)[4]<-"Points"


df5<-rbind(dataset_scores1,dataset_scores2)
dataset_margins<-df5%>%group_by(Game)%>%
  arrange(Game)%>%
  mutate(margin=c(-diff(Points),diff(Points)))
dataset_margins$Date<-as.Date(dataset_margins$Date)
complete_df_out$Date<-as.Date(complete_df_out$Date)

dataset_margins<-dataset_margins %>%mutate(Team = str_replace(Team, "Brisbane Lions", "Brisbane"))

dataset_margins<-dataset_margins %>%mutate(Team = str_replace(Team, "Footscray", "Western Bulldogs"))


complete_df_out<-left_join(complete_df_out,dataset_margins,by=c("Date"="Date",  "Team"="Team"))

names(complete_df_out)

####create the new ratios
complete_df_ratio_out<-complete_df_out%>%
  mutate(kick.ratio=K/K_sum,
         Marks.ratio=M/M_sum,
         handball.ratio=HB/HB_sum,
         Goals.ratio=G/G_sum,
         behinds.ratio=B/B_sum,
         hitouts.ratio=HO/HO_sum,
         tackles.ratio=T/T_sum,
         rebounds.ratio=R50/R50_sum,
         inside50s.ratio=I50/I50_sum,
         clearances.ratio=(CCL+SCL)/(CCL_sum+SCL_sum),
         clangers.ratio=CL/CL_sum,
         freefors.ratio=FF/FF_sum,
         freesagainst.ratio=FA/FA_sum,
         Contested.Possessions.ratio=CP/CP_sum,
         Uncontested.Possessions.ratio=UP/UP_sum,
         contested.marks.ratio=CM/CM_sum,
         marksinside50.ratio=MI5/MI5_sum,
         one.percenters.ratio=One.Percenters/One.Percenters_sum,
         bounces.ratio=BO/BO_sum,
         goal.assists.ratio=GA/GA_sum,
         disposals.ratio=D/D_sum)




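conforming<-complete_df_ratio_out%>%
  dplyr::select(Player, Date, Season, Round.x, Team, margin, 
                kick.ratio:disposals.ratio)%>%
  # Mirror the in-sample cleaning: NA/NaN values (e.g. 0/0 ratios) become 0
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))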

conforming$Brownlow.Votes<-0
out.sample=conforming

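# Drop the placeholder Brownlow.Votes column: when the response is absent,
# predict() on a clm fit returns a probability for every vote level
newdata   <- out.sample[ , -ncol(out.sample)]

pre.dict    <- predict(fm2, newdata = newdata, type = "prob")
# pre.dict$fit is an nrow(newdata) x 4 matrix, one column per vote level (0-3)
pre.dict.m  <- as.data.frame(pre.dict$fit)
colnames(pre.dict.m) <- c("vote.0", "vote.1", "vote.2", "vote.3")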

newdata.pred  <- cbind.data.frame(newdata, pre.dict.m)


#### Step 1: Get expected value on Votes
newdata.pred$expected.votes <- newdata.pred$vote.1 + 2*newdata.pred$vote.2 + 3*newdata.pred$vote.3

####Join back on matchID whoops!


get_match_ID<-fitzRoy::player_stats

xx<-get_match_ID%>%dplyr::select(Date, Player, Match_id)
newdata.pred<-left_join(newdata.pred, xx, by=c("Date"="Date",  "Player"="Player"))



# Keep home-and-away games only (the 2017 finals began in September)
newdata.pred<-filter(newdata.pred, Date < as.Date("2017-09-01"))


#### Step 2: Sum each vote probability within each match
sum1 <- aggregate(vote.1~Match_id, data = newdata.pred, FUN = sum); names(sum1) <- c("Match_id", "sum.vote.1")
sum2 <- aggregate(vote.2~Match_id, data = newdata.pred, FUN = sum); names(sum2) <- c("Match_id", "sum.vote.2")
sum3 <- aggregate(vote.3~Match_id, data = newdata.pred, FUN = sum); names(sum3) <- c("Match_id", "sum.vote.3")

#### Step 3: Add sum of each vote by matchId to big table
newdata.pred <- merge(newdata.pred, sum1, by = "Match_id")
newdata.pred <- merge(newdata.pred, sum2, by = "Match_id")
newdata.pred <- merge(newdata.pred, sum3, by = "Match_id")

#### Step 4: Add std1/2/3, each player's share of the match total for that vote level
newdata.pred$std.1  <- newdata.pred$vote.1 / newdata.pred$sum.vote.1
newdata.pred$std.2  <- newdata.pred$vote.2 / newdata.pred$sum.vote.2
newdata.pred$std.3  <- newdata.pred$vote.3 / newdata.pred$sum.vote.3


#### Step 5: Expected standard game vote
newdata.pred$exp_std_game_vote <- newdata.pred$std.1 + 2*newdata.pred$std.2 + 3*newdata.pred$std.3  


#### Step 6: List of winners

newdata.pred$PlayerName<-paste(newdata.pred$Player, newdata.pred$Team, sep = " - ")
winners.stdgame   <- aggregate(exp_std_game_vote~PlayerName, data = newdata.pred, FUN = sum );
winners.stdgame   <- winners.stdgame[order(-winners.stdgame$exp_std_game_vote), ]
winners.stdgame[1:10, ]

# proc.time() - ptm
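Steps 1 to 5 above are easier to follow on a toy match. The sketch below uses made-up probabilities for four hypothetical players in a single match (not model output) and computes the per-match sums with base R's `ave()` instead of the script's `aggregate()` plus `merge()`. Dividing each vote-level probability by its match total makes every `std` column sum to 1 within the match, so the standardised expected votes always add up to 1 + 2 + 3 = 6, the number of Brownlow votes actually awarded per game, whereas the raw expected votes need not.

```r
# Made-up vote probabilities for one hypothetical four-player match
preds <- data.frame(
  Match_id = c(1, 1, 1, 1),
  Player   = c("A", "B", "C", "D"),
  vote.1   = c(0.30, 0.20, 0.10, 0.05),
  vote.2   = c(0.25, 0.15, 0.05, 0.05),
  vote.3   = c(0.40, 0.20, 0.10, 0.05)
)

# Step 1: raw expected votes, sum over k of k * P(k votes)
preds$expected.votes <- preds$vote.1 + 2 * preds$vote.2 + 3 * preds$vote.3

# Steps 2-4: divide each probability by its total within the same match
for (k in 1:3) {
  p <- preds[[paste0("vote.", k)]]
  preds[[paste0("std.", k)]] <- p / ave(p, preds$Match_id, FUN = sum)
}

# Step 5: standardised expected votes
preds$exp_std_game_vote <- preds$std.1 + 2 * preds$std.2 + 3 * preds$std.3

sum(preds$expected.votes)     # 3.9, fewer than the 6 votes handed out
sum(preds$exp_std_game_vote)  # 6
```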

Visualisation of Squiggle Tipsters

May 17, 2018 | Analysis of AFL

Something I thought would be interesting is trying to visualise how the different tipsters on squiggle rate match-ups. A simple way to do this would be to look at squiggle margins by tipster and visualise it on a plot. To hopefully encourage you to give it a go at home ...

makemeauseR

May 17, 2018 | Analysis of AFL

In my opinion, the best way to learn things is through examples. This part of the site aims to cover the kind of introductory statistics course one might take at university, but instead of boring examples, it uses much more exciting examples relating...

Analyzing Customer Data from Square

May 17, 2018 | Steven M. Mortimer

Whether you own your own business or consult for a business using Square to capture payment data, ...

drake’s improved high-performance computing power

May 17, 2018 | rOpenSci - open tools for open science

The drake R package is not only a reproducible research solution, but also a serious high-performance computing engine. The Get Started page introduces drake, and this technical note draws from the guides on high-performance computing and timing. You can help! Some of these features are brand new, and others are ...

Understanding PCA using Stack Overflow data

May 17, 2018 | Rstats on Julia Silge

This year, I have given some talks about understanding principal component analysis using what I spend day in and day out with, Stack Overflow data. You can see a recording of one of these talks from rstudio::conf 2018. When I have given these talks, I've focused a lot on ...
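The talks' Stack Overflow data is not reproduced here, but the core PCA computation can be sketched with base R's `prcomp()` on the built-in `USArrests` data set, a stand-in chosen purely for illustration. Scaling first keeps variables measured in different units from dominating the components.

```r
# USArrests ships with base R (datasets); it stands in for the Stack Overflow data
pca <- prcomp(USArrests, scale. = TRUE)  # centre and scale each variable first

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Loadings: how strongly each original variable contributes to PC1 and PC2
round(pca$rotation[, 1:2], 3)
```

`pca$x` holds each observation's scores on the components, which is what one would plot to see how the states spread out along PC1 and PC2.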

Beautiful and Powerful Correlation Tables in R

May 17, 2018 | Dominique Makowski

Another correlation function?! Yes, the correlation function from the psycho package.

devtools::install_github("neuropsychology/psycho.R")  # Install the newest version

library(psycho)
library(tidyverse)

cor <- psycho::affective %>% 
  correlation()

This function automatically selects numeric variables and runs a correlation analysis. It returns a psychobject. We can then extract a formatted table ...
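For readers without the psycho package, here is a rough base-R sketch of the two steps described, not psycho's implementation: keep only the numeric columns, then compute pairwise Pearson correlations and their p-values with `cor()` and `cor.test()`. The built-in `iris` data stands in for `psycho::affective`, and the p-values here are not adjusted for multiple comparisons.

```r
# Keep only the numeric columns (iris stands in for psycho::affective)
num <- Filter(is.numeric, iris)

# Pairwise Pearson correlation matrix
r <- cor(num)

# Matching matrix of unadjusted p-values; the diagonal is left as NA
p <- matrix(NA_real_, ncol(num), ncol(num), dimnames = dimnames(r))
for (i in seq_len(ncol(num))) {
  for (j in seq_len(ncol(num))) {
    if (i != j) p[i, j] <- cor.test(num[[i]], num[[j]])$p.value
  }
}

round(r, 2)
signif(p, 2)
```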
