Open Data R Meetup: Exploring the Distribution of Traffic Accidents in Belgrade, 2015, in R

January 31, 2017
(This article was first published on The Exactness of Mind, and kindly contributed to R-bloggers)

The R code that accompanies this post is available on GitHub: there you will find the R, Rmd, and HTML files used during the first Open Data R Meetup, held in Belgrade on 31 January 2017 and organized by Data Science Serbia at Startit Center, Savska 5, Belgrade, Serbia. The Open Data initiative in Serbia is still young and our Open Data Portal is still under development. And guess what – we from Data Science Serbia will join the Working Group for Open Data of the Directorate for eGovernment to help open, standardize, structure, publish, and analyse the many forthcoming open data sets from our country – in R, of course 🙂

The data set explored here covers traffic accidents in Belgrade in 2015 (data for December 2015 are missing). The notebook presents an exploratory analysis of this test open data set, which the Republic of Serbia Ministry of Interior kindly provided to the Open Data Portal of the Republic of Serbia (the portal is currently under development). Many more open data sets will be indexed and uploaded in the forthcoming weeks and months.

The distribution of traffic accidents in Belgrade, 2015. Part of the city core is shown on a map produced with ggmap and ggplot2, using geom_density2d() and stat_density2d().
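For readers who want to try the density layers from the caption above, here is a minimal sketch. It is not the notebook's code: the accident coordinates are simulated around two hypothetical hot spots near central Belgrade, and the ggmap base map is left out because fetching map tiles needs a network connection and, depending on the provider, an API key. The functions are written as geom_density_2d() and stat_density_2d(), the current ggplot2 spelling of the ones named in the caption.

```r
library(ggplot2)

set.seed(1)
# Simulated accident locations (NOT the real data): two hypothetical hot spots
n <- 500
acc <- data.frame(
  lon = c(rnorm(n, 20.46, 0.010), rnorm(n, 20.48, 0.015)),
  lat = c(rnorm(n, 44.81, 0.008), rnorm(n, 44.80, 0.010))
)

p <- ggplot(acc, aes(x = lon, y = lat)) +
  # Filled contour bands of the 2D kernel density estimate
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
  # Contour lines of the same estimate drawn on top
  geom_density_2d(colour = "grey30") +
  scale_fill_gradient(low = "yellow", high = "red") +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", fill = "Density level")

print(p)
```

With a ggmap base map, the same two layers would be added on top of ggmap(...) instead of ggplot(...), with the data frame passed to each layer.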

Besides exploring and visualizing this test data set, we demonstrated the basic usage of three R packages: {weatherData} to fetch historical weather data into R, {wbstats} to access the rich World Bank time series, and {ISOcodes}.
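As a rough illustration of how two of these packages can work together, the sketch below looks up Serbia in the ISO 3166-1 table shipped with {ISOcodes} and then passes the three-letter code to {wbstats}. The choice of indicator (SP.POP.TOTL, total population) and of the years is ours, purely for illustration, and the wb_data() call assumes {wbstats} version 1.0 or later and a working network connection, so it is wrapped in tryCatch(). {weatherData} is left out here, since it depends on an external weather service.

```r
library(ISOcodes)  # ships the ISO_3166_1 country table

data("ISO_3166_1", package = "ISOcodes")

# Look up Serbia's two- and three-letter country codes
serbia <- ISO_3166_1[ISO_3166_1$Name == "Serbia", c("Alpha_2", "Alpha_3", "Name")]
print(serbia)

# Network call: fetch a World Bank series for that country.
# Indicator and date range are illustrative assumptions, not from the notebook.
pop <- tryCatch(
  wbstats::wb_data(indicator = "SP.POP.TOTL", country = serbia$Alpha_3,
                   start_date = 2010, end_date = 2015),
  error = function(e) NULL  # offline, or {wbstats} not installed
)
if (!is.null(pop)) print(pop[, c("iso3c", "date", "SP.POP.TOTL")])
```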

Some exploratory modeling (negative binomial regression with glm.nb() from {MASS} and ordinal logistic regression with clm() from {ordinal}) is carried out merely to assess the prima facie effects of the most influential factors.
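To give a feel for the ordinal model, here is a minimal sketch of a cumulative link (ordinal logistic) fit with clm(). Everything about the data is an assumption made for illustration: the outcome levels, the single predictor night, and the effect size are simulated and do not come from the Belgrade data set.

```r
library(ordinal)  # clm(); assumes the {ordinal} package is installed

set.seed(42)
n <- 1000
# Simulated data (hypothetical variables, NOT the real accident records)
night  <- rbinom(n, 1, 0.35)        # 1 = accident happened at night
latent <- 0.8 * night + rlogis(n)   # latent severity with logistic noise
d <- data.frame(
  night = night,
  severity = cut(latent, breaks = c(-Inf, 0.5, 2.5, Inf),
                 labels = c("damage only", "injury", "fatality"),
                 ordered_result = TRUE)  # clm() needs an ordered factor
)

fit <- clm(severity ~ night, data = d)  # logit link is the default
print(summary(fit))

# Odds ratio of falling into a more severe category at night vs. by day
print(exp(coef(fit)["night"]))
```

The response has to be an ordered factor; the threshold estimates in the summary are the cut points between adjacent severity levels.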

Predicted vs. observed number of traffic accidents per day, Belgrade, 2015. Negative binomial regression for overdispersed count data with glm.nb().
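A plot like the one described in the caption above can be sketched as follows. The daily counts are simulated for 1 January to 30 November 2015, and the predictors (weekend, rain, temp) and their effects are hypothetical stand-ins, so this shows the mechanics of glm.nb() rather than the notebook's actual model.

```r
library(MASS)     # glm.nb()
library(ggplot2)

set.seed(2015)
# Simulated daily data (NOT the real counts); December is excluded on purpose
days <- seq(as.Date("2015-01-01"), as.Date("2015-11-30"), by = "day")
yday <- as.integer(format(days, "%j"))
d <- data.frame(
  date    = days,
  weekend = as.integer(format(days, "%u") %in% c("6", "7")),
  rain    = rbinom(length(days), 1, 0.3),
  temp    = 12 - 10 * cos(2 * pi * yday / 365) + rnorm(length(days), sd = 3)
)
mu <- exp(3.3 - 0.3 * d$weekend + 0.15 * d$rain + 0.01 * d$temp)
d$accidents <- rnbinom(nrow(d), mu = mu, size = 8)  # overdispersed counts

# Negative binomial fit, with a Poisson fit for comparison
fit  <- glm.nb(accidents ~ weekend + rain + temp, data = d)
pois <- glm(accidents ~ weekend + rain + temp, family = poisson, data = d)
print(c(poisson = AIC(pois), negbin = AIC(fit)))

# Predicted vs. observed daily counts; the dashed line marks perfect agreement
d$predicted <- fitted(fit)
p <- ggplot(d, aes(x = predicted, y = accidents)) +
  geom_point(alpha = 0.4) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Predicted accidents per day", y = "Observed accidents per day")
print(p)
```

Comparing the AIC of the two fits is one simple way to check whether the extra dispersion parameter of the negative binomial model is worth having.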

Hopefully, this is just the beginning of our exploratory analyses of open data in R. In the following months, Data Science Serbia will work hard to enable cross-country open data comparisons by elaborating on the forthcoming Serbian open data sets, and to promote R as the lingua franca of the discipline.
