2813 search results for "GIS"

Data fishing: R and XML part 3

February 18, 2013

I’ve recently published two blog posts about gathering data from web pages using functions in R. Both examples showed how we can create our own custom functions to gather data about Minnesota lakes from the Lakefinder website. The first post was an example showing the use of R to create our own custom functions to get

Read more »

Predictors, responses and residuals: What really needs to be normally distributed?

February 18, 2013

Introduction

Many scientists are concerned about normality or non-normality of variables in statistical analyses. The following and similar sentiments are often expressed, published or taught: "If you want to do statistics, then everything needs to be normally distributed." "We normalized…

Read more »

Run production, one team at a time

February 17, 2013

In a previous post, I used R to process data from the Lahman database to calculate index values that compare a team's run production to the league average for that year.  For the purpose of that exercise, I started the sequence at 1947, but for what follows I re-ran the code with the time period...

Read more »

A look at strucchange and segmented

February 17, 2013

After last week's post, it was suggested in the comments that strucchange and segmented would be more suitable for my purpose. I had a look at both. Strucchange can find a jump in a time series, which was what I was looking for. In contrast, segmented is more suitable f...

Read more »

Finding outliers in numerical data

One of the topics emphasized in Exploring Data in Engineering, the Sciences and Medicine is the damage outliers can do to traditional data characterizations. Consequently, one of the procedures to be included in the ExploringData package is FindOutliers, described in this post. Given a vector of numeric values, this procedure supports four different methods for identifying possible outliers. Before...

Read more »

Video: Data Mining with R

February 15, 2013

Yesterday's Introduction to R for Data Mining webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, I've embedded the video replay below, and Joe's slides (with links to many useful resources) are also available. During the webinar, Joe demoed several examples of...

Read more »

Clustering Loss Development Factors

February 15, 2013

Anytime I get a new hammer, I waste no time in trying to find something to bash with it. Prior to last year, I wouldn’t have known what a cluster was, other than the first half of a slang term used to describe a poor decision-making process. Now I’ve seen it in action a

Read more »

FillIn: a function for filling in missing data in one data frame with info from another

February 15, 2013

Sometimes I want to use R to fill in values that are missing in one data frame with values from another. For example, I have data from the World Bank on government deficits. However, there are some country-years with missing data. I gathered data from ...

Read more »

GPS Basemaps in R Using get_map

February 14, 2013

There are many different maps you can use for a background map for your GPS or other latitude/longitude data (i.e. any time you're using geom_path, geom_segment, or geom_point).

get_map

Helpfully, there's just one function that will allow you to query Google Maps, OpenStreetMap, Stamen maps, or CloudMade maps: get_map in the ggmap package. You could also use either get_googlemap, get_openstreetmap, get_stamenmap, or get_cloudmademap, but...

Read more »

Version 1.0 of multilevelPSA Available on CRAN

February 14, 2013

Version 1.0 of multilevelPSA has been released to CRAN. The multilevelPSA package provides functions to estimate and visualize propensity score models with multilevel, or clustered, data. The graphics are an extension of the PSAgraphics package by Helmreich and Pruzek. The example below will investigate the differences between private and public schools internationally using the Programme for International Student Assessment...

Read more »