**DataScience+**, and kindly contributed to R-bloggers)

Building off other industry-specific posts, I want to use healthcare data to demonstrate the use of R packages. The data can be downloaded here. To read the .CSV file in R you might read the post how to import data in R. Packages in R are stored in libraries and often are pre-installed, but reaching the next level of skill requires being able to know when to use new packages and what they contain. With that let’s get to our example.

## gsub

When working with vectors and strings, especially in cleaning up data, `gsub`

makes cleaning data much simpler. In my healthcare data, I wanted to convert dollar values to integers (ie. $21,000 to 21000), and I used `gsub`

as seen below.

Reading the data in R from CSV file. I am naming the dataset “hosp”.

hosp <- read.csv("Payment_and_value_of_care_-_Hospital.csv")

In the code below I will remove hospitals without estimates

hospay<-hosp[hosp$Payment.category !="Not Available" & hosp$Payment.category !="Number of Cases Too Small",]

Now its time to remove the dollar signs and commas in estimate values

hospay$Payment <- as.numeric(gsub("[$,]","",hospay$Payment)) hospay$Lower.estimate <- as.numeric(gsub("[$,]", "", hospay$Lower.estimate)) hospay$Higher.estimate <- as.numeric(gsub("[$,]", "", hospay$Lower.estimate)) head(hospay$Payment)[1] 13469 12863 12308 12222 21376 14740

## reshape2

In looking at the data, I wanted to focus on the Payment estimate. So I used the `melt()`

function that is part of `reshape2`

. Melt allows pivot-table style capabilities to restructure data without losing values.

library(reshape2) hosp_mel<-melt(data=hospay,id=c(2,5,9,11), measure=as.numeric(c(13)), value.name='Estimate') names(hosp_melt)[1] "Hospital.name" "State" "Payment.measure.name" "Payment.category" "variable" "Estimate"

## sqldf

With my data melted, I wanted to get the average estimate for heart attack patients by state. This is a classic SQL query, so bringing in `sqldf`

allows for that.

library(sqldf) names(hosp_melt) [3] <- "paymentmeasurename" hosp_est <- sqldf("select State, avg(Estimate) as Estimate from hosp_melt where paymentmeasurename = 'Payment for heart attack patients' group by State") head(hosp_est)State Estimate 1 AK 20987.60 2 AL 21850.32 3 AR 21758.00 4 AZ 22690.62 5 CA 22707.45 6 CO 21795.30

If you have any question feel free to leave a comment below.

**leave a comment**for the author, please follow the link and comment on their blog:

**DataScience+**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...