A Return to Reliable R

September 5, 2012

(This article was first published on Data and Analysis with R, at Work, and kindly contributed to R-bloggers)

The saga with Statistica continues:

Statistica kept crashing on me during my data processing.  One of the big problems was a wonderful bug that showed up whenever some of my text variables were coded (unsurprisingly) as text!  In that situation, I could add only a small number of extra variables, and any variable I tried to add after that would crash the program!

I was told that this is a known bug in Statistica and that they’re hoping to fix it in an update due around the end of the year.  In the meantime, a workaround is to go into the “Variable Specs” for any variable coded as Text, recode it as “Double”, save the worksheet, and then try again.  That seemed to get rid of the crashing, but it messed up my biographical ID column, which held the original database IDs for all the individuals in my dataset.  Numerous IDs that had previously been unique were spontaneously reassigned to more than one person.  I can’t have that, because once I’m done with the dataset I have to return important parts of it to the clients I work with so they can put certain new columns into their database.  So it was a bit of a catch-22.

My supervisor advised me to make a new, strictly numeric ID column outside of Statistica, and to import only the new ID column, not the old one, back into the program.  I did that, and all seemed well until it finally crashed yet again!  This time, I had no clue whatsoever why the crash happened.
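For anyone in the same spot, here is a minimal sketch in R of that surrogate-ID idea. It is not the exact procedure I followed; the data frame and the column names (`orig_id`, `new_id`, `gift`) are made up for illustration. The point is to keep a lookup table so the original IDs can be restored before anything goes back to the clients.

```r
# Hypothetical data: 'orig_id' stands in for the original (text) database IDs.
people <- data.frame(
  orig_id = c("A-1001", "B-2002", "C-3003"),
  gift    = c(50, 20, 75),
  stringsAsFactors = FALSE
)

# Lookup table: one row per distinct original ID, paired with a
# strictly numeric surrogate ID.
id_key <- data.frame(
  orig_id = unique(people$orig_id),
  stringsAsFactors = FALSE
)
id_key$new_id <- seq_len(nrow(id_key))

# Attach the numeric ID by lookup; match() keeps the original row order.
people$new_id <- id_key$new_id[match(people$orig_id, id_key$orig_id)]

# Sanity check: the key must be one-to-one in both directions.
stopifnot(!anyDuplicated(id_key$orig_id), !anyDuplicated(id_key$new_id))

# Export only the numeric ID (not the text one) for the other program.
export <- people[, c("new_id", "gift")]

# Later, restore the original IDs before returning the data.
restored <- merge(export, id_key, by = "new_id")
```

Keeping `id_key` saved somewhere safe is what makes the round trip possible: the other program never sees the text IDs, and the check with `anyDuplicated()` catches the kind of duplicated-ID problem described above before it spreads.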

That’s when I told myself “screw it, I’m wasting time in Statistica and am going to do the rest of this analysis in R”.  Man, is it ever nice to be back in R.  Ironically, things are much simpler there and flow a lot faster for me.  The only problem is that I have a few projects coming up soon that really need a data analysis program that can handle humongous data sets.  For that reason, I’m probably going to have to see whether reinstalling Statistica makes it more reliable to work with.  If not, I suppose I’ll have to move on to other options!

