I must admit that data cleaning sometimes feels like the necessary chore before the fun and much more value-creating part: analysis!
But in every coding project a data scientist is involved in, the first step is always to get a clear understanding of the dataset through descriptive statistics, and then to clean it.
These steps typically take a lot of code, which is a big problem for the data scientist because it leaves less time for the much more interesting and value-creating work: AI analysis and data analysis. Furthermore, when the data scientist is working on ad-hoc analytical tasks, data cleaning can be a huge problem because of short deadlines. Most articles I have read combine descriptive statistics and data cleaning in hand-written code rather than relying on packages.
Therefore, I present here the most essential data cleaning code for ad-hoc tasks in R, done with R packages. I use two of the most elegant and efficient R packages for descriptive statistics and data cleaning: skimr and Hmisc. This frees up the data scientist's schedule and leaves much more time for the more value-creating work: AI analysis and data analysis.
```r
# Data management packages
library(skimr)
library(Hmisc)

# Load dataset
data("mydata")

# Fast data management & data cleaning
# Descriptive statistics before data cleaning
skim(mydata)

# Data cleaning
cleandata <- mydata[complete.cases(mydata), ]
cleandata <- unique(cleandata)
View(cleandata)

# Descriptive statistics after data cleaning
skim(cleandata)
```
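Since `mydata` is a placeholder, here is a minimal sketch of the same two cleaning steps on a dataset that ships with R. It assumes `airquality` as a stand-in and appends five of its rows a second time so that the duplicate step has something to remove; both choices are mine, for illustration only. The cleaning calls themselves (`complete.cases()` and `unique()`) are base R, and the `skim()` call is skipped if skimr is not installed.

```r
# Assumption: airquality (built into R) stands in for mydata.
# Five rows are appended again so the duplicate step has work to do.
mydata <- rbind(airquality, airquality[1:5, ])
n_raw <- nrow(mydata)

# Step 1: keep only rows without any missing value
complete <- mydata[complete.cases(mydata), ]
n_complete <- nrow(complete)

# Step 2: drop exact duplicate rows
cleandata <- unique(complete)
n_clean <- nrow(cleandata)

# Report what each step removed
cat("rows in raw data:      ", n_raw, "\n")
cat("dropped as incomplete: ", n_raw - n_complete, "\n")
cat("dropped as duplicates: ", n_complete - n_clean, "\n")
cat("rows after cleaning:   ", n_clean, "\n")

# Optional: the skim() summary, only if skimr is installed
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(cleandata))
}
```

Printing the row counts after each step makes it easy to see how much data the cleaning costs, which is worth checking before any analysis.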
And there you have it: elegant and essential data cleaning, plus descriptive statistics including histograms, before and after cleaning the dataset. Done with the most efficient coding!
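To put numbers on the before-and-after comparison, the sketch below builds a small side-by-side table in base R. It again assumes the built-in `airquality` dataset as a stand-in, and the `comparison` table and its column names are hypothetical, not something skimr or Hmisc produces.

```r
# Assumption: airquality (built into R) stands in for the raw dataset.
raw <- airquality
clean <- unique(raw[complete.cases(raw), ])

# One row per column: missing values and mean, before and after cleaning
comparison <- data.frame(
  column         = names(raw),
  missing_before = colSums(is.na(raw)),
  missing_after  = colSums(is.na(clean)),
  mean_before    = sapply(raw, mean, na.rm = TRUE),
  mean_after     = sapply(clean, mean, na.rm = TRUE),
  row.names      = NULL
)

print(comparison)
cat("rows before:", nrow(raw), " rows after:", nrow(clean), "\n")
```

A table like this shows at a glance whether dropping incomplete rows shifts the column means, which a quick ad-hoc analysis might otherwise miss.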
Happy data cleaning!