Calculate all the CVs of all the QC Levels of all the Methods of all the Instruments at all the Sites all at once … with Sunquest LIS and dplyr

[This article was first published on The Lab-R-torian, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Background

As part of our lab accreditation requirements, we have to provide measurement uncertianty estimates for all tests at all hospital sites. As you might imagine, with thousands of testcodes in Sunquest LIS, getting all the coefficients of variation (CVs) represents a daunting task for the quality technologist to accomplish. As it turns out, by capturing the ssh session in a .txt file, you can use R’s dplyr package to do this all in few lines of code.

Getting the Raw Data

You need to get the raw data from Sunquest. You can capture the telnet (yes… older versions of Sunquest use telnet and pass patient information and user passwords unencrypted across the hospital network o_O) or the ssh session to a file using the Esker SmarTerm which Sunquest packages in their product and refers to as “roll-n-scroll”. People disparriage SmarTerm as an old “dos tool”–whereas Sunquest is hosted on AIX operating system. SmarTerm access to Sunquest is a gagillion times faster than the GUI and permits us to capture the raw QC data we need. To capture the session select from the dropdown menu as shown here:

SQ Screenshot1

If you are using Mac OS or Linux OS, you can also capture the ssh session by connecting from the terminal and using tee to dump the session to a file.

ssh [email protected] | tee captured_session.txt

Once you have connected, use the QC function and select output printer 0 (meaning the screen) and make these selections, changing the dates as appropriate:

SQ Screenshot1

If you make no selections at all for any of:

  • TEST:
  • WORKSHEET:
  • METHOD:
  • CONTROL:
  • SHIFT #:
  • TECH:
  • TESTS REQUESTED:

then you will extract everything, which is what you want and which will make for a very big .txt file. There will be a delay and then thousands of QC results will dump to the screen and to your file. When this is complete, end your SmarTerm or ssh or telnet (cringe) session. I saved my text dump as raw_SQ8.txt.

Getting it intro R and parsing it

Your data will come out as a fixed with file with no delimiters. It will also have a bunch of junk at the bottom and top of the file detailing your commands from the start and end of the session. These need to be discarded. I just used grep() to find all the lines with the appropriate date pattern. After reading it in, because I am lazy, I wrote it back out and read it in again with read.fwf()

library(tidyverse)
library(lubridate)
library(knitr)

# Note to my friend SK - yes... this is mostly in base-R... 

# create a connection
con <- file(file.path("raw_SQ8.txt"))
raw.qc.data <- readLines(con)
close(con)
#find good rows
good.data <- grep("[0-9]{2}(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[2][0][0-9]{6}",raw.qc.data)
raw.qc.data <- raw.qc.data[good.data]
#remove a screwball encoding character
raw.qc.data[1] <- substr(raw.qc.data[1],6,nchar(raw.qc.data[1]))
con <- file("temp.txt")
#rewrite the file with no garbage in it.
writeLines(raw.qc.data, con)
close(con)
raw.qc.data <- read.fwf("temp.txt",c(6,6,6,20,13,6,2,15,100))
file.remove("temp.txt")
names(raw.qc.data) <- c("test.code","instr.code","qc.name","qc.expire",
                        "date.performed","tech.code","shift",
                        "result","modifier")
raw.qc.data <- data.frame(lapply(raw.qc.data, trimws))
raw.qc.data$result <- as.numeric(as.character(raw.qc.data$result))
raw.qc.data$date.performed <- dmy_hm(raw.qc.data$date.performed)
raw.qc.data$tech.code <- as.numeric(raw.qc.data$tech.code) #anonymize tech codes
raw.qc.data <- arrange(raw.qc.data, instr.code, test.code)

Now that all the data munging is done, we can examine the data:

test.code instr.code qc.name qc.expire date.performed tech.code shift result modifier
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-15 09:17:00 68 2 122 NA
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-15 20:51:00 68 3 122 NA
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-15 21:47:00 68 3 122 NA
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-15 21:50:00 68 3 122 NA
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-17 07:10:00 15 1 122 NA
BCL JBGAS RAD1 R0173 EXP MAR 2021 2019-11-17 07:11:00 15 1 122 NA

And finally, we can make the dplyr magic happen and discard results for which the counts are too small, which I have chosen to be <20:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count ≥ 20) %>%
  arrange(instr.code, test.code, median) -> summary.table

Which gives us output like this:

head(summary.table)

instr.code test.code qc.name qc.expire median IQR mean SD min max CV count
JBGAS BCL RAD3 R0141 EXP SEP 2017 65.0 1.000 65.145454 0.6503043 63.0 66.0 1.00 55
JBGAS BCL RAD2 R0175 EXP MAR 2021 97.0 0.000 97.128205 0.3364820 97.0 98.0 0.35 78
JBGAS BCL RAD1 R0173 EXP MAR 2021 122.0 0.000 122.122807 0.5691527 121.0 124.0 0.47 57
JBGAS BGLUC RAD1 R0173 EXP MAR 2021 1.5 0.000 1.507017 0.0257713 1.5 1.6 1.71 57
JBGAS BGLUC RAD2 R0175 EXP MAR 2021 5.6 0.075 5.585897 0.0639081 5.4 5.7 1.14 78
JBGAS BGLUC RAD3 R0141 EXP SEP 2017 13.7 0.100 13.763636 0.1310409 13.4 14.1 0.95 55

This permits us to toss out results with low counts. But what about handling outliers? Well, we can calculate the z-scores of the raw data by joining the the mean and SD results back to the raw data.

raw.qc.data %>%
  left_join(select(summary.table,c(instr.code:qc.expire, mean, SD)),
             by = c("test.code","instr.code", "qc.name", "qc.expire")) %>%
  mutate(z.score = (result - mean)/SD) -> raw.qc.data

This will permit you to suppress results outside a certain z-score. So, let’s suppress all results with an undefined z-score and all results with a z-score >= 4:

raw.qc.data %>%
  drop_na(z.score) %>%
  filter(abs(z.score) < 4) -> raw.qc.data

Now , we can re-run the dplyr summary:

raw.qc.data %>% dplyr::filter(!is.na(result)) %>%
  group_by(instr.code,test.code,qc.name,qc.expire) %>%
  summarise(median = median(result),
            IQR = IQR(result),
            mean = mean(result),
            SD = sd(result),
            min = min(result),
            max = max(result),
            CV = round(sd(result, na.rm = TRUE)/mean(result, na.rm = TRUE)*100,2),
            count = n()) %>%
  filter(count ≥ 20) %>%
  arrange(instr.code, test.code, median) -> summary.table.no.outliers

And now we have a summary of every QC CV in our Sunquest system with outliers suppressed:

head(summary.table.no.outliers)

instr.code test.code qc.name qc.expire median IQR mean SD min max CV count
JBGAS BCL RAD3 R0141 EXP SEP 2017 65.0 1.000 65.145454 0.6503043 63.0 66.0 1.00 55
JBGAS BCL RAD2 R0175 EXP MAR 2021 97.0 0.000 97.128205 0.3364820 97.0 98.0 0.35 78
JBGAS BCL RAD1 R0173 EXP MAR 2021 122.0 0.000 122.122807 0.5691527 121.0 124.0 0.47 57
JBGAS BGLUC RAD1 R0173 EXP MAR 2021 1.5 0.000 1.507017 0.0257713 1.5 1.6 1.71 57
JBGAS BGLUC RAD2 R0175 EXP MAR 2021 5.6 0.075 5.585897 0.0639081 5.4 5.7 1.14 78
JBGAS BGLUC RAD3 R0141 EXP SEP 2017 13.7 0.100 13.763636 0.1310409 13.4 14.1 0.95 55

And there we have it:

SQ Screenshot1

Now I can write the output file

write_csv(summary.table.no.outliers, "QC_summary.csv")

With dplyr, if you direct your energies to the right place, you reap much. Similarly:

“But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you.”

Matthew 6:33

To leave a comment for the author, please follow the link and comment on their blog: The Lab-R-torian.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)