Correlations Between Texas High School Academic Competition Results and SAT/ACT Scores


Introduction

I wanted to do a follow-up on my series of posts about Texas high school
University Interscholastic League (UIL) academic competitions to more
closely evaluate the relationship between school performance in those
competitions and school-wide SAT and ACT scores. For those who may not
be familiar with these tests, they are the two most popular standardized
tests used for college admission in the United States.

In my introduction to that series, I stated the following: “School-wide
… scores on state- and national-standardized tests (e.g. the SAT)
certainly are the most common measure of academic strength, but I think
rankings by academic competitions may be more indicative.”

Essentially, I was implying that the academic UIL scores may not
correspond well, or at all, with standardized test scores. However, I
did not attempt to prove this hypothesis, which is what I set out to do
here. While I’m at it, I’ll show the code and provide some commentary to
explain my process.

Data Collection

While I have already collected and cleaned the UIL data that I’ll need
by virtue of my work for my series of posts analyzing the UIL
competitions, I did not retrieve data for standardized test scores. To
my delight, the Texas Education Agency’s website publishes Texas high
school SAT and ACT scores for the years 2011 through 2015. The task of
scraping from this source is a perfect use-case for the super-handy
{xml2} and {rvest} packages, as well as the {stringr} and {purrr}
packages in the {tidyverse}.

library("tidyverse")
library("rlang")
library("teplot") # Personal package.
urls_tea <-
  "https://tea.texas.gov/acctres/sat_act_index.html" %>%
  xml2::read_html() %>%
  rvest::html_nodes(xpath = "//tr //td //a") %>%
  rvest::html_attr("href") %>% 
  str_subset("^\\/acctres\\/[:alpha:]{3}_[Cc]ampus_[Dd]ata")
urls_tea
create_path_tea <-
  function(url_suffix, dir = "data-raw", ext = "csv") {
    if(!dir.exists(dir)) {
       dir.create(dir)
    }
    url_suffix %>%
      str_remove_all("acctres|\\/") %>% 
      paste0(".", ext) %>% 
      file.path(dir, .)
  }
# NOTE(s):
# + `urls_tea_dl` is actually the same as `urls_tea` because `purrr::walk()` returns its input invisibly.
# + `mode = "wb"` is important! Otherwise, the downloaded files have empty lines every other line
#   (due to the way that CRs and LFs are handled).
urls_tea_dl <-
  urls_tea %>%
  walk(
    ~download.file(
      url = paste0("https://tea.texas.gov/", .x),
      destfile = create_path_tea(url_suffix = .x),
      mode = "wb"
    )
  )

Data Cleaning

Next, I bind the data from all of the downloaded files together and do
some cleaning. I put these actions in function(s) because I plan on
re-using them in future posts where I explore this data set in other
ways.

One relatively significant choice that I make here is to only include
the data for the school-wide level (via the “All Students” filtering
criterion), although data for different demographics within each school
is provided. The other data set that I am evaluating—the academic UIL
data—does not have demographic-specific information, so I want to treat
the two sets as “equally” as possible.

Additionally, in order to better understand the resulting data set, the
reader should be made aware of some of the details of the tests. The SAT
has math, reading, and writing sections, each having minimum and
maximum scores of 200 and 800, meaning that the total can range from
600 to 2400. The ACT has math, reading, english, and science
sections, each scored on a scale of 1 to 36 and combined into a single
composite (compos) score that also ranges from 1 to 36. To eliminate
duplicate columns representing the same underlying “thing”, I don’t
distinguish the math and reading section scores for each test in
separate columns, and I rename the ACT’s compos score to total,
following the convention used for the SAT’s cumulative score. The other
sections—writing for the SAT and english and science for the ACT—
are not really analogous to sections in the other test, so they are
filled with NAs appropriately.
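
For reference, the section structure described above can be summarized
in a small lookup table (this tibble is purely illustrative; it is not
part of the data collected below):

# Illustrative only: the section score ranges described above.
# (Assumes the {tidyverse} is already loaded, as in the scraping chunk above.)
test_score_ranges <-
  tribble(
    ~test, ~section,  ~score_min, ~score_max,
    "SAT", "math",           200,        800,
    "SAT", "reading",        200,        800,
    "SAT", "writing",        200,        800,
    "SAT", "total",          600,       2400,
    "ACT", "math",             1,         36,
    "ACT", "reading",          1,         36,
    "ACT", "english",          1,         36,
    "ACT", "science",          1,         36,
    "ACT", "compos",           1,         36
  )
test_score_ranges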

Finally, for the interested reader, there are some details regarding the
code implementation that I document in comments (both for explaining
actions for myself and for the reader).

import_tea_data <-
  function(path, rgx_grp) {
    res <-
      path %>%
      readr::read_csv() %>%
      rename_all(funs(tolower))
    
    if(!is.null(rgx_grp)) {
      res <-
        res %>%
        filter(group %>% str_detect(rgx_grp))
    }
    res <-
      res %>%
      select(
        matches(
          "^group$|name$|math|reading|writing|total|english|science|compos"
        )
      )
    res
  }

import_tea_data_cleanly <-
  function(urls, rgx_grp, ...) {

    res <-
      urls %>%
      create_path_tea(...) %>%
      tibble(path = .) %>%
      mutate(
        test = stringr::str_extract(path, "([Ss][Aa][Tt])|([Aa][Cc][Tt])") %>% toupper(),
        year = stringr::str_extract(path, "[:digit:]+") %>% as.integer()
      ) %>%
      mutate(contents = purrr::map(path, ~import_tea_data(.x, rgx_grp = rgx_grp))) %>%
      unnest() %>%
      # NOTE: No longer need this column(s) any more.
      select(-path) %>%
      mutate_at(vars(total), funs(ifelse(test == "ACT", compos, .))) %>% 
      # NOTE: No longer need this column(s) any more.
      select(-compos) %>% 
      # NOTE: Rearranging score columns in a more logical fashion.
      select(-total, everything(), total) %>% 
      # NOTE: Renaming "important" columns.
      rename(school = campname,
             district = distname,
             county = cntyname,
             city = regnname) %>%
      mutate_if(is.character, funs(str_replace_all(., "=|\"", ""))) %>%
      mutate_at(vars(school, district, county, city), funs(toupper)) %>% 
      # NOTE: Some county names are truncated and end with COUN or COUNT.
      # (The max seems to be 18 characters).
      # Fortunately, there are no county names with COUN in their names, so the following
      # regular expression is sufficient.
      mutate_at(vars(county), funs(str_remove_all(., "\\s+COUN.*$"))) %>%
      # NOTE: Remove all HS/H S at the end of school names, as well as ampersands.
      # This seems to improve join percentages with other data sets.
      mutate_at(vars(school), funs(str_remove_all(., "([H]\\s*[S]$)|(\\s+\\&)") %>% str_trim())) %>%
      # NOTE: This is an attempt to resolve duplicates in the raw data.
      # group_by_at(vars(matches("test|year|school|district|county|city"))) %>% 
      # summarise_all(funs(max(., na.rm = TRUE))) %>% 
      # ungroup() %>% 
      arrange(test, year, school)
    res
  }
schools_tea <-
  urls_tea %>%
  import_tea_data_cleanly(rgx_grp = "All Students")
schools_tea
| test | year | school | district | county | city | math | reading | writing | english | science | total |
|------|------|--------|----------|--------|------|------|---------|---------|---------|---------|-------|
| ACT | 2011 | A C JONES | BEEVILLE ISD | BEE | CORPUS CHRISTI | 19 | 18 | NA | 17 | 19 | 18 |
| ACT | 2011 | A J MOORE ACAD | WACO ISD | MCLENNAN | WACO | 19 | 18 | NA | 16 | 18 | 18 |
| ACT | 2011 | A M CONS | COLLEGE STATION ISD | BRAZOS | HUNTSVILLE | 26 | 24 | NA | 23 | 24 | 24 |
| ACT | 2011 | A MACEO SMITH HIGH SCHOOL | DALLAS ISD | DALLAS | RICHARDSON | 16 | 14 | NA | 13 | 15 | 14 |
| ACT | 2011 | ABBOTT SCHOOL | ABBOTT ISD | HILL | WACO | 20 | 20 | NA | 19 | 21 | 20 |
| ACT | 2011 | ABERNATHY | ABERNATHY ISD | HALE | LUBBOCK | 22 | 20 | NA | 19 | 21 | 21 |
| ACT | 2011 | ABILENE | ABILENE ISD | TAYLOR | ABILENE | 21 | 21 | NA | 20 | 21 | 21 |
| ACT | 2011 | ACADEMY | ACADEMY ISD | BELL | WACO | 24 | 23 | NA | 21 | 24 | 23 |
| ACT | 2011 | ACADEMY HIGH SCHOOL | HAYS CISD | HAYS | AUSTIN | NA | NA | NA | NA | NA | NA |
| ACT | 2011 | ACADEMY OF CAREERS AND TECHNOLOGIE | ACADEMY OF CAREERS AND TECHNOLOGIE | BEXAR | SAN ANTONIO | 15 | 14 | NA | 12 | 14 | 14 |
| ACT | 2011 | ACADEMY OF CREATIVE ED | NORTH EAST ISD | BEXAR | SAN ANTONIO | NA | NA | NA | NA | NA | NA |
| ACT | 2011 | ADRIAN SCHOOL | ADRIAN ISD | OLDHAM | AMARILLO | 19 | 18 | NA | 20 | 19 | 19 |
| ACT | 2011 | ADVANTAGE ACADEMY | ADVANTAGE ACADEMY | DALLAS | RICHARDSON | 18 | 20 | NA | 19 | 16 | 18 |
| ACT | 2011 | AGUA DULCE | AGUA DULCE ISD | NUECES | CORPUS CHRISTI | 21 | 19 | NA | 18 | 20 | 19 |
| ACT | 2011 | AIM CENTER | VIDOR ISD | ORANGE | BEAUMONT | NA | NA | NA | NA | NA | NA |
| ACT | 2011 | AKINS | AUSTIN ISD | TRAVIS | AUSTIN | 19 | 17 | NA | 16 | 17 | 17 |
| ACT | 2011 | ALAMO HEIGHTS | ALAMO HEIGHTS ISD | BEXAR | SAN ANTONIO | 25 | 24 | NA | 24 | 24 | 24 |
| ACT | 2011 | ALBA-GOLDEN | ALBA-GOLDEN ISD | WOOD | KILGORE | 20 | 19 | NA | 18 | 20 | 19 |
| ACT | 2011 | ALBANY JR-SR | ALBANY ISD | SHACKELFORD | ABILENE | 24 | 22 | NA | 21 | 22 | 22 |
| ACT | 2011 | ALDINE | ALDINE ISD | HARRIS | HOUSTON | 19 | 17 | NA | 16 | 18 | 18 |

# of total rows: 15,073
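
As a quick sanity check (my own addition, not in the original post), one
might verify that both tests are represented in each of the five years:

# Hypothetical check: row counts by test and year.
schools_tea %>% 
  count(test, year) %>% 
  spread(year, n)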

EDA: Year-to-Year Correlations

First, before evaluating the primary concern at hand—the relationship
between the academic UIL scores and the SAT/ACT scores (available in the
schools_tea data created above)—I want to verify that there is some
non-trivial relationship among the scores for a given school on a given
test across years. (I would be surprised if this were not shown to be
true.)

schools_tea_cors_byyear <-
  schools_tea %>%
  distinct(test, year, school, .keep_all = TRUE) %>%
  filter(!is.na(total)) %>%
  unite(test_school, test, school) %>%
  widyr::pairwise_cor(
    feature = test_school,
    item = year,
    value = total
  ) %>% 
  rename(year1 = item1, year2 = item2, cor = correlation)
schools_tea_cors_byyear %>% 
  filter(year1 <= year2)
| year1 | year2 | cor |
|-------|-------|-----|
| 2011 | 2012 | 0.80 |
| 2011 | 2013 | 0.76 |
| 2012 | 2013 | 0.86 |
| 2011 | 2014 | 0.69 |
| 2012 | 2014 | 0.78 |
| 2013 | 2014 | 0.83 |
| 2011 | 2015 | 0.64 |
| 2012 | 2015 | 0.74 |
| 2013 | 2015 | 0.78 |
| 2014 | 2015 | 0.86 |

![](viz_schools_tea_cors_byyear_show-1.png)

As expected, there are some strong correlations among the years for
school-wide scores on these tests.
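
The plotting code for the figure above is not shown here; the following
is a minimal sketch (using the `schools_tea_cors_byyear` data frame
created above) of how such a pairwise-correlation plot might be drawn
with {ggplot2}. The original figure's aesthetics may differ.

# A minimal sketch of a year-to-year correlation tile plot (not necessarily the original code).
schools_tea_cors_byyear %>%
  filter(year1 <= year2) %>%
  ggplot(aes(x = factor(year1), y = factor(year2), fill = cor)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.2f", cor))) +
  labs(x = NULL, y = NULL, fill = "cor") +
  theme_minimal()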

Ok, now let’s bring in the “cleaned” school data (`schools_uil`) that I
collected and cleaned in my UIL analysis. I’ll subset the data to
include only the same years found in `schools_tea`—2011 through 2015.

| school | city | complvl_num | score | year | conf | complvl | comp | advanced | n_state | n_bycomp | prnk | n_defeat | w |
|--------|------|-------------|-------|------|------|---------|------|----------|---------|----------|------|----------|---|
| HASKELL | HASKELL | 13 | 616 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 8 | 1.00 | 7 | TRUE |
| POOLVILLE | POOLVILLE | 13 | 609 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 8 | 0.86 | 6 | FALSE |
| LINDSAY | LINDSAY | 17 | 553 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 7 | 1.00 | 6 | TRUE |
| PLAINS | PLAINS | 3 | 537 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 10 | 1.00 | 9 | TRUE |
| SAN ISIDRO | SAN ISIDRO | 32 | 534 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 4 | 1.00 | 3 | TRUE |
| CANADIAN | CANADIAN | 7 | 527 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 7 | 1.00 | 6 | TRUE |
| GARDEN CITY | GARDEN CITY | 10 | 518 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 8 | 1.00 | 7 | TRUE |
| WATER VALLEY | WATER VALLEY | 10 | 478 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 8 | 0.86 | 6 | FALSE |
| GRUVER | GRUVER | 7 | 464 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 7 | 0.83 | 5 | FALSE |
| YANTIS | YANTIS | 19 | 451 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 10 | 1.00 | 9 | TRUE |
| SHINER | SHINER | 27 | 450 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 9 | 1.00 | 8 | TRUE |
| WEST TEXAS | STINNETT | 7 | 443 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 7 | 0.67 | 4 | FALSE |
| HONEY GROVE | HONEY GROVE | 17 | 440 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 7 | 0.83 | 5 | FALSE |
| LATEXO | LATEXO | 23 | 439 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 10 | 1.00 | 9 | TRUE |
| MUENSTER | MUENSTER | 17 | 436 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 7 | 0.67 | 4 | FALSE |
| VAN HORN | VAN HORN | 1 | 436 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 7 | 1.00 | 6 | TRUE |
| SLOCUM | ELKHART | 23 | 415 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 10 | 0.89 | 8 | FALSE |
| ERA | ERA | 17 | 415 | 2011 | 1 | District | Calculator Applications | 0 | 0 | 7 | 0.50 | 3 | FALSE |
| GOLDTHWAITE | GOLDTHWAITE | 15 | 413 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 7 | 1.00 | 6 | TRUE |
| NEWCASTLE | NEWCASTLE | 12 | 408 | 2011 | 1 | District | Calculator Applications | 1 | 0 | 10 | 1.00 | 9 | TRUE |

# of total rows: 27,359

Now let’s try to evaluate whether or not year-to-year correlations also
exist with this data set.

Importantly, some choice about how to quantify performance needs to be
made. As I discussed in my long-form series of posts exploring the UIL
academic data, the evaluation of performance is somewhat subjective.
Should we use the number of times a school advanced to the next level of
competition in a given year? (Note that there are three competition
levels—District, Region, and State.) What about the number of other
schools it “defeated” in head-to-head competitions? In that separate
analysis, I made the choice to use the percentile rank (prnk) of the
school’s placings across all competition levels for a given competition
type (comp). I believe this measure best represents a school’s quality
of performance (where a higher value indicates better performance). As I
stated there when explaining my choice to use percent rank for
identifying “dominant” individuals: “I choose to use percent rank—which
is always a value between 0 and 1—because it inherently accounts for
the wide range in the number of competitors across all competitions. (For
this context, a percent rank of 1 corresponds to the highest score in a
given competition, and, conversely, a value of 0 corresponds to the
lowest score.)”
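
To illustrate how `percent_rank()` behaves (a toy example of my own, not
from the original analysis), consider the scores from a single
hypothetical competition:

# Toy example: the top score gets a percent rank of 1, the bottom score gets 0,
# regardless of how many competitors there are.
tibble(score = c(616, 609, 553, 537, 478)) %>%
  mutate(prnk = percent_rank(score)) %>%
  arrange(desc(score))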

Aside from this decision regarding performance evaluation in academic
UIL competitions, note that I treat the competition type (comp) in
schools_uil as analogous to the test variable indicating SAT or ACT
score in the schools_tea data set. For those who have not read through
my UIL analysis, note that scores for five different competition types
were collected—Calculator Applications, Computer Science,
Mathematics, Number Sense, and Science.

schools_uil_cors_byyear <-
  schools_uil %>% 
  select(year, school, city, comp, prnk) %>% 
  group_by(year, school, city, comp) %>% 
  summarise(prnk_sum = sum(prnk, na.rm = TRUE)) %>%
  ungroup() %>% 
  unite(comp_school, comp, school) %>%
  widyr::pairwise_cor(
    feature = comp_school,
    item = year,
    value = prnk_sum
  ) %>% 
  rename(year1 = item1, year2 = item2, cor = correlation)

schools_uil_cors_byyear %>% 
  filter(year1 <= year2) 

| year1 | year2 | cor |
|-------|-------|-----|
| 2011 | 2012 | 0.74 |
| 2011 | 2013 | 0.63 |
| 2012 | 2013 | 0.72 |
| 2011 | 2014 | 0.53 |
| 2012 | 2014 | 0.60 |
| 2013 | 2014 | 0.75 |
| 2011 | 2015 | 0.48 |
| 2012 | 2015 | 0.52 |
| 2013 | 2015 | 0.61 |
| 2014 | 2015 | 0.70 |

We can see that correlations among years do exist, as we would expect.
The strength of the correlations decreases for years that are farther
apart, which is also what we might expect.
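
One simple way to quantify that second observation (again, my own
addition) is to average the correlations by the size of the gap between
the two years:

# Average correlation by the number of years separating the pair.
# (Coercion guards against the year columns coming back as character from {widyr}.)
schools_uil_cors_byyear %>%
  mutate(gap = as.integer(as.character(year2)) - as.integer(as.character(year1))) %>%
  filter(gap > 0) %>%
  group_by(gap) %>%
  summarise(cor_mean = mean(cor)) %>%
  ungroup()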

“Final” Correlation Analysis

So, at this point, I have set myself up to do that which I set out to
do—evaluate the relationship between the academic UIL competition scores
and the national SAT/ACT scores.

In order to put the two sets of data on “equal grounds”, I only evaluate
math scores. In particular, I filter comp in the UIL data to just the
mathematically-based competitions—Calculator Applications,
Mathematics, and Number Sense—excluding Science and
Computer Science. And, for the SAT/ACT data, I select only the math
score, which is available for both tests, excluding the total and
reading scores available for each, as well as the writing, english,
and science scores available for only one or the other. (Perhaps the ACT’s
science score could be compared to the Science UIL scores, but I
choose not to do so here.)

schools_uil_math <-
  schools_uil %>%
  filter(str_detect(comp, "Calculator|Math|Number")) %>%
  group_by(year, school, city) %>% 
  summarise(prnk_sum = sum(prnk, na.rm = TRUE)) %>%
  ungroup() %>% 
  # NOTE: "Renormalize" `prnk_sum`.
  mutate(math_prnk = percent_rank(prnk_sum)) %>% 
  select(-prnk_sum)
schools_uil_math
| year | school | city | math_prnk |
|------|--------|------|-----------|
| 2011 | ABBOTT | ABBOTT | 0.82 |
| 2011 | ABERNATHY | ABERNATHY | 0.59 |
| 2011 | ABILENE | ABILENE | 0.00 |
| 2011 | ACADEMY | KINGSVILLE | 0.70 |
| 2011 | ACADEMY | LITTLE RIVER | 0.81 |
| 2011 | ACADEMY OF FINE ARTS | FORT WORTH | 0.55 |
| 2011 | ADAMS | DALLAS | 0.47 |
| 2011 | ADAMSON | DALLAS | 0.42 |
| 2011 | ADRIAN | ADRIAN | 0.36 |
| 2011 | ADVANTAGE ACADEMY | WAXAHACHE | 0.22 |
| 2011 | AGUA DULCE | AGUA DULCE | 0.57 |
| 2011 | ALAMO HEIGHTS | SAN ANTONIO | 0.70 |
| 2011 | ALBA-GOLDEN | ALBA | 0.72 |
| 2011 | ALBANY | ALBANY | 0.95 |
| 2011 | ALEDO | ALEDO | 0.89 |
| 2011 | ALEXANDER | LAREDO | 0.56 |
| 2011 | ALICE | ALICE | 0.10 |
| 2011 | ALLEN | ALLEN | 0.85 |
| 2011 | ALPINE | ALPINE | 0.57 |
| 2011 | ALTO | ALTO | 0.19 |

# of total rows: 5,596
schools_tea_math <-
  schools_tea %>%
  select(test, year, school, city, math) %>%
  filter(!is.na(math)) %>% 
  group_by(test) %>% 
  mutate(math_prnk = percent_rank(math)) %>%
  ungroup() %>% 
  group_by(year, school, city) %>% 
  summarise_at(vars(math_prnk), funs(mean(., na.rm = TRUE))) %>% 
  ungroup()
schools_tea_math
| year | school | city | math_prnk |
|------|--------|------|-----------|
| 2011 | A C JONES | CORPUS CHRISTI | 0.51 |
| 2011 | A J MOORE ACAD | WACO | 0.24 |
| 2011 | A M CONS | HUNTSVILLE | 0.97 |
| 2011 | A MACEO SMITH HIGH SCHOOL | RICHARDSON | 0.03 |
| 2011 | A+ ACADEMY | RICHARDSON | 0.14 |
| 2011 | ABBOTT SCHOOL | WACO | 0.72 |
| 2011 | ABERNATHY | LUBBOCK | 0.63 |
| 2011 | ABILENE | ABILENE | 0.60 |
| 2011 | ACADEMY | WACO | 0.91 |
| 2011 | ACADEMY HIGH SCHOOL | AUSTIN | 0.32 |
| 2011 | ACADEMY OF CAREERS AND TECHNOLOGIE | SAN ANTONIO | 0.03 |
| 2011 | ACADEMY OF CREATIVE ED | SAN ANTONIO | 0.48 |
| 2011 | ADRIAN SCHOOL | AMARILLO | 0.27 |
| 2011 | ADVANTAGE ACADEMY | RICHARDSON | 0.25 |
| 2011 | AGUA DULCE | CORPUS CHRISTI | 0.66 |
| 2011 | AKINS | AUSTIN | 0.25 |
| 2011 | ALAMO HEIGHTS | SAN ANTONIO | 0.95 |
| 2011 | ALBA-GOLDEN | KILGORE | 0.52 |
| 2011 | ALBANY JR-SR | ABILENE | 0.83 |
| 2011 | ALDINE | HOUSTON | 0.23 |

# of total rows: 7,730
schools_join_math <-
  schools_tea_math %>%
  rename_at(vars(matches("^math")), funs(paste0("tea_", .))) %>% 
  inner_join(schools_uil_math %>% 
             rename_at(vars(matches("^math")), funs(paste0("uil_", .))),
           by = c("year", "school", "city")) %>%
  select(year, school, city, matches("math"))
schools_join_math
| year | school | city | tea_math_prnk | uil_math_prnk |
|------|--------|------|---------------|---------------|
| 2011 | ABILENE | ABILENE | 0.60 | 0.00 |
| 2011 | ALAMO HEIGHTS | SAN ANTONIO | 0.95 | 0.70 |
| 2011 | AMERICAS | EL PASO | 0.31 | 0.69 |
| 2011 | ANDERSON | AUSTIN | 0.98 | 0.64 |
| 2011 | ANDRESS | EL PASO | 0.15 | 0.56 |
| 2011 | ARLINGTON HEIGHTS | FORT WORTH | 0.63 | 0.49 |
| 2011 | AUSTIN | AUSTIN | 0.89 | 0.22 |
| 2011 | AUSTIN | EL PASO | 0.22 | 0.68 |
| 2011 | AUSTIN | HOUSTON | 0.09 | 0.85 |
| 2011 | BEL AIR | EL PASO | 0.17 | 0.49 |
| 2011 | BERKNER | RICHARDSON | 0.80 | 0.50 |
| 2011 | BOSWELL | FORT WORTH | 0.83 | 0.22 |
| 2011 | BOWIE | AUSTIN | 0.97 | 0.70 |
| 2011 | BOWIE | EL PASO | 0.15 | 0.15 |
| 2011 | BRANDEIS | SAN ANTONIO | 0.79 | 0.39 |
| 2011 | BREWER | FORT WORTH | 0.48 | 0.19 |
| 2011 | BURBANK | SAN ANTONIO | 0.22 | 0.76 |
| 2011 | BURGES | EL PASO | 0.57 | 0.70 |
| 2011 | CALALLEN | CORPUS CHRISTI | 0.47 | 0.93 |
| 2011 | CANUTILLO | EL PASO | 0.18 | 0.81 |

# of total rows: 699
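
Before reducing the relationship to a single number, a scatterplot of
the two percentile ranks gives a visual sense of its strength. (This
plot is a sketch of my own; it does not appear in the original post.)

# A sketch: TEA (SAT/ACT) math percentile rank vs. UIL math percentile rank.
schools_join_math %>%
  ggplot(aes(x = tea_math_prnk, y = uil_math_prnk)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "TEA (SAT/ACT) math percentile rank",
    y = "UIL math percentile rank"
  ) +
  theme_minimal()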
schools_join_math_cors <-
  schools_join_math %>%
  select(-year) %>% 
  select_if(is.numeric) %>%
  corrr::correlate()
schools_join_math_cors
| rowname | tea_math_prnk | uil_math_prnk |
|---------|---------------|---------------|
| tea_math_prnk | NA | 0.36 |
| uil_math_prnk | 0.36 | NA |

So, this correlation value—0.36—seems fairly low. At face value, it
certainly does not provide any basis to claim that schools that do well
in the academic UIL competitions also do well with SAT/ACT tests.
However, perhaps if I used a different methodology, the result would be
different. Other metrics used to quantify academic UIL performance could
be tested in some kind of sensitivity analysis.
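
As a sketch of what such a sensitivity analysis might look like (this is
hypothetical and not part of the original analysis), one could swap the
percentile-rank metric for the summed number of schools “defeated”
(`n_defeat`) and recompute the same correlation:

# Hypothetical sensitivity check: quantify UIL math performance by summed
# `n_defeat` (renormalized with `percent_rank()`) instead of summed `prnk`.
schools_uil_math_alt <-
  schools_uil %>%
  filter(str_detect(comp, "Calculator|Math|Number")) %>%
  group_by(year, school, city) %>%
  summarise(n_defeat_sum = sum(n_defeat, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(uil_math_prnk_alt = percent_rank(n_defeat_sum)) %>%
  select(-n_defeat_sum)

schools_tea_math %>%
  rename(tea_math_prnk = math_prnk) %>%
  inner_join(schools_uil_math_alt, by = c("year", "school", "city")) %>%
  select_if(is.numeric) %>%
  select(-year) %>%
  corrr::correlate()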

EDA: Year-to-Year Correlations, Cont.

While I won’t do any kind of rigorous second evaluation here, I do want
to try to quantify the impact of the “missing” data dropped due to
mismatched school names. If all possible data had been used, would the
final correlation value have increased (or decreased)? Although finding
a direct answer to this question is impossible, we can evaluate the
difference between the year-to-year correlations of scores for the
schools that are successfully joined and the correlations calculated
for all schools in the “unjoined” schools_tea and schools_uil data sets.
If we find that there are large discrepancies (one way or the other),
then we may have some reason to believe that the 0.36 number found above
is misleading.

To perform this task, I create a couple of intermediary data sets, as
well as some functions.

schools_postjoin_math_tidy <-
  schools_join_math %>%
  unite(school_city, school, city) %>% 
  gather(metric, value, matches("prnk"))
pairwise_cor_f1 <-
  function(data, which = c("tea", "uil")) {
    which <- match.arg(which)
    data %>%
      filter(metric %>% str_detect(which)) %>% 
      # filter_at(vars(value), all_vars(!is.nan(.))) %>% 
      widyr::pairwise_cor(
        feature = school_city,
        item = year,
        value = value
      ) %>% 
      rename(year1 = item1, year2 = item2, cor = correlation) %>% 
      mutate(source = which %>% toupper())
  }

pairwise_cor_f2 <-
  function(data, which = c("tea", "uil")) {
    which <- match.arg(which)
    col <-
      data %>%
      names() %>% 
      str_subset("math")
    data %>%
      unite(school_city, school, city) %>% 
      rename(value = !!rlang::sym(col)) %>%
      widyr::pairwise_cor(
        feature = school_city,
        item = year,
        value = value
      ) %>% 
      rename(year1 = item1, year2 = item2, cor = correlation) %>% 
      mutate(source = which %>% toupper())
  }
schools_postjoin_math_cors_byyear <-
  bind_rows(
    schools_postjoin_math_tidy %>% 
      pairwise_cor_f1("tea"),
    schools_postjoin_math_tidy %>% 
      pairwise_cor_f1("uil")
  )

schools_prejoin_math_cors_byyear <-
  bind_rows(
    schools_tea_math %>%
      pairwise_cor_f2("tea"),
    schools_uil_math %>%
      pairwise_cor_f2("uil")
  )

schools_math_cors_byyear_diffs <-
  schools_postjoin_math_cors_byyear %>% 
  inner_join(schools_prejoin_math_cors_byyear, 
             by = c("year1", "year2", "source"), 
             suffix = c("_join", "_unjoin")) %>% 
  mutate(cor_diff = cor_join - cor_unjoin)

Ok, enough of the data munging—let’s review the results!

schools_math_cors_byyear_diffs_wide <-
  schools_math_cors_byyear_diffs %>% 
  filter(year1 <= year2) %>% 
  select(-matches("join$")) %>% 
  unite(year_pair, year1, year2) %>% 
  spread(source, cor_diff)
schools_math_cors_byyear_diffs_wide
| year_pair | TEA | UIL |
|-----------|-----|-----|
| 2011_2012 | 0.09 | -0.02 |
| 2011_2013 | 0.05 | -0.04 |
| 2011_2014 | 0.11 | -0.06 |
| 2011_2015 | 0.24 | -0.03 |
| 2012_2013 | 0.01 | 0.00 |
| 2012_2014 | 0.06 | 0.04 |
| 2012_2015 | 0.15 | 0.00 |
| 2013_2014 | -0.01 | -0.03 |
| 2013_2015 | 0.08 | 0.00 |
| 2014_2015 | 0.05 | -0.01 |

Note that the correlations in the joined data are a bit “stronger”—in
the sense that they are more positive—among the TEA SAT/ACT data,
although not dramatically so. Additionally, the differences for the UIL
data are trivial. Thus, we might say that the data lost to the join
would not have changed the singular correlation value found above—0.36—by
much at all.
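
To condense the table above even further (my own addition), one could
summarize the average difference, and the average absolute difference,
by source:

# Average (and average absolute) difference in correlation, by data source.
schools_math_cors_byyear_diffs %>%
  filter(year1 <= year2) %>%
  group_by(source) %>%
  summarise(
    cor_diff_mean = mean(cor_diff),
    cor_diff_abs_mean = mean(abs(cor_diff))
  ) %>%
  ungroup()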

Conclusion

So, my initial inclination in my analysis of academic UIL competitions
seems correct—there is no significant relationship between Texas high
school academic competition scores and standardized test scores (for
math, between 2011 and 2015). And, with that question answered, I intend
to explore this rich data set in other ways in future blog posts.
