An Interesting Study: Exploring Mental Health Conditions in the Tech Workplace

May 27, 2017

(This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers)

Contributed by Bo Lian. This blog post is the first class project on Exploratory Visualization & Shiny. Please check the Shiny App here.

Background and Motivation

Mental illness is a global health problem. Its prevalence is critically high, and it leads to severe health outcomes. One in four adults suffers from a diagnosable mental illness in any given year (National Alliance on Mental Illness, 2013). In the tech industry, 50% of individuals have sought treatment for mental illness (Finkler, 2015).

This important health issue warrants further investigation. This study, based on the 2014 Mental Health in Tech Survey from Open Sourcing Mental Illness, combines data visualization with statistical analysis of the relationships between mental illness and various factors in order to answer the following questions:

  1. What are the strongest groups of predictors of mental illness in the workplace?
  2. What might explain the high prevalence of mental illness in the tech industry, beyond the common assumptions?
  3. How representative is the survey, and are other factors involved, such as geographic location and age?

The Data

The data used in the study is the 2014 Mental Health in Tech Survey from Open Sourcing Mental Illness. With 1259 responses, it is considered the largest survey on mental health in the tech industry.

The dataset contains 27 factors that can be grouped into 5 categories. Each factor can be tested with a Chi-squared test of independence against the response variable, treatment (whether the respondent has sought treatment for a mental health condition).

1. Demographics: age, gender, country, state, etc.

2. Mental Health Condition: treatment, work interference, family history, etc.

3. Employment Background: tech, remote, employee number, size, etc.

4. Organizational Policies on Mental Health: benefit, wellness program, etc.

5. Openness about Mental Health: self awareness, coworkers, supervisors, openness to discuss, etc.
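As a rough illustration of what testing one factor against treatment involves, here is a minimal Python sketch of the Chi-squared test of independence. The function name and the toy records are hypothetical and are not taken from the survey. The statistic is computed without a continuity correction, so it may differ slightly from what a statistics package reports for a 2x2 table.

```python
from collections import Counter


def chi_squared_independence(pairs):
    """Return (statistic, degrees_of_freedom) for (factor, response) pairs."""
    observed = Counter(pairs)                  # cell counts
    row_totals = Counter(f for f, _ in pairs)  # counts per factor level
    col_totals = Counter(r for _, r in pairs)  # counts per response level
    n = len(pairs)

    statistic = 0.0
    for f in row_totals:
        for r in col_totals:
            # Expected count if factor and response were independent.
            expected = row_totals[f] * col_totals[r] / n
            statistic += (observed[(f, r)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical toy records, NOT the survey data: (family_history, treatment)
records = (
    [("Yes", "Yes")] * 30 + [("Yes", "No")] * 10
    + [("No", "Yes")] * 20 + [("No", "No")] * 40
)

statistic, dof = chi_squared_independence(records)
print(f"chi-squared = {statistic:.2f}, dof = {dof}")
```

A larger statistic for a given number of degrees of freedom means the observed counts depart further from what independence would predict.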


Data Visualization and Analysis

The data is drawn from employees in 46 countries, with US employees accounting for 59.2%. Most of those are from tech-heavy states such as California, New York, and Washington.


The median age of the respondents is 31. The age distribution is right-skewed, which is expected because the tech industry tends to have younger employees.

The box plots show no statistically significant difference in age between the treatment and non-treatment groups.
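The numbers behind a box plot are the five-number summary of each group. The sketch below computes it with Python's standard library. The ages are made-up placeholders rather than survey values, and the "inclusive" quartile method is only one of several conventions a plotting library might use.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical ages, NOT the survey data.
ages_by_group = {
    "treatment": [24, 27, 29, 31, 33, 36, 41, 52],
    "no treatment": [23, 26, 28, 31, 32, 35, 40, 48],
}

for group, ages in ages_by_group.items():
    print(group, five_number_summary(ages))
```

Similar medians and overlapping boxes suggest similar distributions, but a formal two-sample test would be needed to back up a claim about statistical significance.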

Below are six examples of the Chi-squared tests, with their scores and p-values. As expected, mental illness is strongly dependent on family history and work interference. Whether a company offers mental health care options is also associated with the treatment rate, as is gender. Interestingly, the data show that several factors people might assume correlate with mental illness do not. For example, neither remote work nor working at a tech company correlates with mental illness, and neither does the supervisor's attitude.
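For readers unfamiliar with how a Chi-squared score becomes a p-value, the following Python sketch computes the upper-tail probability for integer degrees of freedom using standard closed-form series. The function name is an assumption made for illustration, and the inputs are textbook 5% critical values rather than scores from this study.

```python
import math


def chi2_p_value(statistic, dof):
    """Upper-tail probability of a chi-squared distribution (integer dof >= 1)."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    half = statistic / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: start from the dof = 1 case and add the correction terms.
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        p += math.exp(-half) * term
        term *= half / (j + 0.5)
    return p


# Textbook 5% critical values, so each p-value should be close to 0.05.
for statistic, dof in [(3.841, 1), (5.991, 2), (7.815, 3)]:
    print(f"chi-squared = {statistic}, dof = {dof}, "
          f"p = {chi2_p_value(statistic, dof):.3f}")
```

A factor is usually called significant when its p-value falls below a chosen threshold such as 0.05.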


Results and Conclusions

Significant Factors of the Chi-Square tests

These include gender, family history, work interference, benefits, care options, and mental health interview, among others.

Insignificant Factors of the Chi-Square tests

Factors such as self_employed, remote work, number of employees, tech or not, and supervisors did not prove significant.

Among the results, family history, work interference, and gender are most likely to be the main causal factors. If a company provides better mental health care services, employees might be more likely to seek treatment or become more aware of their condition. Different genders might also tend to seek treatment differently. Interestingly, none of the actual work conditions has a statistically significant impact on the mental illness of the employees. Humans are tougher than they think! One thing to consider is that the tech industry is concentrated on the US west and east coasts and in developed countries, as shown in the map, so the high cost of living, competitive atmosphere, and high population density might contribute more to the prevalence of mental illness than the working conditions themselves.

Future Work

Areas to keep exploring:

  • More data visualization
  • Data complexity and subjectivity, location specificity
  • Integrating ongoing survey data from 2016
  • Interaction among factors
  • Predictions with multiple variables

The post An Interesting Study: Exploring Mental Health Conditions in the Tech Workplace appeared first on NYC Data Science Academy Blog.
