Original article published in opendatascience.com
Over the last couple of years, data science has spread rapidly across industrial applications. Today it is applied in health care, customer service, government, cyber security, mechanical and aerospace engineering, and other industrial settings. Among these, manufacturing has gained prominence in pursuit of a simple goal: Just-in-Time (JIT). Since the late 18th century, manufacturing has gone through four major industrial revolutions. The first Industrial Revolution harnessed steam power to mechanize production. The second moved from batch production to assembly lines, which made products more affordable (Ford’s Model T was a major outcome), and the third brought widespread use of computers and robotics. Between the third and fourth came a wave of lean manufacturing, which many manufacturers are still adopting. We are now in the fourth Industrial Revolution, in which data from machines, the environment, and products is harvested to get closer to that simple goal of Just-in-Time: “Making the right products in the right quantities at the right time.” Why is JIT so important in manufacturing? The simple answer is that it reduces manufacturing cost and makes products more affordable for everyone.
In this article, I will try to answer some of the most frequently asked questions about data science in manufacturing.
How is manufacturing using data science, and what is its impact?
Data science has many applications in manufacturing. To name a few: predictive maintenance, predictive quality, safety analytics, warranty analytics, plant facilities monitoring, computer vision, sales forecasting, and KPI forecasting, among many others, as shown in Figure 1.
Figure 1: Data science opportunities in manufacturing 
Predictive Maintenance: Machine breakdowns in manufacturing are very expensive. Unplanned downtime is the single largest contributor to manufacturing overhead costs, and it has cost businesses an average of $2 million over the last three years. In 2014, the average downtime cost per hour was $164,000. By 2016, that figure had jumped by 59% to $260,000 per hour. This has led manufacturers to embrace technologies such as condition-based monitoring and predictive maintenance. Sensor data from machines is monitored continuously to detect anomalies (using models such as PCA-T2, one-class SVM, autoencoders, and logistic regression), diagnose failure modes (using classification models such as SVM, random forest, decision trees, and neural networks), predict the time to failure (TTF) (using a combination of techniques such as survival analysis, lagging, curve fitting, and regression models), and predict the optimal maintenance time (using operations research techniques).
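To make the anomaly-detection step more concrete, here is a minimal sketch of the PCA-T2 idea using only NumPy. Everything in it is an assumption made for illustration: the three sensor channels (temperature, vibration, current), the synthetic "healthy" data, and the choice of the 99th percentile of the training scores as the alarm threshold (in practice, a limit based on the F-distribution is also common). It is not the method of any specific deployment.

```python
import numpy as np

def fit_pca_t2(X_train, n_components=2):
    """Fit PCA on healthy data and keep what is needed to score new samples."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                        # standardize each sensor
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]                    # retained principal directions
    variances = S[:n_components] ** 2 / (len(Z) - 1)  # variance along each direction
    return mean, std, components, variances

def t2_statistic(model, X):
    """Hotelling's T2 of each row of X in the retained PCA subspace."""
    mean, std, components, variances = model
    scores = ((X - mean) / std) @ components.T
    return np.sum(scores ** 2 / variances, axis=1)

# Hypothetical healthy data: three sensors driven by a common machine load.
rng = np.random.default_rng(0)
load = rng.normal(size=500)
healthy = np.column_stack([
    60.0 + 5.0 * load + rng.normal(scale=1.0, size=500),  # temperature
    2.0 + 0.5 * load + rng.normal(scale=0.1, size=500),   # vibration
    10.0 + 1.0 * load + rng.normal(scale=0.2, size=500),  # motor current
])

model = fit_pca_t2(healthy)
# Assumed alarm limit: 99th percentile of T2 on the healthy training data.
threshold = np.percentile(t2_statistic(model, healthy), 99)

# One typical reading and one reading far outside the healthy operating range.
new_readings = np.array([[60.0, 2.0, 10.0], [100.0, 6.0, 18.0]])
flags = t2_statistic(model, new_readings) > threshold
print(flags)
```

Note that T2 only measures how unusual a sample is within the retained components. A fault that breaks the normal correlation between sensors can fall outside that subspace, which is why T2 is often paired with a residual statistic in practice.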
Computer Vision: Traditional computer vision systems measure parts against tolerances to determine whether they are acceptable. Inspecting parts for defects such as scuff marks, scratches, and dents is equally important. Traditionally, human inspectors looked for such defects. Today, AI techniques such as CNN, R-CNN, and Fast R-CNN have proven to be more accurate than their human counterparts and take much less time to inspect a part, which significantly reduces the cost of the products.
Sales forecasting: Predicting future trends has always helped in optimizing resources for profitability. This has been true in various industries, such as manufacturing, airlines, and tourism. In manufacturing, knowing the production volumes ahead of time helps in optimizing resources such as the supply chain, machine-product balancing, and the workforce. Techniques ranging from linear regression, ARIMA, and lag features to more complex models such as LSTM are used today for this purpose.
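As a rough illustration of the simplest end of that range, the sketch below combines lag features with an ordinary least-squares linear regression to forecast one step ahead. The monthly volume series is synthetic and noise-free (a linear trend plus a 12-month seasonal cycle), and the choice of lags 1 and 12 is an assumption made for this example, not a recommendation.

```python
import numpy as np

def make_lag_features(series, lags):
    """Build a design matrix whose columns are an intercept and lagged values."""
    max_lag = max(lags)
    X = np.array([[1.0] + [series[t - lag] for lag in lags]
                  for t in range(max_lag, len(series))])
    y = series[max_lag:]
    return X, y

def fit_and_forecast(series, lags=(1, 12)):
    """Fit a linear model on lagged values and forecast the next period."""
    X, y = make_lag_features(series, lags)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    next_row = np.array([1.0] + [series[len(series) - lag] for lag in lags])
    return float(next_row @ weights)

# Hypothetical monthly production volumes: trend plus yearly seasonality.
months = np.arange(48)
volumes = 200.0 + 2.0 * months + 30.0 * np.sin(2 * np.pi * months / 12)

forecast = fit_and_forecast(volumes)
print(f"Forecast for month 48: {forecast:.1f}")
```

Because this toy series repeats exactly every 12 months on top of a constant trend, the lag-12 feature alone explains it. Real volume data is noisy, so the fit would be approximate and should be validated on held-out periods.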
Predicting quality: The quality of the products coming out of the machines is predictable. Statistical process control techniques are the most common tools found on the manufacturing floor; they tell us whether the process is in control or out of control, as shown in Figure 2. Regressing product quality on time with a technique such as linear regression yields a reasonable trend line. This line is then extrapolated to answer questions such as “How long do we have before we start to make bad parts?”
The above are just some of the most common and popular applications. Many other applications are still hidden and yet to be discovered.
Figure 2: An example of an X-bar chart from R’s qcc package
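The trend-line extrapolation described above can be sketched in a few lines. The example assumes a hypothetical part diameter that drifts upward over time and an assumed upper specification limit of 10.2 mm; the data is synthetic, and the function name `hours_until_limit` is introduced only for this illustration.

```python
import numpy as np

def hours_until_limit(hours, measurements, upper_spec_limit):
    """Fit a straight line to quality vs. time and extrapolate to the limit."""
    slope, intercept = np.polyfit(hours, measurements, deg=1)
    if slope <= 0:
        return None  # no upward drift, so the trend never reaches the limit
    crossing_time = (upper_spec_limit - intercept) / slope
    # Time left, counted from the most recent measurement.
    return max(0.0, crossing_time - hours[-1])

# Hypothetical data: diameter drifts by about 0.002 mm per hour, with noise.
rng = np.random.default_rng(1)
hours = np.arange(50, dtype=float)
diameter_mm = 10.0 + 0.002 * hours + rng.normal(scale=0.005, size=hours.size)

remaining = hours_until_limit(hours, diameter_mm, upper_spec_limit=10.2)
print(f"Estimated hours before parts exceed the limit: {remaining:.0f}")
```

A straight line is the simplest possible drift model. If wear accelerates over time, the linear extrapolation will be optimistic, so the estimate is best treated as a rough planning aid rather than a guarantee.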
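For readers who have not worked with X-bar charts, the sketch below shows how the control limits on such a chart are commonly computed from subgroup means and ranges. It is written in Python rather than R and is not a reimplementation of the qcc package. The measurements are made up, and the constant 0.577 is the standard tabulated A2 factor for subgroups of five parts.

```python
import numpy as np

A2_FOR_N5 = 0.577  # standard X-bar chart constant for subgroups of size 5

def xbar_limits(subgroups, a2=A2_FOR_N5):
    """Return (LCL, center line, UCL) from in-control baseline subgroups."""
    means = subgroups.mean(axis=1)
    ranges = subgroups.max(axis=1) - subgroups.min(axis=1)
    center = means.mean()                 # grand mean of the subgroup means
    half_width = a2 * ranges.mean()       # A2 times the average range
    return center - half_width, center, center + half_width

# Hypothetical baseline: four subgroups of five measurements each.
baseline = np.array([
    [10.0, 10.1, 9.9, 10.0, 10.0],
    [10.1, 10.0, 9.9, 10.0, 10.0],
    [9.9, 10.0, 10.1, 10.1, 9.9],
    [10.0, 9.9, 10.1, 10.0, 10.0],
])
lcl, center, ucl = xbar_limits(baseline)

# New subgroups are judged against the baseline limits.
new_subgroups = np.array([
    [10.0, 10.1, 10.0, 9.9, 10.0],
    [10.2, 10.3, 10.1, 10.2, 10.2],
])
new_means = new_subgroups.mean(axis=1)
out_of_control = (new_means < lcl) | (new_means > ucl)
print(lcl, center, ucl, out_of_control)
```

Computing the limits from a baseline period and then checking new subgroups against them keeps an out-of-control subgroup from widening its own limits.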
How big is data science in manufacturing?
According to one estimate for the US, “The Big Data Analytics in Manufacturing Industry Market was valued at USD 904.65 million in 2019 and is expected to reach USD 4.55 billion by 2025, at a CAGR of 30.9% over the forecast period 2020 – 2025.” Another estimate states that “TrendForce forecasts that the size of the global market for smart manufacturing solutions will surpass US$320 billion by 2020.” A third report states that “The global smart manufacturing market size is estimated to reach USD 395.24 billion by 2025, registering a CAGR of 10.7% according to a new study by Grand View Research, Inc.”
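As a quick sanity check on the first estimate, the compound annual growth rate (CAGR) arithmetic can be reproduced directly. The sketch assumes six years of compounding, from the 2019 base value to 2025; that reading is my assumption, since the quoted report gives the forecast period as 2020 – 2025.

```python
def project_value(start_value, cagr, years):
    """Grow a starting value at a constant annual rate for a number of years."""
    return start_value * (1.0 + cagr) ** years

def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in the given years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the quoted estimate, in USD million.
projected_2025 = project_value(904.65, 0.309, years=6)
check_rate = implied_cagr(904.65, 4550.0, years=6)

print(f"Projected 2025 market size: USD {projected_2025 / 1000:.2f} billion")
print(f"Implied CAGR: {check_rate:.1%}")
```

Under that assumption, the projection lands at roughly USD 4.55 billion and the implied rate at roughly 30.9%, so the quoted numbers are internally consistent.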
What are the challenges of data science in manufacturing?
There are various challenges in applying data science in manufacturing. Some of the most common ones that I have come across are as follows:
Lack of subject matter expertise: Data science is a very new field. Every application of data science requires its own core set of skills. In manufacturing, it is vital to know the manufacturing and process terminology, rules and regulations, the business, the components of the supply chain, and industrial engineering. A lack of subject matter expertise leads to tackling the wrong set of problems, which eventually leads to failed projects and, more importantly, a loss of trust. When someone asks me what a manufacturing data scientist is, I show them the image in Figure 3.
Figure 3: Who is a manufacturing data scientist?
Reinventing the wheel: Every problem in a manufacturing environment is new, and the stakeholders are different. Deploying a standard solution is risky and, more importantly, at some point it’s bound to fail. For every new problem, part of the solution is readily available, and the rest has to be engineered. In the simplest case, engineering involves developing new ML model workflows and/or writing new ML packages; in the most complex cases, it means developing a new sensor or hardware. Over the last couple of years, I have been at both extremes, and I have enjoyed it.
What tools do data scientists who work in manufacturing use?
A data scientist in manufacturing uses a combination of tools at every stage of the project lifecycle. For example:
1. Feasibility study: Notebooks (R Markdown and Jupyter), Git, and PowerPoint
“Yes! You read that right. PowerPoint is still very much necessary in any organization. BI tools are trying hard to replace it, but in my experience with half a dozen BI tools, PowerPoint still stands in first place for storytelling.”
2. Proof of concept: R, Python, SQL, PostgreSQL, MinIO, and Git
3. Scale-up: Kubernetes, Docker, and Git pipelines
Applying data science in manufacturing is still very new. New applications are being discovered every day, and new solutions are constantly being invented. In many manufacturing projects (capital investments), ROI is realized over several years (5 – 7 years). Most successfully deployed data science projects return their investment in less than a year, which makes them very attractive. Data science is just one of many tools that manufacturing industries are using to achieve their JIT goal. As a manufacturing data scientist, my recommendations are to spend enough time understanding the problem statement, target the low-hanging fruit, get those early wins, and build trust in the organization.
I will be at ODSC East 2020, presenting “Predictive Maintenance: Zero to Deployment in Manufacturing.” Do stop by to learn more about our journey in deploying predictive maintenance in the production environment.
 ActiveWizards, “Top 8 Data Science Use Cases in Manufacturing,” [Online]. Available: https://activewizards.com/blog/top-8-data-science-use-cases-in-manufacturing/.
 IIoT World, “iiot-world.com,” [Online]. Available: https://iiot-world.com/connected-industry/what-data-science-actually-means-to-manufacturing/. [Accessed 02 10 2020].
 Swift Systems, “Swift Systems,” [Online]. Available: https://swiftsystems.com/guides-tips/calculate-true-cost-downtime/.
 N. Amruthnath and T. Gupta, “Fault class prediction in unsupervised learning using model-based clustering approach,” in 2018 International Conference on Information and Computer Technologies (ICICT), Chicago, 2018.
 N. Amruthnath and T. Gupta, “A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance,” in 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), 2018.
 T. Wang, Y. Chen, M. Qiao, and H. Snoussi, “A fast and robust convolutional neural network-based defect detection model in product quality control,” The International Journal of Advanced Manufacturing Technology, vol. 94, no. 9-12, pp. 3465-3471, 2018.
 “Big Data Analytics in Manufacturing Industry Market – Growth, Trends, and Forecast (2020 – 2025),” Mordor Intelligence, 2020.
 Trendforce, “TrendForce Forecasts Size of Global Market for Smart Manufacturing Solutions to Top US$320 Billion by 2020; Product Development Favors Integrated Solutions,” 2017.
 Grand View Research. Inc, “Smart Manufacturing Market Size Worth $395.24 Billion By 2025,” 2019.