Improved access to operational and equipment data has driven a transformation in the process manufacturing industry. Engineers can now see both historical and time-series data from their operations as events unfold, even from remote locations, so entire teams can stay up to speed continuously and reliably. The only problem? Many teams find that they are “DRIP”: data rich, information poor.
When the volume of data is tremendous, a lack of proper organization, cleansing, and contextualization brings process engineers to a standstill. Some chemical environments have 20,000 to 70,000 signals (or sensors), oil refineries can have 100,000, and enterprise-wide sensor signals can reach into the millions.
This much data can be overwhelming, but refining it carefully can lead to highly valuable insights. Many SMEs and process engineers spend their valuable time sorting through spreadsheets to wrangle the data instead of visualizing and analyzing the patterns and models that lead to effective insight. With advanced analytics, process manufacturers can easily see up-to-date data from disparate sources and make decisions based on the analysis to improve operations immediately.
Moving data from “raw” to ready for analysis should not take up the majority of your subject matter experts’ time. Some organizations still report that more than 70 percent of the time they spend on operational analytics goes solely to cleansing data.
But your team members are not “data janitors.” Today’s technology can take care of the monotonous and time-consuming tasks of accessing, cleansing, and contextualizing data so your team can move straight to benefiting from the insights.
For an entire generation, spreadsheets have been the tool of choice for analyzing data in the process manufacturing industry. At the moment of analysis, the tool in use must let users define the critical time periods of interest and the relevant context. Spreadsheets put the user in control of the data investigation and offer a familiar, albeit cumbersome, path of analysis.
But the shortcomings of spreadsheets have become increasingly apparent:
Together, these pain points make it difficult to reconcile and analyze data in the broader business context that profitability and efficiency use cases require in order to improve operational performance.
With advanced analytics, the experts in process manufacturing operations are on the front lines of configuring data analytics, and improvements to production yield, quality, availability, and the bottom line are readily available.
Advanced analytics leverages innovations in big data, machine learning, and web technologies to integrate and connect to all process manufacturing data sources and drive business improvement. Some of the capabilities include:
Simply put, advanced analytics gives you the whole picture. It draws the relationships and correlations between specific data that are needed to improve performance based on accurate and reliable insight. Seeq’s advanced analytics solution is designed specifically for process manufacturing data and has been saving leading manufacturers time and money immediately upon implementation. Learn more about the application and how it eliminates spreadsheet exhaustion here.
Machine Learning (ML) has grown rapidly over the last five years, and many analytical platforms have adopted ML technologies to provide packaged solutions to their users. So why has Machine Learning become mainstream?
Let’s take a look at Multivariate Analysis (MVA). Technically, MVA is a subset of ML algorithms, even though many of its algorithms have been widely available for a long time. MVA typically refers to two algorithms:
As such, MVA has become a de facto standard in batch processing and other areas of manufacturing. Some typical use cases are:
In principle, industrial datasets are no different from other supervised or unsupervised learning problems, and they can be evaluated using a wide range of algorithms. Multivariate Analysis was preferred because it offered global and local explainability. MVA models are multivariate extensions of the well-understood linear regression and provide a weight (slope) for each variable. This enables critical understanding and optimization of the underlying process dynamics, which is a very important aspect of manufacturing.
In the past, many ML algorithms were considered black box models, because the inner mechanics of the model were not transparent to the user. These model types had limited utility in manufacturing since they could not answer the WHY and therefore lacked credibility.
This has very much changed. Today, model explainers in ML are a very active field of research and excellent libraries have become available to analyze the underlying model mechanics of highly complex architectures.
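One simple, model-agnostic explainer is permutation importance, which ships with scikit-learn. The sketch below applies it to a random forest trained on synthetic data; the choice of technique, the model, and the data are assumptions for illustration and are not necessarily the libraries the author has in mind.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic, non-linear data (an assumption): only variables 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=300)

# A random forest stands in for a "black box" model.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Shuffle one variable at a time and measure how much the model score drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

for i in ranking:
    print(f"variable {i}: mean importance {result.importances_mean[i]:.3f}")
```

Variables whose shuffling hurts the score most are the ones the model relies on, which gives a global answer to the WHY even when the model itself is not a linear one.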
The following shows an example of applying ML technologies to a typical MVA project type. In the original publication (https://journals.sagepub.com/doi/10.1366/0003702021955358), several preprocessing steps were studied together with PLS to build a predictive model. All steps were performed using commercial off-the-shelf software, with the analysis worked through manually.
Using ML pipelines, the same study can be structured as follows:
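import numpy as np
import xgboost as xgb
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, make_pipeline

# SNV, MSC, SavitzkyGolay, score, and kf_10 are not defined in this excerpt;
# they are assumed to be custom transformers, a scoring metric, and a 10-fold splitter.
pipeline = Pipeline(steps=[('preprocess', None), ('regression', None)])
preprocessing_options = [{'preprocess': (SNV(),)},
                         {'preprocess': (MSC(),)},
                         {'preprocess': (SavitzkyGolay(9, 2, 1),)},
                         {'preprocess': (make_pipeline(SNV(), SavitzkyGolay(9, 2, 1)),)}]
regression_options = [{'regression': (PLSRegression(),), 'regression__n_components': np.arange(1, 10)},
                      {'regression': (LinearRegression(),)},
                      {'regression': (xgb.XGBRegressor(objective="reg:squarederror", random_state=42),)}]
# Pair every preprocessing option with every regression option.
param_grid = []
for preprocess in preprocessing_options:
    for regression in regression_options:
        param_grid.append({**preprocess, **regression})
# Cross-validate each combination; refit=False skips retraining the winner on the full data.
search = GridSearchCV(pipeline, param_grid=param_grid, scoring=score, n_jobs=2, cv=kf_10, refit=False)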
This small code example tests every combination of preprocessing and regression steps and then automatically identifies the best model. A combination of SNV (Standard Normal Variate), first derivative, and XGBoost showed the highest cross-validated explained variance, 0.958.
The transformed spectra and the model weights can be overlaid to provide insights into the model mechanics:
Multivariate Analysis (MVA) has been successfully applied in manufacturing and is here to stay. But there is no doubt that Machine Learning (ML) data engineering concepts will be widely applied to this domain as well. Pipelines and autotuning libraries will ultimately replace the manual work of choosing data transformations, selecting models, and tuning hyperparameters. New ML algorithms and deep learning, in combination with local and global explainers, will expand Manufacturing Intelligence and provide key insights into process dynamics.
Thanks to Dr. Salvador Garcia-Munoz for providing code examples and data sets.
For more information, please contact us.