Machine Learning (ML) has seen exponential growth over the last five years, and many analytical platforms have adopted ML technologies to provide packaged solutions to their users. So, why has Machine Learning become mainstream?
Let's take a look at Multivariate Analysis (MVA). Technically, MVA is a subset of ML algorithms, even though its methods have been widely available for a long time. MVA typically refers to two algorithms: Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression.
As such, MVA has become a de facto standard in manufacturing and batch processing, among other domains. Some typical use cases are:
In principle, industrial datasets are no different from other supervised or unsupervised learning problems, and they can be evaluated with a wide range of algorithms. Multivariate Analysis was preferred because it offered global and local explainability: MVA models are multivariate extensions of the well-understood linear regression and provide a weight (slope) for each variable. This enables a critical understanding and optimization of the underlying process dynamics, which is a very important aspect in manufacturing.
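To make this concrete, here is a minimal sketch (using scikit-learn on synthetic data, not the study data) of how a PLS model exposes one weight per input variable, just like a regression slope:

# Minimal sketch: PLS as a multivariate extension of linear regression.
# X and y are synthetic stand-ins for process variables and a quality attribute.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.random.rand(50, 6)                                       # 6 hypothetical process variables
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8, -0.2]) + 0.05 * np.random.rand(50)

pls = PLSRegression(n_components=3).fit(X, y)
print(pls.coef_)                                                # one weight ("slope") per variable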
In the past, many ML algorithms were considered black-box models because the inner mechanics of the model were not transparent to the user. These model types had limited utility in manufacturing since they could not answer the WHY and therefore lacked credibility.
This has changed considerably. Today, model explainability is a very active field of ML research, and excellent libraries are available to analyze the underlying mechanics of even highly complex model architectures.
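As one example, the SHAP library can attribute a tree-based model's predictions back to its input variables. The snippet below is a minimal sketch on synthetic data, not part of the original study:

# A minimal model-explainer sketch using SHAP on a small, hypothetical XGBoost model.
import numpy as np
import shap
import xgboost as xgb

X = np.random.rand(200, 5)                             # 5 hypothetical process variables
y = 2 * X[:, 0] - X[:, 3] + 0.1 * np.random.rand(200)

model = xgb.XGBRegressor(objective="reg:squarederror").fit(X, y)

explainer = shap.TreeExplainer(model)                  # explainer built around the trained model
shap_values = explainer.shap_values(X)                 # local contribution of each variable to each prediction
shap.summary_plot(shap_values, X)                      # global view of which variables drive the model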
The following shows an example of applying ML technologies to a typical MVA project. In the original publication (https://journals.sagepub.com/doi/10.1366/0003702021955358), several preprocessing steps were studied together with PLS to build a predictive model. All steps were performed with commercial off-the-shelf software, and the analysis was carried out manually.
Using ML pipelines, the same study can be structured as follows:
# SNV, MSC and SavitzkyGolay are assumed to be scikit-learn-compatible spectral preprocessing transformers
import numpy as np
import xgboost as xgb
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV, KFold

# Placeholder pipeline; the actual steps are supplied through the parameter grid
pipeline = Pipeline(steps=[('preprocess', None), ('regression', None)])
preprocessing_options = [{'preprocess': (SNV(),)},
                         {'preprocess': (MSC(),)},
                         {'preprocess': (SavitzkyGolay(9, 2, 1),)},
                         {'preprocess': (make_pipeline(SNV(), SavitzkyGolay(9, 2, 1)),)}]
regression_options = [{'regression': (PLSRegression(),), 'regression__n_components': np.arange(1, 10)},
                      {'regression': (LinearRegression(),)},
                      {'regression': (xgb.XGBRegressor(objective="reg:squarederror", random_state=42),)}]
# Build the full grid as the cross product of preprocessing and regression candidates
param_grid = []
for preprocess in preprocessing_options:
    for regression in regression_options:
        param_grid.append({**preprocess, **regression})
# 10-fold cross-validated grid search, scored on explained variance (assumed, matching the result below)
kf_10 = KFold(n_splits=10)
score = 'explained_variance'
search = GridSearchCV(pipeline, param_grid=param_grid, scoring=score, n_jobs=2, cv=kf_10, refit=False)
This small code example tests every combination of preprocessing and regression steps and automatically selects the best model. A combination of SNV (Standard Normal Variate), first derivative, and XGBoost showed the highest cross-validated explained variance of 0.958.
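A minimal usage sketch, assuming the spectra X and the reference values y from the study have already been loaded (not shown here):

# Run the search and inspect the winning combination
search.fit(X, y)
best = np.argmax(search.cv_results_['mean_test_score'])
print(search.cv_results_['params'][best])             # best preprocessing / regression combination
print(search.cv_results_['mean_test_score'][best])    # its cross-validated explained variance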
The transformed spectra and the model weights can be overlaid to provide insights into the model mechanics.
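One way to sketch such an overlay with matplotlib is shown below. It assumes the wavelength axis (wavelengths), the spectra X, and a refitted winning model are available, and it reuses the SNV and SavitzkyGolay transformers and the make_pipeline import from the pipeline code above; the weight attribute depends on the model type (feature_importances_ for XGBoost, coef_ for PLS or linear regression):

# A sketch only: wavelengths, X, and a refitted winning model are assumed to exist.
import matplotlib.pyplot as plt

preprocessed = make_pipeline(SNV(), SavitzkyGolay(9, 2, 1)).fit_transform(X)
weights = model.feature_importances_          # per-wavelength importance of the XGBoost model

fig, ax1 = plt.subplots()
ax1.plot(wavelengths, preprocessed.T, color='lightgray', linewidth=0.5)
ax1.set_xlabel('Wavelength')
ax1.set_ylabel('Preprocessed spectra')
ax2 = ax1.twinx()                             # second y-axis keeps both scales readable
ax2.plot(wavelengths, weights, color='crimson')
ax2.set_ylabel('Model weight / importance')
plt.show()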
Multivariate Analysis (MVA) has been successfully applied in manufacturing and is here to stay. But there is no doubt that Machine Learning (ML) data engineering concepts will be widely applied in this domain as well. Pipelines and auto-tuning libraries will ultimately replace the manual work of data transformation, model selection, and hyperparameter tuning. New ML algorithms and deep learning models, in combination with local and global explainers, will expand Manufacturing Intelligence and provide key insights into Process Dynamics.
Thanks to Dr. Salvador Garcia-Munoz for providing code examples and data sets.
For more information, please contact us.