How to improve your Transition Analysis using TQS Pandas PiFrames in Python
By 
Holger Amort
02-Nov-21

The TQS Pandas PiFrames for OSIsoft® PI System® library has been designed to accelerate multivariate analytics (MVA) and machine learning (ML) for the OSIsoft PI System. Unlike the existing PI Analysis calculation engine, TQS Pandas PiFrames is designed for vector and matrix operations rather than single-value operations.

The TQS Pandas PiFrames for OSIsoft® PI System® library makes it very easy to work with structured and contextualized data in Python. Time segments can be defined as Event Frames (OSIsoft EF) and retrieved together with sensor data as structured Pandas data frames. This supports both simple and very complex analytics of one-dimensional or multi-dimensional data.

One use case in biotechnology is transition analysis (TA) on chromatography columns. Chromatography is used to purify the product, and the performance of the chromatography column is key to achieving good product quality. Several metrics can be calculated to monitor the column's performance, including:

  • Gaussian or non-Gaussian height equivalent to a theoretical plate (HETP)
  • Peak Asymmetry
  • Tailing Factor
  • Resolution

The calculations are based on the transition peak, which mathematically is a probability density function (pdf). The peak is calculated by numerical differentiation of the raw sensor data, that is, of the transition or cumulative distribution function (cdf). Often the curves are normalized by the flow rate to account for differences in total volume. The following shows an example of the transition (cdf) and its derivative (pdf):

Transition Analysis
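
As a minimal sketch of the differentiation step, the following uses NumPy on a simulated, noise-free Gaussian transition. The axis name `volume` and the values of `mu` and `sigma` are illustrative assumptions, not values from a real column.

```python
import numpy as np
from math import erf, sqrt

# Simulated transition (cdf): an ideal Gaussian step, purely illustrative.
volume = np.linspace(0.0, 10.0, 501)   # assumed x-axis, e.g. normalized volume
mu, sigma = 5.0, 0.8                   # hypothetical peak position and width
transition = np.array(
    [0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0)))) for v in volume]
)

# Numerical differentiation: central differences in the interior,
# one-sided differences at the two end points.
peak = np.gradient(transition, volume)

# Sanity check: a pdf should integrate to about 1 (trapezoidal rule).
area = np.sum(0.5 * (peak[1:] + peak[:-1]) * np.diff(volume))
print(f"peak maximum at {volume[np.argmax(peak)]:.2f}, area = {area:.4f}")
```

On real sensor data the same call amplifies measurement noise, which is why smoothing is usually applied before differentiating.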

The transition peak of the pdf is used to calculate, for example, the peak asymmetry using the following formula:

Asymmetry = b/a

Here b and a are the peak widths at 10% of the peak height, measured to the left (blue line) and to the right of the peak maximum (black line). Though the calculations are simple, the major problem is the numerical differentiation of noisy sensor data. This step introduces so much additional noise that the peak shape is hard to analyze. Therefore, the analysis includes data smoothing steps, such as a LOWESS filter, to reduce the noise level in the raw data, and upsampling to increase the resolution.
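
The asymmetry calculation can be sketched in a few lines of NumPy. The function name `asymmetry_at_10_percent` is hypothetical, and the sketch assumes a single peak on a flat baseline, with a taken as the leading and b as the trailing half-width (a common convention). Linear interpolation between samples stands in for the upsampling step.

```python
import numpy as np

def asymmetry_at_10_percent(x, peak, level=0.10):
    """Return (a, b, b / a) for a single peak on a flat baseline.

    a is the leading and b the trailing half-width at `level` of the
    peak height (assumed convention).
    """
    i_max = int(np.argmax(peak))
    thr = level * peak[i_max]

    # Last sample below the threshold before the maximum ...
    i = np.where(peak[:i_max] < thr)[0][-1]
    # ... and first sample below the threshold after the maximum.
    j = i_max + np.where(peak[i_max:] < thr)[0][0]

    # Linear interpolation to locate the crossings between samples.
    x_left = x[i] + (thr - peak[i]) * (x[i + 1] - x[i]) / (peak[i + 1] - peak[i])
    x_right = x[j - 1] + (peak[j - 1] - thr) * (x[j] - x[j - 1]) / (peak[j - 1] - peak[j])

    a = x[i_max] - x_left
    b = x_right - x[i_max]
    return a, b, b / a

# Usage on an ideal (symmetric) Gaussian peak, for which b / a should be close to 1.
x = np.linspace(0.0, 10.0, 1001)
gauss = np.exp(-0.5 * ((x - 5.0) / 0.8) ** 2)
a, b, ratio = asymmetry_at_10_percent(x, gauss)
print(f"a = {a:.3f}, b = {b:.3f}, asymmetry = {ratio:.3f}")
```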

To evaluate how accurate and precise this analysis is, it was applied to simulated data with noise levels from 0 to 2.5%.

The results show that this calculation has significant variation even at low noise levels. There are also differences in accuracy, which are introduced by the filtering step. Depending on the sensor data quality, this approach might not be sensitive enough to pick up small changes in the column's performance.
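
A noise study of this kind can be sketched as follows. This is not the original test setup: the simulated transition is an ideal Gaussian step (true asymmetry 1), the noise level is assumed to be a fraction of the full step height, and a centered moving average is used as a simple stand-in for the LOWESS filter. The helper names, window size, and number of replicates are all illustrative.

```python
import numpy as np
from math import erf, sqrt

def smooth(y, window=51):
    """Centered moving average with edge padding (stand-in for LOWESS)."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def asymmetry(x, peak, level=0.10):
    """b / a from linearly interpolated crossings at `level` of the peak height."""
    i_max = int(np.argmax(peak))
    thr = level * peak[i_max]
    i = np.where(peak[:i_max] < thr)[0][-1]
    j = i_max + np.where(peak[i_max:] < thr)[0][0]
    x_left = x[i] + (thr - peak[i]) * (x[i + 1] - x[i]) / (peak[i + 1] - peak[i])
    x_right = x[j - 1] + (peak[j - 1] - thr) * (x[j] - x[j - 1]) / (peak[j - 1] - peak[j])
    return (x_right - x[i_max]) / (x[i_max] - x_left)

rng = np.random.default_rng(0)            # fixed seed for repeatability
volume = np.linspace(0.0, 10.0, 501)
clean = np.array(
    [0.5 * (1.0 + erf((v - 5.0) / (0.8 * sqrt(2.0)))) for v in volume]
)

results = {}
for noise in (0.0, 0.005, 0.01, 0.025):   # noise std as a fraction of the step
    values = []
    for _ in range(200):                  # replicates per noise level
        noisy = clean + rng.normal(0.0, noise, volume.size)
        peak = np.gradient(smooth(noisy), volume)   # smooth, then differentiate
        values.append(asymmetry(volume, peak))
    results[noise] = (np.mean(values), np.std(values))

for noise, (mean, std) in results.items():
    print(f"noise {noise:.1%}: asymmetry = {mean:.3f} +/- {std:.3f}")
```

The mean indicates accuracy (deviation from the true value of 1) and the standard deviation indicates precision at each noise level.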

To improve the results, the same test was performed by fitting an exponentially modified Gaussian (EMG) directly to the transition curve.

The fitting routine led to much better accuracy and precision. This is mainly because the transition curve does not have to be modified, so no additional noise or peak distortion is introduced.
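
One way to fit an exponentially modified Gaussian directly to a transition curve is sketched below with SciPy. It is not necessarily the routine used for the results described here. It assumes the transition is normalized to run from 0 to 1, and the parameter values, noise level, starting guess, and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm

def emg_cdf(x, mu, sigma, tau):
    """Cumulative distribution of an exponentially modified Gaussian.

    scipy's exponnorm uses the shape parameter K = tau / sigma,
    with loc = mu and scale = sigma.
    """
    return exponnorm.cdf(x, tau / sigma, loc=mu, scale=sigma)

# Simulated noisy transition (assumed normalized to the range 0..1).
rng = np.random.default_rng(0)
volume = np.linspace(0.0, 12.0, 601)
true_params = (5.0, 0.6, 0.8)             # hypothetical mu, sigma, tau
transition = emg_cdf(volume, *true_params) + rng.normal(0.0, 0.01, volume.size)

# Starting guess: mu near the 50% crossing, unit widths.
mu0 = volume[np.argmin(np.abs(transition - 0.5))]
popt, pcov = curve_fit(
    emg_cdf, volume, transition,
    p0=(mu0, 1.0, 1.0),
    bounds=([volume[0], 0.05, 0.05], [volume[-1], 5.0, 5.0]),
)
perr = np.sqrt(np.diag(pcov))             # approximate standard errors

for name, value, err in zip(("mu", "sigma", "tau"), popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

Because the fit works on the raw transition, no differentiation or smoothing is needed; peak metrics can then be derived from the fitted, noise-free distribution.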

Summary:

Transition analysis in biotech production is a great approach to monitoring column performance during chromatography steps. Many simple metrics are available as key performance indicators (KPIs), but they mostly operate on the derived signal, which introduces noise and distortions into the calculation.

Using the raw transition signal and fitting a distribution function is a much better approach. Although this makes the analysis more complex and increases the latency, it achieves much higher precision and accuracy in the results.

For more information, please contact us.

© All rights reserved.