Machine Learning with OSI PI

Python-based machine learning (ML) libraries have evolved at a remarkable pace. It is most impressive that time-consuming steps such as data encoding, feature selection, model comparison and even model optimization have been fully automated. For example, the relatively new Python library PyCaret calculates the metrics of over 21 different regression models and selects the best one with just a few lines of code. Machine learning with OSI PI has come a long way.

There are plenty of industrial applications where these algorithms could be successfully applied. But there are two major bottlenecks for successful projects:

  1. Historical data collection for model development
  2. Real-time data collection for model integration

Model development data could be downloaded as Excel or text/CSV files and analyzed offline. The drawback is that this approach cannot be productized and is limited to offline applications.

To accelerate model development and model integration (MD/MI pipelines) for the OSIsoft PI System, TQS has developed a Python library called TQS Pandas PiFrames for OSIsoft® PI System® that connects to the PI System and provides PI data as Pandas data frames. The Pandas data frame is the preferred data structure in Python for data scientists and is supported by many ML libraries. Therefore, TQS Pandas PiFrames for OSIsoft® PI System® can be easily integrated into ML projects in both model development and model integration.

The following shows some code examples in Python.

  1. Connecting to the PI Data Historian and PI System:

cdf = ConnectToDefaultAF()
cdf = ConnectToDefaultPI()

df = GetMultipleAttributeValuesByVariable("Bio Reactor 1",["Temperature","Concentration","Level"],'t-2h','t',60,0,None)

The resulting data frame is a time series:

The data frame can also be arranged by variable columns:

df = GetMultipleAttributeValuesByFrame("Batch_0_*","Bio Reactor 1",["Temperature","Concentration","Level"],'t-7d','t',60,0,None)
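To make the idea of a table with one column per variable concrete, here is a minimal sketch that uses only the Python standard library. It does not call TQS Pandas PiFrames and makes no claim about the library's actual output layout; the sample records and the `pivot_by_variable` helper are hypothetical and the values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical long-format samples: (timestamp, attribute, value).
# The values are invented for illustration only.
records = [
    ("2021-01-01 10:00:00", "Temperature", 37.1),
    ("2021-01-01 10:00:00", "Concentration", 0.82),
    ("2021-01-01 10:00:00", "Level", 55.0),
    ("2021-01-01 10:01:00", "Temperature", 37.3),
    ("2021-01-01 10:01:00", "Concentration", 0.80),
    ("2021-01-01 10:01:00", "Level", 54.6),
]


def pivot_by_variable(rows):
    """Turn (timestamp, attribute, value) rows into one row per timestamp."""
    table = defaultdict(dict)
    for timestamp, attribute, value in rows:
        table[timestamp][attribute] = value
    # Sort by timestamp so the result reads as a time series.
    return [{"Timestamp": ts, **cols} for ts, cols in sorted(table.items())]


wide = pivot_by_variable(records)
for row in wide:
    print(row)
```

Each resulting row holds one time stamp plus one column per variable, which is the shape most ML libraries expect as input.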

Over the last couple of months, we have developed use cases around the OSIsoft PI System that are based on the TQS Pandas PiFrames for OSIsoft® PI System® library.

The library has been shown to significantly reduce model development and model integration time.


Machine learning and AI projects are often slow to develop and difficult to integrate. The main reason is that most Python libraries expect Pandas data frames (or NumPy arrays), and these data structures are not readily available in industrial automation. TQS Integration has developed the TQS Pandas PiFrames for OSIsoft® PI System® library to accelerate both model development and model integration. The library is user friendly, fast and scales well for all common machine learning (ML) applications.

For information, please contact us.

Advanced data analytics is empowering process manufacturing teams across all verticals.

Enhanced access to operational and equipment data has spurred a transformation in the process manufacturing industry. Engineers can now see both historical and time-series data from their operation as it’s happening and at remote locations, so entire teams can be up to speed continuously and reliably. The only problem with this? Many teams find that they are “DRIP”: data rich, information poor.

With tremendous amounts of data, a lack of proper organization, cleansing, and contextualizing only puts process engineers at a standstill. Some chemical environments have 20,000 to 70,000 signals (or sensors), oil refineries can have 100,000, and enterprise sensor data signals can reach millions.

These amounts of data can be overwhelming, but refining them carefully can lead to highly advantageous insights. Many SMEs and process engineers spend valuable time sorting through spreadsheets to wrangle the data instead of visualizing and analyzing the patterns and models that lead to effective insight. With advanced analytics, process manufacturers can easily see all up-to-date data from disparate sources and make decisions based on the analysis to immediately improve operations.

Moving Up from “Data Janitors”

Moving data from “raw” to ready for analysis should not take up the majority of your subject matter experts’ time. Some organizations still report that over 70 percent of the time they spend on operational analytics is dedicated solely to cleansing their data.

But your team members are not “data janitors.” Today’s technology can take care of the monotonous and very time-consuming tasks of accessing, cleansing, and contextualizing data so your team can move straight to benefitting from the insights.

The Difference Between Spreadsheets and Advanced Analytics

For an entire generation, spreadsheets have been the method of choice for analyzing data in the process manufacturing industry. At the moment of analysis, the tool in use needs to enable user input to define critical time periods of interest and relevant context. Spreadsheets have been the way of putting the user in control of data investigation while offering a familiar, albeit cumbersome, path of analysis.

But the downfalls of spreadsheets have become increasingly apparent:

All of these pain points combine to make it difficult to reconcile and analyze data in the broader business context, which is necessary for the profitability and efficiency use cases that improve operational performance.

With advanced analytics, experts in process manufacturing operations are on the front lines of configuring data analytics, and improvements to production yield, quality, availability, and the bottom line are readily available.

How It’s Done

Advanced analytics leverages innovations in big data, machine learning, and web technologies to integrate and connect to all process manufacturing data sources and drive business improvement. Some of the capabilities include:

The Impact of Advanced Analytics

Simply put, advanced analytics gives you the whole picture. It draws the relationships and correlations between specific data that are needed to improve performance based on accurate and reliable insight. Seeq’s advanced analytics solution is specifically designed for process manufacturing data and has been saving leading manufacturers time and money immediately upon implementation. Learn more about the application and how it eliminates the need for spreadsheet exhaustion here.

TQS Integration proudly announces its partnership with Sartorius Data Analytics on multivariate technology that turns data into actionable information. Understanding and using information to make optimal business decisions is key to all process manufacturing leaders today. This partnership allows TQS to empower its clients with data to drive digitized manufacturing processes and Industry 4.0 advances.

Collaboration with Sartorius Data Analytics, the analytics division of biotech company Sartorius, will allow TQS to deliver solutions and training in the Umetrics product suite, particularly SIMCA® and SIMCA®-online. These easy-to-use software tools can be used to support regulatory compliance, enhance development, optimize production, enable digital transformation, and facilitate quality by design initiatives.

SIMCA® is a data exploration tool that turns data into valuable, knowledge-building information to drive operational excellence. The multivariate modelling used in SIMCA® identifies hidden trends and patterns not visible with univariate data analysis methods. SIMCA® is equipped with powerful modelling algorithms, interactive plots and direct drill-down analytics, providing easy access to visualization and interpretation of data patterns. This makes it a fantastic tool for troubleshooting process data to pinpoint root causes, as well as exploring areas for improvement.

SIMCA®-online is a real-time monitoring and prediction software that provides a complete set of interactive and visual monitoring tools to ensure that batch and continuous operations run smoothly. SIMCA®-online uses an ‘ideal process’ model to anticipate the effect of changes as they happen and suggests adjustments to your process based on multivariate trends. Early warnings of deviations help ensure processes are performing to specifications – maintaining optimum product quality, while maximizing resource efficiency and cost savings. This has made SIMCA® and SIMCA®-online the de facto standards for process monitoring and optimization in many industries.

“Driving Digital Transformation and Process Intensification today requires speed, knowledge and choosing the right value adding partners for executing and delivering solutions when internal resources are not enough or easily accessible. Ensuring certification of partners like TQS, we enable our valued customers by providing additional options for global reach, and local presence, to all new and existing users of the Umetrics® suite. Our drive is to continue to deliver new technology as hybrid and mechanistic modelling capabilities with an easy to use interface to the entire Life-Science industry,” says Johan Hultman, Embedded Solutions (OEM) and Partner Manager at Sartorius.

This collaboration will provide a range of services to clients in the Pharmaceutical & Life Science industry enabling TQS to deliver certified implementation, data modelling services, installation and training.  

“Through this partnership, TQS can now ensure that our clients have what’s required for full utilisation of the MVDA (Multivariate data analysis) SIMCA toolbox and apply the power of MVDA to spectroscopy and multi-omics BIG DATA in the pharmaceutical industry. We want to support clients in their production goals and deliver a successful implementation that ensures a resilient process health and manufacturing success for all our clients,” comments Brian O’Connor, Products and Services Manager at TQS.

In addition to all the off-the-shelf and bespoke products that TQS has to offer, this partnership is a boost to facilitate work with clients to ensure they have reliable and actionable results. In doing so, these insights help save time and money in production, minimize risk, and more importantly ensure compliance, while proactively improving product quality.

About TQS Integration

TQS Integration is a global data intelligence company providing turnkey solutions in system architecture and application design, engineering, system integration, project management, commissioning and 24x7 “follow the sun” support services to valued customers. TQS has been at the forefront of data intelligence for over 20 years, working with an extensive client base in the Pharmaceutical, Life Science, Food & Beverage, Energy and Renewables industries. As the go-to partner for data collection, contextualization, visualization, analytics, and managed services, we are the main drivers in the world’s leading companies, helping them become leaders in Industry 4.0.

For information, please contact us.

About Sartorius Data Analytics

As part of Sartorius group, founded in 1870, the company earned sales revenue of more than 2.0 billion euros in 2020. More than 10,000 people work at the group's 60+ manufacturing and sales sites, serving customers around the globe. Sartorius Data Analytics are leading data analytics experts that help organizations in many different industries to get more value from their data using the Umetrics® Suite of Data Analytics Solutions. These solutions help harness the wealth of data, identifying vital elements to improve the results from research, product development and manufacturing processes.

For information, please visit Sartorius Data Analytics.

Data Latency

The topic of system latency has come up a couple of times in recent projects. If you really think about it, this is not surprising. As more manufacturing gets integrated, data must be synchronized and/or orchestrated between different applications. Here are just some examples:

  1. MES: Manufacturing execution systems typically connect to a variety of data sources, so the workflow developer needs to know the timeout settings for different applications. Connections to the automation system will have a very low latency, but what is the expected data latency of the historian?
  2. Analysis: More and more companies are moving towards real-time analytics. But just how fast can you really expect calculations to be updated? This matters especially for enterprise-level systems, which are typically cloned from source OSIsoft PI servers by way of PI-to-PI. The data flow then looks like this, for example:

    Source -> PI Data Archive (local) -> PI-to-PI -> PI Data Archive (region) -> PI-to-PI -> PI Data Archive (enterprise), with latency added in each step.
  3. Reports: One example is product release reports. How long do you need to wait to make sure that all data have been collected?
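The chained data flow in the list above suggests a simple latency budget: each sequential hop adds its own delay, and a timeout or report wait time has to cover the sum. The sketch below illustrates that arithmetic only. The per-hop numbers are placeholders rather than measurements, and the `suggested_timeout` helper with its safety factor is a hypothetical convention, not a PI System setting.

```python
# Hypothetical per-hop latencies in seconds; replace with measured values.
hops = [
    ("Source -> PI Data Archive (local)", 1.0),
    ("PI-to-PI -> PI Data Archive (region)", 2.0),
    ("PI-to-PI -> PI Data Archive (enterprise)", 2.0),
]


def end_to_end_latency(hop_latencies):
    """Sum the latency contributed by each sequential hop."""
    return sum(seconds for _, seconds in hop_latencies)


def suggested_timeout(hop_latencies, safety_factor=2.0):
    """Pad the expected latency by an assumed safety factor."""
    return end_to_end_latency(hop_latencies) * safety_factor


total = end_to_end_latency(hops)
print(f"expected end-to-end latency: {total:.1f} s")
print(f"suggested timeout: {suggested_timeout(hops):.1f} s")
```

With measured values in place of the placeholders, the same sum gives a first estimate of how long a workflow or report should wait for data at the enterprise level.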

The OSIsoft PI time series object provides a time stamp, which is typically supplied by the source system. This time stamp will bubble up through interfaces and data archives unchanged. This makes sense when you compare historical data, but it will mask the latency in your data.

To detect when a data point gets queued and recorded at the data server, PI offers two event queues that can be monitored:

AFDataPipeType.Snapshot ... to monitor the snapshot queue

AFDataPipeType.Archive ... to monitor the archive queue

You can use PowerShell scripts, which have the advantage of being a lighter application that can be combined with the existing OSIsoft PowerShell library. PowerShell is also available on most servers, so you don't need a separate development environment for code changes.

The first step is to connect to the OSIsoft PI Server using the AFSDK:

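function Connect-PIServer{
    param ([string] [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true,
        ValueFromPipelineByPropertyName=$true)] $PIServerName)
    Add-Type -Path $Library   # $Library: path to OSIsoft.AFSDK.dll, assumed to be set by the caller
    $PIServer = (New-Object OSIsoft.AF.PI.PIServers)[$PIServerName]
    $PIServer.Connect()
    return $PIServer
}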

The function opens a connection to the server and returns the .NET object.

Monitoring the queues and writing out the values looks like the following:

function Get-PointReference{
    param ([PSTypeName('OSIsoft.AF.PI.PIServer')] [Parameter(Mandatory=$true,
        Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $PIServer,
        [string] [Parameter(Mandatory=$true, Position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $PointName )
    # the listing is truncated here; the parameter name and this lookup are assumed
    return [OSIsoft.AF.PI.PIPoint]::FindPIPoint($PIServer, $PointName)
}

function Get-QueueValues{
    param ( [PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true,
        Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $PIPoint,
        [double] [Parameter(Mandatory=$true, Position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $DurationInSeconds )
    # get the pi point and create NET list
    $PIPointList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
    $PIPointList.Add($PIPoint)
    # create the pipeline
    $ArchivePipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Archive)
    $SnapShotPipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
    # add signups
    $ArchivePipeline.AddSignups($PIPointList) | Out-Null
    $SnapShotPipeline.AddSignups($PIPointList) | Out-Null
    # now the polling (end time assumed to be start time plus the requested duration)
    $EndTime = (Get-Date).AddSeconds($DurationInSeconds)
    While((Get-Date) -lt $EndTime){
        $ArchiveEvents = $ArchivePipeline.GetUpdateEvents(1000);
        $SnapShotEvents = $SnapShotPipeline.GetUpdateEvents(1000);
        # time at which the events were read from the queues (assumed definition)
        $RecordedTime = Get-Date
        # format output:
        foreach($ArchiveEvent in $ArchiveEvents){
            $AFEvent = New-Object PSObject -Property @{
                Name = $ArchiveEvent.Value.PIPoint.Name
                Type = "ArchiveEvent"
                Action = $ArchiveEvent.Action
                TimeStamp = $ArchiveEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                Value = $ArchiveEvent.Value.Value.ToString()
            }
            $AFEvent
        }
        foreach($SnapShotEvent in $SnapShotEvents){
            $AFEvent = New-Object PSObject -Property @{
                Name = $SnapShotEvent.Value.PIPoint.Name
                Type = "SnapShotEvent"
                Action = $SnapShotEvent.Action
                TimeStamp = $SnapShotEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                Value = $SnapShotEvent.Value.Value.ToString()
            }
            $AFEvent
        }
        # 150 ms delay
        Start-Sleep -Milliseconds 150
    }
    $ArchivePipeline.Dispose()
    $SnapShotPipeline.Dispose()
}

These 2 scripts are all you need to monitor events coming into a single server. The data latency is simply the difference between the value's time stamp and the time recorded.
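As a small worked example of that difference, the following Python sketch parses the two fields the script writes, `TimeStamp` and `QueueTime`, and subtracts them. The event values are invented for illustration, the `latency_seconds` helper is hypothetical, and the sketch assumes both fields are expressed in the same local time zone.

```python
from datetime import datetime

# Python equivalent of the .NET format "yyyy-MM-dd HH:mm:ss.fff" used by the script.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def latency_seconds(time_stamp, queue_time):
    """Return recorded time minus value time stamp, in seconds."""
    value_time = datetime.strptime(time_stamp, FORMAT)
    recorded_time = datetime.strptime(queue_time, FORMAT)
    return (recorded_time - value_time).total_seconds()


# Hypothetical event as it might be written by the monitoring script.
event = {
    "Name": "sinusoid",
    "Type": "SnapShotEvent",
    "TimeStamp": "2021-03-01 12:00:00.000",
    "QueueTime": "2021-03-01 12:00:00.420",
}

latency = latency_seconds(event["TimeStamp"], event["QueueTime"])
print(f"{event['Name']}: {latency:.3f} s")
```

If the recorded time is taken when a poll returns, the 150 ms delay between polls limits the resolution of this measurement, so differences well below that interval should not be over-interpreted.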

Measuring the data latency between 2 servers - for example a local and an enterprise server - can be done the same way. You just need 2 server objects and then monitor the snapshot (or archive) events.

function Get-Server2ServerLatency{
    param ( [PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true, Position=0,
        ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $SourcePoint,
        [PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true, Position=1,
        ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $TargetPoint,
        [double] [Parameter(Mandatory=$true, Position=2, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $DurationInSeconds )
    $SourceList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
    $TargetList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
    $SourceList.Add($SourcePoint)
    $TargetList.Add($TargetPoint)
    # create the pipeline
    $SourcePipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
    $TargetPipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
    # add signups
    $SourcePipeline.AddSignups($SourceList) | Out-Null
    $TargetPipeline.AddSignups($TargetList) | Out-Null
    # now the polling (end time assumed to be start time plus the requested duration)
    $EndTime = (Get-Date).AddSeconds($DurationInSeconds)
    While((Get-Date) -lt $EndTime){
        $SourceEvents = $SourcePipeline.GetUpdateEvents(1000);
        $TargetEvents = $TargetPipeline.GetUpdateEvents(1000);
        # time at which the events were read from the queues (assumed definition)
        $RecordedTime = Get-Date
        # format output:
        foreach($SourceEvent in $SourceEvents){
            $AFEvent = New-Object PSObject -Property @{
                Name = $SourceEvent.Value.PIPoint.Name
                Type = "SourceEvent"
                Action = $SourceEvent.Action
                TimeStamp = $SourceEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                Value = $SourceEvent.Value.Value.ToString()
            }
            $AFEvent
        }
        foreach($TargetEvent in $TargetEvents){
            $AFEvent = New-Object PSObject -Property @{
                Name = $TargetEvent.Value.PIPoint.Name
                Type = "TargetEvent"
                Action = $TargetEvent.Action
                TimeStamp = $TargetEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
                Value = $TargetEvent.Value.Value.ToString()
            }
            $AFEvent
        }
        # 150 ms delay
        Start-Sleep -Milliseconds 150
    }
    $SourcePipeline.Dispose()
    $TargetPipeline.Dispose()
}

Here is a quick test of a PI2PI interface reading and writing to the same server:

$source = Get-PointReference $srv sinusoid
$target = Get-PointReference $srv sinusclone
Get-Server2ServerLatency $source $target 30

As you can see the difference between target and source is a bit over 1 sec, which is to be expected since the scan rate is 1 second.
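One way to turn the source and target events into a latency figure is to pair them by the value's time stamp, which stays unchanged as the value moves between servers, and then compare the times at which each side saw the event. The sketch below is an illustration under that assumption; the event dictionaries mirror the fields written by the script, the sample values are invented, and `server_to_server_latency` is a hypothetical helper.

```python
from datetime import datetime

# Python equivalent of the .NET format "yyyy-MM-dd HH:mm:ss.fff" used by the script.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse(text):
    return datetime.strptime(text, FORMAT)


def server_to_server_latency(source_events, target_events):
    """Pair events by value time stamp; return target-minus-source queue delays."""
    source_queue = {e["TimeStamp"]: parse(e["QueueTime"]) for e in source_events}
    latencies = []
    for event in target_events:
        queued_at_source = source_queue.get(event["TimeStamp"])
        if queued_at_source is None:
            continue  # no matching source event was captured
        delay = parse(event["QueueTime"]) - queued_at_source
        latencies.append(delay.total_seconds())
    return latencies


# Hypothetical events for a source point and its clone.
source = [
    {"TimeStamp": "2021-03-01 12:00:00.000", "QueueTime": "2021-03-01 12:00:00.100"},
    {"TimeStamp": "2021-03-01 12:00:01.000", "QueueTime": "2021-03-01 12:00:01.100"},
]
target = [
    {"TimeStamp": "2021-03-01 12:00:00.000", "QueueTime": "2021-03-01 12:00:01.150"},
    {"TimeStamp": "2021-03-01 12:00:01.000", "QueueTime": "2021-03-01 12:00:02.200"},
    {"TimeStamp": "2021-03-01 11:59:59.000", "QueueTime": "2021-03-01 12:00:00.050"},
]

delays = server_to_server_latency(source, target)
print(delays)
```

Target events without a matching source event, such as the last sample entry, are skipped rather than guessed at.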


Data latency is a key metric for every system that captures, stores, analyses, or processes data. Every sequential operation will add to the overall system latency and must be accounted for. It is not only the data transport over networks that is the major contributor, but also data queues that facilitate the packaging of data into messages that add significant delays. This topic is especially important for cloud-based systems that rely on on-premises sensor data.

As shown in this blog, data latency can and should be measured and be part of the architectural planning process. As a rule of thumb, sub-second data latencies are challenging, especially when the number of data sources increases.
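For planning purposes, a set of measured latencies is usually easier to discuss as a few summary figures than as raw events. The following sketch shows one possible summary against a one-second target. The sample values are invented, and the choice of median, 95th percentile and maximum, as well as the `summarize` helper itself, are assumptions rather than a prescribed method.

```python
import statistics

# Hypothetical latency measurements in seconds (invented for illustration).
samples = [0.42, 0.51, 0.39, 1.05, 0.47, 0.44, 0.95, 0.40, 0.48, 0.46]


def summarize(latencies, target_seconds=1.0):
    """Return simple planning figures for a list of latency samples."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
    within = sum(1 for value in latencies if value < target_seconds)
    return {
        "median": statistics.median(latencies),
        "p95": p95,
        "max": max(latencies),
        "share_within_target": within / len(latencies),
    }


summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Looking at the upper percentiles rather than only the median makes it easier to judge whether a sub-second requirement is realistic for a given architecture.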

Please contact us for more information.
