Machine Learning with OSIsoft PI

Python-based machine learning (ML) libraries have evolved at an unbelievable pace. Most impressive is that time-consuming steps such as data encoding, feature selection, model comparison, and even model optimization have been fully automated. For example, the relatively new Python library PyCaret calculates the metrics of over 21 different regression models and selects the best one with just a few lines of code. Machine learning with OSIsoft PI has come a long way.
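
As a rough illustration of that level of automation, a minimal PyCaret sketch might look like the following; the data, column names, and target are stand-ins for illustration, not taken from a real project:

import numpy as np
import pandas as pd
from pycaret.regression import setup, compare_models

# Stand-in data for illustration only; in practice the frame would hold historian data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 3)), columns=["Temperature", "Concentration", "Level"])
df["Yield"] = 2.0 * df["Temperature"] - 0.5 * df["Level"] + rng.normal(scale=0.1, size=500)

setup(data=df, target="Yield", session_id=42)  # handles encoding, imputation and the train/test split
best_model = compare_models()                  # fits the candidate regressors and ranks them by cross-validated metrics
print(best_model)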

There are plenty of industrial applications where these algorithms could be successfully applied, but there are two major bottlenecks for successful projects:

  1. Historical data collection for model development
  2. Real-time data collection for model integration

Model development data can be downloaded to Excel or text/CSV files and analyzed offline. The drawback is that this approach cannot be productized and is limited to offline applications.

To accelerate the model development and model integration (MD/MI) pipelines for the OSIsoft PI System, TQS has developed a Python library called TQS Pandas PiFrames for OSIsoft® PI System® that connects to the PI System and provides PI data as Pandas data frames. The Pandas data frame is the preferred data structure of data scientists working in Python and is supported by many ML libraries. Therefore, the TQS Pandas PiFrames for OSIsoft® PI System® library can be easily integrated into ML projects, both in model development and in model integration.

The following shows some code examples in Python.

Connecting to the PI Data Historian and PI System:


af = ConnectToDefaultAF()
pi = ConnectToDefaultPI()


df = GetMultipleAttributeValuesByVariable("Bio Reactor 1",["Temperature","Concentration","Level"],'t-2h','t',60,0,None)

The resulting data frame is a time series:


The data frame can also be arranged by variable columns:

df = GetMultipleAttributeValuesByFrame("Batch_0_*","Bio Reactor 1",["Temperature","Concentration","Level"],'t-7d','t',60,0,None)
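
As a rough sketch of how such a frame could feed a model during development (assuming a frame with one column per attribute as above and scikit-learn installed; the choice of features and target here is purely illustrative):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# df is assumed to be the frame returned above, with one column per attribute
# ("Temperature", "Concentration", "Level"); the target choice is illustrative only.
df = df.dropna()                         # drop incomplete rows before modelling
X = df[["Temperature", "Level"]]
y = df["Concentration"]

# keep the time order intact by disabling shuffling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Hold-out R^2:", model.score(X_test, y_test))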

During the last couple of months, we have developed use cases around the OSIsoft PI System based on the TQS Pandas PiFrames for OSIsoft® PI System® library.

The library has been shown to significantly reduce the model development and model integration time.

SUMMARY

Machine Learning and AI projects are often slow to develop and difficult to integrate. The main reason is that most Python libraries expect Pandas data frames (or NumPy arrays), and these data structures are not readily available in industrial automation. TQS Integration has developed the TQS Pandas PiFrames for OSIsoft® PI System® library to accelerate both model development and model integration. The library is user friendly, fast, and scales well for all common machine learning (ML) applications.

For more information, please contact us.

Advanced data analytics is empowering process manufacturing teams across all verticals.

Enhanced accessibility of operational and equipment data has spurred a transformation in the process manufacturing industry. Engineers can now see both historical and live time-series data from their operation as it happens, even at remote locations, so entire teams can stay up to speed continuously and reliably. The only problem? Many find their teams are "DRIP": data rich, information poor.

With tremendous amounts of data, a lack of proper organization, cleansing, and contextualization puts process engineers at a standstill. Some chemical environments have 20,000 to 70,000 signals (or sensors), oil refineries can have 100,000, and enterprise sensor data can reach millions of signals.

These amounts of data can be overwhelming, but refining them carefully can lead to highly advantageous insights. Much of SMEs' and process engineers' valuable time is spent sorting through spreadsheets trying to wrangle the data rather than visualizing and analyzing the patterns and models that lead to effective insight. With advanced analytics, process manufacturers can easily see all up-to-date data from disparate sources and make decisions based on the analysis to immediately improve operations.

Moving Up from “Data Janitors”

Moving data from "raw" to ready for analysis should not take up the majority of your subject matter experts' time. Yet some organizations still report that over 70 percent of the time they spend on operational analytics goes to cleansing data.

But your team members are not "data janitors." Today's technology can take care of the monotonous and time-consuming tasks of accessing, cleansing, and contextualizing data so your team can move straight to benefiting from the insights.

The Difference Between Spreadsheets and Advanced Analytics

For an entire generation, spreadsheets have been the method of choice for analyzing data in the process manufacturing industry. At the moment of analysis, the tool in use needs to enable user input to define critical time periods of interest and relevant context. Spreadsheets have been the way of putting the user in control of data investigation while offering a familiar, albeit cumbersome, path of analysis.

But the downfalls of spreadsheets have become increasingly apparent.

All of these pain points combine into one fundamental difficulty: reconciling and analyzing data in the broader business context needed for the profitability and efficiency use cases that improve operational performance.

With advanced analytics, the experts on the front lines of process manufacturing operations can configure their own data analytics, putting improvements in yield, quality, availability, and the bottom line within easy reach.

How It’s Done

Advanced analytics leverages innovations in big data, machine learning, and web technologies to integrate and connect to all process manufacturing data sources and drive business improvement, from accessing, cleansing, and contextualizing data to visualizing and analyzing the patterns and models that lead to insight.

The Impact of Advanced Analytics

Simply put, advanced analytics gives you the whole picture. It draws the relationships and correlations between data sets that need to be understood in order to improve performance, based on accurate and reliable insight. Seeq's advanced analytics solution is designed specifically for process manufacturing data and has saved leading manufacturers time and money from the moment of implementation. Learn more about the application and how it eliminates the need for spreadsheet exhaustion here.

Data Latency

The topic of system latency has come up a couple of times in recent projects. If you really think about it, this is not surprising: as more of manufacturing gets integrated, data must be synchronized and/or orchestrated between different applications. Here are just some examples:

  1. MES: Manufacturing execution systems typically connect to a variety of data sources, so the workflow developer needs to know the timeout settings for the different applications. Connections to the automation system will have very low latency, but what is the expected data latency of the historian?
  2. Analysis: More and more companies are moving towards real-time analytics. But just how fast can you really expect calculations to be updated? This is especially true for enterprise-level systems, which are typically cloned from source OSIsoft PI servers by way of PI-to-PI. So you are looking at a data flow such as:

    Source -> PI Data Archive (local) -> PI-to-PI -> PI Data Archive (region) -> PI-to-PI -> PI Data Archive (enterprise)

    with latency added at each step (a rough sketch of how these delays add up follows this list).
  3. Reports: One example is product release reports. How long do you need to wait to make sure that all data have been collected?
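
As a back-of-the-envelope illustration of how per-hop delays accumulate along such a PI-to-PI chain, here is a minimal Python sketch; the hop names and latency figures are made up for illustration, not measurements:

# Hypothetical per-hop latencies in seconds along the chain described above (illustrative only).
hops = {
    "source -> PI Data Archive (local)": 1.0,
    "local -> region (PI-to-PI)": 2.0,
    "region -> enterprise (PI-to-PI)": 2.0,
}
total_latency = sum(hops.values())
print(f"Expected latency at the enterprise archive: about {total_latency:.0f} seconds behind the source")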

The OSIsoft PI time series object provides a time stamp, which is typically supplied by the source system. This time stamp will bubble up through interfaces and data archives unchanged. This makes sense when you compare historical data, but it will mask the latency in your data.

To detect when a data point gets queued and recorded at the data server, PI offers two event queues that can be monitored:

AFDataPipeType.Snapshot ... to monitor the snapshot queue

AFDataPipeType.Archive ... to monitor the archive queue

You can use PowerShell scripts, which have the advantage of being lightweight and of combining easily with the existing OSIsoft PowerShell library. PowerShell is also available on most servers, so you don't need a separate development environment for code changes.

The first step is to connect to the OSIsoft PI Server using the AFSDK:

function Connect-PIServer{
[OutputType('OSIsoft.AF.PI.PIServer')]
param ([string] [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)] $PIServerName)
$Library=$env:PIHOME+"\AF\PublicAssemblies\OSIsoft.AFSDK.dll"
Add-Type -Path $Library
$PIServer=[OSIsoft.AF.PI.PIServer]::FindPIServer($PIServerName)
$PIServer.Connect()
Write-Output($PIServer)
}

The function opens a connection to the server and returns the .NET object.

Monitoring the queues and writing out the values looks like the following:

function Get-PointReference{
param ([PSTypeName('OSIsoft.AF.PI.PIServer')] [Parameter(Mandatory=$true,
Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $PIServer,
[string] [Parameter(Mandatory=$true, Position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
$PIPointName)
$PIPoint=[OSIsoft.AF.PI.PIPoint]::FindPIPoint($PIServer,$PIPointName)
Write-Output($PIPoint)
}

function Get-QueueValues{
param ( [PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true,
Position=0, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $PIPoint,
[double] [Parameter(Mandatory=$true, Position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $DurationInSeconds )
# get the pi point and create a .NET list
$PIPointList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
$PIPointList.Add($PIPoint)
# create the pipeline
$ArchivePipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Archive)
$SnapShotPipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
# add signups
$ArchivePipeline.AddSignups($PIPointList)
$SnapShotPipeline.AddSignups($PIPointList)
# now the polling
$EndTime=(Get-Date).AddSeconds($DurationInSeconds)
While((Get-Date) -lt $EndTime){
$ArchiveEvents = $ArchivePipeline.GetUpdateEvents(1000);
$SnapShotEvents = $SnapShotPipeline.GetUpdateEvents(1000);
$RecordedTime=(Get-Date)
# format output:
foreach($ArchiveEvent in $ArchiveEvents){
$AFEvent = New-Object PSObject -Property @{
Name = $ArchiveEvent.Value.PIPoint.Name
Type = "ArchiveEvent"
Action = $ArchiveEvent.Action
TimeStamp = $ArchiveEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
Value = $ArchiveEvent.Value.Value.ToString()
}
$AFEvent.pstypenames.Add('My.DataQueueItem')
Write-Output($AFEvent)
}
foreach($SnapShotEvent in $SnapShotEvents){
$AFEvent = New-Object PSObject -Property @{
Name = $SnapShotEvent.Value.PIPoint.Name
Type = "SnapShotEvent"
Action = $SnapShotEvent.Action
TimeStamp = $SnapShotEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
Value = $SnapShotEvent.Value.Value.ToString()
}
$AFEvent.pstypenames.Add('My.DataQueueItem')
Write-Output($AFEvent)
}
# 150 ms delay
Start-Sleep -m 150
}
$ArchivePipeline.Dispose()
$SnapShotPipeline.Dispose()
}

These functions are all you need to monitor events coming into a single server. The data latency is simply the difference between the value's time stamp and the time it was recorded.
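
One way to summarize the collected latencies is to export the events to a file and analyze them with pandas; the export step and file name below are assumptions for illustration, not part of the scripts above:

import pandas as pd

# Assumes the queue events were exported from PowerShell, e.g.
#   Get-QueueValues $PIPoint 60 | Export-Csv events.csv -NoTypeInformation
# so the file contains the TimeStamp, QueueTime and Type columns written above.
events = pd.read_csv("events.csv", parse_dates=["TimeStamp", "QueueTime"])
events["LatencySeconds"] = (events["QueueTime"] - events["TimeStamp"]).dt.total_seconds()
print(events.groupby("Type")["LatencySeconds"].describe())  # latency statistics per queue type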

Measuring the data latency between two servers, for example a local and an enterprise server, can be done the same way. You just need two server objects and then monitor the snapshot (or archive) events.

function Get-Server2ServerLatency{
param ( [PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true, Position=0,
ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $SourcePoint,
[PSTypeName('OSIsoft.AF.PI.PIPoint')] [Parameter(Mandatory=$true, Position=1,
ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $TargetPoint,
[double] [Parameter(Mandatory=$true, Position=2, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)] $DurationInSeconds )
$SourceList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
$SourceList.Add($SourcePoint)
$TargetList = New-Object System.Collections.Generic.List[OSIsoft.AF.PI.PIPoint]
$TargetList.Add($TargetPoint)
# create the pipeline
$SourcePipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
$TargetPipeline=[OSIsoft.AF.PI.PIDataPipe]::new( [OSIsoft.AF.Data.AFDataPipeType]::Snapshot)
# add signups
$SourcePipeline.AddSignups($SourceList)
$TargetPipeline.AddSignups($TargetList)
# now the polling
$EndTime=(Get-Date).AddSeconds($DurationInSeconds)
While((Get-Date) -lt $EndTime){
$SourceEvents = $SourcePipeline.GetUpdateEvents(1000);
$TargetEvents = $TargetPipeline.GetUpdateEvents(1000);
$RecordedTime=(Get-Date)
# format output:
foreach($SourceEvent in $SourceEvents){
$AFEvent = New-Object PSObject -Property @{
Name = $SourceEvent.Value.PIPoint.Name
Type = "SourceEvent"
Action = $SourceEvent.Action
TimeStamp = $SourceEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
Value = $SourceEvent.Value.Value.ToString()
}
$AFEvent.pstypenames.Add('My.DataQueueItem')
Write-Output($AFEvent)
}
foreach($TargetEvent in $TargetEvents){
$AFEvent = New-Object PSObject -Property @{
Name = $TargetEvent.Value.PIPoint.Name
Type = "TargetEvent"
Action = $TargetEvent.Action
TimeStamp = $TargetEvent.Value.Timestamp.LocalTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
QueueTime = $RecordedTime.ToString("yyyy-MM-dd HH:mm:ss.fff")
Value = $TargetEvent.Value.Value.ToString()
}
$AFEvent.pstypenames.Add('My.DataQueueItem')
Write-Output($AFEvent)
}
# 150 ms delay
Start-Sleep -m 150
}
$SourcePipeline.Dispose()
$TargetPipeline.Dispose()
}

Here is a quick test of a PI-to-PI interface reading from and writing to the same server:

$SourcePoint = Get-PointReference $srv "sinusoid"
$TargetPoint = Get-PointReference $srv "sinusclone"
Get-Server2ServerLatency $SourcePoint $TargetPoint 30

As you can see, the difference between target and source is a bit over one second, which is to be expected since the scan rate is 1 second.

SUMMARY

Data latency is a key metric for every system that captures, stores, analyzes, or processes data. Every sequential operation adds to the overall system latency and must be accounted for. Data transport over networks is not the only major contributor; the data queues that package data into messages also add significant delays. This topic is especially important for cloud-based systems that rely on on-premises sensor data.

As shown in this blog, data latency can and should be measured and be part of the architectural planning process. As a rule of thumb, sub-second data latencies are challenging, especially as the number of data sources increases.

Please contact us for more information.
