Managing Data in vf-OS: Storage, Analytics and ETL
As manufacturing moves closer to the consumer, with demands for mass customisation and lot-size-one products, industrial processes call for deep reorganisation that enables flexibility and improved dynamics. In this challenging context, the multiplicity of manufacturing plants and locations around the world, each with its own legacy systems, requires data management operations to consolidate, integrate and standardise information across the different systems and sites.
ETL (Extract, Transform, Load) is a basic process in data interoperability. Sharing and analysing data plays a key role in the interoperability of the new industrial processes: data is usually sent from one system to another and, in the supply chain scenario, from one organisation to another. As such, ETL routines are fundamental for making use of valuable data and for reaching a good understanding of its meaning and quality. To this end, vf-OS provides the vf-OS Data Harmonisation component, a mechanism for creating ETL routines in an easy and friendly way. The component allows industrial software developers to develop their own routines, to map between data formats, and to seamlessly integrate data coming from sensors into the vApps or other vf-OS components that require this data in a specific format. These ETL routines can be deployed as self-executable libraries, which allows them to be integrated into the vApp data flow.
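As an illustration of what such an ETL routine does, the following is a minimal sketch in plain Python. It does not use the vf-OS Data Harmonisation component or its API; the input format (CSV text with columns `id`, `ts` and `temp_f`), the target field names and the in-memory sink are all assumptions made for the example.

```python
import csv
import io
import json


def extract(raw_csv):
    """Extract: parse CSV text from a (hypothetical) legacy system into dict rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: rename fields and convert Fahrenheit to Celsius.

    The source columns (id, ts, temp_f) and the target field names are
    assumptions for this sketch, not a vf-OS data format.
    """
    records = []
    for row in rows:
        celsius = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0
        records.append(
            {
                "sensor_id": row["id"],
                "timestamp": row["ts"],
                "temperature_c": round(celsius, 2),
            }
        )
    return records


def load(records, sink):
    """Load: append each record to the sink as a JSON string; return the count."""
    for record in records:
        sink.append(json.dumps(record, sort_keys=True))
    return len(records)


def run_etl(raw_csv, sink):
    """Run the three steps in order and return the number of records loaded."""
    return load(transform(extract(raw_csv)), sink)


if __name__ == "__main__":
    raw = "id,ts,temp_f\ns1,2019-01-01T00:00:00Z,212\n"
    sink = []  # stands in for a vApp input or a storage back end
    run_etl(raw, sink)
    print(sink[0])
```

Keeping the three steps as separate functions mirrors the idea of packaging a routine as a library: each step can be replaced (a different source format, a different mapping, a different target) without touching the others.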
An additional step in the data management flow is the persistence of the data for later use, at any moment, within the vApp or in data exploitation activities such as data analytics. vf-OS offers access to three types of storage depending on the data structure: (i) a relational database, useful when the data to be persisted has a defined, fixed structure that should not change; (ii) a document-oriented storage, specifically developed for storing documents or non-structured data; and (iii) a time-series database, used for storing time-intensive data, i.e., data coming from sensors.
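The choice among the three storage types can be sketched as a simple decision rule. The code below is only an illustration under stated assumptions: the `Storage` enum, the `choose_storage` function and the heuristics it applies (a `timestamp` field plus a numeric `value` field indicates sensor data; a flat record matching a known schema indicates relational data) are hypothetical and are not part of vf-OS.

```python
from enum import Enum


class Storage(Enum):
    RELATIONAL = "relational"
    DOCUMENT = "document"
    TIME_SERIES = "time-series"


def choose_storage(record, fixed_schema=None):
    """Pick a storage type for one record using illustrative heuristics.

    record:       dict holding the data to persist.
    fixed_schema: optional iterable of field names that defines a fixed
                  structure (hypothetical parameter for this sketch).
    """
    # Timestamped numeric readings, e.g. from sensors, go to the time-series store.
    if "timestamp" in record and isinstance(record.get("value"), (int, float)):
        return Storage.TIME_SERIES

    # Flat records that match a known, fixed structure go to the relational store.
    if fixed_schema is not None and set(record) == set(fixed_schema):
        is_flat = not any(isinstance(v, (dict, list)) for v in record.values())
        if is_flat:
            return Storage.RELATIONAL

    # Everything else is treated as a document or non-structured data.
    return Storage.DOCUMENT


if __name__ == "__main__":
    reading = {"timestamp": "2019-01-01T00:00:00Z", "value": 21.5}
    order = {"order_id": 1, "customer": "ACME"}
    manual = {"title": "Maintenance manual", "sections": [{"heading": "Safety"}]}

    print(choose_storage(reading))
    print(choose_storage(order, fixed_schema=("order_id", "customer")))
    print(choose_storage(manual))
```

In practice the decision is usually made at design time by the developer rather than per record; the function only makes the criteria named in the text explicit.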
The final step is data exploitation. To allow a business-oriented exploitation of data that serves the purpose of the industrial organisation, user-friendly mechanisms that facilitate data analysis and knowledge extraction must be provided. In fact, despite its inherent value, raw data is normally neither exploitable nor profitable as it stands. Thus, a component that allows the generation of machine learning libraries to extract knowledge from raw data and provide wisdom to the people taking business decisions is, undoubtedly, a must.