A novel way to manage accumulating IIoT big data (Feat. Mount)
Andrew Kim
Founder and CEO of MACHBASE, the Time-Series DBMS ranked Global #1 on TPC-IoT. IIoT, Manufacturing, Smart Factory, Smart City, and all about Industrial IoT Data
INTRO
In the various fields commonly referred to as Smart-X, industrial big data, often called sensor data or time series data, is being generated at an ever-increasing rate, and customers store and manage it with various kinds of technology and software. In addition, because data for AI training can no longer simply be discarded or deleted after a certain retention period, as it used to be, how to manage such data has recently become an important issue.
One might wonder whether there is a simple way to manage data stored in a database or in plain text, and Machbase has proposed a novel method, so let's take a look at how effective it is.
Difficulties in managing massive amounts of IIoT big data
Limitations of traditional database backup concept
(Here, the place where data is stored is assumed to be a "database" such as MySQL, MariaDB, MongoDB, InfluxDB, etc.)
In general, the storage capacity of the equipment where the database is installed is limited, so old data that is no longer expected to be accessed, i.e. cold data, is usually deleted from the online database after being backed up.
However, when trying to manage large amounts of data by utilizing the database's unique function called backup, there are some practical difficulties as follows.
First, the scope of the backup target cannot be managed.
Backup in a traditional database is largely divided into backing up the entire database (full backup) and backing up only the changed parts (incremental backup). The reason for this division is that backing up the entire database every time is not only inefficient but also a huge waste of time and resources. However, due to the nature of IIoT data, backups of a "specific time range" and of a "specific table" are also required.
Second, the restore time required to extract backed-up data is excessive. (You cannot access the backup data without restoring it.)
The basic concept of database restoration is to revert the database instance to its backup copy. To achieve this, the contents of the backup copy must be re-applied to the database, which is an enormous operation that takes anywhere from several hours to several days depending on the environment. Only once this long recovery process is completed does it become possible to access the past data and perform analysis and data exploration.
On the other hand, instead of using the built-in backup function, a specific table or the entire database can be EXPORTed and later IMPORTed. In this case, however, the data in text format not only takes up more storage space than expected, it must also be re-indexed after loading, and access is only possible once indexing is complete. From the perspective of data restoration, this too is a very unrealistic method that still requires excessive time and resources.
Third, when restoring a database through backup, changes to the original database occur.
Since the concept of backup is to return the instance to the point in time of the backup, the shape of the current database inevitably changes, making it impossible to keep the latest data. So, in most cases, restoration from backup is performed only when the original database fails.
Backup method and data service issues
As mentioned earlier, "database restoration" is a technique for cases where the original database fails and can no longer be used.
But what if there is a frequent need to service (or extract) backed up historical data?
Let's assume there is an organization whose policy is to perform a full backup at the end of each month and then delete the data according to its storage or backup policy.
Suppose a defect is detected in a product produced on a specific day three months ago, and you need to submit data about what happened at that time. You will have to go through the restoration and data extraction process described above, which takes several hours each time. It is a very inefficient and slow method, but there is no other solution.
Or, if an AI analysis organization periodically requests data for various combinations of time ranges, sensors, and conditions, it will clearly be very difficult to respond quickly by restoring, each time, the dozens of backup files accumulated month after month over a long period.
In other words, once a large amount of data has been backed up, rapid data service (extraction) from that stored data is very difficult with existing technology.
Imagine a new way to manage data
Because storage space is not infinite, data inevitably has to be backed up multiple times, but let's use our imagination and look at this problem from a different perspective.
Machbase Neo’s novel IIoT data management method
The figure below briefly illustrates how Machbase Neo supports the new data management method mentioned in the section above.
Backup steps
1. The user decides to back up all data in the time range TIME-0 to TIME-1 for the entire database.
2. Then, to improve storage space efficiency, all data (B1) from table B in the time range TIME-0 to TIME-1 is deleted.
In this case, when a backup is performed using the query below, a set of backed up files is created under the specified directory.
BACKUP DATABASE FROM time-0 TO time-1 INTO DISK = '/tmp/MYBKUP';
DELETE FROM B BEFORE time-1;
Once the above process is completed, a backup copy has been created, the data in table B has been deleted, and everything has been done for its original purpose. (Let's say the backup copy was later moved to /backup-store, a cheap storage space.)
Mount steps
Then, for various reasons, it becomes necessary to access and extract the deleted data contained in B1 of table B.
At this point, the following step lets you check the data immediately using the backup copy.
MOUNT DATABASE '/backup-store/MYBKUP' to newdb;
In this process, the data in the backed-up files is mounted as a new database called newdb, with all indexes intact and everything ready for access. (This process completes within seconds!)
Utilization stage
As briefly mentioned earlier, this mounted database delivers amazing performance: any time series query runs very quickly, because not only the internal data but also the indexes have been restored to their original state.
One thing to note is that a mounted table must be referenced in the form database name.user name.table name.
SELECT * from newdb.sys.B WHERE ?????
As shown above, you can access the data by specifying the mounted database newdb, the administrator user name sys, and the table name B. (Of course, note that data areas mounted in this way are always read-only.)
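For example, a query against the mounted table might look like the sketch below. This is only an illustration: it assumes B is a tag table with the usual name/time/value columns and substitutes concrete timestamps for TIME-0 and TIME-1.
-- an illustrative sketch; B is assumed to be a tag table with name/time/value columns
SELECT name, time, value
  FROM newdb.sys.B
 WHERE time >= TO_DATE('2023-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
   AND time <  TO_DATE('2023-02-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
 LIMIT 10;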
What a revolutionary data management interface this is!
I dare say that this concept of backing up a specific space in a database and immediately accessing the data through “mounting” it without a restore process is one of the most innovative approaches in database history!
Unmount steps
Now that you have finished using all the data, you can completely forget about its existence using the unmount command as shown below.
UNMOUNT DATABASE newdb;
Advantages and Considerations of Mounted Databases
Mounted databases are almost identical to online tables in terms of data extraction, but there are a few things to keep in mind.
Ultra-fast data restoration time
It is literally the blink of an eye. Even if you have backed up several terabytes of data, the mount becomes accessible as an additional database quickly, in a few seconds at most. You will quickly realize how revolutionary this is when you think about restoring database files backed up in an existing MySQL or Oracle system and then extracting specific data from them.
Index maintenance for ultra-fast data access
The mounted database comes fully prepared with time series indexes for high-speed data access. In line with the original purpose of data restoration, this means the internal form of the data is completely identical to the data at the moment of backup.
Maintains perfectly identical rollup structure
One of the biggest features Machbase boasts is that it can provide statistical results in real time for an arbitrary time range. If a rollup table has been created, the same data structures are preserved during backup, so even when mounted it provides a completely identical data access environment in which long-term statistical data can be obtained.
Read only
As the title suggests, mounted tables are read-only. Do not be misled by the fact that file systems on Unix systems can also be mounted in write mode.
Therefore, you can assume that only SELECT queries are possible. This is natural from the data management/service perspective of a backup copy, but if you want to use the backup copy for reading and writing, you must perform a restore process that overwrites the entire database.
In the case of Machbase Neo, it can be restored as follows. (For the classic version, please refer to the corresponding manual.)
machbase-neo restore --data <machbase home directory> <backup directory>
ex) machbase-neo restore --data /data/machbase_home /tmp/backup
V$*_STAT not available for mounted tables
V$XX_STAT, the tag-specific statistics table updated in real time, is not supported for mounted databases. This is a subject for future improvement, but in most cases, if the goal is mainly to extract tag data, it should not be a big problem.
If the size of the data being backed up is very large, it may be a good idea to export V$AllTables_STAT separately and keep it for reference before backing up.
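As a rough sketch of that idea, the statistics could be captured with a query like the one below before running the backup. This assumes a tag table named mytab_A (the name used later in the demo) and the V$<table>_STAT naming pattern mentioned above; the exact columns vary by version.
-- capture the tag statistics of mytab_A before backing up (a sketch; columns vary by version)
SELECT * FROM v$mytab_A_stat;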
Take advantage of backup
It's time to actually try this revolutionary mount feature rather than just read about it!
1. Install Neo and get neo-apps source code
The installation steps are exactly the same as in the previous blog post. For the actual installation method, refer to steps 1 (Neo installation) and 2 (neo-apps download) via the link in the previous post, obtaining the latest Machbase Neo version (later than 8.0.18-rc3a) if possible, and then come back here. (From step 2 onwards, just follow the instructions below.)
When installing Machbase Neo, if possible, use version 8.0.18-rc4 or later.
2. Check backup-mount directory
3. Create schema and confirm data entry
In the backup-mount demo, two tables are created, each with one tag, and one data point is entered per minute from January 1, 2023 to March 31, 2023.
Therefore, open 0-Schema Data.wrk below, then create the schema and enter the data in order. The data input is prepared in advance through TQL.
If you press the red buttons above in order, schema creation and data entry will be completed, with 129,600 records per table. (In step 1, two tables are created, so place the cursor on each line and execute it twice.)
You can check the number of entered records as shown below. (Available only in versions 8.0.18-rc4 and later.)
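The worksheet handles all of this, but as a rough sketch of what it amounts to, the schema and a row-count check might look like the following. This assumes the standard Machbase tag-table layout with name/time/value columns; the actual worksheet may differ.
-- a sketch of the demo schema, assuming the usual name/time/value tag-table layout
CREATE TAG TABLE mytab_A (name VARCHAR(40) PRIMARY KEY, time DATETIME BASETIME, value DOUBLE SUMMARIZED);
CREATE TAG TABLE mytab_B (name VARCHAR(40) PRIMARY KEY, time DATETIME BASETIME, value DOUBLE SUMMARIZED);
-- after data entry, each table should report 129,600 rows (90 days x 1,440 minutes per day)
SELECT COUNT(*) FROM mytab_A;
SELECT COUNT(*) FROM mytab_B;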
And if you open 1-Chart online.dsh, you can check the dashboard with the 3 months of entered data, as shown below. (This is an average chart of all the data without rollup, so it may take a little time.)
4. Try a full backup and mount
Now that we have all our data, let's back up the entire database and actually mount it.
You can try the above process by opening 2-Full Backup and Mount.wrk and performing the steps in order.
Full database backup
BACKUP DATABASE INTO DISK='/tmp/mybkup2';
Executing the above command backs up the entire current database to the given directory. Please note that this is an online backup, so it has no effect on other operations.
If you actually look at the directory, you can see that it consists of multiple files as shown below.
sjkim@gamestar:~$ ls /tmp/mybkup2 -l
total 348
drwxrwxr-x 8 sjkim sjkim 4096 May 25 11:16 TAG_TABLESPACE
-rw-rw-r-- 1 sjkim sjkim 1148 May 25 11:16 backup.dat
-rw-rw-rw- 1 sjkim sjkim 2722 May 25 11:16 backup.trc
drwxrwxr-x 103 sjkim sjkim 4096 May 25 11:16 machbase_backup_19700101090000_20240525111644_27
-rw-r--r-- 1 sjkim sjkim 139264 May 25 11:16 meta.dbs-0
-rw-r--r-- 1 sjkim sjkim 143360 May 25 11:16 meta.dbs-1
-rw-r--r-- 1 sjkim sjkim 8192 May 25 11:16 meta.dbs-2
-rw-r--r-- 1 sjkim sjkim 36864 May 25 11:16 meta.dbs-3
-rw-r--r-- 1 sjkim sjkim 0 May 25 11:16 meta.dbs-4
-rw-r--r-- 1 sjkim sjkim 12288 May 25 11:16 meta.dbs-5
Check mount and data
Let's test whether Machbase Neo can mount the backed-up database files internally as a new database instance. The command is as follows, and the new database name is mountdb.
MOUNT DATABASE '/tmp/mybkup2' to mountdb;
This command is declared successful in less than one second.
If you check Neo's menu, the list shows that MOUNTDB now exists, as shown below. The mount is in place!
Now, if you run various queries in SHELL to see if the data is properly entered, you can successfully check the data as follows.
Note that the table name below is not mytab_A, but mountdb.sys.mytab_A, which is a mounted database.
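As an illustration, the kind of check queries run in the SHELL might look like the sketch below, using the mounted mountdb.sys.mytab_A table.
-- illustrative check queries against the mounted database
SELECT COUNT(*) FROM mountdb.sys.mytab_A;
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_A;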
Unmount
When the data verification process is completed and this database is no longer needed, traces can be easily removed using the unmount command as shown below.
UNMOUNT DATABASE mountdb;
If you try to access the table that has just been unmounted, you can see that an error occurs.
Now we understand a novel and convenient way to manage IIoT data using the backup/mount/unmount feature!
5. Backup and Mount Time Range Database
In step 4 we backed up the entire database; now let's back up all tables that exist within a specific time range. It is easy to see that specifying a time range is a very useful capability for data managers who classify and arrange data according to their own data management policies. The demo uses the prepared worksheet 3-Time Range DB Backup and Mount.wrk.
Time range database backup
The syntax of this backup is as follows.
BACKUP DATABASE FROM <start time> TO <end time> INTO DISK='<target folder>';
Very simple. The given backup sample is an example of backing up all data for the month of January to the mybkup3 folder, as shown below.
BACKUP DATABASE FROM TO_DATE('2023-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
TO TO_DATE('2023-01-31 23:59:59','YYYY-MM-DD HH24:MI:SS')
INTO DISK='/tmp/mybkup3';
DELETE FROM mytab_B before TO_DATE('2023-01-31 23:59:59','YYYY-MM-DD HH24:MI:SS');
After backing up, all January data in table mytab_B is deleted for testing purposes. As a result, in the online database, mytab_A holds data for January, February, and March, while mytab_B holds only February and March; the backed-up database contains the January data of both tables.
Mount and check data
As shown below, you can see that the January data has disappeared from mytab_B in the online database, while the mounted mytab_B table still holds the January data intact.
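The worksheet performs the mount and the comparison; as a sketch, assuming the mount name mountdb (unmounted below) and the /tmp/mybkup3 directory from the backup above, it amounts to something like this.
-- mount the January backup and compare time ranges (a sketch)
MOUNT DATABASE '/tmp/mybkup3' TO mountdb;
-- online table: January deleted, so the earliest time now falls in February
SELECT MIN(time), MAX(time) FROM mytab_B;
-- mounted table: January data only
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_B;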
Unmount
As before, you can easily remove traces using the unmount command as shown below.
UNMOUNT DATABASE mountdb;
It is genuinely impressive. This means you can mount a database backed up over a specific time range, in the form you want and at the time you want, check the data, and make it disappear again the moment it is no longer needed!
6. Backup and mount specific tables
Now, let's back up not the entire database but a specific tag table over a specific time range, and mount it to check the data. Please refer to 4-Time Range Table Backup and Mount.wrk, where this example is provided.
Time range table backup
The syntax of this backup is as follows. (Of course, this is also possible without a time range)
BACKUP TABLE <table name> FROM <start time> TO <end time> INTO DISK='<target folder>';
Very simple. The given backup sample is an example of backing up all March data of table mytab_A to the mybkup4 folder, as shown below.
BACKUP TABLE mytab_A FROM TO_DATE('2023-03-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
TO TO_DATE('2023-03-31 23:59:59','YYYY-MM-DD HH24:MI:SS')
INTO DISK = '/tmp/mybkup4';
Check mount and table data
The mount method is the same as before, as shown below. As the results show, mytab_A in the backed-up (mounted) space contains only March's data, while mytab_A in the online space contains all three months of data.
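As a sketch, assuming the mount name mountdb (unmounted below) and the /tmp/mybkup4 directory from the backup above, the check looks like this.
-- mount the March-only table backup and compare (a sketch)
MOUNT DATABASE '/tmp/mybkup4' TO mountdb;
-- mounted table: March data only
SELECT MIN(time), MAX(time) FROM mountdb.sys.mytab_A;
-- online table: January through March
SELECT MIN(time), MAX(time) FROM mytab_A;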
Unmount
As before, you can easily remove traces using the unmount command as shown below.
UNMOUNT DATABASE mountdb;
In other words, we have seen with our own eyes that we can back up not only the entire database, or the database over a time range, but also a specific time range of a specific table, and mount it under any database name at any time to check the data.
7. Mount multiple items at the same time
Lastly, let's mount not just one but multiple backups at the same time and visualize them. For this purpose, execute the prepared files 5-Mount All.wrk, 6-Chart All.dsh, and 7-Umount All.wrk in that order.
Mount multiple
Let's mount the three backups made so far all at once, as shown in the SQL below.
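A sketch of those mount statements, assuming the backup directories created earlier and the mount names that are unmounted at the end of this section:
-- mount all three backups at once (a sketch based on the directories and names used in this post)
MOUNT DATABASE '/tmp/mybkup2' TO mountdb2;
MOUNT DATABASE '/tmp/mybkup3' TO mountdb3;
MOUNT DATABASE '/tmp/mybkup4' TO mountdb4;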
As shown above, we confirmed that multiple backups can be mounted successfully at the same time. Additionally, you can check the layout of the tables as shown below. (Available in 8.0.18-rc4 or higher.)
Chart all mounted data
Now, if you open chart 6 to see the distribution of the entire data set, it looks as follows. (It takes some time to render, since this chart plots all of the data.)
As shown in the figure above, the first backup (MOUNT-2 Tables) contains all data from January, February, and March for both tables, the second backup (MOUNT-3 Tables) contains only January's data, and in the third backup (MOUNT-4 Tables) table A contains only March's data.
In conclusion, by using Machbase Neo's backup/mount functions, you get a powerful capability: you can not only back up data under various conditions, but also mount the prepared backup files at high speed whenever you want, to immediately search, extract, and convert the data.
Unmount
Proceed as follows in the same way as before.
UNMOUNT DATABASE mountdb2;
UNMOUNT DATABASE mountdb3;
UNMOUNT DATABASE mountdb4;
Considerations for establishing a corporate data management policy
Decision point
To establish a data management policy at the corporate level, various options depending on the corporate environment must be considered, and the decision points will look roughly as follows.
Let's not forget that the core foundation of the above decision-making considerations is the "mount function that allows anyone to easily access backed-up data at any time." Without this function, the service itself to data consumers would have to be provided in a completely different form.
Data Services What-If Scenario
Although the following is a fictitious representation of a specific company's service model, I share it because it is useful as a reference.
"Currently, the company stores a large amount of data from the XX manufacturing process in real time. The data is utilized for various data services under various conditions from a "data analyst organization" to improve manufacturing quality, and for this purpose, convenient, fast, and innovative internal The data service model was constructed as follows."
END
The "Mount" function of Machbase Neo introduced here is a truly innovative feature from a big data management and service perspective. In particular, if online service data cannot be kept in one place indefinitely, and the data must be backed up in some form while irregular access to that backed-up data is still required, I am sure it will be difficult to find a more convenient method than this "mount" function.
Conditions will vary from company to company, but I will end here in the hope that Machbase Neo's mount function will be of some help in internal data innovation and service quality improvement.