
Tips to compress changing data without halting the live system that is changing the data

Deepak Kumar

Propelling AI To Reinvent The Future ||Author|| 150+ Mentorship|| Leader || Innovator || Machine learning Specialist || Distributed architecture | IoT | Cloud Computing

Published: June 22, 2016

Introduction

Layman's explanation

Suppose you want to compress a log file. The software that owns the file is still writing to it, so the file can change at any moment, including while it is being compressed. The compressed file may then be inconsistent or corrupt. In the better case, a compression tool such as zip detects the change, reports an error, and refuses to produce the compressed file.
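The problem can be made visible with a small Python sketch. It compresses a file with gzip and compares the file's size and modification time before and after reading. This is only an illustration of the idea, not how any particular tool implements its check; the function name `compress_and_check` and the `after_chunk` hook (there purely so a concurrent writer can be simulated) are hypothetical.

```python
import gzip
import os


def compress_and_check(src, dst, after_chunk=None, chunk_size=64 * 1024):
    """Gzip src into dst and report whether src changed while it was read.

    after_chunk is a hypothetical hook, called after each chunk is written,
    used only to simulate another process writing to the file.
    """
    before = os.stat(src)
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            if after_chunk is not None:
                after_chunk()
    after = os.stat(src)
    # A different size or modification time means the data moved under us,
    # so the archive may not match any single state of the file.
    return (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns)
```

A return value of `True` means the archive should not be trusted as a consistent copy. Detecting the change is not the same as preventing it, which is what the snapshot approach addresses.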

Technical explanation

To ensure that the file is not modified during compression, take a snapshot and then compress the snapshot instead of the live file. The operating system or storage layer must support snapshots. For example, Linux LVM provides snapshot functionality.

Linux LVM Snapshot approach

A snapshot volume is a special type of volume that presents all the data that was in the volume at the time the snapshot was created.

A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which presents an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device is no longer needed, the system administrator can simply remove it. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state.
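The workflow described above can be sketched as a sequence of commands. The Python helper below only builds and prints that sequence; it does not run anything, since the real commands need root privileges and an existing LVM volume group. The volume group `vg0`, logical volume `logs`, snapshot name, snapshot size, mount point, and archive path are all assumptions chosen for illustration. Some filesystems may need extra mount options for a read-only snapshot mount, and the archive must be written somewhere outside the snapshot.

```python
import shlex
from typing import List


def snapshot_archive_commands(
    vg: str,
    lv: str,
    archive: str,
    snap_name: str = "logsnap",        # hypothetical snapshot name
    snap_size: str = "1G",             # space reserved for changes made during the backup
    mount_point: str = "/mnt/logsnap"  # hypothetical mount point
) -> List[List[str]]:
    """Return the commands for: snapshot, mount read-only, archive, clean up."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        # Freeze a point-in-time view of the origin volume.
        ["lvcreate", "--size", snap_size, "--snapshot", "--name", snap_name, origin],
        ["mkdir", "-p", mount_point],
        # Mount the frozen view read-only so nothing can modify it.
        ["mount", "-o", "ro", snap, mount_point],
        # Compress the frozen view, not the live volume.
        ["tar", "-czf", archive, "-C", mount_point, "."],
        ["umount", mount_point],
        # Remove the snapshot once the archive is written.
        ["lvremove", "--force", snap],
    ]


if __name__ == "__main__":
    for cmd in snapshot_archive_commands("vg0", "logs", "/backup/logs.tar.gz"):
        print(shlex.join(cmd))
```

Keeping the snapshot short-lived matters: it consumes space as the origin volume changes, so it should be removed as soon as the archive is complete.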

Care while using various compression tools

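The Linux tar tool (commonly used together with gzip) checks the data as it archives it. If it finds an inconsistency, which is expected when the data changed during archiving, it prints a warning such as "file changed as we read it" and returns a non-zero exit status to the shell. It is important to act on this warning; otherwise the problem surfaces later as an unpleasant surprise, for example an archive that does not match the data it was meant to capture.
Moreover, the warning appears only occasionally, because it depends on timing, so test cases should be diverse enough to catch such issues.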
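Acting on the exit status might look like the following Python sketch. It assumes GNU tar, whose documented exit codes are 0 for success, 1 when some files changed while being archived, and 2 for a fatal error. The function names and the decision to treat status 1 as a distinct "changed" outcome are illustrative choices, not a prescribed policy.

```python
import subprocess


def classify_tar_exit(code: int) -> str:
    """Map a GNU tar exit status to a coarse outcome."""
    if code == 0:
        return "ok"
    if code == 1:
        # Some files changed while being archived: the archive exists
        # but is not an exact copy of the file set.
        return "changed"
    return "failed"


def archive_directory(src_dir: str, archive: str) -> str:
    """Run tar with gzip compression and return the classified outcome."""
    result = subprocess.run(
        ["tar", "-czf", archive, "-C", src_dir, "."],
        capture_output=True,
        text=True,
    )
    outcome = classify_tar_exit(result.returncode)
    if outcome == "failed":
        raise RuntimeError(f"tar failed: {result.stderr.strip()}")
    return outcome
```

A caller that receives "changed" can retry, fall back to a snapshot, or at least log the event instead of silently keeping a possibly inconsistent archive.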

Ideally, this should be taken care of in the design phase.

