OLAP vs. OLTP: Different points of view cause data issues

OLTP and OLAP data processing

In the data space, OLTP and OLAP are two fundamental concepts that serve different but complementary purposes in both managing and utilising data within organisations.

OLTP at a glance

Basic role

OLTP systems are designed to manage transaction-oriented applications. As such, they are optimised to handle large numbers of short, atomic (i.e. indivisible), and often simple operations.

These operations underpin many of the day-to-day transactions that keep a business running, such as order entry, retail sales, financial transactions, and customer relationship interactions.

Key characteristics

  • Operationally focused, directly involved in business operations
  • High throughput for short operations like create, read, update, and delete (CRUD)
  • Ensures data integrity in real-time (see the sketch after this list)
  • Deals with current data that is highly detailed
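
To make the short, atomic operations idea concrete, here is a minimal Python sketch of an OLTP-style operation using the standard-library sqlite3 module. The table, column names, and amounts are illustrative assumptions, not a reference implementation.

```python
import sqlite3

# Illustrative schema: a single accounts table with balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# A short, atomic (indivisible) operation: move 25.0 from account 1 to 2.
# The `with conn:` block commits on success and rolls back on any error,
# so both updates apply together or not at all.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")
except sqlite3.Error:
    pass  # rolled back: data integrity is preserved even on failure
```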

Use case examples

  • Banking systems
  • Online booking systems
  • Retail point of sale (POS) systems

OLAP at a glance

Basic role

OLAP systems are designed for query-intensive and complex analytical processes. They are optimised for data analysis and geared toward querying large datasets at once to surface patterns, trends, and insights.

The primary role of OLAP is to support decision-making processes such as data mining, business intelligence (BI), and trend analysis.

Key characteristics

  • Analytically focused, used to drive strategic business decisions
  • Very efficient in handling complex queries that involve aggregations and joining large datasets
  • Works on data consolidated from various OLTP databases and other sources
  • Enables historical analysis, allowing users to work on large volumes of historical data to derive insights over time

Use case examples

  • Financial reporting
  • Customer relationship management (CRM) analytics
  • Data warehousing systems

How do OLTP and OLAP work together?

In integrated data warehousing environments, OLTP and OLAP often share a source-destination relationship: OLTP systems collect data through daily business operations and then pass it to OLAP systems for analysis. As part of this process, OLTP systems typically use an Entity-Relationship (ER) model optimised for transactional efficiency. OLAP systems, by contrast, use a star schema or snowflake schema optimised for reading and analysing data.

This relationship is important to highlight because, despite their differences, OLTP and OLAP function as complementary systems. With OLTP focused on efficient transaction processing and OLAP on complex data analysis, both are often necessary for a comprehensive data strategy: two parts of a whole.

Appreciating this interplay matters, as it lays the foundation for understanding how these complementary systems manage to foster such a surprising amount of discord amongst data professionals.

Drilling down into the “vs.” in OLAP vs. OLTP

  1. Separate objectives and priorities

As the data producers in this equation, OLTP systems are employed in an organisation to manage and process transactional data. Use cases like updating inventory and managing customer information require immediate, real-time data. This necessitates that OLTP users prioritise operational efficiency, data integrity, and sheer speed.

OLAP system users, as data consumers, have very different priorities. They are looking to cast a slow, thorough gaze across large amounts of historical data in an attempt to identify trends that can improve strategic decision-making. This requires a combination of data aggregation, complex queries, and measured data analysis in order to extract insights and support long-term planning.

  2. Different data structures and models

OLTP data producers use a normalized data model to reduce data redundancy and maintain data integrity. Normalized data models are also easy to organise and update, in addition to being very consistent.

However, while these factors benefit transactional processes, normalized data models are inefficient for complex queries that pull data from multiple tables. That is exactly what data analysis requires, which is why OLAP systems frequently use a denormalized data model. These models organize data into large, comprehensive structures that make even complex queries comparatively quick.
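
As a rough illustration, here is how the same sale might look in a normalized OLTP layout versus a denormalized, star-schema-style fact row. All names and values are made up for the sketch.

```python
# Normalized (OLTP): each entity in its own table, referenced by key.
# This avoids redundancy but forces joins for analytical questions.
customers = {101: {"name": "Ada", "city": "London"}}
products = {7: {"sku": "SKU-7", "category": "Books", "price": 12.50}}
sales = [{"sale_id": 1, "customer_id": 101, "product_id": 7, "qty": 2}]

# Denormalized (OLAP, star-schema style): the fact row carries the
# dimension attributes analysts filter and group by, so a single scan
# can answer questions like "revenue by city and category".
fact_sales = [{
    "sale_id": 1,
    "qty": 2,
    "revenue": 25.00,
    "customer_name": "Ada",
    "customer_city": "London",
    "product_category": "Books",
}]
```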

  3. Misaligned interpretations of data

Holding very distinct objectives, producers and users may understandably interpret the same data in very different ways. Again, producers are looking for the quickest and most efficient ways to record and retrieve transactional data so that it remains highly in sync with the real-world operations of a business. But for the consumers, working to identify trends through moment-to-moment changes would be like studying individual grains of sand to track the ebb and flow of an entire coastline.

  4. Distinct languages

OLTP and OLAP users quite literally end up speaking very different languages.

OLTP users typically leverage more technical, operational language in their work. Terms and concepts in the world of real-time transactional processes involve database management, application performance, and transactional integrity. Water-cooler discussion might center on resolving issues related to query speed, system reliability, and database normalization.

Meanwhile, in the world of the OLAP user, language will be much more business and analytically focused, including business intelligence (BI) terminology, statistical analysis, and data visualization concepts. Around this respective water cooler, practitioners might instead discuss data warehousing, business metrics, dimensional modelling, and big data analytics.

  5. Data quality and transformation issues

As data moves from an OLTP to an OLAP system, issues can arise regarding data quality and transformation. Attempts to resolve these issues through denormalization or integration from additional sources can result in misalignment concerning data accuracy, timeliness, and completeness.

  6. Complexity in business logic

The business logic used to great effect in OLTP environments may not translate cleanly into an OLAP system.

For example, in a sales environment, every time a customer makes a purchase, inventory needs to be updated, sales details need to be captured, and loyalty or rewards points might even need to be tallied, all of which an OLTP system would log behind the scenes. Business logic in this example would involve immediate updates to ensure inventory accuracy, recording of sales details, and updates to customer profiles. Stock quantities need to be decremented and loyalty points incremented.
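
A hedged sketch of what that transactional logic might look like, reusing the sqlite3 pattern from earlier. It assumes an open sqlite3 connection, and all table and column names are illustrative assumptions.

```python
def record_purchase(conn, customer_id, product_id, qty, unit_price):
    # One purchase triggers several immediate updates that must succeed
    # or fail together: decrement stock, record the sale, and increment
    # loyalty points.
    with conn:
        conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE product_id = ?",
            (qty, product_id),
        )
        conn.execute(
            "INSERT INTO sales (customer_id, product_id, qty, total) "
            "VALUES (?, ?, ?, ?)",
            (customer_id, product_id, qty, qty * unit_price),
        )
        conn.execute(
            "UPDATE customers SET loyalty_points = loyalty_points + ? "
            "WHERE id = ?",
            (int(qty * unit_price), customer_id),
        )
```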

Downstream, however, OLAP users are working to analyze sales data over time to help stakeholders make strategic decisions, like understanding the seasonality of customer buying patterns and predicting future inventory needs. This changes the business logic. Here, the focus is not on individual transactions and instead is on aggregating the sales data over time. Users and their systems might look to calculate average sale values per customer, or to review years of historical sales data to improve their ability to forecast future stock requirements.
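
By contrast, the OLAP-side logic aggregates over history rather than updating records. A minimal sketch, assuming sales rows with illustrative "customer_id" and "total" fields like those recorded above:

```python
from collections import defaultdict

def average_sale_value_per_customer(sales):
    # Aggregate historical sales: total revenue and count per customer,
    # then derive the average. No individual record is modified.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sale in sales:
        totals[sale["customer_id"]] += sale["total"]
        counts[sale["customer_id"]] += 1
    return {cid: totals[cid] / counts[cid] for cid in totals}
```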

No surprise, then, that in our example the business logic from the former would not transfer seamlessly to the latter. While still complementary, the disparate logic would need to be reconciled each time a data transfer is made, resulting in the aforementioned discrepancies in how the data needs to be understood and used.

  7. Different performance metrics

Finally, both professional camps are equally concerned with measuring success. But while OLTP systems are judged on metrics like transaction throughput and response time, OLAP systems are judged on indicators like data freshness and query response time. Even when everyone in a given organisation understands the metrics used by the other party, there’s no reason to believe they’ll inherently value them accordingly.

What’s the way to solve this? Data contracts

A data contract is a formal agreement between two or more parties that outlines specific terms and conditions of data sharing. These contracts can exist between organizations, systems, or individuals.

Where traditional contracts exist as written or spoken agreements, data contracts tend to encompass a combination of documents, tools, and artifacts that provide clear specifications, assurances, and systems to monitor and manage the data exchange between parties.

In practice, the formal agreement that data contracts represent will marshal the exchange, handling, storage, and usage of data. Contracting parties agree to ensure that any data outlined in the contract is managed accurately and securely, remaining in compliance with relevant regulations.

These agreements aren’t hypothetical. A contract only works if it can be enforced. This is why data contracts that involve software should be programmatic.

Operationally, contracts will commonly include the following:

Schema definitions: A data contract should explicitly define the schema, including semantics, so the structure of data is clear. As part of a contract, schema definitions may involve specifying which formats, structures (e.g., Avro or JSON), and data types will be used.

Validation rules: These rules make sure all ensuing data adheres to the defined schema and meets the data quality standards the contracting parties require.
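
One plausible way to express both a schema definition and a validation rule is JSON Schema, checked with the third-party jsonschema package. The field names and constraints below are illustrative assumptions, not part of any standard contract.

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# Illustrative contract schema: structure, types, and a simple
# quality rule (totals may not be negative).
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "integer"},
        "customer": {"type": "string"},
        "total": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "customer", "total"],
}

def honours_contract(record):
    # Returns True only if the record matches the agreed schema.
    try:
        validate(instance=record, schema=ORDER_SCHEMA)
        return True
    except ValidationError:
        return False
```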

Access control: Data contracts may specify who the approved data producers and data consumers will be for the duration specified within the contract.

Data flow management: Contracts typically outline how datasets should flow between specific systems like data pipelines and data ingestion processes. They may also determine how data should be managed as it moves between contracting parties, such as data engineers and software engineers.

Communication protocols: API specifications and communication protocols may also be outlined to standardize the data exchange between systems and platforms.

Versioning: Data contracts implement versioning to manage changes made to the data schema. This management ensures that all changes, including breaking changes, will be handled in a way that won’t disrupt existing systems.
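
As a hedged sketch of one such policy (the version scheme and compatibility rule here are assumptions, not a standard): a consumer checks the producer's declared schema version before reading, treating major bumps as breaking changes.

```python
# (major, minor) schema version this consumer was built against.
CONSUMER_EXPECTS = (2, 1)

def is_compatible(producer_version):
    major, minor = producer_version
    exp_major, exp_minor = CONSUMER_EXPECTS
    # Same major version: only additive (minor) changes occurred, so any
    # producer minor >= the consumer's expectation is safe to read.
    return major == exp_major and minor >= exp_minor

assert is_compatible((2, 3))       # additive change: safe
assert not is_compatible((3, 0))   # breaking change: renegotiate
```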

Dependency and metadata management: To make sure changes don’t negatively impact dependent systems, dependencies between different data entities may also be managed as part of a data contract. Data contracts may also dictate how metadata will be managed and exchanged between systems, ensuring data will be understandable and usable.

Compliance and auditing: Data contracts will ensure that all data exchange and management adhere to legal and compliance requirements. Mechanisms for auditing and tracking data usage and changes can also be specified.

Error handling: Finally, as much as data contracts work to make sure things go right, they should also guide responses when things go wrong. Defining how errors and discrepancies in data will be handled ensures that any issues that arise will be logged, addressed, and communicated properly.
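
A minimal sketch of that idea, assuming the honours_contract check from earlier: invalid records are quarantined and logged rather than silently dropped, so violations can be addressed and communicated.

```python
import logging

logger = logging.getLogger("data_contract")

def ingest(records, check):
    # Split records into accepted and quarantined; log each violation
    # so it can be investigated rather than lost.
    accepted, quarantined = [], []
    for record in records:
        if check(record):
            accepted.append(record)
        else:
            quarantined.append(record)
            logger.warning("contract violation quarantined: %r", record)
    return accepted, quarantined
```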


Conclusion

In the ever-evolving landscape of data management, the age-old wisdom of thought leaders like Benjamin Franklin remains strikingly relevant. If “an ounce of prevention is worth a pound of cure,” the importance of proactive measures in data handling cannot be overstated.
