OLAP vs. OLTP: Different point of views cause data issues
Eeshan Pandey
Risk Consulting | Platform Tech | Compliance Strategy | Finance Transformation | Tech Convergence | Global Citizen
OLTP and OLAP data processing
In the data space, OLTP and OLAP are two fundamental concepts that serve different but complementary purposes in managing and utilising data within organisations.
OLTP at a glance
Basic role
OLTP systems are designed to manage transaction-oriented applications. As such, they are optimised to handle large numbers of short, atomic (i.e. indivisible), and often simple operations.
These operations underpin the day-to-day transactions that keep a business running, such as order entry, retail sales, financial transactions, and customer relationship interactions.
Key characteristics
Use case examples
OLAP at a glance
Basic role
OLAP systems are designed for query-intensive and complex analytical processes. These systems are optimised for data analysis and geared toward querying large sets of data at once to aid in looking for patterns, trends, and insights.
The primary role of OLAP is to aid in decision-making processes, like data mining, business intelligence (BI), and trend analysis.
Key characteristics
Use case examples
Financial reporting, customer relationship management (CRM) analytics, data warehousing systems
How do OLTP and OLAP work together?
In integrated data warehousing environments, OLTP and OLAP often share a source-destination relationship. OLTP systems collect data through daily business operations and then pass it to OLAP systems for analysis. As part of this process, OLTP systems typically use an Entity-Relationship (ER) model, optimised for transactional efficiency. OLAP systems, by contrast, use a star schema or snowflake schema, optimised for reading and analysing data.
This relationship is important to highlight: despite their differences, OLTP and OLAP function as complementary systems. With OLTP focused on efficient transaction processing and OLAP on complex data analysis, the two are best understood as halves of a comprehensive data strategy.
Appreciating this interplay matters, because it lays the foundation for understanding how these complementary systems can nonetheless create a surprising amount of discord among data professionals.
Drilling down into the “vs.” in OLAP vs. OLTP
OLAP system users, as data consumers, have very different priorities from their OLTP counterparts. They are looking to cast a slow, thorough gaze across large amounts of historical data in an attempt to identify trends that can improve strategic decision-making. This requires a combination of data aggregation, complex queries, and measured data analysis in order to extract insights and support long-term planning.
Normalized data models benefit transactional processes, but they are inefficient for complex queries that pull data from multiple tables, which is exactly what data analysis requires. This is why OLAP systems frequently use a denormalized data model, organising data into large, comprehensive structures that make querying both quick and complex by comparison.
OLTP users typically leverage more technical, operational language in their work. Terms and concepts in the world of real-time transactional processes involve database management, application performance, and transactional integrity. Water-cooler discussion might center on resolving issues related to query speed, system reliability, and database normalization.
Meanwhile, in the world of the OLAP user, language will be much more business and analytically focused, including business intelligence (BI) terminology, statistical analysis, and data visualization concepts. Around this respective water cooler, practitioners might instead discuss data warehousing, business metrics, dimensional modelling, and big data analytics.
For example, in a sales environment, every time a customer makes a purchase, inventory needs to be updated, sales details need to be captured, and loyalty or rewards points might even need to be tallied, all of which an OLTP system would log behind the scenes. Business logic in this example would involve immediate updates to ensure inventory accuracy, recording of sales details, and updates to customer profiles. Stock quantities need to be decremented and loyalty points incremented.
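That OLTP business logic can be sketched as a single atomic transaction. This is a minimal illustration using SQLite; the table names and the `record_purchase` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER);
CREATE TABLE sales     (sku TEXT, amount REAL);
CREATE TABLE loyalty   (customer TEXT PRIMARY KEY, points INTEGER);
INSERT INTO inventory VALUES ('WIDGET-1', 10);
INSERT INTO loyalty   VALUES ('alice', 0);
""")

def record_purchase(conn, customer, sku, amount, points):
    """All three updates succeed or fail together -- the 'atomic' in OLTP."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,))
        conn.execute("INSERT INTO sales VALUES (?, ?)", (sku, amount))
        conn.execute("UPDATE loyalty SET points = points + ? WHERE customer = ?",
                     (points, customer))

record_purchase(conn, "alice", "WIDGET-1", 19.99, 20)
```

The `with conn:` block is what makes the three writes indivisible: a failure in any statement rolls back all of them, so inventory, sales, and loyalty never drift out of step.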
Downstream, however, OLAP users are working to analyze sales data over time to help stakeholders make strategic decisions, like understanding the seasonality of customer buying patterns and predicting future inventory needs. This changes the business logic. Here, the focus is not on individual transactions and instead is on aggregating the sales data over time. Users and their systems might look to calculate average sale values per customer, or to review years of historical sales data to improve their ability to forecast future stock requirements.
No surprise, then, that in our example the business logic of the former would not transfer seamlessly to the latter. Although the systems remain complementary, their disparate logic would need to be reconciled with every data transfer, producing the aforementioned discrepancies in how the data is understood and used.
What's the way to solve this? Data contracts
A data contract is a formal agreement between two or more parties that outlines specific terms and conditions of data sharing. These contracts can exist between organizations, systems, or individuals.
Where traditional contracts exist as written or spoken agreements, data contracts tend to encompass a combination of documents, tools, and artifacts that provide clear specifications, assurances, and systems to monitor and manage the data exchange between parties.
In practice, the formal agreement that data contracts represent will marshal the exchange, handling, storage, and usage of data. Contracting parties agree to ensure that any data outlined in the contract is managed accurately and securely, remaining in compliance with relevant regulations.
These agreements aren’t hypothetical. A contract only works if it can be enforced. This is why data contracts that involve software should be programmatic.
Operationally, contracts will commonly include the following:
Schema definitions: A data contract should explicitly define the schema, including semantics, so the structure of data is clear. As part of a contract, schema definitions may involve specifying which formats, structures (e.g., Avro or JSON), and data types will be used.
Validation rules: These rules make sure all ensuing data adheres to the defined schema and will meet data quality standards the contracting parties require.
Access control: Data contracts may specify who the approved data producers and data consumers will be for the duration of the contract.
Data flow management: Contracts typically outline how datasets should flow between specific systems like data pipelines and data ingestion processes. They may also determine how data should be managed as it moves between contracting parties, such as data engineers and software engineers.
Communication protocols: API specifications and communication protocols may also be outlined to standardize the data exchange between systems and platforms.
Versioning: Data contracts implement versioning to manage changes made to the data schema. This management ensures that all changes, including breaking changes, will be handled in a way that won’t disrupt existing systems.
Dependency and metadata management: To make sure changes don’t negatively impact dependent systems, dependencies between different data entities may also be managed as part of a data contract. Data contracts may also dictate how metadata will be managed and exchanged between systems, ensuring data will be understandable and usable.
Compliance and auditing: Data contracts will ensure that all data exchange and management adhere to legal and compliance requirements. Mechanisms for auditing and tracking data usage and changes can also be specified.
Error handling: Finally, as much as data contracts work to make sure things go right, they should also guide responses when things go wrong. Defining how errors and discrepancies in data will be handled ensures that any issues that arise will be logged, addressed, and communicated properly.
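Several of the elements above (schema definitions, validation rules, versioning, and error handling) can live together in code, which is what makes a contract programmatic and enforceable. The field names, version string, and `validate` helper below are illustrative assumptions, not a standard contract format:

```python
# A toy programmatic data contract: the schema, its version, and a validator
# are defined in code, so violations surface at exchange time rather than
# being discovered downstream.
CONTRACT = {
    "version": "1.2.0",  # semantic versioning to manage schema evolution
    "schema": {
        "order_id": int,
        "customer": str,
        "amount": float,
    },
}

def validate(record, contract):
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, expected in contract["schema"].items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

ok_errors  = validate({"order_id": 7, "customer": "alice", "amount": 19.99}, CONTRACT)
bad_errors = validate({"order_id": "7", "customer": "alice"}, CONTRACT)
```

In practice this role is usually filled by schema registries and validation tooling rather than hand-rolled checks, but the shape is the same: a machine-readable schema, a version, and a validator whose error list feeds the contract's error-handling process.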
Conclusion
In the ever-evolving landscape of data management, the age-old wisdom of thought leaders like Benjamin Franklin remains strikingly relevant. If "an ounce of prevention is worth a pound of cure," then the importance of proactive measures in data handling cannot be overstated.