Data warehousing
Traditional vs. cloud-based data warehouse
Traditional data warehouses are hosted on-premises, with data flowing in from relational databases, transactional systems, business applications, and other source systems. However, they are typically designed to capture a subset of data in batches and store it based on rigid schemas, making them unsuitable for spontaneous queries or real-time analysis. Companies also must purchase their own hardware and software with an on-premises data warehouse, making it expensive to scale and maintain. In a traditional warehouse, storage is typically limited compared to compute, so data is transformed quickly and then discarded to keep storage space free.
Today, data analytics sits at the center of core business activities, including revenue generation, cost containment, improving operations, and enhancing customer experiences. As data evolves and diversifies, organizations need more robust data warehouse solutions and advanced analytic tools for storing, managing, and analyzing large quantities of data across their organizations.
These systems must be scalable, reliable, secure enough for regulated industries, and flexible enough to support a wide variety of data types and big data use cases. They also need to support flexible pricing and compute, so you only pay for what you need instead of guessing your capacity. The requirements go beyond the capabilities of most legacy data warehouses. As a result, many enterprises are turning to cloud-based data warehouse solutions.
A cloud data warehouse does not trade away the capabilities of a traditional data warehouse; it extends them and runs as a fully managed service in the cloud. Cloud data warehousing offers instant scalability to meet changing business requirements and powerful data processing to support complex analytical queries.
With a cloud data warehouse, you benefit from the inherent flexibility of a cloud environment with more predictable costs. The up-front investment is typically much lower and lead times are shorter than with on-premises data warehouse solutions because the cloud service provider manages and maintains the physical infrastructure.
How data warehousing works in the cloud
Like traditional data warehouses, cloud data warehouses collect, integrate, and store data from internal and external data sources. Data is typically transferred from a source system using a data pipeline. The data is extracted from the source system, transformed, and then loaded into the data warehouse, a process known as ETL (extract, transform, load). Alternatively, data can be loaded directly into a central repository and transformed there, a process known as ELT (extract, load, transform). From there, users can use different business intelligence (BI) tools to access, mine, and report on data. Cloud data warehouses should also support streaming use cases so that you can act on data in real or near-real time.
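The difference between ETL and ELT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the cloud data warehouse, and the sample rows, table names, and function names are all hypothetical rather than taken from any particular provider.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, lowercase country code).
SOURCE_ROWS = [
    (1, "19.50", "us"),
    (2, "5.00", "de"),
    (3, "12.25", "us"),
]


def etl(conn, rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    cleaned = [(oid, float(amount), country.upper()) for oid, amount, country in rows]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the warehouse with SQL."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM orders_raw"
    )


def revenue_by_country(conn, table):
    """A BI-style query; the table name comes from this script, not from user input."""
    query = f"SELECT country, SUM(amount) FROM {table} GROUP BY country ORDER BY country"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse (assumption)
etl(conn, SOURCE_ROWS)
elt(conn, SOURCE_ROWS)

etl_result = revenue_by_country(conn, "orders_etl")
elt_result = revenue_by_country(conn, "orders_elt")
print(etl_result)
print(elt_result)
```

Both paths give the same report. The practical difference is that the ELT path keeps the raw table in the warehouse, so the data can be transformed again later in a different way, which fits the cheaper storage of cloud environments.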
Cloud data warehouses offer structured and semi-structured data storage, processing, integration, cleansing, loading, and so on within a public cloud environment. You can also use them with a cloud data lake to collect and store unstructured data. With some providers, it’s even possible to unify your data warehouse and data lake to maintain and centrally manage a single copy of your enterprise data.
Different cloud providers may take various approaches when it comes to cloud data warehouse services. For example, some cloud data warehouses may use a cluster-based architecture similar to a traditional data warehouse. In contrast, others adopt a modern serverless architecture, which further minimizes data management responsibilities. However, most cloud data warehouses provide built-in data storage and capacity management features and automatic upgrades.
Other key capabilities of cloud data warehouses include: