Data Warehousing

Traditional vs. cloud-based data warehouse


Traditional data warehouses are hosted on-premises, with data flowing in from relational databases, transactional systems, business applications, and other source systems. They are typically designed to capture a subset of data in batches and store it under rigid schemas, which makes them unsuitable for ad hoc queries or real-time analysis. An on-premises data warehouse also requires companies to purchase their own hardware and software, making it expensive to scale and maintain. Because storage in a traditional warehouse is typically limited relative to compute, data is transformed quickly and then discarded to keep storage space free.

Data analytics has moved to the center of core business activities, including revenue generation, cost containment, operational improvement, and customer experience. As data evolves and diversifies, organizations need more robust data warehouse solutions and advanced analytics tools to store, manage, and analyze large quantities of data across the organization.

These systems must be scalable, reliable, secure enough for regulated industries, and flexible enough to support a wide variety of data types and big data use cases. They also need flexible pricing and compute, so you pay only for what you use instead of guessing at capacity. These requirements go beyond the capabilities of most legacy data warehouses, and as a result, many enterprises are turning to cloud-based data warehouse solutions.

A cloud data warehouse sacrifices none of the capabilities of a traditional data warehouse; it extends them and runs as a fully managed service in the cloud. Cloud data warehousing offers instant scalability to meet changing business requirements and powerful data processing to support complex analytical queries.

With a cloud data warehouse, you benefit from the inherent flexibility of a cloud environment with more predictable costs. The up-front investment is typically much lower and lead times are shorter than with on-premises data warehouse solutions, because the cloud service provider manages and maintains the physical infrastructure.

How data warehousing works in the cloud


Like a traditional data warehouse, cloud data warehouses collect, integrate, and store data from internal and external data sources. Data is typically transferred from a source system using a data pipeline: the data is extracted from the source system, transformed, and then loaded into the data warehouse—a process known as ETL (extract, transform, load). Data can also be sent directly to a central repository and converted there, using an ELT (extract, load, transform) process. From there, users can use different business intelligence (BI) tools to access, mine, and report on data. Cloud data warehouses should also support streaming use cases, so you can act on data in real or near-real time.
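The ETL flow described above can be sketched in a few lines. This is a minimal illustration only: the table names (`orders`, `fact_orders`) are hypothetical, and two SQLite in-memory databases stand in for a real source system and warehouse.

```python
import sqlite3

def extract(conn):
    # Pull raw order rows from the operational (source) database.
    return conn.execute("SELECT id, amount_cents FROM orders").fetchall()

def transform(rows):
    # Convert cents to dollars and drop invalid (negative) rows before loading.
    return [(oid, cents / 100.0) for oid, cents in rows if cents >= 0]

def load(conn, rows):
    # Write the cleaned rows into the warehouse fact table.
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()

# Demo: in-memory databases standing in for real systems.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 1999), (2, -5), (3, 500)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, amount_dollars REAL)")

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
# → [(1, 19.99), (3, 5.0)]
```

In an ELT process, the `transform` step would instead run inside the warehouse itself (typically as SQL) after the raw rows are loaded.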

Cloud data warehouses offer structured and semi-structured data storage, processing, integration, cleansing, loading, and so on within a public cloud environment. You can also use them with a cloud data lake to collect and store unstructured data. With some providers, it’s even possible to unify your data warehouse and data lake to maintain and centrally manage a single copy of your enterprise data.

Different cloud providers may take various approaches when it comes to cloud data warehouse services. For example, some cloud data warehouses may use a cluster-based architecture similar to a traditional data warehouse. In contrast, others adopt a modern serverless architecture, which further minimizes data management responsibilities. However, most cloud data warehouses provide built-in data storage and capacity management features and automatic upgrades.

Other key capabilities of cloud data warehouses include:

  • Massively parallel processing (MPP)
  • Columnar data stores
  • Self-service ETL and ELT data integration
  • Disaster recovery features and automatic backups
  • Compliance and data governance tools
  • Built-in integrations for BI, AI, and machine learning
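Two of the capabilities above, columnar data stores and MPP, exist to make analytical scans cheap. The toy sketch below (hypothetical data, not a real engine) shows the core idea of a columnar layout: an aggregate like SUM(sales) reads one contiguous column instead of every field of every row.

```python
# The same small table, first row-wise, then column-wise.
rows = [
    ("2024-01-01", "EU", 120.0),
    ("2024-01-01", "US", 340.0),
    ("2024-01-02", "EU", 95.0),
]

# Columnar layout: one contiguous list per column.
columns = {
    "day":    [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "sales":  [r[2] for r in rows],
}

# An analytical query such as SUM(sales) touches only the "sales" column,
# skipping the day and region data entirely.
total = sum(columns["sales"])
print(total)  # → 555.0
```

MPP takes this one step further: each column is split into chunks that many nodes scan and aggregate in parallel before the partial results are combined.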
