Amazon Redshift

  • Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud.
  • Customers can start using Redshift for $0.25 per hour with no commitments or upfront costs, and scale to a petabyte or more for $1,000 per terabyte per year.

OLAP

OLAP (Online Analytical Processing) is the type of processing system that Redshift uses.

Suppose we want to calculate the net profit for EMEA and Pacific for the Digital Radio product. This requires pulling a large number of records. The following data is needed to calculate the net profit:

  • Sum of radios sold in EMEA.
  • Sum of radios sold in Pacific.
  • Unit cost of a radio in each region.
  • Sales price of each radio.
  • Sales price minus unit cost.

Complex queries are required to fetch the records listed above. Data warehousing databases use a different type of architecture, both at the database layer and at the infrastructure layer.
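To make the net profit calculation concrete, here is a minimal Python sketch of the aggregation such a query performs. The records, prices, and the `net_profit_by_region` helper are hypothetical values and names made up for illustration; they are not part of Redshift or of the original example.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units_sold, sale_price, unit_cost)
SALES = [
    ("EMEA", "Digital Radio", 100, 50.0, 30.0),
    ("EMEA", "Digital Radio", 50, 50.0, 30.0),
    ("Pacific", "Digital Radio", 80, 55.0, 35.0),
    ("Pacific", "Analog Radio", 10, 20.0, 15.0),
    ("Americas", "Digital Radio", 200, 45.0, 25.0),
]


def net_profit_by_region(rows, product, regions):
    """Sum units_sold * (sale_price - unit_cost) per region for one product."""
    totals = defaultdict(float)
    for region, prod, units, price, cost in rows:
        # Keep only the requested product and regions
        if prod == product and region in regions:
            totals[region] += units * (price - cost)
    return dict(totals)


profits = net_profit_by_region(SALES, "Digital Radio", {"EMEA", "Pacific"})
print(profits)
```

In a real warehouse the same filter-and-sum would run over millions of rows, which is why the workload benefits from an architecture built for analytics.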

Redshift Configuration

A Redshift cluster can be configured in one of two ways:

  • Single node
  • Multi-node

Single node: A single node stores up to 160 GB.

Multi-node: A multi-node configuration consists of more than one node. Its nodes are of two types:

  • Leader Node: The leader node manages client connections and receives queries. It receives queries from the client applications, parses them, and develops the execution plans. It coordinates the parallel execution of these plans on the compute nodes, combines the intermediate results from all the nodes, and then returns the final result to the client application.
  • Compute Node: A compute node executes the execution plans and sends its intermediate results to the leader node for aggregation before the final result is sent back to the client application. A cluster can have up to 128 compute nodes.
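The division of work between the leader node and the compute nodes can be sketched as a simple scatter-gather pattern. This is only an illustration using Python threads, not how Redshift is implemented; the function names and the sample data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """Compute node: run the plan step on its local slice and return a partial result."""
    return sum(rows)


def leader_node(slices):
    """Leader node: dispatch work in parallel, gather partial results, combine them."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        # pool.map preserves the order of the input slices
        partials = list(pool.map(compute_node, slices))
    return sum(partials), partials


# Hypothetical data, already split across three compute nodes
node_slices = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
total, partials = leader_node(node_slices)
print(total, partials)
```

Each compute node only sees its own slice, and the client only ever talks to the leader, which mirrors the roles described above.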

Let's understand the concept of the leader node and compute nodes through an example.


A Redshift warehouse is a collection of computing resources known as nodes, and these nodes are organized into a group known as a cluster. Each cluster runs a Redshift engine and contains one or more databases.

When you launch a Redshift instance, it starts with a single node of size 160 GB. When you want to grow, you can add nodes to take advantage of parallel processing. In a multi-node setup, a leader node manages the compute nodes: it handles client connections and coordinates query execution, while the data is stored on the compute nodes, which perform the query work.
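As a rough illustration of growing from a single node, the arithmetic below estimates how many compute nodes a given data volume would need, using only the 160 GB per-node and 128-node figures quoted in this article. The `compute_nodes_needed` helper is hypothetical; real sizing also depends on node type, compression, and workload.

```python
import math

NODE_CAPACITY_GB = 160   # per-node storage figure quoted in this article
MAX_COMPUTE_NODES = 128  # compute node limit quoted in this article


def compute_nodes_needed(data_gb):
    """Return the minimum number of 160 GB nodes that can hold data_gb of data."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    nodes = math.ceil(data_gb / NODE_CAPACITY_GB)
    if nodes > MAX_COMPUTE_NODES:
        raise ValueError("data does not fit within the node limit assumed here")
    return nodes


for size_gb in (100, 161, 1000):
    print(size_gb, "GB ->", compute_nodes_needed(size_gb), "node(s)")
```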
