Create dashboards with Microsoft Power BI Desktop and Amazon Redshift
Image: Wendy Wong

Create dashboards with Microsoft Power BI Desktop and Amazon Redshift

The importance of dashboards

Business leaders need to make decisions quickly and they need to access self-service dashboards to automate manual reporting. Dashboards are important to allow them to have access to insights on their mobile device, in transit, in between meetings.

But as data analysts and data scientists we need to help the customer define the requirements to create a dashboard using an analytics framework such as CRISP-DM .

But before we get started we need to ask the following questions as we gather business requirements:

  • Who is my audience?
  • What business questions would you like answered from the dashboard?
  • How often will data be refreshed?
  • What visualizations and KPI metrics would you like to see?
  • What are the data sources?
  • What filters, drill-down would you like to see?

Lesson Objectives

In this lesson you learn the following:

  • How to select a business intelligence tool
  • Understand Power BI and Amazon Redshift solution architecture
  • Understand when to create a dataflow
  • The role of the platform engineering team
  • Create a view SQL statement in Amazon Redshift Query Editor V2
  • How to create a dataflow
  • Define new tables
  • Authentication and connecting to Amazon Redshift
  • Connect a dataflow to Power BI Desktop
  • Load and transform data
  • Create a date table
  • Create a data model
  • Create visualizations and publish your dashboard

Which business intelligence tool should I use?

Although I am trained in Tableau Desktop, sometimes as data analysts and data scientists we need to use the enterprise business intelligence that we are allowed to use.

Your industry or organization my have concerns about:

I don't want anything shared in the cloud

  • Your organization may not have a cloud strategy defined in financial services

I don't want to pay for any annual licence
I want to onboard users easily in bulk

If your team have weighed the pros and cons, sometimes you will be asked to use Power BI Desktop because it might be easy to have control to schedule data refreshes and to onboard multiple users in the long term.

There is no perfect business intelligence tool but always assess the needs of the stakeholder.

Solution Architecture

This is the high-level architecture of Power BI Desktop connecting to AWS data sources such as Amazon Redshift cloud data warehouse via an AWS client VPN connection to ensure security of the data and no public internet access.

Image: Wendy Wong

When to create dataflows?

Sometimes we may just need to create a dataflow to enable the creation of Power BI dashboards in the following scenarios:

  • To easily refresh data
  • To reuse the data elements created in Power Query from DAX functions
  • To transform and scale ETL with dataflows within Power BI Premium
  • To limit the volume of data only to the subset created within the dataflow

Platform Engineering - Provisioning Amazon Redshift clusters

In an enterprise, you may leverage the platform engineering team to help your data team to create a cloud data warehouse Amazon Redshift using Lake Formation with IAM permissions for your user group, to provision clusters and to migrate to the latest node to optimize performance.

Once the database is created, your data analyst or data scientist may be able to connect to the database with credentials such as the port number and password to:

  • Create a view and,
  • Querying data using Query Editor V2

Image: Wendy Wong

Create a view in Amazon Redshift Query Editor V2

Limit your data so create a view statement in SQL to provide a subset of data for your ad-hoc analysis. This helps to reduce the management of large amounts of data if we only query only the columns we need and try to optimize the query for better performance.

An example of a view statement:

CREATE VIEW SalesSummary AS
SELECT ProductID
, SUM(Quantity) AS TotalSold
, claim_type
, date
, claim_amount
, channel
, customer_name
, post_code
, product_name
FROM Sales
GROUP BY ProductID
,claim_type
, date
, claim_amount
, channel
, customer_name
, post_code
, product_name;
        

Create a dataflow

Login to the Power BI Premium workspace provided by your organization administrator that is a space to create and update dashboards.

In the workspace, click Define new tables.

Image: Microsoft Learn

Define new tables

Navigate to the search bar and type 'Redshift' to locate the connection for Amazon Redshift to create a data flow.

Image: Microsoft Learn

Authenticate and connect to Amazon Redshift

Sign in with the email address used in your organization or educational institution to authenticate your identity in Power BI Service.

You will be prompted to enter the port number, database name and enter the password to connect to the database from Amazon Redshift.

Locate the workspace and also the view that was created in Amazon Redshift and connect to the data source.

Image: Microsoft Learn

  • Create a name for the dataflow

Connect your dataflow to Power BI Desktop

After a dataflow is created in the workspace of Power BI Premium, you may open Power BI Desktop.

Navigate to the Home tab and click Get Data. Scroll down to dataflows and select Connect to connect to the data source.

Image: Wendy Wong

Load and transform data with Power Query

You may load the first 200 rows of data or the entire dataset by selecting Load.

Image: Wendy Wong

Alternatively, you may wish to transform and prepare your data using Power Query, write DAX functions by selecting Transform Data.

Image: Wendy Wong

Create a date table

To create a date table, you may navigate to Power Query and click a column that has a data type for date or timestamp.

A date table allows us to connect with another table that has a date data type.

Image: Wendy Wong

To create a date table, in Power Query you will firstly need to click on a column that has a date or timestamp -> navigate to Add Column -> navigate to Data Type: Date:

  • Create Year column
  • Create Month column
  • Create Day of Month column

Finally, 'Mark as date table' as suggested in the Microsoft Learn documentation .

Image: Wendy Wong

Now that the date table 'AW Calendar' is created, you may navigate to Table View and view the new columns that were created within the date table.

Image: Wendy Wong

Create your data model

You may navigate to Model View to build your data model by building relationships between the tables.

Once the date table is created e.g. 'AW Calendar', the date column may be linked to the primary table 'AW Sales' by connecting with the StockDate column.

In this ERD diagram you may link the primary keys to other tables within the data model.

Image: Wendy Wong

How to create visualizations

After you have created a data model, you may navigate to Report View on the left-hand side and create visualizations to help you to answer your business questions.

The dashboard will be peer reviewed for quality and finally published on Power BI Service.

Image; Wendy Wong
Image: Wendy Wong

Amazon Redshift blogs - How to get started

If you would to get started with Amazon Redshift, you may also refer to my blogs below:

References

Next Edition

In next week's edition we will be exploring:

  • Text Analytics

Until the next lesson, Happy Learning ! ??

Next Month - AWS Summit ANZ 2024

Join us in Sydney for AWS Summit Sydney 2024 for two days of innovation and learning for:

  • Day 1 - Builders Day - Wednesday 10 April
  • Day 2 -Innovation Day - Thursday 11 April

Connect with your industry peers, AWS Community Builders and AWS Heroes in the Developer Lounge.

You may register at this link .

You may preview the agenda and navigate your favourite sessions, workshops and keynotes.

I hope to see you at AWS Summit ANZ 2024 in ICC Sydney next month where the tech community of cloud enthusiasts and developers will gather together and innovate.


Harshal Patil

Data Engineering | Databricks Data engineer |Data warehousing | Python | Spark | ETL

8 个月

Actually data engineering

Diego Cortés J.

Passionate about data! | Lead Data Engineer | Data Architect | GCP | Snowflake | Teradata | Master Candidate at Global Data Management | Scrum Master

8 个月
回复

要查看或添加评论,请登录

Wendy Wong的更多文章

社区洞察

其他会员也浏览了