Create dashboards with Microsoft Power BI Desktop and Amazon Redshift
The importance of dashboards
Business leaders need to make decisions quickly and they need to access self-service dashboards to automate manual reporting. Dashboards are important to allow them to have access to insights on their mobile device, in transit, in between meetings.
But as data analysts and data scientists we need to help the customer define the requirements to create a dashboard using an analytics framework such as CRISP-DM .
But before we get started we need to ask the following questions as we gather business requirements:
Lesson Objectives
In this lesson you learn the following:
Which business intelligence tool should I use?
Although I am trained in Tableau Desktop, sometimes as data analysts and data scientists we need to use the enterprise business intelligence that we are allowed to use.
Your industry or organization my have concerns about:
I don't want anything shared in the cloud
I don't want to pay for any annual licence
I want to onboard users easily in bulk
If your team have weighed the pros and cons, sometimes you will be asked to use Power BI Desktop because it might be easy to have control to schedule data refreshes and to onboard multiple users in the long term.
There is no perfect business intelligence tool but always assess the needs of the stakeholder.
Solution Architecture
This is the high-level architecture of Power BI Desktop connecting to AWS data sources such as Amazon Redshift cloud data warehouse via an AWS client VPN connection to ensure security of the data and no public internet access.
When to create dataflows?
Sometimes we may just need to create a dataflow to enable the creation of Power BI dashboards in the following scenarios:
Platform Engineering - Provisioning Amazon Redshift clusters
In an enterprise, you may leverage the platform engineering team to help your data team to create a cloud data warehouse Amazon Redshift using Lake Formation with IAM permissions for your user group, to provision clusters and to migrate to the latest node to optimize performance.
Once the database is created, your data analyst or data scientist may be able to connect to the database with credentials such as the port number and password to:
Create a view in Amazon Redshift Query Editor V2
Limit your data so create a view statement in SQL to provide a subset of data for your ad-hoc analysis. This helps to reduce the management of large amounts of data if we only query only the columns we need and try to optimize the query for better performance.
An example of a view statement:
CREATE VIEW SalesSummary AS
SELECT ProductID
, SUM(Quantity) AS TotalSold
, claim_type
, date
, claim_amount
, channel
, customer_name
, post_code
, product_name
FROM Sales
GROUP BY ProductID
,claim_type
, date
, claim_amount
, channel
, customer_name
, post_code
, product_name;
Create a dataflow
Login to the Power BI Premium workspace provided by your organization administrator that is a space to create and update dashboards.
In the workspace, click Define new tables.
Define new tables
Navigate to the search bar and type 'Redshift' to locate the connection for Amazon Redshift to create a data flow.
Authenticate and connect to Amazon Redshift
Sign in with the email address used in your organization or educational institution to authenticate your identity in Power BI Service.
You will be prompted to enter the port number, database name and enter the password to connect to the database from Amazon Redshift.
Locate the workspace and also the view that was created in Amazon Redshift and connect to the data source.
领英推荐
Connect your dataflow to Power BI Desktop
After a dataflow is created in the workspace of Power BI Premium, you may open Power BI Desktop.
Navigate to the Home tab and click Get Data. Scroll down to dataflows and select Connect to connect to the data source.
Load and transform data with Power Query
You may load the first 200 rows of data or the entire dataset by selecting Load.
Alternatively, you may wish to transform and prepare your data using Power Query, write DAX functions by selecting Transform Data.
Create a date table
To create a date table, you may navigate to Power Query and click a column that has a data type for date or timestamp.
A date table allows us to connect with another table that has a date data type.
To create a date table, in Power Query you will firstly need to click on a column that has a date or timestamp -> navigate to Add Column -> navigate to Data Type: Date:
Finally, 'Mark as date table' as suggested in the Microsoft Learn documentation .
Now that the date table 'AW Calendar' is created, you may navigate to Table View and view the new columns that were created within the date table.
Create your data model
You may navigate to Model View to build your data model by building relationships between the tables.
Once the date table is created e.g. 'AW Calendar', the date column may be linked to the primary table 'AW Sales' by connecting with the StockDate column.
In this ERD diagram you may link the primary keys to other tables within the data model.
How to create visualizations
After you have created a data model, you may navigate to Report View on the left-hand side and create visualizations to help you to answer your business questions.
The dashboard will be peer reviewed for quality and finally published on Power BI Service.
Amazon Redshift blogs - How to get started
If you would to get started with Amazon Redshift, you may also refer to my blogs below:
References
Next Edition
In next week's edition we will be exploring:
Until the next lesson, Happy Learning ! ??
Next Month - AWS Summit ANZ 2024
Join us in Sydney for AWS Summit Sydney 2024 for two days of innovation and learning for:
Connect with your industry peers, AWS Community Builders and AWS Heroes in the Developer Lounge.
You may register at this link .
You may preview the agenda and navigate your favourite sessions, workshops and keynotes.
I hope to see you at AWS Summit ANZ 2024 in ICC Sydney next month where the tech community of cloud enthusiasts and developers will gather together and innovate.
Data Engineering | Databricks Data engineer |Data warehousing | Python | Spark | ETL
8 个月Actually data engineering
Passionate about data! | Lead Data Engineer | Data Architect | GCP | Snowflake | Teradata | Master Candidate at Global Data Management | Scrum Master
8 个月Diego Armando Gonzalez Pinzon