Better import data with Power Query

1) About Power Query

  • Power Query is an ETL (Extract, Transform, Load) tool developed by Microsoft, designed to help users efficiently manage and transform data from a wide range of data sources.
  • It is integrated across the Microsoft ecosystem, including Excel, Power BI, Analysis Services, Azure Data Factory, Power Apps, Power Automate, Fabric, and more.
  • It is known for a user-friendly interface that caters to both beginners and experienced data professionals: an intuitive graphical interface (drag-and-drop and data preview, accessible for beginners) and advanced capabilities with the M language (for experienced users); a minimal M example follows below.
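
To give a feel for the M language behind the UI, here is a minimal sketch of a hand-written query; the file path, sheet name, and the [Amount] column are placeholder assumptions, not part of the original article:

// minimal M sketch: load one worksheet, promote headers, filter rows
let
    Source = Excel.Workbook(File.Contents("C:\Users\username\Downloads\Sample\test 1.xlsx"), null, true),
    Sheet = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(Sheet),
    FilteredRows = Table.SelectRows(PromotedHeaders, each [Amount] > 0)
in
    FilteredRows

Every step the UI records becomes one named step in this let ... in expression, which is why hand-edited queries stay readable.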


2) Better ways to import data from a wide range of data sources

Local Folder

  • DON'T:

- Use Sample Queries (Helper Queries) to import files from a folder: they create a messy query structure, are difficult to debug, and leave a long list of "applied steps"

- Forget to promote headers before the combine step, which leads to data being combined incorrectly

- Use the "Expand" button in the Power Query UI, which hard-codes the list of column names in the syntax => errors occur when those columns later disappear from the source

  • DO:

- Use "Folder.Files" and leverage a custom column to keep the transformation steps clean

- Promote headers before combining tables

- Use "Table.Combine" to combine the data

// Excel: replace the folder path in Folder.Files
// remember to filter Sheet Name & Kind = "Sheet"
let
    Source = Folder.Files("C:\Users\username\Downloads\Sample"),
    // parse each workbook from its binary Content
    GetData = Table.AddColumn(Source, "GetData", each Excel.Workbook([Content])),
    #"Expanded GetData" = Table.ExpandTableColumn(GetData, "GetData", {"Name", "Data", "Kind"}, {"Name.1", "Data", "Kind"}),
    // keep only the target sheet in each workbook
    #"Filtered Rows" = Table.SelectRows(#"Expanded GetData", each [Name.1] = "Dictionary"),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Kind] = "Sheet"),
    #"Added Custom1" = Table.AddColumn(#"Filtered Rows1", "PromoteHeaders", each Table.PromoteHeaders([Data])),
    ExpandTable = Table.Combine(#"Added Custom1"[PromoteHeaders])
in
    ExpandTable
// CSV combine
// for Parquet files, use Parquet.Document([Content]) instead of Csv.Document([Content])
let
    Source = Folder.Files("C:\Users\username\Downloads\Sample csv"),
    GetData = Table.AddColumn(Source, "GetData", each Csv.Document([Content])),
    #"Added Custom1" = Table.AddColumn(GetData, "PromoteHeaders", each Table.PromoteHeaders([GetData])),
    ExpandTable = Table.Combine(#"Added Custom1"[PromoteHeaders])
in
    ExpandTable
(Screenshot: Import files from folder)
Local File

  • Use the Power Query UI to import: Import > Choose Excel/CSV/...
  • Note:

- With Excel, filter "Kind" = "Sheet" or "Table" before expanding the Excel data

- With CSV, remove [Columns=2] from the Source step syntax, or remember to adjust this parameter whenever a new column appears in the CSV source (see the sketch after the examples below)

Excel: 
Source = Excel.Workbook(File.Contents("C:\Users\username\Downloads\Sample\test 1.xlsx"), null, true)

CSV:
Source = Csv.Document(File.Contents("C:\Users\username\Downloads\Sample csv\test 1.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])

Parquet:
Source = Parquet.Document(Web.Contents(url_path))
[Home > Import > Parquet > paste url_path]        
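
For reference, the UI-generated CSV Source step usually pins the column count with a Columns option, which is what the note above suggests removing; a minimal before/after sketch (the path is the placeholder used earlier, and the exact options depend on your file):

// UI-generated step locked to 2 columns: a new 3rd column in the file would be dropped
// Source = Csv.Document(File.Contents("C:\Users\username\Downloads\Sample csv\test 1.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None])

// without Columns, new columns in the source flow through automatically
Source = Csv.Document(File.Contents("C:\Users\username\Downloads\Sample csv\test 1.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])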


SharePoint Connection

  • DON'T: Use SharePoint.Files, or Home > Import > SharePoint folder > paste Site_url => time-consuming when the SharePoint site is large in data size
  • DO: Use SharePoint.Contents for faster access to the data you need (limitation: 5,000-item list view threshold); a sketch of combining files this way follows the snippets below

// Folder: drill into nested folders by their Name
let
    Source = SharePoint.Contents("https://sharepoint.com/sites/sample_site",[ApiVersion=15]),
    filterFolder = Source{[Name="folder1"]}[Content]{[Name="folder2"]}[Content]
in
    filterFolder

// Excel file
Source = Excel.Workbook(Web.Contents("https://sharepoint.com/sites/sample_site/sample.xlsx"), null, true)

// CSV file
Source = Csv.Document(Web.Contents("https://sharepoint.com/sites/sample_site/sample.csv"))

// Parquet file
Source = Parquet.Document(Web.Contents(url_path))
(Screenshot: SharePoint connection)
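
Building on the folder drill-down above, here is a sketch of combining every Excel file in one SharePoint folder with the same pattern as the local-folder query; the site URL, folder name, and sheet handling are placeholders, and it assumes the folder contains only Excel workbooks:

let
    Source = SharePoint.Contents("https://sharepoint.com/sites/sample_site",[ApiVersion=15]),
    // drill into the target folder
    Files = Source{[Name="folder1"]}[Content],
    // same pattern as the local folder: parse each workbook, keep sheets, promote headers, combine
    GetData = Table.AddColumn(Files, "GetData", each Excel.Workbook([Content])),
    #"Expanded GetData" = Table.ExpandTableColumn(GetData, "GetData", {"Name", "Data", "Kind"}, {"SheetName", "Data", "Kind"}),
    Sheets = Table.SelectRows(#"Expanded GetData", each [Kind] = "Sheet"),
    Promoted = Table.AddColumn(Sheets, "PromoteHeaders", each Table.PromoteHeaders([Data])),
    Combined = Table.Combine(Promoted[PromoteHeaders])
in
    Combined
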
Azure Data Lake Storage

  • DO:

// Tree-like directory view
let
    Source = AzureStorage.DataLake(
        "https://<storagename>.dfs.core.windows.net/sample/sample1",
        [HierarchicalNavigation = true]),
    filterFolder = Source{[Name="folder1"]}[Content]{[Name="folder2"]}[Content]
in
    filterFolder

// Flatten
Source = AzureStorage.DataLake(
        "https://<storagename>.dfs.core.windows.net/sample/sample1")        
Other connections

  • Live connection to AAS or a Power BI dataset on the Power BI Service
  • DirectQuery to SQL Server, SAP, etc.
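
These storage modes are chosen in the client (e.g. Power BI Desktop) rather than written in M, but the underlying connector calls look roughly like this; a sketch with placeholder server and database names:

// SQL Server (Import or DirectQuery is selected in Power BI, not in the M code)
Source = Sql.Database("servername", "databasename")

// Azure Analysis Services model (used by a live connection)
Source = AnalysisServices.Database("asazure://region.asazure.windows.net/servername", "modelname")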



If you're using Power Query to connect to the data sources mentioned above, I encourage you to try these recommendations. They can help clean up your workflow and improve efficiency. If you have any other best practices or suggestions, please share them with me. Let's work together to make data processing even better!

Le Ngoc Phuong Trinh
