?? Better import data with Power Query ??
Le Ngoc Phuong Trinh
Business & Data Analytics at Unilever Vietnam | PL-300 | SCM Digitalization
?? 1) About Power Query
?? 2) Better Import Data from Wide range of data sources
Local Folder ??
- Use Sample Queries (Helper Queries) to import files from a folder: Messy Query structure, Difficult to debug, Long list of "applied steps"
- Forget to Promote Headers before combining step which leads to wrong data combination
- Use "Expand" button in Power Query UI, which includes list of column names in syntax => Error can occur when unnecessary columns disappear
- Use "Folder.Files" and leverage Custom column to make clean transformation steps
- Promote Headers before combining tables
- Use "Table.Combine" to combine data
// replace folder path in Folder.Files
// with Excel, remember to filter Sheet Name & Kind = "Sheet"
let
Source = Folder.Files("C:\Users\username\Downloads\Sample"),
GetData = Table.AddColumn(Source, "GetData", each Excel.Workbook([Content])),
#"Expanded GetData" = Table.ExpandTableColumn(GetData, "GetData", {"Name", "Data", "Kind"}, {"Name.1", "Data", "Kind"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded GetData", each [Name.1] = "Dictionary"),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Kind] = "Sheet"),
#"Added Custom1" = Table.AddColumn(#"Filtered Rows1", "PromoteHeaders", each Table.PromoteHeaders([Data])),
ExpandTable = Table.Combine(#"Added Custom1"[PromoteHeaders]),
in
ExpandTable
// CSV combine
// Parquet: Parquet.Document([Content])
let
Source = Folder.Files("C:\Users\CustomerSC-Digital.V\Downloads\Sample csv"),
GetData = Table.AddColumn(Source, "GetData", each Csv.Document([Content])),
#"Added Custom1" = Table.AddColumn(GetData, "PromoteHeaders", each Table.PromoteHeaders([GetData])),
ExpandTable = Table.Combine(#"Added Custom1"[PromoteHeaders]),
in
ExpandTable
Local File ??
领英推荐
- With Excel, filter "Kind" = "Sheet" or "Table" before expanding Excel data
- With CSV, remove [Columns=2] in syntax Source, or remember to adjust this param when there is new column in CSV source
Excel:
Source = Excel.Workbook(File.Contents("C:\Users\username\Downloads\Sample\test 1.xlsx"), null, true)
CSV:
Source = Csv.Document(File.Contents("C:\Users\username\Downloads\Sample csv\test 1.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
Parquet:
Source = Parquet.Document(Web.Contents(url_path)
[Home > Import > Parquet > paste url_path]
SharePoint Connection ??
// Folder
let
Source = SharePoint.Contents("https://sharepoint.com/sites/sample_site",[ApiVersion=15]),
filterFolder = Source{[Name="folder1"]}[Content]{[Name="folder2"]}[Content]
in
filterFolder
// Excel file
Source = Excel.Workbook(Web.Contents("https://sharepoint.com/sites/sample_site/sample.xlsx"), null, true)
// CSV file
Source = Csv.Document(Web.Contents("https://sharepoint.com/sites/sample_site/sample.csv"))
//Parquet file
Source = Parquet.Document(Web.Contents(url_path)
Azure Data Lake Storage ??
? DO
// Tree-like directory view
let
Source = AzureStorage.DataLake(
"https://<storagename>.dfs.core.windows.net/sample/sample1",
[HierarchicalNavigation = true]),
filterFolder = Source{[Name="folder1"]}[Content]{[Name="folder2"]}[Content]
in
filterFolder
// Flatten
Source = AzureStorage.DataLake(
"https://<storagename>.dfs.core.windows.net/sample/sample1")
Other connection ??
?? If you’re using Power Query to connect to above-mentioned data sources, I encourage you to try my recommendations. They can help clean your workflow and improve efficiency. If you have any other best practices or suggestions, please share them with me. Let’s work together to make data processing even better!
?? Le Ngoc Phuong Trinh./
MSc. SCM @ Wirtschaftsuniversit?t Wien | Supply Chain Analytics | CPIM
7 个月Useful tips from you both on instagram and linkedin haha
HR Data Analyst @ AEON Vietnam | SQL, Python and Power BI
7 个月Bài này hay quá ch? ?úng cái e ?ang c?n ??