Power BI: Python pandas in R ggplot2 charts in Power BI reports
Art Tennick
??3 US Patents in BI??technical editor 4 Packt Power BI books??author 20 computer books??Power BI??Analysis Services??Python/R in Power BI??Paginated??MDX DAX SQL Python R TMDL??ex-university SQL lecturer??35 years in BI
Uses Adventure Works DW and shows correlation between units sold and revenue. The integration of Python and R opens up the world of data science from within a Power BI report. You can use CNTK, TensorFlow, Keras, RevoScaleR, revoscalepy, and thousands of Python and R algorithms to return data frames. Then use R ggplot2, or Python ggplot, or Python matplotlib to display the results in charts. There is a lot of code below - I have opted for py_run_string() in the R reticulate package. But I could have used r_to_py and py_to_r, or simply import("pandas") in R - I don't think repl_python() will work in a stored procedure? Three lots of code for SQL Server 2017/Power BI, SQL Server 2016/Power BI, and Power BI alone. The latter will be slower as the Python/R is evaluated in an R Script visual. Apologies - impossible to format code in LinkedIn:
SQL Server 2017
truncate table PythonTable
insert into PythonTable
exec sp_execute_external_script
@language = N'Python'
, @script =
N'
PythonData = AdventureWorks
PythonData = PythonData[[0, 2, 3, 4]]
OutputDataSet = PythonData',
@input_data_1 = N'select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units,
convert(float, sum([SalesAmount])) as Revenue from [dbo].[DimGeography] as G inner join [dbo].[DimCustomer] as C on G.GeographyKey = C.GeographyKey
inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey where [EnglishCountryRegionName] in(''Canada'', ''United States'')
group by [EnglishCountryRegionName], [StateProvinceName], [City]',
@input_data_1_name = N'AdventureWorks';
exec sp_execute_external_script
@language = N'R'
, @script =
N'
library(ggplot2)
library(scales)
RGraphic <- ggplot(PythonData, aes(x = Revenue, y = Units, colour = Country, shape = Country)) + geom_point(size = 0.5) + geom_smooth(size = 0.3) + facet_wrap(~ Country) + scale_x_continuous(labels = comma) + theme(legend.position =
"none") + theme(axis.text.y = element_text(colour = "blue")) + theme(axis.text.x = element_text(colour = "blue", angle = 45, hjust = 1, vjust = 1)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
scale_colour_manual(values = c("red", "darkgreen"))
png("c://users//art//desktop//PythonR2017.png", width = 1200, height = 700, res = 300)
print(RGraphic)',
@input_data_1 = N'select * from PythonTable',
@input_data_1_name = N'PythonData';
SELECT
'data:image/png;base64,' + cast('' as xml).value('xs:base64Binary(sql:column("X.BulkColumn"))', 'varchar(max)') as [Image]
FROM
OPENROWSET(BULK N'C:\users\art\desktop\PythonR2017.png', SINGLE_BLOB) X
SQL Server 2016
exec sp_execute_external_script
@language = N'R'
, @script =
N'
library(reticulate)
library(ggplot2)
py_run_string("import pypyodbc")
py_run_string("connection = pypyodbc.connect(''Driver={SQL Server};''''Server=localhost;''''Database=adventureworksdw2014;''''trusted_connection=yes'')")
py_run_string("import pandas as pd")
py_run_string("PythonData = pd.read_sql_query(''select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units, sum([SalesAmount]) as Revenue from [dbo].[DimGeography] as G inner join [dbo].
[DimCustomer] as C on G.GeographyKey = C.GeographyKey inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey group by [EnglishCountryRegionName], [StateProvinceName], [City]'', connection)")
py_run_string("PythonData = PythonData[[0, 2, 3, 4]]")
py_run_string("PythonData = PythonData[PythonData.country.isin([''Canada'', ''United States''])]")
RGraphic <- ggplot(py$PythonData, aes(x = revenue, y = units, colour = country)) + geom_point() + geom_smooth() + facet_wrap(~ country)
png("c://users//art//desktop//PythonR2016.png", width = 900, height = 700, res = 200)
print(RGraphic)';
SELECT
'data:image/png;base64,' + cast('' as xml).value('xs:base64Binary(sql:column("X.BulkColumn"))', 'varchar(max)') as [Image]
FROM
OPENROWSET(BULK N'C:\users\art\desktop\PythonR2016.png', SINGLE_BLOB) X
Power BI
library(reticulate)
library(ggplot2)
py_run_string("import pypyodbc")
py_run_string("connection = pypyodbc.connect('Driver={SQL Server};Server=localhost;Database=adventureworksdw2014;trusted_connection=yes')")
py_run_string("import pandas as pd")
py_run_string("PythonData = pd.read_sql_query('select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units, sum([SalesAmount]) as Revenue from [dbo].[DimGeography] as G inner join [dbo].
[DimCustomer] as C on G.GeographyKey = C.GeographyKey inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey group by [EnglishCountryRegionName], [StateProvinceName], [City]', connection)")
py_run_string("PythonData = PythonData[[0, 2, 3, 4]]")
py_run_string("PythonData = PythonData[PythonData.country.isin(['Canada', 'United States'])]")
ggplot(py$PythonData, aes(x = revenue, y = units, colour = country)) + geom_point() + geom_smooth() + facet_wrap(~ country)
Sr Cloud Solution Architect - Data & AI, Author, Speaker, MCSE
6 年I like what Microsoft has done in SQL Server 2017 with Machine Learning Services. I believe native scoring and real-time scoring are game changers!