Power BI: Python pandas in R ggplot2 charts in Power BI reports

Power BI: Python pandas in R ggplot2 charts in Power BI reports

Uses Adventure Works DW and shows correlation between units sold and revenue. The integration of Python and R opens up the world of data science from within a Power BI report. You can use CNTK, TensorFlow, Keras, RevoScaleR, revoscalepy, and thousands of Python and R algorithms to return data frames. Then use R ggplot2, or Python ggplot, or Python matplotlib to display the results in charts. There is a lot of code below - I have opted for py_run_string() in the R reticulate package. But I could have used r_to_py and py_to_r, or simply import("pandas") in R - I don't think repl_python() will work in a stored procedure? Three lots of code for SQL Server 2017/Power BI, SQL Server 2016/Power BI, and Power BI alone. The latter will be slower as the Python/R is evaluated in an R Script visual. Apologies - impossible to format code in LinkedIn:

SQL Server 2017

truncate table PythonTable

insert into PythonTable

exec sp_execute_external_script

     @language = N'Python'

     , @script =

N'

PythonData = AdventureWorks

PythonData = PythonData[[0, 2, 3, 4]]

OutputDataSet = PythonData',

@input_data_1 = N'select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units,

                 convert(float, sum([SalesAmount])) as Revenue from [dbo].[DimGeography] as G inner join [dbo].[DimCustomer] as C on G.GeographyKey = C.GeographyKey

                 inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey where [EnglishCountryRegionName] in(''Canada'', ''United States'')

                 group by [EnglishCountryRegionName], [StateProvinceName], [City]',

@input_data_1_name = N'AdventureWorks';

exec sp_execute_external_script

     @language = N'R'

     , @script =

N'

library(ggplot2)

library(scales)

RGraphic <- ggplot(PythonData, aes(x = Revenue, y = Units, colour = Country, shape = Country)) + geom_point(size = 0.5) + geom_smooth(size = 0.3) + facet_wrap(~ Country) + scale_x_continuous(labels = comma) + theme(legend.position =

"none") + theme(axis.text.y = element_text(colour = "blue")) + theme(axis.text.x = element_text(colour = "blue", angle = 45, hjust = 1, vjust = 1)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +

scale_colour_manual(values = c("red", "darkgreen"))

png("c://users//art//desktop//PythonR2017.png", width = 1200, height = 700, res = 300)

print(RGraphic)',

@input_data_1 = N'select * from PythonTable',

@input_data_1_name = N'PythonData';

SELECT

           'data:image/png;base64,' + cast('' as xml).value('xs:base64Binary(sql:column("X.BulkColumn"))', 'varchar(max)') as [Image]

      FROM

            OPENROWSET(BULK N'C:\users\art\desktop\PythonR2017.png', SINGLE_BLOB) X

SQL Server 2016

exec sp_execute_external_script

     @language = N'R'

     , @script =

N'

library(reticulate)

library(ggplot2)

py_run_string("import pypyodbc")

py_run_string("connection = pypyodbc.connect(''Driver={SQL Server};''''Server=localhost;''''Database=adventureworksdw2014;''''trusted_connection=yes'')")

py_run_string("import pandas as pd")

py_run_string("PythonData = pd.read_sql_query(''select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units, sum([SalesAmount]) as Revenue from [dbo].[DimGeography] as G inner join [dbo].

[DimCustomer] as C on G.GeographyKey = C.GeographyKey inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey group by [EnglishCountryRegionName], [StateProvinceName], [City]'', connection)")

py_run_string("PythonData = PythonData[[0, 2, 3, 4]]")

py_run_string("PythonData = PythonData[PythonData.country.isin([''Canada'', ''United States''])]")

RGraphic <- ggplot(py$PythonData, aes(x = revenue, y = units, colour = country)) + geom_point() + geom_smooth() + facet_wrap(~ country)

png("c://users//art//desktop//PythonR2016.png", width = 900, height = 700, res = 200)

print(RGraphic)';

SELECT

           'data:image/png;base64,' + cast('' as xml).value('xs:base64Binary(sql:column("X.BulkColumn"))', 'varchar(max)') as [Image]

      FROM

            OPENROWSET(BULK N'C:\users\art\desktop\PythonR2016.png', SINGLE_BLOB) X

Power BI

library(reticulate)

library(ggplot2)

py_run_string("import pypyodbc")

py_run_string("connection = pypyodbc.connect('Driver={SQL Server};Server=localhost;Database=adventureworksdw2014;trusted_connection=yes')")

py_run_string("import pandas as pd")

py_run_string("PythonData = pd.read_sql_query('select englishcountryregionname as Country, stateprovincename as State, City , sum(orderquantity) as Units, sum([SalesAmount]) as Revenue from [dbo].[DimGeography] as G inner join [dbo].

[DimCustomer] as C on G.GeographyKey = C.GeographyKey inner join [dbo].[FactInternetSales] as F on C.CustomerKey = F.CustomerKey group by [EnglishCountryRegionName], [StateProvinceName], [City]', connection)")

py_run_string("PythonData = PythonData[[0, 2, 3, 4]]")

py_run_string("PythonData = PythonData[PythonData.country.isin(['Canada', 'United States'])]")

ggplot(py$PythonData, aes(x = revenue, y = units, colour = country)) + geom_point() + geom_smooth() + facet_wrap(~ country)


Ryan Wade

Sr Cloud Solution Architect - Data & AI, Author, Speaker, MCSE

6 年

I like what Microsoft has done in SQL Server 2017 with Machine Learning Services. I believe native scoring and real-time scoring are game changers!

要查看或添加评论,请登录

Art Tennick的更多文章

社区洞察

其他会员也浏览了