Microsoft R Server Wins Round 2 of the Bout!

Yogesh Kulkarni

Co-Founder and Chief Technology Officer at Ellicium Technology Solutions

Published: April 21, 2016

In my earlier blogs “Can Microsoft R server turbocharge Analytics Workloads?” and “Round 1 of the Bout - Microsoft R Server vs RStudio”, I gave the background to the testing we are doing to see how well Microsoft R Server handles workloads compared with RStudio. The first round of results, which was based on processing data row by row, did not favour either Microsoft R Server or RStudio.

We ran the second round of testing using bulk processing of data, and the results are AMAZING! Microsoft R Server seems to have proven its prowess, and the results are tilted heavily in its favour.

To quickly recap what we did for the Testing:

Sandbox instance details –

Machine – Intel Core i7 Quad-core 64 bit processor, 16 GB RAM
R version – 3.0.1
MySQL – 5.6.12

Flow of the R code used for testing –

Load the required R packages
Connect to a table containing varying loads (a few million rows) and retrieve selected data
Process the data based on selected business rules and analytics algorithms in one of two ways -

Round 1 of testing - Process data and load it into another table using one-row-at-a-time processing
Round 2 of testing - Hold data in a Data Frame, process it for all rows and load it into another table in bulk
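
To make the difference between the two rounds concrete, here is a minimal sketch of both loading styles. It is not the R code we ran: it uses Python with an in-memory SQLite database as a stand-in for R and MySQL, and the table names, the `apply_rule` business rule and the 1,000-row load are all assumptions made purely for illustration.

```python
import sqlite3


def apply_rule(amount):
    # Hypothetical business rule (an assumption): add a 10% surcharge.
    return round(amount * 1.1, 2)


def make_tables(conn, n_rows):
    # Build a source table with n_rows rows and two empty target tables.
    conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO source VALUES (?, ?)",
        ((i, float(i)) for i in range(n_rows)),
    )
    conn.execute("CREATE TABLE target_row (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE target_bulk (id INTEGER, amount REAL)")
    conn.commit()


def load_row_by_row(conn):
    # Round 1 style: process one row, write it, commit, then move on.
    rows = conn.execute("SELECT id, amount FROM source").fetchall()
    for row_id, amount in rows:
        conn.execute(
            "INSERT INTO target_row VALUES (?, ?)",
            (row_id, apply_rule(amount)),
        )
        conn.commit()


def load_in_bulk(conn):
    # Round 2 style: hold all rows in memory, process them together,
    # then write the whole result in a single batch.
    rows = conn.execute("SELECT id, amount FROM source").fetchall()
    processed = [(row_id, apply_rule(amount)) for row_id, amount in rows]
    conn.executemany("INSERT INTO target_bulk VALUES (?, ?)", processed)
    conn.commit()


conn = sqlite3.connect(":memory:")
make_tables(conn, 1000)
load_row_by_row(conn)
load_in_bulk(conn)
row_count = conn.execute("SELECT COUNT(*) FROM target_row").fetchone()[0]
bulk_count = conn.execute("SELECT COUNT(*) FROM target_bulk").fetchone()[0]
```

Both functions produce the same target rows; the only difference is whether the work and the writes happen per row or once for the whole data set, which is the distinction between Round 1 and Round 2.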

Test Execution – The above R code was run using RStudio and Microsoft R Server for 10k, 50k and 100k rows, processing the data in bulk using R Data Frames.
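
For readers who want to run a similar comparison, a timing harness along these lines is one way to do it. This is a rough Python sketch, not our actual test script: `time_workload`, `speedup` and the two toy workloads are hypothetical names chosen for illustration, and the toy workloads simply stand in for "the same job run on two different engines".

```python
import time


def time_workload(run, sizes):
    # Return {size: elapsed seconds} for run(size) at each workload size.
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        run(size)
        timings[size] = time.perf_counter() - start
    return timings


def speedup(baseline, candidate):
    # Return {size: baseline time / candidate time} for shared sizes.
    return {
        size: baseline[size] / candidate[size]
        for size in baseline
        if size in candidate
    }


def sum_with_loop(n):
    # Toy workload A (an assumption): explicit Python loop.
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n):
    # Toy workload B (an assumption): the same result via the built-in sum.
    return sum(range(n))


sizes = [10_000, 50_000, 100_000]
loop_times = time_workload(sum_with_loop, sizes)
builtin_times = time_workload(sum_with_builtin, sizes)
ratios = speedup(loop_times, builtin_times)
```

A ratio of 4.0 at a given size would correspond to "4 times faster" in the sense used in this post. In practice it is worth repeating each run several times and comparing medians, since a single timing can be noisy.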

Test Results -

Observations –

For every workload from 10k to 100k rows, when processing data in bulk using R Data Frames, Microsoft R Server processes data much faster than RStudio.
As the workload increases, Microsoft R Server outperforms RStudio by a growing margin, as seen below –

for 10k rows, Microsoft R Server is 4 times faster!
for 50k rows, Microsoft R Server is 5 times faster!!
for 100k rows, Microsoft R Server is 6 times faster!!!

Conclusion –

1. For bulk data processing using R Data Frames, Microsoft R Server outperforms RStudio by a significant margin. This seems to be because it is able to parallelise data processing and make good use of all the available resources.

Looking at it another way - to make use of the Microsoft R Server capabilities, data needs to be processed in bulk wherever possible. Processing data one row at a time will not give significant performance benefits on Microsoft R Server.

2. As the volume of data being processed in bulk increases, the difference in performance becomes more and more significant. This means that the Microsoft R Server is able to handle large volumes of data much better than RStudio.

In other words – beyond a certain volume of data, RStudio might not be able to process data within acceptable time frames. It will be imperative to consider Microsoft R Server for such cases.

Summary - Microsoft R Server seems to be able to parallelise data processing, which gives superior results. It also makes good use of the available memory and processing power, which is essential when processing big volumes of data.

Next Steps – We plan to run similar tests for typical Machine Learning algorithms, e.g. Regression, Classification or Principal Component Analysis (PCA). It will be interesting to see how Microsoft R Server fares in these cases.

Samir Kumar Sahoo

Driving AI & Data Innovation | CEO @ Aptus Data Labs | Generative AI & Data Governance Advocate | Digital Transformation Leader

8 years ago

The RStudio and MS R Server architectures are different, so this comparison does not make sense from a parallelisation point of view. It is important to test the parallelisation capability of MS R Server against Spark/MLlib (or HP Vertica Distributed R, RapidMiner), or to test processing with a particular algorithm such as regression or MCA, on the same 4-core machine with a high volume of 5-10 million transactions on Hadoop.
