Round 1 of the Bout - Microsoft R Server vs RStudio

Yogesh Kulkarni

Co-Founder and Chief Technology Officer at Ellicium Technology Solutions

Published: April 7, 2016

In my earlier blog “Can Microsoft R server turbocharge Analytics Workloads”, I explained at length why we wanted to run our own tests to assess whether Microsoft R Server can handle workloads better than RStudio Open Source. The first round of testing is complete, and the results are slightly disappointing: going by a few claims in the datasheet, they are nowhere near what we were expecting! Is there a reason for it? Let’s find out…

To recap what we did for the Testing:

Sandbox instance details –

Machine – Intel Core i7 quad-core 64-bit processor, 16 GB RAM
R version – 3.0.1
MySQL – 5.6.12

Flow of the R code used for testing –

i) Load the required R packages
ii) Connect to a table containing varying loads (a few million rows) and retrieve selective data
iii) Process the data based on selective business rules and analytics algorithms in one of two ways:
   a) Process the data and load it into another table using one-row-at-a-time processing
   b) Hold the data in a data frame, process all rows, and load the result into another table in bulk
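
The R code itself is not reproduced in this post. As a rough, language-neutral sketch of the two processing styles in the flow above, the Python below uses the standard-library sqlite3 module as a stand-in for MySQL. The table names, the column names and the business_rule function are all hypothetical; they only show where the two approaches differ, and are not the rules or schema used in our tests.

```python
import sqlite3
import time


def make_source(conn, n_rows):
    """Create a hypothetical source table and two empty target tables."""
    conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO source (id, amount) VALUES (?, ?)",
        ((i, float(i)) for i in range(n_rows)),
    )
    conn.execute("CREATE TABLE target_row (id INTEGER, score REAL)")
    conn.execute("CREATE TABLE target_bulk (id INTEGER, score REAL)")
    conn.commit()


def business_rule(amount):
    """Hypothetical rule standing in for the real business logic."""
    return amount * 1.1 if amount >= 100 else amount


def process_row_at_a_time(conn):
    """Process each row and write it to the target table individually."""
    rows = conn.execute("SELECT id, amount FROM source").fetchall()
    for row_id, amount in rows:
        conn.execute(
            "INSERT INTO target_row (id, score) VALUES (?, ?)",
            (row_id, business_rule(amount)),
        )
        conn.commit()  # one write and one commit per row
    return len(rows)


def process_in_bulk(conn):
    """Hold all rows in memory, process them, then write them in one batch."""
    rows = conn.execute("SELECT id, amount FROM source").fetchall()
    processed = [(row_id, business_rule(amount)) for row_id, amount in rows]
    conn.executemany(
        "INSERT INTO target_bulk (id, score) VALUES (?, ?)", processed
    )
    conn.commit()  # a single commit for the whole batch
    return len(processed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_source(conn, 10_000)
    for label, func in [("row-at-a-time", process_row_at_a_time),
                        ("bulk", process_in_bulk)]:
        start = time.perf_counter()
        count = func(conn)
        print(label, count, "rows,", round(time.perf_counter() - start, 3), "s")
```

Both functions produce the same target rows; the only difference is how many round trips are made to the database. That difference is what the two test variants are meant to isolate.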

Test Execution – The above R code was run in both RStudio and Microsoft R Server on 10k, 50k and 100k rows, using one-row-at-a-time processing.

Test Results -

Observations –

i) For lower data volumes (10k and 50k rows), Microsoft R Server seems to run 6% faster than RStudio when processing one row at a time. A difference of 6%, however, is not very significant.
ii) For the higher data volume (100k rows), again with one-row-at-a-time processing, RStudio seems to perform on par with Microsoft R Server.

Conclusion –

For one-row-at-a-time processing, Microsoft R Server seems to get little opportunity to parallelise the processing and make use of disk. As a result, the performance improvement is not very significant. In other words, the one-row-at-a-time approach does not exercise Microsoft R Server’s processing capabilities.
For one-row-at-a-time processing, run times scale linearly, i.e. as the workload increases, the run times increase proportionately. In other words, performance for higher row volumes can be predicted with a good degree of precision.
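
If run times really do grow linearly with row count, a straight-line fit through a few measured runs is enough to estimate a larger one. Here is a minimal Python sketch, assuming an ordinary least-squares line. The function names are illustrative, and the timings in the usage example are made-up placeholders, not our test results.

```python
def fit_line(rows, seconds):
    """Least-squares fit of seconds = intercept + slope * rows."""
    n = len(rows)
    mean_x = sum(rows) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rows, seconds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_seconds(slope, intercept, n_rows):
    """Estimate the run time for n_rows from the fitted line."""
    return intercept + slope * n_rows


if __name__ == "__main__":
    # PLACEHOLDER timings for illustration only; substitute real measurements.
    row_counts = [10_000, 50_000, 100_000]
    run_seconds = [12.0, 60.0, 120.0]
    slope, intercept = fit_line(row_counts, run_seconds)
    print("estimated seconds for 500k rows:",
          predict_seconds(slope, intercept, 500_000))
```

The further the target volume lies beyond the measured range, the less such an extrapolation should be trusted.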

Next Steps – We will now change the approach used in the R code. Instead of one-row-at-a-time processing, we will load the data into an R data frame and have R process it in bulk. Let’s see if Microsoft R Server gets an opportunity to prove its prowess!
