Big Data tools at the Master in Business Analytics and Big Data at IE

Dennis Pedersen

Sales Strategy & Finance @Elastic | ex-AWS | MBA

Published: December 1, 2020

During the Master in Business Analytics and Big Data here at IE, we have been introduced to many different big data technologies. For the IE Big Data Club newsletter, I have summarized the tools we have seen so far and how we used them.

R: R is a programming language whose strengths lie in statistics, and it is a popular choice for data analysis. RStudio is the interface we use when programming in R. R is free software supported by a large community, and it comes with many predefined ‘packages’. During our studies we have also come across R in other subjects such as Time Series Forecasting, Recommendation Engines, and Social Network Analysis. One example of how we used R was to create a time series forecast for a stock price.

SQL: Structured Query Language (SQL) is a language for communicating with relational database management systems. We got familiar with creating, updating, and deleting tables in a database that holds important business information such as customer data. We also looked into extracting relevant business information from large databases, for example finding employee-specific information in your company. Overall, we got familiar with the structure of:

SELECT <select list> FROM <table> WHERE <predicates> GROUP BY <expression> HAVING <condition> ORDER BY <…>
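
As a minimal sketch of that clause order, the query below runs against an in-memory database using Python's built-in sqlite3 module. The employees table, its columns, and the sample rows are hypothetical, invented purely for illustration; they are not from our coursework.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 52000),
        ("Ben", "Sales", 48000),
        ("Cara", "Finance", 61000),
        ("Dev", "Finance", 43000),
        ("Eli", "IT", 38000),
        ("Fay", "IT", 45000),
    ],
)

query = """
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 40000          -- filter individual rows before grouping
GROUP BY department           -- one result row per department
HAVING COUNT(*) >= 2          -- filter whole groups after aggregation
ORDER BY avg_salary DESC      -- sort the final result
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Note the difference between the two filters: WHERE drops single rows (Eli's salary is too low), while HAVING drops entire groups (IT is left with only one qualifying employee).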

Hadoop: Hadoop, or rather Apache Hadoop, is software that allows large amounts of data to be processed across a network of many computers. The big data value chain consists of sources, ingestion, storage, processing, and serving, and Hadoop can be used for several of these parts, namely storage and processing. It is also open source, and its strength lies in batch processing.
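
To give a feel for this kind of batch processing, here is a rough single-machine sketch of the MapReduce model, which is the classic processing engine in Hadoop (the summary above does not name it, so treat the link as my own addition). This is plain Python, not Hadoop code: the input strings are hypothetical stand-ins for blocks of a file spread over several machines, and the function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical input: each string stands in for a block of a large file
# that would live on a different machine in a real cluster.
blocks = [
    "big data tools",
    "big data club",
    "data tools",
]


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Each block could be mapped independently (and therefore in parallel).
mapped = [pair for block in blocks for pair in map_phase(block)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

The point of the split is that the map step only ever looks at one block, so the work can be spread across many computers before the results are combined.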

Dataiku: Dataiku is a web-based platform that facilitates data analytics and machine learning models, amongst others. We used it for our machine learning classes as well as for data competitions. One project used machine learning models to predict housing prices in a given neighbourhood.

Python: Python is a programming language that can be used for many data science tasks such as data mining, data visualization, and machine learning. Like R, it is open source and supported by a global community. We used it for many different classes such as Recommendation Engines, Machine Learning, and Data Visualization.

NoSQL: Given the complexity of databases, SQL (even though it is still used by many) is not always enough. NoSQL refers to databases that do not follow the relational model SQL is built around; the name is also read as “Not Only SQL”. In broad terms, NoSQL databases can be divided into document-oriented, column-family, key-value store, and graph-oriented.
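
The four categories are easier to tell apart side by side. The sketch below models one hypothetical customer in each style, using plain Python structures as stand-ins for real databases; every key, field, and value is invented for illustration, and real products differ in the details.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ana", "city": "Madrid"}'}

# Document-oriented: a nested, self-describing record.
document = {
    "_id": 42,
    "name": "Ana",
    "city": "Madrid",
    "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Column-family: row key -> column family -> individual columns.
column_family = {
    "42": {
        "profile": {"name": "Ana", "city": "Madrid"},
        "activity": {"last_login": "2020-12-01"},
    }
}

# Graph-oriented: nodes plus labelled edges between them.
nodes = {42: {"name": "Ana"}, 43: {"name": "Ben"}}
edges = [(42, "FOLLOWS", 43)]

# A typical lookup in each model.
name_from_kv = json.loads(key_value["customer:42"])["name"]
total_items = sum(order["qty"] for order in document["orders"])
city_from_cf = column_family["42"]["profile"]["city"]
followed_by_ana = [
    nodes[dst]["name"] for src, rel, dst in edges if src == 42 and rel == "FOLLOWS"
]
print(name_from_kv, total_items, city_from_cf, followed_by_ana)
```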

Spark: Apache Spark, like Hadoop, is a tool for handling large amounts of data. However, unlike Hadoop, Spark is able to handle streaming data, in other words real-time data that is processed immediately. A use case for Spark could be fraud detection, where the incoming data needs to be processed immediately and cannot wait until the next batch in an hour.
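
The difference between batch and streaming can be sketched in a few lines of plain Python. This does not use the Spark API; it only imitates the two processing styles. The threshold rule, the function names, and the transactions are all hypothetical, chosen to mirror the fraud detection example.

```python
# Hypothetical rule: flag any transaction above this amount.
FRAUD_THRESHOLD = 1000.0


def is_suspicious(tx):
    """Return True if a transaction breaks the (made-up) fraud rule."""
    return tx["amount"] > FRAUD_THRESHOLD


def process_batch(transactions):
    """Batch style: wait for the whole dataset, then scan it in one pass."""
    return [tx["id"] for tx in transactions if is_suspicious(tx)]


def process_stream(transactions):
    """Streaming style: raise an alert as soon as each event arrives."""
    for tx in transactions:
        if is_suspicious(tx):
            yield tx["id"]


transactions = [
    {"id": "t1", "amount": 25.0},
    {"id": "t2", "amount": 4200.0},
    {"id": "t3", "amount": 310.0},
    {"id": "t4", "amount": 1500.0},
]

# Batch: alerts only exist once every transaction has been collected.
batch_alerts = process_batch(transactions)

# Stream: the first alert is available after only two events have been read.
stream = process_stream(iter(transactions))
first_alert = next(stream)
remaining_alerts = list(stream)
print(batch_alerts, first_alert, remaining_alerts)
```

Both styles find the same suspicious transactions; what changes is when the alert becomes available, which is exactly what matters in a fraud scenario.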

Power BI: Power BI is a popular data visualization tool, used for example to create dashboards that keep track of project developments. Data can be imported from Excel, amongst other sources, which makes it easy to use. We were introduced to Power BI in our Data Visualization class, but also used it for data competition projects.

Written by Dennis Pedersen
