IBM presents Predictive Analytics

Ruben Rabines

Lead Senior IT Recruiter | Following companies leveraging AR / VR / AI / Space / Robotics / Blockchain

Published: January 2, 2017

We held the last talk of our 2016 Data Science Meetup speaker series on December 15th with Jim Crozier from IBM Data Science, who gave us a great, detailed introduction to Python 2.0 with Spark 2.0 (PySpark) and showcased it by analyzing NFL data.

Jim presented IBM's approach to Apache Spark: power of data, simplicity of design, speed of innovation.

Apache Spark is an open-source cluster computing framework that uses in-memory processing to run analytic applications up to 100 times faster than other technologies currently on the market. It is known for its ease of use in creating algorithms that draw insight from complex data.

Jim also spoke about Spark Core and the different languages you can use with it, such as R, Python, and Scala. He told us that Scala has strong static types, so errors are raised at the compilation stage, which makes development easier, especially in big projects. Scala also runs on the JVM, so it is native to Hadoop. Hadoop matters because Spark is often run on top of Hadoop's filesystem, HDFS.

Scala interacts with Hadoop through Hadoop's native Java API. That's why it's very easy to write native Hadoop applications in Scala.

He also covered machine learning, which has come a long way from its early roots in classical math and statistics. Today's machine learning uses analytic models and algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. This means data analysts and scientists can teach computers to solve problems without having to recode rules each time a new data set is presented.

Using algorithms that learn by looking at hundreds or thousands of data samples, computers can make predictions based on these learned experiences to solve the same problem in new situations. And they’re doing it with a level of accuracy that is beginning to mimic human intelligence.
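To make the idea of learning from samples concrete, here is a minimal sketch in plain Python. It is not from Jim's talk: the data points and the helper names `fit_line` and `predict` are made up for illustration, and ordinary least squares stands in for whatever models were actually shown. The rule is estimated from the examples and never written by hand, then reused on an input the program has not seen.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate slope and intercept from examples using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training samples; the pattern behind them is never coded explicitly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)          # "learning" step
prediction = predict(model, 10.0)  # same problem, new situation
```

Presenting a new data set only requires calling `fit_line` again; no rules need to be recoded.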

IBM is helping organizations apply Machine Learning through the power of Apache Spark, bringing significant benefits to the analytics industry as companies increasingly make space for machine learning in the enterprise.

The demand for machine learning is booming!

Jim gave us some information about pandas DataFrames and explained that they are not part of the Spark library. pandas is an open source Python library for data analysis.
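As a rough illustration of that point, the sketch below uses pandas on its own, with no Spark involved. The table is invented for the example and is not the NFL data from the talk. In PySpark, a pandas DataFrame like this one can be handed to Spark with `SparkSession.createDataFrame`, and a Spark DataFrame can be brought back with `toPandas()`, but the two are separate object types from separate libraries.

```python
import pandas as pd

# Hypothetical game results; the values are invented for illustration only.
games = pd.DataFrame(
    {
        "team": ["A", "B", "A", "B"],
        "points": [21, 14, 28, 10],
    }
)

# A pandas DataFrame lives in the memory of a single Python process,
# so this aggregation runs locally rather than on a cluster.
avg_points = games.groupby("team")["points"].mean()
```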

Jim spoke about how IBM is mastering the art of data science via the IBM Data Science Experience, a new cloud-based, social workspace that helps data professionals consolidate, create, and collaborate across multiple open source tools such as R and Python.

Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. It has quickly become the largest open source community in big data, with over 1000 contributors from 250+ organizations.

It was a great evening. Thanks so much to David Smith for putting together such a great event, and to Jim for giving us so much information on Spark. We hope to have you again in 2017.
