Highlights ApacheCon 2019

Many people reached out to me asking for more information about the latest edition of ApacheCon North America. Here are some highlights, mainly on Big Data and Analytics:

General trends

Containers are everywhere. Many projects have added, or plan to add, container support to take advantage of this technology. The last session I watched focused entirely on the shift from the traditional Hadoop approach (storage and compute combined) to a "containerized" Hadoop, for flexibility and a cloud-like experience.

Apache Ozone

Cool presentations about Ozone and how it integrates with the Hadoop ecosystem. Ozone is an object storage layer integrated with many Hadoop ecosystem projects. One of the presentations was given by Márton Elek, who works mainly on making Ozone run on containers.

https://www.youtube.com/watch?v=ySDjSLeWzNw

Twitter DAL and Hybrid Architecture

Cool session on Twitter's challenges managing more than 1.5 trillion events per day (yes, 1.5 trillion daily). The interesting thing is the DAL (Data Access Layer), which works like a data catalog that points to the physical locations where data is stored. The other interesting part is how they synchronize their on-prem data stored in Hadoop with object storage in Google Cloud (GCS), and extend the DAL's capabilities to map that storage layer too:

https://www.slideshare.net/lohitvijayarenu/managing-100s-of-petabytes-of-data-in-cloud
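To make the catalog idea concrete, here is a minimal sketch of a layer that maps a logical dataset name to the physical locations holding it, on-prem and in the cloud. This is not Twitter's DAL or its API: the class names, the `resolve` preference order, the dataset name, and the URIs are all hypothetical, chosen only to illustrate the concept.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One logical dataset and the physical locations that hold copies of it."""
    name: str
    # Maps a storage system label (e.g. "hdfs", "gcs") to a URI.
    locations: dict[str, str] = field(default_factory=dict)


class DataCatalog:
    """Hypothetical catalog: logical dataset name -> physical locations."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, name: str, system: str, uri: str) -> None:
        """Record that `name` is physically stored at `uri` on `system`."""
        entry = self._entries.setdefault(name, DatasetEntry(name))
        entry.locations[system] = uri

    def resolve(self, name: str, prefer: tuple[str, ...] = ("hdfs", "gcs")) -> str:
        """Return the first known location, following the preference order."""
        entry = self._entries[name]
        for system in prefer:
            if system in entry.locations:
                return entry.locations[system]
        raise LookupError(f"no known location for {name!r}")


# Made-up dataset and URIs: the same logical dataset lives in two places.
catalog = DataCatalog()
catalog.register("events.clicks", "hdfs", "hdfs://onprem-cluster/logs/clicks")
catalog.register("events.clicks", "gcs", "gs://example-bucket/logs/clicks")

print(catalog.resolve("events.clicks"))                   # on-prem copy
print(catalog.resolve("events.clicks", prefer=("gcs",)))  # cloud copy
```

The point of the indirection is that consumers ask for a dataset by name and never hard-code where it lives, so a copy synchronized to cloud storage can be added without changing the jobs that read it.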

KSQL

Interesting real-time features with Kafka Streams and KSQL:

https://www.slideshare.net/KaiWaehner/kafka-streams-vs-ksql-for-stream-processing-on-top-of-apache-kafka-142127337

The First Mile, or just data ingestion :)

Awesome presentation about NiFi delivered by Andy LoPresto. It is very clear that the effort is going in the MiNiFi direction (more capabilities at the edge mean more data ingested and, consequently, more analytics).

https://www.youtube.com/watch?v=Txd5J7evxdE

Apache Griffin

Cool project that adds Data Quality capabilities for Big Data, including Hadoop, with a look at how eBay uses it and integrates it into their current architecture:

https://www.slideshare.net/HadoopSummit/using-hadoop-to-build-a-data-quality-service-for-both-realtime-and-batch-data

https://griffin.apache.org/

Machine Learning Productionization and Others

Apache Marvin: an awesome tool that helps data science teams in the last mile of ML projects, deploying and monitoring models:

https://www.slideshare.net/DanielTakabayashiSof/marvin-platform-potencializando-equipes-de-machine-learning

https://www.slideshare.net/DanielTakabayashiSof/marvin-ai-an-open-source-platform-to-deploy-and-manage-machine-learning-models-99287684?next_slideshow=1

Apache MXNet: more focused on deep learning, MXNet is a tool that also helps in the last mile of ML projects:

https://incubator.apache.org/projects/mxnet.html

Apache HiveMall: what do you think about running ML models with queries? Many of us would say it is a joke, right? That's the idea behind HiveMall. Mainly using UDFs in Hive, HiveMall aims to democratize ML for traditional SQL users:

https://speakerdeck.com/takuti/apache-hivemall-query-based-handy-scalable-machine-learning-on-hive?slide=47
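To show what "ML through queries" can look like, here is a small sketch of the general idea: a model is exposed as a user-defined function, and scoring becomes a plain SELECT. This does not use HiveMall or Hive. It uses Python's built-in SQLite module as a stand-in, and the `predict` function, the table, and the weights are all made up for illustration.

```python
import math
import sqlite3

# Hypothetical, hand-picked weights for a tiny logistic-regression model.
WEIGHTS = {"bias": -1.0, "visits": 0.5, "purchases": 1.0}


def predict(visits: float, purchases: float) -> float:
    """Logistic-regression score, exposed to SQL as a user-defined function."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["visits"] * visits
        + WEIGHTS["purchases"] * purchases
    )
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name, taking two arguments.
conn.create_function("predict", 2, predict)

conn.execute("CREATE TABLE customers (id INTEGER, visits REAL, purchases REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 2.0, 0.0), (3, 4.0, 3.0)],
)

# Scoring is now an ordinary query that a SQL user can write.
rows = conn.execute(
    "SELECT id, predict(visits, purchases) AS score FROM customers ORDER BY id"
).fetchall()

for customer_id, score in rows:
    print(customer_id, round(score, 3))
```

The appeal for SQL users is that the model never leaves the query engine: filtering, joining, and scoring all happen in the same statement.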

Community Over Code: The Apache Way

Hope you enjoy the content!

Dinesh Chitlangia

Artificial Intelligence/Machine Learning, Distributed Systems, Apache PMC & Committer, Senior Member IEEE

5 years ago

Thanks for the shoutout for Ozone!
