Highlights from ApacheCon 2019
Many people reached out to me asking for more information about the latest edition of ApacheCon North America. Here are some highlights, mainly on Big Data and Analytics:
General trends
Containers are everywhere. Many of the projects have already added container support, or will add it, to take advantage of this technology. The last session I watched focused entirely on the move from the traditional Hadoop approach (storage and compute combined) to a "containerized" Hadoop, for flexibility and a cloud-like experience.
Apache Ozone
Cool presentation about Ozone and how it integrates into the Hadoop ecosystem. Ozone is an object-storage layer integrated with many of the Hadoop ecosystem projects. One of the presentations was given by Márton Elek, who is working mainly on making Ozone run on containers.
https://www.youtube.com/watch?v=ySDjSLeWzNw
Twitter DAL and Hybrid Architecture
Cool session on Twitter's challenges managing more than 1.5 trillion events per day (yes, 1.5 trillion daily). The interesting thing is the DAL (Data Access Layer), which works like a data catalog that points to the physical locations where data is stored. The other interesting part is how they synchronize their on-prem data stored in Hadoop with object storage in Google (GCS), and extend the DAL capabilities to map that storage layer too:
https://www.slideshare.net/lohitvijayarenu/managing-100s-of-petabytes-of-data-in-cloud
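To make the data catalog idea concrete, here is a minimal sketch of a catalog that maps a logical dataset name to physical locations in more than one storage layer. This is not Twitter's DAL: the class, the method names, the dataset name, and the URIs are all hypothetical, chosen only to illustrate the concept.

```python
from dataclasses import dataclass, field


@dataclass
class DataCatalog:
    """Toy catalog: logical dataset name -> physical location per storage layer."""

    _locations: dict = field(default_factory=dict)

    def register(self, dataset: str, layer: str, uri: str) -> None:
        # Record where a dataset physically lives in a given storage layer.
        self._locations.setdefault(dataset, {})[layer] = uri

    def resolve(self, dataset: str, prefer: tuple = ("hdfs", "gcs")) -> str:
        # Return the first registered location, following the preference order.
        layers = self._locations.get(dataset, {})
        for layer in prefer:
            if layer in layers:
                return layers[layer]
        raise KeyError(f"no physical location registered for {dataset!r}")


# Hypothetical dataset stored on-prem and replicated to cloud object storage.
catalog = DataCatalog()
catalog.register("logs.client_events", "hdfs", "hdfs://onprem-cluster/logs/client_events")
catalog.register("logs.client_events", "gcs", "gs://example-bucket/logs/client_events")

on_prem_uri = catalog.resolve("logs.client_events")
cloud_uri = catalog.resolve("logs.client_events", prefer=("gcs",))
```

The point of the design is that consumers ask for a dataset by name and never hard-code a path, so adding a new storage layer only means registering one more location.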
KSQL
Interesting real-time features with Kafka Streams and KSQL.
The First Mile, or just data ingestion :)
Awesome presentation about NiFi delivered by Andy LoPresto. It is very clear that the effort is going in the MiNiFi direction (more capabilities at the edge mean more data ingested and, consequently, more analytics).
https://www.youtube.com/watch?v=Txd5J7evxdE
Apache Griffin
Cool project that adds Data Quality capabilities for Big Data, including Hadoop, along with how eBay is using it and integrating it into their current architecture.
Productionizing Machine Learning and Others
Apache Marvin: Marvin is an awesome tool to help the data science (DS) team in the last mile of ML projects, deploying and monitoring models.
Apache MXNet: more focused on deep learning, MXNet is also a tool to help in the last mile of ML projects:
https://incubator.apache.org/projects/mxnet.html
Apache HiveMall: what do you think about running ML models with queries? Many of us would say it is a joke, right? That's the idea behind HiveMall. Mainly using UDFs in Hive, HiveMall proposes to democratize ML for traditional SQL users.
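As a rough illustration of the "ML through SQL UDFs" idea, the sketch below registers a scoring function as a user-defined function and calls it from a plain query. It uses Python's built-in SQLite module instead of Hive, and it does not use any HiveMall function: the `predict` UDF, the table, and the model weights are all made up for the example.

```python
import math
import sqlite3

# Hypothetical weights of an already-trained logistic regression model.
WEIGHTS = {"bias": -1.0, "clicks": 0.8, "visits": 0.3}


def predict(clicks: float, visits: float) -> float:
    """Score one row with the logistic (sigmoid) function."""
    z = WEIGHTS["bias"] + WEIGHTS["clicks"] * clicks + WEIGHTS["visits"] * visits
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument UDF named "predict".
conn.create_function("predict", 2, predict)

conn.execute("CREATE TABLE users (id INTEGER, clicks REAL, visits REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, 0.0, 0.0), (2, 5.0, 2.0)],
)

# A SQL user scores every row without leaving the query language.
rows = conn.execute(
    "SELECT id, predict(clicks, visits) AS score FROM users ORDER BY id"
).fetchall()
```

From the SQL user's point of view the model is just another function in the `SELECT` list, which is what makes this approach attractive to teams that already live in queries.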
Community Over Code: The Apache Way
Hope you enjoy the content!