Summary of the 6th Community over Code Performance Engineering Track (October 7, 2024, Denver, Colorado, USA)
After much anticipation, the 6th Community over Code Performance Engineering track was held on October 7, 2024 in Denver, Colorado, USA. I've only just returned to Australia (hence the slight delay in writing this track report), but after the conference, I tracked down my new favourite steam locomotive, the Union Pacific "Big Boy" in the Forney Museum of Transportation. This is one massive locomotive (impossible to photograph in practice): it is over 40m long, has 16 drive wheels (4-8-8-4 class), weighed 1.2 million pounds (600 tons), and produced a whopping 7,000 horsepower - way more than diesel locomotives of the time. The Big Boys were probably the highest-performing steam locomotives ever built, and one would be a good candidate for the new Performance Engineering track train mascot!
This time around Roger Abelenda and I (Paul Brebner) were the co-chairs. I briefly introduced the track and explained the motivation (Apache projects have many performance and scalability challenges, some projects have solutions in the form of tools and best-practices and experiences that can be shared, and there are plenty of opportunities for cross-fertilization, particularly between old and new projects, including incubator projects).
From an innovation perspective, I have been hoping for talks that explore Open Source + Performance innovation (e.g. code analysis, simulation, etc.) and noted that we have had one talk in the past that came close (byte code analysis for Camel), and that LLMs are likely to have an impact in the future.
The talks this time around were:
Paul Brebner (co-chair), Making Apache Kafka even faster and more scalable
Roger Abelenda (co-chair), Skywalking Copilot: A performance analysis assistant
Ritesh Shukla, Tanvi Penumudy, presented by Ethan Rose, Overview of tools, techniques and tips - Scaling Ozone performance to max out CPU, Network and Disk
Shawn McKinney, Load testing with Apache JMeter
Chaokun Yang, Introduction to Apache Fury Serialization
My talk on Kafka performance highlighted the performance impact of recent Kafka architectural changes (KRaft and Tiered storage), with a summary of Zipf's law and Kafka cluster size distribution from my C/C EU talk. Some general conclusions included that Kafka is still hard to benchmark, we need more "science" to compare results, benchmarking of cached/tiered systems is (still) tricky, etc.
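For readers who have not met Zipf's law, it says that the size of the item at rank r is roughly proportional to 1/r^s, so a few large items dominate and there is a long tail of small ones. The following is a minimal illustrative sketch only, not taken from the talk: the function name `zipf_shares`, the exponent s = 1 and the choice of ten ranks are assumptions made for the example, and the output is not measured Kafka cluster data.

```python
def zipf_shares(n, s=1.0):
    """Return the share of the total held by ranks 1..n under Zipf's law.

    Hypothetical helper for illustration: rank r gets weight 1 / r**s,
    and the weights are normalised so the shares sum to 1.
    """
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Assumed example: ten ranked clusters with exponent s = 1.
    for rank, share in enumerate(zipf_shares(10), start=1):
        print(f"rank {rank:2d}: {share:.1%} of total")
```

With s = 1 the top-ranked item is twice the size of the second and three times the size of the third, which is the heavy-headed shape the law describes.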
Roger's talk on Apache Skywalking copilot ticked the boxes for open source performance engineering innovation for me, and also used LLMs! He demonstrated a new performance assistant for Apache Skywalking (an APM tool) that can help users find and analyse alarms, traces, metrics, topologies and metric charts, and talk generally about performance. This was very clever and has enormous potential I think (and all made possible by open source - there's probably no way this could be done as easily - if at all - with closed-source APM tools).
Next up we had another in a series of great talks on Ozone performance - unfortunately Ritesh and Tanvi couldn't attend in person, but Ethan Rose did a great job presenting, with Ritesh joining virtually for Q&A at the end. Ethan covered flamegraphs vs. metrics (and why metrics plus flamegraphs are better), how to design good dashboards (redundancy is ok, LLMs are your friend for Grafana), the best order for tooling, and challenges/solutions scaling open source projects - all lessons learned from performance engineering of Ozone but widely applicable to other projects!
Shawn McKinney gave a great introduction to Apache JMeter, covering an overview of load testing in general, and going into more depth on load testing an LDAP system. This type of talk demonstrates the value of good introductory material with examples as not everyone is always at the same level across projects - I know I learned a lot. Note to self - maybe I should do an introduction to Performance Engineering next time!
We had our 2nd Shawn (Shawn Yang) present the final talk of the day, on Apache Fury, a blazingly-fast multi-language serialization framework. This was a great talk on performance by design, and it ticked yet another box for me as our first presentation in the track (in the US at least) from an Incubator project - great work! I think it works so well because it uses best serialization/deserialization practices per data type combined with some other magic. The examples Shawn gave included Flink, so I wonder if it would also work with Kafka?
We lost a 6th talk in the track (due to visa issues), leaving me time to attend the final talk of the Streaming track, which also had a performance flavour: Matthias J. Sax, "The Nuts and Bolts of Kafka Streams: An Architectural Deep Dive" - Kafka Streams is a powerful technology for stream processing and we can look forward to ongoing improvements to scalability, reliability, high availability, performance, etc.
So, by the end of the day here is the list of technologies we've covered in this track so far:
Apache Kafka
Apache JMeter & Selenium
Kubernetes
Apache Arrow
Java Profiling
Apache Flink
Apache Spark/ML
Apache Hadoop
Apache Ozone
Apache Cassandra
Apache Camel
Apache Lucene
Apache Iceberg
Apache Impala
Oxia
Apache Skywalking
Apache Fury
Thanks again to the speakers, attendees (about 150 in total this time) and Apache Software Foundation Community over Code conference organisers. We hope to run the event again, so put your thinking caps on and start coming up with some possible talk titles and abstracts. If you would also like to be involved in reviewing etc., let us know.
The presentations will be available online eventually - I'll add them here when I find out where.
In the meantime here's a link to the intro and my talk: https://www.slideshare.net/slideshow/making-apache-kafka-even-faster-and-more-scalable/272645669
Just like Performance Engineering, driving the "Big Boy" locomotive was non-trivial - just look at all the controls (although the coal feed was automatic) - with the potential risk of the boiler exploding - fun!