Larger clouds over the Big Data landscape - Five main trends at Big Data Tech Warsaw 2021 - Part 4/5.
Each day this week I'll share one of the five Big Data trends that will be covered in detail by numerous presentations at the forthcoming edition of Big Data Tech Warsaw (February 25-26th, 2021).
Trend 4.
Larger clouds over the Big Data landscape
Ten years ago, only a few companies (e.g. Netflix) ran their Big Data infrastructure and pipelines in the public cloud. The default way to build Big Data solutions was to use on-premise infrastructure and an ecosystem of open-source components. In 2012 and 2013, we even saw companies that tried the public cloud and then migrated back to their own datacenters due to high costs, problems with elasticity, and service unavailability.
Many people also said that the public cloud was very expensive, no matter how you calculated the costs.
This started changing in 2014, when Google and Microsoft began to compete with Amazon. In my personal opinion, a big milestone for cloud providers was convincing Spotify to move from its large on-premise, open-source data infrastructure to the public cloud. This sent a message to the Big Data community that the public cloud offers several important benefits that companies like Spotify are willing to pay for.
The trend has accelerated since then. In 2020, adoption of public cloud solutions grew even faster (at least in Poland), and today the public cloud is used by a wide range of companies in almost every sector, including banking and insurance.
Public Cloud at Big Data Tech Warsaw
During the Big Data Tech Warsaw conference, we will have many presentations related to the use of the public cloud. Here I highlight a few examples:
- Outfit7 develops highly popular mobile apps such as My Talking Tom 2. On average, they collect and analyze 3 TB of gaming events per day in Google Cloud using Kubernetes, Dataflow, BigQuery, Cloud Composer, Jupyter, and Tableau. We will hear what their cloud-based architecture looks like, how their end-to-end real-time pipelines are implemented, and how their team fights downtime with proactive monitoring and integration tests. Last but not least, you'll hear about the challenges Outfit7 faced when the amount of data it had to handle skyrocketed during the peak of the COVID-19 quarantine.
- Several other companies will speak about how they use Google Cloud. For example, Sotrender (a company that analyzes social media data) will share how they train and deploy their machine learning models on Google Cloud Platform (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD), covering the full lifecycle of an ML model: from experimentation, through training and deployment, to model monitoring. TalentAlpha will explain how they analyze HR-related data on top of Google Cloud Platform, which can be used for skills analysis, psychometrics, assessment of specialists, recruitment, career guidance, and more. Allegro (the largest Polish e-commerce platform) will introduce BigFlow, an open-source Python framework for data processing on Google Cloud Platform. In some respects, BigFlow is similar to Scio (developed by Spotify).
- Of course, fans of other public clouds will find something for themselves. A number of presentations describe solutions based on AWS. Simply Business (an online broker of business insurance) has built a customer data platform on top of AWS, where they combine data points from different services and use them for scoring, personalization, and calculating critical business metrics. In their talk, they will focus on how they implement stateful applications using Kafka Streams, and they will share the lessons learned from running such applications in production for 2 years. I also recommend looking at Simply Business's slides from 2016 to see what their stack looked like back then and which business use cases they implemented using a mix of open-source and AWS technologies. These use cases apply to many other companies, as they are about building a customer data platform, scoring, omnichannel, and personalization. Please note that these slides are from 2016; during the conference, we will learn what the platform looks like today (there have been many changes!).
- Everyone knows that the pay-as-you-go model has pros and cons. You pay only for what you use, but if you don't control costs, don't use cloud services efficiently, or don't know common optimization techniques, you can end up paying more than you need to. Nowa Era (the biggest educational publisher in Poland) will describe their statistical models (ARIMA) and techniques for predicting AWS Spot instance prices, which help achieve impressive cost savings for their Big Data infrastructure (up to 80% compared to on-demand instances). This talk will interest everyone who wants to learn how to use the public cloud more cost-efficiently.
- Last but not least, we will also hear about production use cases built on top of Azure. H&M will describe their multi-year AI/ML journey in the public cloud (Azure, Databricks) and explain how their architecture has evolved over time. They will cover the entire MLOps stack and address a few common challenges in building AI and machine learning products, such as development efficiency, end-to-end traceability, and speed to production.
These are only a few highlighted examples, but you will definitely learn a lot about the public cloud (GCP, AWS, Azure) at Big Data Tech Warsaw 2021 (February 25-26th).
What's next?
On Friday, I will share the 5th trend. Please stay tuned!
In the meantime, I encourage you to check our agenda and register before January 15th to take advantage of the New Year Promotion.
As you might expect, this year the conference will be organized as an interactive online event. Please check my recent blog post, which explains how COVID-19 changes Big Data Tech Warsaw 2021 but also makes it greater at the same time.
If you like this post, please like it, share it, or leave a comment. Thanks!