The Future Of Cloud-Based Data, Analytics, and Machine Learning: Highlights from AWS re:Invent 2022

Thank you for reading my latest article, The Future Of Cloud-Based Data, Analytics, and Machine Learning: Highlights from AWS re:Invent 2022. Here at LinkedIn and at Forbes I regularly write about management and technology trends.

To read my future articles simply join my network here or click 'Follow'. Also feel free to connect with me via Twitter, Facebook, Instagram, Slideshare or YouTube.

---------------------------------------------------------------------------------------------------------------

Many of those who follow my posts and blogs do so because they are interested in the latest developments in artificial intelligence (AI), data, and analytics. In this field, the AWS re:Invent keynote by Dr. Swami Sivasubramanian, Amazon Web Services' VP of Data and Machine Learning, is one of the most important events on the calendar.

I was lucky enough to be in Las Vegas for the leading cloud provider's annual conference and had the chance to hear all about the latest developments at AWS, as well as products launching now or in the near future.

In some ways, this year’s biggest announcements were about integrating services to make them work more seamlessly together. The big themes included data technology enabling new ideas and innovation, the democratization of data, and the ongoing evolution of low-code/no-code solutions within the space.

There was plenty of news about exciting new products and services, too, so I'll start with a rundown of the ones that I feel are likely to have the biggest impact.

Announcements

Before heading into the news and announcements, Sivasubramanian started with a dramatic introduction highlighting how some of history's greatest discoverers – including Archimedes and Newton – hadn't come up with their grand theories of concepts such as buoyancy and gravity out of thin air. Instead, they built – or iterated, to use modern parlance – on a body of knowledge that humanity developed over decades or centuries.

This set the scene nicely for the unveiling of a number of announcements that, while they may not seem earth-shattering in their own right, are designed to enable new thinking and innovation that could lead to real change.

One of the most interesting was the announcement of Amazon DataZone – a data-management-as-a-service platform designed to enable businesses to bring all of their data together for use anywhere within the organization, along with granular management of essential features such as permissioning, security, and data governance.

This is an acknowledgment – as well as a potential solution – from AWS of the fact that today’s organizations are often well aware of the importance of understanding and leveraging their data assets, but find it increasingly difficult as their data grows to petabyte or even exabyte scale, often spread across numerous departments, databases, and data siloes. Attempting to bring it all together so it can be useful for anyone who has ideas about what to do with it invariably means balancing trade-offs between security and accessibility. Not to mention the complications caused by having any number of different database formats, storage infrastructures, and taxonomies. With Amazon DataZone, the idea is that all of this can be managed through a unified web portal connecting to AWS services such as Amazon S3 (an object storage service) and Amazon Redshift (a cloud data warehouse).

Also hot off the press at AWS re:Invent was news that Amazon’s machine learning platform Amazon SageMaker will now support geospatial data – in the US Western region, at least, where the feature has now gone into preview mode. Geospatial data includes GPS coordinates, satellite images, and map data – basically anything related to location. Why is this noteworthy? Well, location data is increasingly playing a role in any number of environmental and commercial use cases, where visualizing geographical features such as urban growth, vegetation, elevation, and transport networks can help with the planning and delivery of ecological initiatives, infrastructure delivery, and civil engineering. One feature that AWS showed off in its demonstration is the ability to remove cloud coverage obscuring ground details from satellite images.
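Since the service itself is only in preview, here's a flavor of the kind of per-pixel math involved in vegetation analysis: a minimal, plain-Python sketch of the Normalized Difference Vegetation Index (NDVI), a standard measure computed from a satellite image's red and near-infrared bands. This is purely illustrative and is not the SageMaker geospatial API.

```python
# Conceptual illustration (not the SageMaker geospatial API): NDVI is a
# standard per-pixel vegetation measure computed from the red and
# near-infrared (NIR) bands of a satellite image.

def ndvi(red: float, nir: float) -> float:
    """Return NDVI in [-1, 1]; values near 1 indicate dense vegetation."""
    if red + nir == 0:  # avoid division by zero over no-data pixels
        return 0.0
    return (nir - red) / (nir + red)

# Toy 2x2 "image": each pixel is a (red, nir) reflectance pair.
pixels = [(0.1, 0.6), (0.2, 0.3), (0.4, 0.4), (0.05, 0.55)]
ndvi_map = [round(ndvi(r, n), 2) for r, n in pixels]
print(ndvi_map)
```

In a real workflow the same calculation would run over millions of pixels per scene, which is exactly where a managed geospatial ML service earns its keep.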

Another of Sivasubramanian's headline announcements was that support for the open-source analytics engine Apache Spark had been added to AWS’ serverless interactive analytics service, Amazon Athena. Spark is one of the most widely used industry-standard analytics frameworks, and being able to access its runtime through the Amazon Athena dashboard – with Amazon Athena automatically handling configuration – will mean AWS customers can get started with interactive analytics using Amazon Athena for Apache Spark in under a second to analyze petabytes of data.
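For readers who want a feel for what getting started might look like in practice, here is a rough, hedged sketch. The boto3 operation names in the comments (start_session, start_calculation_execution) reflect my understanding of the new Athena API and should be checked against the current documentation; the bucket and workgroup names are made up.

```python
# Sketch of driving Athena for Apache Spark programmatically. Everything
# here is illustrative: the operation names in the comments are my
# reading of the launch-time API, and the S3 bucket and workgroup names
# are invented.

pyspark_code = """
df = spark.read.parquet("s3://my-bucket/events/")   # hypothetical bucket
df.groupBy("country").count().show()
"""

session_request = {
    "WorkGroup": "spark-workgroup",            # a Spark-enabled workgroup
    "EngineConfiguration": {"MaxConcurrentDpus": 20},
}
calculation_request = {"CodeBlock": pyspark_code}

# With AWS credentials configured, the calls would look roughly like:
#   athena = boto3.client("athena")
#   session = athena.start_session(**session_request)
#   athena.start_calculation_execution(
#       SessionId=session["SessionId"], **calculation_request)
print(sorted(session_request))
```

The point of the managed service is that everything below the `CodeBlock` – cluster sizing, Spark configuration, teardown – is handled for you.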

Sivasubramanian also took time to discuss the launch of Elastic Clusters for Amazon DocumentDB, which will allow customers to scale up their Amazon DocumentDB document storage and query infrastructure with support for millions of queries and read/write operations per second and petabytes of storage.

Data quality is an increasingly important factor for organizations dealing with AI and machine learning, where inbuilt bias and faults in data can quickly have a cascading effect that jeopardizes the value of an entire project. A new feature announced for the data integration service, AWS Glue, known as AWS Glue Data Quality, aims to stop data lakes from becoming “data swamps” by automating data quality assurance procedures and rules and creating alerts when bad, missing, or outdated data is detected.
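To make the idea concrete, here is a toy rule-checker in plain Python. It is not AWS Glue Data Quality's actual rule language; it simply illustrates the declare-rules-and-alert pattern the service automates, with made-up column names.

```python
# Conceptual sketch (plain Python, not AWS Glue Data Quality itself):
# the core idea is declaring rules -- completeness, freshness -- and
# flagging records that violate them.
from datetime import date

rows = [
    {"id": 1, "email": "a@example.com", "updated": date(2022, 11, 28)},
    {"id": 2, "email": None,            "updated": date(2022, 11, 29)},
    {"id": 3, "email": "c@example.com", "updated": date(2020, 1, 1)},
]

def check(row, as_of=date(2022, 12, 1), max_age_days=90):
    """Return a list of rule names the row violates."""
    problems = []
    if row["email"] is None:                          # completeness rule
        problems.append("missing_email")
    if (as_of - row["updated"]).days > max_age_days:  # freshness rule
        problems.append("stale_record")
    return problems

alerts = {row["id"]: check(row) for row in rows if check(row)}
print(alerts)
```

A managed service applies the same logic continuously at petabyte scale and wires the resulting alerts into the rest of the pipeline.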

One more announcement that I want to highlight is support being rolled out across Amazon Redshift for multiple zones of availability (Redshift Multi-AZ), allowing data to be quickly shifted and re-deployed in the event of a failure in one zone. This is best thought of as an inbuilt backup feature that allows AWS customers running business-critical data operations to feel secure in the knowledge they have better-than-ever levels of resilience, with automatic recovery in the event of failure and everything managed as if it were effectively one centralized data warehouse. This feature is currently available to customers in selected US East, US West, Asia Pacific, and Europe regions.

Inside Analysis

Following Sivasubramanian’s keynote, I took the chance to sit down with one of his colleagues, Dr. Matt Wood, VP of Product for AWS, to discuss the most exciting highlights as well as some of AWS’s future plans.

Wood told me that he was particularly interested in the new Amazon Aurora zero-ETL (extract, transform, load) integration with Amazon Redshift and the Amazon Redshift integration for Apache Spark that were first mentioned the previous day by AWS’s CEO Adam Selipsky and were also a feature of Sivasubramanian’s announcements.

Regarding the Amazon Aurora zero-ETL integration with Amazon Redshift, Wood said, "Up until today, you really had to do a lot of pipework, data engineering, ETL work … and moving that from a transactional system to a data warehouse system; you have to do a lot of repetitive work, a lot of real undifferentiated heavy lifting … you have to build the scripts, make sure they’re running, put them in the pipelines, fix them when they break. And every time you make a change, which will happen often, you have to update all of your scripts, so it quickly becomes an unsustainable black hole of work for customers. But today, all of that goes away, and you can literally click a button … and move that data from your Amazon Aurora tables … to be available for analysis inside Redshift."
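Wood's "undifferentiated heavy lifting" is easy to picture with a toy example. The sketch below, in plain Python with invented table and column names, shows the kind of hand-maintained transform script that breaks every time the source schema changes, and which zero-ETL integration is designed to make unnecessary.

```python
# A toy version of the hand-written ETL work Wood describes (purely
# illustrative -- the table and column names are made up). Every schema
# change in the transactional source forces an edit here, which is
# exactly the maintenance burden zero-ETL integration removes.

source_rows = [  # stand-in for rows read from a transactional database
    {"order_id": 1, "amount_cents": 1250, "status": "shipped"},
    {"order_id": 2, "amount_cents": 840,  "status": "cancelled"},
]

def transform(row):
    """Reshape a transactional row into the warehouse schema."""
    return {
        "order_id": row["order_id"],
        "amount_usd": row["amount_cents"] / 100,  # unit conversion
        "is_shipped": row["status"] == "shipped",
    }

warehouse = [transform(r) for r in source_rows]  # the "load" step
print(warehouse[0])
```

Multiply this by dozens of tables, add scheduling and failure handling, and the "black hole of work" becomes apparent.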

Aside from that, Wood told me he was also happy about the new Amazon Redshift integration for Apache Spark, which makes it easier for AWS customers to run Spark applications on Amazon Redshift data using AWS analytics and ML services.

Regarding the Redshift integration for Spark, Wood said, "Spark is probably the most popular way of building big data apps. It's a high-performance way of building big, distributed systems that can analyze a lot of data, and we've added integration with Spark that you can now run from pretty much anywhere on AWS, including Amazon Athena, Elastic MapReduce (Amazon EMR) … and you can now run those Spark queries directly against your data in Amazon Redshift."

Wood continued, “… so with just those two capabilities, we significantly increased the amount of data available for analytics and significantly reduced the barrier to entry for working with data … just allowing customers to get the answers to their questions from the data much more quickly.”

You can watch my whole conversation with Matt Wood here:

To stay on top of the latest trends, make sure to subscribe to my newsletter, follow me on Twitter, LinkedIn, and YouTube, and check out my books ‘Data Strategy: How To Profit From A World Of Big Data, Analytics And Artificial Intelligence’ and ‘Business Trends in Practice’.

---------------------------------------------------------------------------------------------------------------

About Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the field of business and technology. He is the author of 21 best-selling books (and winner of the 2022 Business Book of the Year award), writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has over 2 million social media followers, over 1.2 million newsletter subscribers and was ranked by LinkedIn as one of the top 5 business influencers in the world and the No 1 influencer in the UK.

