What can we publicly surmise about what we have seen this year at re:Invent?
- Data takes centre stage. More so than in previous years, it felt like AWS addressed 'data' rather than just 'technology'. Doubtless driven by (1) increased customer focus on extracting value from data and (2) the maturing of fundamental cloud tech building blocks.
- We are heading towards a no-ETL world. ETL is a source of friction in extracting value from data. Customers do not want to shoulder this burden, as it is not value-additive to their businesses.
- Customers continue to demand vertical-specific solutions to problems unique to their industries.
- Serverless - it just makes so much sense! :) Know where and on what you want to compete. Double down in that area, and outsource the rest.
- Big picture: Cloud is not just about reducing costs, enabling better data management, or using the latest ML techniques. It is a framework to more quickly understand your customers' problems, enable innovation and deploy effective, scalable and secure solutions. #customerobsession
Below is a summary of some of the cool new releases from AWS re:Invent to date.
DATA
SECURITY
- Verified Permissions. A scalable, fine-grained permissions management and authorization service for custom applications. With Verified Permissions, application developers can let their end users manage permissions and share access to data. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-verified-permissions-preview/
- Security Lake. Automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization. https://aws.amazon.com/security-lake/
REDSHIFT
- Redshift Dynamic Data Masking. Allows customers to simplify the process of protecting sensitive data in Redshift. With Dynamic Data Masking, customers control access to their data through SQL-based masking policies that determine how Redshift returns sensitive data to the user at query time. Dynamic Data Masking makes it simple for customers to adapt to changing privacy requirements without altering underlying data or updating SQL queries. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-support-dynamic-data-masking-preview/
- Redshift integration with Apache Spark. Enables Apache Spark applications to access Redshift data from AWS analytics services such as EMR, Glue, and SageMaker. Operations such as sort, aggregate, limit, join, and scalar functions can be pushed down to Redshift, so that only the relevant data is moved from Redshift to the consuming Spark application. https://aws.amazon.com/redshift/features/integration-for-apache-spark/
- Redshift integration with Aurora. With Aurora zero-ETL integration with Redshift, transactional data is automatically and continuously replicated seconds after it is written into Aurora and seamlessly made available in Redshift. Zero-ETL integration makes it easier to run petabyte-scale analytics on transactional data in Aurora in near real time with Redshift. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-aurora-zero-etl-integration-redshift/
- Redshift real-time streaming ingestion from KDS and MSK. Enables customers to achieve low latency, measured in seconds, while ingesting hundreds of megabytes of streaming data per second into Redshift. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-real-time-streaming-ingestion-kds-msk/
- Redshift now supports Multi-AZ. A Redshift Multi-AZ deployment allows you to recover in case of AZ failures without any user intervention. A Redshift Multi-AZ deployment is accessed as a single data warehouse with one endpoint and helps you maximize your data warehouse performance by distributing workload processing across multiple AZs automatically. https://press.aboutamazon.com/2022/11/aws-announces-five-new-database-and-analytics-capabilities
- Redshift integration with Informatica Data Loader. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-integration-informatica-data-loader-tool-data-uploads/
- Redshift data sharing now supports centralized access control with AWS Lake Formation. With the new Amazon Redshift data sharing managed by AWS Lake Formation, customers can view, modify, and audit permissions on the tables and views in the Redshift datashares using Lake Formation APIs and the AWS Console, and allow the Redshift datashares to be discovered and consumed by other Redshift data warehouses. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-data-sharing-centralized-access-control-lake-formation-preview/
- Redshift Auto-Copy from S3. Simplifies and automates data ingestion from S3 into Redshift through copy jobs: user-defined ingestion rules that track S3 locations for new files and run the configured COPY statement for each detected file. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-supports-auto-copy-amazon-s3/
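To make the Dynamic Data Masking item above a little more concrete, here is a minimal Python sketch of the general idea: a policy decides, per role and at query time, how a sensitive column is returned, while the stored data is never modified. This is a conceptual illustration only, not Redshift's implementation or API. The names `MaskingPolicy`, `query`, `mask_all_but_last4`, the role names and the sample row are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical illustration only: none of these names are Redshift APIs.
Row = Dict[str, str]


def mask_all_but_last4(value: str) -> str:
    """Replace every character except the last four with 'X'."""
    return "X" * max(len(value) - 4, 0) + value[-4:]


@dataclass(frozen=True)
class MaskingPolicy:
    column: str                    # column the policy protects
    mask: Callable[[str], str]     # how the value is rewritten when masked
    exempt_roles: FrozenSet[str]   # roles that see the raw value


def query(rows: List[Row], role: str, policies: List[MaskingPolicy]) -> List[Row]:
    """Return copies of the rows with policies applied for the given role.

    The stored rows are left untouched; masking happens only in the result.
    """
    result = []
    for row in rows:
        visible = dict(row)  # copy, so the underlying data is not altered
        for policy in policies:
            if role not in policy.exempt_roles and policy.column in visible:
                visible[policy.column] = policy.mask(visible[policy.column])
        result.append(visible)
    return result


# Assumed sample data and roles, for illustration.
stored = [{"customer": "alice", "card_number": "4111111111111111"}]
policies = [MaskingPolicy("card_number", mask_all_but_last4, frozenset({"auditor"}))]

analyst_view = query(stored, "analyst", policies)
auditor_view = query(stored, "auditor", policies)
```

In this sketch, changing a privacy requirement means swapping the policy, not rewriting the stored rows or the calling query, which is the property the announcement highlights.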
ANALYTICS
- OpenSearch Serverless. Run Search and Analytics Workloads without Managing Clusters. https://aws.amazon.com/blogs/aws/preview-amazon-opensearch-serverless-run-search-and-analytics-workloads-without-managing-clusters/
- Five new QuickSight capabilities. Today’s announcement expands QuickSight Q, a natural language querying capability, to support forecast and “why” questions and automate data preparation, making it easier and faster to start asking questions in natural language. Additionally, customers can now create and share paginated reports alongside interactive dashboards, quickly analyze and visualize billion-row datasets directly in QuickSight, and programmatically create and manage BI assets to accelerate migration from legacy systems. https://www.businesswire.com/news/home/20221128005874/en/AWS-Announces-Five-New-Capabilities-for-Amazon-QuickSight
- Glue for Apache Spark. Native support for data lake frameworks (Apache Hudi, Apache Iceberg, Delta Lake). https://aws.amazon.com/about-aws/whats-new/2022/11/aws-glue-apache-spark-native-data-lake-frameworks-apache-hudi-iceberg-delta-lake/
- Glue for Ray. Makes it easy to scale Python code to process large scale data in Glue. https://aws.amazon.com/about-aws/whats-new/2022/11/aws-glue-ray-preview/
- Glue Data Quality. Glue Data Quality builds confidence in your data by ensuring high data quality. It automatically measures, monitors, and manages data quality in your data lakes and data pipelines. https://aws.amazon.com/about-aws/whats-new/2022/11/aws-glue-data-quality-preview/
- Athena for Apache Spark. The streamlined, interactive, serverless experience of Athena with Spark, in addition to SQL. Athena takes care of managing the infrastructure and configuring Spark settings. Build interactive PySpark applications using a simplified notebook experience in the Athena console or through Athena APIs. Spin up Spark workloads up to 75 times faster than other serverless Spark offerings. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-athena-now-supports-apache-spark/
MACHINE LEARNING
- SageMaker Data Wrangler integration with Amazon AppFlow. With SageMaker Data Wrangler, you can explore and import data from a variety of popular sources, such as S3, Athena, Redshift, Snowflake, Databricks and Salesforce Customer Data Platform. Starting today, we are making it easier for customers to aggregate data for ML from over 40 third-party application data sources, including Salesforce Marketing, SAP, Google Analytics, LinkedIn and more. https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-sagemaker-data-wrangler-over-40-third-party-applications-data-sources/
VERTICAL SPECIFIC
OTHER