AWS re:Invent 2018 Take-Aways: AI, ML & Analytics Take Center Stage

The AWS re:Invent 2018 conference was one of the most anticipated events in the tech sector, given its history of major announcements during the keynotes of CEO Andy Jassy and CTO Werner Vogels, as well as in various other sessions. Some fifty thousand people attended the annual event in Las Vegas during the last week of November, and over 100,000 followed the live-streamed keynotes.

In the last five years, we have seen AWS's cloud platform evolve from a virtual data center in the cloud, used mostly by start-ups, into the largest enterprise cloud platform, with more than half of the cloud market share, followed by Microsoft Azure. Interestingly, Alibaba Cloud may now be in third place globally, thanks to strong adoption in China and other Asia-Pacific markets. Google Cloud Platform (GCP) holds fourth place overall and third place in the Americas and EMEA.

From what we saw at the conference, the rate of innovation is increasing, and the focus is now squarely on AI, ML, analytics and a diverse range of specialized data services. Underpinning these innovations is a steady move toward serverless solutions, in which instances and processing power are fully abstracted and managed by AWS.

Here are some of the major take-aways:

Artificial Intelligence & Machine Learning

Many sessions focused on business solutions driven by AI/ML and demonstrated how these capabilities are being integrated into solutions such as connected cars, marketing automation and analytics. AWS also announced several new or expanded services.

AWS's AI services run pre-trained algorithms behind the scenes to handle tasks such as identifying objects or people in images and video (Rekognition), converting speech to text (Transcribe), building conversational interfaces (Lex, which uses the same technology as Alexa) and turning text into lifelike speech (Polly). These services were introduced at re:Invent 2016 and 2017. Although they rely on complex ML algorithms, they essentially let developers build these tasks into applications through simple API calls.

In 2018, AWS added three advanced AI services:

Amazon Textract is a service that extracts text and data from scanned documents. Many industries, such as insurance or mortgage lending, rely heavily on customer-submitted documents, and traditional OCR services typically provide only a first pass, leaving a labor-intensive process to turn documents into usable data. Textract promises to automate much of this process, which could make it disruptive in several industries.
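
The "first pass plus manual review" workflow described above can be sketched against the response shape of Textract's DetectDocumentText operation (obtained in practice via `boto3.client("textract").detect_document_text(Document={"Bytes": data})`). The function name, the 90% confidence threshold and the sample response below are hypothetical; the idea is simply that high-confidence lines flow through automatically while the rest are routed to a person.

```python
from typing import Any, Dict, List, Tuple


def split_lines_for_review(response: Dict[str, Any],
                           min_confidence: float = 90.0) -> Tuple[List[str], List[str]]:
    """Split LINE blocks of a DetectDocumentText response into lines that are
    accepted automatically and lines that need a human check."""
    accepted: List[str] = []
    needs_review: List[str] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks; a LINE block already holds the full text
        if block.get("Confidence", 0.0) >= min_confidence:
            accepted.append(block.get("Text", ""))
        else:
            needs_review.append(block.get("Text", ""))
    return accepted, needs_review


# Hypothetical response shaped like Textract output (values invented for illustration).
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Policy number: 12345", "Confidence": 99.2},
        {"BlockType": "WORD", "Text": "Policy", "Confidence": 99.5},
        # 'O' instead of '0': the kind of low-confidence read a reviewer should see.
        {"BlockType": "LINE", "Text": "Claim amount: $1,2O0", "Confidence": 71.3},
    ]
}

if __name__ == "__main__":
    ok, review = split_lines_for_review(sample)
    print("accepted:", ok)
    print("review:", review)
```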

Amazon Personalize lets developers integrate recommendations for products, services or content into their applications without needing the in-house machine-learning expertise that such customer-engagement features typically require.

Amazon Forecast is a fully-managed, deep-learning service for time-series forecasting that can be applied to typical problems such as retail demand and supply-chain planning. Again, the idea is to apply complex algorithms to business problems without having to design or develop those algorithms yourself.

The data science workbench service SageMaker, introduced at re:Invent 2017, now has a machine learning marketplace offering more than 150 algorithms and models ready to deploy. AWS also introduced Amazon SageMaker Ground Truth, which helps build highly accurate training datasets while reducing data labeling costs.

In addition, Amazon SageMaker RL adds the ability to build, train and deploy models with reinforcement learning. Closely related was the release of AWS DeepRacer, a 1/18th-scale race car driven by reinforcement learning, along with the AWS DeepRacer League, billed as the world's first global autonomous racing league and open to anyone. Beyond generating excitement about reinforcement learning, AWS appears to be taking steps toward autonomous vehicles, partly by crowdsourcing the underlying models.
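
In reinforcement learning, the car is not told how to drive; it learns from a reward function that scores each action. DeepRacer participants write that function in Python. The sketch below is a common "stay near the centre line" reward, written assuming a `params` dictionary with `all_wheels_on_track`, `track_width` and `distance_from_center` keys as described in later DeepRacer console documentation; the 2018 launch version may have passed these values differently, and the band widths and reward values are illustrative choices.

```python
def reward_function(params: dict) -> float:
    """Reward staying close to the centre line of the track.

    Assumes `params` provides 'all_wheels_on_track' (bool), 'track_width'
    and 'distance_from_center' (both in metres).
    """
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward for leaving the track

    track_width = params["track_width"]
    distance = params["distance_from_center"]

    # Three bands around the centre line, each a fraction of the track width.
    if distance <= 0.1 * track_width:
        return 1.0
    if distance <= 0.25 * track_width:
        return 0.5
    if distance <= 0.5 * track_width:
        return 0.1
    return 1e-3  # on track but hugging the edge
```

During training, the RL algorithm repeatedly calls a function like this and adjusts the driving policy to maximize cumulative reward, which is why small changes to the bands can produce noticeably different driving behavior.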

Data Services

AWS also highlighted three new database services and invested heavily in existing data services:

Amazon Quantum Ledger Database (QLDB): AWS studied the demand for blockchain-like technologies for implementing secure ledgers and concluded that there is a need both for centralized ledgers, managed by a single company, and for the distributed ledger capabilities offered by blockchain technologies. QLDB is positioned as an immutable, high-performance and secure source of truth that records every change to the data in a ledger that can never be altered. AWS also announced Amazon Managed Blockchain for cases where a decentralized ledger is truly needed.
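
The reason a centralized ledger can still be trusted as "never altered" is that each entry can be cryptographically chained to the one before it. The sketch below shows that general hash-chaining idea with SHA-256; it is an illustration of the concept, not QLDB's actual journal format or API, and the class and method names are hypothetical.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List


class HashChainedLedger:
    """Append-only journal where each entry stores the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(prev_hash: str, data: Dict[str, Any]) -> str:
        # Canonical JSON so the same data always yields the same hash.
        payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data: Dict[str, Any]) -> str:
        """Add a revision and return its hash; earlier entries are never modified."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        snapshot = copy.deepcopy(data)  # later edits to the caller's dict cannot leak in
        entry_hash = self._digest(prev, snapshot)
        self.entries.append({"prev": prev, "data": snapshot, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any change to an earlier entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["data"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    ledger = HashChainedLedger()
    ledger.append({"account": "A", "balance": 100})
    ledger.append({"account": "A", "balance": 80})
    print("intact:", ledger.verify())
    ledger.entries[0]["data"]["balance"] = 1000  # simulate tampering with history
    print("after tampering:", ledger.verify())
```

Rewriting history would require recomputing every subsequent hash, which anyone holding a previously published digest can detect.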

Amazon Timestream: A fast, scalable, fully-managed time-series database service for IoT and operational applications. It is designed to process and analyze large numbers of time-dependent events (machine sensor outputs, for example) at a cost and performance that are hard to achieve with relational databases.
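
A typical time-series workload is downsampling: collapsing a flood of raw sensor readings into fixed time buckets before analysis. The plain-Python sketch below shows that operation for one-minute averages; it illustrates the kind of aggregation a time-series database performs natively and is not Timestream's API. The function name and bucket size are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def bin_average(readings: Iterable[Tuple[datetime, float]],
                bin_seconds: int = 60) -> Dict[datetime, float]:
    """Average readings into fixed-width UTC time buckets (default: one per minute).

    Timestamps must be timezone-aware so that .timestamp() is unambiguous.
    """
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in readings:
        epoch = int(ts.timestamp())
        # Round down to the start of the bucket the reading falls into.
        bucket = datetime.fromtimestamp(epoch - epoch % bin_seconds, tz=timezone.utc)
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


if __name__ == "__main__":
    utc = timezone.utc
    raw = [
        (datetime(2018, 11, 28, 12, 0, 5, tzinfo=utc), 10.0),
        (datetime(2018, 11, 28, 12, 0, 45, tzinfo=utc), 20.0),
        (datetime(2018, 11, 28, 12, 1, 10, tzinfo=utc), 30.0),
    ]
    for bucket, avg in bin_average(raw).items():
        print(bucket.isoformat(), avg)
```

Doing this inside the database, close to where the events land, avoids pulling every raw reading into an application or a row-oriented relational table.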

Amazon Neptune: A fully-managed graph database, actually released before the conference. Because it is built on graph technology, Neptune can store and analyze large numbers of varied relationships in a way that would be hard to achieve in a relational database. A good example is analyzing customers' social media connections and traversing the various ways they are linked to each other to find other customers who may be interested in the same product. AWS partnered with Tom Sawyer Software to provide a fully-integrated solution for building applications that visualize and analyze data and connections. Neptune also supports the Apache TinkerPop Gremlin query language, which lets users express traversals through graph-structured data.
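
The customer example above is a graph traversal: start from people who bought a product, follow "knows" relationships a few hops out, and keep those who have not bought it yet. In Neptune this would be written as a Gremlin query; the sketch below performs the same logic as a breadth-first search over an in-memory dictionary purely to show what the traversal does. The data, names and two-hop limit are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def prospects(knows: Dict[str, Set[str]], buyers: Set[str], max_hops: int = 2) -> Set[str]:
    """Return customers within `max_hops` 'knows' edges of any existing buyer
    who have not bought the product themselves (multi-source BFS)."""
    seen: Set[str] = set(buyers)          # buyers are never reported as prospects
    frontier = deque((b, 0) for b in buyers)
    found: Set[str] = set()
    while frontier:
        person, hops = frontier.popleft()
        if hops == max_hops:
            continue                      # do not expand beyond the hop limit
        for friend in knows.get(person, set()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                frontier.append((friend, hops + 1))
    return found


if __name__ == "__main__":
    # Symmetric 'knows' relationships between hypothetical customers.
    knows = {
        "ann": {"bob"},
        "bob": {"ann", "cat"},
        "cat": {"bob", "dan"},
        "dan": {"cat"},
    }
    print(sorted(prospects(knows, buyers={"ann"})))
```

In a relational database each extra hop typically means another self-join, which is why multi-hop questions like this are a natural fit for a graph store.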

These new services join AWS's key-value and document store, DynamoDB; its in-memory managed database service, ElastiCache (which supports Redis or Memcached); and Amazon Relational Database Service (RDS), which offers AWS-managed versions of commercial databases (Oracle, SQL Server), open-source databases (MySQL, MariaDB) and AWS's own Aurora database.

Redshift, the massively parallel processing (MPP) database that has been the go-to data warehouse solution on AWS, is evolving in a number of interesting ways. Notably, AWS seems to have been paying attention to the success of Snowflake's fully-managed, low-maintenance, auto-scaling solution and is taking several steps toward similar features in Redshift. Redshift Spectrum, available since last year, separates storage from compute by letting Redshift compute nodes query data stored in S3. AWS also announced a new auto-scaling feature that will let Redshift automatically spin up additional compute clusters with visibility into the data in the original cluster through a caching layer (and presumably into S3 when Spectrum is used, although this was not yet clear). This removes the painful process of building a new cluster and managing data hydration. Auto scaling addresses the need to handle high user and workload concurrency, which is often Redshift's Achilles heel.

AWS has also worked to reduce the overhead of maintaining Redshift clusters, with a more efficient process for resizing nodes, automated assignment of sort keys, and automation of other maintenance tasks such as VACUUM and ANALYZE.

AWS Lake Formation: One of the most intriguing new services promises to automate the cataloging and ingestion of data sources, allowing a data lake to be created in days instead of months. Data standards and security patterns would be defined once and then applied automatically. AWS Glue features are most likely used under the hood, but no information was yet available on the exact technologies involved.

Overall, AWS re:Invent stood out for the amount of innovation and its emphasis on analytics and data. It was also interesting to note that in 2017, AI services were introduced in the spirit of "here is what the API provides; we are curious to see how you will use it." This year, by contrast, many sessions showed how business problems in various industries could be solved with an architecture that integrates database technologies, AI services and ML capabilities, which points to both the market and AWS maturing in this space.

Chanakya R.

Sr Data Engineer | Snowflake | PySpark | Informatica | ADF | SQL | Hive | SnowPro Certified | AWS Certified

5 years ago

Excellent article, Willem Vervarcke. Thanks a lot!

Yasir Minhas

Accomplished data leader with over 19 years of specialized expertise in Master Data Management, Data Governance, and Data Quality. Certified Six Sigma professional driving data excellence and operational efficiency.

5 years ago

Very interesting and concise article. Thanks
