Amazon Athena

Rohit Singh

Associate Project Manager @ HuQuo

Published: October 22, 2024

Amazon Athena is a service that lets data analysts run interactive queries on data stored in Amazon Simple Storage Service (S3), the web-based cloud storage service of Amazon Web Services (AWS). Amazon S3 was created to make web-scale computing easier for developers, with use cases such as data storage, archiving, website hosting, data backup and recovery, and application hosting for deployment.

Athena is used with large-scale data sets and lets users analyze data in Amazon S3 using Structured Query Language (SQL). The tool is designed for quick, ad hoc, and complex analysis. Because Athena is a serverless query service, analysts do not need to manage any underlying compute infrastructure to use it. They also do not need to load S3 data into Athena or transform it first, which makes it easier and faster to gain insights.

What Does Amazon Athena Do?

Amazon Athena is best described as an interactive query service that uses standard Structured Query Language (SQL) to analyze data stored in Amazon Simple Storage Service (Amazon S3).

Amazon Web Services (AWS) introduced Athena to simplify the analysis of raw Amazon S3 data in massive volumes. You do not have to load Amazon S3 data into Athena and then transform it for analysis, which makes the service a good fit for teams that want to perform ad hoc, quick, or complex data analyses. Athena is also serverless and built to scale automatically, so you do not have to set up or manage any infrastructure.
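
To make the "no infrastructure" point concrete, here is a minimal sketch of submitting a query and waiting for it to finish. It assumes a client object with the same method names as the boto3 Athena client (start_query_execution and get_query_execution). The function name run_athena_query, the database name, and the bucket path in the usage comment are hypothetical and only for illustration.

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a SQL query and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("athena")
#   run_athena_query(client, "SELECT 1", "example_db", "s3://example-bucket/results/")
```

Note that the only resources named here are a database and an S3 output location; there is no cluster or instance to create before the query runs.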

Benefits of Using Amazon Athena

Easy to use – Amazon Athena does not require complex Extract, Transform, and Load (ETL) processes, so business analysts and other data professionals with basic SQL skills can adopt it.
Flexible – Amazon Athena’s open and versatile architecture does not restrict you to a specific vendor, technology, or tool. You can, for example, work with a wide range of open-source file formats, as well as switch freely between query engines without adjusting the schema.
Highly available query service – Athena runs queries with compute resources distributed across multiple facilities as well as multiple devices within each facility.
Built for Amazon S3 – Amazon S3, a durable and highly available data store, is Athena’s primary data store.
Query your data almost instantly – Athena enables you to start querying your data in a few seconds. Simply point Amazon Athena to the data you have stored in S3, specify the schema, and begin querying it with standard SQL.
It’s serverless – You do not manage the underlying compute infrastructure, so you can focus on outcomes. You do not have to set up clusters, manage capacity, or load data.
Pay your fair share – Pricing is pay per query: you pay only for the queries you run, not for compute instances or other underlying infrastructure.
Built on Presto and Trino – Athena’s query engine is based on Presto and Trino, with ANSI SQL support. It also supports a variety of data formats: JSON, Apache web logs, CSV, TSV, text files with custom delimiters, Parquet, ORC, Ion, and Avro.
Integrated with the AWS Glue Data Catalog by default – This means you can create a central repository for metadata across multiple services, discover schemas across data sources, add new and updated table and partition definitions to your Catalog, and manage schema versioning.

AWS Glue also offers fully managed ETL capabilities, so you can use it to transform your data or restructure it into columnar formats for better performance and lower cost.
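
The "point Athena at S3 and specify the schema" step can be sketched as a table definition. The helper below is a hypothetical illustration, not an AWS API: it only builds a CREATE EXTERNAL TABLE statement as a string, assuming the data is already stored as Parquet (a columnar format). The table name, columns, and bucket path are made up, and the helper does no input validation.

```python
def build_create_table_sql(table, columns, s3_location, partition_columns=None):
    """Return a CREATE EXTERNAL TABLE statement for Parquet data in S3.

    `columns` and `partition_columns` are lists of (name, sql_type) pairs.
    """
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    sql = f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_defs}\n)\n"
    if partition_columns:
        partition_defs = ", ".join(
            f"{name} {sql_type}" for name, sql_type in partition_columns
        )
        sql += f"PARTITIONED BY ({partition_defs})\n"
    # The table is only metadata; the data itself stays in S3.
    sql += f"STORED AS PARQUET\nLOCATION '{s3_location}'"
    return sql


ddl = build_create_table_sql(
    table="web_logs",  # hypothetical table name
    columns=[("request_time", "timestamp"), ("url", "string"), ("status", "int")],
    s3_location="s3://example-bucket/web-logs/",  # hypothetical bucket
    partition_columns=[("dt", "string")],
)
print(ddl)
```

Running the generated statement in Athena registers the table in the catalog without copying or loading any data, which is why queries can start within seconds.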

Advantages of Amazon Athena

Adopting Amazon Athena offers a number of benefits. Its serverless architecture enables rapid querying of data without the need for infrastructure management, making it an attractive option for organizations looking to reduce IT overhead.

Moreover, Amazon Athena:

Is cost-efficient
Supports a wide range of data formats
Provides fast access to data
Has seamless integration with other AWS services

These features make Amazon Athena a powerful and versatile tool for data analysts, especially when working with data drawn from various sources.

Amazon Athena Limitations

While Amazon Athena is an impressive and relatively inexpensive query service, it does come with some limitations, including:

Cost Unpredictability: The pay-per-query model in Athena cuts both ways. It offers flexibility, but because charges depend on the amount of data each query scans, poorly tuned queries or a poorly planned partitioning strategy can lead you to scan data you don't need and incur unnecessary expenses.
Performance Inconsistency: By default, Athena does not provide dedicated resources. Instead, your queries draw from a resource pool shared with other users in the same AWS Region. Consequently, it may not be the most suitable choice for applications that demand immediate, real-time results.
Optimization Limitations: Optimization in Athena is limited to queries; Athena does not itself optimize data that is already stored in S3.
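
The cost point above can be illustrated with a rough calculation. This is a sketch under stated assumptions: the price of 5 USD per terabyte scanned is an assumed figure (check current pricing for your Region), and the partition sizes are invented. The function names are hypothetical.

```python
BYTES_PER_GB = 1024 ** 3
BYTES_PER_TB = 1024 ** 4


def bytes_scanned(partition_sizes, selected_partitions=None):
    """Sum the bytes a query reads.

    With no partition filter, every partition is scanned.
    """
    if selected_partitions is None:
        return sum(partition_sizes.values())
    return sum(partition_sizes[name] for name in selected_partitions)


def estimate_query_cost(scanned_bytes, price_per_tb_usd=5.0):
    """Estimate the cost of one query (price per TB is an assumption)."""
    return scanned_bytes / BYTES_PER_TB * price_per_tb_usd


# Hypothetical table: 30 daily partitions of 100 GB each.
partitions = {f"dt=2024-10-{day:02d}": 100 * BYTES_PER_GB for day in range(1, 31)}

full_scan_cost = estimate_query_cost(bytes_scanned(partitions))
one_day_cost = estimate_query_cost(bytes_scanned(partitions, ["dt=2024-10-22"]))

print(f"Full scan: ${full_scan_cost:.2f}")
print(f"One day:   ${one_day_cost:.2f}")
```

Under these assumptions, a query without a partition filter costs 30 times as much as one restricted to a single day, which is the kind of avoidable expense a sound partitioning strategy prevents.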
