Amazon Athena: A Serverless Data Analytics Tool
- NareshIT

Data analysis is a complicated process, and we always want to make it simpler: we need the results, but we don't want to sweat over getting them. You will find many tools for analytics, and AWS offers Amazon Athena. In this tutorial, you will find the basics as well as more advanced knowledge about Amazon Athena. It's an interactive data analysis tool that suits even long, tedious queries. It is serverless, so you need not worry about setting it up, and you don't need to manage the infrastructure either. Don't think of it as a database service: you pay only for the queries that you execute. You simply connect to the data in S3, describe the schema, and do the rest with SQL.

Naresh I Technologies is the number one computer training institute in Hyderabad and among the top five computer training institutes in India. Contact us anytime for your AWS training.

The topics we will cover in this article are as follows. We will first give a brief introduction to Amazon Athena. Then, we will lay out the distinction between SQL Server and Amazon Athena. Next, we will discuss its uses, how to access it, and its features. Finally, we will do two demos: the first creates a table in Athena, and the second compares Athena with MySQL.

An Introduction

AWS launched Athena in November 2016. As mentioned above, Athena is a serverless query service that makes it possible to analyze data saved in S3 through SQL. With just a few clicks in the AWS Management Console, customers can point Athena at the data stored in S3 and then run queries with SQL. Results typically come back within seconds.

You need not worry about infrastructure management, and you pay only for the queries you run. Athena also scales automatically and executes queries in parallel, which provides quick results even with large datasets and complex queries. Now that you know what it is, let's look at the differences between it and SQL Server.

Differences between Athena and SQL Server

Use of Amazon Athena

As a data analyst, you may want to analyze data stored in S3. S3 provides the storage, but on its own it does not give you a tool for analytics.

AWS provides Athena for this purpose, so you can be assured that you have an analytics tool for working with the data. With Athena, you can analyze unstructured, semi-structured, and structured data stored in Amazon S3, and you can create dynamic queries for the dataset. You can also use Athena with AWS Glue to maintain better metadata for the data in S3.


Via AWS CloudFormation

With AWS CloudFormation and Athena, we can define named queries. A named query lets us give a query a name and then refer to it by that name.
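As a rough illustration of a named query, here is a minimal Python sketch that builds a CloudFormation template containing one `AWS::Athena::NamedQuery` resource. The logical ID `SampleNamedQuery`, the query name `all-rows`, the database name, and the query text are all assumptions chosen for this example. The helper `named_query_template` is hypothetical and not part of any AWS library.

```python
import json


def named_query_template(name, database, query):
    """Build a CloudFormation template (as a dict) with one Athena named query.

    The logical ID "SampleNamedQuery" is a hypothetical name for this sketch.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "SampleNamedQuery": {
                "Type": "AWS::Athena::NamedQuery",
                "Properties": {
                    "Name": name,
                    "Database": database,
                    "QueryString": query,
                    "Description": "Example named query for the tutorial table",
                },
            }
        },
    }


# Assumed names: database "nareshit" and table "rows", as used in the second demo.
template = named_query_template("all-rows", "nareshit", 'SELECT * FROM "rows"')
template_json = json.dumps(template, indent=2)
print(template_json)
```

The printed JSON can be saved to a file and deployed as a CloudFormation stack, after which the query should appear among the saved queries in Athena.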

This interactive service from AWS is useful for data scientists. It is equally relevant for developers, who can query the tables directly rather than running a lengthy query. We can also use it to fetch data from S3 and load it into various data stores through the Athena JDBC driver, for analysis and for data warehouse workloads.

It's a great tool. Now, let's use this service from AWS.

How to Access Amazon Athena

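You can access Athena through the AWS Management Console, through the AWS CLI, and through a JDBC connection. Athena also works with AWS Glue, which catalogs the data you query.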
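Beyond the access routes listed above, Athena can also be reached from code. The sketch below uses boto3, the AWS SDK for Python, which the article does not mention, so treat it as an assumption about your setup. The bucket `s3://example-athena-results/` and the database `default` are placeholders, and `build_request` and `run_query` are hypothetical helper names. `run_query` needs boto3 installed and valid AWS credentials, so it is defined here but not called.

```python
import time


def build_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait until it finishes, and return the result rows.

    Requires boto3 and configured AWS credentials (an assumption of this sketch).
    """
    import boto3  # third-party; imported here so the rest runs without it

    athena = boto3.client("athena")
    request = build_request(sql, database, output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {status['State']}: {reason}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


# Placeholder database and result bucket; replace them with your own.
request = build_request("SELECT * FROM nareshit", "default", "s3://example-athena-results/")
print(request)
```

Athena writes every query's output to the S3 location given in `OutputLocation`, which is why the request needs a bucket even for a simple `SELECT`.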

Now that you know how to reach Athena, let's have a look at its features.

Athena Features

The features that make Athena a useful service for data analysis are as follows:

  1. It's easy to implement: no installation is needed. We can access it directly via the console and also via the CLI.
  2. It's serverless: we need not worry about infrastructure, scaling, configuration, or failures.
  3. You pay per query, based on the data it scans. You can save even more by compressing the dataset and storing it in a suitable format.
  4. It's a fast analytics tool. Athena handles very complex queries by breaking them into smaller pieces, running those in parallel, and finally combining the outputs into the desired result.
  5. You control access to the dataset through AWS Identity and Access Management (IAM) policies.
  6. It's highly available, and you can execute queries at any time. The uptime commitment is set out in the Amazon Athena Service Level Agreement.
  7. You can integrate it with Glue to create a unified data repository. That improves data versioning and makes tables, views, and the rest easier to manage, and it's efficient as well.
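To make the pay-per-query point concrete, here is a small sketch of how the cost of a query relates to the data it scans. The price of 5 USD per terabyte scanned, the 10 MB minimum per query, and the 4x compression ratio are assumptions for illustration only; check the current Athena pricing page for your region. The function `estimate_cost` is a hypothetical helper.

```python
PRICE_PER_TB_USD = 5.0          # assumed list price per terabyte scanned
BYTES_PER_TB = 1024 ** 4        # assumes a binary terabyte
MIN_BILLED_BYTES = 10 * 1024 ** 2  # assumed 10 MB minimum per query


def estimate_cost(bytes_scanned):
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BILLED_BYTES)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


one_tb = 1024 ** 4
raw_cost = estimate_cost(one_tb)              # scanning 1 TB of uncompressed data
compressed_cost = estimate_cost(one_tb // 4)  # same data at a hypothetical 4x compression

print(f"uncompressed: ${raw_cost:.2f}, compressed: ${compressed_cost:.2f}")
```

Under these assumptions, compressing the data cuts the bill in proportion to the bytes no longer scanned, which is the saving that feature 3 describes.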

Now you know enough about Athena, and we can proceed with the demos.

Demo 1 (Making Tables)

Let's dive into how we can query data stored as a .json file in S3 with the help of Athena. The demo has four parts:

  1. Create one or more JSON files with some entries.
  2. Store the files in an S3 bucket.
  3. Make an external table for the files saved in S3.
  4. Write the queries for accessing the data.

Let's now go through these tasks in detail.

  1. Create a JSON file. A simple file with a few entries is enough; keep each record on a single line so that Athena can read it.
  2. We will reach the S3 bucket through the AWS CLI, so configure an IAM user for it: provide the access key ID, the secret access key, and a few more details such as the default region and output format.
  3. Create an S3 bucket.
  4. Copy the files to this S3 bucket.
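The preparation steps above can be sketched as follows. The Python part writes a small JSON file with one record per line, which is the layout Athena's JSON readers expect. The field `course` and the file name `students.json` are invented for this example, while the names come from the queries used later in the article. The AWS CLI commands appear as comments, with `example-athena-demo` as a placeholder bucket name.

```python
import json
import tempfile
from pathlib import Path

# Sample records; the "course" field is a made-up column for illustration.
records = [
    {"name": "Praveen", "course": "AWS"},
    {"name": "Ranjith", "course": "AWS"},
    {"name": "Saurabh", "course": "DevOps"},
]


def write_json_lines(path, rows):
    """Write one JSON object per line and return the file path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    return path


out_dir = Path(tempfile.mkdtemp())
json_file = write_json_lines(out_dir / "students.json", records)
print(json_file.read_text(encoding="utf-8"))

# Next steps with the AWS CLI (placeholder bucket name, run in a terminal):
#   aws configure                                   # access key, secret key, region, output
#   aws s3 mb s3://example-athena-demo              # create the bucket
#   aws s3 cp students.json s3://example-athena-demo/students/
```

Putting the file under its own folder (`students/` here) matters later, because the table is pointed at a folder rather than at a single file.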

  1. We need to create a table. If you don't have a database, create a new one. Give the table a name, and enter the location of the data in the S3 bucket (the folder that holds the files).
  2. Pick the file type you are going to work with, which is JSON here.
  3. Describe the structure of the data inside the file by entering the column names and column types.
  4. The data is simple, and we don't need any partitions for now. Click "Create table": Athena automatically generates the query for creating the table and executes it as well.
  5. The external table is now ready.
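For reference, the statement that the console generates in the steps above looks broadly like the output of this sketch. The database `default`, the column list, and the S3 location are assumptions carried over for illustration, and `create_table_ddl` is a hypothetical helper. The statement the console produces may include extra clauses such as table properties.

```python
def create_table_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for JSON data stored in S3."""
    column_lines = ",\n  ".join(f"`{name}` {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


# Assumed database, columns, and bucket; the table name matches the demo query.
ddl = create_table_ddl(
    database="default",
    table="nareshit",
    columns=[("name", "string"), ("course", "string")],
    location="s3://example-athena-demo/students/",
)
print(ddl)
```

The table is external: the statement only records the schema and the location, and the JSON files themselves stay in S3.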

  1. Write a query that selects the data from the table:

select * from nareshit;

  2. Click "Run query", and you will get all the information in the table.

Demo 2 (Comparing MySQL and Athena)

Now, let's compare MySQL and Athena. Let's understand how simple queries take much less time when we run them in Athena.

  1. Load the CSV file into MySQL. You will find that this takes a long time in MySQL, whereas for Athena it takes only a few minutes to upload the CSV file to S3, and creating the table takes even less time.
  2. Select all the rows with a query of the form "select * from table;". For our table, the query is:

select * from "nareshit"."rows";

Run this query in Athena, and then run it in MySQL as well. In MySQL, quote the identifiers with backticks instead of double quotes.

  3. Select a single column from the table, first in Athena and then in MySQL.
  4. To get the count for this column, write a query such as:

SELECT count(name) FROM "nareshit"."rows" WHERE name = 'Praveen' OR name = 'Ranjith';

Run this query in Athena, and then in MySQL. String values take single quotes, because Athena reads double quotes as identifiers.

  5. Write the query for counting the number of rows:

select count(*) from "nareshit"."rows";

Count all the rows in Athena, and perform this query in MySQL as well.

  6. Write a query that filters on specific values:

SELECT name FROM "nareshit"."rows" WHERE name = 'Saurabh' OR name = 'Praveen';

Run this query in Athena and in MySQL.

You will find that the SQL commands run faster in Athena than in MySQL, and that completes the tutorial.
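If you want to check the demo queries and a timing routine before touching Athena or MySQL, a local stand-in can help. The sketch below uses Python's built-in sqlite3 module, which is neither Athena nor MySQL, so its timings say nothing about either service. It only confirms that the queries are well formed and shows one way to measure elapsed time. The four sample rows and the helper `run_timed` are assumptions for this example.

```python
import sqlite3
import time

# In-memory stand-in table named "rows", with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "rows" (name TEXT)')
conn.executemany(
    'INSERT INTO "rows" (name) VALUES (?)',
    [("Praveen",), ("Ranjith",), ("Saurabh",), ("Praveen",)],
)

# The demo queries, without the database prefix used in Athena.
queries = {
    "all_rows": 'SELECT * FROM "rows"',
    "count_filtered": "SELECT count(name) FROM \"rows\" WHERE name = 'Praveen' OR name = 'Ranjith'",
    "count_all": 'SELECT count(*) FROM "rows"',
    "names": "SELECT name FROM \"rows\" WHERE name = 'Saurabh' OR name = 'Praveen'",
}


def run_timed(connection, sql):
    """Run one query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = connection.execute(sql).fetchall()
    return rows, time.perf_counter() - start


results = {}
for label, sql in queries.items():
    rows, elapsed = run_timed(conn, sql)
    results[label] = rows
    print(f"{label}: {len(rows)} row(s) in {elapsed:.6f} s")
```

For a fair comparison between Athena and MySQL, run each query several times on the same data and compare the averages, since a single run can be skewed by caching or network delay.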

Naresh I Technologies is the number one computer training institute in Hyderabad and among the top five computer training institutes in India. Contact us anytime for your AWS training. You can also opt for AWS online training from any part of the world. A big package is waiting for you, and it is all yours for a nominal fee that suits any budget. Let's have a look at what you get with this AWS package:

  • You pay a nominal fee, apart from the AWS fee for certification.
  • You can choose any AWS certification, as per your skills and interest.
  • You can select online or classroom training.
  • A chance to study at one of the best AWS training institutes in India.
  • We provide AWS training in Hyderabad and the USA, and no matter which part of the world you are in, you can contact us.
  • Naresh I Technologies offers some of the best AWS training in India.
  • And a lot more is waiting for you.

Contact us anytime for your complete AWS training.

Follow us for More Updates: https://bit.ly/NITLinkedIN

FAQs:

  • What is Amazon Athena?

Amazon Athena is an interactive query service that allows you to analyze data in Amazon S3 using standard SQL. It enables you to run ad-hoc queries on data stored in S3 without the need for infrastructure management or setup. Athena is serverless, meaning you pay only for the queries you run, with no upfront costs or capacity planning.

  • How does Amazon Athena work?

Amazon Athena uses Presto, an open-source distributed SQL query engine, to process SQL queries against data stored in Amazon S3. It supports various file formats such as CSV, JSON, Parquet, and ORC. Athena decouples compute and storage, so you can scale query processing independently of data storage. It integrates seamlessly with AWS Glue for data cataloging, making it easy to discover and query data using familiar SQL syntax.

  • What are the key benefits of using Amazon Athena?

Serverless: No infrastructure to manage; you pay only for the queries you run.

Scalability: Scales automatically to handle large datasets and complex queries.

Familiar SQL Interface: Supports standard SQL queries, making it accessible to users familiar with SQL.

Integration: Easily integrates with AWS services like Amazon S3 and AWS Glue for data cataloging.

Cost-effective: Pay-per-query pricing model with no upfront costs, ideal for ad-hoc and exploratory data analysis.

New Batch Details: AWS Online Training

New batches are scheduled every week at NareshIT.
