AWS Athena

Amazon Athena

Athena is an interactive query service for analyzing data stored in Amazon S3 using standard SQL. It is serverless, so there is no infrastructure to set up or manage, and customers pay only for the queries they run.

It supports structured and semi-structured data formats such as CSV, JSON, Parquet, and more, making it ideal for flexible exploration of data.
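To make the pay-per-query model concrete, here is a rough sketch of how a query's cost could be estimated from the amount of data it scans. The price per terabyte and the minimum billed size are assumptions for illustration, not figures from this article; check the current Athena pricing page for your Region. The data set sizes are hypothetical.

```python
# Assumed price in USD per terabyte scanned (illustrative only).
ASSUMED_PRICE_PER_TB_USD = 5.00
# Assumed minimum number of bytes billed per query (10 MB, illustrative only).
ASSUMED_MIN_BYTES_BILLED = 10 * 1024 * 1024
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Return a rough cost estimate in USD for a single query."""
    # Very small queries are rounded up to the assumed minimum.
    billed = max(bytes_scanned, ASSUMED_MIN_BYTES_BILLED)
    return billed / BYTES_PER_TB * price_per_tb


# Hypothetical comparison: the same question asked of a 200 GB CSV data set
# and of a Parquet copy where only 20 GB of column data has to be read.
csv_cost = estimate_query_cost(200 * 1024 ** 3)
parquet_cost = estimate_query_cost(20 * 1024 ** 3)
print(f"CSV: ${csv_cost:.4f}  Parquet: ${parquet_cost:.4f}")
```

Under these assumptions the cost depends on the bytes read rather than on how long the query runs, which is why columnar formats such as Parquet tend to be cheaper to query than CSV.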

Features

Works directly on files in S3.
Does not modify data in Amazon S3 during analysis; it applies a schema to the files at query time (schema-on-read).
Well suited to large volumes of raw or semi-structured data in S3, but less suited to complex, high-performance analytics on massive datasets.
Use Redshift for complex, high-performance analytics on structured data.
Use Athena to process logs, perform ad-hoc analysis, and run interactive queries.

1. What does Athena do?

Imagine you have a spreadsheet (CSV file) stored in Amazon S3. Athena lets you write SQL queries to analyze or fetch specific information from that spreadsheet without setting up a database or server.
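The spreadsheet scenario could look roughly like the sketch below: define an external table over the CSV files, then run a SQL query against it. The database, table, column, and bucket names are hypothetical. The `client` argument is assumed to be a boto3 Athena client (`boto3.client("athena")`); the block itself only builds and prints the SQL, so it runs without AWS access.

```python
import time


def build_csv_table_ddl(database, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement for comma-separated files in S3."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        # Skip the header row that spreadsheets usually export.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query and poll until it reaches a terminal state.

    Returns (query_execution_id, final_state).
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Hypothetical names, used only for illustration.
ddl = build_csv_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "string"), ("customer", "string"), ("amount", "double")],
    s3_location="s3://example-bucket/orders/",
)
select_sql = (
    "SELECT customer, SUM(amount) AS total "
    "FROM sales_db.orders GROUP BY customer"
)
print(ddl)
print(select_sql)
```

With credentials configured, something like `run_query(boto3.client("athena"), select_sql, "sales_db", "s3://example-bucket/athena-results/")` would submit the query. The files in S3 are left untouched; the table definition only describes how to read them.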

2. Athena and Glue together

Athena: For querying and analyzing already-prepared data in S3.
Glue: For organizing, cleaning, and preparing messy or scattered data before analysis.

Both services work well together:

Use Glue to prepare and clean your data.
Use Athena to analyze it.

3. Can you use Athena on DynamoDB and Amazon RDS?

Athena does not query DynamoDB or RDS tables out of the box. It primarily works with data stored in Amazon S3, but it can reach DynamoDB or RDS data with some additional configuration. A common approach is to copy the data into S3 first:

Use AWS Glue to extract data from DynamoDB/RDS and store it in S3 in a queryable format like Parquet or JSON.
Once the DynamoDB/RDS data is in S3, you can use Athena to query it just like any other data in S3.
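As a small illustration of what "a queryable format" means for DynamoDB data, the sketch below flattens items written in DynamoDB's typed JSON notation (for example `{"N": "3"}`) into plain JSON lines that an Athena table could read. This is not a Glue job; it is a plain-Python stand-in for the conversion step, the item contents are hypothetical, and only a subset of the DynamoDB type tags is handled.

```python
import json


def plain_value(typed):
    """Convert one DynamoDB-typed attribute value to a plain Python value."""
    # Each typed value is a single-key dict such as {"S": "abc"}.
    (type_tag, value), = typed.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # DynamoDB transmits numbers as strings.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [plain_value(v) for v in value]
    if type_tag == "M":
        return {k: plain_value(v) for k, v in value.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def to_json_lines(items):
    """Render items as one plain JSON object per line."""
    return "\n".join(
        json.dumps({k: plain_value(v) for k, v in item.items()}, sort_keys=True)
        for item in items
    )


# Hypothetical items in DynamoDB's typed JSON notation.
items = [
    {"id": {"S": "a1"}, "qty": {"N": "3"}, "tags": {"L": [{"S": "new"}]}},
    {"id": {"S": "b2"}, "qty": {"N": "1.5"}, "active": {"BOOL": True}},
]
print(to_json_lines(items))
```

A real pipeline would more likely write Parquet and let a Glue job handle the volume and the scheduling; the point here is only that the data ends up in S3 as plain records Athena can apply a schema to.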

4. Athena and Redshift together in a modern data architecture:

Use Athena for initial exploration and analysis on raw data in S3.
Once the data is cleaned and transformed, load it into Redshift for advanced, high-performance analytics and dashboards.
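One way this hand-off could look, sketched as two SQL statements built in Python: an Athena CREATE TABLE AS SELECT (CTAS) that writes a cleaned Parquet copy to S3, followed by a Redshift COPY that loads those files. All table names, bucket paths, the filter condition, and the IAM role ARN are hypothetical, and the cleaning step itself may well live in Glue rather than in Athena.

```python
def build_athena_ctas(target_table, select_sql, s3_location):
    """Build an Athena CTAS statement that writes the query result as Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{s3_location}') AS\n"
        f"{select_sql}"
    )


def build_redshift_copy(table, s3_location, iam_role_arn):
    """Build a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_location}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET"
    )


# Hypothetical locations and names, used only for illustration.
clean_location = "s3://example-bucket/clean/orders/"

ctas_sql = build_athena_ctas(
    target_table="sales_db.orders_clean",
    select_sql="SELECT order_id, customer, amount FROM sales_db.orders WHERE amount > 0",
    s3_location=clean_location,
)
copy_sql = build_redshift_copy(
    table="analytics.orders",
    s3_location=clean_location,
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-load-role",
)
print(ctas_sql)
print(copy_sql)
```

The first statement would be run in Athena and the second in Redshift. S3 sits between them, so the raw files, the cleaned Parquet copy, and the warehouse tables stay separate.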

5. Athena, Redshift and Kinesis Data Analytics

Amazon Kinesis Data Analytics is designed for real-time data processing and analysis, which makes it fundamentally different from Athena and Redshift. All three services are used for data analysis, but they serve different purposes.

Feature	Kinesis Data Analytics	Athena	Redshift
Nature of Data	Real-time, streaming data	Static data stored in S3	Structured, relational data
Use Case	Real-time analytics (e.g., monitoring)	Ad-hoc or batch querying	Complex, high-performance analytics
Data Source	Kinesis Streams, Kafka, Firehose	Files in S3 (CSV, JSON, Parquet)	Redshift tables or data from S3/RDS
Processing Speed	Millisecond/second-level latency	On-demand, batch processing	Scheduled, batch analytics
Cost Model	Pay for compute and processing time	Pay-per-query	Pay for compute and storage
Example Query	Detect fraudulent transactions as they occur	Summarize static logs in S3	Generate dashboards/reports
