Importance and Best Practices of Partitioning in Amazon Athena

Why is Partitioning Important?

When running a query like SELECT * FROM waf_logs, Athena scans all data in the table, which leads to slow query execution and high costs.
To optimize performance and reduce costs, partitioning is essential, and using WHERE to specify partitions is the best practice.


Why Use Partitioning?

1. Faster Query Execution

Without partitioning, Athena scans all files in S3, making queries slow.
Partitioning helps by limiting the scan to only relevant data.

Without Partitioning (SELECT * FROM waf_logs)

SELECT * FROM waf_logs;

Scans all data → Slow & Expensive

With Partitioning (WHERE log_date to filter)

SELECT * FROM waf_logs
WHERE log_date = '2025/02/23';

Scans only relevant data → Faster & Lower cost!


2. Reduced Query Costs

Athena charges based on the amount of data scanned.
Using partitioning prevents unnecessary scans and reduces costs.

Query MethodScanned DataCost
No Partitioning (SELECT * FROM waf_logs)1TB$5
With WHERE log_date = 'YYYY/MM/DD'50GB$0.25

Partitioning reduces costs by up to 95%!


3. Easier Data Management

Using partition keys that align with the S3 folder structure organizes data efficiently.
For example, using log_date, hour, and minute as partition keys enables easier access to logs.

S3 Folder Structure Example

s3://your-waf-logs-bucket/AWSLogs/WAFLogs/
  ├── 2025/02/23/00/05/
  ├── 2025/02/23/00/10/
  ├── 2025/02/24/00/05/
  ├── 2025/02/24/00/10/

With Athena Partitioning, you can efficiently query only the necessary logs!


Best Practices for Partitioning

1. Choosing the Right Partition Keys

For WAF logs, partitioning by date and time is optimal due to the large volume of data.

Partition KeyReason
log_dateDivides logs by day for faster date-based queries
hourFilters logs by specific hours for better analysis
minuteAllows analysis of requests in 5-minute intervals

2. Always Use WHERE with Partition Keys

If you don’t specify a partition key in the WHERE clause, Athena scans all data, making queries slow and costly.

Good Example (Uses Partition Pruning)

SELECT * FROM waf_logs
WHERE log_date = '2025/02/23'
AND hour = '14'
AND minute = '05'
LIMIT 100;

Scans only relevant data → Fast & Cost-efficient

Bad Example (No Partition Filtering)

SELECT * FROM waf_logs
WHERE action = 'BLOCK'
LIMIT 100;

Scans all partitions → Slow & Expensive!


3. Use Partition Projection for Automation

Partition Projection allows Athena to automatically detect new partitions, eliminating the need to manually add them.

Example: Enabling Partition Projection

ALTER TABLE waf_logs SET TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.log_date.type' = 'date',
  'projection.log_date.format' = 'yyyy/MM/dd',
  'projection.log_date.range' = '2025/01/01,2025/12/31',
  'projection.log_date.interval' = '1',
  'projection.log_date.unit' = 'DAYS',
  'projection.hour.type' = 'integer',
  'projection.hour.range' = '00,23',
  'projection.hour.interval' = '1',
  'projection.minute.type' = 'integer',
  'projection.minute.range' = '00,55',
  'projection.minute.interval' = '5',
  'storage.location.template' = 's3://your-waf-logs-bucket/AWSLogs/WAFLogs/${log_date}/${hour}/${minute}/'
);

Now, Athena automatically recognizes partitions without manual ALTER TABLE ADD PARTITION!


4. Verify Partitioning with SHOW PARTITIONS

After setting up Partition Projection, confirm if Athena recognizes partitions.

SHOW PARTITIONS waf_logs;

Expected Output

log_date=2025/02/23/hour=00/minute=05
log_date=2025/02/23/hour=00/minute=10
...

If partitions appear correctly, partitioning is working!


Summary

Key PointBest Practice
Faster QueriesUse WHERE log_date for partition pruning
Lower Query CostsAvoid scanning unnecessary data
Better Data ManagementUse date/time-based partitions for efficiency
Partition ProjectionAutomatically detect new partitions

In Athena, reducing scanned data improves both performance and cost-efficiency!
Partitioning is essential for efficient querying!