Why is Partitioning Important?
When running a query like SELECT * FROM waf_logs
, Athena scans all data in the table, which leads to slow query execution and high costs.
To optimize performance and reduce costs, partitioning is essential, and using WHERE
to specify partitions is the best practice.
Why Use Partitioning?
1. Faster Query Execution
Without partitioning, Athena scans all files in S3, making queries slow.
Partitioning helps by limiting the scan to only relevant data.
Without Partitioning (SELECT * FROM waf_logs
)
SELECT * FROM waf_logs;
Scans all data → Slow & Expensive
With Partitioning (WHERE log_date
to filter)
SELECT * FROM waf_logs
WHERE log_date = '2025/02/23';
Scans only relevant data → Faster & Lower cost!
2. Reduced Query Costs
Athena charges based on the amount of data scanned.
Using partitioning prevents unnecessary scans and reduces costs.
Query Method | Scanned Data | Cost |
---|---|---|
No Partitioning (SELECT * FROM waf_logs ) | 1TB | $5 |
With WHERE log_date = 'YYYY/MM/DD' | 50GB | $0.25 |
Partitioning reduces costs by up to 95%!
3. Easier Data Management
Using partition keys that align with the S3 folder structure organizes data efficiently.
For example, using log_date
, hour
, and minute
as partition keys enables easier access to logs.
S3 Folder Structure Example
s3://your-waf-logs-bucket/AWSLogs/WAFLogs/
├── 2025/02/23/00/05/
├── 2025/02/23/00/10/
├── 2025/02/24/00/05/
├── 2025/02/24/00/10/
With Athena Partitioning, you can efficiently query only the necessary logs!
Best Practices for Partitioning
1. Choosing the Right Partition Keys
For WAF logs, partitioning by date and time is optimal due to the large volume of data.
Partition Key | Reason |
log_date | Divides logs by day for faster date-based queries |
hour | Filters logs by specific hours for better analysis |
minute | Allows analysis of requests in 5-minute intervals |
2. Always Use WHERE
with Partition Keys
If you don’t specify a partition key in the WHERE
clause, Athena scans all data, making queries slow and costly.
Good Example (Uses Partition Pruning)
SELECT * FROM waf_logs
WHERE log_date = '2025/02/23'
AND hour = '14'
AND minute = '05'
LIMIT 100;
Scans only relevant data → Fast & Cost-efficient
Bad Example (No Partition Filtering)
SELECT * FROM waf_logs
WHERE action = 'BLOCK'
LIMIT 100;
Scans all partitions → Slow & Expensive!
3. Use Partition Projection for Automation
Partition Projection allows Athena to automatically detect new partitions, eliminating the need to manually add them.
Example: Enabling Partition Projection
ALTER TABLE waf_logs SET TBLPROPERTIES (
'projection.enabled' = 'true',
'projection.log_date.type' = 'date',
'projection.log_date.format' = 'yyyy/MM/dd',
'projection.log_date.range' = '2025/01/01,2025/12/31',
'projection.log_date.interval' = '1',
'projection.log_date.unit' = 'DAYS',
'projection.hour.type' = 'integer',
'projection.hour.range' = '00,23',
'projection.hour.interval' = '1',
'projection.minute.type' = 'integer',
'projection.minute.range' = '00,55',
'projection.minute.interval' = '5',
'storage.location.template' = 's3://your-waf-logs-bucket/AWSLogs/WAFLogs/${log_date}/${hour}/${minute}/'
);
Now, Athena automatically recognizes partitions without manual ALTER TABLE ADD PARTITION
!
4. Verify Partitioning with SHOW PARTITIONS
After setting up Partition Projection, confirm if Athena recognizes partitions.
SHOW PARTITIONS waf_logs;
Expected Output
log_date=2025/02/23/hour=00/minute=05
log_date=2025/02/23/hour=00/minute=10
...
If partitions appear correctly, partitioning is working!
Summary
Key Point | Best Practice |
Faster Queries | Use WHERE log_date for partition pruning |
Lower Query Costs | Avoid scanning unnecessary data |
Better Data Management | Use date/time-based partitions for efficiency |
Partition Projection | Automatically detect new partitions |
In Athena, reducing scanned data improves both performance and cost-efficiency!
Partitioning is essential for efficient querying!