← Mark McGranaghan

CloudFront Analytics

2019-10

You’re reading a static website stored in S3 and served out of CloudFront. I recently set up web analytics for it using CloudFront access logs stored in S3 and queried with Athena.

I now get analytics with flexible SQL queries:

Athena demo

This proved a solid approach. There’s no software I need to manage, no client-side JavaScript, and no third-party providers beyond AWS. Here’s how I set it up and use it.

Sending CloudFront logs to S3

First we need to store CloudFront access logs in S3. As a bonus, we’ll have this raw data if we want to analyze it outside of Athena later.

Ensure you have an S3 bucket to use for logs. I use one bucket for all my sites, with a directory for each site:

CloudFront logs index

Then in the CloudFront distribution for your site, enable access logging into that bucket (and into the appropriate directory if you’re using that setup):

CloudFront logs enabling

CloudFront will deposit access logs in that directory every few minutes, assuming you have non-zero traffic. Here’s what mine look like:

CloudFront logs listing

If you download, decompress, and view one of those files, you’ll see lines of tab-separated data, one per request to CloudFront. Each line will have the request URL, HTTP method, referrer, and so on.
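To make the file layout concrete, here is a minimal Python sketch that reads one gzipped log file into dictionaries. It assumes the standard CloudFront log layout, in which a `#Version` line and a `#Fields` line precede the tab-separated records. The function name `parse_cloudfront_log` and the three-field sample are made up for illustration; real files carry many more fields.

```python
import gzip
import io


def parse_cloudfront_log(source):
    """Parse one gzipped CloudFront log into a list of dicts.

    `source` may be a file path or a binary file object.
    """
    rows = []
    field_names = []
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                # The header names the columns, separated by spaces.
                field_names = line[len("#Fields:"):].split()
                continue
            if not line or line.startswith("#"):
                continue  # skip the "#Version" line and blank lines
            # Records are tab-separated, one request per line.
            rows.append(dict(zip(field_names, line.split("\t"))))
    return rows


# Synthetic sample with made-up values, only to show the shape of the data.
sample = gzip.compress(
    b"#Version: 1.0\n"
    b"#Fields: date time cs-uri-stem\n"
    b"2019-10-01\t12:00:00\t/index.html\n"
)
print(parse_cloudfront_log(io.BytesIO(sample)))
```

Reading the column names from the `#Fields` header, rather than hard-coding them, keeps the sketch working if the set of logged fields changes.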

The appeal of the Athena approach is that you can run standard SQL queries directly against these compressed log files in S3. Let’s look at that next.

Querying web access logs in Athena

Now head over to Athena in the AWS console and create a database to use for your CloudFront analytics. I called mine “cloudfront”:

Athena database DDL

Create a table that points to your data in S3, using this query template with your S3 location substituted in. This query also shows the fields you’ll have access to in your SQL queries:

Athena table DDL
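The template itself is only shown as an image, so the following is a hedged sketch of what such a table definition could look like, rendered from Python so the S3 location can be filled in. The table name `logs`, the helper `render_create_table`, and the example bucket path are assumptions. The sketch also declares only the first eleven log fields, relying on delimited tables mapping columns by position; the AWS documentation lists the full set.

```python
from string import Template

# Assumed names: database "cloudfront" is from the post, table "logs" is not.
CREATE_TABLE_TEMPLATE = Template("""\
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront.logs (
  `date` DATE,
  time STRING,
  location STRING,
  bytes BIGINT,
  request_ip STRING,
  method STRING,
  host STRING,
  uri STRING,
  status INT,
  referrer STRING,
  user_agent STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\t'
LOCATION '$s3_location'
TBLPROPERTIES ('skip.header.line.count' = '2')
""")


def render_create_table(s3_location):
    """Fill the S3 location into the DDL template."""
    if not s3_location.startswith("s3://") or not s3_location.endswith("/"):
        raise ValueError("expected an s3:// URI ending in '/'")
    return CREATE_TABLE_TEMPLATE.substitute(s3_location=s3_location)


# Hypothetical bucket and directory; replace with your own log location.
print(render_create_table("s3://example-logs-bucket/example-site/"))
```

The `skip.header.line.count` property tells Athena to ignore the two `#` header lines at the top of every log file, and the printed statement can be pasted into the Athena query editor.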

That’s it; you’re ready to run some queries. Here’s one I use to measure total monthly traffic to the site:

Athena results

Note that even though this query covers data from thousands of individual compressed files in S3, it runs in a few seconds.
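The query above is only visible as a screenshot, so here is a guess at a comparable monthly-traffic query, built as a Python string. The table name `cloudfront.logs`, the `date` column typed as `DATE`, and the helper `monthly_traffic_query` are assumptions, not the query from the post.

```python
def monthly_traffic_query(table="cloudfront.logs"):
    """Build an Athena query counting requests per calendar month."""
    return (
        # "date" is double-quoted because it is a reserved word in Athena queries.
        "SELECT date_trunc('month', \"date\") AS month,\n"
        "       count(*) AS requests\n"
        f"FROM {table}\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )


print(monthly_traffic_query())
```

Each log line is one request, so `count(*)` per month counts requests, including those for images and stylesheets, rather than page views or unique visitors.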

You should carefully examine your data to be sure you’re getting the right results. For example, some requests come from Googlebot, and you may want to exclude those from your queries.
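One way to sanity-check results is to recount a small sample of the raw logs outside Athena. The sketch below assumes rows are dictionaries keyed by CloudFront field names (`date`, `cs(User-Agent)`) and that a case-insensitive substring match on the user agent is good enough to spot crawlers. The function name and the sample rows are made up.

```python
from collections import Counter


def monthly_requests(rows, exclude_agents=("googlebot",)):
    """Count requests per YYYY-MM, skipping rows whose user agent matches."""
    counts = Counter()
    for row in rows:
        agent = row.get("cs(User-Agent)", "").lower()
        if any(marker in agent for marker in exclude_agents):
            continue  # crawler traffic, left out of the totals
        counts[row["date"][:7]] += 1  # "2019-10-01" -> "2019-10"
    return dict(sorted(counts.items()))


# Made-up rows; CloudFront URL-encodes spaces in the user agent as %20.
rows = [
    {"date": "2019-09-30", "cs(User-Agent)": "Mozilla/5.0%20(X11)"},
    {"date": "2019-10-01", "cs(User-Agent)": "Mozilla/5.0%20(compatible;%20Googlebot/2.1)"},
    {"date": "2019-10-02", "cs(User-Agent)": "Mozilla/5.0%20(X11)"},
]
print(monthly_requests(rows))                     # {'2019-09': 1, '2019-10': 1}
print(monthly_requests(rows, exclude_agents=()))  # {'2019-09': 1, '2019-10': 2}
```

Comparing the filtered and unfiltered counts shows how much of the traffic a given crawler accounts for before you decide whether to exclude it.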

Overall I’ve been very happy with this analytics setup. I recommend it to anyone running a static site who wants basic analytics without the downsides of client-side JavaScript, additional third-party providers, or more software to run themselves.

If you have thoughts on the setup, send them over!