This is a money-saving tip for web publishers who use Amazon S3 for hosting images and other static content like CSS, JavaScript files, etc.
Since Amazon S3 is a “pay as you use” storage service, your S3 bill is always directly proportional to the bandwidth that your sites consume.
How Browsers Interact with Amazon S3
When a visitor comes to your site the first time, the static images are downloaded from Amazon S3 servers and get saved inside his browser’s cache.
Now if that same person visits your site again sometime in future, his browser will make another GET request to Amazon S3 asking for a fresh copy of the web images.
Since the images stored on Amazon S3 haven’t changed since his last visit, Amazon servers will return a 304 Not Modified header response indicating that there’s no need to download images again.
So far so good. That 304 response prevented the visitor’s browser from downloading the same data again (thus saving you money) but there’s another problem – Amazon S3 also charges you for every GET request so each time a browser asks Amazon if images have changed since last visit, that question itself adds to your bill even if the answer is “no”.
How to Reduce Your Amazon S3 Bill
While the cost for GET requests is small (just 1 ¢ per 10,000 requests), they can quickly add-up if you have have a popular site or if your website design uses too many images. For instance, every page on www.labnol.org has about 25 static images served from S3.
To control this cost, you need a mechanism that will prevent browsers from sending the GET request if the file already exists in their cache. This can be easily done by setting appropriate Cache-Control and Expires headers at the time of uploading the files on to Amazon S3.
Cache-Control is like instructing the browser whether to make any requests to Amazon S3 or not before a given period. So if you set Cache-Control max-age=864000
for your S3 images, the web browsers will not request that file from S3 storage until the next 10 days (3600*24*10
sec).
Other than saving money, your site will also load relatively faster because the visitor’s browser will reuse images, logos and other static files from the cache without making any new request to Amazon S3.
BitRhymes, developers of the popular Sketch Me app for MySpace, saw their Amazon S3 bill dip by 40% after they implemented cached headers for images.
Implement Caching for Amazon S3 Files
To set the appropriate Cache-Control headers for files hosted on Amazon S3, you can either use the Bucket Explorer client (cost $50) or upload files manually via this PHP script written by Lalit Patel who is also the inspiration behind this article.
If you are worried about setting Cache headers for JavaScript and CSS files as they may change frequently (especially when you are in the midst of a site re-design), Lalit shares a very simple workaround – just append a version number after the file name like main.js?v=2.
Before: <link href="http://files.labnol.org/main.css?v=2" ../>
After: <link href="http://files.labnol.org/main.css?v=3" ../>
Change the version from 2 to 3 and visitors browser will make a fresh GET request to Amazon S3 for the latest version of the S3 file.