Find Sites that are Hotlinking to your Amazon S3
Hotlinking, as you probably know, occurs when people embed files in their web pages that are hosted on someone else’s web server. Some photo sharing sites (e.g., Flickr) allow hotlinking as long as you link to the original source of the image, but in most other cases hotlinking simply increases the bandwidth use of the site that hosts the file.
The problem becomes more serious if you are using a service like Amazon S3 (or CloudFront) to host your images, because Amazon charges a fee for every byte of data downloaded from its servers. Thus you’ll also end up paying for all the sites that consume your bandwidth by hotlinking to your S3-hosted content.
Who is hotlinking to your Amazon S3 Images?
If you would like to know about other sites or web pages that are linking to your Amazon S3 files, there are two options:
Option #1 (Simple): Link your Amazon S3 (or CloudFront) account with S3Stat and turn on server logging for your S3 buckets - you can do this from the S3Stat web dashboard itself.
The service will regularly parse your Amazon S3 server access logs and will then prepare a list of referrer URLs that are accessing your S3 content. If you spot a web URL that doesn’t belong to you, chances are that the site is hotlinking to one of your S3 files.
Option #2 (Free): The S3Stat service discussed above is pretty easy to use but costs about $5 per month.
If you are looking for a free alternative to monitor your S3 files, here’s a tip - download any of the free Amazon S3 clients (I recommend CloudBerry Explorer) and enable logging for the buckets whose usage you want to track. Wait for some time for Amazon to create logs for your S3 files and then download all these log files to a local folder on your hard drive (again using any of the S3 clients).
Merge the log files into one text file and import it into a spreadsheet program like Excel. Now convert the data into columns using “space” as the delimiter, keeping the double quote as the text qualifier so that the quoted request line stays in a single column. If everything goes fine, the 17th column (or Q) will contain the HTTP Referrer headers, and these are often the URLs of the linking or embedding page.
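If you would rather skip the spreadsheet, the same column can be pulled out with a short script. What follows is only a minimal sketch, assuming the merged file uses the standard S3 server access log layout described above. The function names (`count_referrer_hosts`, `report`) and the `own_hosts` parameter are made up for this illustration and are not part of any Amazon tool.

```python
import csv
from collections import Counter
from urllib.parse import urlparse

# Zero-based index of the 17th space-delimited column (column Q in a spreadsheet).
REFERRER_COLUMN = 16


def count_referrer_hosts(lines, own_hosts=()):
    """Count requests per referring host, skipping hosts listed in own_hosts."""
    counts = Counter()
    # A space delimiter with '"' as the quote character mirrors the
    # spreadsheet import: the quoted request line stays in one column.
    for row in csv.reader(lines, delimiter=" ", quotechar='"'):
        if len(row) <= REFERRER_COLUMN:
            continue  # blank or truncated line
        referrer = row[REFERRER_COLUMN]
        if referrer == "-":
            continue  # S3 logs "-" when no Referer header was sent
        try:
            host = urlparse(referrer).hostname
        except ValueError:
            continue  # malformed referrer value
        if host and host not in own_hosts:
            counts[host] += 1
    return counts


def report(log_path, own_hosts=()):
    """Print referring hosts from a merged log file, busiest first."""
    with open(log_path, newline="", encoding="utf-8", errors="replace") as handle:
        for host, hits in count_referrer_hosts(handle, own_hosts).most_common():
            print(f"{hits:8d}  {host}")
```

Calling something like `report("merged.log", own_hosts={"www.example.com"})`, where both the file name and the host are placeholders for your own values, should list every outside host that sent a Referrer header, with the heaviest hotlinkers at the top.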
How to Prevent Sites from Hotlinking
Unlike the Apache server, where you can prevent other sites from hotlinking to your images through some .htaccess rules, S3 has no .htaccess-style feature.
Therefore the best way to deal with hotlinking on S3 is to send an email to the owner of the other site or simply move your images to another location and update the hyperlinks in your own web pages.
The other approach that you may also want to explore uses Signed URLs (see tip #2) - these are temporary links that automatically expire after a given time (similar to RapidShare). It may not be the right thing to do for static images, but if you are hosting downloadable files like videos, ebooks or MP3s, time-limited URLs could be a good option on S3.
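To make the idea of an expiring link more concrete, here is a rough sketch of how such a URL can be put together with AWS Signature Version 4 query-string signing, using only the Python standard library. In practice the AWS SDKs generate these links for you, so treat this as an illustration of what goes into one and not as the recommended way to produce them. The function name `presign_get_url`, its parameters, and the default `bucket.s3.amazonaws.com` host (which assumes a bucket in us-east-1) are assumptions made for this example.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _sign(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, access_key, secret_key,
                    region="us-east-1", expires_in=3600, now=None):
    """Build a GET URL for bucket/key that stops working after expires_in seconds."""
    if not 1 <= expires_in <= 604800:
        raise ValueError("expires_in must be between 1 second and 7 days")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.amazonaws.com"  # assumption: us-east-1 style endpoint
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    path = "/" + quote(key, safe="/-_.~")

    # Query parameters are part of what gets signed, so the expiry
    # cannot be changed afterwards without invalidating the signature.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(name, safe='-_.~')}={quote(value, safe='-_.~')}"
        for name, value in sorted(params.items())
    )

    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: secret -> date -> region -> service -> terminator.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"
```

A link generated with, say, `expires_in=600` should be rejected by S3 ten minutes after it was created, so a site that copies the URL into its own pages ends up with a dead link.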
Amit Agarwal
Google Developer Expert, Google Cloud Champion
Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.
Amit has developed several popular Google add-ons including Mail Merge for Gmail and Document Studio.