Tag Archives: WAF

Block bad networks from sites behind Sophos WAF

Recently I have noticed some odd traffic coming to one of my blogs. This particular blog is set to NOT be indexed by search engines b(robots.txt deny.) Every bot that’s touched that site has honored that file… until now.

Periodically I will get huge spikes of traffic (huge for my small site, anyway.) The culprit is always the same: Apple! Why are they crawling my site? I can’t find a definitive reason. A couple searches reveals articles like this one¬†speculating that Apple is starting a search engine. The problem is the traffic I’m seeing from Apple shows just a safari user agent, nothing about being a bot. A discussion on Reddit¬†talks about Apple crawling sites, but they also list a user agent I’m not seeing.

The user agent reported by the bot that’s been crawling me (ignoring robots.txt file) is:

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36

The IPs rotate randomly from Apple’s IP space, with the biggest offender being

x_forwarded_for count 1680 982 444 174 36 28 26 26 24 24 24 22 21 7 7 6 4 4 4 3 2 2 2 2 1


I e-mailed Apple at abuse@apple.com requesting they stop this action. I didn’t expect anything from it, and indeed nothing happened. I kept getting crawled.

So, now to the title of this post. I had to tell my Web Application Firewall to block Apple’s address space. Sophos UTM 9.3 makes this easier, although the option is somewhat hidden for some reason. The option is in the “Site Path Routing” tab within the Web Application Firewall context. Once there, edit your site path and check the “Access Control” checkbox.


In my case I decided to block the entire subnet – No more Apple crawling.. at least from the 17 network.

Add x-forwarded-for header to Apache

If you happen to be running your site behind a web application firewall you will notice that initially you will not be able to determine the true source of traffic coming to your server. The default setup for Apache will only show traffic coming from the firewall itself.

To fix this, you need to tweak the LogFormat parameters in /etc/apache2/apache2.conf (for Debian distros) or wherever your apache config file is in other distros. Per here, you need to add


to your config file. Here is an example setup successfully implementing X forwarded For as well as maintaining logging the IP of the WAF itself (in case you have more than one..)

LogFormat "%v:%p %{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent