Tag Archives: varnish

Rewrite HTTP host request with Varnish

I had a stubborn WordPress plugin that somehow was remembering the URL of the site it was installed on. It became a problem when I changed the site URL. Despite changing the URL everywhere I could think of, this particular plugin was calling CSS files for the URL of the original site. I did a search and replace in the site database and searched all files for any reference to that site but couldn’t find anything. I never did find the culprit. My workaround was to use varnish to rewrite the request before it hit the browser.

Thanks to this answer by Jorge Nerin on Stack Overflow, I found my answer on how to do this.

backend www {
  .host = "www.example.com";
  .port = "http";
}

sub vcl_recv {
  if (req.http.host ~ "(?i)^(www.)?example.com$") {
    set req.backend_hint = www;
  }
}

In my case I had a default backend (no other backends configured) so my varnish config was simply adding these line in sub vcs_recv (varnish 4 syntax)

if (req.http.host ~ "(?i)^(www.)?old.host.name$") {
     set req.backend_hint = default;
}

That did the trick!

Whitelist IPs with varnish

I recently needed to restrict which IP addresses can access wp-login.php for a wordpress site of mine. This site is sitting behind varnish cache for speed. Modifying htaccess doesn’t work in this case so I have to modify the varnish configuration in order to get this to work.

The varnish documentation is actually quite good at telling you what you need to do.  You have to first specify an acl and then in the vcl_recv function specify the action when these IPs are used.

I kept running into a problem where varnish wouldn’t compile. I kept receiving this error:

"Expected CSTR got 'admin_net'" (C String?)

It turns out my load balancer does not support the PROXY protocol, so client.ip is always the IP of the load balancer, not the IP of the person making the request.

The solution was finally found here where it was explained that in absence of PROXY protocol you can use the std.ip() function to convert a string containing an IP address to the value varnish expects an IP address to be, in order to check it against an ACL (see here for syntax reference.)

I then had to take it a step further to trim all the extraneous commas and quotes from X-Forwarded-For so that the std.ip() function would work:

# set realIP by trimming CloudFlare IP which will be used for various checks
set req.http.X-Actual-IP = regsub(req.http.X-Forwarded-For, "[, ].*$", "");

With these three bits combined I was able to properly restrict access to wp-login.php to a specified whitelist of IP addresses:

acl admin {
 "10.0.0.0/24";
 "10.0.1.0/24";
 "10.0.2.0/24";
}

...

sub vcl_recv {
...

 # set realIP by trimming CloudFlare IP which will be used for various checks
 set req.http.X-Actual-IP = regsub(req.http.X-Forwarded-For, "[, ].*$", "");

 #Deny wp-login.php access if not in admin ACL
 if ((std.ip(req.http.X-Actual-IP, "0.0.0.0") !~ admin) && req.url ~ "^/wp-login.php") {
  return(synth(403, "Access denied."));
 }
...
}

Success.

Purge Varnish cache by visiting URL

I came across a need to allow web developers to purge Varnish cache. The problem is the developers aren’t allowed access to the production machine and our web application firewall blocks purge requests. I needed for there to be a way for them to simply access a page hosted on the webserver and cause it to purge its own varnish cache.

I was able to accomplish this by placing a PHP file in the webserver’s directory and controlling access to it via .htaccess. Thanks to this site for the php script and this one for the .htaccess guidance.

Place this PHP file where the web devs can access it, making sure to modify the $hostname variable to suit your needs and to rename the file to have a .php extension.

<?php

#Simple script to purge varnish cache
#Adapted from the script from http://felipeferreira.net/index.php/2013/08/script-purge-varnish-cache/
#This script runs locally on the webserver and will purge all cached pages for the site specified in $hostname
#Modify the $hostname parameter below to match the hostname of your site

$hostname = $_SERVER['SERVER_NAME'];

header( 'Cache-Control: max-age=0' );

$port     = 80;
$debug    = false;
$URL      =  "/*";

purgeURL( $hostname, $port, $URL, $debug );

function purgeURL( $hostname, $port, $purgeURL, $debug )
{
    $finalURL = sprintf(
        "http://%s:%d%s", $hostname, $port, $purgeURL
    );

    print( "<br> Purging ${finalURL} <br><br>" );

    $curlOptionList = array(
        CURLOPT_RETURNTRANSFER    => true,
        CURLOPT_CUSTOMREQUEST     => 'PURGE',
        CURLOPT_HEADER            => true ,
        CURLOPT_NOBODY            => true,
        CURLOPT_URL               => $finalURL,
        CURLOPT_CONNECTTIMEOUT_MS => 2000
    );

    $fd = true;
    if( $debug == true ) {
        print "<br>---- Curl debug -----<br>";
        $fd = fopen("php://output", 'w+');
        $curlOptionList[CURLOPT_VERBOSE] = true;
        $curlOptionList[CURLOPT_STDERR]  = $fd;
    }

    $curlHandler = curl_init();
    curl_setopt_array( $curlHandler, $curlOptionList );
    $return = curl_exec($curlHandler);

    if(curl_error($curlHandler)) {
    print "<br><hr><br>CRITICAL - Error to connect to $hostname port $port - Error:  curl_error($curl) <br>";
    exit(2);
 }

    curl_close( $curlHandler );
    if( $fd !== false ) {
        fclose( $fd );
    }
    if( $debug == true ) {
        print "<br> Output: <br><br> $return <br><br><hr>";
    }
}
?>

<title>Purge cache</title>
Press submit to purge the cache
<form method="post" action="<?php echo $PHP_SELF;?>">
<input type="submit" value="submit" name="submit">
</form>

Place (or append) the following .htaccess code in the same directory you placed the php file:

#Only allow internal IPs to access cache purge page
<Files "purge.php">
    Order deny,allow
    Deny from all
    SetEnvIf X-Forwarded-For ^10\. env_allow_1
    Allow from env=env_allow_1
    Satisfy Any
</Files>

The above code only allows access to the purge.php page from IP addresses beginning with “10.” (internal IPs)

This PHP / .htaccess combo allowed the web devs to purge cache without any system access or firewall rule changes. Hooray!

Install wordpress on CentOS7 with nginx & caching

Lately I’ve become interested in the LEMP stack (as opposed to the LAMP stack.) As such I’ve decided to set up a wordpress site running CentOS 7, nginx, mariadb, php-fpm, zend-opcache, apc, and varnish. This writeup will borrow heavily from two of my other writeups, Install Wordpress on CentOS 7 with  SELinux and Speed up WordPress in CentOS7 using caching. This will be a mashup of those two with an nginx twist with guidance from digitalocean. Let’s begin.

Repositories

To install the required addons we will need to have the epel repository enabled:

yum -y install epel-release

nginx

Install necessary packages:

sudo yum -y install nginx
sudo systemctl enable nginx

Optional: symlink /usr/share/nginx/ to /var/www/ (for those of us who are used to apache)

sudo ln -s /usr/share/nginx/ /var/www

Open necessary firewall ports:

sudo firewall-cmd --add-service=http --permanent
sudo systemctl restart firewalld

start nginx:

sudo systemctl start nginx

Navigate to your new site to make sure it brings up the default page properly.

MariaDB

Install:

sudo yum -y install mariadb-server mariadb
sudo systemctl enable mariadb

Run initial mysql configuration to set database root password

sudo systemctl start mariadb
sudo mysql_secure_installation

Configure:

Create a wordpress database and user:

mysql -u root -p 
#enter your mysql root password here
create user wordpress;
create database wordpress;
GRANT ALL PRIVILEGES ON wordpress.* To 'wordpress'@'localhost' IDENTIFIED BY 'password';
quit;

php-fpm

Install:

sudo yum -y install php-fpm php-mysql php-pclzip
sudo systemctl enable php-fpm

Configure:

Uncomment cgi.fix_pathinfo and change value to 0:

sudo sed -i 's/\;\(cgi.fix_pathinfo=\)1/\10/g' /etc/php.ini

Modify the listen= parameter to listen to UNIX socket instead of TCP:

sudo sed -i 's/\(listen =\).*/\1 \/var\/run\/php-fpm\/php-fpm.sock/g' /etc/php-fpm.d/www.conf

Change listen.owner and listen.group to nobody:

sudo sed -i 's/\(listen.owner = \).*/\1nobody/g; s/\(listen.group = \).*/\1nobody/g' /etc/php-fpm.d/www.conf

Change running user & group from apache to nginx:

sudo sed -i 's/\(^user = \).*/\1nginx/g; s/\(^group = \).*/\1nginx/g' /etc/php-fpm.d/www.conf

Start php-fpm:

sudo systemctl start php-fpm

Caching

To speed up wordpress further we need to install a few bits of caching software. Accept defaults when prompted.

zend-opcache & apc:

Install necessary packages:

sudo yum -y install php-pecl-zendopcache php-pecl-apcu php-devel gcc
sudo pecl install apc

Add apc extension to php configuration:

sudo sh -c "echo '\

#Add apc extension 
extension=apc.so' >> /etc/php.ini"

Restart php-fpm:

sudo systemctl restart php-fpm

Varnish

Install:

sudo yum -y install varnish
sudo systemctl enable varnish

Configure nginx to listen on port 8080 instead of port 80:

sudo sed -i /etc/nginx/nginx.conf -e 's/listen.*80/&80 /'

Change varnish to listen on port 80 instead of port 6081:

sudo sed -i /etc/varnish/varnish.params -e 's/\(VARNISH_LISTEN_PORT=\).*/\180/g'

Optional: change varnish to cache files in memory with a limit of 256M (caching to memory is much faster than caching to disk)

sudo sed -i /etc/varnish/varnish.params -e 's/\(VARNISH_STORAGE=\).*/\1\"malloc,256M\"/g'

Add varnish configuration to work with caching wordpress sites:

Update 2/25/2018 added section to allow facebook to properly scrape varnish cached sites.

sudo sh -c 'echo "/* SET THE HOST AND PORT OF WORDPRESS
 * *********************************************************/
vcl 4.0;
import std;

backend default {
 .host = \"127.0.0.1\";
 .port = \"8080\";
 .first_byte_timeout = 60s;
 .connect_timeout = 300s;
}
 
# SET THE ALLOWED IP OF PURGE REQUESTS
# ##########################################################
acl purge {
 \"localhost\";
 \"127.0.0.1\";
}

#THE RECV FUNCTION
# ##########################################################
sub vcl_recv {

#Facebook workaround to allow proper shares
if (req.http.user-agent ~ \"facebookexternalhit\")
        {
        return(pipe);
        }

# set realIP by trimming CloudFlare IP which will be used for various checks
set req.http.X-Actual-IP = regsub(req.http.X-Forwarded-For, \"[, ].*$\", \"\"); 

 # FORWARD THE IP OF THE REQUEST
 if (req.restarts == 0) {
 if (req.http.x-forwarded-for) {
 set req.http.X-Forwarded-For =
 req.http.X-Forwarded-For + \", \" + client.ip;
 } else {
 set req.http.X-Forwarded-For = client.ip;
 }
 }

 # Purge request check sections for hash_always_miss, purge and ban
 # BLOCK IF NOT IP is not in purge acl
 # ##########################################################

 # Enable smart refreshing using hash_always_miss
if (req.http.Cache-Control ~ \"no-cache\") {
 if (client.ip ~ purge) {
 set req.hash_always_miss = true;
 }
}

if (req.method == \"PURGE\") {
 if (!client.ip ~ purge) {
 return(synth(405,\"Not allowed.\"));
 }

 ban (\"req.url ~ \"+req.url);
 return(synth(200,\"Purged.\"));

}

if (req.method == \"BAN\") {
 # Same ACL check as above:
 if (!client.ip ~ purge) {
 return(synth(403, \"Not allowed.\"));
 }
 ban(\"req.http.host == \" + req.http.host +
 \" && req.url == \" + req.url);

 # Throw a synthetic page so the
 # request wont go to the backend.
 return(synth(200, \"Ban added\"));
}

# Unset cloudflare cookies
# Remove has_js and CloudFlare/Google Analytics __* cookies.
 set req.http.Cookie = regsuball(req.http.Cookie, \"(^|;\s*)(_[_a-z]+|has_js)=[^;]*\", \"\");
 # Remove a \";\" prefix, if present.
 set req.http.Cookie = regsub(req.http.Cookie, \"^;\s*\", \"\");

 # For Testing: If you want to test with Varnish passing (not caching) uncomment
 # return( pass );

# DO NOT CACHE RSS FEED
 if (req.url ~ \"/feed(/)?\") {
 return ( pass ); 
}

#Pass wp-cron

if (req.url ~ \"wp-cron\.php.*\") {
 return ( pass );
}

## Do not cache search results, comment these 3 lines if you do want to cache them

if (req.url ~ \"/\?s\=\") {
 return ( pass ); 
}

# CLEAN UP THE ENCODING HEADER.
 # SET TO GZIP, DEFLATE, OR REMOVE ENTIRELY. WITH VARY ACCEPT-ENCODING
 # VARNISH WILL CREATE SEPARATE CACHES FOR EACH
 # DO NOT ACCEPT-ENCODING IMAGES, ZIPPED FILES, AUDIO, ETC.
 # ##########################################################
 if (req.http.Accept-Encoding) {
 if (req.url ~ \"\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$\") {
 # No point in compressing these
 unset req.http.Accept-Encoding;
 } elsif (req.http.Accept-Encoding ~ \"gzip\") {
 set req.http.Accept-Encoding = \"gzip\";
 } elsif (req.http.Accept-Encoding ~ \"deflate\") {
 set req.http.Accept-Encoding = \"deflate\";
 } else {
 # unknown algorithm
 unset req.http.Accept-Encoding;
 }
 }

 # PIPE ALL NON-STANDARD REQUESTS
 # ##########################################################
 if (req.method != \"GET\" &&
 req.method != \"HEAD\" &&
 req.method != \"PUT\" && 
 req.method != \"POST\" &&
 req.method != \"TRACE\" &&
 req.method != \"OPTIONS\" &&
 req.method != \"DELETE\") {
 return (pipe);
 }
 
 # ONLY CACHE GET AND HEAD REQUESTS
 # ##########################################################
 if (req.method != \"GET\" && req.method != \"HEAD\") {
 return (pass);
 }
 
 # OPTIONAL: DO NOT CACHE LOGGED IN USERS (THIS OCCURS IN FETCH TOO, EITHER
 # COMMENT OR UNCOMMENT BOTH
 # ##########################################################
 if ( req.http.cookie ~ \"wordpress_logged_in|resetpass\" ) {
 return( pass );
 }

 #fix CloudFlare Mixed Content with Flexible SSL
 if (req.http.X-Forwarded-Proto) {
 return(hash);
 }

 # IF THE REQUEST IS NOT FOR A PREVIEW, WP-ADMIN OR WP-LOGIN
 # THEN UNSET THE COOKIES
 # ##########################################################
 if (!(req.url ~ \"wp-(login|admin)\") 
 && !(req.url ~ \"&preview=true\" ) 
 ){
 unset req.http.cookie;
 }

 # IF BASIC AUTH IS ON THEN DO NOT CACHE
 # ##########################################################
 if (req.http.Authorization || req.http.Cookie) {
 return (pass);
 }
 
 # IF YOU GET HERE THEN THIS REQUEST SHOULD BE CACHED
 # ##########################################################
 return (hash);
 # This is for phpmyadmin
if (req.http.Host == \"pmadomain.com\") {
return (pass);
}
}

sub vcl_hash {

if (req.http.X-Forwarded-Proto) {
 hash_data(req.http.X-Forwarded-Proto);
 }
}


# HIT FUNCTION
# ##########################################################
sub vcl_hit {
 return (deliver);
}

# MISS FUNCTION
# ##########################################################
sub vcl_miss {
 return (fetch);
}

# FETCH FUNCTION
# ##########################################################
sub vcl_backend_response {
 # I SET THE VARY TO ACCEPT-ENCODING, THIS OVERRIDES W3TC 
 # TENDANCY TO SET VARY USER-AGENT. YOU MAY OR MAY NOT WANT
 # TO DO THIS
 # ##########################################################
 set beresp.http.Vary = \"Accept-Encoding\";

 # IF NOT WP-ADMIN THEN UNSET COOKIES AND SET THE AMOUNT OF 
 # TIME THIS PAGE WILL STAY CACHED (TTL), add other locations or subdomains you do not want to cache here in case they set cookies
 # ##########################################################
 if (!(bereq.url ~ \"wp-(login|admin)\") && !bereq.http.cookie ~ \"wordpress_logged_in|resetpass\" ) {
 unset beresp.http.set-cookie;
 set beresp.ttl = 1w;
 set beresp.grace =3d;
 }

 if (beresp.ttl <= 0s ||
 beresp.http.Set-Cookie ||
 beresp.http.Vary == \"*\") {
 set beresp.ttl = 120 s;
 # set beresp.ttl = 120s;
 set beresp.uncacheable = true;
 return (deliver);
 }

 return (deliver);
}

# DELIVER FUNCTION
# ##########################################################
sub vcl_deliver {
 # IF THIS PAGE IS ALREADY CACHED THEN RETURN A HIT TEXT 
 # IN THE HEADER (GREAT FOR DEBUGGING)
 # ##########################################################
 if (obj.hits > 0) {
 set resp.http.X-Cache = \"HIT\";
 # IF THIS IS A MISS RETURN THAT IN THE HEADER
 # ##########################################################
 } else {
 set resp.http.X-Cache = \"MISS\";
 }
}" > /etc/varnish/default.vcl'

Restart varnish & nginx:

sudo systemctl restart nginx varnish

Logging

By default varnish does not log its traffic. This means that your apache log will only log things varnish does not cache. We have to configure varnish to log traffic so you don’t lose insight into who is visiting your site.

Update 2/14/2017:  I’ve discovered a better way to do this. The old way is still included below, but you really should use this other way.

New way:

CentOS ships with some systemd scripts for you. You can use them out of the box by simply issuing

systemctl start varnishncsa
systemctl enable varnishncsa

If you are behind a reverse proxy then you will want to tweak the varnishncsa output a bit to reflect x-forwarded-for header values (thanks to this github discussion for the guidance.) Accomplish this by appending a modified log output format string to /lib/systemd/system/varnishncsa.service:

sudo sed -i /lib/systemd/system/varnishncsa.service -e "s/ExecStart.*/& -F \'%%{X-Forwarded-For}i %%l %%u %%t \"%%r\" %%s %%b \"%%{Referer}i\" \"%%{User-agent}i\"\' /g"

Lastly, reload systemd configuration, enable, and start the varnishncsa service:

sudo systemctl daemon-reload
sudo systemctl enable varnishncsa
sudo systemctl start varnishncsa

 


Old way:

First, enable rc.local

sudo chmod +x /etc/rc.local
sudo systemctl enable rc-local

Next, add this entry to the rc.local file:

sudo sh -c 'echo "varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid" >> /etc/rc.local'

If your varnish server is behind a reverse proxy (like a web application firewall) then modify the above code slightly (thanks to this site for the information on how to do so)

sudo sh -c "echo varnishncsa -a -F \'%{X-Forwarded-For}i %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"\' -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid >> /etc/rc.local"

Once configuration is in place, start the rc-local service

sudo systemctl start rc-local

WordPress

Download, extract, and set permissions for your wordpress installation (this assumes your wordpress site is the only site on the server)

wget https://wordpress.org/latest.zip
sudo unzip latest.zip -d /usr/share/nginx/html/
sudo chown nginx:nginx -R /usr/share/nginx/html/

Configure nginx

Follow best practice for nginx by creating a new configuration file for your wordpress site. In this example I’ve created a file wordpress.conf inside the /etc/nginx/conf.d directory.

Create the file:

sudo vim /etc/nginx/conf.d/wordpress.conf

Insert the following, making sure to modify the server_name directive:

server {
    listen 8080;
    listen [::]:8080;
    server_name wordpress;
    root /usr/share/nginx/html/wordpress;
    port_in_redirect off;


    location / {
        index index.php;
        try_files $uri $uri/ /index.php?q=$uri&$args;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }

    error_page 404 /404.html;     
    location = /40x.html {
     }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    }
}

Restart nginx

sudo systemctl restart nginx

Configure upload directory

If you want users to upload content, then you will want to assign the http_sys_rw_content_t selinux security context for the wp-uploads directory (create it if it doesn’t exist)

sudo mkdir /usr/share/nginx/html/wordpress/wp-content/uploads
sudo chown nginx:nginx /usr/share/nginx/html/wordpress/wp-content/uploads
sudo semanage fcontext -a -t httpd_sys_rw_content_t "/usr/share/nginx/html/wordpress/wp-content/uploads(/.*)?"
sudo restorecon -Rv /usr/share/nginx/html/wordpress/wp-content/uploads

Run the wizard

In order for the wizard to run properly we need to temporarily give the wordpress directory httpd_sys_rw_content_t selinux context

sudo chcon -t httpd_sys_rw_content_t /usr/share/nginx/html/wordpress/

Now navigate to your new website in a browser and follow the wizard, which will create a wp-config.php file inside the wordpress directory.

W3 Total Cache

Once your wordpress site is set up you will want to configure it to communicate with varnish. This will make sure the cache is always up to date when changes are made.

Install the W3 Total Cache wordpress plugin and configure it as follows:

opcode-database-object

Opcode cache: Opcode:Zend Opcache

Database cache: Check enable, select Opcode: Alternative PHP Cache (APC / APCu)

Object cache: Check enable, select Opcode: Alternative PHP Cache (APC / APCu)

Fragment cache: Opcode: Alternative PHP cache (APC / APCu)

proxy

Reverse Proxy: Check “Enable reverse proxy caching via varnish”
Specify 127.0.0.1 in the varnish servers box. Click save all settings.

Wrapping up

Once your site is properly set up, restore the original security context for the wordpress directory:

sudo restorecon -v /usr/share/nginx/html/wordpress/

Lastly restart nginx and varnish:

sudo systemctl restart nginx varnish

Success! Everything is working within the proper SELinux contexts and caching configuration.

Troubleshooting

403 forbidden

I received this error after setting everything up. After some digging I came across this site which explained what could be happening.

For me this meant that nginx couldn’t find an index file and was trying to default to a directory listing, which is not allowed by default. This is fixed by inserting a proper directive to find index files, in my case, index.php. Make sure you have “index index.php” in your nginx.conf inside the location / block:

    location / {
        index index.php;
    }

 Accessing wp-admin redirects you to port 8080, times out

If you find going to /wp-admin redirects you to a wrong port and times out, it’s because nginx is forwarding the portnumber. We want to turn that off (thanks to this site for the help.) Add this to your nginx.conf:

  port_in_redirect off;

Find top 10 requests returning 404 errors

I had a website where I was curious what the top 10 URLs that were returning 404s were along with how many hits those URLs got. This was after a huge site redesign so I was curious what old links were still trying to be accessed.

Getting a report on this can be accomplished with nothing more than the Linux command line and the log file you’re interested in. It involves combining grep, sed, awk, sort, uniq, and head commands. I enjoyed how well these tools work together so I thought I’d share. Thanks to this site for giving me the inspiration to do this.

This is the command I used to get the information I wanted:

grep '404' _log_file_ | sed 's/, /,/g' | awk {'print $7'} | sort | uniq -c | sort -n -r | head -10

Here is a rundown of each command and why it was used:

  • grep ‘404’ _log_file_ (replace with filename of your apache, tomcat, or varnish access log.) grep reads a file and returns all instances of what you want, in this case I’m looking for the number 404 (page not found HTTP error)
  • sed ‘s/, /,/g’ Sed will edit a stream of text in any way that you specify. The command I gave it (s/, /,/g) tells sed to look for instances of commas followed by spaces and replace them with just commas (eliminating the space after any comma it sees.) This was necessary in my case because sometimes the source IP address field has multiple IP addresses and it messed up the results. This may be optional if your server isn’t sitting behind any type of reverse proxy.
  • awk {‘print $7’} Awk has a lot of similar functions to sed – it allows you to do all sorts of things to text. In this case we’re telling awk to only display the 7th column of information (the URL requested in apache and varnish logs is the 7th column)
  • sort This command (absent of arguments) sorts our results alphabetically, which is necessary for the next command to work properly.
  • uniq -c This command eliminates any duplicates in the results. The -c argument adds a number indicating how many times that unique string was found.
  • sort -n -r Sorts the results in reverse alphabetical order. The -n argument sorts things numerically so that 2 follows 1 instead of 10. -r Indicates to reverse the order so the highest number is at the top of the results instead of the default which is to put the lowest number first.
  • head -10 outputs the top 10 results. This command is optional if you want to see all the results instead of the top 10. A similar command is tail – if you want to see the last results instead.

This was my output – exactly what I was looking for. Perfect.

2186 http://<sitename>/source/quicken/index.ini
2171 http://<sitename>/img/_sig.png
1947 http://<sitename>/img/email/email1.aspx
1133 http://<sitename>/source/quicken/index.ini
830 http://<sitename>/img/_sig1.png
709 https://<sitename>/img/email/email1.aspx
370 http://<sitename>/apple-touch-icon.png
204 http://<sitename>/apple-touch-icon-precomposed.png
193 http://<sitename>/About-/Plan.aspx
191 http://<sitename>/Contact-Us.aspx

Website load testing with wget and siege

In an effort to see how well my varnish configuration stood up to real world tests I came across a really cool piece of software: siege. You can use siege to benchmark / load test your site by having it hit your site repeatedly in a configurable fashion.

I played around with the  siege configuration until I found one I liked. It involves using wget to spider your target site, varnishncsa and tee (if you use varnish), awk to parse the access log into a usable URL list, and then passing that list to siege to test the results.

varnishncsa

For my purposes I wanted to test how well varnish performed. Varnish is a caching server that sits in front of your webserver. By default it doesn’t log anything – this is where varnishncsa comes in. Run the following command to monitor the varnish log while simultaneously outputting the results to a file:

varnishncsa | tee varnish.log

If you don’t use varnish and rather serve pages up directly through nginx, apache, or something else, then you can skip this step.

wget

The next step is to spider your site to get a list of URLs to send to siege; wget fits in nicely for this task. This is the configuration I used (replace aoeu.1234.com with the domain of your web server)

wget -r -l4 --spider -D aoeu1234.com http://www.aoeu1234.com

-r tells wget to be recursive (follow links)
-l is the number of levels you want to go down when following links
— spider tells wget to not actually download anything (no need to save the results, we just want to hit the site to generate log entries)
-D specifies a comma separated list of domains you want wget to crawl (useful if you don’t want wget to follow links outside your site)

awk

Once the spidering is finished, we want to use awk to parse the resulting server logfile to extract the URLs that were accessed and pipe the results into a file named crawl.txt.

awk '{ print $7 }' varnish.log | sort | uniq > crawl.txt

varnish.log is what I named the output from the varnishncsa command above; if you use apache/nginx, then you would use your respective access log file instead.

siege

This is where the fun comes in. You can now use siege to stress test your website to see how well it does.

siege -c 550 -i -t 3M -f crawl.txt -d 10

There are many siege options so you’ll definitely want to read up on the man page. I picked the following based on this site:

-c concurrent connections. I had to tweak siege configuration to go up that high.
-i tells siege to access the URLs in a random fashion
-t specifies how long to run siege for in seconds, minutes, or hours
-f specifies a list of URLs for siege to use
-d specifies the maximum delay between user requests (it will pick a random number between 0 and this specified number to make requests more random)

siege configuration

I had to tweak siege a little bit to allow more than 256 connections at a time. First, run

siege.config

to generate an initial siege configuration. Then modify ~/.siege/siege.conf to increase the limit of concurrent connections:

sed -i ~/.siege/siege.conf -e 's/limit = 256/limit = 1000/g'

Note: the config says the following:
DO NOT INCREASE THIS NUMBER UNLESS YOU CONFIGURED APACHE TO HANDLE MORE THAN 256 SIMULTANEOUS REQUESTS

This was fine in my case because it’s hitting varnish, not apache, so we can really ramp up the simultaneous user count.

sysctl configuration

When playing with siege I ran into a few different error messages. Siege is a complicated enough program that it requires you to tweak system settings for it to work fully. I followed some advice on this site to tweak my sysctl configuration to make it work better. Your mileage may vary.

### IMPROVE SYSTEM MEMORY MANAGEMENT ###

# Increase size of file handles and inode cache
fs.file-max = 2097152

# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2

### GENERAL NETWORK SECURITY OPTIONS ###

# Number of times SYNACKs for passive TCP connection.
net.ipv4.tcp_synack_retries = 2

# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535

# Protect Against TCP Time-Wait
net.ipv4.tcp_rfc1337 = 1

# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 15

# Decrease the time default value for connections to keep alive
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

### TUNING NETWORK PERFORMANCE ###

# Default Socket Receive Buffer
net.core.rmem_default = 31457280

# Maximum Socket Receive Buffer
net.core.rmem_max = 12582912

# Default Socket Send Buffer
net.core.wmem_default = 31457280

# Maximum Socket Send Buffer
net.core.wmem_max = 12582912

# Increase number of incoming connections
net.core.somaxconn = 4096

# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 65536

# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824

# Increase the maximum total buffer-space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 65536 131072 262144
net.ipv4.udp_mem = 65536 131072 262144

# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384

# Increase the write-buffer-space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

Speed up WordPress in CentOS7 using caching

It has recently come to my attention that WordPress has a serious design flaw: it’s trivially easy to execute a denial of service attack against it in its default state. One simply has to hold down F5 to cause the site to slow to a crawl and in many cases crash entirely.

The reason behind this is that every single page request turns into the server parsing PHP code and executing a database call. Holding down F5 initiates hundreds of page view requests, which turn into hundreds of PHP code execution threads (each taking up CPU and memory) and hundreds of database calls. The server becomes quickly overwhelmed and runs out of resources.

The solution to this problem (on CentOS 7 at least) is a combination of php-fpm, zendopcache, APC, varnish and W3 Total Cache. It’s definitely more complicated but it eliminates this problem and massively increases site load times and general responsiveness.

Repositories

To install the required addons we will need to have the epel repository enabled:

yum -y install epel-release

zend-opcache

This caches PHP opcode to greatly speed up PHP code execution. It’s included in later versions of PHP but alas CentOS 7 is stuck on PHP 5.4, which does not include such caching. You have to install it manually. Thanks to this site for the information.

sudo yum -y install php-pecl-zendopcache

apc

This is another kind of of cache – this time for database operations.

sudo yum -y install php-pecl-apcu php-devel gcc
sudo pecl install apc
#accept defaults when prompted

php-fpm

php-fpm is a different way to serve up PHP code. Instead of apache running a module to interpret php code, it will send all php requests to a separate PHP server, optimized for speed. That php server will interpret the code and return the results to your browser.

sudo yum -y install php-fpm
sudo systemctl enable php-fpm

Modify your apache config to forward all php requests to php-fpm. Be sure to modify this to match your site URL setup:

sudo sh -c "echo '
    <LocationMatch \"^/(.*\.php(/.*)?)$\"> 
        ProxyPass fcgi://127.0.0.1:9000/var/www/html/\$1 
    </LocationMatch>' >> /etc/httpd/conf/httpd.conf"

Varnish

Varnish is a reverse proxy server optimized for caching pages. It sits between your visitors and your webserver and caches whatever it can to increase responsiveness and speed. This site pointed me in the right direction for configuring Varnish  in CentOS 7.

sudo yum -y install varnish
sudo systemctl enable varnish

Change apache to listen on port 8080 instead of port 80:

sudo sed -i /etc/httpd/conf/httpd.conf -e 's/Listen 80/&80/'

Change varnish to listen on port 80 instead of port 6081:

sudo sed -i /etc/varnish/varnish.params -e 's/VARNISH_LISTEN_PORT=6081/VARNISH_LISTEN_PORT=80/g'

Now we need to configure Varnish to properly cache wordpress sites. I found the configuration from this site to be the most helpful. I normally include code blocks to copy and paste but the configuration file is pretty big.

Instead, click here for the configuration code, then copy the whole page and paste it into your terminal.

Update 12/28/2016: I’ve updated the varnish configuration code slightly to allow the “purge all caches” button of W3 Total cache to work. Thanks to this site for pointing me in the right direction and this thread for getting me there.

After varnish has been configured, restart your new PHP / caching stack:

sudo systemctl restart httpd varnish php-fpm

Logging

Update: added this section on 11/4/2016

By default varnish does not log its traffic. This means that your apache log will only log things varnish does not cache. We have to configure varnish to log traffic so you don’t lose insight into who is visiting your site.

Update 2/14/2017:  I’ve discovered a better way to do this. The old way is still included below, but you really should use this other way.

New way:

CentOS ships with some systemd scripts for you. You can use them out of the box by simply issuing

systemctl start varnishncsa
systemctl enable varnishncsa

If you are behind a reverse proxy then you will want to tweak the varnishncsa output a bit to reflect x-forwarded-for header values (thanks to this github discussion for the guidance.) Accomplish this by appending a modified log output format string to /lib/systemd/system/varnishncsa.service:

sudo sed -i /lib/systemd/system/varnishncsa.service -e "s/ExecStart.*/& -F \'%%{X-Forwarded-For}i %%l %%u %%t \"%%r\" %%s %%b \"%%{Referer}i\" \"%%{User-agent}i\"\' /g"

Lastly, reload systemd configuration, enable, and start the varnishncsa service:

sudo systemctl daemon-reload
sudo systemctl enable varnishncsa
sudo systemctl start varnishncsa

Old way:

First, enable rc.local

sudo chmod +x /etc/rc.local
sudo systemctl enable rc-local #you can ignore errors here

Next, add this entry to the rc.local file:

sudo sh -c 'echo "varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid" >> /etc/rc.local'

If your varnish server is behind a reverse proxy (like a web application firewall) then modify the above code slightly (thanks to this site for the information on how to do so)

sudo sh -c "echo varnishncsa -a -F \'%{X-Forwarded-For}i %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"\' -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid >> /etc/rc.local"

W3 Total Cache

The improvements above will greatly increase your speed and eliminate the F5 denial of service issue. The last bit to make it even sweeter is to install the W3 Total Cache wordpress plugin. Thanks to this site for the information that pointed me in the right direction.

There are a ton of options in W3 Total cache that are beyond the scope of this tutorial. For our purposes we will enable the following options in the General Settings tab of the plugin:

opcode-database-object

Opcode cache: Opcode:Zend Opcache

Database cache: Check enable, select Opcode: Alternative PHP Cache (APC / APCu)

Object cache: Check enable, select Opcode: Alternative PHP Cache (APC / APCu)

proxy

Reverse Proxy: Check “Enable reverse proxy caching via varnish”
Specify 127.0.0.1 in the varnish servers box. Click save all settings.

Full speed ahead

With all of these pieces into place your site is definitely more complicated, but it is also much faster to load. Enjoy.

Troubleshooting

If you go through all these steps only to see this very non-descriptive message:

File not found

it means you have PHP forwarding to the wrong directory. Modify the LocationMatch section you inserted at the bottom of /etc/httpd/conf/httpd.conf earlier to ensure the correct directory structure is passed for php files.