Automatically extract rar files downloaded with transmission

My latest project has been configuring sonarr to work with transmission. The challenge was getting these two pieces of software to properly interface with each other. Sonarr would successfully instruct transmission to download the requested show, but once the download completed it would not import the show into its library. The reason was my torrent tracker – most of its torrents are packaged as multi-part rar files. Sonarr has no mechanism for processing rar files, so I had to get creative.

The solution was to write a simple script and have transmission execute it after finishing the download. The script uses the find command to look for rar files in the directory transmission created for that particular torrent. If any rar files are found, it extracts them into that same directory. This is important because sonarr will only look in the torrent’s download directory for the completed video file.

After some tweaking I got it to work pretty well for me. Here is the code I used (thanks to this site for the direction).

#!/bin/bash
#A simple script to extract a rar file inside a directory downloaded by Transmission.
#It uses environment variables passed by the transmission client to find and extract any rar files from a downloaded torrent into the folder they were found in.
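#The find/unrar flags: -execdir runs unrar from the directory where each rar was
#found, "e" extracts files without re-creating archive paths, and "-o-" tells
#unrar to never overwrite existing files. Quoting the variables keeps torrent
#names containing spaces intact.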
find "$TR_TORRENT_DIR/$TR_TORRENT_NAME" -name "*.rar" -execdir unrar e -o- "{}" \;

Save the above script into a file your transmission client can read and make it executable. Lastly, configure transmission to run this script on torrent completion by modifying your settings.json file (mine was located at /var/lib/transmission/.config/transmission-daemon/settings.json). Modify the following variables (be sure to stop your transmission client before making any changes):

"script-torrent-done-enabled": true, 
"script-torrent-done-filename": "/path/to/where/you/saved/the/script",

That’s it! Sonarr will now properly import shows that were downloaded as multi-part rar torrents.

Find top 10 requests returning 404 errors

I had a website where I was curious which URLs were returning the most 404s and how many hits each of those URLs was getting. This was right after a huge site redesign, so I wanted to know which old links were still being requested.

Getting a report on this can be accomplished with nothing more than the Linux command line and the log file you’re interested in. It involves combining grep, sed, awk, sort, uniq, and head commands. I enjoyed how well these tools work together so I thought I’d share. Thanks to this site for giving me the inspiration to do this.

This is the command I used to get the information I wanted:

grep '404' _log_file_ | sed 's/, /,/g' | awk '{print $7}' | sort | uniq -c | sort -n -r | head -10

Here is a rundown of each command and why it was used:

  • grep '404' _log_file_ (replace _log_file_ with the filename of your apache, tomcat, or varnish access log.) grep reads a file and returns every line containing the pattern you give it, in this case the number 404 (the HTTP "page not found" error). Note that this matches 404 anywhere in the line; a stricter variant that only matches the status column is shown after this list.
  • sed 's/, /,/g' sed edits a stream of text in whatever way you specify. The expression s/, /,/g tells sed to look for commas followed by spaces and replace them with just commas (eliminating the space after any comma it sees). This was necessary in my case because the source IP address field sometimes contains multiple IP addresses, which threw off the column count. This step may be optional if your server isn’t sitting behind any type of reverse proxy.
  • awk '{print $7}' awk is similar to sed in that it lets you do all sorts of things to text. In this case we’re telling awk to print only the 7th column of each line (the requested URL is the 7th column in apache and varnish logs).
  • sort This command (absent any arguments) sorts the results alphabetically, which is necessary for the next command to work properly.
  • uniq -c This command collapses adjacent duplicate lines in the results. The -c argument prefixes each unique line with a count of how many times it was found.
  • sort -n -r Sorts the results numerically in descending order. The -n argument sorts numerically so that 2 comes after 1 instead of after 10, and -r reverses the order so the highest count is at the top instead of the default of lowest first.
  • head -10 outputs only the first 10 results. Leave this off if you want to see all of the results instead of the top 10. A similar command is tail, if you want the last results instead.
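As mentioned above, grep '404' will match that string anywhere in a log line, including inside a URL. If your log is in the standard Apache combined format, where the status code is the 9th field (an assumption, so check your own log format), a stricter variant keeps the sed cleanup but lets awk do both the filtering and the column selection:

sed 's/, /,/g' _log_file_ | awk '$9 == 404 {print $7}' | sort | uniq -c | sort -n -r | head -10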

This was my output – exactly what I was looking for. Perfect.

2186 http://<sitename>/source/quicken/index.ini
2171 http://<sitename>/img/_sig.png
1947 http://<sitename>/img/email/email1.aspx
1133 http://<sitename>/source/quicken/index.ini
830 http://<sitename>/img/_sig1.png
709 https://<sitename>/img/email/email1.aspx
370 http://<sitename>/apple-touch-icon.png
204 http://<sitename>/apple-touch-icon-precomposed.png
193 http://<sitename>/About-/Plan.aspx
191 http://<sitename>/Contact-Us.aspx

Website load testing with wget and siege

In an effort to see how well my varnish configuration stood up to real world tests I came across a really cool piece of software: siege. You can use siege to benchmark / load test your site by having it hit your site repeatedly in a configurable fashion.

I played around with the siege configuration until I found one I liked. The process involves using varnishncsa and tee (if you use varnish) to capture an access log, wget to spider your target site, awk to parse that log into a usable URL list, and siege to hammer the site with that list.

varnishncsa

For my purposes I wanted to test how well varnish performed. Varnish is a caching server that sits in front of your webserver. By default it doesn’t log anything – this is where varnishncsa comes in. Run the following command to monitor the varnish log while simultaneously outputting the results to a file:

varnishncsa | tee varnish.log

If you don’t use varnish and rather serve pages up directly through nginx, apache, or something else, then you can skip this step.

wget

The next step is to spider your site to get a list of URLs to send to siege; wget fits in nicely for this task. This is the configuration I used (replace aoeu1234.com with the domain of your web server):

wget -r -l4 --spider -D aoeu1234.com http://www.aoeu1234.com

-r tells wget to be recursive (follow links)
-l is the number of levels you want to go down when following links
--spider tells wget to not actually download anything (no need to save the results, we just want to hit the site to generate log entries)
-D specifies a comma separated list of domains you want wget to crawl (useful if you don’t want wget to follow links outside your site)

awk

Once the spidering is finished, we use awk to parse the resulting server log, extract the URLs that were accessed, and redirect the results into a file named crawl.txt.

awk '{ print $7 }' varnish.log | sort | uniq > crawl.txt

varnish.log is what I named the output from the varnishncsa command above; if you use apache/nginx, then you would use your respective access log file instead.
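One thing to watch for: siege’s URL file generally expects full URLs rather than bare paths. If your access log records only the path (e.g. /index.html) in the 7th column, a variant of the above command that prepends the hostname (reusing the placeholder domain from the wget step) might look like this:

awk -v host="http://www.aoeu1234.com" '{ print host $7 }' varnish.log | sort | uniq > crawl.txt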

siege

This is where the fun comes in. You can now use siege to stress test your website to see how well it does.

siege -c 550 -i -t 3M -f crawl.txt -d 10

There are many siege options so you’ll definitely want to read up on the man page. I picked the following based on this site:

-c specifies the number of concurrent connections. I had to tweak the siege configuration to go up that high (see below).
-i tells siege to access the URLs in a random fashion
-t specifies how long to run siege for in seconds, minutes, or hours
-f specifies a list of URLs for siege to use
-d specifies the maximum delay between user requests (it will pick a random number between 0 and this specified number to make requests more random)

siege configuration

I had to tweak siege a little bit to allow more than 256 connections at a time. First, run

siege.config

to generate an initial siege configuration. Then modify ~/.siege/siege.conf to increase the limit of concurrent connections:

sed -i -e 's/limit = 256/limit = 1000/g' ~/.siege/siege.conf
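A quick grep confirms the change took effect (the exact key name may vary between siege versions):

grep 'limit =' ~/.siege/siege.conf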

Note: the config says the following:
DO NOT INCREASE THIS NUMBER UNLESS YOU CONFIGURED APACHE TO HANDLE MORE THAN 256 SIMULTANEOUS REQUESTS

This was fine in my case because it’s hitting varnish, not apache, so we can really ramp up the simultaneous user count.

sysctl configuration

When playing with siege I ran into a few different error messages. Siege is a complicated enough program that it requires you to tweak system settings for it to work fully. I followed some advice on this site to tweak my sysctl configuration to make it work better. Your mileage may vary.

### IMPROVE SYSTEM MEMORY MANAGEMENT ###

# Increase size of file handles and inode cache
fs.file-max = 2097152

# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2

### GENERAL NETWORK SECURITY OPTIONS ###

# Number of times SYNACKs are retried for a passive TCP connection
net.ipv4.tcp_synack_retries = 2

# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535

# Protect against TCP TIME-WAIT assassination (RFC 1337)
net.ipv4.tcp_rfc1337 = 1

# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 15

# Decrease the time default value for connections to keep alive
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

### TUNING NETWORK PERFORMANCE ###

# Default Socket Receive Buffer
net.core.rmem_default = 31457280

# Maximum Socket Receive Buffer
net.core.rmem_max = 12582912

# Default Socket Send Buffer
net.core.wmem_default = 31457280

# Maximum Socket Send Buffer
net.core.wmem_max = 12582912

# Increase number of incoming connections
net.core.somaxconn = 4096

# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 65536

# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824

# Increase the maximum total buffer-space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 65536 131072 262144
net.ipv4.udp_mem = 65536 131072 262144

# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384

# Increase the write-buffer-space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
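
To make these settings take effect, save them to a sysctl configuration file and load it without rebooting. The path below is just an example name; anything under /etc/sysctl.d/ works:

#load the settings from the file, assuming they were saved to /etc/sysctl.d/99-siege-tuning.conf
sudo sysctl -p /etc/sysctl.d/99-siege-tuning.conf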