Determine what a Splunk forwarder is forwarding

I recently came across a need to determine exactly what is sending logs to a forwarder in Splunk. I had a hard time finding out what to search for, so I thought I’d share what I found.

The key to discovering where data is coming from lies in Splunk’s metrics.log files. Searching these files gives us what we need. This search reveals anything coming in over UDP (you can also change it to TCP if desired) and totals it by host (forwarder):

source=*metrics.log group=udpin_connections | stats count by host
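
For example, the TCP equivalent (for data arriving over TCP inputs) just swaps the connection group:

source=*metrics.log group=tcpin_connections | stats count by host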

Oddly enough, Splunk doesn’t extract fields for its own metrics.log, and a key useful field is missing: sourceHost. I used the field extraction tool to create it, which generated this field extraction:

^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)
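
If you’d rather manage this in configuration files than through the UI, a search-time extraction in props.conf should do the same thing. Consider this a sketch: the source-based stanza and the file location ($SPLUNK_HOME/etc/system/local/props.conf on the search head) are my assumptions, so adjust to fit your setup.

[source::...metrics.log]
EXTRACT-sourceHost = ^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)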

Field extraction in hand, I was able to generate the report I was looking for: devices actively sending logs to my forwarder over UDP.

source=*metrics.log group=udpin_connections  host=splunk | stats count by sourceHost sourcePort

where host is the forwarder you want to investigate. Useful.

Fix Xen VGA Passthrough in Linux Mint 17.1

I wrote in my last post about how I upgraded from Linux Mint 16 to 17.1. I thought everything went smoothly, but it turns out one feature did break: VGA passthrough via Xen. For the past year or so I’ve had a Windows 8.1 gaming VM with direct access to my video card. It’s worked out nicely in Linux Mint 16 but broke completely in 17.1.

I followed the advice of powerhouse on the Linux Mint forums on how to get things up and running, but it wasn’t quite enough. After much banging of my head against the wall I read on the Xen mailing list that there was a regression in VGA passthrough functionality with Xen 4.4.1, which is the version of Xen Mint 17.1 uses.

I finally came to a solution to my problem today – upgrade to Xen 4.5. I couldn’t find any pre-built packages for Ubuntu 14.04 (the base of Mint 17.1) so I ended up compiling Xen 4.5 from source. Below is what I did to make it all work.

Fix broken symlink for /usr/lib/xen-default

sudo rm /usr/lib/xen-default
sudo ln -s /usr/lib/xen-4.4/ /usr/lib/xen-default

Update the DomU CFG file

A couple things needed tweaking. Here is my working cfg:

builder='hvm'
memory = '8192'
name = 'win8.1'
vcpus=6
cpus="2-7"
pae=1
acpi=1
apic=1
vif = [ 'mac=3a:82:47:2a:51:20,bridge=xenbr0,model=virtio' ]
disk = [ 'phy:/dev/mapper/desktop--xen-Win8.1,xvda,w' ]
device_model_version = 'qemu-xen-traditional'
boot='c'
sdl=0
vnc=1
vncpasswd=''
stdvga=0
serial='pty'
tsc_mode=0
viridian=1
usb=1
usbdevice='tablet'
gfx_passthru=0
pci=[ '01:00.0', '01:00.1' , '00:1d.0' ]
localtime=1
pci_power_mgmt=1
on_xend_stop = "shutdown"
xen_platform_pci=1

For some, that’s all they had to do. For me, I had to do a few more things.

Compile Xen 4.5

This step was thanks to two different sites, this one and this one.

Install necessary packages

sudo apt-get install build-essential bcc bin86 gawk bridge-utils iproute libcurl3 libcurl4-openssl-dev bzip2 module-init-tools transfig tgif texinfo texlive-latex-base texlive-latex-recommended texlive-fonts-extra texlive-fonts-recommended pciutils-dev mercurial libjpeg-dev make gcc libc6-dev-i386 zlib1g-dev python python-dev python-twisted libncurses5-dev patch libvncserver-dev libsdl-dev libpixman-1-dev iasl libbz2-dev e2fslibs-dev git-core uuid-dev ocaml ocaml-findlib libx11-dev bison flex xz-utils libyajl-dev gettext markdown libaio-dev pandoc

Checkout Xen source

git clone git://xenbits.xen.org/xen.git xen-4.5.0
cd xen-4.5.0
git checkout RELEASE-4.5.0

Build from source

./configure --libdir=/usr/lib
make world -j8

When I tried this, the make failed with this error:

/usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory
 #include <asm/errno.h>

The fix (thanks to askubuntu) was to install linux-libc-dev and make a symlink for it:

sudo apt-get install linux-libc-dev
sudo ln -s /usr/include/asm-generic /usr/include/asm

It then compiled successfully.

Install freshly compiled Xen 4.5

sudo make install
sudo update-rc.d xencommons defaults
sudo update-rc.d xendomains defaults
sudo ldconfig

Set grub to boot from new Xen kernel

sudo update-grub
sudo vim /etc/default/grub
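
To see which menu entries update-grub generated (GRUB numbers them starting at 0, so the second entry is 1), a quick grep of the generated config helps:

grep "^menuentry" /boot/grub/grub.cfg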

Edit GRUB_DEFAULT to match wherever update-grub put your new Xen kernel (in my case it was the second entry, so my GRUB_DEFAULT=1), then run update-grub again:

sudo update-grub

Reboot

Success at last. Enjoy your VM gaming once more with Xen 4.5.

Upgrade Linux Mint 16 to 17.1

I realized recently that my desktop system is quite out of date. It has worked so well for so long that I didn’t notice it had reached end of support. I was running Linux Mint 16 – Petra.

Thanks to this site the upgrade was fairly painless – a few repository updates, upgrade, and reboot. Simple! The steps I took are below.

Update all repositories

Use sed in conjunction with find to quickly and easily update all your repository files from saucy to trusty, and from petra to rebecca, making a backup of files modified.

sudo find /etc/apt/sources.list.d/ -type f -exec sed --in-place=.bak 's/saucy/trusty/' {} \;
sudo find /etc/apt/sources.list.d/ -type f -exec sed --in-place=.bak 's/petra/rebecca/' {} \;
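
To confirm nothing still points at the old release names, a quick grep afterwards should come back empty (the .bak backups are excluded since they intentionally still contain the old names):

grep -ri --exclude='*.bak' 'saucy\|petra' /etc/apt/sources.list.d/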

Update your system

This took a while. It had to download 1.5GB of data and install it.

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get upgrade

Cleanup

After running the upgrade I was notified that many packages were installed but no longer required. To remove them:

sudo apt-get autoremove

Install new language settings:

sudo apt-get install mintlocale

Install gvfs-backends (for Thunar)

sudo apt-get install gvfs-backends

Reboot

Flawless! It worked on the first try. Awesome.

Use BASH to append an incremental suffix to directories

In migrating Splunk indexes I came across the need to add an incrementing numeric suffix (_100, _101, and so on) to a bunch of directories. For example, renaming directories

test
test1
test2
test3

to

test_100
test1_101
test2_102
test3_103

After a bunch of searching I settled on a simple bash loop to accomplish this:

I=100; for F in db_*; do mv "$F" "${F}_${I}"; I=$((I+1)); done

There is an initial variable, I, set to 100. The for loop goes through everything matching db_* and renames each item, appending the current value of I as a suffix (_100, _101, and so on). The last step of each iteration increments I.

Change the initial value of I to whatever you want your suffix to begin with, and change db_* to match whatever you want to rename (test* for the example above). Simple, but it works.
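
To preview the renames before doing them for real, the same loop with echo in front of mv just prints the commands instead of running them:

I=100; for F in db_*; do echo mv "$F" "${F}_${I}"; I=$((I+1)); done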

Block annoying bots with Apache .htaccess

Recently one of my sites has been having its database crash repeatedly. Investigation revealed it always happens while an aggressive bot is crawling the site. Since the site is small, the crawl was causing the database to run out of memory and die.

The Web Application Firewall that this site is behind frustratingly does not have a feature for blocking user agents. I decided to resort to Apache on the webserver itself to serve as a gatekeeper. The user agent in question? flipboard proxy. It also conveniently appears to ignore robots.txt.

Thanks to this article I learned the details on how to get Apache to block this particular user agent. Creating an .htaccess file (if it doesn’t already exist) and putting it in the root directory of the website causes it to apply to the entire site. Within the .htaccess file, place the following directives:

#block bad bots with a 403
#SetEnvIfNoCase User-Agent "facebookexternalhit" bad_bot
SetEnvIfNoCase User-Agent "Twitterbot" bad_bot
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "MetaURI" bad_bot
SetEnvIfNoCase User-Agent "mediawords" bad_bot
SetEnvIfNoCase User-Agent "FlipboardProxy" bad_bot
<Limit GET POST HEAD>
 Order Allow,Deny
 Allow from all
 Deny from env=bad_bot
</Limit>

Save the file in the root of your website and make sure its permissions allow your Apache server to read it. Success! FlipboardProxy (and the other bad bots) no longer crashes the site. Instead, it gets served a 403 Forbidden page for every request it makes.
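
To verify the block is working, you can spoof the user agent with curl; the first request should come back 403 Forbidden while a normal request still succeeds (replace the URL with your own site):

curl -I -A "FlipboardProxy" https://www.example.com/
curl -I https://www.example.com/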

Configure full VPN tunnel in Sophos UTM

For years now I have had a successful split tunnel VPN with my Sophos UTM. Recently I’ve wanted to have a full tunnel option for greater security in remote areas (hotel wi-fi, etc.) Unfortunately setting up such a thing in Sophos is NOT straightforward.

The biggest problem I had was that no websites would work after the VPN was initiated. NSlookup was fine, connection was fine, even internal sites would load properly, but no external internet.

Thanks to this post I finally found the culprit: the pesky allowed networks setting for each UTM function. In my case, the VPN was allowing all the necessary traffic through, but my transparent proxy was denying web access. I had to add my VPN pool to the proxy’s list of allowed networks.

To summarize, this is what you must do to have a full VPN tunnel:

  • Configure the desired method in the Remote Access section. Take note of whatever IP pool you use for your VPN. In my case I used VPN Pool (SSL)
  • Ensure that internet access is in the list of allowed networks for the user you’ve configured for VPN (Any, or Internet IPv4/6)
  • Add your VPN pool to the list of allowed networks for each service you use.
    • Network services / DNS
    • Web Protection / Web Filtering
  • Profit

Slow Linux VM performance in VMware vSphere

Recently I’ve been scratching my head over a particular performance issue with Linux VMs hosted on VMware vSphere. Everything seemed to move at a glacial pace.

vmstat gave a few clues as to what was happening, although depending on what I read it still wasn’t clear:

[vmstat output]

It became apparent that I was suffering from some kind of queuing problem, though I wasn’t sure whether it was CPU or disk related. I came across this post, which has a lot of good performance tuning guides. This tip caught my eye:

7. Set your disk scheduling algorithm to ‘noop’

The Linux kernel has different ways to schedule disk I/O, using schedulers like deadline, cfq, and noop. The ‘noop’ — No Op — scheduler does nothing to optimize disk I/O. So why is this a good thing? Because ESX is also doing I/O optimization and queuing! It’s better for a guest OS to just hand over all the I/O requests to the hypervisor to sort out than to try optimizing them itself and potentially defeating the more global optimizations.

You can change the kernel’s disk scheduler at boot time by appending:

elevator=noop

to the kernel parameters in /etc/grub.conf.


Sure enough, I modified /boot/grub/grub.conf on my CentOS 6 boxes and appended elevator=noop to the kernel line, then rebooted. It helped a lot! Performance was no longer pitiful. I’m not nearly as familiar with VMware as I am with XenServer, so this was a good hint.
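
For reference, the change is just tacking elevator=noop onto the end of each kernel line in /boot/grub/grub.conf; the kernel version and root device below are placeholders, not my actual values:

kernel /vmlinuz-2.6.32-504.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root rd_NO_LUKS KEYTABLE=us elevator=noop

You can also check and switch the scheduler on a live system to test the effect before rebooting; the active scheduler shows up in brackets. This assumes the disk is sda and you are running as root:

cat /sys/block/sda/queue/scheduler
echo noop > /sys/block/sda/queue/scheduler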

Fix Splunk lockout after exceeded quota

Recently I came across a situation with my home install of Splunk (free license) where the 500MB quota was exceeded three days in a row. I hadn’t checked Splunk for a few days so I was completely blindsided by it. The consequence of going over quota three days in a row? Losing the ability to do any searches in Splunk, which is a real downer.

The easiest, although least convenient, way to fix being locked out is to wait it out. If you go 30 days in a row without violating the license, Splunk will unlock itself. Splunk will still receive and index events during that time. The inability to search makes it really difficult to track down what the problem is, though, and I wasn’t happy waiting for 30 days before getting Splunk back.

Poking around on the Splunk forums I discovered that there is a way to get Splunk back: perform a fresh install and then migrate your database and settings over to it. This involves backing up a few things, then copying them over the fresh install’s default folders:

  • $SPLUNK_HOME/var/lib/splunk/defaultdb   #Default Splunk index, where all my data is held. If you have other indexes in here you’ll want to copy them too.
  • $SPLUNK_HOME/etc  #all your configuration files

Simply back up the above folders, install Splunk on a new machine, launch Splunk first so it will generate all the default files, then copy the files over to the new instance.

I went a step further and planned for the future. I wrote a quick and dirty script that will do all of this for you, even on the same machine – no need to copy to another machine. The script assumes you’re running a Red Hat derivative and have the correct Splunk install file in a predictable location. Update the locations of the Splunk directories and install file as needed and run as root.

#!/bin/bash

#Backup important directories
mkdir /opt/splunkbackup/
cp -al /opt/splunk/etc /opt/splunkbackup/
cp -al /opt/splunk/var/lib/splunk/defaultdb /opt/splunkbackup/

#Nuke splunk
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk

#Reload from fresh start
rpm -iv --replacepkgs /home/nicholas/splunk-6.2.2-255606-linux-2.6-x86_64.rpm
/opt/splunk/bin/splunk start --accept-license

#Restore configuration files and indexes
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk/etc
cp -al /opt/splunkbackup/etc /opt/splunk/
rm -rf /opt/splunk/var/lib/splunk/defaultdb
cp -al /opt/splunkbackup/defaultdb /opt/splunk/var/lib/splunk/
chown splunk:splunk -R /opt/splunk/
/opt/splunk/bin/splunk start

#Remove splunk backup
rm -rf /opt/splunkbackup

This will restore your searches, settings, and data. It won’t restore audit and other internal Splunk information, however. This script worked marvelously in getting my Splunk back.

Fix owncloud client sync “not valid JSON!” error

Recently I migrated my owncloud installation from one webserver to another. I learned that all you have to do is copy the data/ and config/ directories from your owncloud installation to the new machine’s owncloud folder to migrate everything over.

After the migration, I noticed the Windows desktop sync clients stopped working (Android worked fine, though). The main error message was not helpful:

Failed to connect to owncloud at https://servername/path/status.php: Unknown error

I found out from here that you can alter the shortcut for the Windows client and append --logwindow to the Target field to get more detailed logging.
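
For reference, the modified shortcut Target ends up looking something like this; the install path shown is just the typical default and may differ on your machine:

"C:\Program Files (x86)\ownCloud\owncloud.exe" --logwindow

Once that was done, I was able to get more information: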

03-02 10:01:27:240 0x50f9ec8 networkjobs.cpp:453 status.php from server is not valid JSON!

Manually navigating to status.php in the browser didn’t reveal anything – it appeared to load normally.

After much digging I found a suggestion to check the admin settings within owncloud. This is where I realized I probably didn’t migrate things properly. There was a big warning about invalid .htaccess files. Progress!

The lack of an .htaccess file made me realize that instead of completely moving the entire folder from the old owncloud install to the new, I needed to copy into the existing new owncloud directory. In moving instead of copying I somehow missed a few important files.

I started over, this time copying all files inside data and config into the new owncloud data and config directories. Apparently the Windows sync client requires valid .htaccess configuration before it will work.  Success!

Troubleshoot RSA SecurID in CentOS 6

Unexpected error from ACE/Agent API.

In following this guide for configuring a CentOS 6 system to authenticate with RSA SecurID I came across an unusual error message that had me scratching my head:

Unexpected error from ACE/Agent API.

The problem stemmed from having an incorrect value in the /var/ace/sdopts.rec file for CLIENT_IP. For some reason I had put the IP address of the RSA authentication server in there. CLIENT_IP is the IP address of the RSA client, or rather, the machine you’re working on. The client uses whatever’s in that file to report to the RSA server what its IP address is. If the RSA server gets an invalid IP response from the client, it won’t authenticate.
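
For example, if the Linux client’s own address were 192.168.1.50 (a placeholder), /var/ace/sdopts.rec would contain:

CLIENT_IP=192.168.1.50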

SELinux issues

Much blood and tears were shed in getting SELinux to exist harmoniously with RSA SecurID. The problem was exacerbated by the fact that there are a lot of half-solutions and misinformation floating around on the internet. This will hopefully help fix that.

The message entry does not exist for Message ID: 1001

At this point acetest worked beautifully but I could not use an RSA passcode to SSH into the system. Digging into the log revealed this error message:

sshd[2135]: ACEAGENT: The message entry does not exist for Message ID: 1001

Thanks to this post, I realized it was due to SELinux. Modifying the SELinux configuration to allow /var/ace to be read, per the commands below, seemed to fix the issue.

setenforce 0
chcon -Rv --type=sshd_t /var/ace/
setenforce 1

But alas! The solution was not a very good one. The commands above have two problems: first, the chcon command is temporary and does not survive SELinux policy relabels; second, it assigns the type sshd_t, which does allow SSH to access the directory but revokes RSA SecurID’s ability to write to it. This is a problem if you ever need to clear node secrets. The server will initiate the wipe but the client will not be able to modify that directory, resulting in node secret mismatches.

I finally decided to RTFM and landed on this documentation page, which explained the issue I was having: SELinux mislabeling. The proper solution is to use a label that both SecurID and sshd can read and write. Thanks to this SELinux man page (it really pays to RTFM!) I discovered that the label I want is var_auth_t (the default label applied when creating /var/ace is var_t, which SSH can’t read).

To survive relabeling, use the semanage command, which is not installed by default. Thanks to this site I learned I must install policycoreutils-python:

yum install policycoreutils-python

Once semanage is installed, use it to change the label for /var/ace and everything inside it to var_auth_t, then apply the changes with restorecon:

semanage fcontext -a -t var_auth_t "/var/ace(/.*)?"
restorecon -R -v /var/ace
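
To double-check that the new label stuck, ls -dZ should now show var_auth_t on the directory:

ls -dZ /var/ace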

Finally, both RSA SecurID and OpenSSH can read what they need to and authentication is successful.

First acetest succeeds but subsequent ones fail

If you followed the bad advice of relabeling /var/ace to sshd_t, you might run across a very frustrating issue where acetest succeeds the first time, but any attempt to SSH into the box or even run acetest again fails. The error message on the RSA SecurID server was:

Node secret mismatch: cleared on server but not on agent

The problem is due to the improper SELinux labeling mentioned above. The fix is the same:

yum install policycoreutils-python
semanage fcontext -a -t var_auth_t "/var/ace(/.*)?"
restorecon -R -v /var/ace

SSH access denied even with successful acetest

If acetest succeeds and you’ve loaded the module into PAM but still get access denied, it could be due to your SSH configuration. Ensure the following options are set:

ChallengeResponseAuthentication yes 
UsePrivilegeSeparation no
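
These options live in /etc/ssh/sshd_config; after changing them, restart sshd so they take effect:

service sshd restart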

Victory.