Tag Archives: Splunk

Fix erroneous DM Splunk Missing Forwarders alert

For some time now Splunk has been alerting me to “missing” forwarders even though all of those forwarders are working perfectly fine. It turns out to be a glitch in the Deployment Monitor app. After much digging I found this Splunk article which explains it:

https://answers.splunk.com/answers/188784/after-update-to-splunk-enterprise-62-why-does-the.html

The fix is fairly simple, thankfully. You have to edit the macros.conf of the Deployment Monitor app to add this small snippet right before the first pipe:

NOT eventType=*

The default path for this configuration file is:

/opt/splunk/etc/apps/SplunkDeploymentMonitor/default/macros.conf

The relevant stanza in my macros.conf is below:

[forwarder_metrics]
definition = index="_internal" source="*metrics.log" group=tcpin_connections NOT eventType=* | eval sourceHost=if(isnull(hostname), sourceHost,hostname) | eval connectionType=case(fwdType=="uf","universal forwarder", fwdType=="lwf", "lightweight forwarder",fwdType=="full", "heavy forwarder", connectionType=="cooked" or connectionType=="cookedSSL","Splunk forwarder", connectionType=="raw" or connectionType=="rawSSL","legacy forwarder")| eval build=if(isnull(build),"n/a",build) | eval version=if(isnull(version),"pre 4.2",version) | eval guid=if(isnull(guid),sourceHost,guid) | eval os=if(isnull(os),"n/a",os)| eval arch=if(isnull(arch),"n/a",arch) | fields connectionType sourceIp sourceHost sourcePort destPort kb tcp_eps tcp_Kprocessed tcp_KBps splunk_server build version os arch guid

Change the hostname on a Splunk Indexer

Recently I set about to change the hostname on a Splunk indexer. It should be pretty easy, right? Beware. It can be pretty nasty! Below is my experience.

I started with the basics.

  • hostname command
    hostname <newhostname>
  • Modify /etc/system/network to make it persistent (CentOS specific)
    sed -i 's/<old hostname>/<new hostname>/g' /etc/system/network
  • Inform Splunk of the hostname change
    sed -i 's/<old hostname>/<new hostname>/g' $SPLUNK_HOME/etc/system/local/server.conf
  • Restart Splunk

Sadly, that wasn’t the end of it. I noticed right away Splunk complained of a few things:

TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 300 seconds.
WARN TcpOutputFd - Connect to 10.0.0.10:9997 failed. Connection refused

Running

netstat -an | grep LISTEN

revealed that the server was not even listening on 9997 like it should be. I found this answer indicating it could be an issue with DNS tripping up on that server. I edited $SPLUNK_HOME/etc/system/local/inputs.conf with the following:

[splunktcp://9997]
connection_host = none

but I also noticed that after I ran the command a short time later it was no longer listening on 9997. Attempting to telnet from the forwarder to the indexer in question revealed the same results – works at first, then quit working. Meanwhile no events are getting stored on that indexer.

I was pulling my hair out trying to figure out what was happening. Finally I discovered this gem on Splunk Answers:

Are you using the deployment server in your environment? Is it possible your forwarders’ outputs.conf got deployed to your indexer?

On the indexer:
./splunk cmd btool outputs list –debug

Sure enough! after running

./splunk cmd btool outputs list --debug

I discovered this little gem of a stanza:

/opt/splunk/etc/apps/APP_Forwarders/default/outputs.conf [tcpout]

That shouldn’t’ have been there! Digging into my deployment server I discovered that I had a server class with a blacklist, that is, it included all deployment clients except some that I had listed. The blacklist had the old hostname, which meant when I changed the indexer’s hostname it no longer matched the blacklist and thus was deployed a forwarder’s configuration, causing a forwarding loop. My indexer was forwarding back to the forwarder everything it was getting from the forwarder, causing Splunk to shut down port 9997 on the offending indexer completely.

After getting all that set up I noticed Splunk was only returning searches from the indexers whose hostnames I had not changed. Everything looked good in the distributed search arena – status was OK on all indexers; yet I still was not getting any results from the indexer whose name I had changed, even though it was receiving events! This was turning into a problem. It was creating a blind spot.

Connections great, search status great, deployment status good.. I didn’t know what else to do. I finally thought to reload Splunk on the search head that had been talking to the server whose name I changed. Success! Something in the search head must have made it blind to the indexer once its name had changed. Simply restarting Splunk on the search head fixed it.

In short, if you’re crazy enough to change the name of one of your indexers in a distributed Splunk environment, make sure you do the following:

  • Change hostname on the OS
  • Change ServerName in Splunk config files
    • Add connection_host = none in inputs.conf (optional?)
  • Clean up your deployment server
    • Delete old hostname from clients phoning home
    • MAKE SURE the new hostname won’t be sucked up into an unwanted server class
  • Clean up your search head
    • Delete old hostname search peer
    • Add new hostname search peer
    • Restart search head
  • Profit

Install Splunk Universal Forwarder on Linux

I do this infrequently enough that I decided I should really write this down. Below is the quick and dirty way to get the Splunk universal forwarder installed on a new Linux  system. Thanks to byteschef for the information used to create this guide.

Download the latest splunk .RPM from their site and install it via RPM -i <filename> (if RedHat based) or dpki -i <filename> if debian based.

Run the following commands as root:

cd /opt/splunkforwarder/bin
./splunk start --accept-license
./splunk enable boot-start
./splunk add forward-server <IP/hostname of splunk server>:9997 -auth admin:changeme
./splunk add monitor /var/log
./splunk edit user admin -password NEW_PASSWORD -auth admin:changeme
./splunk restart

If there are any other directories you want monitored other than /var/log (application logs, for example) then issue:

./splunk add monitor <directory to monitor>

Done.

Migrate from Sophos UTM to pfSense part 1

I’ve been using a Sophos UTM virtual appliance as my main firewall / threat manager appliance for about two years now. I’ve had some strange issues with this solution off and on but for the most part it worked. The number of odd issues has begun to build, though.

Recently it decided to randomly drop some connections even though logs showed no dropped packets. The partial connections spanned across various networks and devices. I never did figure out what was wrong. After two days of furiously investigating (including disconnecting all devices from the network), the problem went away completely on its own with no action on my part. It was maddening – enough to drive me to pfSense.

As of version 2.2 pfSense can be fully virtualized in Xen, thanks to FreeBSD 10.1. This allowed me the option to migrate. Below are the initial steps I’ve taken to move to pfSense.

Features checklist

I am currently using the following functions in Sophos UTM. My goal is to move these functions to equivalents in pfSense:

  • Network firewall
  • Web Application Firewall, also known as a reverse proxy.
  • NTP server
  • PPPOE client
  • DHCP server
  • DNS server
  • Transparent proxy for content filtering and reporting
  • E-mail server / SPAM protection
  • Intrusion Detection system
  • Anti-virus
  • SOCKS proxy
  • Remote access portal (for downloading VPN configurations, etc)
  • Citrix Xenserver support (for live migration etc)
  • Log all events to a syslog server
  • VPN server
  • Daily / weekly / monthly e-mail reports on bandwidth usage, CPU, most visited sites, etc.

I haven’t migrated all of these function over to pfSense which is why this article is only Part 1. Here is what I have done so far.

Xenserver support

Installing xen tools is fairly straightforward thanks to this article. It’s simply a matter of dropping to a shell on your pfSense VM to install and enable xen tools

pkg install xe-guest-utilities
echo "xenguest_enable=\"YES\"" >> /etc/rc.conf.local
ln -s /usr/local/etc/rc.d/xenguest /usr/local/etc/rc.d/xenguest.sh 
service xenguest start

PPPoE client

The wizard works fine for configuring PPPOE, however I experienced some very strange issues with internet speed. Downstream would be fine but upstream would be incredibly slow. Another symptom was NAT / port forwarding appearing not to work at all.

It turns out the issue was pfSense’s virtualized status. There is a bug in the virtio driver that handles virtualized networking. You have to disable all hardware offloading on both the xenserver hypervisor and the pfSense VM to work around the bug. Details on how to do this can be found here. After that fix was implemented, speed and performance went back to normal.

DNS server

To get this working like it did in Sophos you have to disable the default DNS resolver service and enable the DNS forwarder service instead. Once DNS forwarder is enabled, check the box “register DHCP leases in DNS” so that DHCP hostnames come through to clients.

Syslog

Navigate to Status / system logs / settings tab and  tick “Send log messages to remote syslog server” and fill out the appropriate settings.

Note for Splunk users: the Technology Add-on for parsing pfsense logs expects the sourcetype to equal pfsense (not syslog). Create a manual input for logs coming from pfsense so it’s tagged as pfsense and not syslog (thanks to this post for the solution on how to get the TA to work properly.)

VPN

OpenVPN – wizard ran fine. Install OpenVPN Client Export utility package for easy exporting to clients. Once package is installed go to VPN / OpenVPN and you will see a new tab – Client Export.

Note you will need to create a user and check the “create certificate” checkbox or add a user certificate to existing user by going to System / User manager, Editing the user and clicking the plus next to User Certificates. The export utility will only show users that have valid certificates attached to them. If no users have valid certificates the Client Export tab will be blank.

Firewall

One useful setting to note is to enable NAT reflection. This allows you to access NATed resources as if you were outside the network, even though you are inside it. Do this by going to System / Advanced and clicking on the Firewall / NAT tab. Scroll halfway down to find the Network Address Translation section. Change NAT reflection mode for port forwards to Enable (Pure NAT)

It’s also very helpful to configure host and port aliases by going to Firewall / Aliases. This is roughly equivalent to creating Network and Host definitions in Sophos. When you write firewall rules you can simply use the alias instead of writing out hosts IPs and ports.

So far so good

This is the end of part 1. I’ve successfully moved the following services from Sophos UTM to pfSense:

  • Network firewall
  • PPPOE client
  • Log all events to a syslog server
  • VPN server
  • NTP server
  • DHCP server
  • DNS server
  • Xenserver support

I’m still working on moving the other services over. I’ve yet to find a viable alternative to the web application firewall but I haven’t given up yet.

Determine what a Splunk forwarder is forwarding

I recently came across a need to determine exactly what is logging to a forwarder in Splunk. I had a hard time finding out what to search for so I thought I’d share what I found.

The key to discovering where data is coming in from is in Splunk’s metrics.log files. Searching these files gives us what we need. This search reveals anything coming in over UDP (you can also change it to be TCP if desired) and totals it by host (forwarder.)

source=*metrics.log group=udpin_connections | stats count by host

Oddly enough Splunk doesn’t have a field extracted for its own metrics.log. A key useful field is missing – sourceHost. I used the field extraction tool to create it and it generated this field extraction:

^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)

Field extraction in hand, I was able to generate the report I was looking for: devices actively sending logs to my forwarder over UDP.

source=*metrics.log group=udpin_connections  host=splunk | stats count by sourceHost sourcePort

where host is the forwarder you want to investigate. Useful.

Fix Splunk lockout after exceeded quota

Recently I came across a situation with my home install of Splunk (free license) where the 500MB quota was exceeded three days in a row. I hadn’t checked Splunk for a few days so I was completely blindsided by it. The consequence of going over quota three days in a row? Losing the ability to do any searches in Splunk, which is a real downer.

The easiest, although least convenient, way to fix being locked out is to wait it out. If you go 30 days in a row without violating the license, Splunk will unlock itself. Splunk will still receive and index events during that time. The inability to search makes it really difficult to track down what the problem is, though, and I wasn’t happy waiting for 30 days before getting Splunk back.

Poking around on the Splunk forums I discovered that there is a way to get splunk back – perform a fresh install and then migrate your database and settings over to the fresh install. This involves backing up a few things, then copying them over the fresh install’s default folders

  • $SPLUNK_HOME/var/lib/splunk/defaultdb   #Default Splunk index, where all my data is held. If you have other indexes in here you’ll want to copy them too.
  • $SPLUNK_HOME/etc  #all your configuration files

Simply back up the above folders, install Splunk on a new machine, launch Splunk first so it will generate all the default files, then copy the files over to the new instance.

I went a step further and planned for the future. I wrote a quick and dirty script that will do all of this for you,  even on the same machine – no need to copy to another machine.  The script assumes you’re running a redhat derivative and have the correct Splunk install file in a predictable location. Update the locations of splunk directories and install files as needed and run as root.

#!/bin/bash

#Backup important directories
mkdir /opt/splunkbackup/
cp -al /opt/splunk/etc /opt/splunkbackup/
cp -al /opt/splunk/var/lib/splunk/defaultdb /opt/splunkbackup/

#Nuke splunk
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk

#Reload from fresh start
rpm -iv --replacepkgs /home/nicholas/splunk-6.2.2-255606-linux-2.6-x86_64.rpm
/opt/splunk/bin/splunk start --accept-license

#Restore configuration files and indexes
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk/etc
cp -al /opt/splunkbackup/etc /opt/splunk/
rm -rf /opt/splunk/var/lib/splunk/defaultdb
cp -al /opt/splunkbackup/defaultdb /opt/splunk/var/lib/splunk/
chown splunk:splunk -R /opt/splunk/
/opt/splunk/bin/splunk start

#Remove splunk backup
rm -rf /opt/splunkbackup

This will restore your searches, settings, and data. It won’t restore audit and other internal Splunk information, however. This script worked marvelously in getting my Splunk back.

Automatically delete old data in Splunk

I’ve had Splunk humming along for about two years now. I’ve already increased the storage space for my Splunk VM once. Today I received a notice that I’ve once again run out of space and indexing had been suspended. I wanted a more permanent solution to this problem, so I consulted the almighty Google.

The solution to my problem is to set a retirement policy. This allows Splunk to automatically delete old data when you hit a certain index size. You can also go by time, but I opted for size. It’s a pretty simple configuration change. Simply edit (or create if it doesn’t exist) $SPLUNK_HOME/etc/system/local/index.conf and add two lines

sudo vim /opt/splunk/etc/system/local/indexes.conf
[main]
maxTotalDataSizeMB = 50000

Then, restart your Splunk instance (the command below is Debian/Ubuntu specific)

sudo service splunk restart

The configuration above tells Splunk to keep at most 50GB of data. When that limit is reached, it begins deleting the oldest log files. No more out of space errors!

Get geolocation info in Splunk with iplocation

Splunk 6 has many awesome new features, one of which is built-in IP geolocation. No longer do you have to manually lookup up city, state, and country when investigating logs – Splunk will do that for you. This page has the details.

For example, if I want my x_forwarded_for IP addresses to have geolocation, I tack this at the end of my query:

| iplocation x_forwarded_for | stats count by x_forwarded_for City Region Country

The fields iplocation can produce are:

  • City
  • Continent
  • Country
  • lat
  • lon
  • MetroCode
  • Region
  • Timezone

You can combine this query with DNS lookups (as detailed here) for a more complete picture of your data.

<search query> | iplocation x_forwarded_for | lookup dnslookup clientip as x_forwarded_for OUTPUT clienthost as hostname | stats count by x_forwarded_for City Region Country hostname

Neat.

Extract multiple Active Directory fields in Splunk

I had posted here about how to extract account names with a specific modifier (exclude account names ending in a dollar sign.) That worked for one specific instance, but I found I needed something better. Active Directory logs have multiples of the same value (Account_Name, Group_Name, etc.) that all depend on context, namely the value of the line two lines above it.

For example,

Message=A member was added to a security-enabled universal group.

Subject:
 Security ID: <Random long SID>
 Account Name: Administrator
 Account Domain: ExampleDomain
 Logon ID: <random hex value>

Member:
 Security ID: <Another random long SID>
 Account Name: CN=George Clooney,OU=ExampleDomain,OU=Hollywood,OU=California,DC=USA,DC=NA,DC=Terra

Group:
 Security ID: <Yet another long SID>
 Account Name: Old Actors
 Account Domain: ExampleDomain

You can see that there are three different Security ID fields, three different Account Name fields, and two different Account Domain fields. The key is the context: Subject account name, member account name, or group account name.

I wrestled for some time to find a regex expression for Splunk that would continue matching things after a line has ended. After much searching I came across this post which explained the need for a regex modifier to do what I wanted.

In my case I needed to use the (?s) modifier to include newline characters in my extraction. My new and improved AD regex extraction is as follows:

(?s)(Group:.+Account Name:\s+)(?P<real_group_name>[^\n]+)
  • (?s)  Regex modifier indicating to include new lines
  • Group:  Section I am interested in. You can replace this with Member: if you’re interested in member account names instead
  • .+ match one or more of any character (including new line as indicated by modifier above)
  • Account Name:\s+ This is in conjuction with the previous two items to create a match that includes the section name and anything after that until the spaces after Account Name
  • [^\n]+ Match one or more characters that is not a new line (since you might have an account name with spaces.)

Finally! This is the regex I’ve been looking for.

 

Fix sudo being slow after changing hostname

Recently I changed the hostname of one of my machines. Ever since I did this there has been a five second pause from when I enter a command and when it actually executes. I was perplexed about this until I came across this post explaining that the /etc/hosts file was probably still pointing to the old hostname. It turns out it was!

So, to recap, if you want to change the hostname of your machine you have to make sure you do these three things:

  • issue the hostname command to change the hostname while running
  • update /etc/hostname with your new hostname
  • update /etc/hosts to reflect your new hostname after 127.0.0.1 (get rid of the old hostname.)

Update 2/24/2015: If you happen to have a Splunk forwarder installed on the machine, make sure you update its config to reflect the new hostname. Thanks to Splunk Answers for the information.  To do this, update  $SPLUNK_HOME/etc/system/local/server.conf and change the serverName= field to your new hostname.