Category Archives: Analytics

Fix erroneous DM Splunk Missing Forwarders alert

For some time now Splunk has been alerting me to “missing” forwarders even though all of those forwarders are working perfectly fine. It turns out to be a glitch in the Deployment Monitor app. After much digging I found this Splunk Answers post, which explains it:

https://answers.splunk.com/answers/188784/after-update-to-splunk-enterprise-62-why-does-the.html

The fix is fairly simple, thankfully. You have to edit the macros.conf of the Deployment Monitor app to add this small snippet right before the first pipe:

NOT eventType=*

The default path for this configuration file is:

/opt/splunk/etc/apps/SplunkDeploymentMonitor/default/macros.conf

The relevant stanza in my macros.conf is below:

[forwarder_metrics]
definition = index="_internal" source="*metrics.log" group=tcpin_connections NOT eventType=* | eval sourceHost=if(isnull(hostname), sourceHost,hostname) | eval connectionType=case(fwdType=="uf","universal forwarder", fwdType=="lwf", "lightweight forwarder",fwdType=="full", "heavy forwarder", connectionType=="cooked" or connectionType=="cookedSSL","Splunk forwarder", connectionType=="raw" or connectionType=="rawSSL","legacy forwarder")| eval build=if(isnull(build),"n/a",build) | eval version=if(isnull(version),"pre 4.2",version) | eval guid=if(isnull(guid),sourceHost,guid) | eval os=if(isnull(os),"n/a",os)| eval arch=if(isnull(arch),"n/a",arch) | fields connectionType sourceIp sourceHost sourcePort destPort kb tcp_eps tcp_Kprocessed tcp_KBps splunk_server build version os arch guid
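One caveat: edits under the app’s default directory can be overwritten when the Deployment Monitor app is upgraded. A safer variant of the same fix (just a sketch, assuming the default path above) is to copy macros.conf into the app’s local directory, which takes precedence over default, and make the change there:

mkdir -p /opt/splunk/etc/apps/SplunkDeploymentMonitor/local
cp /opt/splunk/etc/apps/SplunkDeploymentMonitor/default/macros.conf /opt/splunk/etc/apps/SplunkDeploymentMonitor/local/macros.conf
# add NOT eventType=* before the first pipe in [forwarder_metrics], then restart Splunk
/opt/splunk/bin/splunk restart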

Change the hostname on a Splunk Indexer

Recently I set about to change the hostname on a Splunk indexer. It should be pretty easy, right? Beware. It can be pretty nasty! Below is my experience.

I started with the basics.

  • hostname command
    hostname <newhostname>
  • Modify /etc/sysconfig/network to make it persistent (CentOS specific)
    sed -i 's/<old hostname>/<new hostname>/g' /etc/sysconfig/network
  • Inform Splunk of the hostname change
    sed -i 's/<old hostname>/<new hostname>/g' $SPLUNK_HOME/etc/system/local/server.conf
  • Restart Splunk (then verify the change with the sketch after this list)
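To sanity-check that Splunk actually picked up the new name, here is a quick verification sketch (assuming a Splunk 6.x install under /opt/splunk; confirm the exact subcommands with ./splunk help if your version differs):

cd /opt/splunk/bin
./splunk show servername
./splunk show default-hostname
./splunk cmd btool server list --debug | grep serverName

Keep in mind that default-hostname is the host value Splunk stamps on events (it lives in inputs.conf) and is separate from serverName in server.conf, so it may still reference the old name.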

Sadly, that wasn’t the end of it. I noticed right away Splunk complained of a few things:

TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 300 seconds.
WARN TcpOutputFd - Connect to 10.0.0.10:9997 failed. Connection refused

Running

netstat -an | grep LISTEN

revealed that the server was not even listening on 9997 as it should have been. I found this answer indicating that DNS lookups could be tripping up Splunk on that server. I edited $SPLUNK_HOME/etc/system/local/inputs.conf with the following:

[splunktcp://9997]
connection_host = none

but I also noticed that when I ran the netstat command again a short time later, the server was no longer listening on 9997. Attempting to telnet from the forwarder to the indexer in question revealed the same behavior – it worked at first, then quit working. Meanwhile, no events were being stored on that indexer.

I was pulling my hair out trying to figure out what was happening. Finally I discovered this gem on Splunk Answers:

Are you using the deployment server in your environment? Is it possible your forwarders’ outputs.conf got deployed to your indexer?

On the indexer:
./splunk cmd btool outputs list --debug

Sure enough! After running

./splunk cmd btool outputs list --debug

I discovered this little gem of a stanza:

/opt/splunk/etc/apps/APP_Forwarders/default/outputs.conf [tcpout]

That shouldn’t have been there! Digging into my deployment server, I discovered that I had a server class with a blacklist – that is, it included all deployment clients except the ones I had listed. The blacklist contained the old hostname, so when I changed the indexer’s hostname it no longer matched the blacklist and the forwarder configuration got deployed to it, creating a forwarding loop: my indexer was forwarding everything it received from the forwarder right back to the forwarder, which caused Splunk to shut down port 9997 on the offending indexer completely.
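For what it’s worth, the offending server class looked roughly like the sketch below in serverclass.conf on my deployment server (the class name and hostnames are placeholders rather than my real config). Because the class whitelists everything and only excludes specific hosts by name, a renamed indexer sails right past the blacklist:

[serverClass:AppForwarders]
whitelist.0 = *
blacklist.0 = old-indexer-hostname
blacklist.1 = another-excluded-host

[serverClass:AppForwarders:app:APP_Forwarders]
restartSplunkd = true

Adding the indexer’s new hostname to the blacklist (or, better, scoping the whitelist to actual forwarders) keeps the forwarder outputs.conf from being deployed to it.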

After getting all that sorted out I noticed Splunk was only returning search results from the indexers whose hostnames I had not changed. Everything looked good in the distributed search arena – status was OK on all indexers; yet I still was not getting any results from the indexer whose name I had changed, even though it was receiving events! This was turning into a problem: it was creating a blind spot.

Connections were fine, search status was fine, deployment status was fine… I didn’t know what else to do. I finally thought to restart Splunk on the search head that had been talking to the server whose name I changed. Success! Something in the search head must have made it blind to the indexer once its name had changed. Simply restarting Splunk on the search head fixed it.

In short, if you’re crazy enough to change the name of one of your indexers in a distributed Splunk environment, make sure you do the following:

  • Change hostname on the OS
  • Change serverName in Splunk config files
    • Add connection_host = none in inputs.conf (optional?)
  • Clean up your deployment server
    • Delete old hostname from clients phoning home
    • MAKE SURE the new hostname won’t be sucked up into an unwanted server class
  • Clean up your search head (see the CLI sketch after this list)
    • Delete old hostname search peer
    • Add new hostname search peer
    • Restart search head
  • Profit
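A rough CLI sketch of that search head cleanup (the hostnames and credentials are placeholders, and the exact flags vary a bit between Splunk versions, so double-check with ./splunk help search-server):

cd /opt/splunk/bin
./splunk remove search-server -auth admin:password -url old-indexer-hostname:8089
./splunk add search-server https://new-indexer-hostname:8089 -auth admin:password -remoteUsername admin -remotePassword remotepassword
./splunk restart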

Install Splunk Universal Forwarder on Linux

I do this infrequently enough that I decided I should really write this down. Below is the quick and dirty way to get the Splunk universal forwarder installed on a new Linux  system. Thanks to byteschef for the information used to create this guide.

Download the latest Splunk universal forwarder package from their site and install it via rpm -i <filename> (if RedHat based) or dpkg -i <filename> (if Debian based, using the .deb package instead of the .rpm).

Run the following commands as root:

cd /opt/splunkforwarder/bin
./splunk start --accept-license
./splunk enable boot-start
./splunk add forward-server <IP/hostname of splunk server>:9997 -auth admin:changeme
./splunk add monitor /var/log
./splunk edit user admin -password NEW_PASSWORD -auth admin:changeme
./splunk restart

If there are any directories other than /var/log you want monitored (application logs, for example), then issue:

./splunk add monitor <directory to monitor>
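To confirm the forwarder picked everything up, the following commands list the configured forward-server and monitored inputs (run them from the forwarder’s bin directory and authenticate when prompted):

cd /opt/splunkforwarder/bin
./splunk list forward-server
./splunk list monitor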

Done.

Determine what a Splunk forwarder is forwarding

I recently came across a need to determine exactly what is logging to a forwarder in Splunk. I had a hard time finding out what to search for so I thought I’d share what I found.

The key to discovering where data is coming from is Splunk’s metrics.log files. Searching these files gives us what we need. This search reveals anything coming in over UDP (a TCP variant is shown at the end of this post) and totals it by host (the forwarder):

source=*metrics.log group=udpin_connections | stats count by host

Oddly enough Splunk doesn’t have a field extracted for its own metrics.log. A key useful field is missing – sourceHost. I used the field extraction tool to create it and it generated this field extraction:

^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)
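If you’d rather define the extraction by hand instead of using the field extraction tool, something like this in props.conf on the search head should be equivalent (I’m assuming the metrics.log events carry the splunkd sourcetype, which is what Splunk’s internal logs normally use; adjust the stanza name if yours differ):

# $SPLUNK_HOME/etc/system/local/props.conf
[splunkd]
EXTRACT-sourceHost = ^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)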

Field extraction in hand, I was able to generate the report I was looking for: devices actively sending logs to my forwarder over UDP.

source=*metrics.log group=udpin_connections  host=splunk | stats count by sourceHost sourcePort

where host is the forwarder you want to investigate. Useful.
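For forwarders sending over TCP (the usual splunktcp inputs), the same trick works by swapping the group; sourceHost should already come through as a key=value pair in the tcpin_connections metrics, so no extra field extraction is needed:

source=*metrics.log group=tcpin_connections host=splunk | stats count by sourceHost sourcePort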

Fix Splunk lockout after exceeded quota

Recently I came across a situation with my home install of Splunk (free license) where the 500MB quota was exceeded three days in a row. I hadn’t checked Splunk for a few days so I was completely blindsided by it. The consequence of going over quota three days in a row? Losing the ability to do any searches in Splunk, which is a real downer.

The easiest, although least convenient, way to fix being locked out is to wait it out. If you go 30 days in a row without violating the license, Splunk will unlock itself. Splunk will still receive and index events during that time. The inability to search makes it really difficult to track down what the problem is, though, and I wasn’t happy waiting for 30 days before getting Splunk back.

Poking around on the Splunk forums I discovered that there is a way to get Splunk back – perform a fresh install and then migrate your database and settings over to it. This involves backing up a few things, then copying them over the fresh install’s default folders:

  • $SPLUNK_HOME/var/lib/splunk/defaultdb   #Default Splunk index, where all my data is held. If you have other indexes in here you’ll want to copy them too.
  • $SPLUNK_HOME/etc  #all your configuration files

Simply back up the above folders, install Splunk on a new machine, launch Splunk first so it will generate all the default files, then copy the files over to the new instance.

I went a step further and planned for the future. I wrote a quick and dirty script that will do all of this for you, even on the same machine – no need to copy to another machine. The script assumes you’re running a Red Hat derivative and have the correct Splunk install file in a predictable location. Update the locations of the Splunk directories and install file as needed and run as root.

#!/bin/bash

#Backup important directories
mkdir /opt/splunkbackup/
cp -al /opt/splunk/etc /opt/splunkbackup/
cp -al /opt/splunk/var/lib/splunk/defaultdb /opt/splunkbackup/

#Nuke splunk
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk

#Reload from fresh start
rpm -iv --replacepkgs /home/nicholas/splunk-6.2.2-255606-linux-2.6-x86_64.rpm
/opt/splunk/bin/splunk start --accept-license

#Restore configuration files and indexes
/opt/splunk/bin/splunk stop
rm -rf /opt/splunk/etc
cp -al /opt/splunkbackup/etc /opt/splunk/
rm -rf /opt/splunk/var/lib/splunk/defaultdb
cp -al /opt/splunkbackup/defaultdb /opt/splunk/var/lib/splunk/
chown splunk:splunk -R /opt/splunk/
/opt/splunk/bin/splunk start

#Remove splunk backup
rm -rf /opt/splunkbackup

This will restore your searches, settings, and data. It won’t restore audit and other internal Splunk information, however. This script worked marvelously in getting my Splunk back.

Automatically delete old data in Splunk

I’ve had Splunk humming along for about two years now. I’ve already increased the storage space for my Splunk VM once. Today I received a notice that I’ve once again run out of space and indexing had been suspended. I wanted a more permanent solution to this problem, so I consulted the almighty Google.

The solution to my problem is to set a retirement policy. This allows Splunk to automatically delete old data when you hit a certain index size. You can also go by time, but I opted for size. It’s a pretty simple configuration change. Simply edit (or create if it doesn’t exist) $SPLUNK_HOME/etc/system/local/indexes.conf and add two lines:

sudo vim /opt/splunk/etc/system/local/indexes.conf
[main]
maxTotalDataSizeMB = 50000

Then, restart your Splunk instance (the command below is Debian/Ubuntu specific)

sudo service splunk restart

The configuration above tells Splunk to keep at most 50GB of data in the main index. When that limit is reached, it begins deleting (freezing) the oldest index buckets. No more out of space errors!
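If you would rather retire data by age instead of (or in addition to) size, indexes.conf also supports frozenTimePeriodInSecs. A sketch that keeps roughly 180 days of data in the main index looks like this:

[main]
frozenTimePeriodInSecs = 15552000

Anything older than that gets rolled to frozen, which by default means deleted unless you also set a coldToFrozenDir to archive the buckets first.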

Get geolocation info in Splunk with iplocation

Splunk 6 has many awesome new features, one of which is built-in IP geolocation. No longer do you have to manually look up city, state, and country when investigating logs – Splunk will do that for you. This page has the details.

For example, if I want my x_forwarded_for IP addresses to have geolocation, I tack this at the end of my query:

| iplocation x_forwarded_for | stats count by x_forwarded_for City Region Country

The fields iplocation can produce are:

  • City
  • Continent
  • Country
  • lat
  • lon
  • MetroCode
  • Region
  • Timezone

You can combine this query with DNS lookups (as detailed here) for a more complete picture of your data.

<search query> | iplocation x_forwarded_for | lookup dnslookup clientip as x_forwarded_for OUTPUT clienthost as hostname | stats count by x_forwarded_for City Region Country hostname

Neat.

Extract multiple Active Directory fields in Splunk

I had posted here about how to extract account names with a specific modifier (exclude account names ending in a dollar sign.) That worked for one specific instance, but I found I needed something better. Active Directory logs contain multiple instances of the same field (Account_Name, Group_Name, etc.) whose meaning depends on context – namely, the section header a couple of lines above.

For example,

Message=A member was added to a security-enabled universal group.

Subject:
 Security ID: <Random long SID>
 Account Name: Administrator
 Account Domain: ExampleDomain
 Logon ID: <random hex value>

Member:
 Security ID: <Another random long SID>
 Account Name: CN=George Clooney,OU=ExampleDomain,OU=Hollywood,OU=California,DC=USA,DC=NA,DC=Terra

Group:
 Security ID: <Yet another long SID>
 Account Name: Old Actors
 Account Domain: ExampleDomain

You can see that there are three different Security ID fields, three different Account Name fields, and two different Account Domain fields. The key is the context: Subject account name, member account name, or group account name.

I wrestled for some time to find a regular expression for Splunk that would keep matching after a line has ended. After much searching I came across this post, which explained the regex modifier needed to do what I wanted.

In my case I needed to use the (?s) modifier to include newline characters in my extraction. My new and improved AD regex extraction is as follows:

(?s)(Group:.+Account Name:\s+)(?P<real_group_name>[^\n]+)
  • (?s)  Regex modifier that makes . match newline characters as well
  • Group:  The section I am interested in. You can replace this with Member: if you’re interested in member account names instead
  • .+  Match one or more of any character (including newlines, thanks to the modifier above)
  • Account Name:\s+  In conjunction with the previous two items, this matches everything from the section name through the whitespace after Account Name
  • [^\n]+  Match one or more characters that are not a newline (since an account name might contain spaces)

Finally! This is the regex I’ve been looking for.
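If you want to test the pattern in a search before committing it to a field extraction, the rex command accepts the same expression. The source filter below is just a placeholder for wherever your AD security events live:

source="WinEventLog:Security" | rex "(?s)Group:.+Account Name:\s+(?P<real_group_name>[^\n]+)" | stats count by real_group_name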

 

Extract Active Directory Account Names in Splunk

I don’t really understand Microsoft’s rationale when it comes to log verbosity. I suppose too much information is better than not enough, but that comes at a cost when you actually have to read the information.

I’ve been trying to extract usernames from Active Directory controller logs and it turned out to be quite a pain. Why do the logs have more than one field with the same name? It confuses Splunk and seems to fly in the face of common sense and decency… I will stop ranting now.

In my specific case, AD lockout logs have two Account Name fields, one for the controller and one for the user being locked out. I am interested only in the username and not the AD controller account name.  How do you tell Splunk to only include the second instance of Account Name?

The answer is to create a field extraction using negative lookahead. (Thanks to this article, which gave me the guidance I needed.) I had to tweak the regex to look for and exclude any matches ending in a dollar sign, as opposed to excluding dashes as in the article’s example. My fine-tuned regex statement is below:

Account Name:\s+(?!.+\$)(?P<FIELDNAME>\S+)

It looks for Account Name: followed by one or more spaces (there is excess spacing in the logs for some reason.) The real magic happens in the next bit – (?!.+\$)

  • The parentheses group the expression together
  • ?! means negative lookahead – fail the match if the pattern that follows is found ahead
  • .+ – one or more characters
  • \$ – a literal dollar sign (escaped so it isn’t treated as an end-of-line anchor)

The actual capture group is simply \S+ (one or more non-whitespace characters).

Note this doesn’t satisfy all AD logs, just the ones I’m interested in (account lockouts – they all have a first Account Name ending in a dollar sign.)

The result of all this jargon and gnashing of teeth: clean Splunk logs revealing only what I want without excess information. Neat.


 

Update: I found an even better way to do this. The key is to use the regex modifier (?s) to include new lines. The better query is now this:

(?s)(<section name of the field you're interested in>:.+Account Name:\s+)(?P<real_group_name>[^\n]+)

A detailed explanation is located here.

Make stats in Splunk more meaningful with fillnull

In my last post I mentioned a common issue I have with the stats command: items with empty values are simply excluded from the results. What if you want to include those empty results?

The solution, which I found here, is to use the fillnull command.

<search query> | fillnull value="-" | stats count by <field(s) which contain empty values>

It’s that simple! Now, instead of being excluded, empty values are included and display as a dash. Brilliant.
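As a concrete (made-up) example using the forwarder metrics search from earlier in this archive, this keeps rows where sourcePort happens to be empty and displays them with a dash instead of silently dropping them:

source=*metrics.log group=udpin_connections host=splunk | fillnull value="-" sourcePort | stats count by sourceHost sourcePort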