Recently I set about changing the hostname on a Splunk indexer. It should be pretty easy, right? Beware: it can get pretty nasty! Below is my experience.
I started with the basics.
- hostname command
- Modify /etc/sysconfig/network to make it persistent (CentOS specific)
sed -i 's/<old hostname>/<new hostname>/g' /etc/sysconfig/network
- Inform Splunk of the hostname change
sed -i 's/<old hostname>/<new hostname>/g' $SPLUNK_HOME/etc/system/local/server.conf
- Restart Splunk
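The sed edits above can be wrapped in a small helper. This is a minimal sketch rather than part of the original procedure: `replace_hostname` is a hypothetical function name and the hostnames in the usage note are placeholders. It escapes dots so a fully qualified name is matched literally, and it keeps a `.bak` copy of the file in case the substitution touches more than intended.

```shell
# Hypothetical helper: replace every occurrence of OLD with NEW in FILE.
# A copy of the original file is left next to it as FILE.bak.
replace_hostname() {
    old=$1
    new=$2
    file=$3
    # Escape dots so "idx1.example.com" does not also match "idx1xexample.com".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed -i.bak "s/$old_re/$new/g" "$file"
}
```

It would be called once per file, for example `replace_hostname old-idx.example.com new-idx.example.com "$SPLUNK_HOME/etc/system/local/server.conf"`.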
Sadly, that wasn’t the end of it. I noticed right away Splunk complained of a few things:
TcpOutputProc - Forwarding to indexer group default-autolb-group blocked for 300 seconds.
WARN TcpOutputFd - Connect to 10.0.0.10:9997 failed. Connection refused
netstat -an | grep LISTEN
revealed that the server was not even listening on 9997 like it should be. I found this answer indicating it could be an issue with DNS tripping up on that server. I edited $SPLUNK_HOME/etc/system/local/inputs.conf with the following:
[splunktcp://9997]
connection_host = none
but I noticed that when I ran the netstat command again a short time later, the server was no longer listening on 9997. Attempting to telnet from the forwarder to the indexer in question showed the same behavior: it worked at first, then quit working. Meanwhile, no events were getting stored on that indexer.
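Because the port came and went, it helps to make the listening check repeatable. The following is a rough sketch, assuming the Linux `netstat -an` layout (local address in column 4, state in column 6); `port_is_listening` is a hypothetical helper name, not a Splunk or system command.

```shell
# Reads `netstat -an` style output on stdin.
# Succeeds (exit 0) if some TCP socket is in LISTEN state on the given port.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # Local address looks like 0.0.0.0:9997 or :::9997;
            # the port is whatever follows the last separator.
            n = split($4, parts, /[:.]/)
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Running `netstat -an | port_is_listening 9997 || echo "not listening on 9997"` every few seconds would have shown the moment the indexer closed the port.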
I was pulling my hair out trying to figure out what was happening. Finally I discovered this gem on Splunk Answers:
Are you using the deployment server in your environment? Is it possible your forwarders’ outputs.conf got deployed to your indexer?
On the indexer:
./splunk cmd btool outputs list --debug
Sure enough! After running
./splunk cmd btool outputs list --debug
I discovered this little gem of a stanza:
That shouldn’t have been there! Digging into my deployment server, I discovered that I had a server class with a blacklist; that is, it included all deployment clients except the ones I had listed. The blacklist contained the old hostname, so when I changed the indexer’s hostname it no longer matched the blacklist and was deployed a forwarder’s configuration, causing a forwarding loop. My indexer was forwarding everything it received from the forwarder right back to the forwarder, which caused Splunk to shut down port 9997 on the offending indexer completely.
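To see why the rename flipped the outcome, here is a rough model of the blacklist logic. It is only an illustration: `host_in_list` and `receives_class` are hypothetical helpers, the hostnames are made up, and shell glob matching merely approximates how the deployment server compares client names against its lists.

```shell
# Succeeds if HOST matches any of the glob PATTERNS that follow it.
host_in_list() {
    host=$1
    shift
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so case treats it as a glob.
        case $host in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# Models a server class that includes every client except blacklisted ones:
# a host receives the class only when it is NOT on the blacklist.
receives_class() {
    ! host_in_list "$@"
}
```

With a blacklist of `old-idx01`, `receives_class old-idx01 old-idx01` fails (the indexer is excluded), while `receives_class new-idx01 old-idx01` succeeds, which is exactly how the renamed indexer ended up with a forwarder's configuration.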
After getting all that set up I noticed Splunk was only returning searches from the indexers whose hostnames I had not changed. Everything looked good in the distributed search arena – status was OK on all indexers; yet I still was not getting any results from the indexer whose name I had changed, even though it was receiving events! This was turning into a problem. It was creating a blind spot.
Connections great, search status great, deployment status good. I didn’t know what else to do. I finally thought to restart Splunk on the search head that had been talking to the server whose name I changed. Success! Something in the search head must have made it blind to the indexer once its name had changed. Simply restarting Splunk on the search head fixed it.
In short, if you’re crazy enough to change the name of one of your indexers in a distributed Splunk environment, make sure you do the following:
- Change hostname on the OS
- Change serverName in Splunk config files
- Add connection_host = none in inputs.conf (optional?)
- Clean up your deployment server
  - Delete old hostname from clients phoning home
  - MAKE SURE the new hostname won’t be sucked up into an unwanted server class
- Clean up your search head
  - Delete old hostname search peer
  - Add new hostname search peer
  - Restart search head
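As a final sanity check on the first two items, something like the following could confirm that Splunk's configured name agrees with the name you expect. This is a sketch under assumptions: the function names are hypothetical, and it only reads the first `serverName` line of whatever server.conf-style file it is given.

```shell
# Print the serverName value from a server.conf-style file (first match only).
configured_server_name() {
    sed -n 's/^[[:space:]]*serverName[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeeds when the configured serverName equals the expected hostname.
server_name_matches() {
    [ "$(configured_server_name "$1")" = "$2" ]
}
```

For example, `server_name_matches "$SPLUNK_HOME/etc/system/local/server.conf" "$(hostname)" || echo "serverName mismatch"`. Keep in mind that `hostname` may return a short name while the config holds a fully qualified one, or the reverse.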