Tag Archives: regex

Determine what a Splunk forwarder is forwarding

I recently came across a need to determine exactly what is logging to a forwarder in Splunk. I had a hard time finding out what to search for so I thought I’d share what I found.

The key to discovering where data is coming in from is in Splunk’s metrics.log files. Searching these files gives us what we need. This search reveals anything coming in over UDP (you can also change it to be TCP if desired) and totals it by host (forwarder.)

source=*metrics.log group=udpin_connections | stats count by host

Oddly enough Splunk doesn’t have a field extracted for its own metrics.log. A key useful field is missing – sourceHost. I used the field extraction tool to create it and it generated this field extraction:

^[^,\n]*,\s+(?P<sourceHost>\d+\.\d+\.\d+\.\d+)

Field extraction in hand, I was able to generate the report I was looking for: devices actively sending logs to my forwarder over UDP.

source=*metrics.log group=udpin_connections  host=splunk | stats count by sourceHost sourcePort

where host is the forwarder you want to investigate. Useful.

Extract multiple Active Directory fields in Splunk

I had posted here about how to extract account names with a specific modifier (exclude account names ending in a dollar sign.) That worked for one specific instance, but I found I needed something better. Active Directory logs have multiples of the same value (Account_Name, Group_Name, etc.) that all depend on context, namely the value of the line two lines above it.

For example,

Message=A member was added to a security-enabled universal group.

Subject:
 Security ID: <Random long SID>
 Account Name: Administrator
 Account Domain: ExampleDomain
 Logon ID: <random hex value>

Member:
 Security ID: <Another random long SID>
 Account Name: CN=George Clooney,OU=ExampleDomain,OU=Hollywood,OU=California,DC=USA,DC=NA,DC=Terra

Group:
 Security ID: <Yet another long SID>
 Account Name: Old Actors
 Account Domain: ExampleDomain

You can see that there are three different Security ID fields, three different Account Name fields, and two different Account Domain fields. The key is the context: Subject account name, member account name, or group account name.

I wrestled for some time to find a regex expression for Splunk that would continue matching things after a line has ended. After much searching I came across this post which explained the need for a regex modifier to do what I wanted.

In my case I needed to use the (?s) modifier to include newline characters in my extraction. My new and improved AD regex extraction is as follows:

(?s)(Group:.+Account Name:\s+)(?P<real_group_name>[^\n]+)
  • (?s)  Regex modifier indicating to include new lines
  • Group:  Section I am interested in. You can replace this with Member: if you’re interested in member account names instead
  • .+ match one or more of any character (including new line as indicated by modifier above)
  • Account Name:\s+ This is in conjuction with the previous two items to create a match that includes the section name and anything after that until the spaces after Account Name
  • [^\n]+ Match one or more characters that is not a new line (since you might have an account name with spaces.)

Finally! This is the regex I’ve been looking for.

 

Extract Active Directory Account Names in Splunk

I don’t really understand Microsoft’s rationale when it comes to log verbosity. I suppose too much information is better than not enough information, but that comes at the cost of making it difficult if you have to try and actually read the information.

I’ve been trying to extract usernames from Active Directory controller logs and it turned out to be quite a pain. Why do the logs have more than one field with the same name? It confuses Splunk and seems to fly in the face of common sense and decency.. I will stop ranting now.

In my specific case, AD lockout logs have two Account Name fields, one for the controller and one for the user being locked out. I am interested only in the username and not the AD controller account name.  How do you tell Splunk to only include the second instance of Account Name?

The answer is to create a field extraction using negative lookahead (Thanks to this article which gave me the guidance I needed.) I had to tweak the regex to look for and exclude any matches ending in a dollar sign, as opposed to excluding dashes in the article’s example. My fine tuned regex statement is below:

Account Name:\s+(?!.+\$)(?P<FIELDNAME>\S+)

It looks for Account Name: followed by one or more spaces (there is excess spacing in the logs for some reason.) The real magic happens in the next bit – (?!.+\$)

  • Parenthesis group the expression together
  • ?! means negative lookahead – don’t include anything you find that matches the following regex
  • .+ – one or more characters
  • \$ – stop matching when you encounter a dollar sign

The second regex string is simply \S+ (one or more non-whitespace characters.)

Note this doesn’t satisfy all AD logs, just the ones I’m interested in (account lockouts – they all have a first Account Name ending in a dollar sign.)

The result of all this jargon and gnashing of teeth: clean Splunk logs revealing only what I want without excess information. Neat.


 

Update: I found an even better way to do this. The key is to use the regex modifier (?s) to include new lines. The better query is now this:

(?s)(<section name of the field you're interested in>:.+Account Name:\s+)(?P<real_group_name>[^\n]+)

A detailed explanation is located here.

Splunk regex tips

I’ve spent some time playing around in Splunk trying to refine my dashboards and searches. Here is what I’ve learned (or re-learned) about Splunk and using regular expressions in your searches.

Field extraction syntax

The general formula for using regex to create field extractions is as follows:

(?i)Initial regex match(?P<FIELDNAME>Regex dictating how much to match after initial match)

Example:
(?i)Last Matched Message: (?P<message>(?:[^”]+))

This field extraction searched my logfiles for the string “Last Matched Message: ” It then kept matching every character until it reached a double quote ” and named the extraction “message”. I can now do a “| stats count by message” query in Splunk to cleanly see the values of “Last Matched Message” in my firewall logs.

Regex lookahead

The above regex string utilized a positive lookahead. The syntax for Splunk includes a question mark as expected, but also a colon for some reason (as opposed to an equal sign.) I haven’t looked into why.  Just put  (?:)  in front of your criteria (see above)

Eval

The eval parameter is handy if you want to take information in Splunk and make decisions on it, then display the results in its place. I use it to translate 6 to TCP and 17 to UDP in my firewall logs.

Example:
eval proto = case(proto=”6″,”TCP”,proto=”17″,”UDP”)

Regex: sed

You can use stream editor in Splunk just like you would in Linux. This allows you to modify the output of Splunk results, making them much more useful. The syntax is:
| rex field=fieldname mode=sed “sed syntax

Example:
| rex field=owncloud_file mode=sed “s/\&files\=/\//g”

In this example I take a field I had created (owncloud_file) and then instruct sed to search “s” then look for the string “&files” (with a proper escape character for the &), then replace that string with an equal sign. The g deletes the match so future matches can be made.

The field extraction I have for owncloud looks specifically for the output from the Files app so I can see which files have been downloaded:
(?i)\?dir=(?P<owncloud_file>(?:[^ \”]+))
The regex looks for ?dir= and then matches anything that’s not a double quote.

URLdecode

I made a few sed regex extractions to clean up URLs (replacing %20 for space, etc) when I realized there’s a much easier way to do this: the urldecode function. Simply append the following to your search:

| eval fieldname = urldecode(fieldname)

In my case all I had to do was append
| eval owncloud_file = urldecode(owncloud_file) and voila! all my results look nice and human readable. Magic.

Phew!

I think I’ll stop for now.