I’ve spent some time playing around in Splunk, trying to refine my dashboards and searches. Here is what I’ve learned (or re-learned) about using regular expressions in Splunk searches.

Field extraction syntax

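The general formula for a regex field extraction is below. (?i) makes the match case-insensitive, and (?P<FIELDNAME>...) is a named capture group whose contents become the field: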

(?i)Initial regex match(?P<FIELDNAME>Regex dictating how much to match after initial match)

Example:
(?i)Last Matched Message: (?P<message>(?:[^"]+))

This field extraction searches my log files for the string “Last Matched Message: ”, keeps matching characters until it reaches a double quote ("), and stores the result in a field named “message”. I can now run “| stats count by message” in Splunk to cleanly see the values of “Last Matched Message” in my firewall logs.
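
To sanity-check an extraction before saving it in Splunk, you can try the same pattern in Python, whose re module accepts the same (?i) flag and (?P<name>...) named-group syntax. Python’s engine is not Splunk’s PCRE, so treat this as a quick approximation. The log lines are made-up samples, not real firewall output, and Counter stands in for “| stats count by message”.

```python
import re
from collections import Counter

# Same pattern as the Splunk field extraction above.
PATTERN = re.compile(r'(?i)Last Matched Message: (?P<message>(?:[^"]+))')

# Hypothetical sample lines; real firewall logs will look different.
lines = [
    'src=10.0.0.5 msg="Last Matched Message: Implicit deny rule" proto=6',
    'src=10.0.0.9 msg="Last Matched Message: Allow web traffic" proto=6',
    # Lowercase prefix still matches because of (?i).
    'src=10.0.0.7 msg="last matched message: Implicit deny rule" proto=17',
]


def extract_message(line):
    """Return the 'message' field for a line, or None if the pattern is absent."""
    m = PATTERN.search(line)
    return m.group("message") if m else None


# Rough equivalent of "| stats count by message": skip lines with no match.
counts = Counter(msg for msg in map(extract_message, lines) if msg is not None)
print(counts.most_common())  # → [('Implicit deny rule', 2), ('Allow web traffic', 1)]
```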

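Regex non-capturing groups

The above regex wraps its criteria in (?:...). That is a non-capturing group, not a positive lookahead, which is why it uses a colon rather than an equal sign. A non-capturing group groups part of a pattern without storing it as a separate capture, and the characters it matches are still consumed. A positive lookahead is written (?=...) and only checks what comes next without consuming it. Inside a named group the (?:...) is redundant but harmless: (?P<message>[^"]+) captures exactly the same text.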
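
A quick way to see the difference is to run both constructs against the same string. This Python sketch uses a made-up sample line; Python’s re treats (?:...) and (?=...) the same way PCRE does for these simple cases.

```python
import re

text = "Last Matched Message: Implicit deny rule"

# (?:...) groups without capturing, but its characters are consumed
# and become part of the overall match.
non_capturing = re.search(r'Message: (?:[^"]+)', text)
print(non_capturing.group(0))   # → Message: Implicit deny rule
print(non_capturing.groups())   # → ()

# (?=...) is a positive lookahead: it checks what follows without consuming it.
lookahead = re.search(r"Message: (?=Implicit)", text)
print(repr(lookahead.group(0)))  # → 'Message: '

# The lookahead fails when the text that follows doesn't match.
print(re.search(r"Message: (?=Allow)", text))  # → None

# Wrapping the criteria in (?:...) inside a named group changes nothing.
with_group = re.search(r'Message: (?P<m>(?:[^"]+))', text).group("m")
without_group = re.search(r'Message: (?P<m>[^"]+)', text).group("m")
print(with_group == without_group)  # → True
```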

Eval

The eval command is handy when you want to make decisions based on field values in your results and display a computed value in their place. I use it to translate the IP protocol numbers 6 and 17 into TCP and UDP in my firewall logs.

Example:
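| eval proto = case(proto="6","TCP",proto="17","UDP",true(),proto)

case() returns the value paired with the first condition that is true, and null if none are, so the final true(),proto pair keeps any other protocol number (for example 1 for ICMP) instead of blanking it.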
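
The behavior of case() can be mimicked with a dictionary lookup. This is a swapped-in Python analogue for illustration, not how Splunk evaluates case(); None stands in for Splunk’s null, and the function names are hypothetical.

```python
from typing import Optional

# IANA protocol numbers for the two protocols mapped in the eval above.
PROTO_NAMES = {"6": "TCP", "17": "UDP"}


def case_proto(proto: str) -> Optional[str]:
    # Like case() with no catch-all pair: unmatched values become None (null).
    return PROTO_NAMES.get(proto)


def case_proto_with_default(proto: str) -> str:
    # Like ending case() with true(),proto: unmatched values pass through unchanged.
    return PROTO_NAMES.get(proto, proto)


print(case_proto("6"), case_proto("1"))  # → TCP None
print(case_proto_with_default("1"))      # → 1
```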

Regex: sed

The rex command has a sed mode that accepts sed-style substitutions, much like the sed stream editor on Linux. This lets you rewrite field values in your Splunk results, making them much more useful. The syntax is:
| rex field=fieldname mode=sed "sed expression"

Example:
| rex field=owncloud_file mode=sed "s/\&files\=/\//g"

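In this example I take a field I created (owncloud_file) and give sed a substitution command. The s means substitute; the first part, \&files\=, matches the literal string “&files=” (the & and = are backslash-escaped); and the second part, \/, is the replacement: a forward slash, escaped so it isn’t read as the delimiter. The trailing g makes the substitution global, so every occurrence in the field is replaced, not just the first.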
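
The same substitution can be tried in Python with re.sub, where count=0 (replace all) plays the role of sed’s g flag and count=1 replaces only the first match. The function name and sample values are hypothetical; the sample mimics what the owncloud_file extraction might capture.

```python
import re


def sed_files_to_slash(value: str, global_flag: bool = True) -> str:
    # Equivalent of s/\&files\=/\//g : replace the literal "&files=" with "/".
    # count=0 replaces every occurrence (like g); count=1 only the first.
    return re.sub(r"&files=", "/", value, count=0 if global_flag else 1)


print(sed_files_to_slash("/Documents&files=report.pdf"))             # → /Documents/report.pdf
print(sed_files_to_slash("a&files=b&files=c"))                       # → a/b/c
print(sed_files_to_slash("a&files=b&files=c", global_flag=False))    # → a/b&files=c
```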

The field extraction I have for ownCloud looks specifically for download requests from the Files app, so I can see which files have been downloaded:
(?i)\?dir=(?P<owncloud_file>(?:[^ \"]+))
The regex looks for ?dir= (the \? escapes the question mark so it is matched literally) and then matches everything up to the next space or double quote.
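
To check what this extraction captures, here is the same pattern run in Python against a hypothetical access-log line. The URL path is illustrative only; I haven’t verified it against a real ownCloud log.

```python
import re

# Same pattern as the ownCloud field extraction: capture everything after
# "?dir=" up to the next space or double quote.
OWNCLOUD = re.compile(r'(?i)\?dir=(?P<owncloud_file>(?:[^ \"]+))')

# Hypothetical access-log line.
line = ('10.0.0.5 - - "GET /index.php/apps/files/ajax/download.php'
        '?dir=/Shared%20Docs&files=report.pdf HTTP/1.1" 200')

m = OWNCLOUD.search(line)
print(m.group("owncloud_file"))  # → /Shared%20Docs&files=report.pdf
```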

URLdecode

I wrote a few sed expressions to clean up URLs (replacing %20 with a space, and so on) before I realized there’s a much easier way: the urldecode() eval function. Simply append the following to your search:

| eval fieldname = urldecode(fieldname)

In my case all I had to do was append

| eval owncloud_file = urldecode(owncloud_file)

and voilà! All my results look nice and human-readable. Magic.
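
Putting the three steps together (extract, sed, urldecode) gives a small Python sketch that can be handy for testing sample lines offline. urllib.parse.unquote decodes %XX escapes; I’m assuming it behaves like Splunk’s urldecode() for simple cases such as %20, but I haven’t compared them on edge cases (for example “+”). The function name and sample line are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Field extraction from the ownCloud example.
OWNCLOUD = re.compile(r'(?i)\?dir=(?P<owncloud_file>(?:[^ \"]+))')


def clean_owncloud_file(line: str) -> Optional[str]:
    m = OWNCLOUD.search(line)
    if m is None:
        return None
    value = m.group("owncloud_file")
    # sed step: s/\&files\=/\//g
    value = re.sub(r"&files=", "/", value)
    # urldecode step: turn %20 and other %XX escapes back into characters.
    return unquote(value)


sample = ('10.0.0.5 - - "GET /index.php/apps/files/ajax/download.php'
          '?dir=/Shared%20Docs&files=Q3%20report.pdf HTTP/1.1" 200')
print(clean_owncloud_file(sample))  # → /Shared Docs/Q3 report.pdf
```

Running sed before decoding means an encoded “&” (%26) inside a file name can’t be mistaken for the &files= separator.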

Phew!

I think I’ll stop for now.

Technicus

Splunk regex tips