I don’t really understand Microsoft’s rationale when it comes to log verbosity. I suppose too much information is better than not enough, but it comes at a cost: the logs become hard to read when you actually need to read them.
I’ve been trying to extract usernames from Active Directory domain controller logs, and it turned out to be quite a pain. Why do the logs have more than one field with the same name? It confuses Splunk and seems to fly in the face of common sense and decency. I will stop ranting now.
In my specific case, AD lockout logs have two Account Name fields: one for the domain controller and one for the user being locked out. I am interested only in the username, not the domain controller’s account name. How do you tell Splunk to include only the second instance of Account Name?
The answer is to create a field extraction using a negative lookahead (thanks to this article, which gave me the guidance I needed). The article’s example excluded dashes. I tweaked its regex to exclude matches ending in a dollar sign instead, because computer account names, like the domain controller’s, end in $. My fine-tuned regex is below:
Account Name:\s+(?!.+\$)(?P<FIELDNAME>\S+)
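To see the difference the lookahead makes, here is a minimal sketch in Python. Its `re` module handles the constructs used here (named groups and lookahead) the same way as the PCRE engine Splunk uses. The event text is an abridged, hypothetical lockout event with made-up names (`DC01$`, `jsmith`), and `extract` is a hypothetical helper that just collects every match.

```python
import re

# Abridged, hypothetical lockout event; DC01$ and jsmith are made-up names.
EVENT = (
    "A user account was locked out.\n"
    "Subject:\n"
    "\tSecurity ID:\t\tS-1-5-18\n"
    "\tAccount Name:\t\tDC01$\n"
    "\tAccount Domain:\t\tCONTOSO\n"
    "Account That Was Locked Out:\n"
    "\tSecurity ID:\t\tS-1-5-21-1111\n"
    "\tAccount Name:\t\tjsmith\n"
)

# Without the lookahead, every "Account Name:" line matches.
NAIVE = re.compile(r"Account Name:\s+(?P<FIELDNAME>\S+)")

# The regex from the post: the negative lookahead skips any line with a "$"
# further along it, which rules out the machine account.
LOOKAHEAD = re.compile(r"Account Name:\s+(?!.+\$)(?P<FIELDNAME>\S+)")


def extract(pattern, text):
    """Hypothetical helper: return the FIELDNAME group of every match."""
    return [m.group("FIELDNAME") for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(extract(NAIVE, EVENT))      # ['DC01$', 'jsmith']
    print(extract(LOOKAHEAD, EVENT))  # ['jsmith']
```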
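The regex looks for Account Name: followed by one or more whitespace characters (there is excess spacing in the logs for some reason). The real magic happens in the next bit, (?!.+\$):
- Parentheses group the expression together
- ?! means negative lookahead – the match fails if the text that follows matches the regex inside the group, and the check itself consumes no characters
- .+ – one or more characters of any kind except a newline, so the check never looks past the end of the current line
- \$ – a literal dollar sign (escaped, because an unescaped $ is an end anchor)
Taken together, the lookahead rejects any Account Name line with a dollar sign further along it, which rules out the machine account. The last part, (?P<FIELDNAME>\S+), is a named capture group. It grabs one or more non-whitespace characters and stores them in a field called FIELDNAME.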
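Here is a minimal Python sketch of the lookahead on its own, applied to the text that would follow Account Name:. The helper `passes_lookahead` and the sample values are hypothetical. The sketch also shows two details from the breakdown. The check stops at the end of the line, and the dollar sign doesn't have to be the last character for a value to be rejected.

```python
import re

# Just the lookahead and capture group from the post, without the
# "Account Name:\s+" prefix. re.match only tries position 0 of the string.
VALUE_PATTERN = re.compile(r"(?!.+\$)(?P<FIELDNAME>\S+)")


def passes_lookahead(value: str) -> bool:
    """True if the capture group would match at the start of `value`."""
    return VALUE_PATTERN.match(value) is not None


if __name__ == "__main__":
    print(passes_lookahead("jsmith"))         # True: no dollar sign on the line
    print(passes_lookahead("DC01$"))          # False: ".+" reaches a "$"
    print(passes_lookahead("svc$backup"))     # False: the "$" need not be last
    print(passes_lookahead("jsmith\nDC01$"))  # True: "." stops at the newline
    # The lookahead consumes nothing, so the capture still starts at "j".
    print(VALUE_PATTERN.match("jsmith").group("FIELDNAME"))  # jsmith
```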
Note that this doesn’t work for all AD logs, just the ones I’m interested in: account lockouts, where the first Account Name always ends in a dollar sign.
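Here is a minimal Python sketch of the failure case. It uses a hypothetical event whose first Account Name is a person rather than a machine account. A single-match extraction (this sketch uses `re.search`) grabs the wrong name, because nothing on that line contains a dollar sign.

```python
import re

LOOKAHEAD = re.compile(r"Account Name:\s+(?!.+\$)(?P<FIELDNAME>\S+)")

# Hypothetical event: the first Account Name is a user, not a machine account.
EVENT = (
    "Subject:\n"
    "\tAccount Name:\t\tadmin1\n"
    "\tAccount Domain:\t\tCONTOSO\n"
    "Target Account:\n"
    "\tAccount Name:\t\tjsmith\n"
)


def first_match(text):
    """Return the first captured name, as a single-match extraction would."""
    m = LOOKAHEAD.search(text)
    return m.group("FIELDNAME") if m else None


if __name__ == "__main__":
    print(first_match(EVENT))         # admin1 (not the account we wanted)
    print(LOOKAHEAD.findall(EVENT))   # ['admin1', 'jsmith']
```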
The result of all this jargon and gnashing of teeth: a clean Splunk field that shows only what I want, with no excess information. Neat.
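Update: I found an even better way to do this. The key is the regex modifier (?s), which makes . match newlines as well. That way a single pattern can span from a section heading down to the Account Name line beneath it. The better regex is now: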
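(?s)(<section name of the field you're interested in>:.+?Account Name:\s+)(?P<real_group_name>[^\n]+)
Replace the placeholder with the section heading from the event (for a lockout, Account That Was Locked Out). The ? after .+ makes it lazy, so the match stops at the first Account Name: after that heading instead of the last one in the event.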
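Here is a minimal Python sketch of the section-anchored approach, using an abridged, hypothetical lockout event with made-up names. The helper `section_account_name` and its `lazy` flag are invented for illustration. The flag lets you compare a lazy `.+?` with a greedy `.+` between the section heading and Account Name:.

```python
import re

# Abridged, hypothetical lockout event; names and SIDs are made up.
EVENT = (
    "A user account was locked out.\n"
    "Subject:\n"
    "\tSecurity ID:\t\tS-1-5-18\n"
    "\tAccount Name:\t\tDC01$\n"
    "\tAccount Domain:\t\tCONTOSO\n"
    "Account That Was Locked Out:\n"
    "\tSecurity ID:\t\tS-1-5-21-1111\n"
    "\tAccount Name:\t\tjsmith\n"
    "Additional Information:\n"
    "\tCaller Computer Name:\tWKSTN01\n"
)


def section_account_name(text, section, lazy=True):
    """Return the Account Name value that follows the given section heading."""
    # (?s) lets "." match newlines, so the pattern can span from the heading
    # to the Account Name line below it. ".+?" stops at the first match;
    # ".+" backtracks from the end and finds the last one instead.
    quantifier = ".+?" if lazy else ".+"
    pattern = (
        r"(?s)(" + re.escape(section) + ":" + quantifier + r"Account Name:\s+)"
        r"(?P<real_group_name>[^\n]+)"
    )
    m = re.search(pattern, text)
    return m.group("real_group_name") if m else None


if __name__ == "__main__":
    print(section_account_name(EVENT, "Account That Was Locked Out"))  # jsmith
    print(section_account_name(EVENT, "Subject"))                      # DC01$
    print(section_account_name(EVENT, "Subject", lazy=False))          # jsmith
```

With a greedy `.+`, asking for the Subject section overshoots to the last Account Name: in the event. The lazy form is the safer choice whenever more than one section carries that field.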
A detailed explanation is located here.