I had posted here about how to extract account names with a specific modifier (excluding account names ending in a dollar sign). That worked for one specific instance, but I found I needed something better. Active Directory logs contain several fields with the same name (Account_Name, Group_Name, etc.) whose meaning depends on context, namely the section header two lines above each field.
Message=A member was added to a security-enabled universal group.
Subject:
Security ID: <Random long SID>
Account Name: Administrator
Account Domain: ExampleDomain
Logon ID: <random hex value>
Member:
Security ID: <Another random long SID>
Account Name: CN=George Clooney,OU=ExampleDomain,OU=Hollywood,OU=California,DC=USA,DC=NA,DC=Terra
Group:
Security ID: <Yet another long SID>
Account Name: Old Actors
Account Domain: ExampleDomain
You can see that there are three different Security ID fields, three different Account Name fields, and two different Account Domain fields. The key is the context: Subject account name, member account name, or group account name.
I wrestled for some time to find a regular expression for Splunk that would keep matching past the end of a line. After much searching I came across this post, which explained that I needed a regex modifier to do what I wanted.
In my case I needed the (?s) modifier, which makes . match newline characters as well, so a single match can span several lines. My new and improved AD regex extraction breaks down as follows:
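- (?s) Regex modifier that lets . match newline characters, so a match can span multiple lines
- Group: The section I am interested in. You can replace this with Member: if you're interested in member account names instead, but note the caveat about .+ in the next item
- .+ Match one or more of any character (including newlines, thanks to the modifier above). This quantifier is greedy, so it runs to the last Account Name: in the event. That is fine for Group: because it is the last section; for Subject: or Member:, use the lazy form .+? so the match stops at the first Account Name: after the section name
- Account Name:\s+ This works in conjunction with the previous two items to create a match that includes the section name and everything after it, up to and including the whitespace after Account Name:
- [^\n]+ Match one or more characters that are not a newline (since an account name might contain spaces)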
Finally! This is the regex I’ve been looking for.
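To see the pieces working together, here is a minimal sketch in Python rather than Splunk. The pattern is my reassembly of the components listed above, and the capture group around [^\n]+ is an assumption about how the value gets pulled out. The function name account_name and the EVENT string (the sample event, one field per line) exist only for illustration. Python's re module treats (?s), .+ and .+? the same way PCRE does for a pattern like this, so it should give a fair picture of what happens to the sample event.

```python
import re
from typing import Optional

# The sample event from above, one field per line (placeholders kept as-is).
EVENT = """Message=A member was added to a security-enabled universal group.
Subject:
Security ID: <Random long SID>
Account Name: Administrator
Account Domain: ExampleDomain
Logon ID: <random hex value>
Member:
Security ID: <Another random long SID>
Account Name: CN=George Clooney,OU=ExampleDomain,OU=Hollywood,OU=California,DC=USA,DC=NA,DC=Terra
Group:
Security ID: <Yet another long SID>
Account Name: Old Actors
Account Domain: ExampleDomain
"""


def account_name(section: str, event: str) -> Optional[str]:
    """Return the Account Name that follows the given section header.

    Hypothetical helper: uses the lazy .+? so the match stops at the
    first "Account Name:" after the section, whichever section it is.
    """
    pattern = re.compile(
        r"(?s)" + re.escape(section) + r":.+?Account Name:\s+([^\n]+)"
    )
    match = pattern.search(event)
    return match.group(1) if match else None


# Greedy variant built from the components as listed: .+ runs to the LAST
# "Account Name:" in the event, so "Member:" ends up returning the group name.
GREEDY_MEMBER = re.compile(r"(?s)Member:.+Account Name:\s+([^\n]+)")

if __name__ == "__main__":
    for section in ("Subject", "Member", "Group"):
        print(section, "->", account_name(section, EVENT))
    print("greedy Member ->", GREEDY_MEMBER.search(EVENT).group(1))
```

With the lazy quantifier, each section returns its own account name. The greedy Member: variant returns Old Actors, which is why the greedy form is only safe for the last section in the event.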