.Dd February 23, 2007 .Dt GROK 1 .Os .Sh NAME .Nm grok .Nd An expert system for real-time log analysis .Sh SYNOPSIS .Nm .Op Fl b .Op Fl d Ar 1, 2, ... .Op Fl f Ar configfile .Nm .Fl m Ar match .Fl r Ar reaction .Sh DESCRIPTION The .Nm utility watches files (or STDOUT) as specified in the configuration file for patterns. Upon recognition of a pattern a pre-configured action is taken. .Pp The options are as follows: .Bl -tag -width indent .It Fl b Run in the background. Detaches .Nm and redirects STDOUT, STDIN, and STDERR to .Pa /dev/null .It Fl d Ar [1, 2, ...] The value (default is 1 if no value is given) specifies the debug level. Only messages with a debug value less than or equal to the specified value will be printed. A debug value of 2 or higher will output information about the patterns, file, and exec definitions. .It Fl f Ar file Use .Ar file as a configuration file. If this is not specified .Nm will look in the current working directory for .Pa grok.conf and use that. .It Fl m Ar match Specify a match string. See COMMANDLINE below. .It Fl r Ar reaction Specify a reaction string. See COMMANDLINE below. .El .Sh CONFIGURATION The .Pa grok.conf file follows this syntax: .Bd -literal -offset "indent" patterns { }; filters { }; file "filename" { }; exec "command" { }; filelist "file1,file2,..." { }; catlist "file1,file2,..." { }; filecmd "command" { filespec }; .Ed .Pp .ft B patternspec .Pp .Bd -filled -offset "indent" .ft P Pattern specifiers determine what to match and capture. These patterns can be referred to as %PATTERNNAME% elsewhere in the configuration file. Patterns can have other patterns nested within them. Patterns have have the following syntax: .Pp PATTERNNAME = "regexp"; .Pp Pre-existing named patterns exist for those who do not know regular expressions. The patterns are referenced by using %PATTERN%. .Nm comes with a good set of standard patterns. See DEFAULT PATTERNS for a list of them. .Pp It is sometimes desirable to use multiple pre-existing named patterns in a single match line. This is accomplished by using subnames. The syntax for this is %NAME:SUBNAME%. This allows you to distinguish from multiple %NAME% values on a single match line. See the EXAMPLES section for a better idea of how this works. .Ed .Pp .ft B filterspec .ft P .Bd -filled -offset "indent" Filter specifiers are used in a reaction statement, and are useful for modifying the output of a variable. They have the following syntax: .Pp /FILTERNAME/ = { perl block }; .Pp The slash notation is important and can often be thought of as a search and replace, as that is often the purpose of a filter. The syntax for using a filter in a reaction statement is %NAME|FILTER%. .Ed .Pp .ft B filespec .ft P .Bd -filled -offset "indent" File specifiers have the following syntax: .Pp .Bd -literal type "description" { match = "match pattern"; key = "some string"; threshold = "integer"; interval = "integer"; reaction = "reaction command(s)"; reaction = { reaction perl block }; }; .Ed .Pp .Bl -column ".Sy Description" .It Em Description Ta A string used to identify the definition. This has no significance other than debugging and clarification for the user. .It Em Match Ta The string pattern to match. Multiple match entries per type definition and named patterns are allowed. .It Em Key Ta The key to use in the hashtable when counting occourances of a specific event. The default key is a concatenation of all captured patterns into a string. As this may not be desirable the key value can be specified. Pattern names are valid and evaluated at runtime. .It Em Threshold Ta Specify the number of matches by it's key before the reaction(s) take place. Once the threshold is reached the reaction(s) occour and the occurance count is set to zero. The default threshold is 1 (each occurance causes a reaction). .It Em Interval Ta The time, in seconds, between occurance counter reset. If the threshold is not reached by the specified interval time then the occourance count is is reset and no reaction is triggered. .It Em Reaction Ta Any valid (string of) command(s) or block of perl code. If using a perl block .Nm provides helper variables. These are $v (a hashref containing all named patterns matched, subnames are valid here) and $d (a hashref to use for your own storage, keyed with your key value). .El Three special variables are provided for use in the reaction. These are: .Bl -column ".Sy %=MATCH%" .It Em %=MATCH% Ta Contains the substring matched by the .Ql match properties. .It Em %=LINE% Ta Contains the whole line matched. .It Em %=FILE% Ta Contains the path to the current file .El .Ed .Pp .ft B filelist, filecat, and filecmd .ft P .Bd -filled -offset "indent" filelist is a simple way to specify multiple files. Format is a a comma-separated list of files. Glob patterns are supported. Files in the file list are handled the same way as if you had individually written 'file "foo"' per file. That is, they are watched with 'tail -0f'. .Pp filecat has the same syntax as filelist, but instead of those files watched by tail, they are simply catted. .Pp filecmd allows a command which returns a list of files. The returned list should be newline delimited (cp the output of find, ls). Under the hood, this essentially becomes a dynamically-generated filelist entry As with filelist, so the listed output can contain globs as filelist can. .Ed .Pp .Sh BUILTIN PATTERNS There are lots of builtin patterns at your disposal: .Bl -column ".Sy ABCDEFGIJKL" .It Sy "Name" Ta Sy "Regular Expression" .It Em USERNAME Ta Match a username: /[a-zA-Z0-9_-]+/ .It Em USER Ta Alias of USERNAME .It Em NUMBER Ta Any real number (including scientific notation). .It Em INT Ta Any integer .It Em DATA Ta Non-greedy wildcard. .It Em GREEDYDATA Ta Greedy wildcard. .It Em QUOTEDSTRING Ta Quoted string. (double or single) .It Em QS Ta Alias of QUOTEDSTRING .It Em MAC Ta MAC Address .It Em IP Ta IP Address .It Em HOSTNAME Ta Any RFC 1035 compliant hostname. .It Em HOST Ta Alias of HOSTNAME .It Em IPORHOST Ta IP or HOSTNAME .It Em MONTH Ta Any representation of a month: Jan, January, 01, 1 .It Em MONTHDAY Ta 01-31 and 1-31 .It Em DAY Ta Any form of english weekday: Mon, Monday, 1 .It Em YEAR Ta Alias of INT .It Em TIME Ta /\e\\&d{2}:\e\\&d{2}:\e\\&d{2}/ .It Em HTTPDATE Ta %MONTHDAY%/%MONTH%/%YEAR%:%TIME% %INT:ZONE% .It Em PROG Ta Alias of WORD .It Em PID Ta Alias of INT .It Em SYSLOGDATE Ta MONTH% %MONTHDAY% %TIME% .It Em SYSLOGPROG Ta %PROG%(\\&[%PID%\])? .It Em SYSLOGBASE Ta %SYSLOGDATE% %HOSTNAME% %SYSLOGPROG%: .It Em APACHELOG Ta %IPORHOST% %USER:IDENT% %USER:AUTH% \\&[%HTTPDATE%\] %QS:URL% %NUMBER:RESPONSE% %NUMBER:BYTES% %QS:REFERRER% %QS:AGENT%" .El .Pp .Sh BUILTIN FILTERS There are pre-defined filters available within a reaction statement. The pre-defined filters and their descriptions are: .Bl -column ".Sy stripquotes" .It Em shnq Ta Used to escape (){}[]$*?!|'"` characters. .It Em shdq Ta Used to escape $"` characters. .It Em e[XYZ] Ta Used to escape arbitrary (XYZ) characters. .It Em stripquotes Ta Strip leading and trailing quote characters ("`'). .It Em parsedate Ta Convert a string containing a date to machine time. .It Em strftime(fmt) Ta Convert unix epoch to arbitrary time format using specified time format. The time format is the same as .Xr strftime 3 . Except you must use &X instead of %X for conversion specifications. Such as strftime("&H:&M:&S"). .It Em uid2user Ta UID to username. .It Em urlescape Ta Escape a string so it can be easily used in a URL. .It Em ip2host Ta IP to hostname. .It Em httpfilter Ta Convert apache 'GET /foo/bar HTTP/1.0' to '/foo/bar' (without quotes) .El .Pp .Sh COMMANDLINE grok is ready to do parsing even without a fancy config file. If all you need to do is parse one match and have one reaction, then you can invoke grok from the commandline like this: .Bd -literal -offset "indent" grok -m MATCH -r REACTION .Ed .Pp grok will create an in-memory config file that looks like this: .Bd -literal -offset "indent" exec "cat" { type "all" { match = "MATCH"; reaction = { print meta2string("REACTION", $v); }; }; }; .Ed .Pp What does that mean? This means that you get to specify one match and one reaction. The reaction is just a string that will be expanded and printed when the match matches. See EXAMPLES for more information. .Pp .Sh EXAMPLES .Bl -enum .It The following example defines %TTY%, watches .Pa /var/log/messages for failed su(1) attempts and prints a message to STDOUT (notice the use of sub-named pattern captures). .Bd -literal -offset "indent" patterns { TTY = "/dev/tty[qp][a-z0-9]"; }; file "/var/log/messages" { type "failed su(1) attempt" { match = "BAD SU %USER:FROM% to %USER:TO% on %TTY%"; reaction = "echo 'Failed su(1): %USER:FROM% -> %USER:TO% (%TTY%)'"; }; }; .Ed .It For the following example we are watching apache logfiles and replacing the quored URL string with just the URL in the reaction statement. The format of the file is: .Pp 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" .Pp .Bd -literal -offset "indent" filters { /httpfilter/ = { s/^\\\+ (\\S+) \\S+$/$1/; }; }; exec "cat /var/log/http.access.log" { type "http" { match = "%APACHELOG%"; reaction = "echo '%IP%: %QUOTEDSTRING:URL|e[']|stripquotes|httpfilter%'"; }; }; .Ed .It Below is a rule for watching failed SSH login attempts and blocking them using PF. Notice the multiple type entries for a single file. .Bd -literal -offset "indent" file "/var/log/auth.log" { type "ssh-illegal-user" { match = "Illegal user %USERNAME% from %IP%"; threshold = 10; # 10 hits ... key = "%IP%"; # from a single ip ... interval = 600; # in 10 minutes reaction = "pfctl -t naughty -T add %IP%"; }; type "ssh-scan-possible" { match = "Did not receive identification string from %IP%"; threshold = 3; interval = 60; reaction = "pfctl -t naughty -T add %IP%"; }; }; .Ed .It The following is an example of watching tcpdump output for SYN packets destined to port 22 and printing a message. The second type statement is useful for watching portscans. .Bd -literal -offset "indent" exec "tcpdump -li em0 -n 2< /dev/null" { type "ssh-connect" { match = "%IP:SRC%.\ed+ < %IP:DST%.22: S"; reaction = "echo 'SSH connect(): %IP:SRC% -< %IP:DST%'"; }; type "port-scan" { match = "%IP:SRC%.%PORT% < %IP:DST%.%PORT:DST%: S"; key = "%IP:SRC%"; threshold = 30; interval = 5; reaction = "echo 'Port scan from %IP:SRC%'"; }; }; .Ed .It The following example illustrates the optional filters available when evaluating a variable in a reaction statement. Assume that .Pa /etc/passwd contains the following line: .Pp test:*:1002:1002:T"est?:/home/test:/bin/sh .Bd -literal -offset "indent" exec "cat /etc/passwd" { type "passwd" { match = "^test"; reaction = "echo 'Found: %=LINE|shdq%'"; }; }; .Ed .Pp The output of this is: .Bd -literal -offset "indent" Found: test:*:1002:1002:T\\"est?:/home/test:/bin/sh .Ed .Pp Using the same line in .Pa /etc/passwd but changing the example to look like: .Pp .Bd -literal -offset "indent" exec "cat /etc/passwd" { type "passwd" { match = "^test"; reaction = "echo 'Found: %=LINE|shnq%'"; }; }; .Ed .Pp results in: .Pp .Bd -literal -offset "indent" Found: test:\\*:1002:1002:T\\"est\\?:/home/test:/bin/sh .Ed .It The following example illustrates how to use filelist: .Pp .Bd -literal -offset "indent" filelist "/var/log/auth.log,/var/log/secure,/var/log/messages" { ... } .Ed .It The following example illustrates filelist with a glob: .Bd -literal -offset "indent" filelist '/var/log/*.log,/var/log/messages' { ... } .Ed .It The following example shows how to use -m and -r. Let's find out what programs are logging to /var/log/messages: .Bd -literal -offset "indent" % grok -m "%SYSLOGBASE%" -r "%PROG%" < /var/log/messages | sort | uniq kernel named newsyslog sshd .Ed .It The following (somewhat silly) example shows how to use -m and -r. Let's compare the forward and reverse dns entries for www.freebsd.org: .Bd -literal -offset "indent" % host -t A www.freebsd.org | grok -m "%IP%" -r "%IP|ip2host%" www.freebsd.org .Ed .Pp In the above example, we do a name resolution on www.freebsd.org and grok for an IP, then filter the IP through a dns lookup and print the output. .It The following (advanced) example shows how to use -m and -r. Let's break a apart a log into files by IP. Note that the match below matches the IP anywhere in the line. .Bd -literal -offset "indent" % cat /var/log/messages \e\ | perl grok -m '%IP%' -r 'echo "%=LINE|shdq%" >> /tmp/log.%IP%' \e\ | sh % ls /tmp/log.* /tmp/log.192.168.0.254 /tmp/log.192.168.10.177 /tmp/log.192.168.10.175 /tmp/log.192.168.10.189 .Ed .El .Sh FILES .Pa /usr/local/etc/grok.conf .Pp .Sh AUTHOR .An -nosplit .An "Jordan Sissel" .Aq jls@semicomplete.com wrote and maintains .Nm . .An "Wesley Shields" .Aq wxs@csh.rit.edu wrote the manual page. .Sh CONTRIBUTORS .An "Canaan Silberberg" contributed patches supporting filelist and filecmd. .Sh BUGS There are no known bugs at this time. Bugs can be reported to .Aq jls@semicomplete.com .