When saving logs, I like to store data that is as verbose as possible. However, when viewing a log I may only be interested in specific parts of it. Another concern is that I may need to give my logs to a third party without revealing certain information to them. I'll go over a couple of tools that I use on a day-to-day basis. Note that entire books have been written about sed and awk, so my use of them is very limited compared to what they can do.
SED
The way I use sed is very similar to vi's search-and-replace command. A good example: my blog (http://www.mellowd.co.uk/ccie/) sits behind a reverse proxy, and I have an iptables rule that logs any blocked traffic. I'd like to share my deny logs, but I don't want you to see my actual server IP, only my reverse proxy IP. If I just showed you my raw logs, you'd see my real IP address. By piping the log through sed, I can change it on the fly. The basic form is sed 's/<search pattern>/<replacement>/', which replaces the first match on each line; add a g after the final slash to replace every match on the line.
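Here's a minimal sketch of that substitution. The real server address is obviously not shown, so 203.0.113.10 (a documentation-only range) is a hypothetical stand-in, and anonymise is just an illustrative function name. The expression is quoted so the shell leaves it alone, the dots in the search pattern are escaped because an unescaped . matches any character, and the trailing g replaces every occurrence on a line:

```shell
#!/bin/sh
# Hypothetical placeholders: REAL_IP stands in for the real server address,
# PROXY_IP is the address that should appear in the shared logs.
REAL_IP='203\.0\.113\.10'   # dots escaped so they only match a literal "."
PROXY_IP='8.8.8.8'

# Replace every occurrence of the real IP on each input line (the g flag).
anonymise() {
    sed "s/$REAL_IP/$PROXY_IP/g"
}

# Sample line in the same format as the iptables log entries below.
printf '%s\n' 'Oct 19 14:43:11 mellowd kernel: [10084876.715244] IPTables Packet Dropped: IN=venet0 OUT= MAC= SRC=23.95.84.106 DST=203.0.113.10 LEN=118' | anonymise
```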
I've used sed to change my IP to 8.8.8.8, so I can now happily show the logs. Because sed works on each line as it arrives, this happens in real time, so piping tail through sed is possible:
Oct 19 14:43:11 mellowd kernel: [10084876.715244] IPTables Packet Dropped: IN=venet0 OUT= MAC= SRC=23.95.84.106 DST=8.8.8.8 LEN=118 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=34299 DPT=1900 LEN=98
Oct 19 14:49:33 mellowd kernel: [10085258.251596] IPTables Packet Dropped: IN=venet0 OUT= MAC= SRC=201.168.76.131 DST=8.8.8.8 LEN=434 TOS=0x00 PREC=0x00 TTL=52 ID=0 DF PROTO=UDP SPT=5063 DPT=5060 LEN=414
Oct 19 14:52:53 mellowd kernel: [10085458.580901] IPTables Packet Dropped: IN=venet0 OUT= MAC= SRC=220.135.220.150 DST=8.8.8.8 LEN=40 TOS=0x00 PREC=0x00 TTL=104 ID=62597 PROTO=TCP SPT=6000 DPT=3128 WINDOW=16384 RES=0x00 SYN URGP=0
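For the live case, a sketch along these lines should work, again with 203.0.113.10 as a hypothetical placeholder for the real IP and anonymise_stream as an illustrative name. One caveat: when sed's output goes into another pipe or a file rather than straight to a terminal, it is typically block-buffered, so lines can show up in bursts. GNU sed's -u (unbuffered) option flushes after every line:

```shell
#!/bin/sh
# Hypothetical placeholders: REAL_IP stands in for the real server address
# (dots escaped so they match literally), PROXY_IP for the address to show.
REAL_IP='203\.0\.113\.10'
PROXY_IP='8.8.8.8'

# Rewrite the real IP on each line as soon as that line arrives.
# -u is GNU sed's "unbuffered" option: output is flushed per line, which
# matters when the result feeds another pipe instead of a terminal.
anonymise_stream() {
    sed -u "s/$REAL_IP/$PROXY_IP/g"
}

# Live usage (runs until interrupted, so it is left commented out here).
# --line-buffered is the GNU grep equivalent for the filtering stage:
#   sudo tail -f /var/log/syslog | grep --line-buffered IPTables | anonymise_stream
```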
AWK
awk is very handy for showing only the information you need. In the above example there is a lot of information I might not care about. Maybe I'm only interested in the date, time, and source IP address, and don't care about the rest. I can pipe the same log through awk and have it show only the fields I care about. By default, awk splits each line into fields on whitespace (runs of spaces or tabs), and the fields are numbered sequentially as $1, $2, $3 and so on. The format for this is awk '{ print <fields you want to see> }'
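A quick sketch of that default splitting on a made-up line (show_fields is just an illustrative name). Leading blanks are skipped and any run of spaces or tabs counts as one separator, while NF holds the number of fields on the line:

```shell
#!/bin/sh
# Print the field count, then fields 1 to 3, one per line.
# Default field splitting: runs of blanks separate fields and
# leading blanks are ignored.
show_fields() {
    awk '{ print NF; print $1; print $2; print $3 }'
}

# Irregular spacing and a tab, to show they don't create extra fields.
printf '  Oct 19\t 14:43:11   mellowd\n' | show_fields
```

This prints 4, then Oct, 19 and 14:43:11 on separate lines.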
I'll now simply cat the syslog file, grep for IPTables, and use awk to show me what I want to see:
sudo cat /var/log/syslog | grep IPTables | awk '{ print $1" "$2" "$3"\t"$13 }'
Oct 19 14:27:47 SRC=92.50.157.14
Oct 19 14:29:04 SRC=23.250.11.219
Oct 19 14:37:06 SRC=114.32.207.183
Oct 19 14:40:32 SRC=117.21.176.77
Oct 19 14:40:32 SRC=117.21.176.77
Oct 19 14:41:36 SRC=220.135.220.11
Oct 19 14:43:11 SRC=23.95.84.106
Oct 19 14:49:33 SRC=201.168.76.131
Oct 19 14:52:53 SRC=220.135.220.150
Oct 19 14:54:50 SRC=162.212.181.242
Oct 19 14:58:04 SRC=122.225.97.92
Oct 19 15:01:49 SRC=192.3.207.210
Oct 19 15:05:38 SRC=222.186.21.99
Oct 19 15:06:48 SRC=221.206.226.92
Oct 19 15:07:35 SRC=162.212.181.242
Oct 19 15:13:42 SRC=118.161.75.85
I've put spaces between the date and time fields and a tab before the source address so the output is laid out the way I want. If you count the fields in the original log, you'll see that field 1 is Oct, field 2 is 19, field 3 is the time, and field 13 is SRC=<IP>.
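Rather than counting by hand, awk can number the fields for you. This sketch (number_fields is an illustrative name) prints each field with its position, using one of the dropped-packet lines from above as input:

```shell
#!/bin/sh
# Print "<position>: <field>" for every whitespace-separated field.
number_fields() {
    awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

printf '%s\n' 'Oct 19 14:43:11 mellowd kernel: [10084876.715244] IPTables Packet Dropped: IN=venet0 OUT= MAC= SRC=23.95.84.106 DST=8.8.8.8 LEN=118 TOS=0x00 PREC=0x00 TTL=53 ID=0 DF PROTO=UDP SPT=34299 DPT=1900 LEN=98' | number_fields
```

The 13th line of output reads 13: SRC=23.95.84.106, which matches the count above.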
I don't really need to see SRC= in the output, so I use sed to replace it with nothing:
sudo cat /var/log/syslog | grep IPTables | awk '{ print $1" "$2" "$3"\t"$13 }' | sed 's/SRC=//'
Oct 19 14:27:47 92.50.157.14
Oct 19 14:29:04 23.250.11.219
Oct 19 14:37:06 114.32.207.183
Oct 19 14:40:32 117.21.176.77
Oct 19 14:40:32 117.21.176.77
Oct 19 14:41:36 220.135.220.11
Oct 19 14:43:11 23.95.84.106
Oct 19 14:49:33 201.168.76.131
Oct 19 14:52:53 220.135.220.150
Oct 19 14:54:50 162.212.181.242
Oct 19 14:58:04 122.225.97.92
Oct 19 15:01:49 192.3.207.210
Oct 19 15:05:38 222.186.21.99
Oct 19 15:06:48 221.206.226.92
Oct 19 15:07:35 162.212.181.242
Oct 19 15:13:42 118.161.75.85
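Field positions can shift, though: the kernel pads short timestamps with spaces (for example [   12.345678] shortly after boot), and awk splits that into extra fields, so SRC= would no longer be field 13. A sketch that does the whole job in awk, assuming the log format shown above, finds the SRC= field by its prefix instead, and folds the grep and sed steps into awk as well. The src_ips name is just for illustration:

```shell
#!/bin/sh
# Print "date time<TAB>source IP" for each line containing IPTables,
# locating the SRC= field by its prefix rather than by its position,
# so extra whitespace-split fields earlier on the line don't matter.
src_ips() {
    awk '/IPTables/ {
        for (i = 1; i <= NF; i++) {
            if (substr($i, 1, 4) == "SRC=") {
                print $1 " " $2 " " $3 "\t" substr($i, 5)
                break    # only the first SRC= field on the line
            }
        }
    }' "$@"
}

# Usage: reads the named files, or standard input if none are given, e.g.
#   sudo cat /var/log/syslog | src_ips
```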
I'd love to hear about any other handy sed/awk/grep commands you use for similar jobs.