Date: 11 Jul 2021 16:11:35 -0400
From: "John Levine" <johnl@iecc.com>
To: freebsd-questions@freebsd.org
Cc: dpchrist@holgerdanske.com
Subject: Re: Analyzing Log files of very large size
Message-ID: <20210711201136.B3271205F2CA@ary.qy>
In-Reply-To: <e797b547-4084-351d-08a9-31784b10fecd@holgerdanske.com>
It appears that David Christensen <dpchrist@holgerdanske.com> said:
>On 7/11/21 5:13 AM, KK CHN wrote:
>> I need to analyze large log files, around 50 GB, from a SonicWall
>> firewall, for a suspected attack. ...
>But if this project is for an employer or client, I would recommend
>starting with the commercial-off-the-shelf (COTS) log analysis tool made
>by the hardware vendor. Train up on it. Buy a support contract:
>
>https://www.sonicwall.com/wp-content/uploads/2019/01/sonicwall-analyzer.pdf

This is reasonable advice if you plan to do these analyses on a
regular basis, but it's overkill if you only expect to do it once.

I have found that some of the text processing utilities that come
with BSD are a lot faster than others. The regex matching in perl is
a lot faster than in python, sometimes by an order of magnitude. My
tool of choice is mawk, an implementation of the funky but very
useful awk language that is amazingly fast. grep is OK; sed is too
slow for anything other than tiny jobs.

I'd suggest first dividing the logs into manageable chunks, perhaps
using split or csplit; it would also make a good first project in
mawk, using patterns to divide the files into chunks that represent
an hour or a day. Then you can start looking for interesting
patterns, perhaps with grep if they are simple enough, or more likely
with some short mawk scripts. Rough sketches of both steps follow.
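For the splitting step: a minimal sketch, assuming syslog-style lines
that start with a "Mon DD HH:MM:SS" timestamp and arrive roughly in
time order. The chunk-file naming here is mine; adjust the fields to
match whatever the SonicWall logs actually look like.

    # split.awk - write one chunk file per hour, e.g. chunk.Jul_11_14
    {
        out = "chunk." $1 "_" $2 "_" substr($3, 1, 2)
        if (out != prev) {
            if (prev != "") close(prev)   # don't hold every file open
            prev = out
        }
        print > out
    }

Run it as "mawk -f split.awk big.log". The time-order assumption
matters because a "print >" after a close() truncates the file; if
your logs interleave, delete any old chunks first and use ">>"
instead.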
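Once the chunks exist, the pattern hunting is more of the same. As a
hypothetical example, assuming SonicWall-style key=value fields such
as src=203.0.113.5:49152 somewhere on each line (the field name and
layout are my assumption, not something I've checked), this tallies
events per source address:

    # count-src.awk - count log lines per source IP
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^src=/) {
                split(substr($i, 5), a, ":")  # strip "src=", drop port
                n[a[1]]++
            }
    }
    END { for (ip in n) print n[ip], ip }

Then something like "mawk -f count-src.awk chunk.Jul_11_* | sort -rn
| head" gives you the top talkers for that day.

R's,
John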