Date: Mon, 12 Jul 2021 00:06:04 -0400
From: Paul Procacci <pprocacci@gmail.com>
To: John Levine <johnl@iecc.com>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>, dpchrist@holgerdanske.com
Subject: Re: Analyzing Log files of very large size
Message-ID: <CAFbbPujtM-yzk0GbKLaKr7=OrCA3rdBzQ6T+B8KaB8wSK0Xz2w@mail.gmail.com>
In-Reply-To: <20210711201136.B3271205F2CA@ary.qy>
References: <e797b547-4084-351d-08a9-31784b10fecd@holgerdanske.com> <20210711201136.B3271205F2CA@ary.qy>
> >> I am in a requirement to analyze large log files of sonic wall firewall
> >> around 50 GB. for a suspect attack. ...
> >
> >But if this project is for an employer or client, I would recommend
> >starting with the commercial-off-the-shelf (COTS) log analysis tool made
> >by the hardware vendor.  Train up on it.  Buy a support contract:
> >
> >https://www.sonicwall.com/wp-content/uploads/2019/01/sonicwall-analyzer.pdf
>
> This is reasonable advice if you plan to be doing these analyses on a
> regular basis, but it's overkill if you only expect to do it once.
>
> I have found that some of the text processing utilities that come with BSD
> are a lot faster than others.  The regex matching in perl is a lot faster
> than python, sometimes by an order of magnitude.  My tool of choice is
> mawk, an implementation of the funky but very useful awk language that is
> amazingly fast.  grep is OK, sed is too slow for anything other than tiny
> jobs.
>
> I'd suggest first dividing up the logs into manageable chunks, perhaps
> using split or csplit, or it would be a good first project in mawk, using
> patterns to divide the files into chunks that represent an hour or a day.
>
> Then you can start looking for interesting patterns, perhaps with grep if
> they are simple enough, or more likely with some short mawk scripts.
>
> R's,
> John

This advice is sound.  I'd personally do the same, leaning on either awk or
perl myself.  It just depends naturally on what you're after in the long run.

Another note: I've done something similar before where awk/perl simply
weren't enough for the 50+ TB of logs that were being consumed daily, so I
had to roll my own using C/qp-tries[1].  Again, if not only your volume is
high but you also need to process the data frequently, consider a more
custom solution should one not already exist.

Note: Another poster mentioned AVL trees as well.  That's fine too.  I just
prefer qp-tries.

[1] https://dotat.at/prog/qp/README.html

--
__________________
:(){ :|:& };:
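
As a rough illustration of the chunking step suggested above (split/csplit,
or a small mawk program keyed on timestamps), a minimal sketch might look
like the following.  The file name sonicwall.log and the assumption that
each line starts with a syslog-style "Mon DD" timestamp are placeholders;
adjust them to the real export format.

    # Split by line count so no log record is cut in half (~10M lines/chunk):
    split -l 10000000 sonicwall.log chunk-

    # Or write one file per calendar day, keyed on the leading month/day
    # fields (assumed layout; check a sample line first):
    mawk '{ print > ("day-" $1 "-" $2 ".log") }' sonicwall.log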
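
Once the chunks exist, the pattern-hunting step can start with grep and a
short mawk tally.  The drop/deny keywords and the src= key=value field
below are assumptions about how the SonicWall syslog lines are laid out,
not a documented format; confirm against a sample line before trusting the
counts.

    # Quick overview: denied/dropped events per daily chunk:
    grep -Eci 'drop|deny' day-*.log

    # Tally which source addresses show up most often in those events:
    mawk '
    tolower($0) ~ /drop|deny/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^src=/) { split($i, kv, "="); hits[kv[2]]++ }
    }
    END { for (ip in hits) print hits[ip], ip }
    ' day-*.log | sort -rn | head -20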
