Date: Mon, 12 Jul 2021 02:20:58 -0400
From: Paul Procacci <pprocacci@gmail.com>
To: serejk@febras.net
Cc: KK CHN <kkchn.in@gmail.com>, freebsd-questions <freebsd-questions@freebsd.org>
Subject: Re: Analyzing Log files of very large size
Message-ID: <CAFbbPugNamorCpL1%2Bbkao06iWSUJkPS5V3KORs3SCUUChbBU5Q@mail.gmail.com>
In-Reply-To: <d0ebe655c44cd2b5a70bbac4dcdddcc3@febras.net>
References: <CAKgGyB_TJrLWSjcnc9491Gg0Q5CLqLdmWx2yga_Ez7-gE6YcKQ@mail.gmail.com> <E9C00664-DAC7-4F58-BCCA-CDD2654C9325@febras.net> <CAKgGyB_reF4eqz4pvQj7tFsOQEEB3WrFZa-91L%2BNChm=85h0-A@mail.gmail.com> <d0ebe655c44cd2b5a70bbac4dcdddcc3@febras.net>
On Mon, Jul 12, 2021 at 1:44 AM Korolev Sergey <serejk@febras.net> wrote:

> I think that the proper tools usually depend heavily on the desired
> result, so my reasoning is quite general.
>
> People here advise using Perl and splitting one large file into
> manageable pieces. All of that is very good; I vote for that.
>
> But I don't know Perl at all, so I usually get along with standard
> shell utilities: grep, tr, awk, sed, etc. I used to parse big maillogs
> with them successfully.

Most standard shell utilities can certainly get the job done as long as
the files are of a manageable size. That is most likely the vast majority
of cases. No question about that.

There is certainly a point, however, where the files become so large that
the job will complete on your 150th birthday. ;) An exaggeration,
undoubtedly.

There are obviously options for this, but you'll seldom find the answer in
a standard install of any userland. Sometimes you can get away with xargs,
depending on the data you're working with, but that's all that comes to
mind.

The "promotion" from there, in my mind, is going the Perl route (or any
other interpreted language capable of threading) ... and from there, as
necessary ... C (or another compiled language).

Someone mentioned Elasticsearch, and that's a good option too. All the
work of indexing the data has already been done for you. You just have to
not mind paying for it. ;)

Hell, I've used PostgreSQL with its full-text search for similar things as
well, and I'd argue that if it's already in your stack, you should at the
very least try that first. You'd be surprised at how darn well it does.

Goodnight!

~Paul

-- 
__________________
:(){ :|:& };:
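As an illustration of the "split the file into manageable pieces and work on them in parallel" idea discussed in this thread, here is a minimal Python sketch. It is not taken from the message itself: the function names (chunk_ranges, count_matches, parallel_count), the choice of worker processes rather than threads, and the task of counting lines that match a pattern are all assumptions made for the example.

```python
import os
import re
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(path, n_chunks):
    """Split a file into roughly n_chunks byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    ranges = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            # Jump ahead by about one chunk, then finish the line we landed in
            # so that no line is ever split across two ranges.
            f.seek(min(start + step, size))
            f.readline()
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def count_matches(task):
    """Count the lines in one byte range that match a bytes regex pattern."""
    path, start, end, pattern = task
    regex = re.compile(pattern)
    hits = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if regex.search(line):
                hits += 1
    return hits


def parallel_count(path, pattern, workers=4):
    """Hypothetical driver: one byte range per worker process, results summed."""
    tasks = [(path, s, e, pattern) for s, e in chunk_ranges(path, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, tasks))
```

A call such as parallel_count("/var/log/maillog", rb"status=bounced", workers=8), made from a script guarded by `if __name__ == "__main__":`, would scan the file in eight pieces; the path and pattern here are placeholders. The file is never copied or physically split: each worker seeks to its own byte range. Whether this actually beats a single grep depends on whether the job is limited by CPU or by disk.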