Date:      Mon, 12 Jul 2021 02:20:58 -0400
From:      Paul Procacci <pprocacci@gmail.com>
To:        serejk@febras.net
Cc:        KK CHN <kkchn.in@gmail.com>, freebsd-questions <freebsd-questions@freebsd.org>
Subject:   Re: Analyzing Log files of very large size
Message-ID:  <CAFbbPugNamorCpL1%2Bbkao06iWSUJkPS5V3KORs3SCUUChbBU5Q@mail.gmail.com>
In-Reply-To: <d0ebe655c44cd2b5a70bbac4dcdddcc3@febras.net>
References:  <CAKgGyB_TJrLWSjcnc9491Gg0Q5CLqLdmWx2yga_Ez7-gE6YcKQ@mail.gmail.com> <E9C00664-DAC7-4F58-BCCA-CDD2654C9325@febras.net> <CAKgGyB_reF4eqz4pvQj7tFsOQEEB3WrFZa-91L%2BNChm=85h0-A@mail.gmail.com> <d0ebe655c44cd2b5a70bbac4dcdddcc3@febras.net>

On Mon, Jul 12, 2021 at 1:44 AM Korolev Sergey <serejk@febras.net> wrote:

>
>
> I think the proper tools usually depend heavily on the desired
> result, so my reasoning is quite general.
>
> People here advise using Perl and also splitting one large file into
> manageable pieces. All that is very good, and I vote for it.
>
> But I don't know Perl at all, so I usually get along with standard
> shell utilities: grep, tr, awk, sed, etc. I used to parse big
> maillogs with them successfully.
>

Most standard shell utilities can certainly get the job done if the
files are of a manageable size.  That is most likely the vast majority
of cases.  No question about that.

There's certainly a point, however, where the files become so large
that processing won't finish until your 150th birthday.  ;)  An
exaggeration, undoubtedly.

There are obviously options for this, but you'll seldom find the answer
in a standard install of any userland.  Sometimes you can get away with
xargs, depending on what data you're working with, but that's all that
comes to mind.

The "promotion" from there, in my mind, is going the Perl route (or any
other interpreted language capable of threading) ... and from there, as
necessary ... C (or another compiled language).
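
To make the "split it up and run the pieces concurrently" idea concrete,
here is a minimal sketch in Python rather than Perl.  It is only an
illustration under stated assumptions: the function names
(chunk_ranges, count_in_range, count_matches) and the idea of counting
lines that contain a byte string are made up for this example and do
not come from anyone's actual setup in this thread.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(path, n_chunks):
    """Split a file into contiguous byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    step = max(1, size // n_chunks)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            # Jump roughly one chunk ahead, then finish the current line
            # so that no line is ever split between two workers.
            f.seek(min(start + step, size))
            f.readline()
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def count_in_range(job):
    """Count the lines inside one byte range that contain `needle`."""
    path, start, end, needle = job
    hits = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            if needle in line:
                hits += 1
    return hits


def count_matches(path, needle, workers=4, executor_cls=ThreadPoolExecutor):
    """Count matching lines in `path`, one byte range per worker."""
    jobs = [(path, s, e, needle) for s, e in chunk_ranges(path, workers)]
    if not jobs:
        return 0
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_in_range, jobs))
```

Something like count_matches("/var/log/maillog", b"status=bounced")
would be the call (the path and the search string are placeholders).
In CPython, threads share the interpreter lock, so this mostly overlaps
I/O; for CPU-heavy parsing, passing
executor_cls=concurrent.futures.ProcessPoolExecutor from a script
guarded by `if __name__ == "__main__":` is the usual swap.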

Someone mentioned Elasticsearch, and that's a good option too.  All the
work of indexing the data has already been done for you.  You just have
to not mind paying for it.  ;)

Hell, I've used PostgreSQL with its full-text search for similar things
as well, and I'd argue that if it's already in your stack, you should at
the very least try that first.  You'd be surprised at how darn well it
does.

Goodnight!

~Paul
-- 
__________________

:(){ :|:& };:


