From: Paul Procacci
Date: Mon, 12 Jul 2021 02:20:58 -0400
Subject: Re: Analyzing Log files of very large size
To: serejk@febras.net
Cc: KK CHN, freebsd-questions

On Mon, Jul 12, 2021 at 1:44 AM Korolev Sergey wrote:

> I think that the proper tools usually depend heavily on the desired
> result, so my reasoning is quite general.
>
> People here advise using Perl and also splitting one large file into
> manageable pieces - all that is very good, I vote for that.
>
> But I don't know Perl at all, so I usually get along with standard
> shell utilities: grep, tr, awk, sed, etc. I used to parse big maillogs
> with them successfully.

Most standard shell utilities can certainly get the job done if the files are of a manageable size. That is most likely the vast majority of cases. No question about that.

There's certainly a point, however, when the sizes become so unmanageable that the job will complete on your 150th birthday. ;) An exaggeration, undoubtedly.

There are obviously options for this, but you'll seldom find the answer in a standard install of any userland. Sometimes you can get away with xargs, depending on what data you're working with, but that's all that comes to mind.

The "promotion" from there, in my mind, is going the Perl route (or any other interpreted language capable of threading) ... and from there, as necessary ... C (or another compiled language).

Someone made mention of Elasticsearch and that's a good option too. All the work of indexing the data has already been done for you. You just have to not mind paying for it.
;)

Hell, I've used PostgreSQL with its full-text search for similar things as well, and I'd argue that if it's already in your stack, you should at the very least try that first. You'd be surprised at how darn well it does.

Goodnight!

~Paul
--
__________________

:(){ :|:& };:
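
A minimal sketch of the "promotion" beyond plain shell utilities described in the message: split one large log file into byte ranges that end on line boundaries, then scan each range in its own worker. This is only an illustration under stated assumptions. It uses Python rather than the Perl or C the message suggests, and worker processes rather than threads. The names `chunk_ranges`, `count_matches`, and `parallel_count`, the default worker count, and the idea of counting regex matches are all hypothetical and do not come from the original thread.

```python
import os
import re
from multiprocessing import Pool


def chunk_ranges(path, n_chunks):
    """Split a file into roughly n_chunks byte ranges ending on line boundaries."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    step = max(1, size // n_chunks)
    ranges = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            # Jump ahead about one chunk, then finish the line we landed in
            # so that no line is ever split between two workers.
            f.seek(min(start + step, size))
            f.readline()
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def count_matches(task):
    """Count the lines within one byte range that match a bytes regex."""
    path, start, end, pattern = task
    regex = re.compile(pattern)
    count = 0
    remaining = end - start
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            line = f.readline()
            if not line:  # unexpected end of file, e.g. the file was truncated
                break
            remaining -= len(line)
            if regex.search(line):
                count += 1
    return count


def parallel_count(path, pattern, workers=4):
    """Scan every chunk in a separate process and add up the per-chunk counts."""
    tasks = [(path, s, e, pattern) for s, e in chunk_ranges(path, workers)]
    if not tasks:
        return 0
    with Pool(workers) as pool:
        return sum(pool.map(count_matches, tasks))
```

Assuming a hypothetical log at `/var/log/maillog`, a call such as `parallel_count("/var/log/maillog", rb"status=bounced")` would return the number of matching lines. The pattern is passed as bytes so that lines with odd encodings do not abort the scan. On platforms where `multiprocessing` spawns fresh interpreters, the call should sit under an `if __name__ == "__main__":` guard. Whether this beats a single `grep` depends heavily on whether the job is bound by CPU or by disk throughput.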