From: "John Levine"
To: freebsd-questions@freebsd.org
Cc: dpchrist@holgerdanske.com
Date: 11 Jul 2021 16:11:35 -0400
Message-Id: <20210711201136.B3271205F2CA@ary.qy>
Subject: Re: Analyzing Log files of very large size
Organization: Taughannock Networks

It appears that David Christensen said:
>On 7/11/21 5:13 AM, KK CHN wrote:
>> I am in a requirement to analyze large log files of sonic wall firewall
>> around 50 GB. for a suspect attack. ...
>But if this project is for an employer or client, I would recommend
>starting with the commercial-off-the-shelf (COTS) log analysis tool made
>by the hardware vendor. Train up on it. Buy a support contract:
>
>https://www.sonicwall.com/wp-content/uploads/2019/01/sonicwall-analyzer.pdf

This is reasonable advice if you plan to be doing these analyses on a
regular basis, but it's overkill if you only expect to do it once.

I have found that some of the text processing utilities that come with BSD
are a lot faster than others. The regex matching in perl is a lot faster
than python, sometimes by an order of magnitude. My tool of choice is mawk,
an implementation of the funky but very useful awk language that is
amazingly fast. grep is OK, but sed is too slow for anything other than
tiny jobs.

I'd suggest first dividing up the logs into manageable chunks, perhaps
using split or csplit. Alternatively, it would be a good first project in
mawk, using patterns to divide the files into chunks that represent an hour
or a day.

Then you can start looking for interesting patterns, perhaps with grep if
they are simple enough, or more likely with some short mawk scripts.

R's,
John
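
As a rough illustration of the chunking step described above, here is a
minimal sketch of splitting a log into one file per day with awk. It is not
taken from the original message. The function name split_by_day and the
undated.log catch-all file are hypothetical, and the sketch assumes each
log line carries a date in YYYY-MM-DD form somewhere in its text, which may
not match your SonicWall export. It calls plain awk for portability;
substituting mawk should work unchanged, since only POSIX awk features are
used.

```shell
# split_by_day FILE OUTDIR
# Hypothetical helper: append each line of FILE to OUTDIR/YYYY-MM-DD.log,
# keyed on the first YYYY-MM-DD date found in the line (an assumed format).
# Lines with no such date go to OUTDIR/undated.log.
split_by_day() {
    mkdir -p "$2" &&
    awk -v dir="$2" '
        # The digit classes are spelled out rather than written as {4},
        # because some older awk builds lack interval expressions.
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            out = dir "/" substr($0, RSTART, RLENGTH) ".log"
            # Keep only one dated file open at a time.
            if (out != prev) {
                if (prev != "") close(prev)
                prev = out
            }
            # Append, so a day that reappears later is not truncated.
            print >> out
            next
        }
        { print >> (dir "/undated.log") }
    ' "$1"
}
```

Something like `split_by_day firewall.log chunks` would then leave one file
per day under chunks/, each small enough to search with grep or a short awk
script. Because the sketch appends, empty the output directory before
running it again on the same input.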