From: Mehmet Erol Sanliturk
Date: Sun, 11 Jul 2021 23:17:32 +0300
Subject: Re: Analyzing Log files of very large size
To: Vlad Markov
Cc: FreeBSD Questions Mailing List

On Sun, Jul 11, 2021 at 5:38 PM Vlad Markov wrote:

> On Sun, 11 Jul 2021 19:43:41 +0530
> KK CHN wrote:
>
> > Yes, it is.
> >
> > On Sun, Jul 11, 2021 at 6:02 PM Korolev Sergey wrote:
> >
> > > Is it a plain text file?
> > >
> > > On 11 Jul 2021, at 22:13, KK CHN wrote:
> > >
> > > List,
> > >
> > > I am in a requirement to analyze large log files of sonic wall firewall
> > > around 50 GB. for a suspect attack.
> > >
> > > What tools and solutions need to be deployed for handling this much large
> > > files and pls enlighten me with your expertise and reference materials if
> > > any.
> > >
> > > All are tcp / ip communications, DNS UDP transports ..
> > >
> > > Regards,
> > > Kris
>
> I used to use split to break up large log files into manageable pieces.
> From there it depends on how you work. At first we used grep then we moved
> on to using perl regex to analyze logs.
>
> Vlad

My idea is as follows. I am trying to use such a feature in a database
management system to track the behavior of the program. The log generated
over a very short time came out at 56 gigabytes. During a backup of the
sources, the computer warned me: "You are trying to backup 56 GigaBytes into
a 4.7 GigaBytes DVD." Assuming a message line is 56 bytes, a file of this
size contains 1 billion records to study.

Then it is easy to load a file of this size into memory as an AVL tree, by
grouping the records by accessed part and counting their occurrences. In your
case, you may lay out your log as, perhaps, "accessor, accessed parts, ...".
Assume that you need to know who is accessing (or attempting to access) some
list of parts. While building the AVL tree, use "accessed parts" as the keys,
and store the "accessor" values under each key together with any other vital
information.

From an AVL tree it is very easy to get a list of such accessors in order and
study them in more detail. Since only a small amount of information has to be
kept, a computer with a reasonable amount of memory should be sufficient. If
your memory is not sufficient, you may use an SSD as storage, with read/write
speeds of even 500 megabytes per second. Be careful about the wear of such
disks under very heavy write/read workloads (writes in particular).

It is very easy to find open source AVL tree software with sufficiently
permissive licenses. I do not know exactly, but my opinion is that even the
FreeBSD sources contain such parts. Information about AVL trees can be found
in data structures books; books using C or C++ may be especially useful for
you.

AVL tree:
https://en.wikipedia.org/wiki/AVL_tree

Please search for the following phrase in Google:

open source repositories about avl software

Mehmet Erol Sanliturk
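
P.S. A minimal sketch of the grouping idea above, written in Python rather
than C or C++ only to keep it short. It assumes a comma-separated line layout
of "accessor, accessed part, ..." as suggested above; the real SonicWall log
format will differ and needs its own parser. All names (Node, record,
build_tree, SAMPLE_LINES) and the sample lines themselves are hypothetical
and exist only for illustration.

```python
from collections import Counter


class Node:
    """One AVL node: key = accessed part, payload = per-accessor hit counts."""
    __slots__ = ("key", "accessors", "left", "right", "height")

    def __init__(self, key):
        self.key = key
        self.accessors = Counter()
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def rebalance(node):
    """Restore the AVL invariant (subtree heights differ by at most 1)."""
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)      # left-right case
        return rotate_right(node)
    if balance < -1:
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)   # right-left case
        return rotate_left(node)
    return node


def record(root, accessed, accessor):
    """Count one access; return the (possibly new) root of this subtree."""
    if root is None:
        root = Node(accessed)
        root.accessors[accessor] += 1
        return root
    if accessed < root.key:
        root.left = record(root.left, accessed, accessor)
    elif accessed > root.key:
        root.right = record(root.right, accessed, accessor)
    else:
        root.accessors[accessor] += 1   # existing key: tree shape unchanged
        return root
    return rebalance(root)


def in_order(root):
    """Yield (accessed part, Counter of accessors) in sorted key order."""
    if root:
        yield from in_order(root.left)
        yield root.key, root.accessors
        yield from in_order(root.right)


def build_tree(lines):
    """Stream over log lines; only one small node per distinct key is kept."""
    root = None
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 2:
            continue                    # skip lines that do not parse
        root = record(root, fields[1], fields[0])
    return root


# Hypothetical sample data (documentation-range addresses), not real log lines.
SAMPLE_LINES = [
    "192.0.2.10, tcp/22, denied",
    "192.0.2.10, tcp/22, denied",
    "198.51.100.7, udp/53, allowed",
    "192.0.2.44, tcp/22, denied",
    "198.51.100.7, tcp/443, allowed",
]

if __name__ == "__main__":
    for part, accessors in in_order(build_tree(SAMPLE_LINES)):
        print(part, accessors.most_common())
```

For a real file, an open file object can be passed to build_tree directly
(for example build_tree(open(path)) with a path of your own), so the file is
read line by line and never held in memory as a whole. Memory use then grows
with the number of distinct accessed parts and accessors, not with the number
of log lines.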