Date: Tue, 9 Aug 2022 17:40:00 -0700
From: David Christensen
To: questions@freebsd.org
Subject: Re: zfs and git upload-pack
On 8/9/22 06:16, Philipp wrote:
> [2022-08-07 20:52] David Christensen
>> On 8/7/22 12:13, Philipp Takacs wrote:
>>> On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen
>>>> On 8/7/22 10:57, Philipp Takacs wrote:
>>>>> On Sun, 7 Aug 2022 09:54:41 -0700 David Christensen
>>>>>> On 8/7/22 01:28, Philipp wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> I host a quite uncommon git repository consisting mostly of binary
>>>>>>> files. Every time this repo is cloned, the host allocates
>>>>>>> memory and starts swapping. This leaves the host
>>>>>>> unusable, and it has to be force-rebooted.
>>>>>>>
>>>>>>> The repo is stored on ZFS and null-mounted into a jail to run the
>>>>>>> git service over ssh. The host is FreeBSD 13.1 with 4GB RAM and
>>>>>>> 4GB swap.
>>>>>>>
>>>>>>> What I have noticed is that the biggest memory consumption comes
>>>>>>> from mmap() of a pack file. For the given repo it is 6.7G in size.
>>>>>>> I suspect this file is mapped into memory but not correctly
>>>>>>> handled/unmapped (by the kernel) when not enough memory is
>>>>>>> available.
>>>>>>>
>>>>>>> I have tested some options to solve/work around this issue:
>>>>>>>
>>>>>>> * limit the zfs ARC size in loader.conf
>>>>>>> * zfs set primarycache=none for the dataset
>>>>>>> * limit datasize, memoryuse and vmemoryuse via login.conf
>>>>>>> * limit git packedGitLimit
>
> I have [restored the options to previous values]. Now the behavior has changed. Now one clone was
> successful and later clones stop with an error (Cannot allocate
> memory). This is better but still not good.
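The last workaround in that list (limit git packedGitLimit) can be set per repository, together with its companion core.packedGitWindowSize, which bounds each individual mmap() window into a pack file. A minimal sketch, assuming a bare repository on the server; the function name limit_pack_mmap and the 32m/256m sizes are hypothetical choices for illustration, not values tested on your host. These settings only bound git's own mappings; they do not change how the kernel handles memory pressure.

```shell
#!/bin/sh
# Cap how much of the pack files git maps at once in one repository.
# Usage: limit_pack_mmap REPO_DIR [WINDOW] [LIMIT]
# The default sizes are illustrative guesses, not tuned values.
limit_pack_mmap() {
    repo=$1
    window=${2:-32m}    # bytes of a pack file mapped by one mmap() call
    limit=${3:-256m}    # total bytes mapped from pack files at one time
    git -C "$repo" config core.packedGitWindowSize "$window" &&
    git -C "$repo" config core.packedGitLimit "$limit"
}
```

Because git-upload-pack and the git-pack-objects it spawns read the repository's own config file, setting the values there should apply no matter which user the ssh session runs as.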
If you reboot the server, does the clone work the first time and then always fail?

>> What happens if the clone is attempted by a different user on the same
>> workstation?
>>
>>
>> What happens if the clone is attempted from another workstation?
>
> The same as described.

That supports a hypothesis that the problem is the server.

>> Please post client console sessions that demonstrate correct operation
>> and failed operation.
> successful:
>
> satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> Cloning into 'bigrepo'...
> remote: Objekte aufzählen: 9661, fertig. [Enumerating objects: 9661, done.]
> remote: Gesamt 9661 (Delta 0), Wiederverwendet 0 (Delta 0), Pack wiederverwendet 9661 [Total 9661 (delta 0), reused 0 (delta 0), pack-reused 9661]
> Receiving objects: 100% (9661/9661), 6.73 GiB | 5.96 MiB/s, done.
> Resolving deltas: 100% (3/3), done.
> Updating files: 100% (6591/6591), done.

Are those status messages in German? I do not know German...

> unsuccessful:
>
> satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> Cloning into 'bigrepo'...
> remote: Enumerating objects: 9661, done.
> error: git upload-pack: git-pack-objects died with error.
> fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
> remote: fatal: packfile ./objects/pack/pack-6fee671a31a59454b539c88d674373d88ad67780.pack cannot be mapped: Cannot allocate memory
> remote: aborting due to possible repository corruption on the remote side.
> fatal: early EOF
> fatal: index-pack failed
>
> As mentioned earlier, the "Cannot allocate memory" is new. The old
> behavior was that the server was unusable until I restarted it.
> I currently don't know exactly how this looks on the client, but there
> is not much info in the output.

Now the status messages are in English. Do you know why the language changed?

Have you run 'git fsck'?

https://git-scm.com/docs/git-fsck

Does the server have ECC memory?
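For the 'git fsck' question above, a minimal sketch that checks a bare repository on the server: git fsck walks all objects, and git verify-pack checks each pack (such as the pack-6fee... file named in the error) against its .idx. The function name check_repo is hypothetical, and it assumes a bare layout where packs live under objects/pack, as the "./objects/pack/..." path in the error suggests. Both commands read the same 6.7G pack, so on this host they may run into the same memory pressure; running the check right after a reboot is probably the fairest test.

```shell
#!/bin/sh
# Verify a bare git repository: object connectivity/checksums, then every pack.
# Runs in a subshell so the cd does not leak into the caller.
check_repo() (
    cd "$1" || exit 1
    git fsck --full || exit 1
    for idx in objects/pack/*.idx; do
        [ -e "$idx" ] || continue   # glob matched nothing: no packs
        git verify-pack "$idx" || exit 1
    done
)
```

Hypothetical usage on the server: check_repo /path/to/bigrepo && echo OK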
>>> This is a server; a client connects with a
>>> git client over ssh and uses git-upload-pack
>>
>>
>> https://git-scm.com/docs/git-upload-pack
>
> Yes, this program, but I post here because I suspect this is a FreeBSD
> issue, not an issue with git. This program basically mmap()s some files,
> parses them and writes parts of the content (based on stdin) to stdout.

I desire stable OSes, so I typically use an older production release. My servers have:

2022-08-09 16:17:40 dpchrist@f3 ~
$ freebsd-version ; uname -a
12.3-RELEASE-p5
FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p5 FreeBSD 12.3-RELEASE-p5 GENERIC amd64

Perhaps you should grab a blank SSD, do an install of FreeBSD-12.3-RELEASE, install and set up the software you need without using jails, restore the Git repository onto it, and test if that works.

>>> to receive the content of
>>> the repo. The git client and git-upload-pack communicate
>>> over stdin/stdout. I can give the logs of my git authorization handler
>>> (inside the jail):
>>
>>
>>
>> What file?
>
> This file is called fugit.log and is created by an authorization handler
> for git over ssh called fugit. I use fugit to manage authorization for
> multiple git repositories. See https://github.com/cbdevnet/fugit/blob/master/fugit

Is fugit validated and supported on your OS? Is there a support community? Has anyone else seen this problem with fugit?

Has anyone else seen this problem with Git?

> So it looks like git-upload-pack and the corresponding
> IO cause a lot of cache allocation. This leaves no free memory
> for the rest of the server's operation.

As the clone works the first time, and then fails every time thereafter with memory-related errors, perhaps there is a memory leak?

> I don't know where to increase the verbosity. ssh just starts a session
> and goes on. The logs of fugit are already posted completely. git-upload-pack
> does not log.

Perhaps we can explore that later.
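To test the memory-leak idea, free memory and the ZFS ARC size could be logged once a minute across several clones. A minimal sketch for FreeBSD, assuming the sysctl names vm.stats.vm.v_page_size, vm.stats.vm.v_free_count and kstat.zfs.misc.arcstats.size; the function name mem_snapshot and the log path in the comment are hypothetical. If Free keeps shrinking after each clone and never comes back once git has exited, while the ARC stays small, that would support the leak idea; if memory returns after git exits, it would not.

```shell
#!/bin/sh
# Print a timestamped line with free memory and ZFS ARC size, both in MiB.
mem_snapshot() {
    pagesize=$(sysctl -n vm.stats.vm.v_page_size)
    free_pages=$(sysctl -n vm.stats.vm.v_free_count)
    arc_bytes=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "$(date '+%F %T') free=$((free_pages * pagesize / 1048576))M arc=$((arc_bytes / 1048576))M"
}

# Hypothetical usage: run in its own session, then clone a few times.
#   while :; do mem_snapshot; sleep 60; done >> /var/tmp/mem.log
```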
>> # zpool iostat -v 60
>
> This looks quite normal here; parts of the output:
>
> Normal operation without a git clone running:
>
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> zroot       51.8G   408G      3     14  25.4K   150K
>   mirror-0  51.8G   408G      3     14  25.4K   150K
>     ada1p3      -      -      1      7  12.2K  74.8K
>     ada0p3      -      -      1      7  13.2K  74.8K
> ----------  -----  -----  -----  -----  -----  -----
>
> During a clone:
>
>               capacity     operations     bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> zroot       51.8G   408G     22     24  2.87M   226K
>   mirror-0  51.8G   408G     22     24  2.87M   226K
>     ada1p3      -      -     11     12  1.47M   113K
>     ada0p3      -      -     11     12  1.40M   113K
> ----------  -----  -----  -----  -----  -----  -----
>
> As expected, reads go up during the clone, but not
> to a level where I am concerned about the load.

I assume ada0 and ada1 are HDDs (?). (If so, an SSD cache device really helps; but that should be unrelated to this issue.)

Does the "During a clone" output correspond to the first clone after a reboot or to a second or subsequent clone? You want to look at the former; the latter may not show much because of caching.

>> Start the following command in another terminal on the server to monitor
>> CPU and/or IO activity (press 'm' to switch between the two) (press 'q'
>> to exit):
>>
>> # top -S -s 60
>
> This also looks as expected. The git process (child of git-upload-pack)
> uses CPU and memory and also creates IO. I have some output from a few
> seconds before git was killed (sorted by RES):
>
> Mem: 598M Active, 426M Inact, 166M Laundry, 1223M Wired, 1020M Free
> ARC: 543M Total, 185M MFU, 82M MRU, 16K Anon, 8618K Header, 267M Other
>      98M Compressed, 240M Uncompressed, 2,45:1 Ratio
> Swap: 4096M Total, 17M Used, 4079M Free
> Displaying CPU statistics.
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 45719 satanist      1  30    0  1289M   986M pipdwt   2   0:26  17,77% git
> 53388 10001        44  52    0  2755M   237M uwait    2   2:23   0,15% java
>
> After the java process there are only processes with less than 100MB
> RES.
>
> I don't know exactly, but it looks like RES and SIZE include both memory
> allocations and memory-mapped files. In that case I would argue there is
> sufficient memory available to drop, because it can be re-read from disk.

Check 'man top' for the SIZE and RES field definitions just to be sure. My top(1) looks somewhat different.

The difference between SIZE and RES is 303M for git and 2518M for java; 2821M total. This total is larger than any Mem, ARC (ZFS), or Swap statistic. My guess is that the overall statistics do not include memory-mapped files (?). See if your top(1) can display the "RSfd" field (Resident File-Backed Memory Size), which should include file mappings.

"1020M Free" is shown. Was the server hung when you ran top(1)?

Does the server have a console? Do you have access? If you log in to the console before testing and renice that login shell to a negative priority, the console might keep working when everything else hangs. If you try the same trick over SSH, it could interact with SSH and Git.

David