Date: Fri, 19 Aug 2022 10:59:08 +0200
From: Philipp <satanist+freebsd@bureaucracy.de>
To: questions@freebsd.org
Subject: Re: zfs and git upload-pack
Message-ID: <97376e6a2eefd7e62e2118d505cd7584.philipp@bureaucracy.de>
In-Reply-To: <34cea530-841a-527c-b080-ab3e1379d86f@holgerdanske.com>
References: <20220807102839.7c69f387@bureaucracy.de>
 <a94ec43d-8ae8-fd20-a5a9-0e1e67750107@holgerdanske.com>
 <20220807195750.0233e2f3@bureaucracy.de>
 <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com>
 <20220807211348.401ee1c3@bureaucracy.de>
 <62ced745-9db4-021f-ae0a-fdb4aba03a13@holgerdanske.com>
 <92ca8cb253ec6c84e44c76d82e98c1e1.philipp@bureaucracy.de>
 <34cea530-841a-527c-b080-ab3e1379d86f@holgerdanske.com>
[2022-08-09 17:40] David Christensen <dpchrist@holgerdanske.com>
> On 8/9/22 06:16, Philipp wrote:
> > [2022-08-07 20:52] David Christensen
> >> On 8/7/22 12:13, Philipp Takacs wrote:
> >>> On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen
> >>>> On 8/7/22 10:57, Philipp Takacs wrote:
> >>>>> On Sun, 7 Aug 2022 09:54:41 -0700 David Christensen
> >>>>>> On 8/7/22 01:28, Philipp wrote:
> >>>>>>> Hi all
> >>>>>>>
> >>>>>>> I host a quite uncommon git repository consisting mostly of
> >>>>>>> binary files. The problem is that every time this repo is
> >>>>>>> cloned, the host allocates memory and starts swapping. This
> >>>>>>> leaves the host unusable and it has to be force-rebooted.
> >>>>>>>
> >>>>>>> The repo is stored on ZFS and nullfs-mounted into a jail to run
> >>>>>>> the git service over ssh. The host is FreeBSD 13.1 with 4GB RAM
> >>>>>>> and 4GB swap.
> >>>>>>>
> >>>>>>> What I have noticed is that the biggest memory consumption
> >>>>>>> comes from mmap() of a pack file. For the given repo this file
> >>>>>>> is 6.7G. I suspect this file is mapped into memory but not
> >>>>>>> correctly handled/unmapped (by the kernel) when not enough
> >>>>>>> memory is available.
> >>>>>>>
> >>>>>>> I have tested some options to solve or work around this issue:
> >>>>>>>
> >>>>>>> * limit the ZFS ARC size in loader.conf
> >>>>>>> * zfs set primarycache=none for the dataset
> >>>>>>> * limit datasize, memoryuse and vmemoryuse via login.conf
> >>>>>>> * limit git packedGitLimit
>
> > I have [restored the options to previous values]. Now the behavior
> > has changed: one clone was successful and later clones stop with an
> > error ("Cannot allocate memory"). This is better, but still not good.
>
> If you reboot the server, does the clone work the first time and then
> always fail?

I don't have that much time at the moment; I might test this next week.

> >> What happens if the clone is attempted by a different user on the
> >> same workstation?
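(For reference, the four workarounds I listed at the top look roughly
like the sketch below. The dataset name zroot/git and all values are
made-up examples, not the settings I actually used:)

```shell
# /boot/loader.conf -- cap the ZFS ARC (example value)
# vfs.zfs.arc_max="1G"

# disable ARC caching of file data for the repo's dataset
# (dataset name is a placeholder)
# zfs set primarycache=none zroot/git

# /etc/login.conf -- per-class resource limits (run cap_mkdb afterwards)
# :datasize=2G:\
# :memoryuse=2G:\
# :vmemoryuse=3G:\

# git side: limit how much pack data git keeps mmap()ed at once
# git config core.packedGitLimit 256m
```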
> >>
> >> What happens if the clone is attempted from another workstation?
>
> > The same as described.
>
> That supports a hypothesis that the problem is the server.
>
> >> Please post client console sessions that demonstrate correct
> >> operation and failed operation.
>
> > successful:
> >
> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> > Cloning into 'bigrepo'...
> > remote: Objekte aufzählen: 9661, fertig.
> > remote: Gesamt 9661 (Delta 0), Wiederverwendet 0 (Delta 0), Pack wiederverwendet 9661
> > Receiving objects: 100% (9661/9661), 6.73 GiB | 5.96 MiB/s, done.
> > Resolving deltas: 100% (3/3), done.
> > Updating files: 100% (6591/6591), done.
>
> Are those status messages in German?  I do not know German...

Ah sorry, I hadn't paid attention. I can translate:

  aufzählen       => counting
  Gesamt          => total
  Wiederverwendet => reused

> > unsuccessful:
> >
> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> > Cloning into 'bigrepo'...
> > remote: Enumerating objects: 9661, done.
> > error: git upload-pack: git-pack-objects died with error.
> > fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
> > remote: fatal: packfile ./objects/pack/pack-6fee671a31a59454b539c88d674373d88ad67780.pack cannot be mapped: Cannot allocate memory
> > remote: aborting due to possible repository corruption on the remote side.
> > fatal: early EOF
> > fatal: index-pack failed
> >
> > As mentioned earlier, the "Cannot allocate memory" error is new. The
> > old behavior was that the server was unusable until I restarted it.
> > I don't know exactly how this looks on the client, but there is not
> > much info in the output.
>
> Now the status messages are in English.  Do you know why the language
> changed?

Because I lied a bit: the first clone was made directly from the host,
the other one through the jail responsible for git.
The two have slightly different language settings, and the first one
doesn't use fugit. Also the repo is read through a nullfs mount that
points to the same data. But the "Cannot allocate memory" error happens
independently of this.

> Have you run 'git fsck'?
>
> https://git-scm.com/docs/git-fsck

Not before now, and it currently behaves like git clone did before I
restored the defaults: free memory goes down to 85M and the server
starts swapping. I killed it before the server became unusable. I have
also tested it on a local copy created with scp: no error.

> Does the server have ECC memory?

No.

> >>> This is a server, a client connects with a
> >>> git client over ssh and uses git-upload-pack
> >>
> >> https://git-scm.com/docs/git-upload-pack
>
> > Yes, that program, but I post here because I suspect this is a
> > FreeBSD issue, not an issue with git. The program basically mmap()s
> > some files, parses them and writes parts of the content (based on
> > stdin) to stdout.
>
> I desire stable OS's, so I typically use an older production release.
> My servers have:
>
> 2022-08-09 16:17:40 dpchrist@f3 ~
> $ freebsd-version ; uname -a
> 12.3-RELEASE-p5
> FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p5 FreeBSD
> 12.3-RELEASE-p5 GENERIC amd64
>
> Perhaps you should grab a blank SSD, do an install of
> FreeBSD-12.3-RELEASE, install and set up the software you need without
> using jails, restore the Git repository onto it, and test if that works.

I don't have the time to reinstall the complete server (ldap,
mailserver, http, shell, ...), and I don't have the money to buy extra
hardware.

> >>> to receive the content of
> >>> the repo. The communication between the git client and
> >>> git-upload-pack works over stdin/stdout. I can give the logs of my
> >>> git authorization handler (inside the jail):
> >>
> >> <snip>
> >>
> >> What file?
>
> > This file is called fugit.log and is created by an authorization
> > handler for git over ssh called fugit.
> > I use fugit to manage
> > authorization for multiple git repositories. See
> > https://github.com/cbdevnet/fugit/blob/master/fugit
>
> Is fugit validated and supported on your OS?

Not exactly; I have tested it and patched the issues I found.

> Is there a support community?

Yes, but it's small. I would guess three people, including me.

> Has anyone else seen this problem?

No, but the problem also occurs if I clone the repo without fugit.

> Has anyone else seen this problem on Git?

I haven't found any reports.

> > So it looks like git-upload-pack and the corresponding IO cause a
> > lot of cache allocation. This leaves no free memory for the rest of
> > the operation of the server.
>
> As the clone works the first time, and then fails every time thereafter
> with memory-related errors, perhaps there is a memory leak?

That is also my guess; the question is where, and how to fix it. I
suspect some fs cache, because, as I mentioned in my first mail, free
memory stays low after a clone until I unmount the zfs dataset.

> > I don't know where to increase the verbosity. ssh just starts a
> > session and goes on. The fugit logs are already posted completely.
> > git-upload-pack does not log.
>
> Perhaps we can explore that later.
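One git-side idea I have not tried yet: repacking the repo so that no
single pack file is 6.7G, which would keep each individual mmap()
smaller. `git repack --max-pack-size` splits the objects into capped
packs. A self-contained demo in a scratch repo (the 1m cap, blob sizes
and identity are made up for the demonstration only):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.email demo@example.com   # placeholder identity
git config user.name demo
# two incompressible ~900K blobs, committed separately
dd if=/dev/urandom of=blob1 bs=1k count=900 2>/dev/null
git add blob1 && git commit -qm one
dd if=/dev/urandom of=blob2 bs=1k count=900 2>/dev/null
git add blob2 && git commit -qm two
# repack everything with a 1 MiB cap per pack; the two big blobs
# cannot share a pack, so git writes more than one pack file
git repack -a -d -q --max-pack-size=1m
ls .git/objects/pack/*.pack | wc -l
```

On the server this would be `git repack -a -d --max-pack-size=<limit>`
in the bare repo; whether git-upload-pack then stays under the memory
limit is exactly what I would want to test.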
> >> # zpool iostat -v 60
>
> > This looks quite normal. Here are parts of the output:
> >
> > Normal operation, without a git clone running:
> >
> >                 capacity     operations     bandwidth
> > pool          alloc   free   read  write   read  write
> > ----------    -----  -----  -----  -----  -----  -----
> > zroot         51.8G   408G      3     14  25.4K   150K
> >   mirror-0    51.8G   408G      3     14  25.4K   150K
> >     ada1p3        -      -      1      7  12.2K  74.8K
> >     ada0p3        -      -      1      7  13.2K  74.8K
> > ----------    -----  -----  -----  -----  -----  -----
> >
> > During a clone:
> >
> >                 capacity     operations     bandwidth
> > pool          alloc   free   read  write   read  write
> > ----------    -----  -----  -----  -----  -----  -----
> > zroot         51.8G   408G     22     24  2.87M   226K
> >   mirror-0    51.8G   408G     22     24  2.87M   226K
> >     ada1p3        -      -     11     12  1.47M   113K
> >     ada0p3        -      -     11     12  1.40M   113K
> > ----------    -----  -----  -----  -----  -----  -----
> >
> > As expected, the reads go up during the clone, but not to a level
> > where I am concerned about the load.
>
> I assume ada0 and ada1 are HDD's (?).  (If so, an SSD cache device
> really helps; but that should be unrelated to this issue.)

These are SSDs.

> Does the "During a clone" output correspond to the first clone after a
> reboot or to a second or subsequent clone?  You want to look at the
> former; the latter may not show much because of caching.

I believe it was during a failing one. I can check the output of the
first clone after a boot next week.

> >> Start the following command in another terminal on the server to
> >> monitor CPU and/or IO activity (press 'm' to switch between the two)
> >> (press 'q' to exit):
> >>
> >> # top -S -s 60
>
> > This also looks as expected. The git process (child of
> > git-upload-pack) uses cpu and memory and also creates IO.
> > I have some output from a few seconds before git was killed
> > (sorted by RES):
> >
> > Mem: 598M Active, 426M Inact, 166M Laundry, 1223M Wired, 1020M Free
> > ARC: 543M Total, 185M MFU, 82M MRU, 16K Anon, 8618K Header, 267M Other
> >      98M Compressed, 240M Uncompressed, 2,45:1 Ratio
> > Swap: 4096M Total, 17M Used, 4079M Free
> > Displaying CPU statistics.
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE    C   TIME   WCPU COMMAND
> > 45719 satanist     1  30    0  1289M   986M pipdwt   2   0:26 17,77% git
> > 53388 10001       44  52    0  2755M   237M uwait    2   2:23  0,15% java
> >
> > After the java process there are only processes with less than 100MB
> > resident.
> >
> > I don't know exactly, but it looks like RES and SIZE include both
> > memory allocations and memory-mapped files. In that case I would
> > argue there is sufficient memory available to drop, because it can be
> > read back from disk.
>
> Check 'man top' for SIZE and RES field definitions just to be sure.  My
> top(1) looks somewhat different.

top(1) says:

> SIZE is the total size of the process (text, data, and stack),
> RES is the current amount of resident memory

So I would interpret this as: the mmap is included.

> The difference between SIZE and RES is 303M for git and 2518M for java;
> 2821M total.  This total is larger than any Mem, ARC (ZFS), or Swap
> statistic.  My guess is that the overall statistics do not include
> memory-mapped files (?).

No, this only means that memory was allocated but is not in physical
memory. For the java process this is probably some libraries which are
currently not used. For git, RES goes up constantly until RES and SIZE
are about the same, and then it tries to mmap the next chunk of the
file.

> See if your top(1) can display the "RSfd" field -- Resident File-Backed
> Memory Size, which should include file mappings.

I haven't found an option to do so.

> "1020M Free" is shown.  Was the server hung when you ran top(1)?

No. Currently there is no hang during a clone.

> Does the server have a console?  Do you have access?
> If you log in to the console before testing and renice that process to
> a negative priority, the console might keep working when everything
> else hangs.  If you try the same trick over SSH, it could interact
> with SSH and Git.

The server has a console over IPMI; I can request a shell for IPMI
access from my provider. I will try to monitor it over this console
next week.

Philipp
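P.S. For that monitoring session I would watch something like the
following from the IPMI console. This is only a sketch with
FreeBSD-specific commands and sysctl names; the PID is the one from the
earlier top output and would differ on the next run:

```shell
# mappings of the clone's git process (PID from the top output above);
# the pack file should show up as a vnode-backed mapping with its path
procstat -v 45719

# free memory (in pages) and ARC size (in bytes), once a second
while sleep 1; do
    sysctl -n vm.stats.vm.v_free_count kstat.zfs.misc.arcstats.size
done
```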