Date: Sun, 7 Aug 2022 21:13:48 +0200 From: Philipp Takacs <satanist+freebsd@bureaucracy.de> To: questions@freebsd.org Subject: Re: zfs and git upload-pack Message-ID: <20220807211348.401ee1c3@bureaucracy.de> In-Reply-To: <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com> References: <20220807102839.7c69f387@bureaucracy.de> <a94ec43d-8ae8-fd20-a5a9-0e1e67750107@holgerdanske.com> <20220807195750.0233e2f3@bureaucracy.de> <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen <dpchrist@holgerdanske.com> wrote: > On 8/7/22 10:57, Philipp Takacs wrote: > > On Sun, 7 Aug 2022 09:54:41 -0700 > > David Christensen <dpchrist@holgerdanske.com> wrote: > > > >> On 8/7/22 01:28, Philipp wrote: > >>> Hi all > >>> > >>> I host a quite uncommon git repository mostly out of binary > >>> files. I have the problem every time this repo is cloned the host > >>> allocate memory and going to swap. This leads to the host being > >>> unusable and need to force rebooted. > >>> > >>> The repo is stored on a zfs and nullmounted in a jail to run the > >>> git service over ssh. The host is a FreeBSD 13.1 with 4GB RAM and > >>> 4GB swap. > >>> > >>> What I have noticed is that the biggest memory consumtion is from > >>> mmap() a pack file. For the given repo this has the size of 6,7G. > >>> I suspect this file is mapped in memory but not correctly > >>> handled/unmaped (by the kernel) when not enough memory is > >>> available. > >>> > >>> I have tested some options to solve/workaround this issue: > >>> > >>> * limit the zfs ARC size in loader.conf > >>> * zfs set primarycache none for the dataset > >>> * limit datasize, memoryuse and vmemoryuse via login.conf > >>> * limit git packedGitLimit > >>> > >>> None of them have solved the issue. > >>> > >>> On my last test I have noticed that the memory is not freed after > >>> the git commands are stopped. To free the memory I had to unmount > >>> the zfs dataset: > >>> > >>> satanist@confusion:~ $ freecolor -om > >>> total used free shared buffers > >>> cached Mem: 3923 3881 41 0 > >>> 0 0 Swap: 4096 456 3639 > >>> satanist@confusion:~ $ sudo zfs umount > >>> zroot/data/git/bigrepository satanist@confusion:~ $ freecolor -om > >>> total used free shared buffers > >>> cached Mem: 3923 278 3645 0 > >>> 0 0 Swap: 4096 456 3639 > >>> > >>> Any idea how to debug this or what else I could try solve this? > >>> > >>> Philipp > >> > >> > >> Add memory; the more, the better. > > > > Can you borrow me your credit card? > > > > Jokes aside. this repo gets cloned a few times a month. Currently > > the Host dies because one client try to clone this repo. This > > should be work with 4GB memory. > > > Please post console sessions that demonstrate cloning without failure > and cloning with failure. Not sure what you mean. This is a server, a client connect with a git client over ssh and use git-upload-pack to receive the content of the repo. The communication of the git client and git-upload-pack works with stdin/stdout. I can give the logs of my git authorization handler (inside jail): Sat Aug 6 12:01:27 UTC 2022 username: Command is git-upload-pack '/bigrepository' Sat Aug 6 12:01:27 UTC 2022 username: pull from repo /bigrepository Sat Aug 6 12:01:27 UTC 2022 username: Trying to read from /bigrepository (Path /git/bigrepository/bigrepository.git) Sat Aug 6 14:35:23 UTC 2022 username: Pull accepted The last line mean the clone was finished[0]. But at this time everything else on the host was unusable. Here the corresponding content of /var/log/messages: Aug 6 12:25:07 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 12:26:41 confusion sshd[50953]: error: Fssh_kex_exchange_identification: Connection closed by remote host Aug 6 12:28:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 12:40:45 confusion sshd[62976]: fatal: Timeout before authentication for 103.188.176.3 port 45910 Aug 6 13:02:03 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 13:02:03 confusion syslogd: last message repeated 2 times Aug 6 13:08:29 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 13:10:30 confusion syslogd: last message repeated 1 times Aug 6 13:15:39 confusion syslogd: last message repeated 1 times Aug 6 13:23:25 confusion syslogd: last message repeated 2 times Aug 6 13:33:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 13:36:08 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 13:37:59 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:00:51 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:00:51 confusion syslogd: last message repeated 3 times Aug 6 14:02:09 confusion syslogd: last message repeated 1 times Aug 6 14:10:38 confusion syslogd: last message repeated 1 times Aug 6 14:10:38 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrence s) Aug 6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (3 occurrences) Aug 6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrence s) Aug 6 14:15:04 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:16:35 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:17:40 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:20:39 confusion syslogd: last message repeated 1 times Aug 6 14:20:39 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrence s) Aug 6 14:22:12 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:24:20 confusion syslogd: last message repeated 1 times Aug 6 14:27:44 confusion syslogd: last message repeated 1 times Aug 6 14:28:46 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:30:15 confusion login[8129]: ROOT LOGIN (root) ON ttyu2 Aug 6 14:33:00 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:34:16 confusion kernel: pid 13305 (java), jid 5, uid 10001, was killed: failed to reclaim memory Aug 6 14:34:28 confusion sshd[70018]: warning: can't get client address: Connection reset by peer Aug 6 14:36:48 confusion kernel: pid 28013 (slapd), jid 1, uid 389, was killed: failed to reclaim memory Aug 6 14:37:06 confusion sshd[74213]: error: Fssh_kex_exchange_identification: Connection closed by remote host Aug 6 14:37:13 confusion sshd[80386]: error: beginning MaxStartups throttling Aug 6 14:37:14 confusion kernel: pid 56727 (smtpd), jid 4, uid 257, was killed: failed to reclaim memory Aug 6 14:34:00 confusion sshd[68901]: fatal: Timeout before authentication for 84.147.51.239 port 58360 Aug 6 14:37:59 confusion login[87884]: ROOT LOGIN (root) ON ttyu2 Aug 6 14:38:00 confusion sshd[80386]: error: beginning MaxStartups throttling Aug 6 14:40:27 confusion syslogd: kernel boot file is /boot/kernel/kernel Aug 6 14:40:27 confusion kernel: ---<<BOOT>>--- Between 12:00 and 14:00 the server started to be slow and running ssh/mosh session stopped working. Starting new sessions over ssh was not possible. The root login at 14:38 was me over ipmi try to somehow get the the host working again. But I could login and only run top then this session was also unusable. I have then restarted the server over ipmi. Philipp [0] https://github.com/cbdevnet/fugit/blob/master/fugit#L94
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20220807211348.401ee1c3>