From nobody Sun Aug 7 19:13:48 2022 X-Original-To: questions@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4M18Br1w59z4YkC1 for ; Sun, 7 Aug 2022 19:13:56 +0000 (UTC) (envelope-from satanist+freebsd@bureaucracy.de) Received: from smtp1.bureaucracy.de (smtp1.bureaucracy.de [80.190.133.201]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp1.bureaucracy.de", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4M18Bp5jhmz454f for ; Sun, 7 Aug 2022 19:13:54 +0000 (UTC) (envelope-from satanist+freebsd@bureaucracy.de) Received: from localhost (p200300e5b71d26001027fa3a9bef1d75.dip0.t-ipconnect.de [2003:e5:b71d:2600:1027:fa3a:9bef:1d75]) by smtp (OpenSMTPD) with ESMTPSA id 9bedbf15 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Sun, 7 Aug 2022 19:13:50 +0000 (UTC) Date: Sun, 7 Aug 2022 21:13:48 +0200 From: Philipp Takacs To: questions@freebsd.org Subject: Re: zfs and git upload-pack Message-ID: <20220807211348.401ee1c3@bureaucracy.de> In-Reply-To: <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com> References: <20220807102839.7c69f387@bureaucracy.de> <20220807195750.0233e2f3@bureaucracy.de> <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) List-Id: User questions List-Archive: https://lists.freebsd.org/archives/freebsd-questions List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-questions@freebsd.org X-BeenThere: freebsd-questions@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4M18Bp5jhmz454f X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=none (mx1.freebsd.org: domain of satanist+freebsd@bureaucracy.de has no SPF policy when checking 80.190.133.201) smtp.mailfrom=satanist+freebsd@bureaucracy.de X-Spamd-Result: default: False [-2.10 / 15.00]; AUTH_NA(1.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:15598, ipnet:80.190.128.0/19, country:DE]; FROM_EQ_ENVFROM(0.00)[]; MLMMJ_DEST(0.00)[questions@freebsd.org]; DMARC_NA(0.00)[bureaucracy.de]; R_SPF_NA(0.00)[no SPF record]; RCVD_TLS_ALL(0.00)[]; ARC_NA(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; MID_RHS_MATCH_FROM(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; FROM_HAS_DN(0.00)[]; TO_DN_NONE(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[questions@freebsd.org]; TAGGED_FROM(0.00)[freebsd] X-ThisMailContainsUnwantedMimeParts: N On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen wrote: > On 8/7/22 10:57, Philipp Takacs wrote: > > On Sun, 7 Aug 2022 09:54:41 -0700 > > David Christensen wrote: > > > >> On 8/7/22 01:28, Philipp wrote: > >>> Hi all > >>> > >>> I host a quite uncommon git repository mostly out of binary > >>> files. I have the problem every time this repo is cloned the host > >>> allocate memory and going to swap. This leads to the host being > >>> unusable and need to force rebooted. > >>> > >>> The repo is stored on a zfs and nullmounted in a jail to run the > >>> git service over ssh. The host is a FreeBSD 13.1 with 4GB RAM and > >>> 4GB swap. > >>> > >>> What I have noticed is that the biggest memory consumtion is from > >>> mmap() a pack file. For the given repo this has the size of 6,7G. > >>> I suspect this file is mapped in memory but not correctly > >>> handled/unmaped (by the kernel) when not enough memory is > >>> available. > >>> > >>> I have tested some options to solve/workaround this issue: > >>> > >>> * limit the zfs ARC size in loader.conf > >>> * zfs set primarycache none for the dataset > >>> * limit datasize, memoryuse and vmemoryuse via login.conf > >>> * limit git packedGitLimit > >>> > >>> None of them have solved the issue. > >>> > >>> On my last test I have noticed that the memory is not freed after > >>> the git commands are stopped. To free the memory I had to unmount > >>> the zfs dataset: > >>> > >>> satanist@confusion:~ $ freecolor -om > >>> total used free shared buffers > >>> cached Mem: 3923 3881 41 0 > >>> 0 0 Swap: 4096 456 3639 > >>> satanist@confusion:~ $ sudo zfs umount > >>> zroot/data/git/bigrepository satanist@confusion:~ $ freecolor -om > >>> total used free shared buffers > >>> cached Mem: 3923 278 3645 0 > >>> 0 0 Swap: 4096 456 3639 > >>> > >>> Any idea how to debug this or what else I could try solve this? > >>> > >>> Philipp > >> > >> > >> Add memory; the more, the better. > > > > Can you borrow me your credit card? > > > > Jokes aside. this repo gets cloned a few times a month. Currently > > the Host dies because one client try to clone this repo. This > > should be work with 4GB memory. > > > Please post console sessions that demonstrate cloning without failure > and cloning with failure. Not sure what you mean. This is a server, a client connect with a git client over ssh and use git-upload-pack to receive the content of the repo. The communication of the git client and git-upload-pack works with stdin/stdout. I can give the logs of my git authorization handler (inside jail): Sat Aug 6 12:01:27 UTC 2022 username: Command is git-upload-pack '/bigrepository' Sat Aug 6 12:01:27 UTC 2022 username: pull from repo /bigrepository Sat Aug 6 12:01:27 UTC 2022 username: Trying to read from /bigrepository (Path /git/bigrepository/bigrepository.git) Sat Aug 6 14:35:23 UTC 2022 username: Pull accepted The last line mean the clone was finished[0]. But at this time everything else on the host was unusable. Here the corresponding content of /var/log/messages: Aug 6 12:25:07 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 12:26:41 confusion sshd[50953]: error: Fssh_kex_exchange_identification: Connection closed by remote host Aug 6 12:28:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 12:40:45 confusion sshd[62976]: fatal: Timeout before authentication for 103.188.176.3 port 45910 Aug 6 13:02:03 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 13:02:03 confusion syslogd: last message repeated 2 times Aug 6 13:08:29 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 13:10:30 confusion syslogd: last message repeated 1 times Aug 6 13:15:39 confusion syslogd: last message repeated 1 times Aug 6 13:23:25 confusion syslogd: last message repeated 2 times Aug 6 13:33:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 13:36:08 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 13:37:59 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:00:51 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:00:51 confusion syslogd: last message repeated 3 times Aug 6 14:02:09 confusion syslogd: last message repeated 1 times Aug 6 14:10:38 confusion syslogd: last message repeated 1 times Aug 6 14:10:38 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrence s) Aug 6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (3 occurrences) Aug 6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrence s) Aug 6 14:15:04 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:16:35 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:17:40 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:20:39 confusion syslogd: last message repeated 1 times Aug 6 14:20:39 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0 ([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrence s) Aug 6 14:22:12 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:24:20 confusion syslogd: last message repeated 1 times Aug 6 14:27:44 confusion syslogd: last message repeated 1 times Aug 6 14:28:46 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (2 occurrences) Aug 6 14:30:15 confusion login[8129]: ROOT LOGIN (root) ON ttyu2 Aug 6 14:33:00 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0 (10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences) Aug 6 14:34:16 confusion kernel: pid 13305 (java), jid 5, uid 10001, was killed: failed to reclaim memory Aug 6 14:34:28 confusion sshd[70018]: warning: can't get client address: Connection reset by peer Aug 6 14:36:48 confusion kernel: pid 28013 (slapd), jid 1, uid 389, was killed: failed to reclaim memory Aug 6 14:37:06 confusion sshd[74213]: error: Fssh_kex_exchange_identification: Connection closed by remote host Aug 6 14:37:13 confusion sshd[80386]: error: beginning MaxStartups throttling Aug 6 14:37:14 confusion kernel: pid 56727 (smtpd), jid 4, uid 257, was killed: failed to reclaim memory Aug 6 14:34:00 confusion sshd[68901]: fatal: Timeout before authentication for 84.147.51.239 port 58360 Aug 6 14:37:59 confusion login[87884]: ROOT LOGIN (root) ON ttyu2 Aug 6 14:38:00 confusion sshd[80386]: error: beginning MaxStartups throttling Aug 6 14:40:27 confusion syslogd: kernel boot file is /boot/kernel/kernel Aug 6 14:40:27 confusion kernel: ---<>--- Between 12:00 and 14:00 the server started to be slow and running ssh/mosh session stopped working. Starting new sessions over ssh was not possible. The root login at 14:38 was me over ipmi try to somehow get the the host working again. But I could login and only run top then this session was also unusable. I have then restarted the server over ipmi. Philipp [0] https://github.com/cbdevnet/fugit/blob/master/fugit#L94