Subject: Re: kernel: page fault in unp_pcb_owned_lock2_slowpath
From: Bryan Drewery
To: Mark Johnston, "Leverett, Bruce"
Cc: "freebsd-net@freebsd.org"
Date: Thu, 10 Dec 2020 15:36:06 -0800
Message-ID: <593fd094-42fe-719c-63e5-d1fbbf422563@shatow.net>
In-Reply-To: <20201009124933.GB29607@raichu>

On 10/9/2020 5:49 AM, Mark Johnston wrote:
> On Thu, Oct 08, 2020 at 09:58:09PM +0000, Leverett, Bruce wrote:
>> In 12.1, we are seeing a page fault in unp_pcb_owned_lock2_slowpath, while trying to lock unp2.
>> Examination of the crash dump shows that unp2's reference count is down to zero, which it shouldn't be, since the function took a reference on it before unlocking unp.
>>
>> Could this be a bug that has been fixed in recent versions? I would look into upgrading, or back-porting the fix, if a fix is known.
>
> I recently fixed a few issues with the unix domain socket locking code.
> The commits were merged to stable/12 in r366488. There's a few earlier
> fixes in uipc_usrreq.c that were merged after 12.1, so you might have
> luck backporting those as well. I'm not sure what the specific bug is
> in your case; a backtrace at least might be enough to pinpoint it.

For Google's sake, I'm replying. It was probably this:

kernel:trap_fatal+0x96
kernel:trap+0x76
kernel:__mtx_lock_sleep+0xe7
kernel:__mtx_lock_flags+0x100
kernel:unp_pcb_owned_lock2_slowpath+0x66
kernel:uipc_send+0x105e
kernel:sosend_generic+0x4ae
kernel:kern_sendit+0x1a7
kernel:sendit+0x260
kernel:sys_sendto+0x4c
kernel:amd64_syscall+0x327

From my limited triage, it appeared that the code was trying to lock unp2 (read earlier from unp->unp_conn) while the connection was being disconnected, and unp->unp_conn set to NULL, elsewhere. It did seem to be a race in the reference-counting/locking code:

>     if ((unp2 = unp->unp_conn) == NULL) {
>         UNP_PCB_UNLOCK(unp);
>         error = ENOTCONN;
>         break;
>     }
> }
> if (__predict_false(unp == unp2)) {
>     if (unp->unp_socket == NULL) {
>         error = ENOTCONN;
>         break;
>     }
>     goto connect_self;
> }
> unp_pcb_owned_lock2(unp, unp2, freed);

unp2 was set here, but unp->unp_conn was NULL by the time the code tried to lock unp2. The old locking was strange and seemed to rely on some invariant: it set unp2 from unp->unp_conn, took a reference on unp2, unlocked unp, and then locked unp2. But I didn't see how unp2 was kept alive between the NULL check above and the reference taken in unp_pcb_owned_lock2_slowpath.

Anyway, I figured it was fixed by those commits too.
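To make the "lock the peer in address order" pattern concrete, here is a minimal userspace model of it. It is a hypothetical sketch, not the uipc_usrreq.c code and not what r366488 does: struct pcb, pcb_lock2(), pcb_rele_locked(), window_hook and disconnect_and_close_b() are made-up names, and pthread mutexes plus a C11 atomic counter stand in for the kernel's mutex and reference-count primitives. It assumes a PCB's owner reference is only dropped after the PCB has been disconnected, and that disconnecting needs both locks. Under that assumption the hold on the peer has to be taken while unp is still locked and unp->unp_conn still points at the peer, and after relocking, the connection has to be rechecked because it may have been torn down in the unlocked window. window_hook is a test-only hook that lets a single thread simulate the racing disconnect deterministically.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct unpcb; not the kernel type. */
struct pcb {
	pthread_mutex_t mtx;
	atomic_uint refcount;  /* one owner reference plus temporary holds */
	struct pcb *conn;      /* peer; changed only with both PCBs locked */
};

/* Hypothetical test hook, run while pcb_lock2() holds no PCB lock. */
static void (*window_hook)(void);

static struct pcb *pcb_new(void)
{
	struct pcb *p = calloc(1, sizeof(*p));
	if (p == NULL)
		abort();
	pthread_mutex_init(&p->mtx, NULL);
	atomic_init(&p->refcount, 1);
	return p;
}

static void pcb_hold(struct pcb *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/* Drop a reference with p->mtx held. On the last reference, unlock and
 * free p and return true; otherwise return false with p still locked. */
static bool pcb_rele_locked(struct pcb *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) != 1)
		return false;
	pthread_mutex_unlock(&p->mtx);
	pthread_mutex_destroy(&p->mtx);
	free(p);
	return true;
}

/*
 * Called with p locked and *p2p == p->conn. Locks the peer as well,
 * lower address first. On return either both are locked, or only p is
 * locked and *p2p is NULL because the peer went away while p was unlocked.
 */
static void pcb_lock2(struct pcb *p, struct pcb **p2p)
{
	struct pcb *p2 = *p2p;

	assert(p != p2);               /* self-connect is handled by the caller */
	if ((uintptr_t)p < (uintptr_t)p2) {
		pthread_mutex_lock(&p2->mtx);   /* already in lock order */
		return;
	}
	/* p is locked and p->conn == p2, so p2 cannot be disconnected (and,
	 * by the model's assumption, not freed) right now. Take the hold
	 * before unlocking p so p2 stays alive through the window. */
	pcb_hold(p2);
	pthread_mutex_unlock(&p->mtx);
	if (window_hook != NULL)
		window_hook();          /* simulate a racing disconnect */
	pthread_mutex_lock(&p2->mtx);
	pthread_mutex_lock(&p->mtx);
	if (pcb_rele_locked(p2)) {
		*p2p = NULL;            /* our hold was the last reference */
		return;
	}
	if (p->conn != p2) {        /* disconnected, but p2 not yet freed */
		pthread_mutex_unlock(&p2->mtx);
		*p2p = NULL;
	}
}

/* Hypothetical racing path: disconnect a <-> b, then drop b's owner ref. */
static struct pcb *hook_a, *hook_b;

static void disconnect_and_close_b(void)
{
	struct pcb *a = hook_a, *b = hook_b;
	struct pcb *lo = (uintptr_t)a < (uintptr_t)b ? a : b;
	struct pcb *hi = (lo == a) ? b : a;

	pthread_mutex_lock(&lo->mtx);
	pthread_mutex_lock(&hi->mtx);
	a->conn = NULL;
	b->conn = NULL;
	pthread_mutex_unlock(&a->mtx);
	if (!pcb_rele_locked(b))
		pthread_mutex_unlock(&b->mtx);
}
```

The caller side roughly follows the uipc_send() excerpt: lock unp, read unp_conn, bail out on NULL, call the lock2 helper, and treat a NULL peer afterwards as ENOTCONN. With window_hook set to disconnect_and_close_b, the higher-addressed PCB's call goes down the slow path, the peer is disconnected and its owner reference dropped inside the window, and the hold is what keeps the later lock of the peer from touching freed memory.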
#3  0xffffffff806e2fbf in uipc_send (so=0xfffffe8cc082a6d0, flags=, m=0xffffffbf000007b4, nam=0x0, control=, td=)
    at /b/mnt/src/sys/kern/uipc_usrreq.c:1095
1095            unp_pcb_owned_lock2(unp, unp2, freed);
(gdb) p nam
$11 = (struct sockaddr *) 0x0
(gdb) set $unp = ((struct unpcb *)((so)->so_pcb))
(gdb) p *$unp
$14 = {
  unp_mtx = {
    lock_object = {
      lo_name = 0xffffffff80ae79ba "unp",
      lo_flags = 21168128,
      lo_data = 0,
      lo_witness = 0xfffff80431cd9b00
    },
    mtx_lock = 0
  },
  unp_conn = 0x0,
  unp_refcount = 1,
  unp_flags = 0,
  unp_gcflag = 4,
  unp_addr = 0x0,
  unp_socket = 0xfffffe8cc082a6d0,
  unp_vnode = 0x0,
  unp_peercred = 0x0,
  unp_reflink = {
    le_next = 0xffffffffffffffff,
    le_prev = 0xffffffffffffffff
  },
  unp_link = {
    le_next = 0xfffff801b9e68c00,
    le_prev = 0xfffff801b9b3e120
  },
  unp_refs = {
    lh_first = 0x0
  },
  unp_gencnt = 935,
  unp_file = 0x0,
  unp_msgcount = 0,
  unp_ino = 0
}
(gdb) set $unp2 = $unp->unp_conn
(gdb) p $unp2
$15 = (struct unpcb *) 0x0
...
#2  0xffffffff806e4017 in unp_pcb_owned_lock2_slowpath (unp=, unp2p=0xfffffe8872867830, freed=0xfffffe887286784c)
    at /b/mnt/src/sys/kern/uipc_usrreq.c:372
372             UNP_PCB_LOCK(unp2);
(gdb) p unp2
$23 = (struct unpcb *) 0xfffff801b9baa900
(gdb) p freed
$24 = (int *) 0xfffffe887286784c
(gdb) p *freed
$25 = 0

-- 
Regards,
Bryan Drewery
bdrewery@freenode/EFNet