From: Michael Schmiedgen
To: Mark Johnston
Cc: freebsd-net@freebsd.org
Subject: Re: page fault while in kernel mode - after upgrade from 12.2 to 13.0
Date: Thu, 6 May 2021 18:00:05 +0200
Message-ID: <90ed0277-9fcc-28c0-a546-c6a80babfa34@gmx.net>
References: <51a3abc5-76b9-df09-acbe-895b62ec87b3@gmx.net>
List-Id: Networking and TCP/IP with FreeBSD

On 05.05.2021 20:38, Mark Johnston wrote:
> On Wed, May 05, 2021 at 06:35:32PM +0200, Michael Schmiedgen wrote:
>> On 04.05.2021 21:02, Mark Johnston wrote:
>>> This looks like fairly random kernel memory corruption. Are you able to
>>> build an INVARIANTS kernel and test that? Assuming you're using 13.0,
>>> you'd grab the 13.0 sources, add "options INVARIANT_SUPPORT" and
>>> "options INVARIANTS" to the GENERIC kernel configuration in
>>> sys/amd64/conf, and do a "make buildkernel installkernel".
>>
>> Below some info with an INVARIANTS kernel. Please let me know if I can provide
>> further information. Thank you!
>
> Thanks, this helped a lot. I believe https://reviews.freebsd.org/D30129
> will fix the problem. That patch is against the main branch but applies
> cleanly to 13.0.

I applied the patch and the server has now been running fine for 8 hours with
the INVARIANTS kernel, including the Samba jail and SIP VM. I just compiled
my custom kernel and it looks like it is working too. Are there plans to get
this MFCed, or even released as an Errata?

BTW, we have 2 other systems, also with userland NAT but a different workload.
After an uncertain amount of time, mostly weeks, natd starts to spin at 100%
CPU on these systems. My quick noobish workaround was restarting natd every
night. I saw your recent commits that added some more safety in that area. Do
you plan to MFC these as well? I can imagine that could help with my NAT
problems.

Anyway, many thanks for your investigation and your fix, much appreciated!

Michael