Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
From: Mark Millard
Date: Sun, 23 May 2021 00:44:13 -0700
To: Rick Macklem
Cc: FreeBSD-STABLE Mailing List
Message-Id: <47AE7DDF-F4BA-4632-BDCC-FB1F1AE30810@yahoo.com>

On 2021-May-21, at 17:56, Rick Macklem wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>   is slightly less than 64K bytes (many TSO implementations are
>   limited to 64K or 32 discontiguous segments, think 32 2K
>   mbuf clusters), the driver decides it is ok, but when the MAC
>   header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
>     than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
>     since the directory reply for a 64K request is converted to Sun XDR
>     format and clipped at the last full directory entry that will fit
>     within 64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>     net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
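
The size arithmetic in Rick's TSO description above can be sketched as follows. This is only an illustration under stated assumptions: the 64K limit is the figure Rick mentions, while the 14-byte MAC header length and the function names exceeds_after_mac_header and clip_directory_reply are hypothetical choices made for the example, not anything taken from the genet driver or the NFS code.

```python
# Sketch only: the limit and header size below are illustrative assumptions,
# not values taken from the genet driver or any specific hardware.
TSO_LIMIT = 64 * 1024   # assumed maximum bytes the hardware handles per TSO send
ETHER_HDR_LEN = 14      # Ethernet (MAC) header length without a VLAN tag


def exceeds_after_mac_header(packet_len, limit=TSO_LIMIT, mac_hdr_len=ETHER_HDR_LEN):
    """True when the packet passes a size check made before the MAC header
    is added but is over the limit once the header is included."""
    fits_before = packet_len <= limit
    fits_after = packet_len + mac_hdr_len <= limit
    return fits_before and not fits_after


def clip_directory_reply(entry_sizes, max_bytes=64 * 1024):
    """Keep whole directory entries, in order, until the next one would not
    fit in max_bytes. Returns (entries_kept, bytes_used)."""
    used = 0
    kept = 0
    for size in entry_sizes:
        if used + size > max_bytes:
            break
        used += size
        kept += 1
    return kept, used
```

With 1000 hypothetical entries of 100 bytes each, clip_directory_reply keeps 655 entries (65500 bytes), which lands just under 64K. That is the size range where exceeds_after_mac_header starts returning True once a few more header bytes are counted, which would fit Rick's point that large directory reads hit this case far more often than small regular files do.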
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.

I got the following built and installed on the two machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate Ethernet device that has not
had an associated diff -r, find, or ls -R failure yet, I got a
panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>

Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.

"db> dump" seems to have worked.

After reboot, zpool scrub found no errors.

So, trying again . . .

I got some "Expensive timeout(9) function" notices:

Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s

0xffff000000717b64 looks to be uma_timeout:

ffff000000717b60 b ffff000000717b3c
ffff000000717b64 stp x29, x30, [sp, #-32]!
ffff000000717b68 stp x20, x19, [sp, #16]
. . .

. . .

Hmm. The debug kernel test context seems to take a
very long time. It has not failed so far but is still
going.

So I stopped it and switched to testing with the genet0 device
that was involved for the earlier failures.

. . .

It did not fail.
Nor did the debug kernel report anything beyond:

if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s

on one machine and:

if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800

on the other.

So I may reboot into the also-updated non-debug builds
on both machines and try in that context.

>>> Thanks for isolating this, rick
>>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>>
>> I'll warn that the primary "small arm" development/support
>> folk(s) do not work on the RPi*'s these days, beyond
>> committing what others provide and the like.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
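
One possible way to separate file-name problems from file-content problems in runs like the diff -r tests above is to compare the two trees in two passes: first the sets of relative paths, then the bytes of only those files present on both sides. The following is a minimal sketch under that assumption. The helper names relative_files and compare_trees are hypothetical, and this is not what diff -r does internally.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of non-directory paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(local_root, remote_root):
    """Return (only_local, only_remote, differing) as sorted lists of
    relative paths. Name mismatches and content mismatches are reported
    separately so the two kinds of failure can be told apart."""
    local = relative_files(local_root)
    remote = relative_files(remote_root)
    differing = []
    for rel in sorted(local & remote):
        try:
            # shallow=False forces a byte-by-byte comparison.
            same = filecmp.cmp(os.path.join(local_root, rel),
                               os.path.join(remote_root, rel),
                               shallow=False)
        except OSError:
            # Unreadable entries (e.g. dangling symlinks) count as differing.
            same = False
        if not same:
            differing.append(rel)
    return sorted(local - remote), sorted(remote - local), differing
```

For the setup in this thread the call would be along the lines of compare_trees("/usr/ports", "/mnt"). Entries in the first two lists but none in the third would match the earlier observation that names go wrong while matched files compare clean.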