From owner-freebsd-stable@freebsd.org Sun May 23 07:44:19 2021
Delivered-To: freebsd-stable@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id C4AD96419CF for ; Sun, 23 May 2021 07:44:19 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic312-25.consmr.mail.gq1.yahoo.com (sonic312-25.consmr.mail.gq1.yahoo.com [98.137.69.206]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4Fnsmf29txz3KkT for ; Sun, 23 May 2021 07:44:17 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by sonic312.consmr.mail.gq1.yahoo.com with HTTP; Sun, 23 May 2021 07:44:16 +0000
Received: by kubenode568.mail-prod1.omega.gq1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID 95bd5f056e770ba6020368f58c02398c; Sun, 23 May 2021 07:44:14 +0000 (UTC)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.80.0.2.43\))
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
From: Mark Millard
Date: Sun, 23 May 2021 00:44:13 -0700
Cc: FreeBSD-STABLE Mailing List
Content-Transfer-Encoding: quoted-printable
Message-Id: <47AE7DDF-F4BA-4632-BDCC-FB1F1AE30810@yahoo.com>
References: <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com>
To: Rick Macklem
X-Mailer: Apple Mail (2.3654.80.0.2.43)
X-Rspamd-Queue-Id: 4Fnsmf29txz3KkT
X-Spamd-Bar: ---
X-Spamd-Result: default: False [-3.50 / 15.00]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+];
 RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; RBL_DBL_DONT_QUERY_IPS(0.00)[98.137.69.206:from]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; SPAMHAUS_ZRD(0.00)[98.137.69.206:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.69.206:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.69.206:from]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-stable]
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sun, 23 May 2021 07:44:19 -0000

On 2021-May-21, at 17:56, Rick Macklem wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>   is slightly less than 64K bytes (many TSO implementations are
>   limited to 64K or 32 discontiguous segments, think 32 2K
>   mbuf clusters), the driver decides it is ok, but when the MAC
>   header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
>     than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
>     since the directory reply for a 64K request is converted to Sun XDR
>     format and clipped at the last full directory entry that will fit within 64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>     net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.

I got the following built and installed on the two
machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.
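
As an aside on the TSO failure Rick describes in the quoted text above: the size arithmetic can be sketched in a few lines of Python. This is only an illustration of the boundary case, not driver code. The 65535-byte limit and the 14-byte untagged Ethernet header are assumptions chosen to make the example concrete, and the function name is made up for the example.

```python
# Illustrative arithmetic only. Both constants are assumptions for this
# sketch: Rick's text says "64K", and real hardware limits vary by device.
TSO_MAX_BYTES = 65535   # assumed limit on what the hardware handles in one TSO send
ETHER_HDR_LEN = 14      # Ethernet MAC header without a VLAN tag


def exceeds_after_mac_header(ip_packet_len, max_bytes=TSO_MAX_BYTES,
                             l2_len=ETHER_HDR_LEN):
    """Return True when a packet passes a size check that ignores the MAC
    header but no longer fits once that header is prepended."""
    return ip_packet_len <= max_bytes < ip_packet_len + l2_len


# Only sizes within l2_len bytes of the limit fall into the problem window.
for size in (65000, 65521, 65522, 65535):
    print(size, exceeds_after_mac_header(size))
```

Under these assumptions only replies within 14 bytes of the limit are affected, which would fit Rick's point that directory replies clipped just under 64K are the likely trigger for a ports tree.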

Using the context with the alternate Ethernet device that has not
had an associated diff -r, find, or ls -R failure yet, I
got a panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>

Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.

"db> dump" seems to have worked.

After reboot, zpool scrub found no errors.

So, trying again . . .

I got some "Expensive timeout(9) function" notices:

Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s

0xffff000000717b64 looks to be uma_timeout:

ffff000000717b60 b ffff000000717b3c
ffff000000717b64 stp x29, x30, [sp, #-32]!
ffff000000717b68 stp x20, x19, [sp, #16]
. . .

. . . Hmm. The debug kernel test context seems to take a
very long time. It has not failed so far but is still
going.

So I stopped it and switched to testing with the genet0 device
that was involved for the earlier failures. . . .

It did not fail.
Nor did the debug kernel report anything beyond:

if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s

on one machine and:

if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800

on the other.

So I may reboot into the also-updated non-debug builds on both
machines and try in that context.

>>> Thanks for isolating this, rick
>>> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.
>>
>> I'll warn that the primary "small arm" development/support
>> folk(s) do not work on the RPi*'s these days, beyond
>> committing what others provide and the like.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)

From owner-freebsd-stable@freebsd.org Sun May 23 08:27:56 2021
Delivered-To: freebsd-stable@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 40B556426F3 for ; Sun, 23 May 2021 08:27:56 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic317-22.consmr.mail.gq1.yahoo.com (sonic317-22.consmr.mail.gq1.yahoo.com [98.137.66.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4Fntkz1MZXz4SCj for ; Sun, 23 May 2021 08:27:54 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic.gate.mail.ne1.yahoo.com by sonic317.consmr.mail.gq1.yahoo.com with HTTP; Sun, 23 May 2021 08:27:53 +0000
Received: by kubenode568.mail-prod1.omega.gq1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID 0d7b07fd701f11badb94660b39388a3e; Sun, 23 May 2021 08:27:48 +0000 (UTC)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.80.0.2.43\))
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
From: Mark Millard
In-Reply-To: <47AE7DDF-F4BA-4632-BDCC-FB1F1AE30810@yahoo.com>
Date: Sun, 23 May 2021 01:27:47 -0700
Cc: FreeBSD-STABLE Mailing List
Content-Transfer-Encoding: quoted-printable
Message-Id: <6F0F0719-F029-4DE9-AEB8-5A9FF8303C6F@yahoo.com>
References:
<623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com> <47AE7DDF-F4BA-4632-BDCC-FB1F1AE30810@yahoo.com> To: Rick Macklem X-Mailer: Apple Mail (2.3654.80.0.2.43) X-Rspamd-Queue-Id: 4Fntkz1MZXz4SCj X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.50 / 15.00]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; RCVD_TLS_LAST(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; ARC_NA(0.00)[]; RBL_DBL_DONT_QUERY_IPS(0.00)[98.137.66.148:from]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; SPAMHAUS_ZRD(0.00)[98.137.66.148:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.66.148:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.66.148:from]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-stable] X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 May 2021 08:27:56 -0000 On 2021-May-23, at 00:44, Mark Millard wrote: > On 2021-May-21, at 17:56, Rick Macklem = wrote: >=20 >> Mark Millard wrote: >> [stuff snipped] >>> Well, why is it that ls -R, find, and diff -r all get file >>> name problems via genet0 but diff -r gets no problems >>> comparing the content of files that it does match up (the >>> vast majority)? 
Any clue how could the problems possibly >>> be unique to the handling of file names/paths? Does it >>> suggest anything else to look into for getting some more >>> potentially useful evidence? >> Well, all I can do is describe the most common TSO related >> failure: >> - When a read RPC reply (including NFS/RPC/TCP/IP headers) >> is slightly less than 64K bytes (many TSO implementations are >> limited to 64K or 32 discontiguous segments, think 32 2K >> mbuf clusters), the driver decides it is ok, but when the MAC >> header is added it exceeds what the hardware can handle correctly... >> --> This will happen when reading a regular file that is slightly = less >> than a multiple of 64K in size. >> or >> --> This will happen when reading just about any large directory, >> since the directory reply for a 64K request is converted to Sun = XDR >> format and clipped at the last full directory entry that will fit = within 64K. >> For ports, where most files are small, I think you can tell which is = more >> likely to happen. >> --> If TSO is disabled, I have no idea how this might matter, but?? >>=20 >>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d >>> do not report changes in Ierrs or Idrop in a before vs. >>> after failures comparison. (There may be better figures >>> to look at for all I know.) >>>=20 >>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6" >>> and got no obvious change in behavior. >> All we know is that the data is getting corrupted somehow. >>=20 >> NFS traffic looks very different than typical TCP traffic. It is >> mostly small messages travelling in both directions concurrently, >> with some large messages thrown in the mix. >> All I'm saying is that, testing a net interface with something like >> bulk data transfer in one direction doesn't verify it works for NFS >> traffic. 
>>=20 >> Also, the large RPC messages are a chain of about 33 mbufs of >> various lengths, including a mix of partial clusters and regular >> data mbufs, whereas a bulk send on a socket will typically >> result in an mbuf chain of a lot of full 2K clusters. >> --> As such, NFS can be good at tickling subtle bugs it the >> net driver related to mbuf handling. >>=20 >> rick >>=20 >>>> W.r.t. reverting r367492...the patch to replace r367492 was just >>>> committed to "main" by rscheff@ with a two week MFC, so it >>>> should be in stable/13 soon. Not sure if an errata can be done >>>> for it for releng13.0? >>>=20 >>> That update is reported to be causing "rack" related panics: >>>=20 >>> = https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.h= tml >>>=20 >>> reports (via links): >>>=20 >>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ = /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_= stacks/rack.c:10632 >>>=20 >>> Still, I have a non-debug update to main building and will >>> likely do a debug build as well. llvm is rebuilding, so >>> the builds will take a notable time. >=20 > I got the following built and installed on the two > machines: >=20 > # uname -apKU > FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 = main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 = root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.= aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013 >=20 > # uname -apKU > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 = main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 = root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.= aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013 >=20 > Note that both are booted with debug builds of main. 
>=20 > Using the context with the alternate EtherNet device that has not > had an associated diff -r, find, pr ls -R failure yet > yet got a panic that looks likely to be unrelated: >=20 > # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/ > # diff -r /usr/ports/ /mnt/ | more > nvme0: cpl does not map to outstanding cmd > cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0 > panic: received completion for unknown cmd > cpuid =3D 3 > time =3D 1621743752 > KDB: stack backtrace: > db_trace_self() at db_trace_self > db_trace_self_wrapper() at db_trace_self_wrapper+0x30 > vpanic() at vpanic+0x188 > panic() at panic+0x44 > nvme_qpair_process_completions() at = nvme_qpair_process_completions+0x1fc > nvme_timeout() at nvme_timeout+0x3c > softclock_call_cc() at softclock_call_cc+0x124 > softclock() at softclock+0x60 > ithread_loop() at ithread_loop+0x2a8 > fork_exit() at fork_exit+0x74 > fork_trampoline() at fork_trampoline+0x14 > KDB: enter: panic > [ thread pid 12 tid 100028 ] > Stopped at kdb_enter+0x48: undefined f904411f > db>=20 >=20 > Based on the "nvme" references, I expect this is tied to > handling the Optane 480 GiByte that is in the PCIe slot > and is the boot/only media for the machine doing the diff. >=20 > "db> dump" seems to have worked. >=20 > After reboot, zpool scrub found no errors. >=20 > So, trying again . . . >=20 > I got some "Expensive timeout(9) function" notices: >=20 > Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s > Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s >=20 > 0xffff000000717b64 looks to be uma_timeout: >=20 > ffff000000717b60 b ffff000000717b3c = > ffff000000717b64 stp x29, x30, [sp, #-32]! > ffff000000717b68 stp x20, x19, [sp, #16] > . . . >=20 > . . . Hmm. The debug kernel test context seems to take a > very long time. It has not failed so far but is still > going. >=20 > So I stopped it and switch to testing with the genet0 device > that was involved for the earlier failures. . 
. .
>
> It did not fail. Nor did the debug kernel report anything
> beyond:
>
> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
> Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s
>
> on one machine and:
>
> if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800
>
> on the other.
>
> So I may reboot into the also-updated non-debug builds on both
> machines and try in that context.
>

The non-debug build pair of machines got the problem:

# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/devel/electron12/files: 
Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
Only in /mnt/devel/electron12/files: =D6=8F=DC=A62^H
Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc

I'll note that it turns out that the debug build had more
enabled than is typical: DIAGNOSTIC, BUF_TRACKING, and
FULL_BUF_TRACKING were also enabled. I'd forgotten that
I'd previously had a reason to add those to what my debug
builds included (for a prior problem investigation). I'd
not done debug builds in some time.
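
Since the failures so far show up only in file names, a names-only comparison of the two trees may be a quicker way to look for them than a full diff -r. A minimal sketch in Python follows, assuming the local tree and the NFS mount are both readable. The function names are hypothetical, and this is not something that was run to produce the results above.

```python
import os


def relative_names(root):
    """Return the set of paths, relative to root, of every entry under root."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            names.add(os.path.normpath(os.path.join(rel, name)))
    return names


def name_mismatches(local_root, nfs_root):
    """Return (only_local, only_nfs): sorted lists of relative paths that
    appear in one tree but not in the other. File contents are never read."""
    local = relative_names(local_root)
    remote = relative_names(nfs_root)
    return sorted(local - remote), sorted(remote - local)


def report(local_root, nfs_root):
    """Print mismatches; ascii() keeps corrupted name bytes printable."""
    only_local, only_nfs = name_mismatches(local_root, nfs_root)
    for path in only_local:
        print("Only in", local_root, ascii(path))
    for path in only_nfs:
        print("Only in", nfs_root, ascii(path))
```

Calling report("/usr/ports", "/mnt") would correspond to the diff -r invocation above, limited to the "Only in" lines.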

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)

From owner-freebsd-stable@freebsd.org Sun May 23 19:20:57 2021
Delivered-To: freebsd-stable@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 5390E64DACA for ; Sun, 23 May 2021 19:20:57 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic301-22.consmr.mail.gq1.yahoo.com (sonic301-22.consmr.mail.gq1.yahoo.com [98.137.64.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4Fp9DS0wdFz4TXk for ; Sun, 23 May 2021 19:20:55 +0000 (UTC) (envelope-from marklmi@yahoo.com)
KOLbc0nLBTK93do57V.b7I3vbq28HiizgD3C8K_GuVuKabHKfrtCX9tAadOIkyt_jYNMr4mwWLD6 8XwuxI6rFRHMzIMJSUhubBW.G7HUuDyd26AdnEZmiU.bAsoSXags1SBX7YGoUS3IFhg8pZfbAAt7 y1nVcqSxQZ7WCK6IbNcmxiMuPz.SlCyRYDD7hH7bkC19maRvLd0WyXqEoYG09zTK0Ohh2Jr8J3zg 1.GMjMy9woqVuF3Fh_xWfiia5kXwKxGGqBmC7MMrxprhX6Zko_O6xj7eapHsm3J1vMwwTXZrj0hZ dQFZUqpICIbBDMSrS6XIOCc_Od_uiAFySDooYsRC7vSni2U2GWBlxUajv5DgwOwW6vS3183lXhkZ 8Fp.G2PJyZNlqveTe5uHqsQtMhfhSZSJGu1P_.RyQSXKodRTNl5mDDdTk6HDxHU7Mj9RlN.yEI_m WVCctAQOTFvVzxqByoNPcOmsK90kdMf9pHdfukNvD.1zXXNOOVlMpIyYxH7JayNTjT.RS9fm3DDf FQJ.oa.KwM7Tdo.NHyL7aCMrUHFy3WXLzy9rxWvhG_Hrsx4lLx63QJY3cG1QUAOtRyCFmwwWcBMT z9pnS.VZWLyZXaT58a5fztZANe78G6U5gtSFrW8RvAqxn6R17qn.os0_soxoQrt_uywAP_mlyusQ CvGSjjLsmb6c1Ehu2ZP8ALmVk5gQ.SwfD1tv0e8eEyglhbPl.DHiaea6JmIB2U8jdwAXyruff4Rj vOWErKmvJXei_rWj49B7U5rEJQM.T_DRmGIEK_9vYPdc4sGtIJ1HZMAtVogaem1bEPTapb8evOQQ nu6dhipCG8tWEe8R53JpBc_cznvZKNNsggldq7m51IemFrEM2gGruv05fgf94Ogdn_lAx5sbUIz6 .RA2rBc2MxLFvvASW4WN3vhbRWw_hkXjy..UklA4o.BFpcAzceQoG0gUyc9EA6yJcG9eLBX1LIAZ 0WowaxAohPltqKErSLGHDwTNvg9OjP.5pJHz8ChzTF8Gk.1ZMmyOUv2Ui8JZWjjvetwsQjaET1qt OGgSMtI5byD.tB9i5vXCF37b8xHhHqKEYI9iTLTi_IXmgTbYr0.ulJ3BGw9lSva.AN64.JOzMgue 98FrXM05F1t0y37rxVmC6DC0K_avYE70e7N6y9VFJvmWHl6fijFjzKPHNsytqlkSEbIqHmrZxvW9 .RQRLXbSR9UNk1P721r0Qz3zeVVV0k39UKnx6s2wJJueM7WmU84_bLWkvkseuTVegA_DRCHKR3EU 4jEe8FSOnPmvBrHaqAR09hF3SMntjKSmof4mcyCN57o.Z3vsR6mPWiBKqbMWv268SRKI7dlthYKS XEfrzK93_oxc6E2sasZUIm_J66DovvIjXwno0MjdvG1f6.SdcJn6bsxIRa1FPNxvXd5D2eqUucGd Zc8yqXCLsMP0UKNWpT8S4ut0ukAhSGPN9sfEwVhfxM9RulKwN7BvvEytiS2Msh2XxVwaBzuNR4Tu rSanhQ8Dz6FgFdvBjztqjPfN27Sb5IHiMGu2tJAMrqk6Wfy.z_t0GgneDqQgK1IDg8EqY.828b_c VxP1LVnc609eJ6cZ8.FEY9kggMR9cMB6s8CnoBXLLXJr3o90BKLSJ4tpELgfsv2ISP1TEXG4qziB Rrt9YPp8pcxmw9g5FEwir3nHuVw4ZMBl6ajTmbBv6pfhdFjFAYv3zEHFVdEly3l.vDv1Zfo8DXyG jribqaNuOja93OHCrg.84CntKMEdNTRv16JFYI2nuUf6U3W4UAdvrqF2JpnAG1pBzfjHy05a6hBI T_znHlwqqg4kj9iMg_X2vihAF2Hv8j4jcVYT_Poq8ol.YTAvAeWQdwAWEFpWcGVEdZdvCHmEUrTI 
ZJjHlKHp60FAL97DpWMM2Y0KYHn8aXzvV4BZYSjKjOOHVbJVZTycoghpzLTzm9oGMfd9EPzmFFAL xnWw0kpKwr9Th9X2WmC6pyPbd0K8QbOMj0nGSBXbBU5P1dmed8gjcHE.pAV60edk6HQJKvLWPyY_ J6AKZSRWC1fwBkTXjlgCWPxwlNO79wAlva2t3nSkXdpbjQdSbf7NdLCRVaGzvZLL_tmYe5dFXdfb PXxMILBmAe_w1UdEylKA- X-Sonic-MF: Received: from sonic.gate.mail.ne1.yahoo.com by sonic301.consmr.mail.gq1.yahoo.com with HTTP; Sun, 23 May 2021 19:20:54 +0000 Received: by kubenode581.mail-prod1.omega.gq1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID 03eae780b799445c94cedaffd5d804ed; Sun, 23 May 2021 19:20:53 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.80.0.2.43\)) Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context) From: Mark Millard In-Reply-To: <6F0F0719-F029-4DE9-AEB8-5A9FF8303C6F@yahoo.com> Date: Sun, 23 May 2021 12:20:52 -0700 Cc: FreeBSD-STABLE Mailing List Content-Transfer-Encoding: quoted-printable Message-Id: <88E486FD-3AA6-49D9-828B-D2F5267835D1@yahoo.com> References: <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com> <47AE7DDF-F4BA-4632-BDCC-FB1F1AE30810@yahoo.com> <6F0F0719-F029-4DE9-AEB8-5A9FF8303C6F@yahoo.com> To: Rick Macklem X-Mailer: Apple Mail (2.3654.80.0.2.43) X-Rspamd-Queue-Id: 4Fp9DS0wdFz4TXk X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.50 / 15.00]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; RBL_DBL_DONT_QUERY_IPS(0.00)[98.137.64.148:from]; 
DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; SPAMHAUS_ZRD(0.00)[98.137.64.148:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.64.148:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.64.148:from]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-stable] X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 May 2021 19:20:57 -0000 On 2021-May-23, at 01:27, Mark Millard wrote: > On 2021-May-23, at 00:44, Mark Millard wrote: >=20 >> On 2021-May-21, at 17:56, Rick Macklem = wrote: >>=20 >>> Mark Millard wrote: >>> [stuff snipped] >>>> Well, why is it that ls -R, find, and diff -r all get file >>>> name problems via genet0 but diff -r gets no problems >>>> comparing the content of files that it does match up (the >>>> vast majority)? Any clue how could the problems possibly >>>> be unique to the handling of file names/paths? Does it >>>> suggest anything else to look into for getting some more >>>> potentially useful evidence? >>> Well, all I can do is describe the most common TSO related >>> failure: >>> - When a read RPC reply (including NFS/RPC/TCP/IP headers) >>> is slightly less than 64K bytes (many TSO implementations are >>> limited to 64K or 32 discontiguous segments, think 32 2K >>> mbuf clusters), the driver decides it is ok, but when the MAC >>> header is added it exceeds what the hardware can handle correctly... >>> --> This will happen when reading a regular file that is slightly = less >>> than a multiple of 64K in size. 
>>> or
>>> --> This will happen when reading just about any large directory,
>>> since the directory reply for a 64K request is converted to Sun XDR
>>> format and clipped at the last full directory entry that will fit within 64K.
>>> For ports, where most files are small, I think you can tell which is more
>>> likely to happen.
>>> --> If TSO is disabled, I have no idea how this might matter, but??
>>>
>>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>>>> do not report changes in Ierrs or Idrop in a before vs.
>>>> after failures comparison. (There may be better figures
>>>> to look at for all I know.)
>>>>
>>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>>>> and got no obvious change in behavior.
>>> All we know is that the data is getting corrupted somehow.
>>>
>>> NFS traffic looks very different than typical TCP traffic. It is
>>> mostly small messages travelling in both directions concurrently,
>>> with some large messages thrown in the mix.
>>> All I'm saying is that, testing a net interface with something like
>>> bulk data transfer in one direction doesn't verify it works for NFS
>>> traffic.
>>>
>>> Also, the large RPC messages are a chain of about 33 mbufs of
>>> various lengths, including a mix of partial clusters and regular
>>> data mbufs, whereas a bulk send on a socket will typically
>>> result in an mbuf chain of a lot of full 2K clusters.
>>> --> As such, NFS can be good at tickling subtle bugs in the
>>> net driver related to mbuf handling.
>>>
>>> rick
>>>
>>>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>>>> committed to "main" by rscheff@ with a two week MFC, so it
>>>>> should be in stable/13 soon. Not sure if an errata can be done
>>>>> for it for releng13.0?
>>>>
>>>> That update is reported to be causing "rack" related panics:
>>>>
>>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>>>
>>>> reports (via links):
>>>>
>>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>>>
>>>> Still, I have a non-debug update to main building and will
>>>> likely do a debug build as well. llvm is rebuilding, so
>>>> the builds will take a notable time.
>>
>> I got the following built and installed on the two
>> machines:
>>
>> # uname -apKU
>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
>>
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
>>
>> Note that both are booted with debug builds of main.
>>
>> Using the context with the alternate EtherNet device that has not
>> had an associated diff -r, find, or ls -R failure yet,
>> I got a panic that looks likely to be unrelated:
>>
>> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
>> # diff -r /usr/ports/ /mnt/ | more
>> nvme0: cpl does not map to outstanding cmd
>> cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
>> panic: received completion for unknown cmd
>> cpuid = 3
>> time = 1621743752
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x188
>> panic() at panic+0x44
>> nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
>> nvme_timeout() at nvme_timeout+0x3c
>> softclock_call_cc() at softclock_call_cc+0x124
>> softclock() at softclock+0x60
>> ithread_loop() at ithread_loop+0x2a8
>> fork_exit() at fork_exit+0x74
>> fork_trampoline() at fork_trampoline+0x14
>> KDB: enter: panic
>> [ thread pid 12 tid 100028 ]
>> Stopped at kdb_enter+0x48: undefined f904411f
>> db>
>>
>> Based on the "nvme" references, I expect this is tied to
>> handling the Optane 480 GiByte that is in the PCIe slot
>> and is the boot/only media for the machine doing the diff.
>>
>> "db> dump" seems to have worked.
>>
>> After reboot, zpool scrub found no errors.
>>
>> So, trying again . . .
>>
>> I got some "Expensive timeout(9) function" notices:
>>
>> Expensive timeout(9) function: 0xffff000000717b64(0) 1.210285924 s
>> Expensive timeout(9) function: 0xffff000000717b64(0) 4.001010935 s
>>
>> 0xffff000000717b64 looks to be uma_timeout:
>>
>> ffff000000717b60 b ffff000000717b3c
>> ffff000000717b64 stp x29, x30, [sp, #-32]!
>> ffff000000717b68 stp x20, x19, [sp, #16]
>> . . .
>>
>> . . . Hmm. The debug kernel test context seems to take a
>> very long time. It has not failed so far but is still
>> going.
>>
>> So I stopped it and switched to testing with the genet0 device
>> that was involved for the earlier failures. . . .
>>
>> It did not fail. Nor did the debug kernel report anything
>> beyond:
>>
>> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
>> if_delmulti_locked: detaching ifnet instance 0xffffa00000fc8000
>> Expensive timeout(9) function: 0xffff00000050c088(0) 6.318652023 s
>>
>> on one machine and:
>>
>> if_delmulti_locked: detaching ifnet instance 0xffffa0000b56b800
>>
>> on the other.
>>
>> So I may reboot into the also-updated non-debug builds on both
>> machines and try in that context.
>>
>
> The non-debug build pair of machines got the problem:
>
> # diff -r /usr/ports/ /mnt/ | more
> Only in /mnt/devel/electron12/files:
> Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
> Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
> Only in /mnt/devel/electron12/files: =D6=8F=DC=A62^H
> Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
> Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc
>
> I'll note that it turns out that the debug build had more
> than is typical enabled: DIAGNOSTIC, BUF_TRACKING, and
> FULL_BUF_TRACKING were also enabled. I'd forgotten that
> I'd previously had a reason to add those to what my debug
> builds included (for a prior problem investigation). I'd
> not done debug builds in some time.

Without DIAGNOSTIC, BUF_TRACKING, and FULL_BUF_TRACKING
(so based on a more normal debug build on both sides),
the diff -r progressed at a more normal, sustained rate.
Yet . . .

# mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
#

In other words: no failure from the debug build.
Also no reports of anything by the debug kernel.
Multiple attempts (including some with reboots between):
same results.

So, it appears that only non-debug builds are broken,
for whatever reason.

For reference:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #2 main-n246854-03b0505b8fe8-dirty: Sun May 23 05:57:01 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From nobody Thu May 27 08:18:42 2021
X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id DD732C7F92B for ; Thu, 27 May 2021 08:18:52 +0000 (UTC) (envelope-from ingeborg.hellemo@uit.no)
Received: from smtp-relay.uit.no (smtp02.uit.no [129.242.4.116]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "smtp-relay.uit.no", Issuer "GEANT OV RSA CA 4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FrLLg4hqDz4bvr for ; Thu, 27 May 2021 08:18:51 +0000 (UTC) (envelope-from ingeborg.hellemo@uit.no)
Received: from nordnytt.cc.uit.no ([129.242.6.226]) by smtp-relay.uit.no over TLS secured channel with Microsoft SMTPSVC(10.0.14393.4169); Thu, 27 May 2021 10:18:42 +0200
Received: from nordnytt.cc.uit.no (localhost [127.0.0.1]) by nordnytt.cc.uit.no (8.15.2/8.15.2) with ESMTP id 14R8Ig20087984 for ; Thu, 27 May 2021 10:18:42 +0200 (CEST) (envelope-from ingeborg.hellemo@uit.no)
Message-Id: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no>
X-Mailer: exmh version 2.9.0 11/07/2018 with nmh-1.6
To: freebsd-stable@freebsd.org
Subject: Service started via service(8) fails
From: Ingeborg Hellemo
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Date: Thu, 27 May 2021 10:18:42 +0200 X-OriginalArrivalTime: 27 May 2021 08:18:42.0744 (UTC) FILETIME=[E7DD6380:01D752D0] X-Rspamd-Queue-Id: 4FrLLg4hqDz4bvr X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=pass (policy=none) header.from=uit.no; spf=pass (mx1.freebsd.org: domain of ingeborg.hellemo@uit.no designates 129.242.4.116 as permitted sender) smtp.mailfrom=ingeborg.hellemo@uit.no X-Spamd-Result: default: False [-3.30 / 15.00]; ARC_NA(0.00)[]; RBL_DBL_DONT_QUERY_IPS(0.00)[129.242.4.116:from]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; MV_CASE(0.50)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[freebsd-stable@freebsd.org]; RCPT_COUNT_ONE(0.00)[1]; SPAMHAUS_ZRD(0.00)[129.242.4.116:from:127.0.2.255]; RCVD_COUNT_THREE(0.00)[3]; R_SPF_ALLOW(-0.20)[+ip4:129.242.4.0/24]; NEURAL_HAM_LONG(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[uit.no,none]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:224, ipnet:129.242.0.0/16, country:NO]; RCVD_TLS_LAST(0.00)[]; MAILMAN_DEST(0.00)[freebsd-stable] X-ThisMailContainsUnwantedMimeParts: N FreeBSD 11.4-RELEASE-p3 What, if any, difference is there between using 'service restart' and /usr/local/etc/rc.d/ restart'? I have this mindboggling situation where using 'service tac_plus restart' leads to a service that is running but not working properly, whereas '/usr/local/etc/rc.d/tac_plus restart' gives me a working service. --Ingeborg -- Ingeborg Østrem Hellemo -- ingeborg.hellemo@uit.no Dep. of Information Technology --- Univ. 
of Tromsø From nobody Thu May 27 09:22:56 2021 X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 246C5CFD76C for ; Thu, 27 May 2021 09:23:05 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [IPv6:2a00:14b0:4200:32e0::1ea]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "gilb.zs64.net", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FrMmm72rgz4l6f for ; Thu, 27 May 2021 09:23:04 +0000 (UTC) (envelope-from stb@lassitu.de) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id 933DD3916C4; Thu, 27 May 2021 09:22:56 +0000 (UTC) Content-Type: text/plain; charset=us-ascii List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.6\)) Subject: Re: Service started via service(8) fails From: Stefan Bethke In-Reply-To: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> Date: Thu, 27 May 2021 11:22:56 +0200 Cc: freebsd-stable@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <6D2F0952-05E5-4B1F-B29C-77AA3CF0B785@lassitu.de> References: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> To: Ingeborg Hellemo X-Mailer: Apple Mail (2.3608.120.23.2.6) X-Rspamd-Queue-Id: 4FrMmm72rgz4l6f X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N Am 27.05.2021 um 10:18 schrieb Ingeborg Hellemo = : >=20 > FreeBSD 11.4-RELEASE-p3 >=20 > What, if any, difference is 
there between using 'service restart'
> and /usr/local/etc/rc.d/ restart'?
>
> I have this mindboggling situation where using 'service tac_plus restart'
> leads to a service that is running but not working properly, whereas
> '/usr/local/etc/rc.d/tac_plus restart' gives me a working service.

Mostly the environment, plus file descriptors. Also, in some weird cases,
the shell you're running the script from will make a difference.

For example, PATH will likely be different between the environment that
service(8) creates vs. the one you're running the script from directly.


Stefan

--
Stefan Bethke Fon +49 151 14070811

From eugen@grosbein.net Thu May 27 09:56:12 2021
X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 413A4CFFC2E for ; Thu, 27 May 2021 09:56:29 +0000 (UTC) (envelope-from eugen@grosbein.net)
Received: from hz.grosbein.net (hz.grosbein.net [IPv6:2a01:4f8:c2c:26d8::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FrNWK07DWz4nZx for ; Thu, 27 May 2021 09:56:28 +0000 (UTC) (envelope-from eugen@grosbein.net)
Received: from eg.sd.rdtc.ru (root@[62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id 14R9uInG009055 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 27 May 2021 09:56:19 GMT (envelope-from eugen@grosbein.net)
X-Envelope-From: eugen@grosbein.net
X-Envelope-To: ingeborg.hellemo@uit.no
Received: from [10.58.0.10] (dadv@dadvw [10.58.0.10]) by eg.sd.rdtc.ru (8.16.1/8.16.1) with ESMTPS id 14R9uE3f009238 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Thu, 27 May 2021 16:56:14 +0700 (+07) (envelope-from eugen@grosbein.net)
Subject: Re: Service started via service(8) fails
To: Ingeborg Hellemo ,
freebsd-stable@freebsd.org References: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> From: Eugene Grosbein Message-ID: Date: Thu, 27 May 2021 16:56:12 +0700 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org MIME-Version: 1.0 In-Reply-To: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_00,LOCAL_FROM, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Report: * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * -0.0 SPF_PASS SPF: sender matches SPF record * 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record * 2.6 LOCAL_FROM From my domains * -0.0 NICE_REPLY_A Looks like a legit reply (A) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on hz.grosbein.net X-Rspamd-Queue-Id: 4FrNWK07DWz4nZx X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N 27.05.2021 15:18, Ingeborg Hellemo wrote: > FreeBSD 11.4-RELEASE-p3 > > What, if any, difference is there between using 'service restart' > and /usr/local/etc/rc.d/ restart'? > > I have this mindboggling situation where using 'service tac_plus restart' > leads to a service that is running but not working properly, whereas > '/usr/local/etc/rc.d/tac_plus restart' gives me a working service. The "service" command resets the environment but direct script invocation inherits the environment. 
Such a problem is often a sign of a wrong startup script or service configuration
that makes incorrect assumptions about PATH and/or locale settings.
It may fail to find some commands because it does not set PATH to include /usr/local/... directories,
or fail due to incorrect locale assumptions or even TZ (time zone) settings.

From nobody Thu May 27 10:19:01 2021
X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id BF12DD79B61 for ; Thu, 27 May 2021 10:19:10 +0000 (UTC) (envelope-from hausen@punkt.de)
Received: from mail.punkt.de (mail.punkt.de [217.29.41.227]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4FrP1T62QNz4r7Q for ; Thu, 27 May 2021 10:19:09 +0000 (UTC) (envelope-from hausen@punkt.de)
Received: from [217.29.46.87] (kagate.punkt.de [217.29.33.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.punkt.de (Postfix) with ESMTPSA id A14551D8D2 for ; Thu, 27 May 2021 12:19:02 +0200 (CEST)
From: "Patrick M.
Hausen" Content-Type: multipart/signed; boundary="Apple-Mail=_20512DCB-17C5-4375-83A6-05A5FA9FF3B9"; protocol="application/pgp-signature"; micalg=pgp-sha256 List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.20\)) Subject: Re: Service started via service(8) fails Date: Thu, 27 May 2021 12:19:01 +0200 References: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> To: freebsd-stable stable In-Reply-To: Message-Id: X-Mailer: Apple Mail (2.3445.104.20) X-Rspamd-Queue-Id: 4FrP1T62QNz4r7Q X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of hausen@punkt.de designates 217.29.41.227 as permitted sender) smtp.mailfrom=hausen@punkt.de X-Spamd-Result: default: False [-4.89 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ip4:217.29.32.0/20]; HAS_ATTACHMENT(0.00)[]; TO_DN_ALL(0.00)[]; NEURAL_HAM_SHORT(-0.99)[-0.993]; SIGNED_PGP(-2.00)[]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; RBL_DBL_DONT_QUERY_IPS(0.00)[217.29.41.227:from]; ASN(0.00)[asn:16188, ipnet:217.29.32.0/20, country:DE]; MID_RHS_MATCH_FROM(0.00)[]; R_DKIM_NA(0.00)[]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.20)[multipart/signed,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-stable@freebsd.org]; DMARC_NA(0.00)[punkt.de]; RCPT_COUNT_ONE(0.00)[1]; SPAMHAUS_ZRD(0.00)[217.29.41.227:from:127.0.2.255]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; MAILMAN_DEST(0.00)[freebsd-stable] X-ThisMailContainsUnwantedMimeParts: N --Apple-Mail=_20512DCB-17C5-4375-83A6-05A5FA9FF3B9 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi all, > Am 
27.05.2021 um 11:56 schrieb Eugene Grosbein :
>
> 27.05.2021 15:18, Ingeborg Hellemo wrote:
>
>> FreeBSD 11.4-RELEASE-p3
>>
>> What, if any, difference is there between using 'service restart'
>> and /usr/local/etc/rc.d/ restart'?
>>
>> I have this mindboggling situation where using 'service tac_plus restart'
>> leads to a service that is running but not working properly, whereas
>> '/usr/local/etc/rc.d/tac_plus restart' gives me a working service.
>
> The "service" command resets the environment but direct script invocation inherits the environment.
> Such problem often is a sign of wrong startup script or configuration of the service
> that makes incorrect assumption about PATH and/or locale settings.
> It may fail to find some commands because it does not set PATH to include /usr/local/... directories
> or fail due to incorrect locale assumptions or even TZ (time zone) settings.

Since /usr/sbin/service is itself a shell script you could try

	sh -x /usr/sbin/service tac_plus restart
	sh -x /usr/local/etc/rc.d/tac_plus restart

and compare what is actually executed.

Kind regards,
Patrick
--
punkt.de GmbH
Patrick M. Hausen
.infrastructure

Kaiserallee 13a
76133 Karlsruhe
Tel.
+49 721 9109500 https://infrastructure.punkt.de info@punkt.de AG Mannheim 108285 Gesch=C3=A4ftsf=C3=BChrer: J=C3=BCrgen Egeling, Daniel Lienert, Fabian = Stein --Apple-Mail=_20512DCB-17C5-4375-83A6-05A5FA9FF3B9 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEgzqrjO/mj9CSsTg2kG8u4u3aiVwFAmCvchUACgkQkG8u4u3a iVzLTgf7Buavle4bgMQ0gOpJzKtUugZvva0CPQ1/1+GNqOZAAoCJzg4tSN0h2xrE fV0k24r7UBo9Dmko7LtksgS0/cD1FVSn/ZU4TzAQnBJlXfiVHoEiVHJeT+Hb/LZb hx7jM5Ogq7Ib7XkhyEKjrzS9qMAUOCgYXciaqOvVWp3tqVM6xa5Gg4CUrlK/dY1N Q2R6DKjyk+Kmnt9p/gyi0tBXVuzj88q9eAnL9d1XYkCXEPz0fToN+8tSzJ6THYC0 O7FZLxp+wqr6YqEAHwFrDN8H4TPw58jDPcidXjurWSR+l84mhLp1/gZ1cpSIrAU6 y5xU9DdTawTOZ58Jx3gEDpbXfT9Whg== =i0YY -----END PGP SIGNATURE----- --Apple-Mail=_20512DCB-17C5-4375-83A6-05A5FA9FF3B9-- From nobody Sun May 30 15:46:55 2021 X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 0B211DF45E3 for ; Sun, 30 May 2021 15:47:04 +0000 (UTC) (envelope-from reichert@numachi.com) Received: from away.numachi.com (away.numachi.com [66.228.38.138]) by mx1.freebsd.org (Postfix) with SMTP id 4FtN8R36nKz3Dc2 for ; Sun, 30 May 2021 15:47:03 +0000 (UTC) (envelope-from reichert@numachi.com) Received: (qmail 17213 invoked from network); 30 May 2021 15:46:55 -0000 Received: from unknown (HELO meisai.numachi.com) (72.71.251.201) by away.numachi.com with SMTP; 30 May 2021 15:46:55 -0000 Received: (qmail 22431 invoked by uid 1001); 30 May 2021 15:46:55 -0000 Date: Sun, 30 May 2021 11:46:55 -0400 From: Brian Reichert To: "Patrick M. 
Hausen" Cc: freebsd-stable stable Subject: Re: Service started via service(8) fails Message-ID: <20210530154655.GV73712@numachi.com> References: <202105270818.14R8Ig20087984@nordnytt.cc.uit.no> List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i X-Rspamd-Queue-Id: 4FtN8R36nKz3Dc2 X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=none (mx1.freebsd.org: domain of reichert@numachi.com has no SPF policy when checking 66.228.38.138) smtp.mailfrom=reichert@numachi.com X-Spamd-Result: default: False [0.50 / 15.00]; MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; RBL_DBL_DONT_QUERY_IPS(0.00)[66.228.38.138:from]; MV_CASE(0.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[numachi.com]; AUTH_NA(1.00)[]; NEURAL_SPAM_SHORT(1.00)[1.000]; SPAMHAUS_ZRD(0.00)[66.228.38.138:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; R_SPF_NA(0.00)[no SPF record]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:63949, ipnet:66.228.32.0/20, country:US]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-stable] X-ThisMailContainsUnwantedMimeParts: N On Thu, May 27, 2021 at 12:19:01PM +0200, Patrick M. Hausen wrote: > >> I have this mindboggling situation where using 'service tac_plus restart' > >> leads to a service that is running but not working properly, whereas > >> '/usr/local/etc/rc.d/tac_plus restart' gives me a working service. 
I explore things by stripping the environment, e.g.:

  env - /usr/local/etc/rc.d/tac_plus restart

You'd likely have to provide a minimal path, if your script doesn't
explicitly do so.

--
Brian Reichert
BSD admin/developer at large
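
The suggestions in this thread (a reset environment, a minimal PATH) can be
combined into a small sketch. This is only an illustration under stated
assumptions: run_clean is a hypothetical helper, not something shipped with
FreeBSD, and the HOME and PATH values are assumed to resemble a minimal
base-system environment. Check /usr/sbin/service on your own system for the
values it really sets.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): run a command with an
# emptied environment plus an assumed minimal HOME and PATH.
# "env -i" is the POSIX spelling of the "env -" used above.
run_clean() {
    env -i HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin "$@"
}

# Example use (left as comments: needs root and the tac_plus rc.d script):
#   run_clean /usr/local/etc/rc.d/tac_plus restart
# To see exactly which variables the script would start with:
#   run_clean env
```

If the service misbehaves under run_clean in the same way it does under
service(8), the rc.d script or the daemon is probably relying on a variable
(PATH, a locale setting, TZ) that only the interactive shell provides.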