From: Mark Millard via freebsd-current
Reply-To: marklmi@yahoo.com
To: Warner Losh
Cc: freebsd-current, freebsd-arm
Subject: Re: I got a panic for "nvme0: cpl does not map to outstanding cmd" on a MACHIATObin Double Shot
Date: Sat, 29 May 2021 13:25:04 -0700
Message-Id: <4515C859-98C7-4FE2-AFBA-08CFF28D0774@yahoo.com>
References: <063D5E36-126F-497C-97AF-827BADC1ED2F.ref@yahoo.com> <063D5E36-126F-497C-97AF-827BADC1ED2F@yahoo.com> <90A23579-61B7-4619-AFD7-68439FCC16F3@yahoo.com>

On 2021-May-29, at 01:15, Mark Millard wrote:

> On 2021-May-23, at 00:46, Mark Millard via freebsd-current wrote:
> 
>> On 2021-May-23, at 00:08, Mark Millard via freebsd-current wrote:
>> 
>>> On 2021-May-22, at 22:16, Warner Losh wrote:
>>> 
>>>> On Sat, May 22, 2021 at 10:44 PM Mark Millard via freebsd-arm wrote:
>>>> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
>>>> # diff -r /usr/ports/ /mnt/ | more
>>>> nvme0: cpl does not map to outstanding cmd
>>>> cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
>>>> panic: received completion for unknown cmd
>>>> 
>>>> cid 0x7e has no currently active command. The cid is used by the driver
>>>> to map completions back to requests.
>>>> 
>>>> So, there are usually 3 possibilities that I've seen this with.
>>>> 
>>>> (1) There's a missing cache flush, so you get a bogus cpl back because something stale
>>>> was read. It's unlikely to be this one because the rest of this looks like a successful
>>>> command completed: sc = 0 is successful completion and sct is a generic command queued.
>>>> 
>>>> (2) We're looking at the completion record twice because we failed to properly update the
>>>> head pointer and we've already completed the command. I've only ever seen this in a
>>>> panic situation where we interrupt the completion routine because something else
>>>> panicked.
>>>> 
>>>> (3) There's something that's corrupting the act_tr array in the qpair. I've not seen this,
>>>> but if something else smashes that area (zeroing it in this case), then that could cause
>>>> an error like this.
>>> 
>>> Of note may be that I buildworld and buildkernel with extra
>>> tuning enabled, targeting the cortex-a72. In one past example
>>> this led to finding a missing synchronization related to XHCI
>>> handling that was fixed. (The fix was not aarch64 specific at
>>> all.) For that: a cortex-a53 did not show the problem with or
>>> without that tuning. A cortex-a72 showed the problem only with
>>> the cortex-a72 tuning, not when targeting a cortex-a53 tuning
>>> or generic armv7, for example.
>>> 
>>> Not that I have any evidence specifically suggesting such would
>>> be involved here. But it might be good to keep in mind as a
>>> possibility.
>>> 
>>>> Or it could be something new I've not seen nor thought about before.
>>>> 
>>>> cpuid = 3
>>>> time = 1621743752
>>>> KDB: stack backtrace:
>>>> db_trace_self() at db_trace_self
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>> vpanic() at vpanic+0x188
>>>> panic() at panic+0x44
>>>> nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
>>>> nvme_timeout() at nvme_timeout+0x3c
>>>> softclock_call_cc() at softclock_call_cc+0x124
>>>> softclock() at softclock+0x60
>>>> ithread_loop() at ithread_loop+0x2a8
>>>> fork_exit() at fork_exit+0x74
>>>> fork_trampoline() at fork_trampoline+0x14
>>>> KDB: enter: panic
>>>> [ thread pid 12 tid 100028 ]
>>>> Stopped at kdb_enter+0x48: undefined f904411f
>>>> db>
>>>> 
>>>> Based on the "nvme" references, I expect this is tied to
>>>> handling the Optane 480 GiByte that is in the PCIe slot
>>>> and is the boot/only media for the machine doing the diff.
>>>> 
>>>> "db> dump" seems to have worked.
>>>> 
>>>> After the reboot, zpool scrub found no errors.
>>>> 
>>>> For reference:
>>>> 
>>>> # uname -apKU
>>>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
>>>> 
>>>> If you have the dump, I suggest starting by making sure that the act_tr array looks sane. Make
>>>> sure all the live pointers point to a sane-looking tr. Make sure that tr is on the active list, etc.
>>>> 
>>>> It will take a fair amount of driver reading, though, to see how we got here. I'd also check to
>>>> make sure that qpair->num_entries > cpl.cid (0x7e in this case).
>>>> 
>>> 
>>> Okay. I got this while trying to test an odd diff -r over NFS
>>> issue with the more recent software. So the two will potentially
>>> compete for time.
>>> 
>>> As the investigation will be exploratory for me, not familiar
>>> territory, I'll probably publish periodic notes on things as I
>>> go along looking at stuff.
>>> 
>>> My first note is that /var/crash/core.txt.0 has a gdb backtrace:
>>> 
>>> . . .
>>> #10 0xffff00000047900c in panic (
>>>     fmt=0x12 )
>>>     at /usr/main-src/sys/kern/kern_shutdown.c:843
>>> #11 0xffff0000002226b4 in nvme_qpair_process_completions (
>>>     qpair=qpair@entry=0xffffa00008724300)
>>>     at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
>>> #12 0xffff000000223354 in nvme_timeout (arg=arg@entry=0xffffa0000b053980)
>>>     at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
>>> #13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0,
>>>     cc=cc@entry=0xffff000000de3500 , direct=0)
>>>     at /usr/main-src/sys/kern/kern_timeout.c:696
>>> #14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 )
>>>     at /usr/main-src/sys/kern/kern_timeout.c:816
>>> #15 0xffff0000004356dc in intr_event_execute_handlers (p=,
>>>     ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
>>> #16 ithread_execute_handlers (p=, ie=0xffffa000058bc700)
>>>     at /usr/main-src/sys/kern/kern_intr.c:1181
>>> #17 ithread_loop (arg=, arg@entry=0xffffa000058aef60)
>>>     at /usr/main-src/sys/kern/kern_intr.c:1269
>>> #18 0xffff000000431f6c in fork_exit (
>>>     callout=0xffff000000435430 , arg=0xffffa000058aef60,
>>>     frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
>>> #19 
>>> 
>>> So via kgdb . . .
>>> 
>>> (kgdb) up 11
>>> #11 0xffff0000002226b4 in nvme_qpair_process_completions (qpair=qpair@entry=0xffffa00008724300) at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
>>> 617                     KASSERT(0, ("received completion for unknown cmd"));
>>> 
>>> (kgdb) print/x cpl.cid
>>> $4 = 0x7e
>>> (kgdb) print/x qpair->num_entries
>>> $5 = 0x100
>>> 
>>> Based on also seeing the code:
>>> 
>>>     qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
>>>         qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
>>>         M_ZERO | M_WAITOK);
>>> 
>>> (kgdb) print qpair->act_tr
>>> $6 = (struct nvme_tracker **) 0xffffa00008725800
>>> (kgdb) x/256g 0xffffa00008725800
>>> 0xffffa00008725800:     0x0000000000000000      0x0000000000000000
>>> 0xffffa00008725810:     0x0000000000000000      0x0000000000000000
>>> . . .
>>> 0xffffa00008725fe0:     0x0000000000000000      0x0000000000000000
>>> 0xffffa00008725ff0:     0x0000000000000000      0x0000000000000000
>>> 
>>> It was all zeros (null pointers). No "live" pointers and, so,
>>> no tr's to inspect.
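
As a side note for anyone following along, to make the failure mode
concrete I put together a throwaway user-space model (nothing below is
the actual kernel code; only the names mirror the driver). The
completion's cid indexes act_tr[], so with act_tr[] all NULL even a
clean-looking completion maps to no tracker and ends up in the
"received completion for unknown cmd" case:

    #include <stdio.h>

    #define NUM_ENTRIES 0x100       /* matches qpair->num_entries above */

    struct tracker { int live; };

    /* Stand-in for qpair->act_tr: one tracker pointer per possible cid. */
    static struct tracker *act_tr[NUM_ENTRIES];

    static void
    process_completion(unsigned cid)
    {
            struct tracker *tr;

            if (cid >= NUM_ENTRIES) {
                    printf("cid 0x%x is out of range\n", cid);
                    return;
            }
            tr = act_tr[cid];
            if (tr != NULL)
                    printf("cid 0x%x -> tracker %p (normal completion)\n",
                        cid, (void *)tr);
            else
                    /* analogue of: panic("received completion for unknown cmd") */
                    printf("cid 0x%x -> NULL: completion for unknown cmd\n", cid);
    }

    int
    main(void)
    {
            /* act_tr[] starts out all NULL, as in the kgdb dump above. */
            process_completion(0x7e);       /* the cid from the panic message */
            return 0;
    }
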
>>> 
>>> As none of this is familiar context beyond general programming
>>> concepts, it may be some time before I find anything else
>>> potentially of interest to report. If you have other specific
>>> things you would like me to look at, let me know.
>>> 
>> 
>> A fairly obvious thing I should have provided:
>> 
>> (kgdb) print/x *qpair
>> $15 = {ctrlr = 0xffff0000fe154000, id = 0x3, domain = 0x0, cpu = 0x2, vector = 0x3, rid = 0x4, res = 0xffffa000086ded80, tag = 0xffffa0000877b780, num_entries = 0x100, num_trackers = 0x80,
>>   sq_tdbl_off = 0x1018, cq_hdbl_off = 0x101c, phase = 0x1, sq_head = 0x1f, sq_tail = 0x20, cq_head = 0x20, num_cmds = 0x420, num_intr_handler_calls = 0xe66c, num_retries = 0x0, num_failures = 0x0,
>>   cmd = 0xffff000100ebb000, cpl = 0xffff000100ebf000, dma_tag = 0xffffa0000b093e00, dma_tag_payload = 0xffffa000059ef000, queuemem_map = 0xffffa00005a07700, cmd_bus_addr = 0xacbb000,
>>   cpl_bus_addr = 0xacbf000, free_tr = {tqh_first = 0xffffa0000b053a80, tqh_last = 0xffffa0000869da80}, outstanding_tr = {tqh_first = 0xffffa0000b053980, tqh_last = 0xffffa0000b053980}, queued_req = {
>>     stqh_first = 0x0, stqh_last = 0xffffa000087243c8}, act_tr = 0xffffa00008725800, is_enabled = 0x1, lock = {lock_object = {lo_name = 0xffff00000090321f, lo_flags = 0x1030000, lo_data = 0x0,
>>     lo_witness = 0xffffa0043fd96080}, mtx_lock = 0x0}}
>> 
>> Looks like I need to boot into the non-debug builds for the
>> other problem I'm testing for repeatability after a commit.
> 
> 
> I've not figured out anything interesting so far.
> 
> But I've run into something that looks odd to me
> (not that I have any evidence it is related to the
> panic; more likely it is my ignorance):
> 
> There is a use of atomic_store_rel_int(&qpair->cq_head, 0)
> for which I do not find any matching atomic_load_acq_int
> use (or other explicit _acq), so there is no "synchronizes
> with" status in the code to establish an ordering across
> threads involving that atomic_store_rel_int, just
> implicit/default relaxed load operations. A grep for
> "cq_head" under dev/nvme/ shows only:
> 
> ./dev/nvme/nvme_private.h:      uint32_t        cq_head;
> ./dev/nvme/nvme_sysctl.c:       SYSCTL_ADD_UINT(ctrlr_ctx, que_list, OID_AUTO, "cq_head",
> ./dev/nvme/nvme_sysctl.c:           CTLFLAG_RD, &qpair->cq_head, 0,
> ./dev/nvme/nvme_qpair.c:         * below, but before we can reset cq_head to zero at 2. Also cope with
> ./dev/nvme/nvme_qpair.c:        if (qpair->cq_head == qpair->num_entries) {
> ./dev/nvme/nvme_qpair.c:                 * Here we know that we need to zero cq_head and then negate
> ./dev/nvme/nvme_qpair.c:                 * the phase, which hasn't been assigned if cq_head isn't
> ./dev/nvme/nvme_qpair.c:                qpair->cq_head = 0;
> ./dev/nvme/nvme_qpair.c:        } else if (qpair->cq_head == 0) {
> ./dev/nvme/nvme_qpair.c:                cpl = qpair->cpl[qpair->cq_head];
> ./dev/nvme/nvme_qpair.c:                 * qpair->cq_head at 1 below. Later, we re-enter this
> ./dev/nvme/nvme_qpair.c:                 * won't have updated cq_head. Rather than panic again,
> ./dev/nvme/nvme_qpair.c:                nvme_dump_completion(&qpair->cpl[qpair->cq_head]);
> ./dev/nvme/nvme_qpair.c:        if (++qpair->cq_head == qpair->num_entries) {          /* 1 */
> ./dev/nvme/nvme_qpair.c:                atomic_store_rel_int(&qpair->cq_head, 0);      /* 2 */
> ./dev/nvme/nvme_qpair.c:            qpair->cq_hdbl_off, qpair->cq_head);
> ./dev/nvme/nvme_qpair.c:        qpair->sq_head = qpair->sq_tail = qpair->cq_head = 0;
> 
> (Two lines above the last is the atomic_store_rel_int use.)
> 
> An atomic_thread_fence_rel use would have "synchronizes
> with" status based on ordinary loads reading something stored
> after the atomic_thread_fence_rel. Such is documented
> in "man atomic". But that is not what this code is doing.
> "man atomic" does not mention ordinary loads getting such
> a status by reading what an atomic_store_rel_int wrote.
> It does reference the atomic_thread_fence_rel related
> status for ordinary loads.
> 
> So I'm clueless about what is intended to be going on
> relative to that "atomic_store_rel_int(&qpair->cq_head, 0)".
> Overall, the code does not appear to me to match up with the
> aarch64, powerpc64, or powerpc code generation requirements
> to have any matching "synchronizes with" relationships.
> (I'll not list machine instruction sequences here. I'd have
> to look up the detailed sequences. But, as I remember, more
> than a load is involved in the code sequences for the
> acquire side on these types of processors. Nothing in the
> source indicates to generate the additional code as far as
> I can tell.)

Adding notes about the code generation, using fairly weak
sequences (these are for C++11/C11 semantics, as I understand
them):

(Picking the prefix form for powerpc:)

powerpc Load:  Relaxed vs. Acquire: ld  vs. ld; cmp; bc; isync
powerpc Fence: Acquire: lwsync
powerpc Store: Relaxed vs. Release: st  vs. "Fence: Release"; st
powerpc Fence: Release: lwsync

Note: I am unsure how the above mixes with FreeBSD's plain-load
vs. release-fence rules for atomics, for example. I've never seen
an instruction-level breakdown of what implements FreeBSD's
specific rules for powerpc64.

aarch64 Load:  Relaxed vs. Acquire: LDR vs. LDAR
aarch64 Fence: Acquire: DMB ISH LD
aarch64 Store: Relaxed vs. Release: STR vs. STLR
aarch64 Fence: Release: DMB ISH

Note: Again, I am unsure of the plain-load vs. acquire-fence rule
for FreeBSD. And, here, similarly for the plain-store vs.
release-fence rule. I've never seen an instruction-level breakdown
of what implements FreeBSD's specific rules for aarch64.

armv7 Load:  Relaxed vs. Acquire: ldr vs. ldr; "Fence: Acquire" (or ldr; teq; beq; isb)
armv7 Fence: Acquire: dmb ish
armv7 Store: Relaxed vs. Release: str vs. "Fence: Release"; str
armv7 Fence: Release: dmb ish

Note: The above fits with FreeBSD's plain-load vs. release-fence
and plain-store vs. acquire-fence rules because the fences are
used to form the Store-Release and Load-Acquire.

(If I gather correctly, FreeBSD does not support aarch32 for its
distinctions from armv7, for example aarch32's alternative SeqCst
mapping that does not mix well with the armv7 mapping. Still, I
list aarch32 for reference.)

aarch32 Load:  Relaxed vs. Acquire: LDR vs. LDA
aarch32 Fence: Acquire: DMB ISH LD
aarch32 Store: Relaxed vs. Release: STR vs. STL
aarch32 Fence: Release: DMB ISH

Note: Again, I am unsure of the plain-load vs. acquire-fence rule
for FreeBSD. Similarly for the plain-store vs. release-fence rule.
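
To spell out the pairing I am not finding, here is a small
stand-alone illustration I put together (C11 atomics in user space;
it is not the driver code, just the shape of what a "synchronizes
with" relationship requires). The release store only orders earlier
writes for a reader that does a matching acquire load of the same
variable; a plain load of cq_head -- which is what the rest of the
driver appears to use -- is the relaxed case (plain ld/LDR in the
mappings above) and carries no such guarantee:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_uint cq_head;     /* stand-in for qpair->cq_head            */
    static unsigned    payload;     /* stand-in for data the writer publishes */

    static void *
    writer(void *arg)
    {
            (void)arg;
            payload = 42;           /* ordinary store, to be made visible below */
            /* analogue of atomic_store_rel_int(&qpair->cq_head, 0): release store */
            atomic_store_explicit(&cq_head, 0, memory_order_release);
            return NULL;
    }

    static void *
    reader(void *arg)
    {
            (void)arg;
            /*
             * Only an acquire load (atomic_load_acq_int / memory_order_acquire)
             * "synchronizes with" the release store above: once it observes the
             * stored 0, the earlier write to payload is guaranteed visible.  A
             * plain load of cq_head is the relaxed case and gives no such
             * guarantee on aarch64/powerpc.
             */
            while (atomic_load_explicit(&cq_head, memory_order_acquire) != 0)
                    ;               /* spin until the writer's release store is seen */
            printf("payload = %u\n", payload);  /* guaranteed to print 42 */
            return NULL;
    }

    int
    main(void)
    {
            pthread_t w, r;

            atomic_init(&cq_head, 1);   /* non-zero so the reader can tell when the store lands */
            pthread_create(&w, NULL, writer, NULL);
            pthread_create(&r, NULL, reader, NULL);
            pthread_join(w, NULL);
            pthread_join(r, NULL);
            return 0;
    }

(Builds with, e.g., cc -O2 example.c -lpthread.)
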
I got these code sequences from:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

I was unable to use the code sequences to figure out the
intended operation of the:

atomic_store_rel_int(&qpair->cq_head, 0)

absent any atomic_load_acq_int to match it -- and with
everything else being based on just plain loads/stores of
cq_head.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)