Date: Sun, 12 Dec 2021 00:59:13 -0800
From: Mark Millard via freebsd-arm <freebsd-arm@freebsd.org>
To: Kornel Dulęba <mindal@semihalf.com>, Emmanuel Vadot <manu@bidouilliste.com>
Cc: Free BSD <freebsd-arm@freebsd.org>
Subject: Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
Message-ID: <B0A353B3-DE8C-4C3F-A2E8-9D4F8F877520@yahoo.com>
In-Reply-To: <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com>
References: <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D.ref@yahoo.com> <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D@yahoo.com> <20211209081930.7970b6995a8f7c5f7466227d@bidouilliste.com> <053617FD-AA34-4A3F-853A-4D2E44F8254B@yahoo.com> <43901D57-9C39-4FAC-A2BE-CCE642791705@yahoo.com> <CAKpxNiwxvs7-+sNa1mX8rAUy_Bs4FdE1+amf5hZXB9CehEJdwQ@mail.gmail.com> <8DAA50A1-3CF0-4AFA-9977-58FE15D4F171@yahoo.com> <CAKpxNiyzKF_JgMFEPK00jU=+9_qUq3Vg9KzSos8oCXNs2+PYyw@mail.gmail.com> <21B0478B-340F-4BB2-9189-B5A6AE458134@yahoo.com> <CCB7E706-E866-4141-AB8F-BE7065376EAA@yahoo.com> <7717F6CF-0239-4DC0-B23F-B9D5F75C0A8D@yahoo.com> <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com>
On 2021-Dec-12, at 00:29, Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote:

> On 2021-Dec-11, at 16:19, Mark Millard <marklmi@yahoo.com> wrote:
>
>> [I've cut out the history: just presenting some new evidence.]
>>
>> First, a little context from getting to the db> prompt.
>>
>> db> ps
>>    pid  ppid  pgrp   uid  state  wmesg    wchan               cmd
>>     18     0     0     0  DL     syncer   0xffff000000eca5a8  [syncer]
>>     17     0     0     0  DL     vlruwt   0xffffa00007d2ea60  [vnlru]
>>     16     0     0     0  DL     (threaded)                   [bufdaemon]
>> 100089                    D      qsleep   0xffff000000ec9478  [bufdaemon]
>> 100092                    D      -        0xffff000000c11100  [bufspacedaemon-0]
>> 100093                    D      -        0xffff000000c21680  [bufspacedaemon-1]
>>      9     0     0     0  DL     psleep   0xffff000000ef0650  [vmdaemon]
>>      8     0     0     0  DL     (threaded)                   [pagedaemon]
>> 100087                    D      psleep   0xffff000000ee2b38  [dom0]
>> 100094                    D      launds   0xffff000000ee2b44  [laundry: dom0]
>> 100095                    D      umarcl   0xffff0000007b38d8  [uma]
>>      7     0     0     0  DL     mmcsd d  0xffffa00007b72e00  [mmcsd0boot1: mmc/sd]
>>      6     0     0     0  DL     mmcsd d  0xffffa00007b71300  [mmcsd0boot0: mmc/sd]
>>      5     0     0     0  DL     mmcreq   0xffff00009b5d0710  [mmcsd0: mmc/sd card]
>>      4     0     0     0  DL     -        0xffff000000ccc020  [rand_harvestq]
>>     15     0     0     0  DL     (threaded)                   [usb]
>> . . .
>>
>> and "mmcreq" is from the while loop in:
>>
>> static int
>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req)
>> {
>>
>> 	req->done = mmc_wakeup;
>> 	req->done_data = sc;
>> 	if (__predict_false(mmc_debug > 1)) {
>> 		device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags %#x",
>> 		    req->cmd->opcode, req->cmd->arg, req->cmd->flags);
>> 		if (req->cmd->data) {
>> 			printf(" data %d\n", (int)req->cmd->data->len);
>> 		} else
>> 			printf("\n");
>> 	}
>> 	MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req);
>> 	MMC_LOCK(sc);
>> 	while ((req->flags & MMC_REQ_DONE) == 0)
>> 		msleep(req, &sc->sc_mtx, 0, "mmcreq", 0);
>> 	MMC_UNLOCK(sc);
>> 	if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 &&
>> 	    req->cmd->error != MMC_ERR_NONE)))
>> 		device_printf(sc->dev, "CMD%d RESULT: %d\n",
>> 		    req->cmd->opcode, req->cmd->error);
>> 	return (0);
>> }
>>
>> So it appears that the error report:
>>
>> mmcsd0: Error indicated: 4 Failed
>>
>> ends up associated with (req->flags & MMC_REQ_DONE) == 0 staying
>> true in the above code: an unbounded loop with MMC_LOCK(sc) active.
>> The "4" in the error report seems to be from:
>>
>> #define MMC_ERR_FAILED 4
>>
>> It looks like there are some problems with handling errors, problems
>> such that it gets stuck looping (no panic, no progress).
>>
>> That seems to be separate from why the MMC_ERR_FAILED was generated
>> in the first place. So: 2 problems, not just one. Thus it may be a
>> good context for tackling the looping problem with a known example
>> failure to look at.
>>
>>
>>
>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=1 in place,
>> just to capture and report the tail of the output for the boot failure.
>>
>> . . .
>> subsystem f000000
>> release_aps(0)... Release APs...done
>> done.
>> intr_irq_shuffle(0)... Trying to mount root from ufs:/dev/gpt/Rock64root []...
>> done.
>> netisr_start(0)... done.
>> taskqgroup_bind_softirq(0)... done.
>> GEOM: new disk mmcsd0
>> GEOM: new disk mmcsd0boot0
>> GEOM: new disk mmcsd0boot1
>> smp_after_idle_runnable(0)... done.
>> taskqgroup_bind_if_config_tqg(0)... done.
>> taskqgroup_bind_if_io_tqg(0)... done.
>> tmr_setup_user_access(0)... done.
>> subsystem f000001
>> mmcsd0: Error indicated: 4 Failed
>> epoch_init_smp(0)... done.
>> subsystem f100000
>> racctd_init(0)... done.
>> subsystem fffffff
>> start_periodic_resettodr(0)... done.
>> oktousecallout(0)... done.
>> clknode_finish(0)... Unresolved linked clock found: hdmi_phy
>> Unresolved linked clock found: usb480m_phy
>> done.
>> regulator_constraint(0)... done.
>> regulator_shutdown(0)... regulator: shutting down unused regulators
>> regulator: shutting down vcc_sd... busy
>> done.
>> uhub0: 1 port with 1 removable, self powered
>> uhub2: 2 ports with 2 removable, self powered
>> uhub3: 1 port with 1 removable, self powered
>> uhub1: 1 port with 1 removable, self powered
>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4
>> umass0 on uhub2
>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on usbus4
>> umass0: SCSI over Bulk-Only; quirks = 0x0000
>> umass0:0:0: Attached to scbus0
>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device
>> pass0: Serial Number REPLACED
>> pass0: 400.000MB/s transfers
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number REPLACED
>> da0: 400.000MB/s transfers
>> da0: 953869MB (1953525168 512 byte sectors)
>> da0: quirks=0x2<NO_6_BYTE>
>> da0: Delete methods: <NONE(*),ZERO>
>> random: unblocking device.
>>
>> No more output after that.
>
> As for why MMC_ERR_FAILED results, the following code diff is
> intended to suggest what I think may be incomplete about sticking
> to what the device-specific code supports vs. does not support
> (not supporting HS200 here).
> The code does compile in my context
> but is untested.

It is now tested (at least to be a useful hack): no longer
am I running an older 1400042 kernel. For reference,

# uname -apKU
FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53 arm64 aarch64 1400043 1400043

And it reports during the boot (other than the "REPLACED"):

mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 0x0000> at mmc0 52.0MHz/8bit/1016-block

So it no longer sets up a mode that the rk3328-specific-code
does not actually support.

(Nothing that I've done here deals with the looping issue
when there is a MMC_ERR_FAILED or the like.)

> The email handling may mess up some leading
> whitespace --but, again, I'm only trying to suggest a type of
> change.
>
> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc
> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c
> index 9c73dfd57ce0..dffd1c382684 100644
> --- a/sys/dev/mmc/mmc.c
> +++ b/sys/dev/mmc/mmc.c
> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$");
>  #include <sys/param.h>
>  #include <sys/systm.h>
>  #include <sys/kernel.h>
> +#include <sys/kobj.h>
>  #include <sys/malloc.h>
>  #include <sys/lock.h>
>  #include <sys/module.h>
> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing)
>  static bool
>  mmc_host_timing(device_t dev, enum mmc_bus_timing timing)
>  {
> +	kobjop_desc_t kobj_desc;
> +	kobj_method_t *kobj_method;
>  	int host_caps;
>
>  	host_caps = mmcbr_get_caps(dev);
> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum mmc_bus_timing timing)
>  	case bus_timing_mmc_ddr52:
>  		return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52));
>  	case bus_timing_mmc_hs200:
> -		return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) ||
> -		    HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180));
>  	case bus_timing_mmc_hs400:
> -		return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) ||
> -		    HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180));
>  	case bus_timing_mmc_hs400es:
> -		return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 |
> -		    MMC_CAP_MMC_ENH_STROBE));
> +		/*
> +		 * Disable eMMC modes that require use of
> +		 * MMC_SEND_TUNING_BLOCK_HS200 to set things up if either the
> +		 * tune or re-tune method is the default NULL implementation.
> +		 */
> +		kobj_desc = &mmcbr_tune_desc;
> +		kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
> +		    kobj_desc);
> +		if (kobj_method == &kobj_desc->deflt)
> +			return (false);
> +		kobj_desc = &mmcbr_retune_desc;
> +		kobj_method = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
> +		    kobj_desc);
> +		if (kobj_method == &kobj_desc->deflt) {
> +			return (false);
> +		}
> +
> +		/*
> +		 * Otherwise track the host capabilities.
> +		 */
> +		if (timing == bus_timing_mmc_hs200)
> +			return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_120) ||
> +			    HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_180));
> +		if (timing == bus_timing_mmc_hs400)
> +			return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_120) ||
> +			    HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_180));
> +		if (timing == bus_timing_mmc_hs400es)
> +			return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 |
> +			    MMC_CAP_MMC_ENH_STROBE));
>  	}
>
>  #undef HOST_TIMING_CAP
>
>
> In other words: have mmc_host_timing avoid returning true for some
> combinations that definitely do not have sufficient software support
> present at the time. (So far as I can tell, the rk3328's get the
> NULL-implementations as things are.)
>
> I expect that this sort of thing would go back to using
> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a
> slower mode, the same mode as FreeBSD was previously using.
>
> A possible incompleteness in the suggestion is that there is also a
> drive-strength setting involved.
> If that also had "kobj" interfacing
> and NULL-implementation possibilities, then in the future there would
> be more to test for possibly forcing return-false than I did above.
>
> Hopefully this sort of thing would help, possibly more than just for
> rk3328's.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
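
For anyone wanting to experiment with the idea outside the kernel, here is a minimal userland sketch of the same gating logic: report a tuning-dependent timing as unsupported whenever the host lacks a real tune or re-tune method, and otherwise follow the capability bits. Everything in it is a hypothetical stand-in (struct host, the TIMING_* and CAP_* names, host_timing, best_timing, example_tune); a NULL function pointer only models the default kobj method. It is not the FreeBSD mmc(4) code and makes no claim about how the real fix should look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's bus timing enum and cap bits. */
enum bus_timing { TIMING_NORMAL, TIMING_DDR52, TIMING_HS200, TIMING_HS400 };

#define CAP_DDR52	(1u << 0)
#define CAP_HS200	(1u << 1)
#define CAP_HS400	(1u << 2)

struct host;
typedef int (*tune_fn)(struct host *);

struct host {
	unsigned caps;		/* what the controller advertises */
	tune_fn tune;		/* NULL models the default (no-op) method */
	tune_fn retune;		/* NULL models the default (no-op) method */
};

/* Placeholder tuning routine so a "capable" host can be constructed. */
static int
example_tune(struct host *h)
{
	(void)h;
	return (0);
}

/* Timings that cannot be set up without a working tuning sequence. */
static bool
needs_tuning(enum bus_timing t)
{
	return (t == TIMING_HS200 || t == TIMING_HS400);
}

/* True only if the host both advertises the timing and can drive it. */
static bool
host_timing(const struct host *h, enum bus_timing t)
{
	assert(h != NULL);
	/* The gate: advertised bits are not enough without tune/retune. */
	if (needs_tuning(t) && (h->tune == NULL || h->retune == NULL))
		return (false);
	switch (t) {
	case TIMING_NORMAL:
		return (true);
	case TIMING_DDR52:
		return ((h->caps & CAP_DDR52) != 0);
	case TIMING_HS200:
		return ((h->caps & CAP_HS200) != 0);
	case TIMING_HS400:
		return ((h->caps & CAP_HS400) != 0);
	}
	return (false);
}

/* Pick the fastest timing that passes the gate, falling back to normal. */
static enum bus_timing
best_timing(const struct host *h)
{
	static const enum bus_timing order[] = {
		TIMING_HS400, TIMING_HS200, TIMING_DDR52
	};
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (host_timing(h, order[i]))
			return (order[i]);
	return (TIMING_NORMAL);
}
```

With a host that advertises CAP_DDR52 | CAP_HS200 but leaves tune and retune NULL, best_timing() settles on TIMING_DDR52, which mirrors the fall back to the slower mode described above. Filling in both pointers lets the same host select TIMING_HS200.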