Date: Fri, 17 Dec 2021 07:49:30 -0800 From: Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> To: =?utf-8?Q?Kornel_Dul=C4=99ba?= <mindal@semihalf.com> Cc: Emmanuel Vadot <manu@bidouilliste.com>, Free BSD <freebsd-arm@freebsd.org> Subject: Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled? Message-ID: <27734923-C09E-429A-B54C-5123AEC37641@yahoo.com> In-Reply-To: <CAKpxNizrKkEDazAgaKoO-k9Y2j4H4J9mfBWKKeV1PUSs9kBMXA@mail.gmail.com> References: <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D.ref@yahoo.com> <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D@yahoo.com> <20211209081930.7970b6995a8f7c5f7466227d@bidouilliste.com> <053617FD-AA34-4A3F-853A-4D2E44F8254B@yahoo.com> <43901D57-9C39-4FAC-A2BE-CCE642791705@yahoo.com> <CAKpxNiwxvs7-%2BsNa1mX8rAUy_Bs4FdE1%2Bamf5hZXB9CehEJdwQ@mail.gmail.com> <8DAA50A1-3CF0-4AFA-9977-58FE15D4F171@yahoo.com> <CAKpxNiyzKF_JgMFEPK00jU=%2B9_qUq3Vg9KzSos8oCXNs2%2BPYyw@mail.gmail.com> <21B0478B-340F-4BB2-9189-B5A6AE458134@yahoo.com> <CCB7E706-E866-4141-AB8F-BE7065376EAA@yahoo.com> <7717F6CF-0239-4DC0-B23F-B9D5F75C0A8D@yahoo.com> <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com> <B0A353B3-DE8C-4C3F-A2E8-9D4F8F877520@yahoo.com> <BC646BE4-1F1D-4EE0-8A32-9398DC428649@yahoo.com> <CAKpxNizrKkEDazAgaKoO-k9Y2j4H4J9mfBWKKeV1PUSs9kBMXA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2021-Dec-17, at 03:17, Kornel Dul=C4=99ba <mindal@semihalf.com> = wrote: >>>>> [I've cut out the history: just presenting some new evidence.] >>>>>=20 >>>>> First, a little context from getting to the db> prompt. >>>>>=20 >>>>> db> ps >>>>> pid ppid pgrp uid state wmesg wchan cmd >>>>> 18 0 0 0 DL syncer 0xffff000000eca5a8 [syncer] >>>>> 17 0 0 0 DL vlruwt 0xffffa00007d2ea60 [vnlru] >>>>> 16 0 0 0 DL (threaded) = [bufdaemon] >>>>> 100089 D qsleep 0xffff000000ec9478 = [bufdaemon] >>>>> 100092 D - 0xffff000000c11100 = [bufspacedaemon-0] >>>>> 100093 D - 0xffff000000c21680 = [bufspacedaemon-1] >>>>> 9 0 0 0 DL psleep 0xffff000000ef0650 = [vmdaemon] >>>>> 8 0 0 0 DL (threaded) = [pagedaemon] >>>>> 100087 D psleep 0xffff000000ee2b38 = [dom0] >>>>> 100094 D launds 0xffff000000ee2b44 = [laundry: dom0] >>>>> 100095 D umarcl 0xffff0000007b38d8 [uma] >>>>> 7 0 0 0 DL mmcsd d 0xffffa00007b72e00 = [mmcsd0boot1: mmc/sd] >>>>> 6 0 0 0 DL mmcsd d 0xffffa00007b71300 = [mmcsd0boot0: mmc/sd] >>>>> 5 0 0 0 DL mmcreq 0xffff00009b5d0710 [mmcsd0: = mmc/sd card] >>>>> 4 0 0 0 DL - 0xffff000000ccc020 = [rand_harvestq] >>>>> 15 0 0 0 DL (threaded) [usb] >>>>> . . . >>>>>=20 >>>>> and "mmcreq" is from the while loop in: >>>>>=20 >>>>> static int >>>>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req) >>>>> { >>>>>=20 >>>>> req->done =3D mmc_wakeup; >>>>> req->done_data =3D sc; >>>>> if (__predict_false(mmc_debug > 1)) { >>>>> device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags = %#x", >>>>> req->cmd->opcode, req->cmd->arg, req->cmd->flags); >>>>> if (req->cmd->data) { >>>>> printf(" data %d\n", = (int)req->cmd->data->len); >>>>> } else >>>>> printf("\n"); >>>>> } >>>>> MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req); >>>>> MMC_LOCK(sc); >>>>> while ((req->flags & MMC_REQ_DONE) =3D=3D 0) >>>>> msleep(req, &sc->sc_mtx, 0, "mmcreq", 0); >>>>> MMC_UNLOCK(sc); >>>>> if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 && >>>>> req->cmd->error !=3D MMC_ERR_NONE))) >>>>> device_printf(sc->dev, "CMD%d RESULT: %d\n", >>>>> req->cmd->opcode, req->cmd->error); >>>>> return (0); >>>>> } >>>>>=20 >>>>> So it appears that the error report: >>>>>=20 >>>>> mmcsd0: Error indicated: 4 Failed >>>>>=20 >>>>> ends up associated with (req->flags & MMC_REQ_DONE) =3D=3D 0 = staying >>>>> true in the above code: an unbounded loop with MMC_LOCK(sc) = active. >>>>> The "4" in the error report seems to be from: >>>>>=20 >>>>> #define MMC_ERR_FAILED 4 >>>>>=20 >>>>> It looks like there are some problems with handling errors, = problems >>>>> such that it gets stuck looping (no panic, no progress). >>>>>=20 >>>>> That seems to be separate from why the MMC_ERR_FAILED was = generated >>>>> in the first place. So: 2 problems, not just one. Thus it may be a >>>>> good context for tackling the looping problem with a known example >>>>> failure to look at. >>>>>=20 >>>>>=20 >>>>>=20 >>>>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=3D1= in place, >>>>> just to capture and report the tail of the output for the boot = failure. >>>>>=20 >>>>> . . . >>>>> subsystem f000000 >>>>> release_aps(0)... Release APs...done >>>>> done. >>>>> intr_irq_shuffle(0)... Trying to mount root from = ufs:/dev/gpt/Rock64root []... >>>>> done. >>>>> netisr_start(0)... done. >>>>> taskqgroup_bind_softirq(0)... done. >>>>> GEOM: new disk mmcsd0 >>>>> GEOM: new disk mmcsd0boot0 >>>>> GEOM: new disk mmcsd0boot1 >>>>> smp_after_idle_runnable(0)... done. >>>>> taskqgroup_bind_if_config_tqg(0)... done. >>>>> taskqgroup_bind_if_io_tqg(0)... done. >>>>> tmr_setup_user_access(0)... done. >>>>> subsystem f000001 >>>>> mmcsd0: Error indicated: 4 Failed >>>>> epoch_init_smp(0)... done. >>>>> subsystem f100000 >>>>> racctd_init(0)... done. >>>>> subsystem fffffff >>>>> start_periodic_resettodr(0)... done. >>>>> oktousecallout(0)... done. >>>>> clknode_finish(0)... Unresolved linked clock found: hdmi_phy >>>>> Unresolved linked clock found: usb480m_phy >>>>> done. >>>>> regulator_constraint(0)... done. >>>>> regulator_shutdown(0)... regulator: shutting down unused = regulators >>>>> regulator: shutting down vcc_sd... busy >>>>> done. >>>>> uhub0: 1 port with 1 removable, self powered >>>>> uhub2: 2 ports with 2 removable, self powered >>>>> uhub3: 1 port with 1 removable, self powered >>>>> uhub1: 1 port with 1 removable, self powered >>>>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4 >>>>> umass0 on uhub2 >>>>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> = on usbus4 >>>>> umass0: SCSI over Bulk-Only; quirks =3D 0x0000 >>>>> umass0:0:0: Attached to scbus0 >>>>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>>>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI = device >>>>> pass0: Serial Number REPLACED >>>>> pass0: 400.000MB/s transfers >>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>>>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI = device >>>>> da0: Serial Number REPLACED >>>>> da0: 400.000MB/s transfers >>>>> da0: 953869MB (1953525168 512 byte sectors) >>>>> da0: quirks=3D0x2<NO_6_BYTE> >>>>> da0: Delete methods: <NONE(*),ZERO> >>>>> random: unblocking device. >>>>>=20 >>>>> No more output after that. >>>>=20 >>>> As for why MMC_ERR_FAILED results, the following code diff is >>>> intended to suggest what I think may be incomplete about sticking >>>> to what the device-specific code supports vs. does not support >>>> (not supporting HS200 here). The code does compile in my context >>>> but is untested. >>>=20 >>> It is now tested (at least to be a useful hack): no longer am I >>> running an older 1400042 kernel. For reference, >>>=20 >>> # uname -apKU >>> FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 = main-n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021 = root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm6= 4.aarch64/sys/GENERIC-NODBG-CA53 arm64 aarch64 1400043 1400043 >>>=20 >>> And it reports during the boot (other than the "REPLACED"): >>>=20 >>> mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 = 0x0000> at mmc0 52.0MHz/8bit/1016-block >>>=20 >>> So it no longer sets up a mode that the rk3328-specific-code does = not >>> actually support. >>>=20 >>> (Nothing that I've done here deals with the looping issue when there >>> is a MMC_ERR_FAILED or the like.) >>>=20 >>>> The email handling may mess up some leading >>>> whitespace --but, again, I'm only trying to suggest a type of >>>> change. >>>>=20 >>>> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc >>>> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c >>>> index 9c73dfd57ce0..dffd1c382684 100644 >>>> --- a/sys/dev/mmc/mmc.c >>>> +++ b/sys/dev/mmc/mmc.c >>>> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$"); >>>> #include <sys/param.h> >>>> #include <sys/systm.h> >>>> #include <sys/kernel.h> >>>> +#include <sys/kobj.h> >>>> #include <sys/malloc.h> >>>> #include <sys/lock.h> >>>> #include <sys/module.h> >>>> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing = timing) >>>> static bool >>>> mmc_host_timing(device_t dev, enum mmc_bus_timing timing) >>>> { >>>> + kobjop_desc_t kobj_desc; >>>> + kobj_method_t *kobj_method; >>>> int host_caps; >>>>=20 >>>> host_caps =3D mmcbr_get_caps(dev); >>>> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum = mmc_bus_timing timing) >>>> case bus_timing_mmc_ddr52: >>>> return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_DDR52)); >>>> case bus_timing_mmc_hs200: >>>> - return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS200_120) || >>>> - HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS200_180)); >>>> case bus_timing_mmc_hs400: >>>> - return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400_120) || >>>> - HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400_180)); >>>> case bus_timing_mmc_hs400es: >>>> - return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400 | >>>> - MMC_CAP_MMC_ENH_STROBE)); >>>> + /* >>>> + * Disable eMMC modes that require use of >>>> + * MMC_SEND_TUNING_BLOCK_HS200 to set things up if = either the >>>> + * tune or re-tune method is the default NULL = implementation. >>>> + */ >>>> + kobj_desc =3D &mmcbr_tune_desc; >>>> + kobj_method =3D = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, >>>> + kobj_desc); >>>> + if (kobj_method =3D=3D &kobj_desc->deflt) >>>> + return (false); >>>> + kobj_desc =3D &mmcbr_retune_desc; >>>> + kobj_method =3D = kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, >>>> + kobj_desc); >>>> + if (kobj_method =3D=3D &kobj_desc->deflt) { >>>> + return (false); >>>> + } >>>> + >>>> + /* >>>> + * Otherwise track the host capabilities. >>>> + */ >>>> + if (timing =3D=3D bus_timing_mmc_hs200) >>>> + return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS200_120) || >>>> + HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS200_180)); >>>> + if (timing =3D=3D bus_timing_mmc_hs400) >>>> + return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400_120) || >>>> + HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400_180)); >>>> + if (timing =3D=3D bus_timing_mmc_hs400es) >>>> + return (HOST_TIMING_CAP(host_caps, = MMC_CAP_MMC_HS400 | >>>> + MMC_CAP_MMC_ENH_STROBE)); >>>> } >>>>=20 >>>> #undef HOST_TIMING_CAP >>>>=20 >>>>=20 >>>> In other words: have mmc_host_timing avoid returning true for some >>>> combinations that definitely do not have sufficient software = support >>>> present at the time. (So far as I can tell, the rk3328's get the >>>> NULL-implementations as things are.) >>>>=20 >>>> I expect that this sort of thing would go back to using >>>> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in = a >>>> slower mode, the same mode as FreeBSD was previously using. >>>>=20 >>>> A possible incompleteness in the suggestion is that there is also a >>>> drive-strength setting involved. If that also had "kobj" = interfacing >>>> and NULL-implementation possibilities, then in the future there = would >>>> be more to test for possibly forcing return-false than I did above. >>>>=20 >>>> Hopefully this sort of thing would help, possibly more than just = for >>>> rk3328's. >>>=20 >>>=20 >>=20 >> As for what was happening without my patch . . . >>=20 >> sys/dev/mmc/mmcbr_if.m defines: >>=20 >> static int >> null_retune(device_t brdev __unused, device_t reqdev __unused, >> bool reset __unused) >> { >>=20 >> return (0); >> } >>=20 >> static int >> null_tune(device_t brdev __unused, device_t reqdev __unused, >> bool hs400 __unused) >> { >>=20 >> return (0); >> } >> . . . >> # >> # Called by the mmcbus with the bridge claimed to execute initial = tuning. >> # >> METHOD int tune { >> device_t brdev; >> device_t reqdev; >> bool hs400; >> } DEFAULT null_tune; >>=20 >> # >> # Called by the mmcbus with the bridge claimed to execute re-tuning. >> # >> METHOD int retune { >> device_t brdev; >> device_t reqdev; >> bool reset; >> } DEFAULT null_retune; >> . . . >>=20 >> It is these success-reporting no-op routines that were being >> used to attempt the tuning: so there was no tuning done. >>=20 >> The code that I added detects that these routines would be >> used and avoids allowing contexts that would involve putting >> them to use with HS200 mode. >>=20 >> I'll note that there is another such null_* routine that the >> code (even with my patch) does not deal with avoiding the use >> of: >>=20 >> . . . >> static int >> null_switch_vccq(device_t brdev __unused, device_t reqdev = __unused) >> { >>=20 >> return (0); >> } >> . . . >> # >> # Called by the mmcbus to switch the signaling voltage (VCCQ). >> # >> METHOD int switch_vccq { >> device_t brdev; >> device_t reqdev; >> } DEFAULT null_switch_vccq; >> . . . >>=20 >> /usr/main-src/sys/dev/sdhci/sdhci.c has somewhat analogous code for >> somewhat analogous null_* routines. null_set_uhs_timing for that is >> from sys/dev/sdhci/sdhci_if.m (but the other two are again the above >> null_tune and null_retune routines, so not repeated here): >>=20 >> . . . >> static void >> null_set_uhs_timing(device_t brdev __unused, >> struct sdhci_slot *slot __unused) >> { >>=20 >> } >> . . . >> METHOD void set_uhs_timing { >> device_t brdev; >> struct sdhci_slot *slot; >> } DEFAULT null_set_uhs_timing; >> . . . >>=20 >> sdhci_init_slot(device_t dev, struct sdhci_slot *slot, int num) >> in sdhci.c looks like (in part): >>=20 >> . . . >> /* >> * Disable UHS-I and eMMC modes if the set_uhs_timing method = is the >> * default NULL implementation. >> */ >> kobj_desc =3D &sdhci_set_uhs_timing_desc; >> kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, = NULL, >> kobj_desc); >> if (kobj_method =3D=3D &kobj_desc->deflt) >> host_caps &=3D ~(MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 = | >> MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_DDR50 | = MMC_CAP_UHS_SDR104 | >> MMC_CAP_MMC_DDR52 | MMC_CAP_MMC_HS200 | = MMC_CAP_MMC_HS400); >>=20 >> #define SDHCI_CAP_MODES_TUNING(caps2) = \ >> (((caps2) & SDHCI_TUNE_SDR50 ? MMC_CAP_UHS_SDR50 : 0) | = \ >> MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 | MMC_CAP_MMC_HS200 | = \ >> MMC_CAP_MMC_HS400) >>=20 >> /* >> * Disable UHS-I and eMMC modes that require (re-)tuning if = either >> * the tune or re-tune method is the default NULL = implementation. >> */ >> kobj_desc =3D &mmcbr_tune_desc; >> kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, = NULL, >> kobj_desc); >> if (kobj_method =3D=3D &kobj_desc->deflt) >> goto no_tuning; >> kobj_desc =3D &mmcbr_retune_desc; >> kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, = NULL, >> kobj_desc); >> if (kobj_method =3D=3D &kobj_desc->deflt) { >> no_tuning: >> host_caps &=3D ~(SDHCI_CAP_MODES_TUNING(caps2)); >> } >> . . . >>=20 >> What I've done in my patch is analogous to what the the code shown >> after the #define SDHCI_CAP_MODES_TUNING above does, translated to >> fit the mmc's pre-existing code structure. >=20 > Good catch. For some reason mmcbr_tune/mmcbr_retune are not > implemented in sdhci_fdt.c. > This looks like a separate bug/issue. > Note that pretty much all other SDHCI drivers (sdhci_pci.c, > sdhci_xenon.c, sdhci_fsl_fdt.c, sdhci_acpi.c) provide some > implementation of mmcbr_tune/mmcbr_retune. > I agree that the logic in sdhci.c should disable HS200 if those > methods are implemented. > Could you try hooking mmcbr_tune and mmcbr_retune to their generic > sdhci implementations? What valid generic sdhci implementations? There is no such thing as far as I can tell. JESD84-B51 (e.MMC v5.1's standard) does nothing to standardize how adjustments are made during HS200 tuning, beyond how to get a known-pattern to test against (via: CMD21). That, of itself, does not make any adjustments. Also, as far as I can tell, the mmc that I changed and the sdhci code quoted above are independent at run-time. One or the other is used, not both. Otherwise the above quoted code would have prevented HS200 from being attempted. > If we're lucky this could be enough to make HS200 work on rock64, > though it's unlikely. I do not see that there is any generic sdhci tuning code to enable the use of: just avoiding use of code that is a no-op that falsely indicates success by avoiding needing HS200 tuning. =3D=3D=3D Mark Millard marklmi at yahoo.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?27734923-C09E-429A-B54C-5123AEC37641>