Date: Fri, 17 Dec 2021 12:17:12 +0100 From: =?UTF-8?Q?Kornel_Dul=C4=99ba?= <mindal@semihalf.com> To: Mark Millard <marklmi@yahoo.com> Cc: Emmanuel Vadot <manu@bidouilliste.com>, Free BSD <freebsd-arm@freebsd.org> Subject: Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled? Message-ID: <CAKpxNizrKkEDazAgaKoO-k9Y2j4H4J9mfBWKKeV1PUSs9kBMXA@mail.gmail.com> In-Reply-To: <BC646BE4-1F1D-4EE0-8A32-9398DC428649@yahoo.com> References: <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D.ref@yahoo.com> <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D@yahoo.com> <20211209081930.7970b6995a8f7c5f7466227d@bidouilliste.com> <053617FD-AA34-4A3F-853A-4D2E44F8254B@yahoo.com> <43901D57-9C39-4FAC-A2BE-CCE642791705@yahoo.com> <CAKpxNiwxvs7-%2BsNa1mX8rAUy_Bs4FdE1%2Bamf5hZXB9CehEJdwQ@mail.gmail.com> <8DAA50A1-3CF0-4AFA-9977-58FE15D4F171@yahoo.com> <CAKpxNiyzKF_JgMFEPK00jU=%2B9_qUq3Vg9KzSos8oCXNs2%2BPYyw@mail.gmail.com> <21B0478B-340F-4BB2-9189-B5A6AE458134@yahoo.com> <CCB7E706-E866-4141-AB8F-BE7065376EAA@yahoo.com> <7717F6CF-0239-4DC0-B23F-B9D5F75C0A8D@yahoo.com> <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com> <B0A353B3-DE8C-4C3F-A2E8-9D4F8F877520@yahoo.com> <BC646BE4-1F1D-4EE0-8A32-9398DC428649@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
> >>> [I've cut out the history: just presenting some new evidence.] > >>> > >>> First, a little context from getting to the db> prompt. > >>> > >>> db> ps > >>> pid ppid pgrp uid state wmesg wchan cmd > >>> 18 0 0 0 DL syncer 0xffff000000eca5a8 [syncer] > >>> 17 0 0 0 DL vlruwt 0xffffa00007d2ea60 [vnlru] > >>> 16 0 0 0 DL (threaded) [bufdaemon] > >>> 100089 D qsleep 0xffff000000ec9478 [bufdaem= on] > >>> 100092 D - 0xffff000000c11100 [bufspac= edaemon-0] > >>> 100093 D - 0xffff000000c21680 [bufspac= edaemon-1] > >>> 9 0 0 0 DL psleep 0xffff000000ef0650 [vmdaemon] > >>> 8 0 0 0 DL (threaded) [pagedaemon= ] > >>> 100087 D psleep 0xffff000000ee2b38 [dom0] > >>> 100094 D launds 0xffff000000ee2b44 [laundry= : dom0] > >>> 100095 D umarcl 0xffff0000007b38d8 [uma] > >>> 7 0 0 0 DL mmcsd d 0xffffa00007b72e00 [mmcsd0boot= 1: mmc/sd] > >>> 6 0 0 0 DL mmcsd d 0xffffa00007b71300 [mmcsd0boot= 0: mmc/sd] > >>> 5 0 0 0 DL mmcreq 0xffff00009b5d0710 [mmcsd0: mm= c/sd card] > >>> 4 0 0 0 DL - 0xffff000000ccc020 [rand_harve= stq] > >>> 15 0 0 0 DL (threaded) [usb] > >>> . . . > >>> > >>> and "mmcreq" is from the while loop in: > >>> > >>> static int > >>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req) > >>> { > >>> > >>> req->done =3D mmc_wakeup; > >>> req->done_data =3D sc; > >>> if (__predict_false(mmc_debug > 1)) { > >>> device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags %#x= ", > >>> req->cmd->opcode, req->cmd->arg, req->cmd->flags); > >>> if (req->cmd->data) { > >>> printf(" data %d\n", (int)req->cmd->data->len); > >>> } else > >>> printf("\n"); > >>> } > >>> MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req); > >>> MMC_LOCK(sc); > >>> while ((req->flags & MMC_REQ_DONE) =3D=3D 0) > >>> msleep(req, &sc->sc_mtx, 0, "mmcreq", 0); > >>> MMC_UNLOCK(sc); > >>> if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 && > >>> req->cmd->error !=3D MMC_ERR_NONE))) > >>> device_printf(sc->dev, "CMD%d RESULT: %d\n", > >>> req->cmd->opcode, req->cmd->error); > >>> return (0); > >>> } > >>> > >>> So it appears that the error report: > >>> > >>> mmcsd0: Error indicated: 4 Failed > >>> > >>> ends up associated with (req->flags & MMC_REQ_DONE) =3D=3D 0 staying > >>> true in the above code: an unbounded loop with MMC_LOCK(sc) active. > >>> The "4" in the error report seems to be from: > >>> > >>> #define MMC_ERR_FAILED 4 > >>> > >>> It looks like there are some problems with handling errors, problems > >>> such that it gets stuck looping (no panic, no progress). > >>> > >>> That seems to be separate from why the MMC_ERR_FAILED was generated > >>> in the first place. So: 2 problems, not just one. Thus it may be a > >>> good context for tackling the looping problem with a known example > >>> failure to look at. > >>> > >>> > >>> > >>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=3D1 = in place, > >>> just to capture and report the tail of the output for the boot failur= e. > >>> > >>> . . . > >>> subsystem f000000 > >>> release_aps(0)... Release APs...done > >>> done. > >>> intr_irq_shuffle(0)... Trying to mount root from ufs:/dev/gpt/Rock64r= oot []... > >>> done. > >>> netisr_start(0)... done. > >>> taskqgroup_bind_softirq(0)... done. > >>> GEOM: new disk mmcsd0 > >>> GEOM: new disk mmcsd0boot0 > >>> GEOM: new disk mmcsd0boot1 > >>> smp_after_idle_runnable(0)... done. > >>> taskqgroup_bind_if_config_tqg(0)... done. > >>> taskqgroup_bind_if_io_tqg(0)... done. > >>> tmr_setup_user_access(0)... done. > >>> subsystem f000001 > >>> mmcsd0: Error indicated: 4 Failed > >>> epoch_init_smp(0)... done. > >>> subsystem f100000 > >>> racctd_init(0)... done. > >>> subsystem fffffff > >>> start_periodic_resettodr(0)... done. > >>> oktousecallout(0)... done. > >>> clknode_finish(0)... Unresolved linked clock found: hdmi_phy > >>> Unresolved linked clock found: usb480m_phy > >>> done. > >>> regulator_constraint(0)... done. > >>> regulator_shutdown(0)... regulator: shutting down unused regulators > >>> regulator: shutting down vcc_sd... busy > >>> done. > >>> uhub0: 1 port with 1 removable, self powered > >>> uhub2: 2 ports with 2 removable, self powered > >>> uhub3: 1 port with 1 removable, self powered > >>> uhub1: 1 port with 1 removable, self powered > >>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4 > >>> umass0 on uhub2 > >>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on = usbus4 > >>> umass0: SCSI over Bulk-Only; quirks =3D 0x0000 > >>> umass0:0:0: Attached to scbus0 > >>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > >>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI devic= e > >>> pass0: Serial Number REPLACED > >>> pass0: 400.000MB/s transfers > >>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > >>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device > >>> da0: Serial Number REPLACED > >>> da0: 400.000MB/s transfers > >>> da0: 953869MB (1953525168 512 byte sectors) > >>> da0: quirks=3D0x2<NO_6_BYTE> > >>> da0: Delete methods: <NONE(*),ZERO> > >>> random: unblocking device. > >>> > >>> No more output after that. > >> > >> As for why MMC_ERR_FAILED results, the following code diff is > >> intended to suggest what I think may be incomplete about sticking > >> to what the device-specific code supports vs. does not support > >> (not supporting HS200 here). The code does compile in my context > >> but is untested. > > > > It is now tested (at least to be a useful hack): no longer am I > > running an older 1400042 kernel. For reference, > > > > # uname -apKU > > FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main= -n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021 root@CA72_16G= p_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/= GENERIC-NODBG-CA53 arm64 aarch64 1400043 1400043 > > > > And it reports during the boot (other than the "REPLACED"): > > > > mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 0x0000> a= t mmc0 52.0MHz/8bit/1016-block > > > > So it no longer sets up a mode that the rk3328-specific-code does not > > actually support. > > > > (Nothing that I've done here deals with the looping issue when there > > is a MMC_ERR_FAILED or the like.) > > > >> The email handling may mess up some leading > >> whitespace --but, again, I'm only trying to suggest a type of > >> change. > >> > >> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc > >> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c > >> index 9c73dfd57ce0..dffd1c382684 100644 > >> --- a/sys/dev/mmc/mmc.c > >> +++ b/sys/dev/mmc/mmc.c > >> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$"); > >> #include <sys/param.h> > >> #include <sys/systm.h> > >> #include <sys/kernel.h> > >> +#include <sys/kobj.h> > >> #include <sys/malloc.h> > >> #include <sys/lock.h> > >> #include <sys/module.h> > >> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing) > >> static bool > >> mmc_host_timing(device_t dev, enum mmc_bus_timing timing) > >> { > >> + kobjop_desc_t kobj_desc; > >> + kobj_method_t *kobj_method; > >> int host_caps; > >> > >> host_caps =3D mmcbr_get_caps(dev); > >> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum mmc_bus_tim= ing timing) > >> case bus_timing_mmc_ddr52: > >> return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52)); > >> case bus_timing_mmc_hs200: > >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_1= 20) || > >> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS200_1= 80)); > >> case bus_timing_mmc_hs400: > >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_1= 20) || > >> - HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400_1= 80)); > >> case bus_timing_mmc_hs400es: > >> - return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 | > >> - MMC_CAP_MMC_ENH_STROBE)); > >> + /* > >> + * Disable eMMC modes that require use of > >> + * MMC_SEND_TUNING_BLOCK_HS200 to set things up if eit= her the > >> + * tune or re-tune method is the default NULL implemen= tation. > >> + */ > >> + kobj_desc =3D &mmcbr_tune_desc; > >> + kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops-= >cls, NULL, > >> + kobj_desc); > >> + if (kobj_method =3D=3D &kobj_desc->deflt) > >> + return (false); > >> + kobj_desc =3D &mmcbr_retune_desc; > >> + kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops-= >cls, NULL, > >> + kobj_desc); > >> + if (kobj_method =3D=3D &kobj_desc->deflt) { > >> + return (false); > >> + } > >> + > >> + /* > >> + * Otherwise track the host capabilities. > >> + */ > >> + if (timing =3D=3D bus_timing_mmc_hs200) > >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC= _HS200_120) || > >> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC= _HS200_180)); > >> + if (timing =3D=3D bus_timing_mmc_hs400) > >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC= _HS400_120) || > >> + HOST_TIMING_CAP(host_caps, MMC_CAP_MMC= _HS400_180)); > >> + if (timing =3D=3D bus_timing_mmc_hs400es) > >> + return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC= _HS400 | > >> + MMC_CAP_MMC_ENH_STROBE)); > >> } > >> > >> #undef HOST_TIMING_CAP > >> > >> > >> In other words: have mmc_host_timing avoid returning true for some > >> combinations that definitely do not have sufficient software support > >> present at the time. (So far as I can tell, the rk3328's get the > >> NULL-implementations as things are.) > >> > >> I expect that this sort of thing would go back to using > >> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a > >> slower mode, the same mode as FreeBSD was previously using. > >> > >> A possible incompleteness in the suggestion is that there is also a > >> drive-strength setting involved. If that also had "kobj" interfacing > >> and NULL-implementation possibilities, then in the future there would > >> be more to test for possibly forcing return-false than I did above. > >> > >> Hopefully this sort of thing would help, possibly more than just for > >> rk3328's. > > > > > > As for what was happening without my patch . . . > > sys/dev/mmc/mmcbr_if.m defines: > > static int > null_retune(device_t brdev __unused, device_t reqdev __unused, > bool reset __unused) > { > > return (0); > } > > static int > null_tune(device_t brdev __unused, device_t reqdev __unused, > bool hs400 __unused) > { > > return (0); > } > . . . > # > # Called by the mmcbus with the bridge claimed to execute initial tuning. > # > METHOD int tune { > device_t brdev; > device_t reqdev; > bool hs400; > } DEFAULT null_tune; > > # > # Called by the mmcbus with the bridge claimed to execute re-tuning. > # > METHOD int retune { > device_t brdev; > device_t reqdev; > bool reset; > } DEFAULT null_retune; > . . . > > It is these success-reporting no-op routines that were being > used to attempt the tuning: so there was no tuning done. > > The code that I added detects that these routines would be > used and avoids allowing contexts that would involve putting > them to use with HS200 mode. > > I'll note that there is another such null_* routine that the > code (even with my patch) does not deal with avoiding the use > of: > > . . . > static int > null_switch_vccq(device_t brdev __unused, device_t reqdev __unuse= d) > { > > return (0); > } > . . . > # > # Called by the mmcbus to switch the signaling voltage (VCCQ). > # > METHOD int switch_vccq { > device_t brdev; > device_t reqdev; > } DEFAULT null_switch_vccq; > . . . > > /usr/main-src/sys/dev/sdhci/sdhci.c has somewhat analogous code for > somewhat analogous null_* routines. null_set_uhs_timing for that is > from sys/dev/sdhci/sdhci_if.m (but the other two are again the above > null_tune and null_retune routines, so not repeated here): > > . . . > static void > null_set_uhs_timing(device_t brdev __unused, > struct sdhci_slot *slot __unused) > { > > } > . . . > METHOD void set_uhs_timing { > device_t brdev; > struct sdhci_slot *slot; > } DEFAULT null_set_uhs_timing; > . . . > > sdhci_init_slot(device_t dev, struct sdhci_slot *slot, int num) > in sdhci.c looks like (in part): > > . . . > /* > * Disable UHS-I and eMMC modes if the set_uhs_timing method is t= he > * default NULL implementation. > */ > kobj_desc =3D &sdhci_set_uhs_timing_desc; > kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, > kobj_desc); > if (kobj_method =3D=3D &kobj_desc->deflt) > host_caps &=3D ~(MMC_CAP_UHS_SDR12 | MMC_CAP_UHS_SDR25 | > MMC_CAP_UHS_SDR50 | MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_S= DR104 | > MMC_CAP_MMC_DDR52 | MMC_CAP_MMC_HS200 | MMC_CAP_MMC_H= S400); > > #define SDHCI_CAP_MODES_TUNING(caps2) \ > (((caps2) & SDHCI_TUNE_SDR50 ? MMC_CAP_UHS_SDR50 : 0) | \ > MMC_CAP_UHS_DDR50 | MMC_CAP_UHS_SDR104 | MMC_CAP_MMC_HS200 | \ > MMC_CAP_MMC_HS400) > > /* > * Disable UHS-I and eMMC modes that require (re-)tuning if eithe= r > * the tune or re-tune method is the default NULL implementation. > */ > kobj_desc =3D &mmcbr_tune_desc; > kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, > kobj_desc); > if (kobj_method =3D=3D &kobj_desc->deflt) > goto no_tuning; > kobj_desc =3D &mmcbr_retune_desc; > kobj_method =3D kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL, > kobj_desc); > if (kobj_method =3D=3D &kobj_desc->deflt) { > no_tuning: > host_caps &=3D ~(SDHCI_CAP_MODES_TUNING(caps2)); > } > . . . > > What I've done in my patch is analogous to what the the code shown > after the #define SDHCI_CAP_MODES_TUNING above does, translated to > fit the mmc's pre-existing code structure. Good catch. For some reason mmcbr_tune/mmcbr_retune are not implemented in sdhci_fdt.c. This looks like a separate bug/issue. Note that pretty much all other SDHCI drivers (sdhci_pci.c, sdhci_xenon.c, sdhci_fsl_fdt.c, sdhci_acpi.c) provide some implementation of mmcbr_tune/mmcbr_retune. I agree that the logic in sdhci.c should disable HS200 if those methods are implemented. Could you try hooking mmcbr_tune and mmcbr_retune to their generic sdhci implementations? If we're lucky this could be enough to make HS200 work on rock64, though it's unlikely.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAKpxNizrKkEDazAgaKoO-k9Y2j4H4J9mfBWKKeV1PUSs9kBMXA>