Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 12 Dec 2021 00:59:13 -0800
From:      Mark Millard via freebsd-arm <freebsd-arm@freebsd.org>
To:        =?utf-8?Q?Kornel_Dul=C4=99ba?= <mindal@semihalf.com>, Emmanuel Vadot <manu@bidouilliste.com>
Cc:        Free BSD <freebsd-arm@freebsd.org>
Subject:   Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?
Message-ID:  <B0A353B3-DE8C-4C3F-A2E8-9D4F8F877520@yahoo.com>
In-Reply-To: <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com>
References:  <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D.ref@yahoo.com> <243CBFC7-DFB5-4F8B-A8A3-CFF78455148D@yahoo.com> <20211209081930.7970b6995a8f7c5f7466227d@bidouilliste.com> <053617FD-AA34-4A3F-853A-4D2E44F8254B@yahoo.com> <43901D57-9C39-4FAC-A2BE-CCE642791705@yahoo.com> <CAKpxNiwxvs7-%2BsNa1mX8rAUy_Bs4FdE1%2Bamf5hZXB9CehEJdwQ@mail.gmail.com> <8DAA50A1-3CF0-4AFA-9977-58FE15D4F171@yahoo.com> <CAKpxNiyzKF_JgMFEPK00jU=%2B9_qUq3Vg9KzSos8oCXNs2%2BPYyw@mail.gmail.com> <21B0478B-340F-4BB2-9189-B5A6AE458134@yahoo.com> <CCB7E706-E866-4141-AB8F-BE7065376EAA@yahoo.com> <7717F6CF-0239-4DC0-B23F-B9D5F75C0A8D@yahoo.com> <7EFA98DF-325F-4821-A040-FB4A9E66AB8F@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2021-Dec-12, at 00:29, Mark Millard via freebsd-arm =
<freebsd-arm@freebsd.org> wrote:

> On 2021-Dec-11, at 16:19, Mark Millard <marklmi@yahoo.com> wrote:
>=20
>> [I've cut out the history: just presenting some new evidence.]
>>=20
>> First, a little context from getting to the db> prompt.
>>=20
>> db> ps
>> pid  ppid  pgrp   uid  state   wmesg   wchan               cmd
>>  18     0     0     0  DL      syncer  0xffff000000eca5a8  [syncer]
>>  17     0     0     0  DL      vlruwt  0xffffa00007d2ea60  [vnlru]
>>  16     0     0     0  DL      (threaded)                  =
[bufdaemon]
>> 100089                   D       qsleep  0xffff000000ec9478  =
[bufdaemon]
>> 100092                   D       -       0xffff000000c11100  =
[bufspacedaemon-0]
>> 100093                   D       -       0xffff000000c21680  =
[bufspacedaemon-1]
>>   9     0     0     0  DL      psleep  0xffff000000ef0650  [vmdaemon]
>>   8     0     0     0  DL      (threaded)                  =
[pagedaemon]
>> 100087                   D       psleep  0xffff000000ee2b38  [dom0]
>> 100094                   D       launds  0xffff000000ee2b44  =
[laundry: dom0]
>> 100095                   D       umarcl  0xffff0000007b38d8  [uma]
>>   7     0     0     0  DL      mmcsd d 0xffffa00007b72e00  =
[mmcsd0boot1: mmc/sd]
>>   6     0     0     0  DL      mmcsd d 0xffffa00007b71300  =
[mmcsd0boot0: mmc/sd]
>>   5     0     0     0  DL      mmcreq  0xffff00009b5d0710  [mmcsd0: =
mmc/sd card]
>>   4     0     0     0  DL      -       0xffff000000ccc020  =
[rand_harvestq]
>>  15     0     0     0  DL      (threaded)                  [usb]
>> . . .
>>=20
>> and "mmcreq" is from the while loop in:
>>=20
>> static int
>> mmc_wait_for_req(struct mmc_softc *sc, struct mmc_request *req)
>> {
>>=20
>>       req->done =3D mmc_wakeup;
>>       req->done_data =3D sc;
>>       if (__predict_false(mmc_debug > 1)) {
>>               device_printf(sc->dev, "REQUEST: CMD%d arg %#x flags =
%#x",
>>                   req->cmd->opcode, req->cmd->arg, req->cmd->flags);  =
  =20
>>               if (req->cmd->data) {
>>                       printf(" data %d\n", (int)req->cmd->data->len);=20=

>>               } else
>>                       printf("\n");
>>       }
>>       MMCBR_REQUEST(device_get_parent(sc->dev), sc->dev, req);
>>       MMC_LOCK(sc);
>>       while ((req->flags & MMC_REQ_DONE) =3D=3D 0)
>>               msleep(req, &sc->sc_mtx, 0, "mmcreq", 0);
>>       MMC_UNLOCK(sc);
>>       if (__predict_false(mmc_debug > 2 || (mmc_debug > 0 &&
>>           req->cmd->error !=3D MMC_ERR_NONE)))
>>               device_printf(sc->dev, "CMD%d RESULT: %d\n",
>>                   req->cmd->opcode, req->cmd->error);
>>       return (0);
>> }
>>=20
>> So it appears that the error report:
>>=20
>> mmcsd0: Error indicated: 4 Failed
>>=20
>> ends up associated with (req->flags & MMC_REQ_DONE) =3D=3D 0 staying
>> true in the above code: an unbounded loop with MMC_LOCK(sc) active.
>> The "4" in the error report seems to be from:
>>=20
>> #define MMC_ERR_FAILED  4
>>=20
>> It looks like there are some problems with handling errors, problems
>> such that it gets stuck looping (no panic, no progress).
>>=20
>> That seems to be separate from why the MMC_ERR_FAILED was generated
>> in the first place. So: 2 problems, not just one. Thus it may be a
>> good context for tackling the looping problem with a known example
>> failure to look at.
>>=20
>>=20
>>=20
>> Just for reference, I tried "boot -v" with debug.verbose_sysinit=3D1 =
in place,
>> just to capture and report the tail of the output for the boot =
failure.
>>=20
>> . . .
>> subsystem f000000
>>  release_aps(0)... Release APs...done
>> done.
>>  intr_irq_shuffle(0)... Trying to mount root from =
ufs:/dev/gpt/Rock64root []...
>> done.
>>  netisr_start(0)... done.
>>  taskqgroup_bind_softirq(0)... done.
>> GEOM: new disk mmcsd0
>> GEOM: new disk mmcsd0boot0
>> GEOM: new disk mmcsd0boot1
>>  smp_after_idle_runnable(0)... done.
>>  taskqgroup_bind_if_config_tqg(0)... done.
>>  taskqgroup_bind_if_io_tqg(0)... done.
>>  tmr_setup_user_access(0)... done.
>> subsystem f000001
>> mmcsd0: Error indicated: 4 Failed
>>  epoch_init_smp(0)... done.
>> subsystem f100000
>>  racctd_init(0)... done.
>> subsystem fffffff
>>  start_periodic_resettodr(0)... done.
>>  oktousecallout(0)... done.
>>  clknode_finish(0)... Unresolved linked clock found: hdmi_phy
>> Unresolved linked clock found: usb480m_phy
>> done.
>>  regulator_constraint(0)... done.
>>  regulator_shutdown(0)... regulator: shutting down unused regulators
>> regulator: shutting down vcc_sd... busy
>> done.
>> uhub0: 1 port with 1 removable, self powered
>> uhub2: 2 ports with 2 removable, self powered
>> uhub3: 1 port with 1 removable, self powered
>> uhub1: 1 port with 1 removable, self powered
>> ugen4.2: <Samsung PSSD T7 Touch> at usbus4
>> umass0 on uhub2
>> umass0: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 1> on =
usbus4
>> umass0:  SCSI over Bulk-Only; quirks =3D 0x0000
>> umass0:0:0: Attached to scbus0
>> pass0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> pass0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI =
device
>> pass0: Serial Number REPLACED
>> pass0: 400.000MB/s transfers
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number REPLACED
>> da0: 400.000MB/s transfers
>> da0: 953869MB (1953525168 512 byte sectors)
>> da0: quirks=3D0x2<NO_6_BYTE>
>> da0: Delete methods: <NONE(*),ZERO>
>> random: unblocking device.
>>=20
>> No more output after that.
>=20
> As for why MMC_ERR_FAILED results, the following code diff is
> intended to suggest what I think may be incomplete about sticking
> to what the device-specific code supports vs. does not support
> (not supporting HS200 here). The code does compile in my context
> but is untested.

It is now tested (at least to be a useful hack): no longer am I
running an older 1400042 kernel. For reference,

# uname -apKU
FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #18 =
main-n251456-22c4ab6cb015-dirty: Sun Dec 12 00:34:53 PST 2021     =
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm6=
4.aarch64/sys/GENERIC-NODBG-CA53  arm64 aarch64 1400043 1400043

And it reports during the boot (other than the "REPLACED"):

mmcsd0: 125GB <MMCHC DJNB4R 0.7 SN REPLACED MFG 06/2016 by 21 0x0000> at =
mmc0 52.0MHz/8bit/1016-block

So it no longer sets up a mode that the rk3328-specific-code does not
actually support.

(Nothing that I've done here deals with the looping issue when there
is a MMC_ERR_FAILED or the like.)

> The email handling may mess up some leading
> whitespace --but, again, I'm only trying to suggest a type of
> change.
>=20
> # git -C /usr/main-src/ diff /usr/main-src/sys/dev/mmc
> diff --git a/sys/dev/mmc/mmc.c b/sys/dev/mmc/mmc.c
> index 9c73dfd57ce0..dffd1c382684 100644
> --- a/sys/dev/mmc/mmc.c
> +++ b/sys/dev/mmc/mmc.c
> @@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$");
> #include <sys/param.h>
> #include <sys/systm.h>
> #include <sys/kernel.h>
> +#include <sys/kobj.h>
> #include <sys/malloc.h>
> #include <sys/lock.h>
> #include <sys/module.h>
> @@ -1512,6 +1513,8 @@ mmc_timing_to_string(enum mmc_bus_timing timing)
> static bool
> mmc_host_timing(device_t dev, enum mmc_bus_timing timing)
> {
> +       kobjop_desc_t kobj_desc;
> +       kobj_method_t *kobj_method;
>        int host_caps;
>=20
>        host_caps =3D mmcbr_get_caps(dev);
> @@ -1543,14 +1546,37 @@ mmc_host_timing(device_t dev, enum =
mmc_bus_timing timing)
>        case bus_timing_mmc_ddr52:
>                return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_DDR52));
>        case bus_timing_mmc_hs200:
> -               return (HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS200_120) ||
> -                       HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS200_180));
>        case bus_timing_mmc_hs400:
> -               return (HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS400_120) ||
> -                       HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS400_180));
>        case bus_timing_mmc_hs400es:
> -               return (HOST_TIMING_CAP(host_caps, MMC_CAP_MMC_HS400 |
> -                   MMC_CAP_MMC_ENH_STROBE));
> +               /*
> +                * Disable eMMC modes that require use of
> +                * MMC_SEND_TUNING_BLOCK_HS200 to set things up if =
either the
> +                * tune or re-tune method is the default NULL =
implementation.
> +                */
> +               kobj_desc =3D &mmcbr_tune_desc;
> +               kobj_method =3D =
kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
> +                   kobj_desc);
> +               if (kobj_method =3D=3D &kobj_desc->deflt)
> +                       return (false);
> +               kobj_desc =3D &mmcbr_retune_desc;
> +               kobj_method =3D =
kobj_lookup_method(((kobj_t)dev)->ops->cls, NULL,
> +                   kobj_desc);
> +               if (kobj_method =3D=3D &kobj_desc->deflt) {
> +                       return (false);
> +               }
> +
> +               /*
> +                * Otherwise track the host capabilities.
> +                */
> +               if (timing =3D=3D bus_timing_mmc_hs200)
> +                       return (HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS200_120) ||
> +                               HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS200_180));
> +               if (timing =3D=3D bus_timing_mmc_hs400)
> +                       return (HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS400_120) ||
> +                               HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS400_180));
> +               if (timing =3D=3D bus_timing_mmc_hs400es)
> +                       return (HOST_TIMING_CAP(host_caps, =
MMC_CAP_MMC_HS400 |
> +                               MMC_CAP_MMC_ENH_STROBE));
>        }
>=20
> #undef HOST_TIMING_CAP
>=20
>=20
> In other words: have mmc_host_timing avoid returning true for some
> combinations that definitely do not have sufficient software support
> present at the time. (So far as I can tell, the rk3328's get the
> NULL-implementations as things are.)
>=20
> I expect that this sort of thing would go back to using
> MMC_CAP_MMC_DDR52 for the rk3328's, as an example. Working, but in a
> slower mode, the same mode as FreeBSD was previously using.
>=20
> A possible incompleteness in the suggestion is that there is also a
> drive-strength setting involved. If that also had "kobj" interfacing
> and NULL-implementation possibilities, then in the future there would
> be more to test for possibly forcing return-false than I did above.
>=20
> Hopefully this sort of thing would help, possibly more than just for
> rk3328's.





=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B0A353B3-DE8C-4C3F-A2E8-9D4F8F877520>