Date: Tue, 4 Sep 2018 22:00:34 -0700
From: Xin Li <delphij@FreeBSD.org>
To: cem@freebsd.org, Lev Serebryakov <lev@freebsd.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>, freebsd-fs <freebsd-fs@freebsd.org>, Mark R V Murray <markm@FreeBSD.org>, re@FreeBSD.org
Subject: Re: newfs silently fails if random is not ready (?)
Message-ID: <b3e7a8eb-f2ed-d146-dba0-9d1f730b6d5d@FreeBSD.org>
In-Reply-To: <CAG6CVpWMxhYJ=tjDgAkn1BJqPuyMgCaPbPdcGzFh3Oj5nox8MQ@mail.gmail.com>
References: <609400979.20180904230820@serebryakov.spb.ru>
 <CAG6CVpWzaBGvEdpNBrMQSPkxBn6pybP0SWyuYUhg0Qev4RvLwA@mail.gmail.com>
 <1942661439.20180904235514@serebryakov.spb.ru>
 <CAG6CVpWmXPUZAozTdJa+rczVyo9wHqr=uLP2U-O+PytSWr6_Ug@mail.gmail.com>
 <774228883.20180905001035@serebryakov.spb.ru>
 <CAG6CVpV7h5cuhC1o1qEqj+CxdnU1AHE4mPJW9KM4UCGv_u-+YA@mail.gmail.com>
 <CAG6CVpWMxhYJ=tjDgAkn1BJqPuyMgCaPbPdcGzFh3Oj5nox8MQ@mail.gmail.com>
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)

On 9/4/18 21:39, Conrad Meyer wrote:
> With current libc, I instead see:
>
> load: 0.10 cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s
> 0% 2328k (SIGINFO)
>
> $ procstat -kk 1668
>   PID    TID COMM                TDNAME              KSTACK
>  1668 100609 blocked_random_poc  -                   mi_switch+0xd3
> sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272
> read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940
> fast_syscall_common+0x101
>
> and:
>
> $ truss ./blocked_random_poc
> ...
> getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource
> temporarily unavailable'
> thr_self(0x7fffffffd310)                         = 0 (0x0)
> thr_kill(100609,SIGKILL)                         = 0 (0x0)
> SIGNAL 9 (SIGKILL) code=SI_NOINFO
>
> So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN
> after we have already slept until random was seeded. This bubbles up
> to getentropy(3) -> arc4random(3), which sees a surprising failure
> from getentropy(3) and raises KILL against the program.
>
> I believe the EWOULDBLOCK is just a boring leak of tsleep(9)'s timeout
> condition. This may be sufficient to fix the problem:
>
> --- a/sys/dev/random/randomdev.c
> +++ b/sys/dev/random/randomdev.c
> @@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock)
>                         error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10);
>                         if (error == ERESTART || error == EINTR)
>                                 break;
> +                       /* Squash hz/10 timeout condition */
> +                       if (error == EWOULDBLOCK)
> +                               error = 0;
> +                       KASSERT(error == 0, ("unexpected %d", error));
>                 }
>                 if (error == 0) {
>                         read_rate_increment((uio->uio_resid +
> sizeof(uint32_t))/sizeof(uint32_t));

+markm, re

I think the proposed change is reasonable (note that I think the same
theory applies to the tsleep_sbt() case below as well, which should be
handled similarly).

> Best,
> Conrad
>
>
> On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer <cem@freebsd.org> wrote:
>> Hi Lev,
>>
>> I took a first attempt at reproducing this problem on a fast
>> desktop-class system. First steps, give us a way to revert back to
>> unseeded status:
>>
>> --- a/sys/dev/random/fortuna.c
>> +++ b/sys/dev/random/fortuna.c
>> @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
>>
>>  #ifdef _KERNEL
>>  #include <sys/param.h>
>> +#include <sys/fail.h>
>>  #include <sys/kernel.h>
>>  #include <sys/lock.h>
>>  #include <sys/malloc.h>
>> @@ -384,6 +385,17 @@ random_fortuna_pre_read(void)
>>                 return;
>>         }
>>
>> +       /*
>> +        * When set, pretend we do not have enough entropy to reseed yet.
>> +        */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, {
>> +               if (RETURN_VALUE != 0) {
>> +                       RANDOM_RESEED_UNLOCK();
>> +                       return;
>> +               }
>> +       });
>> +
>> +
>>  #ifdef _KERNEL
>>         fortuna_state.fs_lasttime = now;
>>  #endif
>> @@ -442,5 +454,11 @@ bool
>>  random_fortuna_seeded(void)
>>  {
>>
>> +       /* When set, act as if we are not seeded. */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, {
>> +               if (RETURN_VALUE != 0)
>> +                       fortuna_state.fs_counter = UINT128_ZERO;
>> +       });
>> +
>>         return (!uint128_is_zero(fortuna_state.fs_counter));
>>  }
>>
>>
>> Second step, enable the failpoints and launch repro program:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
>> debug.fail_point.random_fortuna_pre_read: off -> return(1)
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
>> debug.fail_point.random_fortuna_seeded: off -> return(1)
>>
>> $ cat ./blocked_random_poc.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int
>> main(int argc, char **argv)
>> {
>>         printf("%x\n", arc4random());
>>         return (0);
>> }
>>
>>
>> $ ./blocked_random_poc
>> ...
>>
>>
>> Third step, I looked at what that process was doing:
>>
>> Curiously, it is not in getrandom() at all, but instead the ARND
>> sysctl fallback. I probably need to rebuild world (libc) to test this
>> (new libc arc4random based on Chacha).
>>
>> $ procstat -kk 1196
>>   PID    TID COMM                TDNAME              KSTACK
>>  1196 100435 blocked_random_poc  -                   read_random+0x3d
>> sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89
>> sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b
>> amd64_syscall+0x940 fast_syscall_common+0x101
>>
>>
>> When I unblocked the failpoints, it completed successfully:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
>> debug.fail_point.random_fortuna_pre_read: return(1) -> off
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded=off
>> debug.fail_point.random_fortuna_seeded: return(1) -> off
>>
>> ...
>> 9e5eb30f
>>
>>
>> Best,
>> Conrad
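The timeout leak described in this message can be illustrated outside the kernel. The following is a minimal userland sketch, not the randomdev.c code: `struct fake_rng`, `fake_seeded()`, `fake_tsleep()`, and `wait_until_seeded()` are hypothetical names invented for the illustration, and ERESTART is left out because it is not a userland errno. It only assumes, as the thread states, that a timed sleep reports an expired timeout as EWOULDBLOCK, so a polling loop that exits once the generator is seeded can still return that stale value unless it is reset to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the random device state (illustration only). */
struct fake_rng {
	int polls_until_seeded;	/* timed-out sleeps left before "seeded" */
	int signal_at_poll;	/* sleep index at which a signal arrives; -1 = never */
	int polls;		/* sleeps performed so far */
};

static bool
fake_seeded(const struct fake_rng *rng)
{
	return (rng->polls_until_seeded <= 0);
}

/*
 * Hypothetical stand-in for a timed sleep: returns EINTR when the simulated
 * signal arrives, otherwise EWOULDBLOCK to report that the timeout expired.
 */
static int
fake_tsleep(struct fake_rng *rng)
{
	int n = rng->polls++;

	if (n == rng->signal_at_poll)
		return (EINTR);
	rng->polls_until_seeded--;
	return (EWOULDBLOCK);
}

/*
 * Poll until seeded. With squash_timeout == false the last timeout's
 * EWOULDBLOCK leaks out of the loop; with true it is reset to 0, mirroring
 * the shape of the change proposed in the quoted diff.
 */
int
wait_until_seeded(struct fake_rng *rng, bool squash_timeout)
{
	int error = 0;

	while (!fake_seeded(rng)) {
		error = fake_tsleep(rng);
		if (error == EINTR)
			break;
		if (squash_timeout) {
			/* Squash the timeout condition. */
			if (error == EWOULDBLOCK)
				error = 0;
			assert(error == 0);
		}
	}
	return (error);
}
```

Calling `wait_until_seeded()` on a state that needs a few polls returns EWOULDBLOCK when `squash_timeout` is false and 0 when it is true; an interrupted wait still returns EINTR in both modes.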
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?b3e7a8eb-f2ed-d146-dba0-9d1f730b6d5d>