Date: Tue, 4 Sep 2018 21:39:07 -0700
From: Conrad Meyer <cem@freebsd.org>
To: Lev Serebryakov <lev@freebsd.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>, freebsd-fs <freebsd-fs@freebsd.org>, Xin LI <delphij@freebsd.org>
Subject: Re: newfs silently fails if random is not ready (?)
Message-ID: <CAG6CVpWMxhYJ=tjDgAkn1BJqPuyMgCaPbPdcGzFh3Oj5nox8MQ@mail.gmail.com>
In-Reply-To: <CAG6CVpV7h5cuhC1o1qEqj%2BCxdnU1AHE4mPJW9KM4UCGv_u-%2BYA@mail.gmail.com>
References: <609400979.20180904230820@serebryakov.spb.ru>
 <CAG6CVpWzaBGvEdpNBrMQSPkxBn6pybP0SWyuYUhg0Qev4RvLwA@mail.gmail.com>
 <1942661439.20180904235514@serebryakov.spb.ru>
 <CAG6CVpWmXPUZAozTdJa%2BrczVyo9wHqr=uLP2U-O%2BPytSWr6_Ug@mail.gmail.com>
 <774228883.20180905001035@serebryakov.spb.ru>
 <CAG6CVpV7h5cuhC1o1qEqj%2BCxdnU1AHE4mPJW9KM4UCGv_u-%2BYA@mail.gmail.com>

With current libc, I instead see:

load: 0.10  cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s 0% 2328k (SIGINFO)

$ procstat -kk 1668
  PID    TID COMM                TDNAME              KSTACK
 1668 100609 blocked_random_poc  -                   mi_switch+0xd3
sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272
read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940
fast_syscall_common+0x101

and:

$ truss ./blocked_random_poc
...
getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource temporarily unavailable'
thr_self(0x7fffffffd310)                         = 0 (0x0)
thr_kill(100609,SIGKILL)                         = 0 (0x0)
SIGNAL 9 (SIGKILL) code=SI_NOINFO

So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN
after we have already slept until random was seeded.  This bubbles up
to getentropy(3) -> arc4random(3), which sees a surprising failure
from getentropy(3) and raises KILL against the program.

I believe the EWOULDBLOCK is just a boring leak of tsleep(9)'s timeout
condition.  This may be sufficient to fix the problem:

--- a/sys/dev/random/randomdev.c
+++ b/sys/dev/random/randomdev.c
@@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock)
 		error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10);
 		if (error == ERESTART || error == EINTR)
 			break;
+		/* Squash hz/10 timeout condition */
+		if (error == EWOULDBLOCK)
+			error = 0;
+		KASSERT(error == 0, ("unexpected %d", error));
 	}
 	if (error == 0) {
 		read_rate_increment((uio->uio_resid + sizeof(uint32_t))/sizeof(uint32_t));

Best,
Conrad

On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer <cem@freebsd.org> wrote:
> Hi Lev,
>
> I took a first attempt at reproducing this problem on a fast
> desktop-class system.
> First steps, give us a way to revert back to
> unseeded status:
>
> --- a/sys/dev/random/fortuna.c
> +++ b/sys/dev/random/fortuna.c
> @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
>
>  #ifdef _KERNEL
>  #include <sys/param.h>
> +#include <sys/fail.h>
>  #include <sys/kernel.h>
>  #include <sys/lock.h>
>  #include <sys/malloc.h>
> @@ -384,6 +385,17 @@ random_fortuna_pre_read(void)
>  		return;
>  	}
>
> +	/*
> +	 * When set, pretend we do not have enough entropy to reseed yet.
> +	 */
> +	KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, {
> +		if (RETURN_VALUE != 0) {
> +			RANDOM_RESEED_UNLOCK();
> +			return;
> +		}
> +	});
> +
> +
>  #ifdef _KERNEL
>  	fortuna_state.fs_lasttime = now;
>  #endif
> @@ -442,5 +454,11 @@ bool
>  random_fortuna_seeded(void)
>  {
>
> +	/* When set, act as if we are not seeded. */
> +	KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, {
> +		if (RETURN_VALUE != 0)
> +			fortuna_state.fs_counter = UINT128_ZERO;
> +	});
> +
>  	return (!uint128_is_zero(fortuna_state.fs_counter));
>  }
>
>
> Second step, enable the failpoints and launch repro program:
>
> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
> debug.fail_point.random_fortuna_pre_read: off -> return(1)
> $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
> debug.fail_point.random_fortuna_seeded: off -> return(1)
>
> $ cat ./blocked_random_poc.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
> 	printf("%x\n", arc4random());
> 	return (0);
> }
>
>
> $ ./blocked_random_poc
> ...
>
>
> Third step, I looked at what that process was doing:
>
> Curiously, it is not in getrandom() at all, but instead the ARND
> sysctl fallback.  I probably need to rebuild world (libc) to test this
> (new libc arc4random based on Chacha).
>
> $ procstat -kk 1196
>   PID    TID COMM                TDNAME              KSTACK
>  1196 100435 blocked_random_poc  -                   read_random+0x3d
> sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89
> sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b
> amd64_syscall+0x940 fast_syscall_common+0x101
>
>
> When I unblocked the failpoints, it completed successfully:
>
> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
> debug.fail_point.random_fortuna_pre_read: return(1) -> off
> $ sudo sysctl debug.fail_point.random_fortuna_seeded=off
> debug.fail_point.random_fortuna_seeded: return(1) -> off
>
> ...
> 9e5eb30f
>
>
> Best,
> Conrad