Date: Tue, 4 Sep 2018 21:39:07 -0700
From: Conrad Meyer <cem@freebsd.org>
To: Lev Serebryakov <lev@freebsd.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>, freebsd-fs <freebsd-fs@freebsd.org>, Xin LI <delphij@freebsd.org>
Subject: Re: newfs silently fails if random is not ready (?)
Message-ID: <CAG6CVpWMxhYJ=tjDgAkn1BJqPuyMgCaPbPdcGzFh3Oj5nox8MQ@mail.gmail.com>
In-Reply-To: <CAG6CVpV7h5cuhC1o1qEqj%2BCxdnU1AHE4mPJW9KM4UCGv_u-%2BYA@mail.gmail.com>
References: <609400979.20180904230820@serebryakov.spb.ru> <CAG6CVpWzaBGvEdpNBrMQSPkxBn6pybP0SWyuYUhg0Qev4RvLwA@mail.gmail.com> <1942661439.20180904235514@serebryakov.spb.ru> <CAG6CVpWmXPUZAozTdJa%2BrczVyo9wHqr=uLP2U-O%2BPytSWr6_Ug@mail.gmail.com> <774228883.20180905001035@serebryakov.spb.ru> <CAG6CVpV7h5cuhC1o1qEqj%2BCxdnU1AHE4mPJW9KM4UCGv_u-%2BYA@mail.gmail.com>
With current libc, I instead see:
load: 0.10  cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s 0% 2328k  (SIGINFO)
$ procstat -kk 1668
  PID    TID COMM                TDNAME              KSTACK
 1668 100609 blocked_random_poc  -                   mi_switch+0xd3 sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272 read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940 fast_syscall_common+0x101
and:
$ truss ./blocked_random_poc
...
getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource temporarily unavailable'
thr_self(0x7fffffffd310)                         = 0 (0x0)
thr_kill(100609,SIGKILL)                         = 0 (0x0)
SIGNAL 9 (SIGKILL) code=SI_NOINFO
So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN
after we have already slept until random was seeded.  This bubbles up
to getentropy(3) -> arc4random(3), which sees a surprising failure
from getentropy(3) and raises SIGKILL against the program (the
thr_self/thr_kill pair in the truss output).
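A minimal userspace sketch of why this ends in SIGKILL rather than an error return. It assumes, as the truss output suggests, that the arc4random(3) seeding step treats any getentropy(3) failure as fatal. model_stir() and the two stubs are hypothetical names, and the model returns -1 where the real library would raise(SIGKILL), so the failure can be observed instead of killing the process.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same shape as getentropy(3): 0 on success, -1 with errno set on failure. */
typedef int (*entropy_fn)(void *buf, size_t len);

/* Hypothetical stub: succeeds, filling the buffer with a fixed pattern. */
static int
stub_entropy_ok(void *buf, size_t len)
{
	memset(buf, 0xa5, len);
	return (0);
}

/* Hypothetical stub: fails the way getrandom(2) did above (EAGAIN). */
static int
stub_entropy_eagain(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	errno = EAGAIN;
	return (-1);
}

/*
 * Hypothetical model of an arc4random-style (re)seed step.  Any entropy
 * failure is fatal: the real library kills the process at that point,
 * while this model returns -1 and leaves errno set.
 */
static int
model_stir(entropy_fn fn, unsigned char *seed, size_t len)
{
	if (fn(seed, len) == -1)
		return (-1);	/* real code: raise(SIGKILL) */
	return (0);
}
```

There is no retry path: getentropy(3) is expected to block until seeded, so an EAGAIN leaking out of the kernel goes straight to the fatal branch.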
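I believe the EWOULDBLOCK (the same errno value as EAGAIN, 35, on
FreeBSD) is just a boring leak of tsleep(9)'s timeout condition.  The
loop only breaks out early on ERESTART or EINTR, so if the last hz/10
sleep ended by timing out rather than by a wakeup, error is still
EWOULDBLOCK when the loop exits; the error == 0 read path is then
skipped and EWOULDBLOCK is returned to getrandom(2).  This may be
sufficient to fix the problem: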
--- a/sys/dev/random/randomdev.c
+++ b/sys/dev/random/randomdev.c
@@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock)
                error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10);
                if (error == ERESTART || error == EINTR)
                        break;
+               /* Squash hz/10 timeout condition */
+               if (error == EWOULDBLOCK)
+                       error = 0;
+               KASSERT(error == 0, ("unexpected %d", error));
        }
        if (error == 0) {
                read_rate_increment((uio->uio_resid + sizeof(uint32_t))/sizeof(uint32_t));
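To sanity-check that reasoning, here is a small userspace model of the wait loop.  It is a sketch, not kernel code: struct wait_model, model_tsleep() and model_read_random() are hypothetical names, and it assumes every simulated sleep ends by its hz/10 timeout.  The squash_timeout flag toggles the proposed fix, and the assert() mirrors the KASSERT in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ERESTART
#define ERESTART (-1)	/* kernel-only on FreeBSD; value assumed for the model */
#endif

/*
 * Hypothetical model state: random becomes seeded after `seed_after`
 * sleeps, and every simulated tsleep(9) ends by timeout, not wakeup.
 */
struct wait_model {
	int sleeps;
	int seed_after;
};

static bool
model_seeded(const struct wait_model *m)
{
	return (m->sleeps >= m->seed_after);
}

/* Stand-in for tsleep(..., hz/10): always times out with EWOULDBLOCK. */
static int
model_tsleep(struct wait_model *m)
{
	m->sleeps++;
	return (EWOULDBLOCK);
}

/* Returns what the read path reports: 0 means the read would proceed. */
static int
model_read_random(struct wait_model *m, bool squash_timeout)
{
	int error = 0;

	while (!model_seeded(m)) {
		error = model_tsleep(m);
		if (error == ERESTART || error == EINTR)
			break;
		if (squash_timeout) {
			/* The proposed fix: a timeout is not an error. */
			if (error == EWOULDBLOCK)
				error = 0;
			assert(error == 0);	/* mirrors the KASSERT */
		}
	}
	return (error);
}
```

Without the squash, the model waits until seeded and still returns EWOULDBLOCK (the ERR#35 seen in truss); with it, the same sequence returns 0.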
Best,
Conrad
On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer <cem@freebsd.org> wrote:
> Hi Lev,
>
> I took a first attempt at reproducing this problem on a fast
> desktop-class system.  First step: give us a way to revert to
> unseeded status:
>
> --- a/sys/dev/random/fortuna.c
> +++ b/sys/dev/random/fortuna.c
> @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
>
>  #ifdef _KERNEL
>  #include <sys/param.h>
> +#include <sys/fail.h>
>  #include <sys/kernel.h>
>  #include <sys/lock.h>
>  #include <sys/malloc.h>
> @@ -384,6 +385,17 @@ random_fortuna_pre_read(void)
>                 return;
>         }
>
> +       /*
> +        * When set, pretend we do not have enough entropy to reseed yet.
> +        */
> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, {
> +               if (RETURN_VALUE != 0) {
> +                       RANDOM_RESEED_UNLOCK();
> +                       return;
> +               }
> +       });
> +
> +
>  #ifdef _KERNEL
>         fortuna_state.fs_lasttime = now;
>  #endif
> @@ -442,5 +454,11 @@ bool
>  random_fortuna_seeded(void)
>  {
>
> +       /* When set, act as if we are not seeded. */
> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, {
> +               if (RETURN_VALUE != 0)
> +                       fortuna_state.fs_counter = UINT128_ZERO;
> +       });
> +
>         return (!uint128_is_zero(fortuna_state.fs_counter));
>  }
>
>
> Second step: enable the failpoints and launch the repro program:
>
> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
> debug.fail_point.random_fortuna_pre_read: off -> return(1)
> $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
> debug.fail_point.random_fortuna_seeded: off -> return(1)
>
> $ cat ./blocked_random_poc.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int
> main(int argc, char **argv)
> {
>         printf("%x\n", arc4random());
>         return (0);
> }
>
>
> $ ./blocked_random_poc
> ...
>
>
> Third step: look at what that process is doing:
>
> Curiously, it is not in getrandom() at all, but in the kern.arandom
> (KERN_ARND) sysctl fallback.  I probably need to rebuild world (libc)
> to test this with the new ChaCha20-based arc4random in libc.
>
> $ procstat -kk 1196
>   PID    TID COMM                TDNAME              KSTACK
>  1196 100435 blocked_random_poc  -                   read_random+0x3d sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89 sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b amd64_syscall+0x940 fast_syscall_common+0x101
>
>
> When I unblocked the failpoints, it completed successfully:
>
> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
> debug.fail_point.random_fortuna_pre_read: return(1) -> off
> $ sudo sysctl debug.fail_point.random_fortuna_seeded=off
> debug.fail_point.random_fortuna_seeded: return(1) -> off
>
> ...
> 9e5eb30f
>
>
> Best,
> Conrad
