From owner-freebsd-current@freebsd.org Wed Sep 5 04:46:08 2018 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 27D62FE2E6F; Wed, 5 Sep 2018 04:46:08 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: from mail-it0-f68.google.com (mail-it0-f68.google.com [209.85.214.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A351E7089D; Wed, 5 Sep 2018 04:46:07 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: by mail-it0-f68.google.com with SMTP id j81-v6so8213709ite.0; Tue, 04 Sep 2018 21:46:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=OoWhYAHctVTxe05zzMqXe8eCxjffx2Ls5rIvvSc2eng=; b=hXzoDPyYLAwbPUzu5PPw7FfDeqK8YXv8zDNu50JKc31Nube0R4otMYaTwP1QU9NOiQ c4osuVQsLq/DF2EeirZVD2X77YHVGTVngV716cPh4mhrEJPQFpsimVbbLTOL9zy4Mm7T xHM53vQO75WwGMcO0dQwebb34zPo00fOOsoWqHqunLBy2Y1kwmb5wCS2x7CpWGdpptfX CTRxfkBYNTlSMClMGlUDAzNeTzPX5lFvAMISyZlUqpfe4UJh7AfsSBwHKchla96CooeW Slse5+psZLZ/+G9VKw5CQ4ABk3sX+H/V/SBlN/C2TXb5vneg47BKv4lMiXr8rJ+jBvRM VVBA== X-Gm-Message-State: APzg51DLv7SA8nGS423TDpqEJRYMIYDw5AlRWfcrytn1aoozfH2MMUxA RiKVCmp0OfvmdjiK+YZnNxCjUoQR X-Google-Smtp-Source: ANB0VdanrQjIRa9uAPEQJKoWDImHXIScYfZByDgepI4dhy0Nmg2BwJOgkeupulboKIUGwWykq5bHGA== X-Received: by 2002:a02:39a:: with SMTP id e26-v6mr25422749jae.135.1536122349172; Tue, 04 Sep 2018 21:39:09 -0700 (PDT) Received: from mail-it0-f45.google.com (mail-it0-f45.google.com. [209.85.214.45]) by smtp.gmail.com with ESMTPSA id b129-v6sm639178ioa.75.2018.09.04.21.39.08 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 04 Sep 2018 21:39:08 -0700 (PDT) Received: by mail-it0-f45.google.com with SMTP id p129-v6so7847513ite.3; Tue, 04 Sep 2018 21:39:08 -0700 (PDT) X-Received: by 2002:a24:41cd:: with SMTP id b74-v6mr2516930itd.85.1536122348375; Tue, 04 Sep 2018 21:39:08 -0700 (PDT) MIME-Version: 1.0 Reply-To: cem@freebsd.org Received: by 2002:a02:9542:0:0:0:0:0 with HTTP; Tue, 4 Sep 2018 21:39:07 -0700 (PDT) In-Reply-To: References: <609400979.20180904230820@serebryakov.spb.ru> <1942661439.20180904235514@serebryakov.spb.ru> <774228883.20180905001035@serebryakov.spb.ru> From: Conrad Meyer Date: Tue, 4 Sep 2018 21:39:07 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: newfs silently fails if random is not ready (?) To: Lev Serebryakov Cc: FreeBSD Current , freebsd-fs , Xin LI Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Sep 2018 04:46:08 -0000 With current libc, I instead see: load: 0.10 cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s 0% 2328k (SIGINFO) $ procstat -kk 1668 PID TID COMM TDNAME KSTACK 1668 100609 blocked_random_poc - mi_switch+0xd3 sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272 read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940 fast_syscall_common+0x101 and: $ truss ./blocked_random_poc ... getrandom(0x7fffffffd340,40,0) ERR#35 'Resource temporarily unavailable' thr_self(0x7fffffffd310) = 0 (0x0) thr_kill(100609,SIGKILL) = 0 (0x0) SIGNAL 9 (SIGKILL) code=SI_NOINFO So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN after we have already slept until random was seeded. This bubbles up to getentropy(3) -> arc4random(3), which sees a surprising failure from getentropy(3) and raises KILL against the program. I believe the EWOULDBLOCK is just a boring leak of tsleep(9)'s timeout condition. This may be sufficient to fix the problem: --- a/sys/dev/random/randomdev.c +++ b/sys/dev/random/randomdev.c @@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock) error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10); if (error == ERESTART || error == EINTR) break; + /* Squash hz/10 timeout condition */ + if (error == EWOULDBLOCK) + error = 0; + KASSERT(error == 0, ("unexpected %d", error)); } if (error == 0) { read_rate_increment((uio->uio_resid + sizeof(uint32_t))/sizeof(uint32_t)); Best, Conrad On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer wrote: > Hi Lev, > > I took a first attempt at reproducing this problem on a fast > desktop-class system. First steps, give us a way to revert back to > unseeded status: > > --- a/sys/dev/random/fortuna.c > +++ b/sys/dev/random/fortuna.c > @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$"); > > #ifdef _KERNEL > #include > +#include > #include > #include > #include > @@ -384,6 +385,17 @@ random_fortuna_pre_read(void) > return; > } > > + /* > + * When set, pretend we do not have enough entropy to reseed yet. > + */ > + KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, { > + if (RETURN_VALUE != 0) { > + RANDOM_RESEED_UNLOCK(); > + return; > + } > + }); > + > + > #ifdef _KERNEL > fortuna_state.fs_lasttime = now; > #endif > @@ -442,5 +454,11 @@ bool > random_fortuna_seeded(void) > { > > + /* When set, act as if we are not seeded. */ > + KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, { > + if (RETURN_VALUE != 0) > + fortuna_state.fs_counter = UINT128_ZERO; > + }); > + > return (!uint128_is_zero(fortuna_state.fs_counter)); > } > > > Second step, enable the failpoints and launch repro program: > > $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)' > debug.fail_point.random_fortuna_pre_read: off -> return(1) > $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)' > debug.fail_point.random_fortuna_seeded: off -> return(1) > > $ cat ./blocked_random_poc.c > #include > #include > #include > > int > main(int argc, char **argv) > { > printf("%x\n", arc4random()); > return (0); > } > > > $ ./blocked_random_poc > ... > > > Third step, I looked at what that process was doing: > > Curiously, it is not in getrandom() at all, but instead the ARND > sysctl fallback. I probably need to rebuild world (libc) to test this > (new libc arc4random based on Chacha). > > $ procstat -kk 1196 > PID TID COMM TDNAME KSTACK > 1196 100435 blocked_random_poc - read_random+0x3d > sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89 > sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b > amd64_syscall+0x940 fast_syscall_common+0x101 > > > When I unblocked the failpoints, it completed successfully: > > $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off' > debug.fail_point.random_fortuna_pre_read: return(1) -> off > $ sudo sysctl debug.fail_point.random_fortuna_seeded=off > debug.fail_point.random_fortuna_seeded: return(1) -> off > > ... > 9e5eb30f > > > Best, > Conrad