From owner-freebsd-bugs@freebsd.org  Thu May 11 09:24:07 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 219216] sched_bind() blocks if the entropy pool is starved
Date: Thu, 11 May 2017 09:24:07 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.0-STABLE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: kami@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
Message-ID: <bug-219216-8@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219216

            Bug ID: 219216
           Summary: sched_bind() blocks if the entropy pool is starved
           Product: Base System
           Version: 11.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: kami@freebsd.org

I recently updated my 11-stable system:

FreeBSD AprilRyan.norad 11.0-STABLE FreeBSD 11.0-STABLE #3 r318143: Wed May 10
17:56:12 CEST 2017
root@AprilRyan.norad:/usr/obj/S403/amd64/usr/src/sys/S403  amd64

I immediately noticed that rand_harvestq is now permanently running and
consuming a small but significant amount of CPU time. To investigate, I ran
`dd bs=1m < /dev/random > /dev/null`.

Coincidentally, I was running a release candidate of powerd++ in foreground
mode with temperature throttling at the same time:
https://github.com/lonkamikaze/powerdxx/releases/tag/0.3.0-rc1

The following happened when I started the `dd`:

- Two cores were fully consumed, one by dd, one by rand_harvestq
- powerd++ started to stutter and then completely freeze

After I killed the `dd` process the following happened:

- rand_harvestq continued to consume an entire core for a long time
- powerd++ remained frozen

By erratically swiping my fingers over the touch screen I got powerd++ to
resume operation in a stuttering fashion. It took several minutes before the
system behaved normally again.

The two surprising conclusions so far:

- /dev/random blocks
- powerd++ consumes randomness

So I investigated the issue to find that it is the access to the following
sysctls that blocks:

dev.cpu.0.temperature
dev.cpu.1.temperature
dev.cpu.2.temperature
dev.cpu.3.temperature
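
A minimal userland sketch for timing a single read of one of the sysctls
listed above, so that a stalled read shows up as a large latency. This is
not code from powerd++ or coretemp. The helper names elapsed_ms() and
time_sysctl_read() are made up for illustration, and the sketch assumes the
temperature sysctl returns an int. On systems other than FreeBSD the read
is stubbed out and reports failure.

```c
#include <stdio.h>
#include <time.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Milliseconds elapsed between two clock samples (hypothetical helper). */
double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return ((end->tv_sec - start->tv_sec) * 1000.0 +
	    (end->tv_nsec - start->tv_nsec) / 1e6);
}

/*
 * Read the sysctl `name` once and return how long the call took in ms.
 * Returns -1.0 if the read fails or if not built on FreeBSD.
 * Assumption: the value fits in an int, as for dev.cpu.N.temperature.
 */
double
time_sysctl_read(const char *name)
{
#ifdef __FreeBSD__
	struct timespec t0, t1;
	int value;
	size_t len = sizeof(value);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (sysctlbyname(name, &value, &len, NULL, 0) != 0)
		return (-1.0);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (elapsed_ms(&t0, &t1));
#else
	(void)name;	/* sysctlbyname() is not portable; stub only */
	return (-1.0);
#endif
}
```

Calling time_sysctl_read("dev.cpu.0.temperature") in a loop and printing the
result while the `dd` is running should make the stall visible as a jump in
latency, assuming the read blocks as described here.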

Unloading the coretemp module in the blocked state resulted in a kernel panic
that told me coretemp was stuck in coretemp_get_val_sysctl().

With an unhealthy dose of uprintf() calls I figured out that the block
happens in coretemp_get_thermal_msr() (see
/usr/src/sys/dev/coretemp/coretemp.c:306).

The problem is the following code:

311         thread_lock(curthread);
312         sched_bind(curthread, cpu);
313         thread_unlock(curthread);

The call to sched_bind() blocks when the entropy pool is starved (I suspect
only if the thread is not already running on the right core).

Because I cannot fiddle with and replace sched_ule at runtime, I have decided
this is as far as I'm digging.

I think it is very worrying, not to say a bug, that the scheduler depends on
entropy, especially if randomness is a scarce resource. I got the system to
panic many times during this investigation, mostly because locks were held
too long. E.g.:

spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid
100196) too long
spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid
100196) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe045154f850
vpanic() at vpanic+0x186/frame 0xfffffe045154f8d0
panic() at panic+0x43/frame 0xfffffe045154f930
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe045154f9a0
sched_idletd() at sched_idletd+0x3aa/frame 0xfffffe045154fa70
fork_exit() at fork_exit+0x85/frame 0xfffffe045154fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe045154fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Uptime: 4m31s


I also find it questionable that entropy harvesting continues after initially
seeding the RNG, making /dev/random susceptible to entropy poisoning by a
malicious process that feeds bad entropy into /dev/random.
