Date:      Thu, 11 May 2017 09:24:07 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 219216] sched_bind() blocks if the entropy pool is starved
Message-ID:  <bug-219216-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219216

            Bug ID: 219216
           Summary: sched_bind() blocks if the entropy pool is starved
           Product: Base System
           Version: 11.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: kami@freebsd.org

I recently updated my 11-stable system:

FreeBSD AprilRyan.norad 11.0-STABLE FreeBSD 11.0-STABLE #3 r318143: Wed May 10
17:56:12 CEST 2017
root@AprilRyan.norad:/usr/obj/S403/amd64/usr/src/sys/S403  amd64

I immediately noticed that rand_harvestq is now permanently running and
consuming a small but significant amount of CPU time. To investigate, I ran
`dd bs=1m < /dev/random > /dev/null`.

Coincidentally, I was running a release candidate of powerd++ in foreground
mode with temperature throttling at the same time:
https://github.com/lonkamikaze/powerdxx/releases/tag/0.3.0-rc1

The following happened when I started the `dd`:

- Two cores were fully consumed, one by dd, one by random_harvestq
- powerd++ started to stutter and then completely freeze

After I killed the `dd` process the following happened:

- random_harvestq continued to consume an entire core for a long time
- powerd++ remained frozen

By erratically swiping my fingers over the touch screen I got powerd++ to
resume operation in a stuttering fashion. It took several minutes before the
system behaved normally again.

The two surprising conclusions so far:

- /dev/random blocks
- powerd++ consumes randomness

So I investigated the issue to find that it is the access to the following
sysctls that blocks:

dev.cpu.0.temperature
dev.cpu.1.temperature
dev.cpu.2.temperature
dev.cpu.3.temperature
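
A userland reproducer only needs to read one of these sysctls. The sketch below is hedged accordingly: `read_cpu_temperature()` and `decikelvin_to_celsius()` are hypothetical helper names, and the conversion assumes the value is exported as an integer in tenths of a Kelvin with 2731 standing for 0 °C (my reading of the "IK" sysctl format, not something stated in this report). On systems other than FreeBSD the reader function simply fails with ENOSYS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif

/* Assumption: the raw value is in deci-Kelvin, with 2731 meaning 0 C. */
double
decikelvin_to_celsius(int raw)
{
	return ((raw - 2731) / 10.0);
}

/*
 * Hypothetical helper: read dev.cpu.<cpu>.temperature.
 * Returns 0 and stores degrees Celsius in *celsius on success,
 * -1 with errno set on failure. In the state described above,
 * the sysctlbyname() call is where the caller would hang.
 */
int
read_cpu_temperature(int cpu, double *celsius)
{
#ifdef __FreeBSD__
	char name[64];
	int raw;
	size_t len = sizeof(raw);

	snprintf(name, sizeof(name), "dev.cpu.%d.temperature", cpu);
	if (sysctlbyname(name, &raw, &len, NULL, 0) != 0)
		return (-1);
	*celsius = decikelvin_to_celsius(raw);
	return (0);
#else
	(void)cpu;
	(void)celsius;
	errno = ENOSYS;		/* sysctlbyname() is not available here */
	return (-1);
#endif
}
```

Timing each `read_cpu_temperature()` call with `clock_gettime()` while /dev/random is being drained should make the stall measurable without powerd++ in the loop.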

Unloading the coretemp module in the blocked state resulted in a kernel panic
that told me coretemp was stuck in coretemp_get_val_sysctl().

With an unhealthy dose of uprintf() calls I figured out that the block happens
in coretemp_get_thermal_msr() (see /usr/src/sys/dev/coretemp/coretemp.c:306).

The problem is the following code:

311         thread_lock(curthread);
312         sched_bind(curthread, cpu);
313         thread_unlock(curthread);

The call to sched_bind() blocks when the entropy pool is starved (I suspect
only if the thread is not already running on the right core anyway).

Because I cannot fiddle with and replace sched_ule at runtime, I have decided
this is as far as I'm digging.

I think it is very worrying, not to say a bug, that the scheduler depends on
entropy, especially if randomness is a scarce resource. I got the system to
panic many times during this investigation, mostly because locks were held too
long. E.g.:

spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid
100196) too long
spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid
100196) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe045154f850
vpanic() at vpanic+0x186/frame 0xfffffe045154f8d0
panic() at panic+0x43/frame 0xfffffe045154f930
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe045154f9a0
sched_idletd() at sched_idletd+0x3aa/frame 0xfffffe045154fa70
fork_exit() at fork_exit+0x85/frame 0xfffffe045154fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe045154fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Uptime: 4m31s


I also find it questionable that entropy harvesting continues after initially
seeding the RNG, making /dev/random susceptible to entropy poisoning by a
malicious process that feeds bad entropy into /dev/random.

-- 
You are receiving this mail because:
You are the assignee for the bug.


