From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 219216] sched_bind() blocks if the entropy pool is starved
Date: Thu, 11 May 2017 09:24:07 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219216

            Bug ID: 219216
           Summary: sched_bind() blocks if the entropy pool is starved
           Product: Base System
           Version: 11.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: kami@freebsd.org

I recently updated my 11-stable system:

FreeBSD AprilRyan.norad 11.0-STABLE FreeBSD 11.0-STABLE #3 r318143: Wed May 10 17:56:12 CEST 2017 root@AprilRyan.norad:/usr/obj/S403/amd64/usr/src/sys/S403 amd64

I immediately noticed that rand_harvestq is now permanently running and
consuming a small but significant amount of CPU time. To investigate, I ran
`dd bs=1m < /dev/random > /dev/null`.

Coincidentally, I was running a release candidate of powerd++ in foreground
mode with temperature throttling at the same time:
https://github.com/lonkamikaze/powerdxx/releases/tag/0.3.0-rc1

The following happened when I started the `dd`:

- Two cores were fully consumed, one by dd, one by random_harvestq
- powerd++ started to stutter and then froze completely

After I killed the `dd` process the following happened:

- random_harvestq continued to consume an entire core for a long time
- powerd++ remained frozen

By erratically swiping my fingers over the touch screen I got powerd++ to
resume operation in a stuttering fashion. It took several minutes before the
system behaved normally again.

The two surprising conclusions so far:

- /dev/random blocks
- powerd++ consumes randomness

So I investigated the issue and found that it is the access to the following
sysctls that blocks:

dev.cpu.0.temperature
dev.cpu.1.temperature
dev.cpu.2.temperature
dev.cpu.3.temperature

Unloading the coretemp module in the blocked state resulted in a kernel panic
that told me coretemp was stuck in coretemp_get_val_sysctl().

With an unhealthy dose of uprintf() calls I figured out that the blocking
happens in coretemp_get_thermal_msr() (see
/usr/src/sys/dev/coretemp/coretemp.c:306). The problem is the following code:

    311         thread_lock(curthread);
    312         sched_bind(curthread, cpu);
    313         thread_unlock(curthread);

The call to sched_bind() blocks when the entropy pool is starved (I suspect
only if the thread is not already running on the right core anyway).

Because I cannot fiddle with and replace sched_ule at runtime, I have decided
this is as far as I'm digging.

I find it very worrying, not to say a bug, that the scheduler depends on
entropy, especially if randomness is a scarce resource. I got the system to
panic many times during this investigation, mostly because locks were held
too long. E.g.:

spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid 100196) too long
spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid 100196) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe045154f850
vpanic() at vpanic+0x186/frame 0xfffffe045154f8d0
panic() at panic+0x43/frame 0xfffffe045154f930
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe045154f9a0
sched_idletd() at sched_idletd+0x3aa/frame 0xfffffe045154fa70
fork_exit() at fork_exit+0x85/frame 0xfffffe045154fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe045154fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Uptime: 4m31s

I also find it questionable that entropy harvesting continues after the RNG
has been seeded initially. This makes /dev/random susceptible to entropy
poisoning by a malicious process that feeds bad entropy into /dev/random.