Date: Fri, 21 Nov 2014 15:34:37 -0500
From: "Ellis H. Wilson III" <ellisw@panasas.com>
To: <freebsd-current@freebsd.org>
Subject: Re: WITNESS observes 2 LORs on Boot of Release 10.1
Message-ID: <546FA1DD.2070109@panasas.com>
In-Reply-To: <546BF3F5.8030109@panasas.com>
References: <546BA9D3.6070007@panasas.com> <alpine.GSO.1.10.1411181734520.19231@multics.mit.edu> <546BF3F5.8030109@panasas.com>
On 11/18/2014 08:35 PM, Ellis H. Wilson III wrote:
> If nobody has seen these before, I'll try and put together fixes for
> them. Please somebody speak up if you have seen them or have useful
> information for me to go on in my patches.

I've started to dig into the relevant code in sys/dev/random/random_harvestq.c, sys/dev/syscons/syscons.c, and sys/kern/subr_sleepqueue.c.

Before I start, and this is mainly geared to my responder Benjamin Kaduk: based on your response, are you suggesting that the cnputc WITNESS panic you expected to happen is now completely unavoidable in FreeBSD 10? I.e., is this a spinlock that WITNESS falls over each time, but that is provably deadlock-free and that the developers have decided cannot be BLESSED for some reason? I guess I just can't wrap my head around why we would ever move to a regime where SKIPSPIN is the default for testing. That just seems like an open invitation for introducing spinlock regressions.

Moving on to the LORs I'm seeing, a question I have as a newbie to WITNESS debugging is how exactly to interpret the output if I see a stack trace and then LOR output like the following:

lock order reversal:
 1st 0xffffffff81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0xffffffff813b6208 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2682

Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then #2 scp->scr_lock, and somewhere somebody tried to lock scp->scr_lock without first getting harvest_mtx? Or the reverse (WITNESS previously recorded scrlock and then harvest, and the lines it spit out were the offenders)?
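
To make the two readings concrete, here is a toy userspace model of an order checker. It is only a sketch: it is not the WITNESS code from the kernel, every name in it (toy_lock, toy_unlock, order_seen, the enum values) is hypothetical, it models a single thread, and it ignores transitive orderings. It assumes the second reading, which is how the report is generally understood: "1st" is a lock already held, "2nd" is the lock being acquired at the listed line, and the complaint is that the opposite order was recorded earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes for this toy model (not kernel identifiers). */
enum toy_lock_id { HARVEST_MTX, SCRLOCK, SLEEPQ_CHAIN, NLOCKS };

const char *toy_names[NLOCKS] = {
    "entropy harvest mutex", "scrlock", "sleepq chain"
};

/* order_seen[a][b]: at some earlier point, a was held while b was acquired. */
bool order_seen[NLOCKS][NLOCKS];

/* Locks currently held by the single simulated thread, in acquisition order. */
int held[NLOCKS];
int nheld;

/*
 * Acquire a lock in the toy model.  Returns true if this acquisition
 * contradicts an order recorded earlier (a "reversal"), false otherwise.
 */
bool toy_lock(int lock)
{
    bool reversal = false;

    for (int i = 0; i < nheld; i++) {
        int h = held[i];

        if (order_seen[lock][h]) {
            /* Earlier we saw lock-then-h; now we are doing h-then-lock. */
            printf("lock order reversal:\n 1st %s\n 2nd %s\n",
                toy_names[h], toy_names[lock]);
            reversal = true;
        } else {
            /* First time we see h-then-lock: remember it as the order. */
            order_seen[h][lock] = true;
        }
    }

    assert(nheld < NLOCKS);  /* toy model: no recursive acquisition */
    held[nheld++] = lock;
    return reversal;
}

/* Release a lock, keeping the remaining held locks in acquisition order. */
void toy_unlock(int lock)
{
    for (int i = 0; i < nheld; i++) {
        if (held[i] == lock) {
            for (int j = i; j + 1 < nheld; j++)
                held[j] = held[j + 1];
            nheld--;
            return;
        }
    }
}
```

In this model, taking scrlock and then the harvest mutex records that order. Releasing both and then taking the harvest mutex followed by scrlock makes the second toy_lock() call return true and print the harvest mutex as "1st" and scrlock as "2nd", which has the same shape as the report above.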

Along those lines, in the 10.0 and 10.1 releases I get two LORs showing up almost on top of each other, with the other LOR showing up as:

lock order reversal:
 1st 0xffffffff81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0xffffffff81424bb8 sleepq chain (sleepq chain) @ /usr/src/sys/kern/subr_sleepqueue.c:240

It looks as if the two LORs are detected at nearly the same time, which perhaps suggests that harvest_mtx should have been taken /after/ both of the other locks mentioned (scrlock and sleepq chain).

I'm happy to do the legwork of implementing, testing, and submitting a patch for this, but I would really appreciate a pointer in the right direction from somebody who has already handled some LORs before.

Thanks!

ellis