Date: Tue, 19 Oct 2004 17:45:59 -0400 From: John Baldwin <jhb@freebsd.org> To: Julian Elischer <julian@elischer.org> Cc: Robert Watson <rwatson@freebsd.org> Subject: Re: Preemption-related bug in propagate_priority(): it's OK to hold Giant but not be running Message-ID: <200410191745.59493.jhb@FreeBSD.org> In-Reply-To: <41758849.5080500@elischer.org> References: <Pine.NEB.3.96L.1041017151317.29749A-100000@fledge.watson.org> <200410191718.37092.jhb@FreeBSD.org> <41758849.5080500@elischer.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday 19 October 2004 05:34 pm, Julian Elischer wrote: > John Baldwin wrote: > >On Tuesday 19 October 2004 05:03 pm, John Baldwin wrote: > >>On Sunday 17 October 2004 03:17 pm, Robert Watson wrote: > >>>Ran into an interesting problem relating to WITNESS and a > >>>multiply-destroyed mutex today, but I didn't get the real panic message > >>>because of what looks like one or two other bugs. Here's the > >>> high-level: > >>> > >>>sodealloc() tries to destroy the mutex on a socket more than once, > >>>causing WITNESS to panic. The soft clock ithread preempts the serial > >>>output in panic() when a critical section is released, resulting in a > >>>context switch to that ithread. That ithread then attempts to lock a > >>>mutex, and is horrified to find out that Giant is held by the thread it > >>>preempted, but that that thread isn't blocked: > >>> > >>>tiger-2# ppanic: process 15955(tcpconnect):2 holds Giant but isn't > >>>blocked on a lock > >>> > >>> % rwatson -- note that double p is from the initiating panic > >>> > >>>cpuid = 0 > >>>KDB: enter: panic > >>>[thread 100071] > >>>Stopped at kdb_enter+0x2b: nop > >>>db> trace > >>>kdb_enter(c07fc101) at kdb_enter+0x2b > >>>panic(c07ff1ec,3e53,c5f90588,2,c080c957) at panic+0x127 > >>>propagate_priority(c547b180,c08c47b8,c5f91780,c08bdbc0,0) at > >>>propagate_priority+ > >>>0x139 > >>>turnstile_wait(0,c08bdbc0,c5f91780,c08bdbc0,2,c07fb4a5,217) at > >>>turnstile_wait+0x > >>>2de > >>>_mtx_lock_sleep(c08bdbc0,c547b180,0,c07fd124,f7) at > >>> _mtx_lock_sleep+0x15f _mtx_lock_flags(c08bdbc0,0,c07fd124,f7,0) at > >>> _mtx_lock_flags+0x85 softclock(0) at softclock+0x16a > >>>ithread_loop(c5436e80,e942fd48,c5436e80,c05f7c80,0) at > >>> ithread_loop+0x124 fork_exit(c05f7c80,c5436e80,e942fd48) at > >>> fork_exit+0xa4 > >>>fork_trampoline() at fork_trampoline+0x8 > >>>--- trap 0x1, eip = 0, esp = 0xe942fd7c, ebp = 0 --- > >>>db> show locks > >>>db> show pcpu > >>>cpuid = 0 > >>>curthread = 0xc547b180: pid 87 "swi4: clock sio" > >>>curpcb = 0xe942fda0 > >>>fpcurthread = none > >>>idlethread = 0xc53e4a80: pid 14 "idle: cpu0" > >>>APIC ID = 0 > >>>currentldt = 0x28 > >>>spin locks held: > >>>db> ps > >>> pid proc uarea uid ppid pgrp flag stat wmesg wchan > >>>cmd 15955 c5f90400 ebbb1000 0 525 15955 0004002 [Can run] > >>> tcpconnect ... > >>> 87 c548b400 e946e000 0 0 0 000020c [LOCK Giant c5432d00] > >>>swi4: clock sio > >>>db> trace 15955 > >>>sched_switch(c5f91780,c5451d80,6) at sched_switch+0x16f > >>>mi_switch(6,c5451d80,c5451ed0,c5451d80,ebba492c) at mi_switch+0x264 > >>>maybe_preempt(c5451d80) at maybe_preempt+0x156 > >>>sched_add(c5451d80,4,c53e2d80,c5451d80,c5450800) at sched_add+0x153 > >>>setrunqueue(c5451d80,4) at setrunqueue+0xab > >>>ithread_schedule(c53e2d80,1f,c5f91780,c5f91780,2580) at > >>>ithread_schedule+0xb3 > >>>intr_execute_handlers(c53dc0d8,ebba49a8,1f,ebba49f0,c0780ad3) at > >>>intr_execute_ha > >>>ndlers+0xf5 > >>>lapic_handle_intr(4f) at lapic_handle_intr+0x2e > >>>Xapic_isr2() at Xapic_isr2+0x33 > >>>--- interrupt, eip = 0xc0619446, esp = 0xebba49ec, ebp = 0xebba49f0 --- > >>>critical_exit(3f8,ebba4a30,c076f731,c08fa000,0) at critical_exit+0x92 > >>>_mtx_unlock_spin_flags(c08fa000,0,c08174fc,be4) at > >>>_mtx_unlock_spin_flags+0x8d > >>>siocnputc(c08987c0,70) at siocnputc+0xb9 > >>>cnputc(70) at cnputc+0x4d > >>>putchar(70,ebba4b44) at putchar+0x52 > >>>kvprintf(c07fc2ed,c06219b8,ebba4b44,a,ebba4b64) at kvprintf+0x77 > >>>printf(c07fc2ed,c08bf060,c08bf060,100,c07ff91e,ebba4b98) at printf+0x43 > >>>panic(c07ff91e,c07b69b1,c0813aa8,c07ff6af,c5a23ebc) at panic+0xcb > >>>witness_destroy(c5a23ebc,c5a23dec,ebba4bd0,c063f15f,c5a23ebc) at > >>>witness_destroy > >>>+0x40 > >>>mtx_destroy(c5a23ebc,c5e94e00,c5a23ebc,ebba4be8,c063f4fd) at > >>>mtx_destroy+0x7a > >>>sodealloc(c5a23dec,c5a23eb0,c5832bb0,c5a23dec,ebba4c00) at > >>> sodealloc+0xf3 sofree(c5a23dec) at sofree+0x281 > >>>soclose(c5a23dec,c5832bb0,0,ebba4c2c,c05efad0) at soclose+0x24d > >>>soo_close(c5832bb0,c5f91780) at soo_close+0x4b > >>>fdrop_locked(c5832bb0,c5f91780,c53ccf40,0,c07f90b1) at fdrop_locked+0x84 > >>>fdrop(c5832bb0,c5f91780,ebba4c78,c0628c5c,c07f90ba) at fdrop+0x24 > >>>closef(c5832bb0,c5f91780,c5ebc628,0,c07f90b1) at closef+0x1db > >>>close(c5f91780,ebba4d14,1,14b,292) at close+0xe6 > >>>syscall(2f,bfbf002f,bfbf002f,3,bfbfedf7) at syscall+0x213 > >>>Xint0x80_syscall() at Xint0x80_syscall+0x1f > >>>--- syscall (6, FreeBSD ELF32, close), eip = 0x280c154b, esp = > >>>0xbfbfec0c, ebp = > >>> 0xbfbfec48 --- > >>>db> show locks 15955 > >>>exclusive sleep mutex Giant r = 0 (0xc08bdbc0) locked @ > >>>kern/kern_descrip.c:967 > >>> > >>>So on face value, it looks like propagate_priority() is unhappy about > >>> the tcpconnect thread (that was preempted) still holding Giant. > >>> However, given that preemption is supposed to preempt but leave a > >>> thread runnable, this would seem not to be a panic-worthy condition. > >> > >>Yeah, pp doesn't like a thread in TD_CAN_RUN(). That state can only > >> happen if you have already panic'd though. I can fix it to allow > >> TD_CAN_RUN if the kernel has already panic'd though. > > > >Actually, this won't work. Here's the problem: the reason you got to pp() > > is that you blocked on Giant, right. So your ithread is going to block > > after pp() returns. Since the thread at the end of the chain is > > TD_CAN_RUN and due to the weird TDF_INPANIC stuff we have in > > choosethread(), the thread holding the lock that the ithread needs is > > _never_ going to run, so your ithread will never resume. The end result > > is that if you made pp() just return rather than panic'ing in this case, > > you would get a single 'p' character on your output, and then the machine > > would deadlock. At least this way you get into ddb. :-P > > it would be relatively simple to put those 'discarded' threads onto a > single list in the kernel > and 'c' could put them back on the run queue :-) That's a really nasty hack. I almost wonder if we shouldn't just bump the priority of the thread that panics way up instead, or maybe use a dedicated thread for handling panics or something. -- John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200410191745.59493.jhb>