Date: Fri, 19 Aug 2011 18:44:37 +0200
From: Attilio Rao <attilio@freebsd.org>
To: Andrew Boyer <aboyer@averesystems.com>
Cc: freebsd-stable@freebsd.org, Eugene Grosbein <egrosbein@rdtc.ru>, Vishal.Shah@netapp.com, Andriy Gapon <avg@freebsd.org>, Hans Petter Selasky <hselasky@c2i.net>, Jeremiah Lott <jlott@averesystems.com>, Steven Hartland <killing@multiplay.co.uk>
Subject: Re: USB/coredump hangs in 8 and 9
Message-ID: <CAJ-FndD6SyzNSG9whzz%2BzAeXO4mTmRbD8uU4ttNXJhDobdeG-g@mail.gmail.com>
In-Reply-To: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
References: <DA1FD6FD-2E57-4EC4-899D-2C1CBB769456@averesystems.com>
2011/8/12 Andrew Boyer <aboyer@averesystems.com>:
> Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
> Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
> Re: System hang in USB umass module while processing panic (originally on freebsd-usb)
>
> Hello Andriy and Hans,
>
> Sorry for tying in so many discussions on this topic, but I think I have
> an explanation for the problems we have been reporting* with hanging
> coredumps on multicore systems on 8.2-RELEASE, and it has implications
> for Andriy's proposed scheduler patch** and for USB.
>
> In today's 8.X and 9.X branches, nothing that I can find stops the other
> CPUs when the kernel panics, but many parts of the locking code get
> disabled (grep on 'panicstr'). The 'bufwrite: buffer is not busy???'
> panic is caused by the syncer encountering an error. If that happens
> when it's on the dumping CPU everything hangs. If it's running on a
> different CPU, it will be blocked and hidden by the panic_cpu spinlock
> in panic(), and the dump continues, polling every attached keyboard for
> a Ctl-C.
>
> But, the new 8.X USB stack relies on multithreading. (The new stack is
> the variable that broke coredumps for us in the 7.1->8.2 transition, I
> think.) SVN 224223 fixes a hang that would happen when dumpsys() polls
> the USB keyboard (IPMI KVM, in our case). That helps, but it only gets
> as far as usb_process(), where it hangs in a loop around a cv_wait()
> call. This is easy to reproduce by adding code to the watchdog to break
> into the debugger if panicstr is set.
>
> I am experimenting with Andriy's patch** to stop the scheduler and it
> seems to be most of the way there, stopping the CPUs and disabling the
> rest of locking. There are a few places that still reference panicstr,
> but that's minor.
> These are the changes I made to the patch:
>  * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED()
>    is true, so that we don't hang up in USB. ukbd_yield() locks up in
>    DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll()
>    locks up trying to drop mutexes.
>  * Changed the call to spinlock_enter() back to critical_enter(), so
>    that interrupts stay enabled and the hardclock still functions.

Which spinlock_enter() are you referring to here?
I think that having fast interrupt handlers running during panic/shutdown
is something we should avoid like hell.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein