Date: Fri, 14 Nov 2003 18:23:23 +0000
From: Peter Edwards <peter.edwards@openet-telecom.com>
To: John Baldwin <jhb@FreeBSD.org>
Cc: current@freebsd.org
Subject: Re: Fwd: propgagate_priority() crashes: recursive msleep() ??
Message-ID: <3FB51D9B.8090209@openet-telecom.com>
In-Reply-To: <XFMail.20031114113013.jhb@FreeBSD.org>
References: <XFMail.20031114113013.jhb@FreeBSD.org>
John Baldwin wrote:

>On 14-Nov-2003 Peter Edwards wrote:
>
>>(Apologies if this message is a duplicate: The original is AWOL for quite
>>a while now)
>>
>>Hi,
>>I'm getting a crash in propagate priority, as mentioned by a few people recently. Bug reports and
>>comments about it seem to have dropped off, so given that I can reliably reproduce it, I was
>>trying to work out why it's going on.
>>
>>One thing I found quite odd was the following stack trace. It appears that msleep() is being
>>called recursively via cursig() calling stopevent. When msleep calls cursig(), it has temporarily
>>dropped Giant. Surely this is bogus? (This is from a kernel updated in the last few hours)
>>
>>#0 sched_switch (td=0xc4b30780) at /scratch/src/sys/kern/sched_4bsd.c:606
>>#1 0xc050d8db in mi_switch () at /scratch/src/sys/kern/kern_synch.c:514
>>#2 0xc050cf7f in msleep (ident=0xc4dc2bc8, mtx=0xc4dc2b04, priority=92, wmesg=0x0,
>>   timo=0) at /scratch/src/sys/kern/kern_synch.c:255
>>#3 0xc0534255 in stopevent (p=0xc4dc2a98, event=2, val=2)
>>   at /scratch/src/sys/kern/sys_process.c:740
>>#4 0xc0509362 in issignal (td=0xc4b30780) at /scratch/src/sys/kern/kern_sig.c:2082
>>#5 0xc0504eb8 in cursig (td=0xc4b30780) at /scratch/src/sys/sys/signalvar.h:227
>>#6 0xc050d0f2 in msleep (ident=0xc4dc2a98, mtx=0xc4dc2b04, priority=348, wmesg=0x0,
>>   timo=0) at /scratch/src/sys/kern/kern_synch.c:294
>>#7 0xc04eb82f in wait1 (td=0xc4b30780, uap=0xddcd6d10, compat=0)
>>   at /scratch/src/sys/kern/kern_exit.c:766
>>#8 0xc04eab90 in wait4 (td=0x0, uap=0x0) at /scratch/src/sys/kern/kern_exit.c:548
>>#9 0xc06241d0 in syscall (frame=
>>   {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134899628, tf_esi = 134912305, tf_ebp =
>>-1077943784, tf_isp = -573739660, tf_ebx = 772, tf_edx = 135012352, tf_ecx = 13, tf_eax = 7,
>>tf_trapno = 12, tf_err = 2, tf_eip = 134525375, tf_cs = 31, tf_eflags = 646, tf_esp =
>>-1077943812, tf_ss = 47}) at /scratch/src/sys/i386/i386/trap.c:1010
>>
>
>Are you using gdb or something else that does ptrace? Jeff has pointed
>out why pp panics here, because this thread owns the sigacts lock while
>asleep. However, doing a double sleep like this is very bogus and bad.
>Grrrr.
>

I was using "truss": the actual command I ran was

# truss mount unreachablehost:/mnt /mnt

(where "unreachablehost" was the IP address of a host I had no route to)

IIRC, the panicking thread was in softclock (possibly handling the
terminal ^C, not sure), the mount command was waiting on the mount_nfs
child to finish, and I assume the mount_nfs child was waiting in vain
for a response it was never going to get.

But, I suppose any traced process arriving in msleep (or cursig) is
problematic.

Silly question: Could the STOPEVENT stuff in issignal() just be delayed
until userret()? I thought that was done for some other similar
circumstances.