Date: Sun, 10 Jan 1999 13:28:25 -0500 (EST)
From: Luoqi Chen <luoqi@watermarkgroup.com>
To: dg@FreeBSD.ORG, syssgm@dtir.qld.gov.au
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: Hangs on "inode" and "thrd_sleep"
Message-ID: <199901101828.NAA07251@lor.watermarkgroup.com>

I ran into the same problem before. If you search the -current archive for
"deadlock", you will find a patch I posted to solve this problem (I should
have filed a PR for this). There are more serious problems with locking in
vm_fault(), which are more difficult to fix; see PR 8416 by Tor Egge.

-lq

> My test machine hung last night during 'make -j5 buildworld' with 7 processes
> in "thrd_sleep" and 2 in "inode". Thus began a marathon DDB session
> (punctuated by some reluctant sleep).
>
> The machine is a 486DX2/66 with 16Mb ram, AHA1542CF, 1Gb hard disk, kernel
> from 29/12/98, compiling current from yesterday, elf binaries, elf kernel,
> softupdates. No NFS involved. Plenty of swap, and with only 16Mb ram and
> parallel builds it does an awful lot of paging.
>
> The last visible bit of the compilation log went like this:
>
> cc -fpic -DPIC ... alias_util.so
> building profiled alias library
> building standard alias library
> building shared alias library (version 2)
>
> Since it was a parallel make, possibly all 3 library builds are running in
> parallel. Certainly there are 3 tsort and 3 nm processes active (well, they
> would be if the whole thing wasn't wedged).
>
> The processes in "thrd_sleep" are trying to lock exec_map. Exec_map has
> 1 shared lock, 7 waiting, and LK_NOPAUSE LK_SHARE_NON_ZERO LK_WAIT_NON_ZERO
> and LK_WANT_EXCL set. Where's the missing process with the shared lock?
>
> The processes in "inode" are trying to lock the inode that refers to the
> vnode that is "/usr/obj/elf/usr/src/tmp/usr/bin/sed". There is 1 shared lock
> and 2 waiting, and LK_SHARE_NON_ZERO LK_WAIT_NON_ZERO and LK_WANT_EXCL set.
> Similarly, where is the missing process with the shared lock?
>
> Well, the exec_map contains 6 entries. Three are largish and must be from
> argument copying. The other 3 are single pages, and must come from that
> peculiar double-mapping-of-the-text-data-boundary bit in elf_load_section().
> Two of these pages are from the same "sed" vnode that the processes stuck
> in "inode" want. Of course, what I really should be saying is that the
> same page is in exec_map in two places.
>
> The problem was not lack of free pages. The free list has hundreds of
> free pages.
>
> I'd like to say I've got to the bottom of all this and add another one
> line patch to the kernel, but I've run out of puff. I'll be leaving the
> machine on (and stuck) for a while and will try again to determine the
> root cause.
>
> But I will ask: What is likely to happen if two processes attempt to
> exec the same binary at the same time and the binary is not in core?
>
> The only place I can find that issues a shared lock on exec_map is the
> vm_fault() (via vm_map_lookup()) to fill that double mapped text/data page.
> Everything else seems to want an exclusive lock. Thus I point my finger
> vaguely in the direction of the elf exec code and yell "Witch! Burn her!"
>
> What else can I discover from my hung 486 that could help diagnose this?
> I've only got DDB and stupidly disconnected my serial console setup.
>
> Stephen.
>
> PS Finding the name of a vnode from the name cache using ddb is slow and
> painful. What's the easy way?
>
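
The lock state reported above (one shared hold, sleepers queued, LK_WANT_EXCL
set) is consistent with, though not proof of, a shared holder that has gone
back to sleep on the lock it already holds. The sketch below is a toy Python
model, not lockmgr code: the ToyLock class, its method names, and the process
names "A" and "B" are all hypothetical, and it assumes that a pending
exclusive request blocks every new shared request, including one made by a
current shared holder.

```python
class ToyLock:
    """Toy shared/exclusive lock with priority for pending exclusive requests."""

    def __init__(self):
        self.shared_holders = []  # one entry per shared hold currently granted
        self.excl_holder = None   # process holding the lock exclusively, if any
        self.want_excl = False    # an exclusive request is pending
        self.waiters = []         # (process, kind) pairs that are asleep

    def lock_shared(self, proc):
        # Assumption: a pending exclusive request blocks new shared requests,
        # even when the requester already holds the lock shared.
        if self.want_excl or self.excl_holder is not None:
            self.waiters.append((proc, "shared"))
            return False
        self.shared_holders.append(proc)
        return True

    def lock_exclusive(self, proc):
        # An exclusive request has to wait until every shared hold is released.
        if self.shared_holders or self.excl_holder is not None:
            self.want_excl = True
            self.waiters.append((proc, "exclusive"))
            return False
        self.excl_holder = proc
        return True

    def is_wedged(self):
        # Nobody can release the lock if every shared holder is itself asleep.
        asleep = {proc for proc, _ in self.waiters}
        holders = set(self.shared_holders)
        return bool(holders) and holders <= asleep


def replay_hang():
    """Replay one hypothetical interleaving that wedges the toy lock."""
    lock = ToyLock()
    lock.lock_shared("A")     # A takes the lock shared
    lock.lock_exclusive("B")  # B wants it exclusively and goes to sleep
    lock.lock_shared("A")     # A asks again and sleeps behind B's request
    return lock


if __name__ == "__main__":
    lock = replay_hang()
    print(len(lock.shared_holders), len(lock.waiters),
          lock.want_excl, lock.is_wedged())
```

Replaying the three steps leaves one shared hold, two sleepers and want_excl
set, with the only holder among the sleepers. That is one possible answer to
"where is the missing process with the shared lock?"; whether it is what
happened on the hung 486 is not established in this thread.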