From owner-freebsd-stable@FreeBSD.ORG Mon Jun 8 18:17:40 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E353106566C for ; Mon, 8 Jun 2009 18:17:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.terabit.net.ua (mail.terabit.net.ua [195.137.202.147]) by mx1.freebsd.org (Postfix) with ESMTP id 24C0D8FC13 for ; Mon, 8 Jun 2009 18:17:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from skuns.zoral.com.ua ([91.193.166.194] helo=mail.zoral.com.ua) by mail.terabit.net.ua with esmtps (TLSv1:AES256-SHA:256) (Exim 4.63 (FreeBSD)) (envelope-from ) id 1MDjPc-000OUB-BA; Mon, 08 Jun 2009 21:17:36 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id n58IHTvU008727 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 8 Jun 2009 21:17:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3) with ESMTP id n58IHSuF074237; Mon, 8 Jun 2009 21:17:28 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3/Submit) id n58IHSmY074236; Mon, 8 Jun 2009 21:17:28 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 8 Jun 2009 21:17:28 +0300 From: Kostik Belousov To: Mikolaj Golub Message-ID: <20090608181728.GU1927@deviant.kiev.zoral.com.ua> References: <86d49eocwx.fsf@kopusha.onet> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="mx4ocXf5NbpB7gU8" Content-Disposition: inline In-Reply-To: <86d49eocwx.fsf@kopusha.onet> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.1 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua X-Virus-Scanned: mail.terabit.net.ua 1MDjPc-000OUB-BA 82822130c04edd5771eb70daf723cf82 X-Terabit: YES Cc: freebsd-stable@freebsd.org Subject: Re: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Jun 2009 18:17:40 -0000 --mx4ocXf5NbpB7gU8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jun 08, 2009 at 08:25:50PM +0300, Mikolaj Golub wrote: > After recent upgrade (on June 3) of my 7-stable host (WITNESS is enabled,= the > previous build was Apr 26), I have been experienced panics when starting = some > our home made application: >=20 > Architecture: i386=20 > Architecture Version: 2=20 > Dump Length: 73797632B (70 MB)=20 > Blocksize: 512=20 > Dumptime: Mon Jun 8 15:29:14 2009=20 > Hostname: zhuzha.ua1=20 > Magic: FreeBSD Kernel Dump=20 > Version String: FreeBSD 7.2-STABLE #18: Wed Jun 3 14:28:49 EEST 2009= =20 > root@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG=20 > Panic String: Assertion lock =3D=3D sq->sq_lock failed at /usr/src/sys/= kern/subr_sleepqueue.c:327=20 > Dump Parity: 1277108796=20 > Bounds: 18=20 > =20 > Narrowing down the problem I have written a simple test program (in attac= hes) > that models in simplified way the behaviour or our application and reprod= uces > the crash. There are two threads. One of the threads is doing vfork/exec = to > start child process while the other is monitoring the status of the child > calling wait4(). >=20 > At the moment of the crash the parent vfork thread is sleeping on > wchan=3D0xc4d17ae0 (td->td_proc) with lock=3D0xc4d178b8 waiting for the c= hild to > exec. The parent wait4 thread with lock=3D0xc4d17b70 and the same wchan c= alls > sleepq_lookup(wchan) to check if the wait channel already have a sleep qu= eue > associated with it, finds the queue of the vfork thread, tries to insert = the > current thread into this sleep queue and fails on assertion lock=3D=3Dsq-= >sq_lock. >=20 > wait4 treead: >=20 > (kgdb) thr 128 > [Switching to thread 128 (Thread 100164)]#0 doadump () at pcpu.h:196 > 196 in pcpu.h > (kgdb) bt > #0 doadump () at pcpu.h:196 > #1 0xc0496509 in db_fncall (dummy1=3D-1061924641, dummy2=3D0, dummy3=3D-= 1, dummy4=3D0xe69a089c "") > at /usr/src/sys/ddb/db_command.c:516 > #2 0xc0496abf in db_command (last_cmdp=3D0xc0c9ab54, cmd_table=3D0x0, do= pager=3D0) > at /usr/src/sys/ddb/db_command.c:413 > #3 0xc0496b34 in db_command_script (command=3D0xc0c9baa4 "call doadump") > at /usr/src/sys/ddb/db_command.c:484 > #4 0xc049a3a0 in db_script_exec (scriptname=3D0xe69a09a8 "kdb.enter.pani= c", warnifnotfound=3DVariable "warnifnotfound" is not available. > ) > at /usr/src/sys/ddb/db_script.c:302 > #5 0xc049a47d in db_script_kdbenter (eventname=3D0xc0b8fa85 "panic") at = /usr/src/sys/ddb/db_script.c:324 > #6 0xc0498388 in db_trap (type=3D3, code=3D0) at /usr/src/sys/ddb/db_mai= n.c:227 > #7 0xc0810d36 in kdb_trap (type=3D3, code=3D0, tf=3D0xe69a0ae0) at /usr/= src/sys/kern/subr_kdb.c:524 > #8 0xc0add6ab in trap (frame=3D0xe69a0ae0) at /usr/src/sys/i386/i386/tra= p.c:688 > #9 0xc0ac159b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 > #10 0xc0810eba in kdb_enter_why (why=3D0xc0b8fa85 "panic", msg=3D0xc0b8fa= 85 "panic") at cpufunc.h:60 > #11 0xc07e4216 in panic (fmt=3D0xc0b5ecd3 "Assertion %s failed at %s:%d") > at /usr/src/sys/kern/kern_shutdown.c:557 > #12 0xc08194dc in sleepq_add (wchan=3D0xc4d17ae0, lock=3D0xc4d17b70, wmes= g=3D0xc0b959c1 "wait", flags=3D256,=20 > queue=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:327 > #13 0xc07ec94b in _sleep (ident=3D0xc4d17ae0, lock=3DVariable "lock" is n= ot available. > ) at /usr/src/sys/kern/kern_synch.c:203 > #14 0xc07bd8e6 in kern_wait (td=3D0xc41e8480, pid=3D-1, status=3D0xe69a0c= 2c, options=3DVariable "options" is not available. > ) > at /usr/src/sys/kern/kern_exit.c:902 > #15 0xc07bd95b in wait4 (td=3D0xc41e8480, uap=3D0xe69a0cfc) at /usr/src/s= ys/kern/kern_exit.c:688 > #16 0xc0adce23 in syscall (frame=3D0xe69a0d38) at /usr/src/sys/i386/i386/= trap.c:1090 > #17 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception= .s:255 > #18 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) fr 12 > #12 0xc08194dc in sleepq_add (wchan=3D0xc4d17ae0, lock=3D0xc4d17b70, wmes= g=3D0xc0b959c1 "wait", flags=3D256,=20 > queue=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:327 > 327 MPASS(lock =3D=3D sq->sq_lock); > (kgdb) p lock > $1 =3D (struct lock_object *) 0xc4d17b70 > (kgdb) p sq->sq_lock > $2 =3D (struct lock_object *) 0xc4d178b8 > (kgdb) p *lock > $3 =3D {lo_name =3D 0xc0b8e532 "process lock", lo_type =3D 0xc0b8e532 "pr= ocess lock", lo_flags =3D 21168128,=20 > lo_witness_data =3D {lod_list =3D {stqe_next =3D 0xc0ce4840}, lod_witne= ss =3D 0xc0ce4840}} > (kgdb) p *sq->sq_lock > $4 =3D {lo_name =3D 0xc0b8e532 "process lock", lo_type =3D 0xc0b8e532 "pr= ocess lock", lo_flags =3D 21168128,=20 > lo_witness_data =3D {lod_list =3D {stqe_next =3D 0xc0ce4840}, lod_witne= ss =3D 0xc0ce4840}} >=20 > vfork thread of the parent process: >=20 > (kgdb) thr 127 > [Switching to thread 127 (Thread 100162)]#0 sched_switch (td=3D0xc41e86c= 0, newtd=3DVariable "newtd" is not available. > ) > at /usr/src/sys/kern/sched_ule.c:1944 > 1944 cpuid =3D PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=3D0xc41e86c0, newtd=3DVariable "newtd" is not availa= ble. > ) at /usr/src/sys/kern/sched_ule.c:1944 > #1 0xc07ec4c3 in mi_switch (flags=3DVariable "flags" is not available. > ) at /usr/src/sys/kern/kern_synch.c:444 > #2 0xc0818bd2 in sleepq_switch (wchan=3D0xc4d17ae0) at /usr/src/sys/kern= /subr_sleepqueue.c:497 > #3 0xc0819800 in sleepq_wait (wchan=3D0xc4d17ae0) at /usr/src/sys/kern/s= ubr_sleepqueue.c:580 > #4 0xc07eca69 in _sleep (ident=3D0xc4d17ae0, lock=3D0xc4d178b8, priority= =3D92, wmesg=3D0xc0b8bacb "ppwait",=20 > timo=3D0) at /usr/src/sys/kern/kern_synch.c:230 > #5 0xc07c0631 in fork1 (td=3D0xc41e86c0, flags=3D-2147483596, pages=3D0,= procp=3D0xe699ac78) > at /usr/src/sys/kern/kern_fork.c:747 > #6 0xc07c07c9 in vfork (td=3D0xc41e86c0, uap=3D0xe699acfc) at /usr/src/s= ys/kern/kern_fork.c:124 > #7 0xc0adce23 in syscall (frame=3D0xe699ad38) at /usr/src/sys/i386/i386/= trap.c:1090 > #8 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception= .s:255 > #9 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) >=20 > And here is the child process: >=20 > (kgdb) thr 129 > [Switching to thread 129 (Thread 100163)]#0 sched_switch (td=3D0xc4d1d90= 0, newtd=3DVariable "newtd" is not available. > ) > at /usr/src/sys/kern/sched_ule.c:1944 > 1944 cpuid =3D PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=3D0xc4d1d900, newtd=3DVariable "newtd" is not availa= ble. > ) at /usr/src/sys/kern/sched_ule.c:1944 > #1 0xc07ec4c3 in mi_switch (flags=3DVariable "flags" is not available. > ) at /usr/src/sys/kern/kern_synch.c:444 > #2 0xc0818bd2 in sleepq_switch (wchan=3D0xc0ccdac4) at /usr/src/sys/kern= /subr_sleepqueue.c:497 > #3 0xc0818e6f in sleepq_catch_signals (wchan=3D0xc0ccdac4) at /usr/src/s= ys/kern/subr_sleepqueue.c:417 > #4 0xc0819647 in sleepq_timedwait_sig (wchan=3D0xc0ccdac4) at /usr/src/s= ys/kern/subr_sleepqueue.c:631 > #5 0xc07eca31 in _sleep (ident=3D0xc0ccdac4, lock=3D0x0, priority=3D348,= wmesg=3D0xc0b90bcd "nanslp", timo=3D2) > at /usr/src/sys/kern/kern_synch.c:224 > #6 0xc07f3f51 in kern_nanosleep (td=3D0xc4d1d900, rqt=3D0xe699dc64, rmt= =3D0xe699dc6c) > at /usr/src/sys/kern/kern_time.c:373 > #7 0xc07f594f in nanosleep (td=3D0xc4d1d900, uap=3D0xe699dcfc) at /usr/s= rc/sys/kern/kern_time.c:415 > #8 0xc0adce23 in syscall (frame=3D0xe699dd38) at /usr/src/sys/i386/i386/= trap.c:1090 > #9 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception= .s:255 > #10 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) p &td->td_proc.p_mtx > $5 =3D (struct mtx *) 0xc4d178b8 >=20 > Actually, I am not sure that the problem is observed only on the recent > STABLE. It might have not been triggered by our application before the > upgrade. Currently I don't have the system with some old STABLE to check = this > running the test program. This is fixed in HEAD by r185647. I did not merged it to 7, because I do not believe that somebody runs stable with INVARIANTS on :). --mx4ocXf5NbpB7gU8 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkotVbgACgkQC3+MBN1Mb4gTIACgmGEHHMIx/s7s/Qg31guguDrZ YO0An1zFxwnW3/PZCCbE7Vva7Frf+wiF =+RwY -----END PGP SIGNATURE----- --mx4ocXf5NbpB7gU8--