Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 8 Jun 2009 21:17:28 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:327
Message-ID:  <20090608181728.GU1927@deviant.kiev.zoral.com.ua>
In-Reply-To: <86d49eocwx.fsf@kopusha.onet>
References:  <86d49eocwx.fsf@kopusha.onet>

next in thread | previous in thread | raw e-mail | index | archive | help

--mx4ocXf5NbpB7gU8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 08, 2009 at 08:25:50PM +0300, Mikolaj Golub wrote:
> After recent upgrade (on June 3) of my 7-stable host (WITNESS is enabled,=
 the
> previous build was Apr 26), I have been experienced panics when starting =
some
> our home made application:
>=20
>   Architecture: i386=20
>   Architecture Version: 2=20
>   Dump Length: 73797632B (70 MB)=20
>   Blocksize: 512=20
>   Dumptime: Mon Jun  8 15:29:14 2009=20
>   Hostname: zhuzha.ua1=20
>   Magic: FreeBSD Kernel Dump=20
>   Version String: FreeBSD 7.2-STABLE #18: Wed Jun  3 14:28:49 EEST 2009=
=20
>     root@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG=20
>   Panic String: Assertion lock =3D=3D sq->sq_lock failed at /usr/src/sys/=
kern/subr_sleepqueue.c:327=20
>   Dump Parity: 1277108796=20
>   Bounds: 18=20
> =20
> Narrowing down the problem I have written a simple test program (in attac=
hes)
> that models in simplified way the behaviour or our application and reprod=
uces
> the crash. There are two threads. One of the threads is doing vfork/exec =
to
> start child process while the other is monitoring the status of the child
> calling wait4().
>=20
> At the moment of the crash the parent vfork thread is sleeping on
> wchan=3D0xc4d17ae0 (td->td_proc) with lock=3D0xc4d178b8 waiting for the c=
hild to
> exec. The parent wait4 thread with lock=3D0xc4d17b70 and the same wchan c=
alls
> sleepq_lookup(wchan) to check if the wait channel already have a sleep qu=
eue
> associated with it, finds the queue of the vfork thread, tries to insert =
the
> current thread into this sleep queue and fails on assertion lock=3D=3Dsq-=
>sq_lock.
>=20
> wait4 treead:
>=20
> (kgdb) thr 128
> [Switching to thread 128 (Thread 100164)]#0  doadump () at pcpu.h:196
> 196     in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:196
> #1  0xc0496509 in db_fncall (dummy1=3D-1061924641, dummy2=3D0, dummy3=3D-=
1, dummy4=3D0xe69a089c "")
>     at /usr/src/sys/ddb/db_command.c:516
> #2  0xc0496abf in db_command (last_cmdp=3D0xc0c9ab54, cmd_table=3D0x0, do=
pager=3D0)
>     at /usr/src/sys/ddb/db_command.c:413
> #3  0xc0496b34 in db_command_script (command=3D0xc0c9baa4 "call doadump")
>     at /usr/src/sys/ddb/db_command.c:484
> #4  0xc049a3a0 in db_script_exec (scriptname=3D0xe69a09a8 "kdb.enter.pani=
c", warnifnotfound=3DVariable "warnifnotfound" is not available.
> )
>     at /usr/src/sys/ddb/db_script.c:302
> #5  0xc049a47d in db_script_kdbenter (eventname=3D0xc0b8fa85 "panic") at =
/usr/src/sys/ddb/db_script.c:324
> #6  0xc0498388 in db_trap (type=3D3, code=3D0) at /usr/src/sys/ddb/db_mai=
n.c:227
> #7  0xc0810d36 in kdb_trap (type=3D3, code=3D0, tf=3D0xe69a0ae0) at /usr/=
src/sys/kern/subr_kdb.c:524
> #8  0xc0add6ab in trap (frame=3D0xe69a0ae0) at /usr/src/sys/i386/i386/tra=
p.c:688
> #9  0xc0ac159b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
> #10 0xc0810eba in kdb_enter_why (why=3D0xc0b8fa85 "panic", msg=3D0xc0b8fa=
85 "panic") at cpufunc.h:60
> #11 0xc07e4216 in panic (fmt=3D0xc0b5ecd3 "Assertion %s failed at %s:%d")
>     at /usr/src/sys/kern/kern_shutdown.c:557
> #12 0xc08194dc in sleepq_add (wchan=3D0xc4d17ae0, lock=3D0xc4d17b70, wmes=
g=3D0xc0b959c1 "wait", flags=3D256,=20
>     queue=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:327
> #13 0xc07ec94b in _sleep (ident=3D0xc4d17ae0, lock=3DVariable "lock" is n=
ot available.
> ) at /usr/src/sys/kern/kern_synch.c:203
> #14 0xc07bd8e6 in kern_wait (td=3D0xc41e8480, pid=3D-1, status=3D0xe69a0c=
2c, options=3DVariable "options" is not available.
> )
>     at /usr/src/sys/kern/kern_exit.c:902
> #15 0xc07bd95b in wait4 (td=3D0xc41e8480, uap=3D0xe69a0cfc) at /usr/src/s=
ys/kern/kern_exit.c:688
> #16 0xc0adce23 in syscall (frame=3D0xe69a0d38) at /usr/src/sys/i386/i386/=
trap.c:1090
> #17 0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception=
.s:255
> #18 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) fr 12
> #12 0xc08194dc in sleepq_add (wchan=3D0xc4d17ae0, lock=3D0xc4d17b70, wmes=
g=3D0xc0b959c1 "wait", flags=3D256,=20
>     queue=3D0) at /usr/src/sys/kern/subr_sleepqueue.c:327
> 327                     MPASS(lock =3D=3D sq->sq_lock);
> (kgdb) p lock
> $1 =3D (struct lock_object *) 0xc4d17b70
> (kgdb) p sq->sq_lock
> $2 =3D (struct lock_object *) 0xc4d178b8
> (kgdb) p *lock
> $3 =3D {lo_name =3D 0xc0b8e532 "process lock", lo_type =3D 0xc0b8e532 "pr=
ocess lock", lo_flags =3D 21168128,=20
>   lo_witness_data =3D {lod_list =3D {stqe_next =3D 0xc0ce4840}, lod_witne=
ss =3D 0xc0ce4840}}
> (kgdb) p *sq->sq_lock
> $4 =3D {lo_name =3D 0xc0b8e532 "process lock", lo_type =3D 0xc0b8e532 "pr=
ocess lock", lo_flags =3D 21168128,=20
>   lo_witness_data =3D {lod_list =3D {stqe_next =3D 0xc0ce4840}, lod_witne=
ss =3D 0xc0ce4840}}
>=20
> vfork thread of the parent process:
>=20
> (kgdb) thr 127
> [Switching to thread 127 (Thread 100162)]#0  sched_switch (td=3D0xc41e86c=
0, newtd=3DVariable "newtd" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1944
> 1944                    cpuid =3D PCPU_GET(cpuid);
> (kgdb) bt
> #0  sched_switch (td=3D0xc41e86c0, newtd=3DVariable "newtd" is not availa=
ble.
> ) at /usr/src/sys/kern/sched_ule.c:1944
> #1  0xc07ec4c3 in mi_switch (flags=3DVariable "flags" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:444
> #2  0xc0818bd2 in sleepq_switch (wchan=3D0xc4d17ae0) at /usr/src/sys/kern=
/subr_sleepqueue.c:497
> #3  0xc0819800 in sleepq_wait (wchan=3D0xc4d17ae0) at /usr/src/sys/kern/s=
ubr_sleepqueue.c:580
> #4  0xc07eca69 in _sleep (ident=3D0xc4d17ae0, lock=3D0xc4d178b8, priority=
=3D92, wmesg=3D0xc0b8bacb "ppwait",=20
>     timo=3D0) at /usr/src/sys/kern/kern_synch.c:230
> #5  0xc07c0631 in fork1 (td=3D0xc41e86c0, flags=3D-2147483596, pages=3D0,=
 procp=3D0xe699ac78)
>     at /usr/src/sys/kern/kern_fork.c:747
> #6  0xc07c07c9 in vfork (td=3D0xc41e86c0, uap=3D0xe699acfc) at /usr/src/s=
ys/kern/kern_fork.c:124
> #7  0xc0adce23 in syscall (frame=3D0xe699ad38) at /usr/src/sys/i386/i386/=
trap.c:1090
> #8  0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception=
.s:255
> #9  0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>=20
> And here is the child process:
>=20
> (kgdb) thr 129
> [Switching to thread 129 (Thread 100163)]#0  sched_switch (td=3D0xc4d1d90=
0, newtd=3DVariable "newtd" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1944
> 1944                    cpuid =3D PCPU_GET(cpuid);
> (kgdb) bt
> #0  sched_switch (td=3D0xc4d1d900, newtd=3DVariable "newtd" is not availa=
ble.
> ) at /usr/src/sys/kern/sched_ule.c:1944
> #1  0xc07ec4c3 in mi_switch (flags=3DVariable "flags" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:444
> #2  0xc0818bd2 in sleepq_switch (wchan=3D0xc0ccdac4) at /usr/src/sys/kern=
/subr_sleepqueue.c:497
> #3  0xc0818e6f in sleepq_catch_signals (wchan=3D0xc0ccdac4) at /usr/src/s=
ys/kern/subr_sleepqueue.c:417
> #4  0xc0819647 in sleepq_timedwait_sig (wchan=3D0xc0ccdac4) at /usr/src/s=
ys/kern/subr_sleepqueue.c:631
> #5  0xc07eca31 in _sleep (ident=3D0xc0ccdac4, lock=3D0x0, priority=3D348,=
 wmesg=3D0xc0b90bcd "nanslp", timo=3D2)
>     at /usr/src/sys/kern/kern_synch.c:224
> #6  0xc07f3f51 in kern_nanosleep (td=3D0xc4d1d900, rqt=3D0xe699dc64, rmt=
=3D0xe699dc6c)
>     at /usr/src/sys/kern/kern_time.c:373
> #7  0xc07f594f in nanosleep (td=3D0xc4d1d900, uap=3D0xe699dcfc) at /usr/s=
rc/sys/kern/kern_time.c:415
> #8  0xc0adce23 in syscall (frame=3D0xe699dd38) at /usr/src/sys/i386/i386/=
trap.c:1090
> #9  0xc0ac1600 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception=
.s:255
> #10 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) p &td->td_proc.p_mtx
> $5 =3D (struct mtx *) 0xc4d178b8
>=20
> Actually, I am not sure that the problem is observed only on the recent
> STABLE. It might have not been triggered by our application before the
> upgrade. Currently I don't have the system with some old STABLE to check =
this
> running the test program.

This is fixed in HEAD by r185647. I did not merged it to 7, because I
do not believe that somebody runs stable with INVARIANTS on :).

--mx4ocXf5NbpB7gU8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAkotVbgACgkQC3+MBN1Mb4gTIACgmGEHHMIx/s7s/Qg31guguDrZ
YO0An1zFxwnW3/PZCCbE7Vva7Frf+wiF
=+RwY
-----END PGP SIGNATURE-----

--mx4ocXf5NbpB7gU8--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090608181728.GU1927>