Date: Wed, 23 Jul 2008 11:23:13 -0400
From: John Baldwin <jhb@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: Mikhail Teterin <mi+mill@aldan.algebra.com>, Kris Kennaway <kris@freebsd.org>, stable@freebsd.org
Subject: Re: "sleeping without queue" ?
Message-ID: <200807231123.14229.jhb@freebsd.org>
In-Reply-To: <20080723120348.GJ17123@deviant.kiev.zoral.com.ua>
References: <48860725.9050808@aldan.algebra.com> <48863C3D.7090401@aldan.algebra.com> <20080723120348.GJ17123@deviant.kiev.zoral.com.ua>

On Wednesday 23 July 2008 08:03:48 am Kostik Belousov wrote:
> On Tue, Jul 22, 2008 at 03:59:57PM -0400, Mikhail Teterin wrote:
> > Kostik Belousov wrote:
> > >On Tue, Jul 22, 2008 at 03:26:29PM -0400, Mikhail Teterin wrote:
> > >>Kostik Belousov wrote:
> > >>>Did you switched to the process before doing backtrace (using the
> > >>>proc <pid> command)?
> > >>Ok, thanks. Did not know about this one. Here:
> > >>...
> > >>(kgdb) proc 79759
> > >>(kgdb) bt
> > >>#0 sched_switch (td=0xffffff01286dc000, newtd=0xffffff00010ce000, flags=2) at /var/src/sys/kern/sched_4bsd.c:928
> > >>#1 0x0000000000000000 in ?? ()
> > >>#2 0xffffffff802f1108 in mi_switch (flags=678281216, newtd=0x2) at /var/src/sys/kern/kern_synch.c:442
> > >>#3 0xffffffff80318513 in sleepq_check_timeout () at /var/src/sys/kern/subr_sleepqueue.c:519
> > >>#4 0xffffffff80318c85 in sleepq_timedwait (wchan=0xffffffff80688408) at /var/src/sys/kern/subr_sleepqueue.c:597
> > >>#5 0xffffffff802f16a2 in _sleep (ident=0xffffffff80688408, lock=0x0, priority=0, wmesg=0xffffffff804f3059 "vmo_de", timo=1) at /var/src/sys/kern/kern_synch.c:224
> > >>#6 0xffffffff8043036b in vm_object_deallocate (object=0xffffff0053024a90) at /var/src/sys/vm/vm_object.c:509
> > >From this frame, please, print the object (like p *object) and
> > >likewise, print the object that is at the head of the object->shadow_head
> > >list.
> > kgdb /usr/obj/var/src/sys/SILVER-SMP/kernel.debug /dev/mem
> > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd".
> > There is no member named pathname.
> > Reading symbols from /opt/modules/fuse.ko...done.
> > Loaded symbols for /opt/modules/fuse.ko
> > Reading symbols from /opt/modules/rtc.ko...done.
> > Loaded symbols for /opt/modules/rtc.ko
> > Reading symbols from /boot/kernel/snd_ich.ko...Reading symbols from /boot/kernel/snd_ich.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/snd_ich.ko
> > Reading symbols from /boot/kernel/msdosfs.ko...Reading symbols from /boot/kernel/msdosfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/msdosfs.ko
> > #0 0x0000000000000000 in ?? ()
> > (kgdb) frame 6
> > Error accessing memory address 0x0: Bad address.
> > (kgdb) pid 79759
> > Undefined command: "pid". Try "help".
> > (kgdb) proc 79759
> > (kgdb) frame 6
> > #6 0xffffffff8043036b in vm_object_deallocate (object=0xffffff0053024a90) at /var/src/sys/vm/vm_object.c:509
> > 509 pause("vmo_de", 1);
> > (kgdb) p *object
> > $1 = {mtx = {lock_object = {lo_name = 0xffffffff804f21c4 "vm object", lo_type = 0xffffffff804f3018 "standard object", lo_flags = 21168128, lo_witness_data = {
> >   lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, object_list = {tqe_next = 0xffffff0005018a90,
> >   tqe_prev = 0xffffff00539a6850}, shadow_head = {lh_first = 0xffffff005d3afa90}, shadow_list = {le_next = 0x0, le_prev = 0xffffff005d2cd048}, memq = {
> >   tqh_first = 0xffffff007eb9fa58, tqh_last = 0xffffff007f864820}, root = 0xffffff007ee14d38, size = 427, generation = 66, ref_count = 2, shadow_count = 1,
> >   type = 0 '\0', flags = 256, pg_color = 0, paging_in_progress = 0, resident_page_count = 44, backing_object = 0x0, backing_object_offset = 0, pager_object_list = {
> >   tqe_next = 0x0, tqe_prev = 0x0}, cache = 0x0, handle = 0x0, un_pager = {vnp = {vnp_size = 576646}, devp = {devp_pglist = {tqh_first = 0x8cc86,
> >   tqh_last = 0x0}}, swp = {swp_bcount = 576646}}}
> > (kgdb) p (object->shadow_head)
> > $2 = {lh_first = 0xffffff005d3afa90}
> > (kgdb) p *object->shadow_head.lh_first
> > $3 = {mtx = {lock_object = {lo_name = 0xffffffff804f21c4 "vm object", lo_type = 0xffffffff804f3018 "standard object", lo_flags = 21168128, lo_witness_data = {
> >   lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, object_list = {tqe_next = 0xffffff0066c32340,
> >   tqe_prev = 0xffffff012f673ac0}, shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev = 0xffffff0053024ad0}, memq = {
> >   tqh_first = 0xffffff007779f9a0, tqh_last = 0xffffff0077c04140}, root = 0xffffff0077c04130, size = 387, generation = 3, ref_count = 1, shadow_count = 0,
> >   type = 0 '\0', flags = 8452, pg_color = 0, paging_in_progress = 0, resident_page_count = 2, backing_object = 0xffffff0053024a90, backing_object_offset = 163840,
> >   pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, cache = 0x0, handle = 0x0, un_pager = {vnp = {vnp_size = 365278}, devp = {devp_pglist = {
> >   tqh_first = 0x592de, tqh_last = 0x0}}, swp = {swp_bcount = 365278}}}
> >
> >
> > >
> > >Another question is what scheduler do you use ?
> > options SCHED_4BSD # 4BSD scheduler
> > options PREEMPTION # Enable kernel thread preemption
> The state of the both object being destroyed and the object that is shadowed
> looks right for me. Moreover, the shadowed object is not locked, value
> of the mtx_lock is 4. It seems as if the thread missed the wakeup
> owed by pause.
>
> John, could it be that the following commit is supposed to fix the issue ?
>
> r179974 | jhb | 2008-06-24 22:36:33 +0300 (Tue, 24 Jun 2008) | 3 lines
>
> MFC: Change the roundrobin implementation in the 4BSD scheduler to trigger a
> userland preemption directly from hardclock() via sched_clock()

I don't think this would fix the issue. This patch fixed problems where you
had a thread pinned to another CPU that held a lock (typically Giant) that a
callout handler run from softclock needed. This prevented the 'roundrobin'
callout from running which would force all the CPUs to do a context switch
(normally this would have forced the pinned thread holding the lock to
eventually run). This involves threads on the run queue not getting to run,
even though they may have a higher priority than what is running now.

I think this case is still a lingering bug in the sleep queue code since the
thread lock stuff went in. There have been several reports of it but I have
been unable to figure out how the wakeup is being missed.

> > >>>Also, show the output of ps axl <pid>.
> > >> UID   PID  PPID CPU PRI NI VSZ RSS MWCHAN STAT TT    TIME COMMAND
> > >>   0 79759 79758   0  96  0   0  16 -      DE+  p6 0:00,00 /bin/tcsh -fc
> > >>/meow/ports/editors/openoffice.org-3/work/BEB300_m3/solver/300/unxfbsdx.pro/bin/ma
> > >
> > >It makes sense to show the whole ps axl output.
> > See http://aldan.algebra.com/~mi/tmp/ps-axl.txt -- I edited it for
> > privacy a little bit, but process-states are intact.
> > The java-processes in the linuxf have remained unkillable for weeks now
> > -- I even forgot about them. But those are linuxulator problems, whereas
> > the tcsh is native...
> It seems that pid 63930 is problematic too ?
>

-- 
John Baldwin
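
[Annotation, not part of the original message.] The quoted remark that "the shadowed object is not locked, value of the mtx_lock is 4" can be checked by hand with a small user-space sketch. The flag values below are an assumption: they are believed to mirror the MTX_* definitions in sys/mutex.h of this era, and should be verified against the tree in use. All `sketch_` names are hypothetical and exist only for this illustration; this is not kernel code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED flag values, believed to match sys/mutex.h; verify locally. */
#define SKETCH_MTX_RECURSED  ((uintptr_t)0x1)
#define SKETCH_MTX_CONTESTED ((uintptr_t)0x2)
#define SKETCH_MTX_UNOWNED   ((uintptr_t)0x4)
#define SKETCH_MTX_FLAGMASK \
    (SKETCH_MTX_RECURSED | SKETCH_MTX_CONTESTED | SKETCH_MTX_UNOWNED)

/* Hypothetical helper: owner thread pointer bits, or 0 when none are set. */
static uintptr_t
sketch_mtx_owner(uintptr_t mtx_lock)
{
    return (mtx_lock & ~SKETCH_MTX_FLAGMASK);
}

/* Hypothetical helper: true when the word is exactly the "free" cookie. */
static int
sketch_mtx_is_unowned(uintptr_t mtx_lock)
{
    return (mtx_lock == SKETCH_MTX_UNOWNED);
}

/* Hypothetical helper: format a one-line description into buf. */
static void
sketch_mtx_describe(uintptr_t mtx_lock, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (sketch_mtx_is_unowned(mtx_lock))
        snprintf(buf, len, "mtx_lock=%#" PRIxPTR ": unowned", mtx_lock);
    else
        snprintf(buf, len, "mtx_lock=%#" PRIxPTR ": owner=%#" PRIxPTR,
            mtx_lock, sketch_mtx_owner(mtx_lock));
}
```

Under these assumed values, passing the `mtx_lock = 4` seen in both `p *object` dumps to `sketch_mtx_is_unowned()` reports the mutex as free, which agrees with the reading in the quoted text that neither object is held locked.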