From owner-freebsd-hackers Tue Aug 12 18:11:00 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id SAA06097
          for hackers-outgoing; Tue, 12 Aug 1997 18:11:00 -0700 (PDT)
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id SAA06092
          for ; Tue, 12 Aug 1997 18:10:56 -0700 (PDT)
From: Julian Elischer
Received: (from julian@localhost)
          by freefall.freebsd.org (8.8.6/8.8.5) id SAA21060
          for hackers; Tue, 12 Aug 1997 18:10:50 -0700 (PDT)
Date: Tue, 12 Aug 1997 18:10:50 -0700 (PDT)
Message-Id: <199708130110.SAA21060@freefall.freebsd.org>
To: hackers@FreeBSD.ORG
Subject: 2.2.2+ crash.. more info
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

We have several hundred BSD machines here..
we see this one often enough for me to recognise it..

the plot thickens..
I have discovered the following:

1/ the code that crashes:
   scanning the run queues in switch:
   looking at the queue array:

  }, {
    ph_link = 0xf01be350,
    ph_rlink = 0xf01be350
  }, {
    ph_link = 0x0,    <----- a few instructions before, this ALSO was 0xf07e1000
    ph_rlink = 0xf07e1000
  }, {
    ph_link = 0xf01be360,
    ph_rlink = 0xf01be360
  }, {
    ph_link = 0xf01be368,
    ph_rlink = 0xf01be368
  },

one entry is bogus. from the registers we see that its ph_link
was 0xf07e1000 shortly before.

looking at the registers we can see which proc struct is being looked at..

$11 = {
  p_procq = {
    tqe_next = 0x0,
    tqe_prev = 0xf01be7e8
  },
  p_list = {
    le_next = 0xf07e1200,
    le_prev = 0xf07d2808
  },
  ...

this (tqe_next) is where the NULL came from.
but wait! this looks like an entry in a sleep queue..
sure enough! in the array of sleep queues...

  }, {
    tqh_first = 0x0,
    tqh_last = 0xf01be7d8
  }, {
    tqh_first = 0x0,
    tqh_last = 0xf01be7e0
  }, {
    tqh_first = 0xf07e1000,    <--------- !!!!!
    tqh_last = 0xf07e1000
  }, {
    tqh_first = 0x0,
    tqh_last = 0xf01be7f0
  }, {
    tqh_first = 0x0,

so why was this process sleeping?
looking in the proc struct again..
  p_wchan = 0xf272f698,
  p_wmesg = 0xf015bead "swread",

Since the process's proc structure looks like that of a sleeping process,
it was probably put onto the sleep queue last, while it was already on
the runnable queue.

how can this happen?
some ideas:

  it was half way through being woken up when the scheduling occurred,
  and still looked like a sleeping process?
  unlikely..

  it was put onto the sleep queue accidentally by interrupt code
  that just had its proc address by accident?
  unlikely.

  somehow a wakeup occurred during the tsleep call?
  sounds unlikely..

code examinations will follow with more info..
if this strikes anyone as familiar, do chime in!

julian
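
To make the suspected failure concrete, here is a small user-space sketch of it. It is not the kernel's code. It assumes, as the dumps above suggest, that the run queue and the sleep queue thread through the same two link words of the proc (p_procq). The union layout and the names p_link, p_runq, runq_insert, runq_unlink_first and runq_head_is_null_after_dequeue are made up for illustration; only the TAILQ macros from <sys/queue.h> are real.

```c
#include <stddef.h>
#include <sys/queue.h>

struct proc;

/* Sleep-queue head as a TAILQ (cf. tqh_first/tqh_last in the dump). */
TAILQ_HEAD(slpquehead, proc);

/* Run-queue head (cf. ph_link/ph_rlink in the dump). */
struct prochd {
	struct proc *ph_link;	/* first proc on the queue */
	struct proc *ph_rlink;	/* last proc on the queue */
};

/* ASSUMPTION: one pair of link words, viewed in two ways. */
struct proc {
	union {
		TAILQ_ENTRY(proc) p_procq;	/* sleep-queue view */
		struct {
			struct proc *p_forw;	/* run-queue view */
			struct proc *p_back;
		} p_runq;
	} p_link;
};

/* Put p on an empty run queue; the head doubles as the list sentinel. */
static void
runq_insert(struct prochd *q, struct proc *p)
{
	p->p_link.p_runq.p_forw = (struct proc *)q;
	p->p_link.p_runq.p_back = (struct proc *)q;
	q->ph_link = p;
	q->ph_rlink = p;
}

/*
 * First half of a dequeue: the head takes over the forward link of the
 * first proc.  Real code would next store through next->p_back, which
 * faults if next is NULL, so this sketch stops here and returns it.
 */
static struct proc *
runq_unlink_first(struct prochd *q)
{
	struct proc *next = q->ph_link->p_link.p_runq.p_forw;

	q->ph_link = next;
	return (next);
}

/*
 * Returns 1 if the run-queue head ends up with a NULL forward link.
 * With also_on_sleep_queue set, the proc is appended to a sleep queue
 * while it is still on the run queue, which overwrites its link words.
 */
int
runq_head_is_null_after_dequeue(int also_on_sleep_queue)
{
	struct prochd runq;
	struct slpquehead slpq;
	struct proc p;

	TAILQ_INIT(&slpq);
	runq_insert(&runq, &p);
	if (also_on_sleep_queue)
		TAILQ_INSERT_TAIL(&slpq, &p, p_link.p_procq);
	runq_unlink_first(&runq);
	return (runq.ph_link == NULL);
}
```

In this model, calling runq_head_is_null_after_dequeue(1) leaves the head with ph_link NULL and ph_rlink still pointing at the proc, while the sleep queue's first entry is the same proc. That matches the shape of the dumps above. Calling it with 0 leaves the head pointing back at itself, as an empty queue should.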