Date:      Mon, 16 Mar 1998 08:37:16 +1100 (EST)
From:      Peter Jeremy <Peter.Jeremy@alcatel.com.au>
To:        freebsd-hackers@FreeBSD.ORG
Subject:   IDE+LPIP causing random lockups
Message-ID:  <199803152137.IAA04231@gsms01.alcatel.com.au>

I have been having occasional lockups on my main system for some time
now (they occur with all 2.2.x releases; I can't recall whether 2.1.x
was affected).  I am wondering whether anyone else has seen something
similar.

Configuration:
main machine: VLB-based 486DX2-50 with 2 IDE disks (32-bit, multiblock 16)
	on 1 (VLB) controller, running 2.2.5-R.
2nd machine: Toshiba T1850 (386SX25 with 1 IDE disk) running 2.2.1-R

The machines are joined by a laplink cable and use LPIP.

The symptoms are that my main machine (only) locks up and needs a
reset to recover.  The laptop has never been affected.  The problem
only occurs when there is LPIP activity between the machines, and seems
to correlate with disk activity on the main machine and with using
ssh to transfer data.  Running XFree86 also seems to make it worse
(but this might just be the increased disk activity associated with
running X11 in less than infinite RAM).  Note that between talking to
IDE disks and communicating with a slow host via LPIP, the machine
spends a lot of time inside interrupts - often over 30% according to
top.

Having (finally) gotten around to loading a kernel with DDB, I find
that the lockup appears to be caused by sbcompress() attempting to
add an mbuf to itself - sbappend() is called to append an mbuf to a
sockbuf that already includes that mbuf in its mbuf list.  When this
is passed to sbcompress(), it winds up in an infinite loop.
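
To illustrate the failure mode: appending a buffer to a singly linked
chain that already contains it turns the chain into a cycle, so any
loop that walks to the end never terminates.  The following is a
minimal userland sketch only, not the kernel code; struct toy_mbuf,
chain_contains() and checked_append() are hypothetical names, and the
existing chain is assumed to be well formed (NULL-terminated).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *m_next;   /* next buffer in the chain */
    int              m_len;    /* bytes of data held (unused in this sketch) */
};

/* Return 1 if m is already linked into the chain starting at head.
 * Assumes the chain is NULL-terminated, i.e. not yet corrupted. */
static int
chain_contains(const struct toy_mbuf *head, const struct toy_mbuf *m)
{
    for (; head != NULL; head = head->m_next)
        if (head == m)
            return 1;
    return 0;
}

/* Append m to the end of the chain.  Return -1 without touching the
 * chain if m is already in it, since linking it again would form a
 * cycle and any later walk to the tail would spin forever. */
static int
checked_append(struct toy_mbuf *head, struct toy_mbuf *m)
{
    struct toy_mbuf *tail = head;

    assert(head != NULL && m != NULL);
    if (chain_contains(head, m))
        return -1;
    while (tail->m_next != NULL)
        tail = tail->m_next;
    tail->m_next = m;
    return 0;
}
```

A check of this kind costs a full walk of the chain on every append,
so it is only plausible as a debugging aid, not as a permanent fix.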

My suspicion is that something is continuing to use an mbuf after it
frees it, probably associated with some sort of interrupt window.
I've added some checks in the mbuf code to try and pick this up, but
haven't found the problem yet.  I did notice that the splXXX()
routines don't atomically update cpl, but the associated comments (in
i386/include/spl.h) say this is OK.
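
One common shape for the kind of check mentioned above is to mark a
buffer as free when it is released and to test that mark before every
use or second free.  The sketch below is an assumption about how such
a check might look, not the checks actually added; struct dbg_mbuf,
the DBG_MT_* values, dbg_is_live() and dbg_free() are all hypothetical
userland names.

```c
#include <stddef.h>

/* Hypothetical type tags; the real kernel has its own mbuf type values. */
enum { DBG_MT_FREE = 0, DBG_MT_DATA = 1 };

/* Hypothetical, simplified buffer carrying only what the check needs. */
struct dbg_mbuf {
    struct dbg_mbuf *m_next;
    int              m_type;   /* DBG_MT_FREE once released */
};

/* Return 1 if the buffer may legitimately be used, 0 if it is NULL
 * or has already been freed. */
static int
dbg_is_live(const struct dbg_mbuf *m)
{
    return m != NULL && m->m_type != DBG_MT_FREE;
}

/* Release a buffer.  Return -1 on a double free (or NULL) instead of
 * silently putting the same buffer on the free list twice. */
static int
dbg_free(struct dbg_mbuf *m)
{
    if (!dbg_is_live(m))
        return -1;
    m->m_type = DBG_MT_FREE;   /* mark so later use can be detected */
    m->m_next = NULL;          /* drop any stale link into a chain */
    return 0;
}
```

A mark like this only catches reuse until the buffer is allocated
again, so a short interrupt window can still slip past it.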

I have (less frequently) seen kernel page faults with addresses like
0xf400xxxx and 0x79xxxxxx.  I haven't looked into these yet.  I'm hoping
they are caused by the mbufs being used for two things at once.

Does anyone have any ideas?

Peter
--
Peter Jeremy (VK2PJ)                    peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St                          Phone: +61 2 9690 5019
ALEXANDRIA  NSW  2015                   Fax:   +61 2 9690 5247

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message