From owner-freebsd-hackers Sun Mar 15 13:38:37 1998
Date: Mon, 16 Mar 1998 08:37:16 +1100 (EST)
From: Peter Jeremy
Subject: IDE+LPIP causing random lockups
To: freebsd-hackers@FreeBSD.ORG
Message-id: <199803152137.IAA04231@gsms01.alcatel.com.au>
Sender: owner-freebsd-hackers@FreeBSD.ORG

I have been having occasional lockups on my main system for some time now (they occur with all 2.2.x releases; I can't recall whether 2.1.x was affected). I am wondering if anyone else has seen something similar.

Configuration:
main machine: VLB-based 486DX2-50 with 2 IDE disks (32-bit, multiblock 16) on 1 (VLB) controller, running 2.2.5-R.
2nd machine: Toshiba T1850 (386SX25 with 1 IDE disk) running 2.2.1-R

The machines are joined by a laplink cable and use LPIP.

The symptoms are that my main machine (only) locks up and needs a reset to recover. The laptop has never been affected. The problem only occurs when there is LPIP activity between the machines, and it also seems to correlate with disk activity on the main machine and with using ssh to transfer data. Running XFree86 also seems to make it worse (but this might just be the increased disk activity associated with running X11 in less than infinite RAM).

Note that between talking to IDE disks and communicating with a slow host via LPIP, the machine spends a lot of time inside interrupts - often over 30% according to top.

Having (finally) gotten around to loading a kernel with DDB, I find that the lockup appears to be caused by sbcompress() attempting to add an mbuf to itself: sbappend() is called to append an mbuf to a sockbuf whose mbuf list already includes that mbuf. When this is passed to sbcompress(), it winds up in an infinite loop.

My suspicion is that something is continuing to use an mbuf after freeing it, probably associated with some sort of interrupt window. I've added some checks in the mbuf code to try to pick this up, but haven't found the problem yet. I did notice that the splXXX() routines don't atomically update cpl, but the associated comments (in i386/include/spl.h) say this is OK.

I have (less frequently) seen kernel page faults with addresses like 0xf400xxxx and 0x79xxxxxx. I haven't looked into these yet. I'm hoping they are caused by the mbufs being used for two things at once.

Does anyone have any ideas?

Peter
--
Peter Jeremy (VK2PJ)          peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St                Phone: +61 2 9690 5019
ALEXANDRIA NSW 2015           Fax:   +61 2 9690 5247
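
P.S. For anyone who wants to see the failure mode outside the kernel, here is a minimal userland sketch of what I believe is happening. It is not the real sbappend()/sbcompress() code; struct toy_mbuf and the helper names (chain_contains, naive_append, checked_append, chain_has_cycle) are all made up for illustration. It only shows that linking an mbuf onto a chain that already contains it creates a cycle, so any loop that walks m_next until NULL never terminates, and that a membership check before the append would catch it.

```c
#include <stddef.h>

/* Userland stand-in for a kernel mbuf; only the chain pointer matters here.
 * This is a hypothetical toy type, not the real struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *m_next;
    int              m_len;
};

/* Return 1 if m is already reachable from head, 0 otherwise.
 * Assumes the chain starting at head is currently acyclic. */
static int chain_contains(const struct toy_mbuf *head, const struct toy_mbuf *m)
{
    for (; head != NULL; head = head->m_next)
        if (head == m)
            return 1;
    return 0;
}

/* Naive append: walk to the tail and link m there.
 * If m is already on the chain, the tail now points back into the
 * chain and every "walk until NULL" loop spins forever. */
static void naive_append(struct toy_mbuf *head, struct toy_mbuf *m)
{
    while (head->m_next != NULL)
        head = head->m_next;
    head->m_next = m;
}

/* Debug-style append: refuse to link an mbuf that is already present.
 * Returns 0 on success, -1 if m was already on the chain. */
static int checked_append(struct toy_mbuf *head, struct toy_mbuf *m)
{
    if (chain_contains(head, m))
        return -1;
    naive_append(head, m);
    return 0;
}

/* Floyd's two-pointer cycle check, safe to run on a corrupted chain. */
static int chain_has_cycle(const struct toy_mbuf *head)
{
    const struct toy_mbuf *slow = head, *fast = head;

    while (fast != NULL && fast->m_next != NULL) {
        slow = slow->m_next;
        fast = fast->m_next->m_next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

With a chain a -> b -> c, calling naive_append(&a, &b) sets c.m_next to b, and chain_has_cycle(&a) then reports the loop; checked_append(&a, &b) returns -1 and leaves the chain alone. A check like this is far too slow to leave in a production path, but something similar under a debug option might help narrow down who is handing over the stale mbuf.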