Date: Thu, 23 Mar 2006 12:14:19 +0300 (MSK)
From: Oleg Sharoiko
To: John Baldwin
Cc: freebsd-scsi@freebsd.org, Andrey Beresovsky
Subject: Re: Boot hangs on ips0: resetting adapter, this may take up to 5 minutes
Message-ID: <20060323092034.W795@brain.cc.rsu.ru>
In-Reply-To: <200603131056.09271.jhb@freebsd.org>
References: <20060215102749.D58480@brain.cc.rsu.ru> <200603091113.38474.jhb@freebsd.org> <20060310173625.X3787@brain.cc.rsu.ru> <200603131056.09271.jhb@freebsd.org>

Hi!

On Mon, 13 Mar 2006, John Baldwin wrote:

JB>> To make GENERIC usable it's enough to comment
JB>> options PREEMPTION
JB>> Not sure if this helps much.
JB>It could point to a bug in a driver.

All this time I was doing experiments, but the more I did the less I
understood.
Now I suspect that the problem is not with a particular device, but
rather with a number of devices installed in the system. Behavior
differs depending on hardware setup and kernel configuration. Just a
few examples:

The only configuration I have never seen fail is one with no PCI cards
installed and several devices disabled in the BIOS (mouse, floppy, ATA,
serial ATA). This way the system boots fine with the GENERIC kernel.

As soon as I install an additional SCSI card (Adaptec 29160), SCB
timeouts start happening on the internal SCSI adapter during "Waiting 5
seconds for SCSI devices to settle". The system still boots after
"ahd0: Recovery Initiated - Card was not paused". If I remove the bge
driver from the kernel (keeping the additional SCSI card in the
system), these timeouts go away.

The GENERIC kernel on the system with no PCI cards and all devices
enabled in the BIOS sometimes boots and sometimes hangs with the last
line "lo0: bpf attached". The same happens with a kernel without bge,
except that its chances of booting are higher.

When the ips PCI card is installed, the GENERIC kernel always hangs at
boot. A kernel without bge boots almost every time. On an SMP kernel I
was even able to kldload bge after boot had completed. The same action
on a UP system produces rather strange results. If I boot to
single-user mode and load if_bge, the system returns to the command
prompt, I can edit the command line, and everything looks normal. But
as soon as I try to execute something (I suppose disk I/O is the
trigger, but I'm not sure), the system becomes extremely slow. It takes
about 30 seconds to print a single character on the console. The same
happens if I load if_bge in multiuser mode.

One thing is common to all cases: when the system hangs (or becomes
slow), Ctrl+Alt+Esc does not work, but sending a break on the COM port
still does, and it is possible to get into the kernel debugger.
Unfortunately this doesn't help me. To be honest, I don't think I can
cope with this on my own.
I set up remote gdb for this box, but it gives me nothing because I
lack knowledge of how interrupt delivery works and how interrupt
handling is done in FreeBSD. Would it be possible for you, John, or
maybe for someone else, to look at this box? I can provide full remote
access to it with remote gdb, serial console and IP KVM.

One more thing worth remembering: disabling preemption makes things
normal.

All tests were done with sources checked out with
-r HEAD -D '2006-03-10 15:34:00 UTC'. I have also tested GENERIC built
from fresh src; it has the same problems.

This issue is not specific to SCSI, so I think it would be good to move
to a more appropriate mailing list. It happens on amd64, and not on
i386. Should this conversation be moved to freebsd-amd64? Or maybe
another list?

-- 
Oleg Sharoiko.
Software and Network Engineer
Computer Center of Rostov State University.