Date: Tue, 9 Sep 2008 17:15:32 -0400
From: "Dennis R. Kolpanen"
To: freebsd-scsi@freebsd.org
Subject: iscsi initiator - system hang

On a FreeBSD 7.0 system, certain commands issued against any of the three mounted iSCSI drives cause the system to "hang".
In this context, "hang" means:

  - the system continues to respond to pings
  - sendmail stops accepting connections
  - imapd stops accepting connections
  - sshd stops accepting connections
  - shell sessions already established stop accepting commands
  - login at the system console is not possible

So far, the problem has been triggered by dump, restore, and pax. These commands work perfectly when they are directed at one of the internal drives rather than the iSCSI drives.

The failures described above do not normally happen immediately after one of these commands is issued; they seem to build up over a period of minutes or tens of minutes. Note that the dump/restore/pax commands can take hours to run. Nothing is written to the system console or to any of the log files indicating a problem. The only way to recover from the hang is the hardware reset button.

When the system was first being set up and no other users were on it, shutting down all but one of the CPUs with sysctl and "machdep.hlt_cpus" allowed about 150 GB to be restored to the three iSCSI drives. Massive hardware problems on an old server then forced this machine into production immediately, and now that it is in production this trick no longer works.

An overview of the hardware involved:

  - dual quad-core Intel Xeon processors
  - 16 GB RAM
  - FreeBSD 7.0 amd64 release
  - Network Appliance FAS2020 SAN
  - generic kernel

A complete dmesg output can be provided if desired.

By default, iscontrol creates the iSCSI drives with the number of tags set to one, and the performance of the iSCSI drives with this default setting was quite poor. Based on a recommendation made on a mailing list some time ago, /etc/iscsi.conf was changed to set the tags to 128. This improved iSCSI performance dramatically.

Testing on the system that was rushed into production is not really possible. However, within the next week or so a nearly identical system should become available, and it could be used for testing.

Any ideas on what could be wrong?
Any solutions?

Thanks for your help.

Dennis R. Kolpanen
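For anyone trying to reproduce the single-CPU workaround described above, the value written to machdep.hlt_cpus is, as far as we understand it, a bitmap in which bit n set means "halt CPU n" (verify on your own system with `sysctl -d machdep.hlt_cpus`). The sketch below assumes that semantics, assumes 8 CPUs for the dual quad-core Xeons (check `hw.ncpu`), and keeps CPU 0 running. The helper name `hlt_cpus_mask` is hypothetical and not part of any FreeBSD tool.

```python
from typing import Iterable


def hlt_cpus_mask(ncpus: int, keep: Iterable[int] = (0,)) -> int:
    """Build a hypothetical machdep.hlt_cpus value that halts every CPU not in `keep`.

    Assumption: bit n of the bitmap set means CPU n is halted.
    """
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    keep_set = set(keep)
    mask = 0
    for cpu in range(ncpus):
        if cpu not in keep_set:
            mask |= 1 << cpu  # mark this CPU to be halted
    return mask


if __name__ == "__main__":
    # Assumed: dual quad-core Xeons report 8 CPUs; leave only CPU 0 running.
    mask = hlt_cpus_mask(8)
    print(mask, hex(mask))  # 254 0xfe
    print(f"sysctl machdep.hlt_cpus={mask}")
```

Keeping CPU 0 by default reflects the usual expectation that the boot processor cannot be halted; passing `keep=(0, 1)` would leave two CPUs running instead.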