From owner-freebsd-current@FreeBSD.ORG Fri Sep 25 09:26:00 2009
Message-ID: <659421F2E6F84E479FDEE3C9AB2E975F@PegaPegII>
From: "Pegasus Mc Cleaft"
To: "James R. Van Artsdalen"
References: <200909212027.49752.ken@mthelicon.com> <4ABC7D4F.3000202@jrv.org>
In-Reply-To: <4ABC7D4F.3000202@jrv.org>
Date: Fri, 25 Sep 2009 10:25:51 +0100
Organization: Feathers
Cc: Alexander Motin, FreeBSD Current
Subject: Re: SIIS timeout with current r197392:
Reply-To: Pegasus Mc Cleaft
List-Id: Discussions about the use of FreeBSD-current

----- Original Message -----
From: "James R. Van Artsdalen"

> Pegasus Mc Cleaft wrote:
>> Hello Current,
>>
>> Since my latest build of amd64-current kernel and world (r197392) I am
>> getting strange timeout errors in my dmesg and eventual system
>> instability.
>
> I believe mav has stated that error handling in SIIS isn't finished or
> is problematic in some way.
>
> I see similar problems: most of the time an error results in a hung
> device, requiring reboot. This usually happens within a TB or two of
> intense I/O: I have not yet seen a 6 TB ZFS pool complete a "scrub" due
> to this.

Hi James,

I believe I found the problem with my machine, and it _was_ my machine.
The device that was hanging is an Asus CD-ROM drive. The error messages
displayed were correct: I had a faulty SATA cable between the controller
and the drive (funny how a SATA cable can go bad spontaneously). Reboots
of the system did not clear the fault, but a full power down and power up
would mask the fault for about an hour, and then it would start throwing
the messages into the log every few seconds. It was this behaviour that
led me to believe it was a problem with the SIIS driver. It wasn't until
a reboot that I noticed the system hung for a little while interrogating
the drive during POST.
After a cable change and a lot of swearing, the computer booted fine and
the error has never reappeared.

Some lessons learned:

1) Debug messages _MAY_ actually be telling the truth! :>
2) Reboots and software resets won't be heard by a SATA device whose
   port has been scrambled by bad cabling.
3) SATA cables may spontaneously disintegrate.
4) Modern computers respond less to threats than my older machines :>

That being said, I have seen the other fault, where a device hangs
during high load / activity. Mine will, if it is going to do it, hang
somewhere between midnight and 3am when I am running maintenance on the
machine (find / -name "*.core" -exec rm {} \; ). It does exactly as you
said: a drive hangs, usually with the activity LED still lit. Sometimes
the machine will continue on and ZFS will carry on in a degraded state.

The odd thing about this is that it only started to do this when I was
having problems with the CD-ROM on the SIIS card. The ZFS drives are on
a completely different controller (JMicron). When the SIIS controller
was waiting for the scrambled port to say hello, all sorts of weird
things would happen. I would get a lock-up of the mouse for 2 seconds,
the keyboard would lock and, if a key was pressed when it happened, it
would trigger the key repeat (much to my amusement while reading email:
hitting the delete key, having the keyboard hang and watching it quickly
run through deleting everything in my in-box :> ).

My query is: when the SIIS controller is hung, waiting for the SATA port
to respond, could it be possible for the JMicron ports to miss an event,
or get a double IRQ, and cause the device to lock?

Best wishes,
Peg