Date: Sun, 9 Jan 2011 07:22:35 -0500
From: Rich
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: New ZFSv28 patchset for 8-STABLE

Once upon a time, this was a known problem with the arcmsr driver not
correctly interacting with ZFS, resulting in this behavior.

Since I'm presuming that the arcmsr driver update which was intended to
fix this behavior (in my case, at least) is in your nightly build, it's
probably worth pinging the arcmsr driver maintainer about this.

- Rich

On Sun, Jan 9, 2011 at 7:18 AM, Jeremy Chadwick wrote:
> On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
>> On 01/09/2011 10:00 AM, Attila Nagy wrote:
>> > On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> >>Hi everyone,
>> >>
>> >>following the announcement of Pawel Jakub Dawidek (pjd@FreeBSD.org) I am
>> >>providing a ZFSv28 testing patch for 8-STABLE.
>> >>
>> >>Link to the patch:
>> >>
>> >>http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>> >>
>> >
>> >I've got an IO hang with dedup enabled (not sure it's related;
>> >I've started to rewrite all data on the pool, which makes a heavy
>> >load):
>> >
>> >The processes are in various states:
>> >65747   1001      1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
>> >80383   1001      1  54   10 40616K 30196K select  1   5:38  0.00% rsync
>> > 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
>> > 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
>> > 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
>> > 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
>> > 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
>> > 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx
>> >
>> >And everything which wants to touch the pool is/becomes dead.
>> >
>> >Procstat says about one process:
>> ># procstat -k 1497
>> >  PID    TID COMM             TDNAME           KSTACK
>> > 1497 100257 nginx            -                mi_switch
>> >sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock
>> >VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred
>> >kern_openat syscallenter syscall Xfast_syscall
>>
>> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
>> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
>> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
>> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
>> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered
>> read error)
>> and it seems it froze the whole zpool. Removing the disk by hand
>> solved the problem.
>> I've seen this previously on other machines with ciss.
>> I wonder why ZFS didn't throw it out of the pool.
>
> Hold on a minute.  An unrecoverable read error does not necessarily mean
> the drive is bad; it could mean that the individual LBA that was
> attempted to be read resulted in ASC 0x11 (MEDIUM ERROR), e.g. a bad
> block was encountered.  I would check SMART stats on the disk (since
> these are probably SATA given the use of arcmsr(4)) and provide those.
> *That* will tell you if the disk is bad.  I'll help you decode the
> attribute values if you provide them.
>
> My understanding is that a single LBA read failure should not warrant
> ZFS marking the disk UNAVAIL in the pool.  It should have incremented
> the READ error counter and that's it.  Did you receive a *single* error
> for the disk and then things went catatonic?
>
> If the entire system got wedged (a soft wedge, e.g. the kernel is still
> alive but nothing's happening in userland), that could be a different
> problem -- either with ZFS or arcmsr(4).  Does ZFS have some sort of
> timeout value internal to itself where it will literally mark a disk
> UNAVAIL in the case that repeated I/O transactions take "too long"?
> What is its error recovery methodology?
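For what it's worth, the checks suggested above would look roughly like
this. This is only a sketch: it assumes smartmontools is installed from
ports, and the Areca slot number is a guess (Areca slots are usually
numbered from 1, while da4 shows up as target 4 in the log above), so
adjust both to match the actual setup.

  # SMART attributes for the disk behind the Areca controller;
  # smartctl's "-d areca,N" selects the physical slot behind /dev/arcmsr0.
  smartctl -a -d areca,5 /dev/arcmsr0

  # Per-vdev READ/WRITE/CKSUM error counters and device state
  # (ONLINE/DEGRADED/UNAVAIL) as tracked by ZFS.
  zpool status -v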
>
> Speaking strictly about Solaris 10 and ZFS: I have seen many, many times
> a system "soft wedge" after repeated I/O errors (read or write) are
> spewed out on the console for a single SATA disk (via AHCI), but only
> when the disk is used as a sole root filesystem disk (no mirror/raidz).
> My impression is that ZFS isn't the problem in this scenario.  In most
> cases, post-mortem debugging on my part shows that disks encountered
> some CRC errors (indicating cabling issues, etc.), sometimes as few as
> 2, but "something else" went crazy -- or possibly ZFS couldn't mark the
> disk UNAVAIL (if it has that logic) because it's a single disk
> associated with root.  Hardware in this scenario are Hitachi SATA disks
> with an ICH ESB2 controller, software is Solaris 10 (Generic_142901-06)
> with ZFS v15.
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
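Since Attila notes above that pulling the failed disk by hand cleared
the hang, the equivalent administrative steps on the ZFS side would be
roughly the following. This is a sketch only: "tank" stands in for the
actual pool name and da9 for a spare/replacement device, both of which
are assumptions.

  # Take the suspect disk out of service so ZFS stops issuing I/O to it.
  zpool offline tank da4

  # Swap in a replacement device and let the pool resilver onto it.
  zpool replace tank da4 da9

  # Watch resilver progress and the per-device error counters.
  zpool status tank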