Date: Sun, 09 Jan 2011 12:49:27 +0100
From: Attila Nagy <bra@fsn.hu>
To: Martin Matuska <mm@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: New ZFSv28 patchset for 8-STABLE
Message-ID: <4D29A0C7.8050002@fsn.hu>
In-Reply-To: <4D297943.1040507@fsn.hu>
References: <4D0A09AF.3040005@FreeBSD.org> <4D297943.1040507@fsn.hu>
On 01/09/2011 10:00 AM, Attila Nagy wrote:
> On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> Hi everyone,
>>
>> following the announcement of Pawel Jakub Dawidek (pjd@FreeBSD.org) I am
>> providing a ZFSv28 testing patch for 8-STABLE.
>>
>> Link to the patch:
>>
>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>>
>>
> I've got an IO hang with dedup enabled (not sure it's related, I've
> started to rewrite all data on pool, which makes a heavy load):
>
> The processes are in various states:
> 65747 1001  1  54  10 28620K 24360K tx->tx  0  6:58  0.00% cvsup
> 80383 1001  1  54  10 40616K 30196K select  1  5:38  0.00% rsync
>  1501 www   1  44   0  7304K  2504K zio->i  0  2:09  0.00% nginx
>  1479 www   1  44   0  7304K  2416K zio->i  1  2:03  0.00% nginx
>  1477 www   1  44   0  7304K  2664K zio->i  0  2:02  0.00% nginx
>  1487 www   1  44   0  7304K  2376K zio->i  0  1:40  0.00% nginx
>  1490 www   1  44   0  7304K  1852K zfs     0  1:30  0.00% nginx
>  1486 www   1  44   0  7304K  2400K zfsvfs  1  1:05  0.00% nginx
>
> And everything which wants to touch the pool is/becomes dead.
>
> Procstat says about one process:
> # procstat -k 1497
>   PID    TID COMM             TDNAME           KSTACK
>  1497 100257 nginx            -                mi_switch sleepq_wait
> __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV
> _vn_lock nullfs_root lookup namei vn_open_cred kern_openat
> syscallenter syscall Xfast_syscall
No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
and it seems it froze the whole zpool. Removing the disk by hand solved
the problem.
I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.