From owner-freebsd-fs@FreeBSD.ORG Sun Jun 17 22:37:40 2012
Date: Sun, 17 Jun 2012 22:37:39 GMT
Message-Id: <201206172237.q5HMbdKp068594@freefall.freebsd.org>
From: linimon@FreeBSD.org
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
List-Id: Filesystems
Subject: Re: kern/168947: [nfs] [zfs] .zfs/snapshot directory is messed up when viewed by a Linux client, and ls -l can hang it

Synopsis: [nfs] [zfs] .zfs/snapshot directory is messed up when viewed by a Linux client, and ls -l can hang it

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Jun 17 22:37:26 UTC 2012
Responsible-Changed-Why: Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=168947

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 17 22:39:58 2012
Date: Sun, 17 Jun 2012 22:39:58 GMT
Message-Id: <201206172239.q5HMdwdw068829@freefall.freebsd.org>
From: linimon@FreeBSD.org
To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: kern/168942: [nfs] [hang] nfsd hangs after being restarted (not -HUP) and then an export being mounted

Synopsis: [nfs] [hang] nfsd hangs after being restarted (not -HUP) and then an export being mounted

Responsible-Changed-From-To: freebsd-amd64->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Jun 17 22:39:40 UTC 2012
Responsible-Changed-Why: reclassify.
http://www.freebsd.org/cgi/query-pr.cgi?pr=168942

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 18 11:07:46 2012
Date: Mon, 18 Jun 2012 11:07:45 GMT
Message-Id: <201206181107.q5IB7jeM007961@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker Resp.
Description
--------------------------------------------------------------------------------
o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when
o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688 fs [fusefs] Incorrect signal handling with direct_io
o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612 fs [portalfs] The portal file system gets stuck inside po
o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105 fs [nfs] mount_nfs can not handle source exports with mor
o kern/167067 fs [zfs] [panic] ZFS panics the server
o kern/167066 fs [zfs] ZVOLs not appearing in /dev/zvol
o kern/167065 fs [zfs] boot fails when a spare is the boot disk
o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di
o kern/166566 fs [zfs] zfs split renders 2 disk (MBR based) mirror unbo
o kern/166477 fs [nfs] NFS data corruption.
o kern/165950 fs [ffs] SU+J and fsck problem
o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392 fs Multiple mkdir/rmdir fails with errno 31
o kern/165087 fs [unionfs] lock violation in unionfs
o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency
o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256 fs [zfs] device entry for volume is not created after zfs
o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944 fs [coda] Coda file system module looks broken in 9.0
o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751 fs [zfs] [panic] kernel panics during file operations
o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161897 fs [zfs] [patch] zfs partition probing causing long delay
o kern/161864 fs [ufs] removing journaling from UFS partition fails on
o bin/161807 fs [patch] add option for explicitly specifying metadata
o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280 fs [zfs] Stack overflow in gptzfsboot
o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930 fs [ufs] [panic] kernel core
o kern/159402 fs [zfs][loader] symlinks cause I/O errors
o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs()
o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077 fs [zfs] Can't cd .. with latest zfs version
o kern/159048 fs [smbfs] smb mount corrupts large files
o kern/159045 fs [zfs] [hang] ZFS scrub freezes system
o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802 fs amd(8) ICMP storm and unkillable process.
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929 fs [nfs] NFS slow read
o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781 fs [zfs] zfs is losing the snapshot directory,
p kern/156545 fs [ufs] mv could break UFS on SMP systems
o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current
o kern/155587 fs [zfs] [panic] kernel panic with zfs
p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104 fs [zfs][patch] use /dev prefix by default when importing
o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828 fs [msdosfs] Unable to create directories on external USB
o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228 fs [md] md getting stuck in wdrain state
o kern/153996 fs [zfs] zfs root mount error while kernel is not located
o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716 fs [zfs] zpool scrub time remaining is incorrect
o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions
o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351 fs [zfs] locking directories/files in ZFS
o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022 fs [nfs] nfs service hangs with linux client [regression]
o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory
o kern/151905 fs [zfs] page fault under load in /sbin/zfs
o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648 fs [zfs] disk wait bug
o kern/151629 fs [fs] [patch] Skip empty directory entries during name
o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a
o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251 fs [ufs] Can not create files on filesystem with heavy us
o kern/151226 fs [zfs] can't delete zfs snapshot
o kern/151111 fs [zfs] vnodes leakage during zfs unmount
o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208 fs mksnap_ffs(8) hang/deadlock
o kern/149173 fs [patch] [zfs] make OpenSolaris installa
o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138 fs [zfs] zfs raidz pool commands freeze
o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786 fs [zfs] zpool import hangs with checksum errors
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/138662 fs [panic] ffs_blkfree: freeing free block
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397 fs reboot causes filesystem corruption (failure to sync b
o kern/132331 fs [ufs] [lor] LOR ufs and syncer
o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145 fs [panic] File System Hard Crashes
o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo
o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin
o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210 fs [nullfs] Error by check nullfs
o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8)
o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029 fs [panic] mount(8): trying to mount a write protected zi
o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS
o kern/123939 fs [msdosfs] corrupts new files
o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes
o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912 fs [2tb] disk sizing/geometry problem with large array
o kern/118713 fs [minidump] [patch] Display media size required for a k
o kern/118318 fs [nfs] NFS server hangs under special circumstances
o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime
o kern/118126 fs [nfs] [patch] Poor NFS server write performance
o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954 fs [ufs] dirhash on very large directories blocks the mac
o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with
o kern/116583 fs [ffs] [hang] System freezes for short time when using
o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un
o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468 fs [patch] [request] add -d option to umount(8) to detach
o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral
o bin/113838 fs [patch] [request] mount(8): add support for relative p
o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show
o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843 fs [msdosfs] Long Names of files are incorrectly created
o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems
s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem
o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist
o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear
o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498 fs [request] newfs(8) has no option to clear the first 12
o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849 fs [ufs] rename on UFS filesystem is not atomic
o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean'
o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733 fs [smbfs] smbfs may cause double unlock
o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134 fs [smbfs] [patch] Preserve access and modification time
a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet
o kern/88657 fs [smbfs] windows client hang when browsing a samba shar
o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859 fs [smbfs] System reboot while umount smbfs.
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779 fs Background-fsck checks one filesystem twice and omits
o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun
o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503 fs [smbfs] mount_smbfs does not work as non-root
o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc
o kern/36566 fs [smbfs] System reboot with dead smb mount and umount
o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc
o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t

277 problems total.
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 17:20:29 2012
Date: Wed, 20 Jun 2012 17:20:23 +0000 (UTC)
From: rondzierwa@comcast.net
To: freebsd-fs@freebsd.org
Message-ID: <1610905794.19241.1340212823047.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
In-Reply-To: <20120618120031.84A4D1065749@hub.freebsd.org>
Subject: ZFS Checksum errors

Greetings,

I have a ZFS filesystem on an 8.2-RELEASE amd64 system. The hardware is an AMD Phenom 964 with 8 GB of memory and a 3ware 9650 controller with 8x Seagate ST2000DL003 drives. The disks are configured as RAID-5 and present one device to the system.
Early today I got some checksum and I/O errors on the console:

Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387574272 size=9728
Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387564544 size=9728
Jun 20 07:33:43 phoenix root: ZFS: zpool I/O failure, zpool=zfsPool error=86
Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387574272 size=9728
Jun 20 07:33:43 phoenix root: ZFS: zpool I/O failure, zpool=zfsPool error=86

So I ran a scrub; after a couple of hours I got a pile of checksum errors that looked rather similar:

Jun 20 12:45:24 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=560450768384 size=4096

zpool status indicates that a file has errors, but doesn't tell me its name:

phoenix# zpool status -v zfsPool
  pool: zfsPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
config:

        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0    38
          da0       ONLINE       0     0   434  1.06M repaired

errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>

phoenix#

How can I locate and get rid of the offending file?

thanks,
ron.
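[A note on the question above: `zpool status -v` prints `dataset:<object#>` when it cannot resolve a damaged object to a path. A minimal sketch of mapping such an entry back to a filename — the entry value is copied from the message, while the `zdb` step is a general ZFS technique, not something confirmed in this thread, and has to run on the machine that holds the pool:]

```shell
# Split a 'zpool status -v' error entry of the form 'zfsPool/raid:<0x9e241>'
# into its dataset and object-number parts using plain shell expansion.
entry='zfsPool/raid:<0x9e241>'
dataset=${entry%%:*}                   # dataset name, e.g. zfsPool/raid
objid=${entry#*:<}; objid=${objid%>}   # hex object number, e.g. 0x9e241
printf '%s object %d\n' "$dataset" "$objid"   # %d converts the hex id to decimal

# On the live system (not runnable here), dump that object; for a plain file
# the zdb output includes its path inside the dataset:
#   zdb -dddd "$dataset" "$(printf '%d' "$objid")"
# Once identified, delete or restore the file, then 'zpool clear zfsPool'
# and re-scrub to confirm the error is gone.
```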
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 17:58:33 2012
Date: Wed, 20 Jun 2012 18:58:20 +0100
From: "Steven Hartland"
References: <1610905794.19241.1340212823047.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
Subject: Re: ZFS Checksum errors

----- Original Message -----
From: ..
> zpool status indicates that a file has errors, but doesn't tell me its name:
>
> phoenix# zpool status -v zfsPool
>   pool: zfsPool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go

Try waiting for the scrub to complete and see if it's more helpful after that.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
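[Steve's suggestion can be scripted: poll `zpool status` until the scrub line no longer reports progress. A sketch only — here `zpool` is stubbed out with a shell function and canned output so the loop can be illustrated away from the pool; on the real system, delete the stub and the loop polls the live command:]

```shell
# Stub standing in for the real zpool(8); remove this on a live system.
zpool() { echo "  scrub: scrub completed after 8h29m with 6276 errors"; }

# Poll every 5 minutes until the scrub is no longer "in progress",
# then show the final status (including any resolved filenames).
while zpool status zfsPool | grep -q 'scrub in progress'; do
    sleep 300
done
zpool status -v zfsPool
```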
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 20:55:03 2012
Date: Wed, 20 Jun 2012 20:55:01 +0000 (UTC)
From: rondzierwa@comcast.net
To: Steven Hartland
Cc: freebsd-fs@freebsd.org
Message-ID: <566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
Subject: Re: ZFS Checksum errors

Steve,

Well, it got done, and it found another anonymous file with errors. Any idea how to get rid of these?

thanks,
ron.

phoenix# zpool status -v zfsPool
  pool: zfsPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.
Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0 6.17K
          da0       ONLINE       0     0 13.0K  1.34M repaired

errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>
        zfsPool/Build:<0x0>
phoenix#

----- Original Message -----
From: "Steven Hartland"
To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
Sent: Wednesday, June 20, 2012 1:58:20 PM
Subject: Re: ZFS Checksum errors

----- Original Message -----
From: ..

> zpool status indicates that a file has errors, but doesn't tell me its name:
>
> phoenix# zpool status -v zfsPool
> pool: zfsPool
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go

Try waiting for the scrub to complete and see if it's more helpful after that.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 21:16:45 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42B25106564A for ; Wed, 20 Jun 2012 21:16:45 +0000 (UTC) (envelope-from prvs=1518fb2736=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id AFEF18FC15 for ; Wed, 20 Jun 2012 21:16:44 +0000 (UTC) X-Spam-Processed: mail1.multiplay.co.uk, Wed, 20 Jun 2012 22:16:24 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50020395856.msg for ; Wed, 20 Jun 2012 22:16:23 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1518fb2736=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: References: <566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Wed, 20 Jun 2012 22:17:07 +0100 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Jun 2012 21:16:45 -0000 Sorry not seen that before I'm afraid. 
Maybe you can use zdb to get more info, but it sounds like something is causing nasty issues. I'd check your machine for bad hardware, such as bad RAM, CPU or cabling issues.

Regards
Steve

----- Original Message -----
From: rondzierwa@comcast.net
To: Steven Hartland
Cc: freebsd-fs@freebsd.org
Sent: Wednesday, June 20, 2012 9:55 PM
Subject: Re: ZFS Checksum errors

Steve.

well, it got done, and it found another anonymous file with errors. any idea how to get rid of these?

thanks,
ron.

phoenix# zpool status -v zfsPool
  pool: zfsPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0 6.17K
          da0       ONLINE       0     0 13.0K  1.34M repaired

errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>
        zfsPool/Build:<0x0>
phoenix#

------------------------------------------------------------------------------
From: "Steven Hartland"
To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
Sent: Wednesday, June 20, 2012 1:58:20 PM
Subject: Re: ZFS Checksum errors

----- Original Message -----
From: ..

> zpool status indicates that a file has errors, but doesn't tell me its name:
>
> phoenix# zpool status -v zfsPool
> pool: zfsPool
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go

Try waiting for the scrub to complete and see if it's more helpful after that.
Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 22:44:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B13FE106566B for ; Wed, 20 Jun 2012 22:44:17 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id 823A48FC0C for ; Wed, 20 Jun 2012 22:44:17 +0000 (UTC) Received: by dadv36 with SMTP id v36so11357740dad.13 for ; Wed, 20 Jun 2012 15:44:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=zAn8/GGEP+Fm1/v8H2Q+DeFwPSsTsWNvoT4ij62IMdg=; b=AInlsiSJa0/FR77vDcxAjDnLqWo3xpae05Ibp7fGC4iYHJm08GrOzomP2CnQE8qny9 p6Ohbn/YSPkj6rgtaI1R2uxeUJOrzcaomHIE6wR51HuXuYsAegjJ2gV9bHvtjYBSw3CP qikmkiCG58d4tBS9sffagG4poAeVcSCbzflgHeecJYBJhtq5GF2zldC2bFJTJdYaBEnA ZxeddkAtlP1LIXgeJmUw/rYQRH0DpF/+//FSjEyj5jHBZfu2Q99QrXhLtwlWt1M4imXy VZhJKv10HfWk4hiR629MwzSc0CPM6tqAjtO/LO8tvSP8bGerLlOLtYCfqZwNfmVBlghV XYuA== MIME-Version: 1.0 Received: by 10.68.222.38 with SMTP id qj6mr81333030pbc.6.1340232255027; Wed, 20 Jun 2012 15:44:15 -0700 (PDT) Sender: rincebrain@gmail.com Received: by 10.68.38.10 with HTTP; Wed, 20 Jun 2012 15:44:15 -0700 (PDT) In-Reply-To: References: <566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Wed, 20 Jun 2012 18:44:15 -0400 X-Google-Sender-Auth: jaEawlKC37L7nW--eLctfqS7Rzk Message-ID: From: Rich To: Steven Hartland Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, rondzierwa@comcast.net Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Jun 2012 22:44:17 -0000

I can't speak for every case, but in my experience, that's what happens if the corrupted data points to something that no longer exists because it was deleted or similar. As an example I don't recommend trying: you could figure out which on-disk blocks map to a test file, scramble some of them beyond repair on enough disks to be unrecoverable, do a scrub, and it'll report that file as lost; delete the file, and the reference will change to a pointer with no name.

- Rich

On Wed, Jun 20, 2012 at 5:17 PM, Steven Hartland wrote:
> Sorry not seen that before I'm afraid. Maybe you can use zdb to get more info but sounds like something is causing nasty issues.
>
> I'd check your machine for bad hardware issues such as bad ram, cpu or cabling issues.
>
>     Regards
>     Steve
>
> ----- Original Message -----
> From: rondzierwa@comcast.net
> To: Steven Hartland
> Cc: freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 9:55 PM
> Subject: Re: ZFS Checksum errors
>
> Steve.
>
> well, it got done, and it found another anonymous file with errors. any idea how to get rid of these?
>
> thanks,
> ron.
>
> phoenix# zpool status -v zfsPool
>   pool: zfsPool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zfsPool     ONLINE       0     0 6.17K
>           da0       ONLINE       0     0 13.0K  1.34M repaired
>
> errors: Permanent errors have been detected in the following files:
>
>         zfsPool/raid:<0x9e241>
>         zfsPool/Build:<0x0>
> phoenix#
>
> ------------------------------------------------------------------------------
> From: "Steven Hartland"
> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 1:58:20 PM
> Subject: Re: ZFS Checksum errors
>
> ----- Original Message -----
> From:
> ..
>
>> zpool status indicates that a file has errors, but doesn't tell me its name:
>>
>> phoenix# zpool status -v zfsPool
>> pool: zfsPool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
>
> Try waiting for the scrub to complete and see if it's more helpful after that.
>
>      Regards
>      Steve
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
>
> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
> or return the E.mail to postmaster@multiplay.co.uk.
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
>
> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
> or return the E.mail to postmaster@multiplay.co.uk.
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jun 20 23:02:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B55FC1065672 for ; Wed, 20 Jun 2012 23:02:38 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id 66D758FC12 for ; Wed, 20 Jun 2012 23:02:38 +0000 (UTC) Received: by qabg1 with SMTP id g1so3521490qab.13 for ; Wed, 20 Jun 2012 16:02:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=6SV0Ame3wEh1f8+cBUKqEJH8BfwrTksrW/wKo6aeTuQ=; b=K9AcGHrYivbI9xrJYNwyqpimmki3DwZEcM3HRhzz3S2J2twgCIL/hWeibVHwIjqRZ/ 16neJaFmdmw0Bm2rh9DhxlH7diADYpuK2ZJ5OKYYQRPwmU74opeTH5u6z4mbPLt1kdqR hOgQR3ISRvoqG+/m6j34aOoTnyKCctz6J31GM+fB+KDbaAGinlWF0/RoNZlB2AjF9nLe vyrQ9uBpB1YUDQLbuCJtE8/zr4Lq4zhN8GQYzx55Ic1vxN8vX9Etq4hbcxaljKG2usaW FWW6uJfJpreocpkVLB47NxOaNEe4iCrxhcG7iFj8uJ25NDSnH51y99qp63MLPR+DpiJe cNLg== MIME-Version: 1.0 Received: by 10.224.183.17 with SMTP id ce17mr38575382qab.8.1340232969309; Wed, 20 Jun 2012 15:56:09 -0700 (PDT) Received: by 10.229.81.1 with HTTP; Wed, 20 Jun 2012 15:56:09 -0700 (PDT) In-Reply-To: <566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> References: <566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Wed, 20 Jun 2012 15:56:09 -0700 Message-ID: From: Xin LI To: rondzierwa@comcast.net Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org 
Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Jun 2012 23:02:38 -0000

On Wed, Jun 20, 2012 at 1:55 PM, wrote:
> Steve.
>
> well, it got done, and it found another anonymous file with errors. any idea how to get rid of these?

Normally you need to "zpool clear zfsPool", and rerun zpool scrub. If you see these numbers growing again, it's likely that there are some other problems with your hardware.

By the way, the recommended configuration is to use ZFS to manage the disks directly, or at least to split your RAID volumes into smaller ones, since otherwise the volume is seen as a "single disk" by ZFS, making it impossible to repair data errors unless you add additional redundancy (zfs set copies=2, etc).

>
> thanks,
> ron.
>
> phoenix# zpool status -v zfsPool
> pool: zfsPool
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012
> config:
>
> NAME STATE READ WRITE CKSUM
> zfsPool ONLINE 0 0 6.17K
> da0 ONLINE 0 0 13.0K 1.34M repaired
>
> errors: Permanent errors have been detected in the following files:
>
> zfsPool/raid:<0x9e241>
> zfsPool/Build:<0x0>
> phoenix#
>
> ----- Original Message -----
> From: "Steven Hartland"
> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 1:58:20 PM
> Subject: Re: ZFS Checksum errors
>
> ----- Original Message -----
> From:
> ..
>
>> zpool status indicates that a file has errors, but doesn't tell me its name:
>>
>> phoenix# zpool status -v zfsPool
>> pool: zfsPool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
>
> Try waiting for the scrub to complete and see if it's more helpful after that.
>
> Regards
> Steve
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
>
> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
> or return the E.mail to postmaster@multiplay.co.uk.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--
Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve!
Live free or die From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 09:07:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B175E106566B for ; Thu, 21 Jun 2012 09:07:22 +0000 (UTC) (envelope-from icameto@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 440768FC0C for ; Thu, 21 Jun 2012 09:07:22 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id ds11so356537wgb.31 for ; Thu, 21 Jun 2012 02:07:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=zS/FS3gJRcAgV85DgnFXmlNq72YAbyr3P4+g0BFum1o=; b=xTzuxal2/hrV+PJYLZt1u9qNokLlAUBvxh049T989bEBOUiJugRIRnjM2dVEouVj6T zCwdTDLIOJ/ZaMTIKF15pMOHQj0sgxUV+My1ULRIr9CAIaPEYy1sSEoj1IUMufJhWymf 05LyTzZaXTGsJNPGTSocNiVGdLZCog1Oeyppm/94qMHxpm2gzTWtLR2cwOSSs0UqcEig baxI3vJVQfBi/ubTpXrdVPctYSAjjQrtuSoZrDDL9rwYjdl+NkiRe/wiZ62WysT7ij6Y R/UXXQQQMetRJiJ0H07iShKt2YSWo3Ogqka06ZfLXq4oj0UG5GANm3q1KZKwrLEUf+/C vj0Q== MIME-Version: 1.0 Received: by 10.180.105.6 with SMTP id gi6mr505647wib.4.1340269641875; Thu, 21 Jun 2012 02:07:21 -0700 (PDT) Received: by 10.216.224.228 with HTTP; Thu, 21 Jun 2012 02:07:21 -0700 (PDT) Date: Thu, 21 Jun 2012 12:07:21 +0300 Message-ID: From: icameto icameto To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFS Encryption with GELI for only /opt partition X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 09:07:22 -0000 Hi everyone, I have some problems with ZFS encryption and GELI. I used ZFS for /opt partition(da1.eli which is encrypted form of seperate da1 disk ). 
And I want to encrypt the /opt partition by using GELI. My disks' state is like below:

*# kldstat*
Id Refs Address            Size     Name
 1   15 0xffffffff80100000 c9fe20   kernel
 2    1 0xffffffff80da0000 1ad0e0   zfs.ko
 3    2 0xffffffff80f4e000 3a68     opensolaris.ko
 4    1 0xffffffff80f52000 1cdc0    geom_eli.ko
 5    2 0xffffffff80f6f000 2b0b8    crypto.ko
 6    2 0xffffffff80f9b000 dc40     zlib.ko

*# cat /etc/rc.conf | grep geli*
geli_devices="da1"
geli_da1_flags="-k /root/da1.key"
#geli_detach="NO"

*# zpool status*
  pool: opt
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        opt         ONLINE       0     0     0
        da1.eli     ONLINE       0     0     0

errors: No known data errors

*# geli status*
     Name  Status  Components
  da1.eli  ACTIVE  da1

*# df -h*
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    9.7G    280M    8.6G     3%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/da0s1d     15G    734M     14G     5%    /usr
opt            7.8G    120K    7.8G     0%    /opt

*# geli detach da1.eli*
geli: Cannot destroy device da1.eli (error=16).

*# zfs unmount -a*

*# df -h*
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    9.7G    280M    8.6G     3%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/da0s1d     15G    734M     14G     5%    /usr

*# geli detach da1.eli*
geli: Cannot destroy device da1.eli (error=16).

When I use the "zfs mount -a" command I expect to be prompted for the passphrase, but the filesystem is immediately mounted by zfs without prompting for anything.

*# zfs mount -a*

*# df -h*
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    9.7G    280M    8.6G     3%    /
devfs          1.0K    1.0K      0B   100%    /dev
/dev/da0s1d     15G    734M     14G     5%    /usr
opt            7.8G    120K    7.8G     0%    /opt

But I want to be able to detach the encrypted device and remove it from the zpool so that it cannot be accessed by anyone. I get an error when I try to detach the device (the /opt partition), and I can still access the disk in the ZFS pool. Isn't that strange?

Briefly: is there any solution to detach and unmount the encrypted disk for only the /opt partition (which is in the ZFS pool)? Could you please give me advice on this?
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 11:22:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3A9771065675 for ; Thu, 21 Jun 2012 11:22:06 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.28]) by mx1.freebsd.org (Postfix) with ESMTP id C027E8FC27 for ; Thu, 21 Jun 2012 11:22:05 +0000 (UTC) Received: from [78.35.187.24] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1ShfLi-0006sc-QP; Thu, 21 Jun 2012 13:14:54 +0200 Date: Thu, 21 Jun 2012 13:14:43 +0200 From: Fabian Keil To: icameto icameto Message-ID: <20120621131443.59eb24f3@fabiankeil.de> In-Reply-To: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/A3NqNJB+DRq4WPzstOOYdy0"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Encryption with GELI for only /opt partition X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 11:22:06 -0000 --Sig_/A3NqNJB+DRq4WPzstOOYdy0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable icameto icameto wrote: > I have some problems with ZFS encryption and GELI. I used ZFS for /opt > partition(da1.eli which is encrypted form of seperate da1 disk ). And I > want to encrypt the /opt partition by using GELI. 
> My disks' state is like below
>
> *# kldstat*
> Id Refs Address            Size     Name
> 1   15 0xffffffff80100000 c9fe20   kernel
> 2    1 0xffffffff80da0000 1ad0e0   zfs.ko
> 3    2 0xffffffff80f4e000 3a68     opensolaris.ko
> 4    1 0xffffffff80f52000 1cdc0    geom_eli.ko
> 5    2 0xffffffff80f6f000 2b0b8    crypto.ko
> 6    2 0xffffffff80f9b000 dc40     zlib.ko
>
> *# cat /etc/rc.conf | grep geli*
> geli_devices="da1"
> geli_da1_flags="-k /root/da1.key"
> #geli_detach="NO"
>
> *# zpool status*
> pool: opt
> state: ONLINE
> scrub: none requested
> config:
>
> NAME        STATE     READ WRITE CKSUM
> opt         ONLINE       0     0     0
> da1.eli     ONLINE       0     0     0
>
> errors: No known data errors
>
> *# geli status*
> Name Status Components
> da1.eli ACTIVE da1
>
> *# df -h*
> Filesystem   Size  Used  Avail Capacity  Mounted on
> /dev/da0s1a  9.7G  280M   8.6G     3%    /
> devfs        1.0K  1.0K     0B   100%    /dev
> /dev/da0s1d   15G  734M    14G     5%    /usr
> opt          7.8G  120K   7.8G     0%    /opt
>
> *# geli detach da1.eli*
> geli: Cannot destroy device da1.eli (error=16).
>
> *# zfs unmount -a*
>
> *# df -h*
> Filesystem   Size  Used  Avail Capacity  Mounted on
> /dev/da0s1a  9.7G  280M   8.6G     3%    /
> devfs        1.0K  1.0K     0B   100%    /dev
> /dev/da0s1d   15G  734M    14G     5%    /usr
>
> *# geli detach da1.eli*
> geli: Cannot destroy device da1.eli (error=16).

This doesn't work because the pool is still imported. Try running "zpool export opt" first; it will automatically unmount the datasets, so you can skip the "zfs unmount -a".

> When I use "zfs mount -a" command there must be prompted for entering
> passphrase, but it immediately mounted by zfs without prompting anything.

As the pool hasn't been exported, that's the expected behaviour. Also note that ZFS and geli are not tightly integrated, so "zfs mount -a" will never set up the geli provider for you.
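For clarity, the export-then-detach sequence described above can be sketched as a short command transcript. This is a sketch, not a tested procedure; the pool name "opt", the provider "da1.eli", and the key path /root/da1.key are taken from this thread, and all commands must run as root:

```shell
# Export the pool first: this unmounts its datasets and closes da1.eli,
# so geli no longer sees the device as busy (the earlier error=16/EBUSY).
zpool export opt
geli detach da1.eli        # should now succeed; the cleartext device is gone

# Later, to bring /opt back, re-attach the provider and re-import the pool
# (geli will ask for the passphrase here if one is set on the key):
geli attach -k /root/da1.key da1
zpool import opt
```

The key point is the ordering: ZFS knows nothing about geli, so the pool has to release the .eli provider (via export) before geli can tear it down.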
> *# zfs mount -a*
>
> *# df -h*
> Filesystem   Size  Used  Avail Capacity  Mounted on
> /dev/da0s1a  9.7G  280M   8.6G     3%    /
> devfs        1.0K  1.0K     0B   100%    /dev
> /dev/da0s1d   15G  734M    14G     5%    /usr
> opt          7.8G  120K   7.8G     0%    /opt
>
> But i want to be able to detach encrypted device and remove that from
> zpool as cannot access by anyone. But I got an error when i try to
> detach the device (opt partition). And I can still access the disk on
> ZFS pool. Isn't it strange buddies ?
>
> Briefly, Is there any solution to detach and unmount encrypted disk for
> only /opt partition (which is in ZFS Pool). Could you please give me
> advice on this progress ?

I'm not aware of a mechanism in FreeBSD's base system that does this automatically, but doing it manually (or with a script) should work.

Fabian

--Sig_/A3NqNJB+DRq4WPzstOOYdy0 Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAk/jAikACgkQBYqIVf93VJ1p9wCfXS/RXW3h6tcjyPKSGMtxkpWq l7sAoJVlpYCuSZt9MOPqWTqc1uK7R7pm =qN5i -----END PGP SIGNATURE----- --Sig_/A3NqNJB+DRq4WPzstOOYdy0-- From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 14:01:54 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F0D4106564A for ; Thu, 21 Jun 2012 14:01:54 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id 1965C8FC0C for ; Thu, 21 Jun 2012 14:01:53 +0000 (UTC) Received: by eabm6 with SMTP id m6so321238eab.13 for ; Thu, 21 Jun 2012 07:01:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=SWs/ElDSHh/CH2JqlPKIUwDZ/n7p0dfEJ7pf0IP6pP0=;
b=r+OlNtkEAlo+8yzztgHV2354VASLlYPfyZEb5LjcUMVhpwwelecxj/bYTyXPPqMzGP o1MLcOMII7+68wdGCt7vOx8tzu5r9YPUW4yaU8NXIM+Rih9WsE+cgULQPIRQQVm0cU6E ff+AeVAM0EhX88Yndljt0CVUiItjcn2NMLllayvaW1ontPtLZ6/ORRYVpCkhVpOeeqzd zXDobTXihw4V6qAVwinqAlt0mGQhZuAWScJCa2qd/yAIgM/SgBZokNesAL/kV+DBNBg0 ag5PLHtYyMnvMowAahKK+XhQMfxtT1F0N59vZ4Rfedixs7BxAK+32kxYMDBrFMPf7edk 7/vw== Received: by 10.152.105.173 with SMTP id gn13mr26525922lab.20.1340287312889; Thu, 21 Jun 2012 07:01:52 -0700 (PDT) Received: from localhost ([78.157.92.5]) by mx.google.com with ESMTPS id b3sm18632818lbh.6.2012.06.21.07.01.51 (version=SSLv3 cipher=OTHER); Thu, 21 Jun 2012 07:01:51 -0700 (PDT) Date: Thu, 21 Jun 2012 17:01:50 +0300 From: Gleb Kurtsou To: freebsd-fs@freebsd.org Message-ID: <20120621140149.GA59722@reks> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Subject: [RFC] tmpfs RB-Tree for directory entries X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 14:01:54 -0000 Hello, Here is patch for CURRENT replacing directory entry linked list with RB-Tree in tmpfs. Performance improvement varies for different workloads, it may be negligible for directories with small number of files or due to VFS name caching. http://people.freebsd.org/~gleb/tmpfs-nrbtree.1.patch This patch is unrelated to similar changes recently committed to DragonFly: https://bugs.dragonflybsd.org/issues/2375 http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/29ca4fd6da8bb70ae90d8e73ea3c47fda22491a7 My patch uses name hashes instead of comparing entries by file name, moreover it reuses the same hash value as directory entry offset and eliminates possible issue of duplicate directory offsets on 64-bit archs. In other words it makes VOP_READDIR on large directories faster for non-zero offsets. 
I'm willing to commit the patch and would appreciate if people actively using tmpfs give it a try.

Thanks,
Gleb.

** file_create test from DragonFly PR
% time ~/file_create 10000

x tmpfs-file_create-rb
+ tmpfs-file_create-orig
+------------------------------------------------------------------------+
|                                                                   +    |
|x                                                                  +    |
|xx                                                                 +    |
|xx                                                                 +   +|
|A|                                                                      |
|                                                                  |MA_| |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         0.112          0.14         0.119        0.1234   0.012116105
+   5         2.551         2.734         2.551        2.5886   0.081309901
Difference at 95.0% confidence
        2.4652 +/- 0.0847787
        1997.73% +/- 68.7023%
        (Student's t, pooled s = 0.0581296)

** test1 -- create 5000 files, rename some of them, remove files
time sh ~/test1.sh

x tmpfs-test1-rb
+ tmpfs-test1-orig
+------------------------------------------------------------------------+
|     x x      x                                        +            +++ |
||___MA_____|                                                            |
|                           |_____________A______M______|                |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4         5.893         6.091         5.932        5.9535   0.093289871
+   4          6.49         7.006         6.987        6.8655    0.25058931
Difference at 95.0% confidence
        0.912 +/- 0.327153
        15.3187% +/- 5.49514%
        (Student's t, pooled s = 0.189074)

test1.sh:
#!/bin/sh
for i in `jot 5000`; do echo $i > longername$i; done
ls >/dev/null
for i in `jot 899 100`; do mv longername$i longername1$i; done
ls >/dev/null
for i in `jot 899 100`; do mv longername2$i longername3$i; done
ls >/dev/null
for i in `jot 5000`; do rm longername$i 2>/dev/null; done

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 21:49:01 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AEB34106564A for ; Thu, 21 Jun 2012 21:49:01 +0000 (UTC) (envelope-from rondzierwa@comcast.net) Received: from qmta08.westchester.pa.mail.comcast.net (qmta08.westchester.pa.mail.comcast.net [76.96.62.80]) by mx1.freebsd.org (Postfix) with ESMTP id 697B68FC08 for ; Thu, 21 Jun 2012 21:49:01
+0000 (UTC)
Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta08.westchester.pa.mail.comcast.net with comcast id R92s1j0041vXlb8589p1dH; Thu, 21 Jun 2012 21:49:01 +0000
Received: from sz0192.wc.mail.comcast.net ([76.96.59.160]) by omta17.westchester.pa.mail.comcast.net with comcast id R9p11j0143TRaxG3d9p1wq; Thu, 21 Jun 2012 21:49:01 +0000
Date: Thu, 21 Jun 2012 21:48:59 +0000 (UTC)
From: rondzierwa@comcast.net
To: Xin LI
Message-ID: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
In-Reply-To:
MIME-Version: 1.0
X-Originating-IP: [68.50.136.212]
X-Mailer: Zimbra 6.0.13_GA_2944 (ZimbraWebClient - FF3.0 (Win)/6.0.13_GA_2944)
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Checksum errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 21 Jun 2012 21:49:01 -0000

OK, I ran a verify on the RAID, and it completed, so I believe that, from the hardware standpoint, da0 should be a functioning 12TB disk.

I did a zpool clear and re-ran the scrub, and the results were almost identical:

phoenix# zpool status -v zfsPool
  pool: zfsPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 3h39m with 6353 errors on Thu Jun 21 17:28:10 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0 6.20K
          da0       ONLINE       0     0 12.5K  24K repaired

errors: Permanent errors have been detected in the following files:

        zfsPool/raid:<0x9e241>
        zfsPool/Build:<0x0>
phoenix#

Along with the 6,353 I/O errors, there were over 12,000 checksum-mismatch errors on the console.

The recommendation from ZFS is to restore the file in question. At this point, I would just like to delete the two files. How do I do that?

It's this kind of antics that makes me resistant to the thought of allowing ZFS to manage the RAID. It seems to be having problems just managing a big file system. I don't want it to correct anything or restore anything; just let me delete the files that hurt, fix up the free-space list so it doesn't point outside the bounds of the disk, and get on with life.

If it's finding corrupted files that appear not to have a directory entry associated with them (unlinked files), why doesn't it just delete them? fsck asks you if you want to delete unlinked files; why doesn't ZFS do the same, or at least give you the option of deleting bad files when it finds them?

This is causing a lot of down time, and it's making Linux look very attractive in my organization. How do I get this untangled, short of reformatting and starting over?

ron.

----- Original Message -----
From: "Xin LI"
To: rondzierwa@comcast.net
Cc: "Steven Hartland" , freebsd-fs@freebsd.org
Sent: Wednesday, June 20, 2012 6:56:09 PM
Subject: Re: ZFS Checksum errors

On Wed, Jun 20, 2012 at 1:55 PM, wrote:
> Steve.
>
> well, it got done, and it found another anonymous file with errors. any idea how to get rid of these?

Normally you need to "zpool clear zfsPool" and rerun zpool scrub. If you see these numbers growing again, it's likely that there are some other problems with your hardware.
The recommended configuration is to use ZFS to manage disks, or at least split your RAID volumes into smaller ones by the way, since otherwise the volume is seen as a "single disk" to ZFS, making it impossible to repair data errors unless you add additional redundancy (zfs set copies=2, etc). > > thanks, > ron. > > > > phoenix# zpool status -v zfsPool > pool: zfsPool > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012 > config: > > NAME STATE READ WRITE CKSUM > zfsPool ONLINE 0 0 6.17K > da0 ONLINE 0 0 13.0K 1.34M repaired > > errors: Permanent errors have been detected in the following files: > > zfsPool/raid:<0x9e241> > zfsPool/Build:<0x0> > phoenix# > > > > > ----- Original Message ----- > From: "Steven Hartland" > To: rondzierwa@comcast.net, freebsd-fs@freebsd.org > Sent: Wednesday, June 20, 2012 1:58:20 PM > Subject: Re: ZFS Checksum errors > > ----- Original Message ----- > From: > .. > >> zpool status indicates that a file has errors, but doesn't tell me its name: >> >> phoenix# zpool status -v zfsPool >> pool: zfsPool >> state: ONLINE >> status: One or more devices has experienced an error resulting in data >> corruption. Applications may be affected. >> action: Restore the file in question if possible. Otherwise restore the >> entire pool from backup. >> see: http://www.sun.com/msg/ZFS-8000-8A >> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go > > Try waiting for the scrub to complete and see if its more helpful after that. > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. 
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. > > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 21:51:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63B46106564A for ; Thu, 21 Jun 2012 21:51:15 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id 16F228FC15 for ; Thu, 21 Jun 2012 21:51:15 +0000 (UTC) Received: by qcsg15 with SMTP id g15so743066qcs.13 for ; Thu, 21 Jun 2012 14:51:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=4xTlhwxX55e4QCz4RcmJn7pUcH+/zAcSYJjToPD8cSs=; b=Ey4wPYiBUyQeu/3V/JRhcZ+i6dJV2akzPPJFJDRLyTr736l9Ie6DAr2ToKDPNF82Lt rJJIZHvenPaBAIPfUeJ8wjqawsxlP14pIddw9FbNiLGuXJqvPvy6RP4FGmaKfJ56MAoy LYqDMBbfKbqK0p9UlpZhf51TpoJvSjr5hTj9cuOZuGqFRuf7k0PNFJHgz93h2y4JmKIj F1kYXh0hSHWh39naWjI5dpnf+ZekyfOFyChe8JOSlXJ37UVelc+tUHzGi27A194zGLVs aob/7iyFsFmeWwVAsxvZOy23tN4HHhQs92DAx/lkhhHAl3xv4ssTPyrJSM7gcBlWofZP iL8Q== MIME-Version: 1.0 Received: by 10.224.116.203 with SMTP id n11mr2182124qaq.61.1340315474523; Thu, 21 Jun 2012 14:51:14 -0700 (PDT) Sender: rincebrain@gmail.com 
Received: by 10.229.250.6 with HTTP; Thu, 21 Jun 2012 14:51:14 -0700 (PDT) In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> References: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Thu, 21 Jun 2012 17:51:14 -0400 X-Google-Sender-Auth: AI0lr7-9K9OWN1mwLHbfaum3lkk Message-ID: From: Rich To: rondzierwa@comcast.net Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 21:51:15 -0000 To be honest, if ZFS says you've got a ton of checksum errors, I would strongly bet in favor of your data being damaged over a bug in ZFS. What're the underlying disks and RAID card? - Rich On Thu, Jun 21, 2012 at 5:48 PM, wrote: > > ok, i ran a verify on the raid, and it completed, so I believe that, from= the hardware standpoint, da0 should be a functioning, 12TB disk. > > i did a zpool clear and re-ran the scrub, and the results were almost ide= ntical: > > phoenix# zpool status -v zfsPool > pool: zfsPool > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: scrub completed after 3h39m with 6353 errors on Thu Jun 21 17:28:1= 0 2012 > config: > > NAME STATE READ WRITE CKSUM > zfsPool ONLINE 0 0 6.20K > da0 ONLINE 0 0 12.5K 24K repaired > > errors: Permanent errors have been detected in the following files: > > zfsPool/raid:<0x9e241> > zfsPool/Build:<0x0> > phoenix# > > along with the 6,353 I/O errors, there were over 12,000 checksum mismatch= errors on the console. 
>
>
> The recommendation from ZFS is to restore the file in question. At this
> point, I would just like to delete the two files.
> how do i do that?
>
> its these kind of antics that make me resistant to the thought of allowing
> ZFS to manage the raid. it seems to be having problems just managing a big
> file system. I don't want it to correct anything, or restore anything, just
> let me delete the files that hurt, fix up the free space list so it doesn't
> point outside the bounds of the disk, and get on with life.
>
> if its finding corrupted files that appear to not have a directory entry
> associated with them (unlinked files), why doesn't it just delete them?
> fsck asks you if you want to delete unlinked files, why doesn't zfs do the
> same, or at least give you the option of deleting bad files when it finds
> them?
>
> this is causing a lot of down time, and its making linux look very
> attractive in my organization. how do I get this untangled short of
> reformatting and starting over?
>
> ron.
>
>
> ----- Original Message -----
> From: "Xin LI"
> To: rondzierwa@comcast.net
> Cc: "Steven Hartland" , freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 6:56:09 PM
> Subject: Re: ZFS Checksum errors
>
> On Wed, Jun 20, 2012 at 1:55 PM, wrote:
>> Steve.
>>
>> well, it got done, and it found another anonymous file with errors . any
>> idea how to get rid of these?
>
> Normally you need to "zpool clear zfsPool", and rerun zpool scrub. If
> you see these numbers growing again, it's likely that there are some
> other problems with your hardware. The recommended configuration is
> to use ZFS to manage disks, or at least split your RAID volumes into
> smaller ones by the way, since otherwise the volume is seen as a
> "single disk" to ZFS, making it impossible to repair data errors
> unless you add additional redundancy (zfs set copies=2, etc).
>
>>
>> thanks,
>> ron.
>>
>>
>>
>> phoenix# zpool status -v zfsPool
>> pool: zfsPool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 2012
>> config:
>>
>> NAME STATE READ WRITE CKSUM
>> zfsPool ONLINE 0 0 6.17K
>> da0 ONLINE 0 0 13.0K 1.34M repaired
>>
>> errors: Permanent errors have been detected in the following files:
>>
>> zfsPool/raid:<0x9e241>
>> zfsPool/Build:<0x0>
>> phoenix#
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Steven Hartland"
>> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
>> Sent: Wednesday, June 20, 2012 1:58:20 PM
>> Subject: Re: ZFS Checksum errors
>>
>> ----- Original Message -----
>> From:
>> ..
>>
>>> zpool status indicates that a file has errors, but doesn't tell me its name:
>>>
>>> phoenix# zpool status -v zfsPool
>>> pool: zfsPool
>>> state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>> corruption. Applications may be affected.
>>> action: Restore the file in question if possible. Otherwise restore the
>>> entire pool from backup.
>>> see: http://www.sun.com/msg/ZFS-8000-8A
>>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
>>
>> Try waiting for the scrub to complete and see if its more helpful after that.
>>
>> Regards
>> Steve
>>
>> ================================================
>> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
>> >> In the event of misdirection, illegible or incomplete transmission pleas= e telephone +44 845 868 1337 >> or return the E.mail to postmaster@multiplay.co.uk. >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > -- > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! Live free or die > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 22:01:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2235E106566C for ; Thu, 21 Jun 2012 22:01:31 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id E917F8FC0A for ; Thu, 21 Jun 2012 22:01:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type; bh=I2MTCUO7HmyCep8oJWzFZOgZz1k9A4OvcN4I/2rOg2I=; b=iGlyDP1hUPzxek8jMktpKrzGbh6pbqFoc2Ulg4ax7wIRma1qCD1zxJxT+87twtMkdcHl43mub8zBPBOxJJwqisymNhMt7vmHHtTFQYJwsFm3/6qfMPjuMgVBS4Y7FTwH; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1ShpRS-000MVT-8i for freebsd-fs@freebsd.org; Thu, 21 Jun 2012 17:01:30 -0500 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpa id 1340316089-94480-94479/5/36; Thu, 21 Jun 2012 22:01:29 +0000 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: 
<566221263.21373.1340225701396.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Thu, 21 Jun 2012 17:01:29 -0500 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> User-Agent: Opera Mail/12.00 (FreeBSD) X-SA-Score: -1.5 Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 22:01:31 -0000 On Thu, 21 Jun 2012 16:48:59 -0500, wrote: > this is causing a lot of down time, and its making linux look very > attractive in my organization. how do I get this untangled short of > reformatting and starting over? I'm pretty confident that you have bad hardware and ZFS is telling you so. It's either a memory issue (ECC RAM being used?), or a problem with your controller, cabling, or HDD. Things like this have happened to people and the problem was a faulty power supply or dirty power, too. When ZFS tells you there are errors and checksum mismatches it's doing its best to PROTECT you from corrupted data and you need to start looking at every piece of hardware in your server that can cause this. 
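The end-to-end protection described above can be illustrated with a toy model. The Python sketch below is illustrative only (ZFS's real block pointers, fletcher/sha256 checksums, and self-healing machinery are far more involved): each block's checksum is stored in the parent's pointer to it, not beside the data, so bad bytes introduced anywhere along the path are caught at read time.

```python
import hashlib

def cksum(data: bytes) -> str:
    # Stand-in for ZFS's real block checksums (fletcher4, sha256, ...).
    return hashlib.sha256(data).hexdigest()

class ToyPool:
    """Toy model: a block's checksum lives in the parent's pointer to
    it, not next to the data, so a device that silently returns bad
    bytes cannot also return a matching checksum."""

    def __init__(self):
        self.disk = {}  # block id -> bytes (the untrusted device)

    def write(self, blkid: int, data: bytes) -> tuple:
        self.disk[blkid] = data
        # The (id, checksum) pair is the "block pointer" kept by the parent.
        return (blkid, cksum(data))

    def read(self, blkptr: tuple) -> bytes:
        blkid, expect = blkptr
        data = self.disk[blkid]
        if cksum(data) != expect:   # end-to-end verification on every read
            raise IOError("checksum mismatch in block %d" % blkid)
        return data

pool = ToyPool()
ptr = pool.write(1, b"important data")
assert pool.read(ptr) == b"important data"

# Simulate silent corruption on the "device"; the read now fails loudly
# instead of returning bad data.
pool.disk[1] = b"importent data"
try:
    pool.read(ptr)
except IOError:
    print("checksum error detected")
```

The flip side, as noted elsewhere in this thread, is that detection is not repair: with a single device and no extra redundancy (no mirror, no `copies=2`), ZFS can report the damaged block but has nothing to rebuild it from.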
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 22:04:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6ABC11065670 for ; Thu, 21 Jun 2012 22:04:06 +0000 (UTC) (envelope-from prvs=15195631bd=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id EC4F98FC0C for ; Thu, 21 Jun 2012 22:04:05 +0000 (UTC) X-Spam-Processed: mail1.multiplay.co.uk, Thu, 21 Jun 2012 23:03:54 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50020412655.msg for ; Thu, 21 Jun 2012 23:03:53 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=15195631bd=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <7CD42032309D4072A0EC3B0187378658@multiplay.co.uk> From: "Steven Hartland" To: "Rich" , References: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Thu, 21 Jun 2012 23:04:38 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jun 2012 22:04:06 -0000 It could also be issues with the hardware above 
the raid controller, e.g. main memory. A hardware RAID check doesn't really tell you much, whereas ZFS checks are end to end and so will detect intermediate issues such as memory or CPU problems.

We've even had a few issues in the past where the raid controller thought everything was fine but zfs was complaining, and it did turn out to be disks on the way out.

So at this point I wouldn't rule anything out.

Regards
Steve

----- Original Message -----
From: "Rich"
To:
Cc:
Sent: Thursday, June 21, 2012 10:51 PM
Subject: Re: ZFS Checksum errors

To be honest, if ZFS says you've got a ton of checksum errors, I would strongly bet in favor of your data being damaged over a bug in ZFS.

What're the underlying disks and RAID card?
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 21 22:52:07 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D99C2106564A for ; Thu, 21 Jun 2012 22:52:07 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-qa0-f51.google.com (mail-qa0-f51.google.com [209.85.216.51]) by mx1.freebsd.org (Postfix) with ESMTP id 8D56E8FC14 for ; Thu, 21 Jun 2012 22:52:07 +0000 (UTC) Received: by qaea16 with SMTP id a16so4073qae.17 for ; Thu, 21 Jun 2012 15:52:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=5rEsbxKoCWQlPlVgNOw1gBbYpYp40Lk18T7OacMLPLA=; b=QRW9LUevtULKy50u9xMPOIiTBL+kUVBQSZg/q2YPkHgyHya1Ec4eewL23s7KXCKyxi rNjkAU/krQOi1VNbBBTwhBqNXbg4QDHovK1OMQux4632/8O3WqnvbBA4KSYgshOzMwKb FFOmkZeRC9TLxYR0fbpXq1hEivgvI2IWWFVlIrbbfX6rnv0nQOyCQUZzE3d8ODZ1pka9 T2HqkKcL1Y0rjUYuhekB3faMoOK5dvnf+E5JnyqcV3FSg7xF7Ex4v2ZD1+zkC88PzCrD z3OiDBVmevCKBIolrIJhbFTjw5InkWb+glZ1jtt7eZ0dDCXuq/4T1LUbrQ5bFsRZ6TMs zd/w== MIME-Version: 1.0 Received: by 10.229.69.31 with SMTP id x31mr15347848qci.101.1340319126615; Thu, 21 Jun 2012 15:52:06 -0700 (PDT) Received: by 10.229.81.1 with HTTP; Thu, 21 Jun 2012 15:52:06 -0700 (PDT) In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> References: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Thu, 21 Jun 2012 15:52:06 -0700 Message-ID: From: Xin LI To: rondzierwa@comcast.net Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Thu, 21 Jun 2012 22:52:08 -0000

Hi,

On Thu, Jun 21, 2012 at 2:48 PM, wrote:
>
> ok, i ran a verify on the raid, and it completed, so I believe that, from
> the hardware standpoint, da0 should be a functioning, 12TB disk.
>
> i did a zpool clear and re-ran the scrub, and the results were almost
> identical:
[...]
> config:
>
>        NAME        STATE     READ WRITE CKSUM
>        zfsPool     ONLINE       0     0 6.20K
>          da0       ONLINE       0     0 12.5K  24K repaired

This is very likely a hardware issue, or a driver issue (less likely, since we have done extensive testing on this RAID card and the problems are believed to have been fixed years ago).

There are, however, a few errata from AMD that make me quite concerned:

http://support.amd.com/us/Embedded_TechDocs/41322.pdf

Specifically, #264 and #298 seem quite serious. How old is your motherboard BIOS? Are you using ECC memory, by the way?

> errors: Permanent errors have been detected in the following files:
>
>        zfsPool/raid:<0x9e241>
>        zfsPool/Build:<0x0>
> phoenix#
>
> along with the 6,353 I/O errors, there were over 12,000 checksum mismatch
> errors on the console.
>
>
> The recommendation from ZFS is to restore the file in question. At this
> point, I would just like to delete the two files.
> how do i do that?
>
> its these kind of antics that make me resistant to the thought of allowing
> ZFS to manage the raid. it seems to be having problems just managing a big
> file system. I don't want it to correct anything, or restore anything, just
> let me delete the files that hurt, fix up the free space list so it doesn't
> point outside the bounds of the disk, and get on with life.

Are you *really* sure that these are files? The second one doesn't seem to be a file, but rather some metadata.

If hardware issues have been ruled out, what I would do is copy the data over to a different dataset (e.g. Build.new), validate the copied data, destroy the current Build dataset, then rename Build.new to Build.

> if its finding corrupted files that appear to not have a directory entry
> associated with them (unlinked files), why doesn't it just delete them?
> fsck asks you if you want to delete unlinked files, why doesn't zfs do the
> same, or at least give you the option of deleting bad files when it finds
> them?

Normally, ZFS does tell you which files are corrupted. Sometimes it takes time, since the file might be present in multiple snapshots and the current set of utilities gives you only one reference to the file's name; you may need to remove the file (or the snapshot containing it), scrub, then remove the newly revealed reference, and so on.

Your case seems serious enough that I really think there is some metadata corruption, bad enough that it is already beyond fixing. ZFS replicates metadata to different locations, but that does not prevent it from being corrupted in memory. In these situations you will have to use a backup.

> this is causing a lot of down time, and its making linux look very
> attractive in my organization. how do I get this untangled short of
> reformatting and starting over?

Linux does not have end-to-end data validation comparable to what ZFS offers. Use caution if you go that route.

> ron.
>
>
> ________________________________
> From: "Xin LI"
> To: rondzierwa@comcast.net
> Cc: "Steven Hartland" , freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 6:56:09 PM
>
> Subject: Re: ZFS Checksum errors
>
> On Wed, Jun 20, 2012 at 1:55 PM, wrote:
>> Steve.
>>
>> well, it got done, and it found another anonymous file with errors . any
>> idea how to get rid of these?
>
> Normally you need to "zpool clear zfsPool", and rerun zpool scrub. If
> you see these numbers growing again, it's likely that there are some
> other problems with your hardware. The recommended configuration is
> to use ZFS to manage disks, or at least split your RAID volumes into
> smaller ones by the way, since otherwise the volume is seen as a
> "single disk" to ZFS, making it impossible to repair data errors
> unless you add additional redundancy (zfs set copies=2, etc).
>
>>
>> thanks,
>> ron.
>>
>>
>>
>> phoenix# zpool status -v zfsPool
>> pool: zfsPool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01
>> 2012
>> config:
>>
>> NAME STATE READ WRITE CKSUM
>> zfsPool ONLINE 0 0 6.17K
>> da0 ONLINE 0 0 13.0K 1.34M repaired
>>
>> errors: Permanent errors have been detected in the following files:
>>
>> zfsPool/raid:<0x9e241>
>> zfsPool/Build:<0x0>
>> phoenix#
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Steven Hartland"
>> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
>> Sent: Wednesday, June 20, 2012 1:58:20 PM
>> Subject: Re: ZFS Checksum errors
>>
>> ----- Original Message -----
>> From:
>> ..
>>
>>> zpool status indicates that a file has errors, but doesn't tell me its
>>> name:
>>>
>>> phoenix# zpool status -v zfsPool
>>> pool: zfsPool
>>> state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>> corruption. Applications may be affected.
>>> action: Restore the file in question if possible. Otherwise restore the
>>> entire pool from backup.
>>> see: http://www.sun.com/msg/ZFS-8000-8A
>>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
>>
>> Try waiting for the scrub to complete and see if its more helpful after
>> that.
>>
>> Regards
>> Steve
>>
>> ================================================
>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>> the person or entity to whom it is addressed. In the event of misdirection,
>> the recipient is prohibited from using, copying, printing or otherwise
>> disseminating it or any information contained in it.
>>
>> In the event of misdirection, illegible or incomplete transmission please
>> telephone +44 845 868 1337
>> or return the E.mail to postmaster@multiplay.co.uk.
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
>
>
> --
> Xin LI https://www.delphij.net/
> FreeBSD - The Power to Serve! Live free or die

--
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!
Live free or die

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 00:20:03 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A6491106566B for ; Fri, 22 Jun 2012 00:20:03 +0000 (UTC) (envelope-from rondzierwa@comcast.net)
Received: from QMTA11.westchester.pa.mail.comcast.net (qmta11.westchester.pa.mail.comcast.net [76.96.59.211]) by mx1.freebsd.org (Postfix) with ESMTP id 43EEC8FC12 for ; Fri, 22 Jun 2012 00:20:02 +0000 (UTC)
Received: from omta10.westchester.pa.mail.comcast.net ([76.96.62.28]) by QMTA11.westchester.pa.mail.comcast.net with comcast id RCGN1j0050cZkys5BCKwd8; Fri, 22 Jun 2012 00:19:56 +0000
Received: from sz0192.wc.mail.comcast.net ([76.96.59.160]) by omta10.westchester.pa.mail.comcast.net with comcast id RCKw1j0043TRaxG3WCKwLN; Fri, 22 Jun 2012 00:19:56 +0000
Date: Fri, 22 Jun 2012 00:19:55 +0000 (UTC)
From: rondzierwa@comcast.net
To: Xin LI
Message-ID: <178486397.30705.1340324395308.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
In-Reply-To:
MIME-Version: 1.0
X-Originating-IP: [68.50.136.212]
X-Mailer: Zimbra 6.0.13_GA_2944 (ZimbraWebClient - FF3.0 (Win)/6.0.13_GA_2944)
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Checksum errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 22 Jun 2012 00:20:03 -0000

Guys, I want to thank you all for the attention to my problem, but I think we are barking up the wrong tree chasing an ongoing hardware problem. I have no doubt that the problem was most likely caused by a hardware failure.
But probably not because of memory or processor: my CPU rev was not affected by the two problems you mentioned, I ran memory pattern tests on this system for days before I started using it, and I ran pattern tests on the raw RAID before putting ZFS on it in order to generate baseline performance metrics.

Three days ago I was running a disk pattern generator/checker to determine performance metrics on the disk array with ZFS. The test was configured to operate on a pair of 1TB files, writing one while checking the previous one. During the first file creation, the RAID controller began complaining about slot 1 (removed, reset, replaced, removed, reset, replaced, etc.). I stopped the test, reseated the connector on the drive, and the complaints stopped.

I started the pattern checker to look at the fragment of the file that it had created (about 200 gig), and that was when ZFS began complaining about checksum errors. I did a zpool status, and the first of the two files (the one on "/raid") had a name, and it was the pattern checker file. So I did an rm on the pattern checker file, ZFS took off producing checksum errors on the console, and I was left with the orphan file. I ran the zpool scrub, and the second file turned up.

So, thinking that there was something foul in the underlying array, I did a verify, and it turned up a couple of errors that it fixed on the drive in slot 1. So I did the zpool clear and ran scrub again, with no better results.

Now to the present. Yes, it was undoubtedly caused by a hardware problem. But I do not believe that it is an ongoing problem. There were physical disk errors while I was trying to create the pattern file, and I now have a corrupted file. These things happen, but in a production environment, we have to be able to fix the resulting mess without starting over.
I am willing to bet that the checksum errors are related to the pattern checker file, listed as the file with uncorrectable errors, that was being created when the disk errors occurred. If I were forced to guess, I would expect that not only are there errors in the data, but that some of the block pointers reference space that is either in other files or not within the space of the raid at all. I'm sure we have all seen this kind of filesystem corruption before. It used to be as simple as running fsck and letting it untangle the bogus file. The remainder of the array appears to function normally; the system is still in production, but in a read-only capacity. There are some 6TB of various media and other files, and they all seem to be accessible. It's just these two files that are corrupted. So, how do I "fsck" a ZFS volume, remove the bogus files, and get on with my otherwise boring, uneventful life?? thanks again, ron. ----- Original Message ----- From: "Xin LI" To: rondzierwa@comcast.net Cc: "Steven Hartland" , freebsd-fs@freebsd.org Sent: Thursday, June 21, 2012 6:52:06 PM Subject: Re: ZFS Checksum errors Hi, On Thu, Jun 21, 2012 at 2:48 PM, wrote: > > ok, i ran a verify on the raid, and it completed, so I believe that, from > the hardware standpoint, da0 should be a functioning, 12TB disk. > > i did a zpool clear and re-ran the scrub, and the results were almost > identical: [...] > config: > > NAME STATE READ WRITE CKSUM > zfsPool ONLINE 0 0 6.20K > da0 ONLINE 0 0 12.5K 24K repaired This is very likely a hardware issue, or a driver issue (less likely, since we have done extensive testing on this RAID card and the problems are believed to have been fixed years ago). There are, however, a few errata from AMD that make me feel quite concerned: http://support.amd.com/us/Embedded_TechDocs/41322.pdf Specifically, #264 and #298 seem quite serious. How old is your motherboard BIOS? Are you using ECC memory, by the way?
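[Editor's note: the questions about BIOS age and ECC can usually be answered from the SMBIOS tables. A possible check using `dmidecode` (available in FreeBSD ports; needs root, and the exact output strings vary by vendor, so this is an illustration, not a definitive diagnostic):]

```shell
#!/bin/sh
# Query SMBIOS for the BIOS release date and memory error-correction type.
# Output strings are vendor-dependent; shown here as an illustration only.
RESULT="not checked"
if command -v dmidecode >/dev/null 2>&1; then
    dmidecode -t bios   | grep -i 'release date'       # how old the BIOS is
    dmidecode -t memory | grep -i 'error correction'   # e.g. "Single-bit ECC" or "None"
    RESULT="checked"
fi
echo "$RESULT"
```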
> errors: Permanent errors have been detected in the following files: > > zfsPool/raid:<0x9e241> > zfsPool/Build:<0x0> > phoenix# > > along with the 6,353 I/O errors, there were over 12,000 checksum mismatch > errors on the console. > > > The recommendation from ZFS is to restore the file in question. At this > point, I would just like to delete the two files. > how do i do that? > > its these kind of antics that make me resistant to the thought of allowing > ZFS to manage the raid. it seems to be having problems just managing a big > file system. I don't want it to correct anything, or restore anything, just > let me delete the files that hurt, fix up the free space list so it doesn't > point outside the bounds of the disk, and get on with life. Are you *really* sure that these are files? The second one doesn't seem to be a file, but rather some metadata. If hardware issues have been ruled out, what I would do is copy the data over to a different dataset (e.g. Build.new), validate the copied data, destroy the current Build dataset, and rename Build.new to Build. > if its finding corrupted files that appear to not have a directory entry > associated with them (unlinked files), why doesn't it just delete them? > fsck asks you if you want to delete unlinked files, why doesn't zfs do the > same, or at least give you the option of deleting bad files when it finds > them? Normally, ZFS does tell you which files are corrupted. Sometimes it takes time, since the file might be present in multiple snapshots and the current set of utilities only gives you one reference for the file's name, so you may need to remove the file (or the snapshot containing it), scrub, then remove the newly revealed reference, and so on. Your case seems serious enough that I really think there is some metadata corruption, and that it is already beyond fixing. ZFS replicates metadata into different locations, but that does not prevent it from being corrupted in memory.
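[Editor's note: the copy-validate-swap procedure suggested here can be sketched as below. The dataset names `zfsPool/Build` and `Build.new` come from the thread; the `/zfsPool/...` mountpoints are assumed defaults, and `rsync`/`diff` stand in for whatever validation you trust:]

```shell
#!/bin/sh
# Sketch: copy a damaged dataset aside, verify the copy, then swap names.
# Mountpoints below assume the default layout (pool mounted at /zfsPool).
POOL=zfsPool
SRC="$POOL/Build"
DST="$POOL/Build.new"

if command -v zfs >/dev/null 2>&1; then
    zfs create "$DST"
    # --checksum forces rsync to compare file contents, not just size/mtime
    rsync -a --checksum "/$SRC/" "/$DST/"
    # only destroy the original once the copy verifies cleanly
    if diff -r "/$SRC" "/$DST"; then
        zfs destroy -r "$SRC"
        zfs rename "$DST" "$SRC"
    fi
else
    echo "zfs not available on this host; skipping"
fi
```

Reads that hit corrupted blocks during the copy will fail with I/O errors, which at least tells you exactly which files the backup must supply.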
In these situations you will have to use a backup. > this is causing a lot of down time, and its making linux look very > attractive in my organization. how do I get this untangled short of > reformatting and starting over? Linux does not have end-to-end data validation ability comparable to what ZFS offers. Use caution if you go that route. > ron. > > > ________________________________ > From: "Xin LI" > To: rondzierwa@comcast.net > Cc: "Steven Hartland" , freebsd-fs@freebsd.org > Sent: Wednesday, June 20, 2012 6:56:09 PM > > Subject: Re: ZFS Checksum errors > > On Wed, Jun 20, 2012 at 1:55 PM, wrote: >> Steve. >> >> well, it got done, and it found another anonymous file with errors . any >> idea how to get rid of these? > > Normally you need to "zpool clear zfsPool", and rerun zpool scrub. If > you see these numbers growing again, it's likely that there are some > other problems with your hardware. The recommended configuration is > to use ZFS to manage disks, or at least split your RAID volumes into > smaller ones by the way, since otherwise the volume is seen as a > "single disk" to ZFS, making it impossible to repair data errors > unless you add additional redundancy (zfs set copies=2, etc). > >> >> thanks, >> ron. >> >> >> >> phoenix# zpool status -v zfsPool >> pool: zfsPool >> state: ONLINE >> status: One or more devices has experienced an error resulting in data >> corruption. Applications may be affected. >> action: Restore the file in question if possible. Otherwise restore the >> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A >> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 >> 2012 >> config: >> >> NAME STATE READ WRITE CKSUM >> zfsPool ONLINE 0 0 6.17K >> da0 ONLINE 0 0 13.0K 1.34M repaired >> >> errors: Permanent errors have been detected in the following files: >> >> zfsPool/raid:<0x9e241> >> zfsPool/Build:<0x0> >> phoenix# >> >> >> >> >> ----- Original Message ----- >> From: "Steven Hartland" >> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org >> Sent: Wednesday, June 20, 2012 1:58:20 PM >> Subject: Re: ZFS Checksum errors >> >> ----- Original Message ----- >> From: >> .. >> >>> zpool status indicates that a file has errors, but doesn't tell me its >>> name: >>> >>> phoenix# zpool status -v zfsPool >>> pool: zfsPool >>> state: ONLINE >>> status: One or more devices has experienced an error resulting in data >>> corruption. Applications may be affected. >>> action: Restore the file in question if possible. Otherwise restore the >>> entire pool from backup. >>> see: http://www.sun.com/msg/ZFS-8000-8A >>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go >> >> Try waiting for the scrub to complete and see if its more helpful after >> that. >> >> Regards >> Steve >> >> ================================================ >> This e.mail is private and confidential between Multiplay (UK) Ltd. and >> the person or entity to whom it is addressed. In the event of misdirection, >> the recipient is prohibited from using, copying, printing or otherwise >> disseminating it or any information contained in it. >> >> In the event of misdirection, illegible or incomplete transmission please >> telephone +44 845 868 1337 >> or return the E.mail to postmaster@multiplay.co.uk. 
>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > -- > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! Live free or die -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 00:23:26 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C47D21065674 for ; Fri, 22 Jun 2012 00:23:26 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-qa0-f51.google.com (mail-qa0-f51.google.com [209.85.216.51]) by mx1.freebsd.org (Postfix) with ESMTP id 75E558FC08 for ; Fri, 22 Jun 2012 00:23:26 +0000 (UTC) Received: by qaea16 with SMTP id a16so57894qae.17 for ; Thu, 21 Jun 2012 17:23:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=lY19PxRk6bVjJ6ht4PcelV4kcKZUHxZiEg5xeMlohYQ=; b=Oc96gh/o+xMlabBPByYYAeVxSZwS+ab3vHBSCrainhRytPhNYJMPTOYVssHsIGGzYc +t+L1eJC8oqeA9uyVWtfiplf7oPEIBkrUoe8Z3pD8fNbaDWAfr2JdyObSSFfUVZBJjyi XCwifXpUjC9jcGDbVjyV9RBVIzYGCSXBiAilqqZ5bLMT5HP9PTPPO/ns9wUz21RdBo/n ohTB6/CH33dHxsJnUu6YbqUT/y80eRNlM+cn+kWFwaz+0aDT3xCvUhtsFPoiMoZGKNL/ /MBKONfgXE7oREtn3Zzsn5G3KIFJie3fjHrOrgdI6DpAV6jLqlfJyLOZNYh8g4ZRVOFq dDCQ== MIME-Version: 1.0 Received: by 10.224.106.136 with SMTP id x8mr3105162qao.12.1340324599772; Thu, 21 Jun 2012 17:23:19 -0700 (PDT) Sender: rincebrain@gmail.com Received: by 10.229.250.6 with HTTP; Thu, 21 Jun 2012 17:23:19 -0700 (PDT) In-Reply-To: <178486397.30705.1340324395308.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> References: 
<178486397.30705.1340324395308.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Date: Thu, 21 Jun 2012 20:23:19 -0400 X-Google-Sender-Auth: NSAaK4p8yMSijhwxIAp8GyUp0fI Message-ID: From: Rich To: rondzierwa@comcast.net Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 00:23:26 -0000 What we're telling you is that: - if ZFS reports errors on a scrub, then - you clear and rescrub, then - find more errors your problem is _not_ gone. - Rich On Thu, Jun 21, 2012 at 8:19 PM, wrote: > Guys I want to thank you all for the attention to my problem. but i think we are > barking up the wrong bug chasing an ongoing hardware problem. > > I have no doubt that the problem was most likely caused by a hardware failure. But > probably not because of memory or processor (my cpu rev was not affected by the two > problems you mentioned, and I ran memory pattern tests on this system for days before > i started using it, and ran pattern tests on the raw raid before putting zfs on it in order > to generate baseline performance metrics). > > Three days ago I was running a disk pattern generator/checker to determine > performance metrics on the disk array with ZFS. The test was configured to > operate with a pair of 1TB files, writing one while checking the previous one. > During the first file creation, the raid controller began complaining about slot 1 > (removed, reset, replaced, removed, reset, replaced, etc). I stopped the test, > reseated the connector on the drive, and the complaints stopped. I started the > pattern checker to look at the fragment of the file that it created (about 200gig) > and that was when ZFS began complaining about checksum errors. 
I did a > zpool stat, and the first of the two files (the one on "/raid") had a name, and it > was the pattern checker file. So, I did an rm on the pattern checker file, and > ZFS took off producing checksum errors on the console, and I was left with the > orphan file. I ran the zpool scrub, and the second file turned up. So, thinking > that there was something foul on the underlying array, i did a verify. and It turned > up a couple of errors that it fixed on the drive in slot 1. so I did the zpool clear, > ran scrub again, with no better results. > > now to the present. Yes. it was undoubtedly caused by a hardware problem. > But I do not believe that it is an ongoing problem. There were physical disk > errors while I was trying to create the pattern file, and I now have a corrupted file. > These things happen, but in a production environment, we have to be able to > fix the resulting mess without starting over. > > I am willing to bet that the checksum errors are related to the pattern checker > file listed as the file that has uncorrectable errors that was being created when > the disk errors occurred. if I was forced to guess, i would expect that not only > are there errors in the data, but that some of the block pointers reference space > that is either in other files, or not withing the space of the raid at all. i'm sure we > have all seen this kind of filesystem corruption before. it used to be as simple > as running fsck and letting it untangle the bogus file. > > The remainder of the array appears to function normally, the system is still in > production, but in a read-only capacity. There are some 6tb of various media > and other files, and they all seem to be accessible. its just these two files that > are corrupted. So, how do i "fsck" a zfs volume, remove the bogus files, and > get on with my otherwise boring, uneventful life?? > > > thanks again, > ron. 
> > > > ----- Original Message ----- > From: "Xin LI" > To: rondzierwa@comcast.net > Cc: "Steven Hartland" , freebsd-fs@freebsd.org > Sent: Thursday, June 21, 2012 6:52:06 PM > Subject: Re: ZFS Checksum errors > > Hi, > > On Thu, Jun 21, 2012 at 2:48 PM, wrote: >> >> ok, i ran a verify on the raid, and it completed, so I believe that, from >> the hardware standpoint, da0 should be a functioning, 12TB disk. >> >> i did a zpool clear and re-ran the scrub, and the results were almost >> identical: > [...] >> config: >> >> NAME STATE READ WRITE CKSUM >> zfsPool ONLINE 0 0 6.20K >> da0 ONLINE 0 0 12.5K 24K repaired > > This is very likely be a hardware issue, or a driver issue (less > likely, since we have done extensive testing on this RAID card and the > problems are believed to fixed years ago). > > There are however a few erratums from AMD that makes me feel quite concerned: > > http://support.amd.com/us/Embedded_TechDocs/41322.pdf > > Specifically speaking, #264, #298 seems quite serious. How old is > your motherboard BIOS? Are you using ECC memory by the way? > >> errors: Permanent errors have been detected in the following files: >> >> zfsPool/raid:<0x9e241> >> zfsPool/Build:<0x0> >> phoenix# >> >> along with the 6,353 I/O errors, there were over 12,000 checksum mismatch >> errors on the console. >> >> >> The recommendation from ZFS is to restore the file in question. At this >> point, I would just like to delete the two files. >> how do i do that? >> >> its these kind of antics that make me resistant to the thought of allowing >> ZFS to manage the raid. it seems to be having problems just managing a big >> file system. I don't want it to correct anything, or restore anything, just >> let me delete the files that hurt, fix up the free space list so it doesn't >> point outside the bounds of the disk, and get on with life. > > Are you *really* sure that these are files? The second one doesn't > seem to be a file, but rather some metadata. 
> > If hardware issue have been ruled out, what I would do is to copy data > over to a different dataset (e.g. Build.new, then validate the data > copied, then destroy the current Build dataset, rename Build.new to > Build). > >> if its finding corrupted files that appear to not have a directory entry >> associated with them (unlinked files), why doesn't it just delete them? >> fsck asks you if you want to delete unlinked files, why doesn't zfs do the >> same, or at least give you the option of deleting bad files when it finds >> them? > > Normally, ZFS do tell you which files are corrupted, sometimes it > takes time since your file might be present in multiple snapshots, and > the current set of utilities only gives you one reference for the > file's name, and you may need to remove the file (or the snapshot > containing it), scrub, then remove the newly revealed reference, etc. > > Your case seems to be very serious that I really think there are some > metadata corruption, which are serious enough that they are already > beyond fix. ZFS replicates metadata into different locations, but > that does not prevent it from being corrupted in memory. In these > situations you will have to use a backup. > >> this is causing a lot of down time, and its making linux look very >> attractive in my organization. how do I get this untangled short of >> reformatting and starting over? > > Linux does not have comparable end-to-end data validation ability that > ZFS offers. Use caution if you go that route. > >> ron. >> >> >> ________________________________ >> From: "Xin LI" >> To: rondzierwa@comcast.net >> Cc: "Steven Hartland" , freebsd-fs@freebsd.org >> Sent: Wednesday, June 20, 2012 6:56:09 PM >> >> Subject: Re: ZFS Checksum errors >> >> On Wed, Jun 20, 2012 at 1:55 PM, wrote: >>> Steve. >>> >>> well, it got done, and it found another anonymous file with errors . any >>> idea how to get rid of these? >> >> Normally you need to "zpool clear zfsPool", and rerun zpool scrub. 
If >> you see these numbers growing again, it's likely that there are some >> other problems with your hardware. The recommended configuration is >> to use ZFS to manage disks, or at least split your RAID volumes into >> smaller ones by the way, since otherwise the volume is seen as a >> "single disk" to ZFS, making it impossible to repair data errors >> unless you add additional redundancy (zfs set copies=2, etc). >> >>> >>> thanks, >>> ron. >>> >>> >>> >>> phoenix# zpool status -v zfsPool >>> pool: zfsPool >>> state: ONLINE >>> status: One or more devices has experienced an error resulting in data >>> corruption. Applications may be affected. >>> action: Restore the file in question if possible. Otherwise restore the >>> entire pool from backup. >>> see: http://www.sun.com/msg/ZFS-8000-8A >>> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 >>> 2012 >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> zfsPool ONLINE 0 0 6.17K >>> da0 ONLINE 0 0 13.0K 1.34M repaired >>> >>> errors: Permanent errors have been detected in the following files: >>> >>> zfsPool/raid:<0x9e241> >>> zfsPool/Build:<0x0> >>> phoenix# >>> >>> >>> >>> >>> ----- Original Message ----- >>> From: "Steven Hartland" >>> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org >>> Sent: Wednesday, June 20, 2012 1:58:20 PM >>> Subject: Re: ZFS Checksum errors >>> >>> ----- Original Message ----- >>> From: >>> .. >>> >>>> zpool status indicates that a file has errors, but doesn't tell me its >>>> name: >>>> >>>> phoenix# zpool status -v zfsPool >>>> pool: zfsPool >>>> state: ONLINE >>>> status: One or more devices has experienced an error resulting in data >>>> corruption. Applications may be affected. >>>> action: Restore the file in question if possible. Otherwise restore the >>>> entire pool from backup. 
>>>> see: http://www.sun.com/msg/ZFS-8000-8A >>>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go >>> >>> Try waiting for the scrub to complete and see if its more helpful after >>> that. >>> >>> Regards >>> Steve >>> >>> ================================================ >>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>> the person or entity to whom it is addressed. In the event of misdirection, >>> the recipient is prohibited from using, copying, printing or otherwise >>> disseminating it or any information contained in it. >>> >>> In the event of misdirection, illegible or incomplete transmission please >>> telephone +44 845 868 1337 >>> or return the E.mail to postmaster@multiplay.co.uk. >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >> >> >> -- >> Xin LI https://www.delphij.net/ >> FreeBSD - The Power to Serve! Live free or die > > > > -- > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! 
Live free or die > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 00:30:35 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 56AFF1065679 for ; Fri, 22 Jun 2012 00:30:35 +0000 (UTC) (envelope-from rondzierwa@comcast.net) Received: from qmta09.westchester.pa.mail.comcast.net (qmta09.westchester.pa.mail.comcast.net [76.96.62.96]) by mx1.freebsd.org (Postfix) with ESMTP id EF5848FC0C for ; Fri, 22 Jun 2012 00:30:34 +0000 (UTC) Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88]) by qmta09.westchester.pa.mail.comcast.net with comcast id R03a1j0071uE5Es59CWahQ; Fri, 22 Jun 2012 00:30:34 +0000 Received: from sz0192.wc.mail.comcast.net ([76.96.59.160]) by omta16.westchester.pa.mail.comcast.net with comcast id RCWb1j01z3TRaxG3cCWbuu; Fri, 22 Jun 2012 00:30:35 +0000 Date: Fri, 22 Jun 2012 00:30:33 +0000 (UTC) From: rondzierwa@comcast.net To: Rich Message-ID: <467652020.30738.1340325033684.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> In-Reply-To: MIME-Version: 1.0 X-Originating-IP: [68.50.136.212] X-Mailer: Zimbra 6.0.13_GA_2944 (ZimbraWebClient - FF3.0 (Win)/6.0.13_GA_2944) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 00:30:35 -0000 yeah i get that. i guess i'm wondering what it will take to make the problem gone. 
How many times should I have to iterate the scrub? And isn't there something I can do immediately to remove the files that are corrupted? The problem was created by a disk error that is no longer happening, but now I have this corrupted file. How do I clean up the mess? The scrub takes hours, and there are folks who are watching. I'm working on the third iteration of clear and scrub; how many times should it take? I can be patient, but it would be nice to have an answer for the folks who keep asking "are we there yet?". ----- Original Message ----- From: "Rich" To: rondzierwa@comcast.net Cc: "Xin LI" , freebsd-fs@freebsd.org Sent: Thursday, June 21, 2012 8:23:19 PM Subject: Re: ZFS Checksum errors What we're telling you is that: - if ZFS reports errors on a scrub, then - you clear and rescrub, then - find more errors your problem is _not_ gone. - Rich On Thu, Jun 21, 2012 at 8:19 PM, wrote: > Guys I want to thank you all for the attention to my problem. but i think we are > barking up the wrong bug chasing an ongoing hardware problem. > > I have no doubt that the problem was most likely caused by a hardware failure. But > probably not because of memory or processor (my cpu rev was not affected by the two > problems you mentioned, and I ran memory pattern tests on this system for days before > i started using it, and ran pattern tests on the raw raid before putting zfs on it in order > to generate baseline performance metrics). > > Three days ago I was running a disk pattern generator/checker to determine > performance metrics on the disk array with ZFS. The test was configured to > operate with a pair of 1TB files, writing one while checking the previous one. > During the first file creation, the raid controller began complaining about slot 1 > (removed, reset, replaced, removed, reset, replaced, etc). I stopped the test, > reseated the connector on the drive, and the complaints stopped.
I started the > pattern checker to look at the fragment of the file that it created (about 200gig) > and that was when ZFS began complaining about checksum errors. I did a > zpool stat, and the first of the two files (the one on "/raid") had a name, and it > was the pattern checker file. So, I did an rm on the pattern checker file, and > ZFS took off producing checksum errors on the console, and I was left with the > orphan file. I ran the zpool scrub, and the second file turned up. So, thinking > that there was something foul on the underlying array, i did a verify. and It turned > up a couple of errors that it fixed on the drive in slot 1. so I did the zpool clear, > ran scrub again, with no better results. > > now to the present. Yes. it was undoubtedly caused by a hardware problem. > But I do not believe that it is an ongoing problem. There were physical disk > errors while I was trying to create the pattern file, and I now have a corrupted file. > These things happen, but in a production environment, we have to be able to > fix the resulting mess without starting over. > > I am willing to bet that the checksum errors are related to the pattern checker > file listed as the file that has uncorrectable errors that was being created when > the disk errors occurred. if I was forced to guess, i would expect that not only > are there errors in the data, but that some of the block pointers reference space > that is either in other files, or not withing the space of the raid at all. i'm sure we > have all seen this kind of filesystem corruption before. it used to be as simple > as running fsck and letting it untangle the bogus file. > > The remainder of the array appears to function normally, the system is still in > production, but in a read-only capacity. There are some 6tb of various media > and other files, and they all seem to be accessible. its just these two files that > are corrupted. 
So, how do i "fsck" a zfs volume, remove the bogus files, and > get on with my otherwise boring, uneventful life?? > > > thanks again, > ron. > > > > ----- Original Message ----- > From: "Xin LI" > To: rondzierwa@comcast.net > Cc: "Steven Hartland" , freebsd-fs@freebsd.org > Sent: Thursday, June 21, 2012 6:52:06 PM > Subject: Re: ZFS Checksum errors > > Hi, > > On Thu, Jun 21, 2012 at 2:48 PM, wrote: >> >> ok, i ran a verify on the raid, and it completed, so I believe that, from >> the hardware standpoint, da0 should be a functioning, 12TB disk. >> >> i did a zpool clear and re-ran the scrub, and the results were almost >> identical: > [...] >> config: >> >> NAME STATE READ WRITE CKSUM >> zfsPool ONLINE 0 0 6.20K >> da0 ONLINE 0 0 12.5K 24K repaired > > This is very likely be a hardware issue, or a driver issue (less > likely, since we have done extensive testing on this RAID card and the > problems are believed to fixed years ago). > > There are however a few erratums from AMD that makes me feel quite concerned: > > http://support.amd.com/us/Embedded_TechDocs/41322.pdf > > Specifically speaking, #264, #298 seems quite serious. How old is > your motherboard BIOS? Are you using ECC memory by the way? > >> errors: Permanent errors have been detected in the following files: >> >> zfsPool/raid:<0x9e241> >> zfsPool/Build:<0x0> >> phoenix# >> >> along with the 6,353 I/O errors, there were over 12,000 checksum mismatch >> errors on the console. >> >> >> The recommendation from ZFS is to restore the file in question. At this >> point, I would just like to delete the two files. >> how do i do that? >> >> its these kind of antics that make me resistant to the thought of allowing >> ZFS to manage the raid. it seems to be having problems just managing a big >> file system. I don't want it to correct anything, or restore anything, just >> let me delete the files that hurt, fix up the free space list so it doesn't >> point outside the bounds of the disk, and get on with life. 
> > Are you *really* sure that these are files? The second one doesn't > seem to be a file, but rather some metadata. > > If hardware issue have been ruled out, what I would do is to copy data > over to a different dataset (e.g. Build.new, then validate the data > copied, then destroy the current Build dataset, rename Build.new to > Build). > >> if its finding corrupted files that appear to not have a directory entry >> associated with them (unlinked files), why doesn't it just delete them? >> fsck asks you if you want to delete unlinked files, why doesn't zfs do the >> same, or at least give you the option of deleting bad files when it finds >> them? > > Normally, ZFS do tell you which files are corrupted, sometimes it > takes time since your file might be present in multiple snapshots, and > the current set of utilities only gives you one reference for the > file's name, and you may need to remove the file (or the snapshot > containing it), scrub, then remove the newly revealed reference, etc. > > Your case seems to be very serious that I really think there are some > metadata corruption, which are serious enough that they are already > beyond fix. ZFS replicates metadata into different locations, but > that does not prevent it from being corrupted in memory. In these > situations you will have to use a backup. > >> this is causing a lot of down time, and its making linux look very >> attractive in my organization. how do I get this untangled short of >> reformatting and starting over? > > Linux does not have comparable end-to-end data validation ability that > ZFS offers. Use caution if you go that route. > >> ron. >> >> >> ________________________________ >> From: "Xin LI" >> To: rondzierwa@comcast.net >> Cc: "Steven Hartland" , freebsd-fs@freebsd.org >> Sent: Wednesday, June 20, 2012 6:56:09 PM >> >> Subject: Re: ZFS Checksum errors >> >> On Wed, Jun 20, 2012 at 1:55 PM, wrote: >>> Steve. 
>>> >>> well, it got done, and it found another anonymous file with errors . any >>> idea how to get rid of these? >> >> Normally you need to "zpool clear zfsPool", and rerun zpool scrub. If >> you see these numbers growing again, it's likely that there are some >> other problems with your hardware. The recommended configuration is >> to use ZFS to manage disks, or at least split your RAID volumes into >> smaller ones by the way, since otherwise the volume is seen as a >> "single disk" to ZFS, making it impossible to repair data errors >> unless you add additional redundancy (zfs set copies=2, etc). >> >>> >>> thanks, >>> ron. >>> >>> >>> >>> phoenix# zpool status -v zfsPool >>> pool: zfsPool >>> state: ONLINE >>> status: One or more devices has experienced an error resulting in data >>> corruption. Applications may be affected. >>> action: Restore the file in question if possible. Otherwise restore the >>> entire pool from backup. >>> see: http://www.sun.com/msg/ZFS-8000-8A >>> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01 >>> 2012 >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> zfsPool ONLINE 0 0 6.17K >>> da0 ONLINE 0 0 13.0K 1.34M repaired >>> >>> errors: Permanent errors have been detected in the following files: >>> >>> zfsPool/raid:<0x9e241> >>> zfsPool/Build:<0x0> >>> phoenix# >>> >>> >>> >>> >>> ----- Original Message ----- >>> From: "Steven Hartland" >>> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org >>> Sent: Wednesday, June 20, 2012 1:58:20 PM >>> Subject: Re: ZFS Checksum errors >>> >>> ----- Original Message ----- >>> From: >>> .. >>> >>>> zpool status indicates that a file has errors, but doesn't tell me its >>>> name: >>>> >>>> phoenix# zpool status -v zfsPool >>>> pool: zfsPool >>>> state: ONLINE >>>> status: One or more devices has experienced an error resulting in data >>>> corruption. Applications may be affected. >>>> action: Restore the file in question if possible. 
Otherwise restore the >>>> entire pool from backup. >>>> see: http://www.sun.com/msg/ZFS-8000-8A >>>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go >>> >>> Try waiting for the scrub to complete and see if its more helpful after >>> that. >>> >>> Regards >>> Steve >>> >>> ================================================ >>> This e.mail is private and confidential between Multiplay (UK) Ltd. and >>> the person or entity to whom it is addressed. In the event of misdirection, >>> the recipient is prohibited from using, copying, printing or otherwise >>> disseminating it or any information contained in it. >>> >>> In the event of misdirection, illegible or incomplete transmission please >>> telephone +44 845 868 1337 >>> or return the E.mail to postmaster@multiplay.co.uk. >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >> >> >> -- >> Xin LI https://www.delphij.net/ >> FreeBSD - The Power to Serve! Live free or die > > > > -- > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! 
Live free or die > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 01:13:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF847106566C for ; Fri, 22 Jun 2012 01:13:39 +0000 (UTC) (envelope-from freebsd@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id A015B8FC0A for ; Fri, 22 Jun 2012 01:13:39 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q5M1DSEj009969; Thu, 21 Jun 2012 18:13:29 -0700 (PDT) (envelope-from freebsd@pki2.com) From: Dennis Glatting To: Steven Hartland In-Reply-To: <7CD42032309D4072A0EC3B0187378658@multiplay.co.uk> References: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> <7CD42032309D4072A0EC3B0187378658@multiplay.co.uk> Content-Type: text/plain; charset="ISO-8859-1" Date: Thu, 21 Jun 2012 18:13:28 -0700 Message-ID: <1340327608.669.2.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q5M1DSEj009969 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: freebsd@pki2.com Cc: freebsd-fs@freebsd.org, Rich , rondzierwa@comcast.net Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 01:13:40 -0000 On Thu, 2012-06-21 at 23:04 +0100, Steven Hartland wrote: > It could also be issues with the hardware above the raid 
controller, e.g. main memory. > > Hardware raid check doesn't really tell you too much where as zfs checks are end to end so > will detect intermediate issues like memory or cpu problems. > Is SMART running? I found smartmontools useful in telling me of disk troubles, such as increasing uncorrectable errors. In each case where SMART reported this (about ten disks in the last two years), the disk was indeed failing and on replacement the problems went away. > We've even had a few issues in the past where the raid controller thought everything was > fine but zfs was complaining and it did turn out to be disks on the way out. > > So at this point I wouldn't rule out anything. > > Regards > Steve > ----- Original Message ----- > From: "Rich" > To: > Cc: > Sent: Thursday, June 21, 2012 10:51 PM > Subject: Re: ZFS Checksum errors > > > To be honest, if ZFS says you've got a ton of checksum errors, I would > strongly bet in favor of your data being damaged over a bug in ZFS. > > What're the underlying disks and RAID card? > > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. > > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. 
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 07:49:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E8B621065670 for ; Fri, 22 Jun 2012 07:49:29 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by mx1.freebsd.org (Postfix) with ESMTP id 8B7FB8FC1C for ; Fri, 22 Jun 2012 07:49:29 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu1) with ESMTP (Nemesis) id 0M4miX-1RxBpQ0z3l-00zUA4; Fri, 22 Jun 2012 09:49:25 +0200 Message-ID: <4FE42384.8030608@brockmann-consult.de> Date: Fri, 22 Jun 2012 09:49:24 +0200 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120421 Thunderbird/12.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1610905794.19241.1340212823047.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> In-Reply-To: <1610905794.19241.1340212823047.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> X-Enigmail-Version: 1.4.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:8kbBnV6V4LJ1MabvbLsEqSRNL3hvT+oTPoy0zWTq0nz +vtaL3L+U8awC980h4gQNnLAtH5PHFcgi+ojSmmCpxl+wsQCw8 UzUGw3a6IZJkmqZ/GrGz9XkidbeOFCFN/mdpMhGryXj+m81XnR 8b2YviBVAAUKb9vFbTQrqiRWLA3MUYSkNUqFQfg4T/mEv076m6 IspXc2taTaxNL+a0Oc4rew4hJGfSybIu6xNuVEV35XZA1Z3zsv ERsnOTp/NTLgtSUI9eycZ+KeRZ7b6oOEOZs8U9Qhv/ChpPBQK0 X1eCXJC+3VliH6IoKEyoYvnQJmL/7VGk3Tk6Gqn7FK+FKkFMvQ ADxue80Nfnu9Q4LUjaGaLoffp9F76jVeTQNrHoDWq Cc: rondzierwa@comcast.net Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 07:49:30 -0000 In case nobody mentioned it yet, 8.2-RELEASE was a very bad ZFS release. 8.2-STABLE around Sept 2011 was good (with buggy zvols still http://www.freebsd.org/cgi/query-pr.cgi?pr=161968). To anyone with problems with 8.2-RELEASE, I always recommend upgrading the release, creating a new pool, using zfs send to copy the old, and destroying the old pool. All the zfs experience in the world will not be enough to deal with a messed up pool that is only messed up due bugs specific to a single short lived release. Using zfs send should only replicate the things you snapshotted, which excludes that broken file reported in "zpool status -v". I discovered very minor problems in my pool with 8.2-RELEASE, and others reported problems with not being able to remove logs, but being able to OFFLINE them and run degraded... so I preemptively destroyed it. In my case I did it with consumer disks, and then again back to the enterprise disks. And if you do it this way (twice) then you also only need as many disks as it takes to fit your data, rather than the same number. On 06/20/2012 07:20 PM, rondzierwa@comcast.net wrote: > Greetings, > > I have a zfs filesystem on an 8.2-release amd64 system. hardware is amd phenom 964 with 8gb memory, 3ware 9650 controller with 8x seagate ST2000DL003 drives. the disks are configured in a raid-5, and present one device to the system. 
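[Editorial note: the pool-migration path Peter describes (snapshot, zfs send to a fresh pool, destroy the old one) can be sketched as below. The pool names zfsPool and newpool are illustrative, and newpool is assumed to already exist on its own disks.]

```shell
# Snapshot every dataset in the old pool recursively.
zfs snapshot -r zfsPool@migrate

# Replicate all datasets and their snapshots into the new pool.
# -R sends the full recursive stream; -F lets the receive side
# roll back to match the incoming stream.
zfs send -R zfsPool@migrate | zfs receive -F newpool

# Verify the copy, then retire the old pool.
zfs list -r newpool
zpool destroy zfsPool
```

Because send/receive only replicates what was snapshotted, an object that is damaged but unreferenced (like the anonymous `<0x...>` entries in "zpool status -v") is left behind in the old pool.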
> > Early today I got some checksum and i/o errors on the console: > Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387574272 size=9728 > Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387564544 size=9728 > Jun 20 07:33:43 phoenix root: ZFS: zpool I/O failure, zpool=zfsPool error=86 > Jun 20 07:33:43 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=7698387574272 size=9728 > Jun 20 07:33:43 phoenix root: ZFS: zpool I/O failure, zpool=zfsPool error=86 > > > > So I ran a scrub, after a couple of hours i got a pile of checksum errors that looked rather similar: > > Jun 20 12:45:24 phoenix root: ZFS: checksum mismatch, zpool=zfsPool path=/dev/da0 offset=560450768384 size=4096 > > > zpool status indicates that a file has errors, but doesn't tell me its name: > > phoenix# zpool status -v zfsPool > pool: zfsPool > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go > config: > > NAME STATE READ WRITE CKSUM > zfsPool ONLINE 0 0 38 > da0 ONLINE 0 0 434 1.06M repaired > > errors: Permanent errors have been detected in the following files: > > zfsPool/raid:<0x9e241> > phoenix# > > > How can I locate and get rid of the offending file? > > thanks, > ron. > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 
2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 07:51:49 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2489B106564A for ; Fri, 22 Jun 2012 07:51:49 +0000 (UTC) (envelope-from icameto@gmail.com) Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by mx1.freebsd.org (Postfix) with ESMTP id 99BDE8FC1D for ; Fri, 22 Jun 2012 07:51:48 +0000 (UTC) Received: by wibhm11 with SMTP id hm11so273489wib.13 for ; Fri, 22 Jun 2012 00:51:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=HVlKPLJVfpUTvd9mG7npnl6GA/2ZIs+K7vUej0OPfl4=; b=O/i/uX7lP/8WIGQMchCOjnjRqioUyMRaHGk0DilVCA0dbxSxfjNYlDnymfUadG90cl KAGfN/fnaV4VyA9KRQ8pb2Qy2Kyu2JDM+yIy36zpptaGbUp42vt/vigwtwMlO2fNW/Ff xOz0XFVkh73nt2ANaiucp56IUBrd0oCjJFlmAwrDEIyN6ODyJfSWNJZSULTVH6x5+xdW 4OWBic3K8s6F1PCJhqq7rWJlvQ7SIdUdTmY+/0J7Opph4nUUEo9Q5g0d45RiBk5o2MYV qHLmxkigj/ymCfpswxVOpE86yNaJMU/hKtt98xDKtQ4kgoxzbpP9Knzat0lBHkSc+VTF ZIzA== MIME-Version: 1.0 Received: by 10.216.198.164 with SMTP id v36mr649447wen.199.1340351507569; Fri, 22 Jun 2012 00:51:47 -0700 (PDT) Received: by 10.216.224.228 with HTTP; Fri, 22 Jun 2012 00:51:47 -0700 (PDT) In-Reply-To: <20120621131443.59eb24f3@fabiankeil.de> References: <20120621131443.59eb24f3@fabiankeil.de> Date: Fri, 22 Jun 2012 10:51:47 +0300 Message-ID: From: icameto icameto To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: ZFS Encryption with GELI for only /opt partition X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: 
list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 07:51:49 -0000 So much thanks Fabian, especially for yours quick answer and concern. I run "zpool export opt" and I would like to explain it clearly. There will be one disk which will be used for /opt partition as encrypted. Previously in UFS I was able to detach the opt partition by using GEOM BDE module via these steps. * # kldload geom_bde # mkdir /etc/gbde # gbde init /dev/ad0s1e -i -L /etc/gbde/ad0s1e.lock # gbde attach /dev/ad0s1e -l /etc/gbde/ad0s1e.lock # newfs -U -O2 /dev/ad0s1e.bde # mkdir /encryptedfs # mount /dev/ad0s1e.bde /encryptedfs # gbde detach /dev/ad0s1e # umount /encyrptedfs* Briefly I want to be able to unmount and mount capabilities without harming the datasets in pool of ZFS while using ZFS with GELI for encyptioning purpose. And you know i m capable of unmount the disk(da1.bde etc. ) from /opt mount point while I was using GEOM BDE. When I unmounted this disk(da1.bde), I could use da1 for /opt mount point without any data or dataset loosing . Dear Fabian, I have tried to exporting pool from ZFS, and you right that now i can detach from pool. But when I tried to import the old "opt" pool,I'm getting a warn "cannot import 'opt': no such pool available" about importing process. # geli status Name Status Components da1.eli ACTIVE da1 You said that ZFS and GELI are not thigtly integrated. But is that possible detaching and making inaccessible da1.eli device or making offline ZFS pool temporarily until attached properly with entering passphrase again for making accessible on mount point /opt (ZFS Pool) for this case ? Finally, I can create a script which will be working like a charm. I'm really curios about creating encrypted ZFS pool(for opt) with attaching and detaching capabilities. I guess that I'm doing an error on steps or logical mistake. Could you please help me to handle this issue or steps ? 
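[Editorial note: the detach/re-attach cycle being attempted maps onto zpool(8) and geli(8) roughly as follows. The device name da1 and the keyfile flag are taken from the rc.conf shown earlier in the thread; adjust to the actual key configuration.]

```shell
# Locking the encrypted /opt pool away:
zpool export opt              # unmounts all datasets and releases da1.eli
geli detach da1.eli           # now succeeds; /dev/da1.eli disappears

# Bringing it back:
geli attach -k /root/da1.key da1   # recreates /dev/da1.eli (prompts if a
                                   # passphrase is part of the key setup)
zpool import opt              # finds the pool on da1.eli and remounts /opt
```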
Thanks in advance Sincerely 2012/6/21 Fabian Keil > icameto icameto wrote: > > > I have some problems with ZFS encryption and GELI. I used ZFS for /opt > > partition(da1.eli which is encrypted form of seperate da1 disk ). And I > > want to encrypt the /opt partition by using GELI. My disks states' like > > below > > > > *# kldstat* > > Id Refs Address Size Name > > 1 15 0xffffffff80100000 c9fe20 kernel > > 2 1 0xffffffff80da0000 1ad0e0 zfs.ko > > 3 2 0xffffffff80f4e000 3a68 opensolaris.ko > > 4 1 0xffffffff80f52000 1cdc0 geom_eli.ko > > 5 2 0xffffffff80f6f000 2b0b8 crypto.ko > > 6 2 0xffffffff80f9b000 dc40 zlib.ko > > > > > > *# cat /etc/rc.conf | grep geli * > > geli_devices="da1" > > geli_da1_flags="-k /root/da1.key" > > #geli_detach="NO" > > > > > > *# zpool status* > > pool: opt > > state: ONLINE > > scrub: none requested > > config: > > > > NAME STATE READ WRITE CKSUM > > opt ONLINE 0 0 0 > > da1.eli ONLINE 0 0 0 > > > > errors: No known data errors > > > > *# geli status* > > Name Status Components > > da1.eli ACTIVE da1 > > > > *# df -h* > > Filesystem Size Used Avail Capacity Mounted on > > /dev/da0s1a 9.7G 280M 8.6G 3% / > > devfs 1.0K 1.0K 0B 100% /dev > > /dev/da0s1d 15G 734M 14G 5% /usr > > opt 7.8G 120K 7.8G 0% /opt > > > > > > *# geli detach da1.eli* > > geli: Cannot destroy device da1.eli (error=16). > > > > *# zfs unmount -a* > > > > *# df -h* > > Filesystem Size Used Avail Capacity Mounted on > > /dev/da0s1a 9.7G 280M 8.6G 3% / > > devfs 1.0K 1.0K 0B 100% /dev > > /dev/da0s1d 15G 734M 14G 5% /usr > > > > *# geli detach da1.eli* > > geli: Cannot destroy device da1.eli (error=16). > > This doesn't work because the pool is still imported. > Try running "zpool export opt" first, it will automatically > unmount the datasets so you can skip the "zfs unmount -a". > > > When I use "zfs mount -a" command there must be prompted for entering > > passphrase, but it immediately mounted by zfs without prompting anything. 
> > As the pool hasn't been exported, that's the expected behaviour. > > Also note that ZFS and geli are not tightly integrated so > "zfs mount -a" will never setup the geli provider for you. > > > *# zfs mount -a* > > > > *# df -h* > > Filesystem Size Used Avail Capacity Mounted on > > /dev/da0s1a 9.7G 280M 8.6G 3% / > > devfs 1.0K 1.0K 0B 100% /dev > > /dev/da0s1d 15G 734M 14G 5% /usr > > opt 7.8G 120K 7.8G 0% /opt > > > > > > But i want to be able to detach encrypted device and remove that from > > zpool as cannot access by anyone. But I got an error when i try to > > detach the device (opt partition) . And I can still access the disk on > > ZFS pool. Isn't it strange buddies ? > > > > Briefly, Is there any solution to detach and unmount encrypted disk for > > only /opt partition(which is in ZFS Pool). Could you please give me > > advice on this progress ? > > I'm not aware of a mechanism in FreeBSD's base system that does > this automatically, but doing it manually (or with a script) should > work. 
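[Editorial note: the "cannot import 'opt': no such pool available" failure reported above is consistent with the geli provider not being attached (or having been re-initialized) at import time. A minimal diagnostic sequence, assuming the da1/da1.eli names and keyfile from the thread:]

```shell
geli status                        # da1.eli must be listed as ACTIVE
geli attach -k /root/da1.key da1   # if it is not, attach it first
zpool import                       # with no arguments: lists every pool
                                   # visible on attached providers; "opt"
                                   # should appear here
zpool import opt                   # then import it by name
```

If "zpool import" with no arguments does not list the pool even with da1.eli attached, the provider was likely re-initialized rather than re-attached, which destroys the on-disk pool labels.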
> > Fabian > From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 07:52:31 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7FAEE106566B; Fri, 22 Jun 2012 07:52:31 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 50DAC8FC0A; Fri, 22 Jun 2012 07:52:31 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q5M7qVnj035821; Fri, 22 Jun 2012 07:52:31 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q5M7qVgB035817; Fri, 22 Jun 2012 07:52:31 GMT (envelope-from linimon) Date: Fri, 22 Jun 2012 07:52:31 GMT Message-Id: <201206220752.q5M7qVgB035817@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/169319: [zfs] zfs resilver can't complete X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 07:52:31 -0000 Old Synopsis: zfs resilver can't complete New Synopsis: [zfs] zfs resilver can't complete Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jun 22 07:52:21 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=169319 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 07:54:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 57A781065673 for ; Fri, 22 Jun 2012 07:54:42 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id D1D908FC12 for ; Fri, 22 Jun 2012 07:54:41 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q5M7sb1I087013 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 22 Jun 2012 10:54:37 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4FE424BC.5090000@digsys.bg> Date: Fri, 22 Jun 2012 10:54:36 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.5) Gecko/20120607 Thunderbird/10.0.5 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <467652020.30738.1340325033684.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> In-Reply-To: <467652020.30738.1340325033684.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 07:54:42 -0000 On 22.06.12 03:30, rondzierwa@comcast.net wrote: > the problem was created by a disk error, that is no longer happening, but now I have this corrupted file. how do i clean up the mess? the scrub takes hours, and there are folks that are watching. i'm working on the third iteration of clear and scrub, how many times should it take? 
I can be patient, but it would be nice if I had an answer for the folks that keep asking "are we there yet?". The easiest fix to your problem is to:

- back up all data
- destroy the ZFS pool
- destroy the RAID volume
- create single-disk volumes for each disk, or just export the disks as JBOD
- create your ZFS pool using the individual drives (*)
- restore all data
- run your tests again

You will be able to identify which disk is having problems. Sometimes, problems like the ones you describe are caused by a faulty disk. Re-seating the cables (or unplugging and re-plugging the hot-swap disk) seems to fix it, but that is only temporary. Such disks rarely show as 'bad' to "hardware RAID" controllers, but ZFS always detects them. Another "fix" is to stop using ZFS altogether and use some other file system: you will not see any errors anymore, and your data will be silently corrupted. It is your data, your choice. I wouldn't do that.

(*) If you have a large number of disks, you may wish to label them and use the labels instead of 'raw' drive names. You can use either glabel(8) or gpart(8) to create the labels, then use these to build the zpool. If, for example, you label the disks by their position in the chassis, you can easily find out which disk to replace from the zpool output.
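[Editorial note: the labeling step mentioned above can be sketched with glabel(8). Disk device names, label strings, and the pool layout here are illustrative.]

```shell
# Label each disk by its position in the chassis.
glabel label bay0 /dev/da0
glabel label bay1 /dev/da1
glabel label bay2 /dev/da2

# Build the pool from the labels (/dev/label/*) rather than the raw
# device nodes, so "zpool status" reports the physical slot directly.
zpool create tank raidz label/bay0 label/bay1 label/bay2
```

The labels survive controller renumbering, so a disk that moves from da1 to da5 after a reboot still shows up as label/bay1 in the pool.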
Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 08:08:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 95D4D106566C for ; Fri, 22 Jun 2012 08:08:22 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 172178FC15 for ; Fri, 22 Jun 2012 08:08:21 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q5M7e6oX086975 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Fri, 22 Jun 2012 10:40:06 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4FE42156.3090006@digsys.bg> Date: Fri, 22 Jun 2012 10:40:06 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.5) Gecko/20120607 Thunderbird/10.0.5 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS Checksum errors X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 08:08:22 -0000 On 22.06.12 00:48, rondzierwa@comcast.net wrote: > its these kind of antics that make me resistant to the thought of allowing ZFS to manage the raid. it seems to be having problems just managing a big file system. I don't want it to correct anything, or restore anything, just let me delete the files that hurt, fix up the free space list so it doesn't point outside the bounds of the disk, and get on with life. 
Been there, done that... I too believed the hype that "hardware RAID" is the better, more "reliable" solution. But, when a third 3ware RAID array failed on me (I have lots) and had to spend few days and nights to reassemble the pieces, I finally decided to migrate everything to ZFS. In some cases I still do use the expensive 3ware RAID controllers as SLOW multi-port SATA controllers... I have discovered that my disks aren't actually that slow by just using normal HBA instead of "hardware RAID" controllers. I also had few cases, where a disk was considered just perfect, by both the 3ware controller (via any kind of test or verification) and S.M.A.R.T. but produced checksum errors when used with ZFS. I keep one or two such lying around to show to unbelievers. ZFS is way, way, WAY, more reliable for your data than any RAID controller could ever be. The reason is that ZFS checksums each and every block (metadata and data) in memory, before sending it down the pipe to disks and verifies those checksums when data comes back into memory. This is not done by any other system. If your memory is reliable, then you can trust ZFS if it tells you there are checksum errors: these happened somewhere between memory and disks, most probably corrupted RAID controller or on-disk caches, or some flaky bus. If your memory is unreliable, bad luck -- no file system can help. With ZFS over RAID you can only know there are problems "somewhere". With ZFS directly managing your disks, you know exactly which disk or the bus to it is failing and ZFS will automatically correct things. If you have enough redundancy, no data will be damaged. ZFS doesn't really have FAT table or such and free space is managed differently. But you are correct -- there should be tools to fix this kind of corruption. There is instrumentation for this, via zdb, but not enough good documentation and no one-click tool. In any case, you should be more concerned how you got to that corruption. 
ZFS does not have problem managing big file system. In fact, if anything can manage BIG file system that is ZFS. Your 12TB is in fact, an moderately small filesystem for ZFS -- it's used by way larger installations. Just let ZFS manage the disks directly. Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jun 22 10:21:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 869EA1065672 for ; Fri, 22 Jun 2012 10:21:16 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de [80.67.18.14]) by mx1.freebsd.org (Postfix) with ESMTP id 159118FC17 for ; Fri, 22 Jun 2012 10:21:16 +0000 (UTC) Received: from [87.79.192.146] (helo=fabiankeil.de) by smtprelay02.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1Si0zE-0004iB-Ls; Fri, 22 Jun 2012 12:21:08 +0200 Date: Fri, 22 Jun 2012 12:18:40 +0200 From: Fabian Keil To: icameto icameto Message-ID: <20120622121840.14e4f958@fabiankeil.de> In-Reply-To: References: <20120621131443.59eb24f3@fabiankeil.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/g5ft0Am7ltUpxpYbTs7Aa_+"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Encryption with GELI for only /opt partition X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jun 2012 10:21:16 -0000 --Sig_/g5ft0Am7ltUpxpYbTs7Aa_+ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable icameto icameto wrote: > So much thanks Fabian, especially for yours quick answer and concern. I > run "zpool export opt" and I would like to explain it clearly. 
There > will be one disk which will be used for /opt partition as encrypted. > Previously in UFS I was able to detach the opt partition by using GEOM > BDE module via these steps. > * > # kldload geom_bde > # mkdir /etc/gbde > # gbde init /dev/ad0s1e -i -L /etc/gbde/ad0s1e.lock > # gbde attach /dev/ad0s1e -l /etc/gbde/ad0s1e.lock > # newfs -U -O2 /dev/ad0s1e.bde > # mkdir /encryptedfs > # mount /dev/ad0s1e.bde /encryptedfs > # gbde detach /dev/ad0s1e > # umount /encyrptedfs* Is the order of the last two commands correct? I have no experience with gdbe, but I would expect the detachment to fail if the device is still mounted. The man page seems to at least recommend that the file system is unmounted first as well: | Please notice that detaching an encrypted device | corresponds to physically removing it, do not forget | to unmount the file system first. > Briefly I want to be able to unmount and mount capabilities without > harming the datasets in pool of ZFS while using ZFS with GELI for > encyptioning purpose. And you know i m capable of unmount the > disk(da1.bde etc. ) from /opt mount point while I was using GEOM BDE. > When I unmounted this disk(da1.bde), I could use da1 for /opt mount > point without any data or dataset loosing . Maybe I misunderstand the last sentence, but I don't see how you can mount /opt on da1 directly without corrupting data previously written on da1.bde. > Dear Fabian, I have tried to exporting pool from ZFS, and you right that > now i can detach from pool. But when I tried to import the old "opt" > pool,I'm getting a warn "cannot import 'opt': no such pool available" > about importing process. >=20 > # geli status > Name Status Components > da1.eli ACTIVE da1 How did you recreate da1.eli after detaching it? Did you maybe initialize it again instead of simply attaching it? > You said that ZFS and GELI are not thigtly integrated. 
But is that > possible detaching and making inaccessible da1.eli device or making > offline ZFS pool temporarily until attached properly with entering > passphrase again for making accessible on mount point /opt (ZFS Pool) > for this case ? That's possible and a lot of people do it daily. I always put a label between geli and the external device as it makes scripting the import easier, but it should work without the label as well. > Finally, I can create a script which will be working like a charm. I'm > really curios about creating encrypted ZFS pool(for opt) with attaching > and detaching capabilities. I guess that I'm doing an error on steps or > logical mistake. Could you please help me to handle this issue or steps ? Without knowing the exact steps you took, I can't tell where the problem is. Could you post the complete list of commands you used to create da1.eli and the ZFS pool, how you exported and detached da1.eli and how you tried to import it again? Fabian --Sig_/g5ft0Am7ltUpxpYbTs7Aa_+ Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAk/kRoMACgkQBYqIVf93VJ0ndQCdH5gjXckaIWnPxWI8UXQDQXLv twQAnRYsUf3oRMHMvin+OwOa5SClVbvC =rJGj -----END PGP SIGNATURE----- --Sig_/g5ft0Am7ltUpxpYbTs7Aa_+--
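[Editorial note: Fabian's suggestion of a label underneath geli, so that scripted attach/import survives device renumbering, might look like the sketch below. The label name "optdisk" and pool name "opt" are assumptions.]

```shell
#!/bin/sh
# Attach the geli provider by its glabel name, then import the pool.
# Assumes the disk was labeled (glabel label optdisk /dev/da1) and
# geli was initialized on /dev/label/optdisk.
if ! geli status label/optdisk.eli >/dev/null 2>&1; then
    geli attach label/optdisk    # prompts for the passphrase
fi
zpool import opt
```

The reverse script is "zpool export opt" followed by "geli detach label/optdisk.eli", after which the data is inaccessible until the passphrase is entered again.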