From owner-freebsd-fs@FreeBSD.ORG Sun Jul 18 07:01:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C01A9106564A for ; Sun, 18 Jul 2010 07:01:42 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 882828FC13 for ; Sun, 18 Jul 2010 07:01:42 +0000 (UTC) Received: by iwn35 with SMTP id 35so4473843iwn.13 for ; Sun, 18 Jul 2010 00:01:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type; bh=1ienzButZ0IVzDJRzYYzV5HrRqYEeIbOKS3GNGw0lgc=; b=QeAnwRjGdoP/jFdQas+Qz9MIoqRqt9IgBulRZzSPXVGVCu9IMAULRl7ck2ANYaKj/1 W8oF5RM8ipfA7SExNnqNHEkZpl6w4vRBWaLCorlv2oHuMYkBoj3oceLLqKs7zMQM5Tpg 6Rtgbr6paJSnwN8S12PrzTJ7feWCV6nuJg8Oo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=jg7T8WLP0AwsHUxNyZtKWgKO2ZXafHwRnCp5yAbSUt3AmGzLFUL1xnb9VF7dqnpNvb owTUy9pW+YQve8q0oJV5MPGjt422OQIvVwf5LCgmL5u5eLdtDkQkP6AU7jzMPuGJC0NW QDUSP7Oby8WKoy6qOPx8pS7sbrBUbFHPiw4e4= MIME-Version: 1.0 Received: by 10.231.146.134 with SMTP id h6mr3829204ibv.170.1279436501146; Sun, 18 Jul 2010 00:01:41 -0700 (PDT) Received: by 10.231.192.134 with HTTP; Sun, 18 Jul 2010 00:01:41 -0700 (PDT) In-Reply-To: References: Date: Sun, 18 Jul 2010 03:01:41 -0400 Message-ID: From: Rich To: freebsd-fs Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: zpool scrub stops making progress after a period of time? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Jul 2010 07:01:42 -0000

The story has just gotten stranger.

# zpool status -v
  pool: bukkit
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 33h10m with 53 errors on Sun Jul 18 03:00:20 2010
config:

        NAME                        STATE     READ WRITE CKSUM
        bukkit                      DEGRADED     0     0    53
          raidz2                    DEGRADED     0     0   106
            replacing               DEGRADED     0     0     0
              da1                   FAULTED      0 1.57M     0  corrupted data
              da11                  ONLINE       0     0     0  1.07T resilvered
            da9                     ONLINE       0     0     0  845M resilvered
            da1                     ONLINE       0     0     0  887M resilvered
            da8                     ONLINE       0     0     0  845M resilvered
            da0                     ONLINE       0     0     0  887M resilvered
            replacing               DEGRADED     0     0     0
              12471449581279369829  FAULTED      0 1.56M     0  was /dev/da7
              da2                   ONLINE       0     0     0  1.07T resilvered
            da6                     ONLINE       0     0     0  887M resilvered
            da10                    ONLINE       0     0     0  845M resilvered
            da5                     ONLINE       0     0     0  886M resilvered
            da7                     ONLINE       0     0     0  845M resilvered

...it finished, but still reports the disks as being replaced?

- Rich

From owner-freebsd-fs@FreeBSD.ORG Sun Jul 18 15:42:08 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6623B106567A; Sun, 18 Jul 2010 15:42:08 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3D7BE8FC21; Sun, 18 Jul 2010 15:42:08 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6IFg8dO000682; Sun, 18 Jul 2010 15:42:08 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6IFg8gm000678; Sun, 18 Jul 2010 15:42:08 GMT (envelope-from linimon) Date: Sun, 18 Jul 2010 15:42:08 GMT Message-Id: <201007181542.o6IFg8gm000678@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/147881: [zfs] [patch] ZFS "sharenfs" doesn't allow different "exports" options for different hosts X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Jul 2010 15:42:08 -0000

Old Synopsis: ZFS "sharenfs" doesn't allow different "exports" options for different hosts
New Synopsis: [zfs] [patch] ZFS "sharenfs" doesn't allow different "exports" options for different hosts

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Jul 18 15:41:41 UTC 2010
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=147881

From owner-freebsd-fs@FreeBSD.ORG Sun Jul 18 15:44:49 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A0F31065674; Sun, 18 Jul 2010 15:44:49 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 218D78FC12; Sun, 18 Jul 2010 15:44:49 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6IFinmR000848; Sun, 18 Jul 2010 15:44:49 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6IFimup000844; Sun, 18 Jul 2010 15:44:49 GMT (envelope-from linimon) Date: Sun, 18 Jul 2010 15:44:49 GMT Message-Id: <201007181544.o6IFimup000844@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/147940: [nfs] mounting >1k TCP-NFS mounts fails X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Jul 2010 15:44:49 -0000

Old Synopsis: mounting >1k TCP-NFS mounts fails
New Synopsis: [nfs] mounting >1k TCP-NFS mounts fails

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Jul 18 15:44:34 UTC 2010
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=147940

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 00:39:17 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6ADD1065672 for ; Mon, 19 Jul 2010 00:39:17 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id A4E038FC0A for ; Mon, 19 Jul 2010 00:39:17 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OaeO3-0004Ib-T2 for freebsd-fs@freebsd.org; Mon, 19 Jul 2010 02:39:15 +0200 Received: from 193.33.173.33 ([193.33.173.33]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 19 Jul 2010 02:39:15 +0200 Received: from c.kworr by 193.33.173.33 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 19 Jul 2010 02:39:15 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Volodymyr Kostyrko Date: Mon, 19 Jul 2010 03:39:06 +0300 Lines: 25 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 193.33.173.33 User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; uk-UA; rv:1.9.1.10) Gecko/20100713 Thunderbird/3.0.5 Subject: zfs corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 00:39:18 -0000

Hi all.

I am having trouble accessing one zpool. It is set as bootable, and it even manages to load and start the kernel, so I suppose I can still pull some files off it.

Yet after loading the kernel it spits out:

panic: solaris assert: sm->sm_space + size <= sm->sm_size, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line:96

The line number is off because I added some extra code to print the values:

sm->sm_space = 2147483648
size = 34304
sm->sm_size = 2147483648

Yes, sm_space is already equal to sm_size here, so adding any nonzero size fails the assertion and leads to the panic.

I am still paging through the code to work out where I can patch the driver to get read-only access.

Has anyone seen this before? If so, could someone share some knowledge?

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 02:22:34 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1F9B106564A for ; Mon, 19 Jul 2010 02:22:34 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id A9CB48FC12 for ; Mon, 19 Jul 2010 02:22:34 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnoGAGpTQ0yDaFvK/2dsb2JhbACTMQEBjDlxvgGFJQQ X-IronPort-AV: E=Sophos;i="4.55,224,1278302400"; d="scan'208";a="86212153" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 18 Jul 2010 22:22:31 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id A96CBC399; Sun, 18 Jul 2010 22:22:33 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UthMehLtREH6; Sun, 18 Jul 2010 22:22:33 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 40AE4C388; Sun, 18 Jul
2010 22:22:33 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o6J2eGb22442; Sun, 18 Jul 2010 22:40:16 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Sun, 18 Jul 2010 22:40:16 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: freebsd-fs@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: fix for remove for NFS through nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 02:22:35 -0000

Mikolaj Golub submitted the attached patch that fixes a problem w.r.t. a nullfs mounted NFS mount point for remove. The problem is that, without this patch, NFS does not see that a file is still open (v_usecount > 1) when removed and removes it instead of silly renaming it. This patch increments the v_usecount of the lower level vnode during the remove call, so that silly rename works. kib@ has noted that this may be "racy" and result in silly rename happening when it isn't required but, imho, that is less of a problem than it never working. (I have tested it a bit for NFS and UFS and it seems to work for those file systems under a nullfs mount.)

Why I am posting is that I am wondering if anyone knows of a file system type where this extra v_usecount on the vnode at the time of remove will/might cause problems?

Thanks in advance for looking at this, rick

--- submitted patch for nullfs ---
--- fs/nullfs/null_vnops.c.sav	2010-07-18 19:33:00.000000000 -0400
+++ fs/nullfs/null_vnops.c	2010-07-18 19:35:25.000000000 -0400
@@ -499,6 +499,29 @@
 }
 
 /*
+ * Increasing refcount of lower vnode is needed at least for the case
+ * when lower FS is NFS to do sillyrename if the file is in use.
+ */
+static int
+null_remove(struct vop_remove_args *ap)
+{
+	int retval;
+	struct vnode *lvp;
+	boolean_t vreleit;
+
+	if (ap->a_vp->v_usecount > 1) {
+		lvp = NULLVPTOLOWERVP(ap->a_vp);
+		VREF(lvp);
+		vreleit = TRUE;
+	} else
+		vreleit = FALSE;
+	retval = null_bypass(&ap->a_gen);
+	if (vreleit)
+		vrele(lvp);
+	return (retval);
+}
+
+/*
  * We handle this to eliminate null FS to lower FS
  * file moving. Don't know why we don't allow this,
  * possibly we should.
@@ -809,6 +832,7 @@
 	.vop_open =		null_open,
 	.vop_print =		null_print,
 	.vop_reclaim =		null_reclaim,
+	.vop_remove =		null_remove,
 	.vop_rename =		null_rename,
 	.vop_setattr =		null_setattr,
 	.vop_strategy =		VOP_EOPNOTSUPP,

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 07:41:05 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B30A1065670 for ; Mon, 19 Jul 2010 07:41:05 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 0A9A78FC18 for ; Mon, 19 Jul 2010 07:41:04 +0000 (UTC) Received: by ewy26 with SMTP id 26so1386763ewy.13 for ; Mon, 19 Jul 2010 00:41:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:references:mime-version:content-type:content-disposition :in-reply-to:user-agent; bh=xthnoVZLWVjvjjWQdjt/kDzBRCUeollDNSDylipw5ts=; b=LyOQHarzEnDtqXo4PV83obqYWM1eSFhsUNmR+it/zBLa8n/BuqWrMJFQtYmLTcSH8E uB/2TzZ4xivUIVWvXyeJBDXBJEn7fsSPYvhb7HIsysJgrzqmpCg+F7hwdBFJt8fjMMbU C+ZsCpyxDf1ZeV8UFDsodJaz72M82e8QuVWI4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=f7lT4382MqTNAfDFoBG0Fwzmabu2HukUlsR4j9Is9V92gH9NWfgb8xXcpWtGGkykKR
9kzaiNufFgiOCqgRiX2yaVuFBSbPNPM6Y+M8TG3tPEBMNepsztCC8XcXaqucF3KdQa0c R0wpD8IXHUtxHbPfM6UfU0gFcTIcGQE0VfstM= Received: by 10.213.27.206 with SMTP id j14mr2586615ebc.3.1279525263507; Mon, 19 Jul 2010 00:41:03 -0700 (PDT) Received: from localhost (136-125-dsl.ipact.nl [82.210.125.136]) by mx.google.com with ESMTPS id a48sm45152136eei.18.2010.07.19.00.40.58 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 19 Jul 2010 00:41:02 -0700 (PDT) Date: Mon, 19 Jul 2010 10:40:50 +0300 From: Gleb Kurtsou To: Rick Macklem Message-ID: <20100719074050.GA2640@tops> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org Subject: Re: fix for remove for NFS through nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 07:41:05 -0000

On (18/07/2010 22:40), Rick Macklem wrote:
> Mikolaj Golub submitted the attached patch that fixes a problem w.r.t.
> a nullfs mounted NFS mount point for remove. The problem is that,
> without this patch, NFS does not see that a file is still open
> (v_usecount > 1) when removed and removes it instead of silly renaming
> it. This patch increments the v_usecount of the lower level vnode
> during the remove call, so that silly rename works. kib@ has noted
> that this may be "racy" and result in silly rename happening when it
> isn't required but, imho, that is less of a problem than it never
> working. (I have tested it a bit for NFS and UFS and it seems to
> work for those file systems under a nullfs mount.)
>
> Why I am posting is that I am wondering if anyone knows of a file
> system type where this extra v_usecount on the vnode at the time of remove
> will/might cause problems?

It seems to be easily reproducible only with stacked filesystems. But the problem is that v_usecount can differ from the number of open descriptors for the vnode, and it generally does. IMHO using v_usecount is racy for all filesystems. Grep for vref and VREF; they are used all over the place.

The issue was already discussed some time ago:
http://marc.info/?l=freebsd-hackers&m=125675165319186&w=4

I think a better solution would be to add a means of getting the number of open file descriptors rather than misusing v_usecount; the patch looks like a pure hack to me.

Thanks,
Gleb.

>
> Thanks in advance for looking at this, rick
> --- submitted patch for nullfs ---
> --- fs/nullfs/null_vnops.c.sav	2010-07-18 19:33:00.000000000 -0400
> +++ fs/nullfs/null_vnops.c	2010-07-18 19:35:25.000000000 -0400
> @@ -499,6 +499,29 @@
>  }
>
>  /*
> + * Increasing refcount of lower vnode is needed at least for the case
> + * when lower FS is NFS to do sillyrename if the file is in use.
> + */
> +static int
> +null_remove(struct vop_remove_args *ap)
> +{
> +	int retval;
> +	struct vnode *lvp;
> +	boolean_t vreleit;
> +
> +	if (ap->a_vp->v_usecount > 1) {
> +		lvp = NULLVPTOLOWERVP(ap->a_vp);
> +		VREF(lvp);
> +		vreleit = TRUE;
> +	} else
> +		vreleit = FALSE;
> +	retval = null_bypass(&ap->a_gen);
> +	if (vreleit)
> +		vrele(lvp);
> +	return (retval);
> +}
> +
> +/*
>   * We handle this to eliminate null FS to lower FS
>   * file moving. Don't know why we don't allow this,
>   * possibly we should.
> @@ -809,6 +832,7 @@
>  	.vop_open =		null_open,
>  	.vop_print =		null_print,
>  	.vop_reclaim =		null_reclaim,
> +	.vop_remove =		null_remove,
>  	.vop_rename =		null_rename,
>  	.vop_setattr =		null_setattr,
>  	.vop_strategy =		VOP_EOPNOTSUPP,
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 07:43:33 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81AAA1065672 for ; Mon, 19 Jul 2010 07:43:33 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 1A5068FC17 for ; Mon, 19 Jul 2010 07:43:32 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o6J7hRDv006437 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 19 Jul 2010 10:43:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o6J7hRWG033120; Mon, 19 Jul 2010 10:43:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o6J7hR4H033119; Mon, 19 Jul 2010 10:43:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 19 Jul 2010 10:43:27 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20100719074327.GV2381@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature";
boundary="1lE8Wy7Exphh2Vpg" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: fix for remove for NFS through nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 07:43:33 -0000

--1lE8Wy7Exphh2Vpg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Jul 18, 2010 at 10:40:16PM -0400, Rick Macklem wrote:
> Mikolaj Golub submitted the attached patch that fixes a problem w.r.t.
> a nullfs mounted NFS mount point for remove. The problem is that,
> without this patch, NFS does not see that a file is still open
> (v_usecount > 1) when removed and removes it instead of silly renaming
> it. This patch increments the v_usecount of the lower level vnode
> during the remove call, so that silly rename works. kib@ has noted
> that this may be "racy" and result in silly rename happening when it
> isn't required but, imho, that is less of a problem than it never
> working. (I have tested it a bit for NFS and UFS and it seems to
> work for those file systems under a nullfs mount.)
Rather, I said "please commit, but note (in commit message or better in comment) that ..."

>
> Why I am posting is that I am wondering if anyone knows of a file
> system type where this extra v_usecount on the vnode at the time of remove
> will/might cause problems?
>
> Thanks in advance for looking at this, rick
> --- submitted patch for nullfs ---
> --- fs/nullfs/null_vnops.c.sav	2010-07-18 19:33:00.000000000 -0400
> +++ fs/nullfs/null_vnops.c	2010-07-18 19:35:25.000000000 -0400
> @@ -499,6 +499,29 @@
>  }
>
>  /*
> + * Increasing refcount of lower vnode is needed at least for the case
> + * when lower FS is NFS to do sillyrename if the file is in use.
> + */
> +static int
> +null_remove(struct vop_remove_args *ap)
> +{
> +	int retval;
> +	struct vnode *lvp;
> +	boolean_t vreleit;
> +
> +	if (ap->a_vp->v_usecount > 1) {
> +		lvp = NULLVPTOLOWERVP(ap->a_vp);
> +		VREF(lvp);
> +		vreleit = TRUE;
> +	} else
> +		vreleit = FALSE;
> +	retval = null_bypass(&ap->a_gen);
> +	if (vreleit)
> +		vrele(lvp);
> +	return (retval);
> +}
> +
> +/*
>   * We handle this to eliminate null FS to lower FS
>   * file moving. Don't know why we don't allow this,
>   * possibly we should.
> @@ -809,6 +832,7 @@
>  	.vop_open =		null_open,
>  	.vop_print =		null_print,
>  	.vop_reclaim =		null_reclaim,
> +	.vop_remove =		null_remove,
>  	.vop_rename =		null_rename,
>  	.vop_setattr =		null_setattr,
>  	.vop_strategy =		VOP_EOPNOTSUPP,
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--1lE8Wy7Exphh2Vpg
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkxEAh4ACgkQC3+MBN1Mb4gkpwCg2Z5SusfxE9TDvg2sZ8urYpVI
gEwAoIMKZyKEmMEHUZRJYeYuFE6cPMKt
=+JA4
-----END PGP SIGNATURE-----

--1lE8Wy7Exphh2Vpg--

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 11:06:56 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 46BCD106567D for ; Mon, 19 Jul 2010 11:06:56 +0000 (UTC) (envelope-from
owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 345468FC0A for ; Mon, 19 Jul 2010 11:06:56 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6JB6uoN065692 for ; Mon, 19 Jul 2010 11:06:56 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6JB6tdX065690 for freebsd-fs@FreeBSD.org; Mon, 19 Jul 2010 11:06:55 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 19 Jul 2010 11:06:55 GMT Message-Id: <201007191106.o6JB6tdX065690@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 11:06:56 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/148709  fs         [zfs] [panic] running du with zfs filesystem with shar
o kern/148655  fs         [zfs] Booting from a degraded raidz no longer works in
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147940  fs         [nfs] mounting >1k TCP-NFS mounts fails
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/147292  fs         [nfs] [patch] readahead missing in nfs client options
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs         [nfs] [patch] Typos in macro variables names in sys/fs
o kern/145778  fs         [zfs] [panic] panic in zfs_fuid_map_id (known issue fi
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
s kern/145424  fs         [zfs] [patch] move source closer to v15
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o kern/145309  fs         [disklabel]: Editing disk label invalidates the whole
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs         [nfs] [patch] nfsd fails as a kld
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/139363  fs         [nfs] diskless root nfs mount from non FreeBSD server
o kern/138790  fs         [zfs] ZFS ceases caching when mem demand is high
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
f kern/137037  fs         [zfs] [hang] zfs rollback on root causes FreeBSD to fr
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129059  fs         [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs         [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
o kern/116913  fs         [ffs] [panic] ffs_blkfree: freeing free block
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/115645  fs         [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs         [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when
browsing a samba shar o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o kern/87859 fs [smbfs] System reboot while umount smbfs. o bin/86765 fs [patch] bsdlabel(8) assigning wrong fs type. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o kern/85326 fs [smbfs] [panic] saving a file via samba to an overquot o kern/84589 fs [2TB] 5.4-STABLE unresponsive during background fsck 2 o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 186 problems total. 

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 07:40:57 2010
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id B047A1065673;
    Mon, 19 Jul 2010 07:40:57 +0000 (UTC)
    (envelope-from mi+thun@aldan.algebra.com)
Received: from vms173003pub.verizon.net (vms173003pub.verizon.net [206.46.173.3])
    by mx1.freebsd.org (Postfix) with ESMTP id 8FC2D8FC15;
    Mon, 19 Jul 2010 07:40:57 +0000 (UTC)
Received: from aldan.narawntapu ([unknown] [173.70.194.135])
    by vms173003.mailsrvcs.net (Sun Java(tm) System Messaging Server 7u2-7.02 32bit (built Apr 16 2009))
    with ESMTPA id <0L5S009F5L7H9XB1@vms173003.mailsrvcs.net>;
    Mon, 19 Jul 2010 01:40:31 -0500 (CDT)
Message-id: <4C43F35D.5020007@aldan.algebra.com>
Date: Mon, 19 Jul 2010 02:40:29 -0400
From: "Mikhail T."
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; uk; rv:1.9.1.10) Gecko/20100716 Thunderbird/3.0.5
MIME-version: 1.0
To: fs@freebsd.org, stable@FreeBSD.org
X-Mailman-Approved-At: Mon, 19 Jul 2010 11:08:54 +0000
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc:
Subject: panic: handle_written_inodeblock: bad size
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 19 Jul 2010 07:40:57 -0000

An 8.1-prerelease machine I have throws the panic in subject quite
often. Does anyone care? Is this evidence of some filesystem corruption
here, or a known problem that's (almost) solved already?

The stacks all look the same:

panic: handle_written_inodeblock: bad size
ts_to_ct(1279145603.245753580) = [2010-07-14 22:13:23]
...
#0  doadump () at pcpu.h:230
#1  0xc05be054 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc05be261 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xc07501b9 in softdep_disk_write_complete (bp=0xd81bcc30)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:4615
#4  0xc062c386 in bufdone_finish (bp=0xd81bcc30) at buf.h:411
#5  0xc062c7bd in bufdone (bp=0xd81bcc30)
    at /usr/src/sys/kern/vfs_bio.c:3275
#6  0xc0565bb5 in g_vfs_done (bip=0xc497e8c0)
    at /usr/src/sys/geom/geom_vfs.c:97
#7  0xc0626db9 in biodone (bp=0xc497e8c0)
    at /usr/src/sys/kern/vfs_bio.c:3110
#8  0xc056301f in g_io_schedule_up (tp=0xc4044000)
    at /usr/src/sys/geom/geom_io.c:675
#9  0xc05633c8 in g_up_procbody () at /usr/src/sys/geom/geom_kern.c:95
#10 0xc0595f5f in fork_exit (callout=0xc0563360 , arg=0x0, frame=0xe46a5d38)
    at /usr/src/sys/kern/kern_fork.c:844
#11 0xc07cf704 in fork_trampoline ()
    at /usr/src/sys/i386/i386/exception.s:273

Please, advise. Thanks!

    -mi

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 11:44:59 2010
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 71FD91065675 for ;
    Mon, 19 Jul 2010 11:44:59 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta10.emeryville.ca.mail.comcast.net (qmta10.emeryville.ca.mail.comcast.net [76.96.30.17])
    by mx1.freebsd.org (Postfix) with ESMTP id 5A87F8FC16 for ;
    Mon, 19 Jul 2010 11:44:59 +0000 (UTC)
Received: from omta04.emeryville.ca.mail.comcast.net ([76.96.30.35])
    by qmta10.emeryville.ca.mail.comcast.net with comcast id jnU11e0020lTkoCAAnXoYT;
    Mon, 19 Jul 2010 11:31:48 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
    by omta04.emeryville.ca.mail.comcast.net with comcast id jnXn1e0063LrwQ28QnXn7m;
    Mon, 19 Jul 2010 11:31:48 +0000
Received: by icarus.home.lan (Postfix, from userid 1000) id 2AFF39B425;
    Mon, 19 Jul 2010 04:31:47 -0700 (PDT)
Date: Mon, 19 Jul 2010 04:31:47 -0700
From: Jeremy Chadwick
To: "Mikhail T."
Message-ID: <20100719113147.GA4786@icarus.home.lan> References: <4C43F35D.5020007@aldan.algebra.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4C43F35D.5020007@aldan.algebra.com> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: stable@FreeBSD.org, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 11:44:59 -0000 On Mon, Jul 19, 2010 at 02:40:29AM -0400, Mikhail T. wrote: > An 8.1-prerelease machine I have throws the panic in subject quite > often. Does anyone care? Is this evidence of some filesystem > corruption here, or a known problem that's (almost) solved already? > > The stacks all look the same: > > panic: handle_written_inodeblock: bad size > ts_to_ct(1279145603.245753580) = [2010-07-14 22:13:23] > ... > #0 doadump () at pcpu.h:230 > #1 0xc05be054 in boot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:416 > #2 0xc05be261 in panic (fmt=Variable "fmt" is not available. 
> ) at /usr/src/sys/kern/kern_shutdown.c:590 > #3 0xc07501b9 in softdep_disk_write_complete (bp=0xd81bcc30) > at /usr/src/sys/ufs/ffs/ffs_softdep.c:4615 > #4 0xc062c386 in bufdone_finish (bp=0xd81bcc30) at buf.h:411 > #5 0xc062c7bd in bufdone (bp=0xd81bcc30) at > /usr/src/sys/kern/vfs_bio.c:3275 > #6 0xc0565bb5 in g_vfs_done (bip=0xc497e8c0) > at /usr/src/sys/geom/geom_vfs.c:97 > #7 0xc0626db9 in biodone (bp=0xc497e8c0) at > /usr/src/sys/kern/vfs_bio.c:3110 > #8 0xc056301f in g_io_schedule_up (tp=0xc4044000) > at /usr/src/sys/geom/geom_io.c:675 > #9 0xc05633c8 in g_up_procbody () at /usr/src/sys/geom/geom_kern.c:95 > #10 0xc0595f5f in fork_exit (callout=0xc0563360 , > arg=0x0, > frame=0xe46a5d38) at /usr/src/sys/kern/kern_fork.c:844 > #11 0xc07cf704 in fork_trampoline () at > /usr/src/sys/i386/i386/exception.s:273 > > Please, advise. Thanks! If you boot the machine in single-user, and run fsck manually, are there any errors? Only thing I can think of off the top of my head: there's a known situation (also applies to RELENG_7) where a background fsck doesn't correct all errors after a system crash/unclean shutdown. I mention this because I see "softdep" in the above stack trace (usually refers to softupdates). I don't know if this got fixed, but the workaround is to use background_fsck="no" in rc.conf. Yes, after a crash this means you have to wait for the entire fsck to run. I can point you to some (old) discussions about it if need be. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 16:58:31 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07611106566B for ; Mon, 19 Jul 2010 16:58:31 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 4B6218FC14 for ; Mon, 19 Jul 2010 16:58:29 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA12728; Mon, 19 Jul 2010 19:58:27 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4C448432.5030009@icyb.net.ua> Date: Mon, 19 Jul 2010 19:58:26 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100517) MIME-Version: 1.0 To: Kostik Belousov References: <4C03E9C0.2010602@icyb.net.ua> <20100602145856.GR83316@deviant.kiev.zoral.com.ua> <20100602151358.GS83316@deviant.kiev.zoral.com.ua> In-Reply-To: <20100602151358.GS83316@deviant.kiev.zoral.com.ua> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: nullfs distinct path check: take into account filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 16:58:31 -0000 on 02/06/2010 18:13 Kostik Belousov said the following: > On Wed, Jun 02, 2010 at 05:58:56PM +0300, Kostik Belousov wrote: >> On Mon, May 31, 2010 at 07:54:24PM +0300, Andriy Gapon wrote: >>> Right now mount_nullfs doesn't allow to, for example, mount / in >>> /tmp/somedir even if /tmp is a different filesystem. I think there is >>> no reason to do that and thus propose the following patch. 
>> Reason is that after such mount "find /" would never end.
>
> Hm, "different filesystems". Sorry, ignore my answer.

BTW, I thought about this some more and I think that you had a good point.

At present our nullfs is implemented to act only on a filesystem level, not
on a global namespace of file paths.
That is, when I do
$ mount_nullfs target mountpoint
I get under mountpoint only that part of the namespace that is rooted at
target _and_ belongs to the same filesystem as target.

I think that this is an implementation detail, or implementation choice, but
not a characteristic feature of nullfs.

I could imagine that nullfs could do global namespace "stitching" regardless
of the underlying composition of the namespace (i.e. the particular filesystem
hierarchy). In fact, such an implementation would seem more natural and
useful to me.

Perhaps we could add a mount option and a global sysctl to control this
behavior. Of course, the code would need to be written for that.
And we would have to be careful not to break POLA, potentially with security
implications.

What do you think?
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 19:47:31 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A13AB1065675 for ; Mon, 19 Jul 2010 19:47:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 3DDE58FC14 for ; Mon, 19 Jul 2010 19:47:30 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o6JJlRI5071325 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 19 Jul 2010 22:47:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o6JJlRZB071515; Mon, 19 Jul 2010 22:47:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o6JJlQPF071514; Mon, 19 Jul 2010 22:47:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 19 Jul 2010 22:47:26 +0300 From: Kostik Belousov To: Andriy Gapon Message-ID: <20100719194726.GZ2381@deviant.kiev.zoral.com.ua> References: <4C03E9C0.2010602@icyb.net.ua> <20100602145856.GR83316@deviant.kiev.zoral.com.ua> <20100602151358.GS83316@deviant.kiev.zoral.com.ua> <4C448432.5030009@icyb.net.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UQUSR6hGnEXrhNQw" Content-Disposition: inline In-Reply-To: <4C448432.5030009@icyb.net.ua> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 
tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: nullfs distinct path check: take into account filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 19:47:31 -0000 --UQUSR6hGnEXrhNQw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jul 19, 2010 at 07:58:26PM +0300, Andriy Gapon wrote: > on 02/06/2010 18:13 Kostik Belousov said the following: > > On Wed, Jun 02, 2010 at 05:58:56PM +0300, Kostik Belousov wrote: > >> On Mon, May 31, 2010 at 07:54:24PM +0300, Andriy Gapon wrote: > >>> Right now mount_nullfs doesn't allow to, for example, mount / in > >>> /tmp/somedir even if /tmp is a different filesystem. I think there is > >>> no reason to do that and thus propose the following patch. > >> Reason is that after such mount "find /" would never end. > >=20 > > Hm, "different filesystems". Sorry, ignore my answer. >=20 > BTW, I thought about this some more and I think that you had a good point. >=20 > At present our nullfs is implemented to act only on a filesystem > level, not on a global namespace of file paths. That is, when I do > $ mount_nullfs target mountpoint I get under only that > part of namespace that is rooted at _and_ belongs to the same > filesystem as . > > I think that this is an implementation detail, or implementation > choice, but not a characteristic feature of nullfs. > > I could imagine that nullfs could do global namespace "stitching" > regardless of underlying composition of the namespace (i.e. particular > filesystem hierarchy). In fact, such an implementation would seem more > natural and useful to me. 
>
> Perhaps, we could add a mount option and a global sysctl to control
> this behavior. Of course, the code would need to be written for that.
> And we would have to be careful to not break POLA potentially with
> security implications.
>
> What do you think?

Current nullfs operates as a vnode op bypass. E.g., when VOP_LOOKUP() is
called on the nullfs vnode, VOP_LOOKUP() of the underlying filesystem
on the underlying vnode is called.

Your proposal can only be implemented by calling namei() or lookup() from
nullfs_lookup(), effectively recursing into the VFS. It might work, but
there are obvious complications related to unexpected lock ownership
and stack depth, and there might be more.

--UQUSR6hGnEXrhNQw
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkxEq84ACgkQC3+MBN1Mb4iU8gCg2DhO5zAdV2FZqj0WWdEFqhy8
hmwAoM+0Ajg5kG7E5UoD7Wf9y6dRzQuQ
=L4yA
-----END PGP SIGNATURE-----

--UQUSR6hGnEXrhNQw--

From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 20:41:26 2010
Return-Path:
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 02D821065678 for ;
    Mon, 19 Jul 2010 20:41:26 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.emeryville.ca.mail.comcast.net (qmta08.emeryville.ca.mail.comcast.net [76.96.30.80])
    by mx1.freebsd.org (Postfix) with ESMTP id DDA168FC1B for ;
    Mon, 19 Jul 2010 20:41:25 +0000 (UTC)
Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11])
    by qmta08.emeryville.ca.mail.comcast.net with comcast id jnit1e0020EPchoA8whRFw;
    Mon, 19 Jul 2010 20:41:25 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
    by omta01.emeryville.ca.mail.comcast.net with comcast id jwhQ1e0053LrwQ28MwhQkv;
    Mon, 19 Jul 2010 20:41:24 +0000
Received: by icarus.home.lan (Postfix, from userid 1000) id 0D5649B425;
    Mon, 19 Jul 2010 13:41:24 -0700 (PDT)
Date: Mon, 19 Jul 2010
13:41:24 -0700 From: Jeremy Chadwick To: "Mikhail T." Message-ID: <20100719204124.GA21573@icarus.home.lan> References: <4C43F35D.5020007@aldan.algebra.com> <20100719113147.GA4786@icarus.home.lan> <4C44758F.7080209@aldan.algebra.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4C44758F.7080209@aldan.algebra.com> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 20:41:26 -0000 On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. wrote: > 19.07.2010 07:31, Jeremy Chadwick написав(ла): > >If you boot the machine in single-user, and run fsck manually, are there > >any errors? > Thanks, Jeremy... I wish, there was a way to learn, /which/ > file-system is giving trouble... However, after sending the question > out last night, I tried to pkg_delete a package on the machine, and > was very lucky to see a file-system error (inode something or other) > before the panic struck. That, at least, told me, which file-system > was in trouble (/var). I dump-ed it out, re-created, and then > restored it... Although dumping went smooth, there were two errors > at which restore offered to abort. I told it not to and got (most of > the) file-system restored. (The dump is available to anyone wishing > to investigate -- contact me privately. I'm not posting it publicly > because of the passwd-file backup under /var). I see where you're going with this -- the only way you knew it was /var was based on the inode error you saw before the system crashed. > So far seems quiet -- no panics for two more hours before I went to bed. 
> >Only thing I can think of off the top of my head: there's a known > >situation (also applies to RELENG_7) where a background fsck doesn't > >correct all errors after a system crash/unclean shutdown. I mention > >this because I see "softdep" in the above stack trace (usually refers to > >softupdates). I don't know if this got fixed, but the workaround is to > >use background_fsck="no" in rc.conf. Yes, after a crash this means you > >have to wait for the entire fsck to run. > When setting up my main machine 4 years ago, I turned off background > fsck... But I thought, things have improved sufficiently enough > since then :-( Maybe, background fsck should still be disabled by > default? > > And, IMO, at the very least, *any panic related to a file-system > must clearly identify the file-system in question*... What do you > think? I think that's a reasonable request and would be ideal for situations like this. If it's possible or not is a completely different question. I might be able to code something like this up (would be my first time messing around in the kernel in this regard -- warning, alert, hazard, danger Will Robinson!), but it'd probably make more sense for someone already familiar with that section of code to do it. I'm much more familiar with userland stuff, the kernel is "black magic". ;-) Assuming work tonight isn't that busy for me, I'll see if I can dedicate some cycles to printing this information in the error string you saw. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jul 19 16:48:16 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1FD72106564A for ; Mon, 19 Jul 2010 16:48:16 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30]) by mx1.freebsd.org (Postfix) with ESMTP id BF37F8FC08 for ; Mon, 19 Jul 2010 16:48:15 +0000 (UTC) Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166]) by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o6JFtxKR030635 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 19 Jul 2010 11:55:59 -0400 Received: from ws-mteterin.dev.pathfinder.com (ws-mteterin.dev.pathfinder.com [209.251.223.173]) by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o6JFtxsT021007; Mon, 19 Jul 2010 11:55:59 -0400 Message-ID: <4C44758F.7080209@aldan.algebra.com> Date: Mon, 19 Jul 2010 11:55:59 -0400 From: "Mikhail T." Organization: Virtual Estates, Inc. User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.10) Gecko/20100512 Lightning/1.0b1 Thunderbird/3.0.5 MIME-Version: 1.0 To: Jeremy Chadwick References: <4C43F35D.5020007@aldan.algebra.com> <20100719113147.GA4786@icarus.home.lan> In-Reply-To: <20100719113147.GA4786@icarus.home.lan> X-Mailman-Approved-At: Mon, 19 Jul 2010 21:14:31 +0000 Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Jul 2010 16:48:16 -0000 19.07.2010 07:31, Jeremy Chadwick (): > If you boot the machine in single-user, and run fsck manually, are there > any errors? 
>
Thanks, Jeremy... I wish there was a way to learn /which/ file-system is
giving trouble... However, after sending the question out last night, I
tried to pkg_delete a package on the machine, and was very lucky to see a
file-system error (inode something or other) before the panic struck.
That, at least, told me which file-system was in trouble (/var). I dumped
it out, re-created it, and then restored it... Although dumping went
smoothly, there were two errors at which restore offered to abort. I told
it not to and got (most of) the file-system restored. (The dump is
available to anyone wishing to investigate -- contact me privately. I'm
not posting it publicly because of the passwd-file backup under /var.)

So far it seems quiet -- no panics for two more hours before I went to bed.

> Only thing I can think of off the top of my head: there's a known
> situation (also applies to RELENG_7) where a background fsck doesn't
> correct all errors after a system crash/unclean shutdown. I mention
> this because I see "softdep" in the above stack trace (usually refers to
> softupdates). I don't know if this got fixed, but the workaround is to
> use background_fsck="no" in rc.conf. Yes, after a crash this means you
> have to wait for the entire fsck to run.
>
When setting up my main machine 4 years ago, I turned off background
fsck... But I thought things had improved sufficiently since then :-(
Maybe background fsck should still be disabled by default?

And, IMO, at the very least, *any panic related to a file-system must
clearly identify the file-system in question*... What do you think?
Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 13:49:33 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC79F1065676 for ; Tue, 20 Jul 2010 13:49:33 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 921888FC1E for ; Tue, 20 Jul 2010 13:49:33 +0000 (UTC) Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27]) by qmta09.emeryville.ca.mail.comcast.net with comcast id kCyV1e0050b6N64A9DpYr0; Tue, 20 Jul 2010 13:49:32 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta03.emeryville.ca.mail.comcast.net with comcast id kDpX1e00A3LrwQ28PDpYX8; Tue, 20 Jul 2010 13:49:32 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 86DDF9B425; Tue, 20 Jul 2010 06:49:31 -0700 (PDT) Date: Tue, 20 Jul 2010 06:49:31 -0700 From: Jeremy Chadwick To: "Mikhail T." Message-ID: <20100720134931.GA41352@icarus.home.lan> References: <4C43F35D.5020007@aldan.algebra.com> <20100719113147.GA4786@icarus.home.lan> <4C44758F.7080209@aldan.algebra.com> <20100719204124.GA21573@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20100719204124.GA21573@icarus.home.lan> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 13:49:33 -0000 On Mon, Jul 19, 2010 at 01:41:24PM -0700, Jeremy Chadwick wrote: > On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. 
wrote:
> > 19.07.2010 07:31, Jeremy Chadwick wrote:
> > >If you boot the machine in single-user, and run fsck manually, are there
> > >any errors?
> > Thanks, Jeremy... I wish, there was a way to learn, /which/
> > file-system is giving trouble... However, after sending the question
> > out last night, I tried to pkg_delete a package on the machine, and
> > was very lucky to see a file-system error (inode something or other)
> > before the panic struck. That, at least, told me, which file-system
> > was in trouble (/var).
> > [...]
> > And, IMO, at the very least, *any panic related to a file-system
> > must clearly identify the file-system in question*... What do you
> > think?
> > [...]
> Assuming work tonight isn't that busy for me, I'll see if I can dedicate
> some cycles to printing this information in the error string you saw.

I spent some time on this tonight. It's not as simple as it sounds, for
me anyway.

Relevant source bits:

src/sys/ufs/ffs/ffs_softdep.c
src/sys/ufs/ffs/fs.h
src/sys/ufs/ffs/softdep.h

ffs_softdep.c, which is almost 6500 lines, contains a large number of
inode-related functions which can call panic().

Functions which have easy access to the related inodedep struct are the
ones which would be able to print this information easily. Sort of.

struct inodedep (see softdep.h) contains a member called id_fs, which is
a pointer to struct fs (see fs.h). struct fs contains a member called
fs_fsmnt (a char buffer), which is the name of the mounted filesystem.
fs_fsmnt[0] should be NUL ('\0') if the filesystem isn't mounted.

So in the case of your panic within handle_written_inodeblock(), it
would be as simple as something like:

	u_char *mntpt = NULL;

	if (inodedep->id_fs->fs_fsmnt[0] != '\0')
		mntpt = inodedep->id_fs->fs_fsmnt;
	else
		/* XXX do what here? */

Then, the panic() statements later have to do something like this (taken
from real code):

	if (dp1->di_db[adp->ad_lbn] != adp->ad_oldblkno)
		panic("%s: %s: %s #%jd mismatch %d != %jd",
		    "handle_written_inodeblock",
		    (mntpt ? mntpt : ""),
		    "direct pointer",
		    (intmax_t)adp->ad_lbn,
		    dp1->di_db[adp->ad_lbn],
		    (intmax_t)adp->ad_oldblkno);

The panic message would look like one of the following:

panic: handle_written_inodeblock: /mnt: direct pointer #nnn mismatch nnn != nnn
panic: handle_written_inodeblock: : direct pointer #nnn mismatch nnn != nnn

The "" string there is a Bad Idea(tm); see below.

Secondly, this brings up the question: what happens if someone is doing
something like "fsck /var", where /var uses soft updates? /var isn't
mounted when this happens. Can these inode-related functions get called
during that time? If so, fs_fsmnt would (in theory -- I haven't tested
in practice) be empty. So in that case, what should get printed as the
filesystem? Well, this is where the "" string comes into play.

My first answer was: "the name of the device/slice/etc. which the inode
is associated with". The problem is that I couldn't find a way to get
this information, as it's not stored in struct fs anywhere. One would
have to pass this down the stack, which changes the kernel ABI and is
not something I'm willing to do (plus there are performance implications,
as you're passing something else on the stack on every call). Of course
there may be a way to get this easily, but I don't see it or know of it.

Thirdly, and this is equally important: given the repetitive nature of
this code (it would have to be repeated in numerous functions), making a
common function that populates a (global) variable with the fsname it's
working on would be ideal. But I don't know the implications of this,
nor do I see many (I think two?) global variables used within
ffs_softdep.c.
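
As a rough userland illustration of the lookup described above, here is a minimal sketch in C. Everything prefixed with demo_ is hypothetical: struct demo_fs and struct demo_inodedep are cut-down stand-ins that keep only the fs_fsmnt and id_fs members named in this message, DEMO_MNTLEN is an arbitrary buffer size, and the fallback string is supplied by the caller because the thread leaves open what to print for an unmounted filesystem. It formats the message with snprintf() instead of calling panic(), so it can be tried outside the kernel.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_MNTLEN 64	/* arbitrary size for this sketch, not the kernel value */

/* Hypothetical, cut-down stand-in for struct fs (only the member used here). */
struct demo_fs {
	unsigned char fs_fsmnt[DEMO_MNTLEN];	/* mount point; empty if not mounted */
};

/* Hypothetical, cut-down stand-in for struct inodedep. */
struct demo_inodedep {
	struct demo_fs *id_fs;
};

/*
 * Return the recorded mount point, or the caller-supplied fallback when
 * no mount point is recorded (fs_fsmnt[0] == '\0') or a pointer is NULL.
 */
static const char *
demo_fs_label(const struct demo_inodedep *dep, const char *fallback)
{
	if (dep != NULL && dep->id_fs != NULL &&
	    dep->id_fs->fs_fsmnt[0] != '\0')
		return ((const char *)dep->id_fs->fs_fsmnt);
	return (fallback);
}

/*
 * Build a panic-style message that names the filesystem.  Returns the
 * value of snprintf(), i.e. the length the full message would need.
 */
static int
demo_format_mismatch(char *buf, size_t len, const struct demo_inodedep *dep,
    long long lbn, long long found, long long expected)
{
	return (snprintf(buf, len, "%s: %s: %s #%lld mismatch %lld != %lld",
	    "handle_written_inodeblock",
	    demo_fs_label(dep, "(no mount point)"),
	    "direct pointer", lbn, found, expected));
}
```

Keeping the fallback as a parameter means each call site could pass a device name if one ever becomes reachable from the dependency structures, without changing the helper. Whether such a helper belongs in ffs_softdep.c at all is exactly the open question raised above.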
Extending one of the structs to get access to the necessary information is not as simple as "just do it" -- there are implications when it comes to memory usage and so on. This is not a piece of code to bang on lightly. This should probably be discussed on freebsd-hackers, but cross-posting across 3 separate mailing lists is rude. If you want to drive this, cool, but please start a new thread about the matter (wanting the filesystem or device printed in panic() when things like filesystem panics happen) on freebsd-hackers. I'm not subscribed to that list, so please CC me if you go this route. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 14:50:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 917AB1065670 for ; Tue, 20 Jul 2010 14:50:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6676F8FC1C for ; Tue, 20 Jul 2010 14:50:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6KEo3mr074650 for ; Tue, 20 Jul 2010 14:50:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6KEo3dA074649; Tue, 20 Jul 2010 14:50:03 GMT (envelope-from gnats) Date: Tue, 20 Jul 2010 14:50:03 GMT Message-Id: <201007201450.o6KEo3dA074649@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: John Baldwin Cc: Subject: Re: kern/147940: [nfs] mounting >1k TCP-NFS mounts fails X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John Baldwin List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 14:50:03 -0000 The following reply was made to PR kern/147940; it has been noted by GNATS. From: John Baldwin To: bug-followup@freebsd.org, rs@bytecamp.net Cc: Subject: Re: kern/147940: [nfs] mounting >1k TCP-NFS mounts fails Date: Tue, 20 Jul 2010 10:42:37 -0400 There are a limited number of privileged ports on a client, only 1k, and some of those ports are used for other services, so you certainly cannot mount 1k TCP NFS mounts unless you disable the privileged port check on the server. nfs_privport=0 is not necessarily a risk if you trust all machines that are able to connect to your NFS server (e.g. you manage all the clients and the server is on a LAN or WAN and not directly connected to the Internet). Even with nfs_privport=1 you are still trusting root on any client machines, nfs_privport=0 only prevents non-root users on client machines from establishing mounts. However, this isn't a bug, this is just the way IP works, and as a result, the way that NFS mounts work. -N for the UDP mounts is effectively similar to having nfs_privport set to 0. I'm not sure exactly how it works (perhaps it requires the mount request to be privileged, but not the normal RPC traffic?), but that is why it is "working". 
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 15:55:50 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB4D3106566B; Tue, 20 Jul 2010 15:55:50 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9356F8FC13; Tue, 20 Jul 2010 15:55:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6KFtois041811; Tue, 20 Jul 2010 15:55:50 GMT (envelope-from jhb@freefall.freebsd.org) Received: (from jhb@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6KFtoj5041807; Tue, 20 Jul 2010 15:55:50 GMT (envelope-from jhb) Date: Tue, 20 Jul 2010 15:55:50 GMT Message-Id: <201007201555.o6KFtoj5041807@freefall.freebsd.org> To: rs@bytecamp.net, jhb@FreeBSD.org, freebsd-fs@FreeBSD.org From: jhb@FreeBSD.org Cc: Subject: Re: kern/147940: [nfs] mounting >1k TCP-NFS mounts fails X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 15:55:50 -0000 Synopsis: [nfs] mounting >1k TCP-NFS mounts fails State-Changed-From-To: open->closed State-Changed-By: jhb State-Changed-When: Tue Jul 20 15:55:25 UTC 2010 State-Changed-Why: This isn't a bug in the FreeBSD NFS client but a limit of IP. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=147940 From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 16:00:14 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 627C9106566B for ; Tue, 20 Jul 2010 16:00:14 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 2E2F68FC16 for ; Tue, 20 Jul 2010 16:00:13 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6KFipaV084986; Tue, 20 Jul 2010 08:44:51 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007201544.o6KFipaV084986@chez.mckusick.com> To: Jeremy Chadwick In-reply-to: <20100720134931.GA41352@icarus.home.lan> Date: Tue, 20 Jul 2010 08:44:51 -0700 From: Kirk McKusick Cc: "Mikhail T." , fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 16:00:14 -0000 You are on the right track with getting the filesystem information. Any place that one has an inode (say pointer ip), it is possible to get the filesystem information using ip->i_fs->fs_fsmnt. The mount point can also be found from the mount-point structure. So, it is almost always possible to make your way to fs_fsmnt. Since soft updates only runs on mounted filesystems, you will never have a case where the fs_fsmnt has not been filled in for you. You are right that you do not want to add a global variable. Instead the fs_fsmnt should just be gathered from where-ever it is available. Adding it to all the panic's will be a lot of work, but I agree would be useful. I will look into doing so when I get a chance. 
Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 16:49:01 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 764B41065677 for ; Tue, 20 Jul 2010 16:49:01 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30]) by mx1.freebsd.org (Postfix) with ESMTP id 2270D8FC27 for ; Tue, 20 Jul 2010 16:49:00 +0000 (UTC) Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166]) by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o6KGmxW4021589 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 20 Jul 2010 12:48:59 -0400 Received: from ws-mteterin.dev.pathfinder.com (ws-mteterin.dev.pathfinder.com [209.251.223.173]) by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o6KGmwQ3030470; Tue, 20 Jul 2010 12:48:59 -0400 Message-ID: <4C45D37A.5020304@aldan.algebra.com> Date: Tue, 20 Jul 2010 12:48:58 -0400 From: "Mikhail T." Organization: Virtual Estates, Inc. User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.10) Gecko/20100512 Lightning/1.0b1 Thunderbird/3.0.5 MIME-Version: 1.0 To: Kirk McKusick References: <201007201544.o6KFipaV084986@chez.mckusick.com> In-Reply-To: <201007201544.o6KFipaV084986@chez.mckusick.com> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 8bit X-Mailman-Approved-At: Tue, 20 Jul 2010 17:45:10 +0000 Cc: fs@freebsd.org Subject: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 16:49:01 -0000

20.07.2010 11:44, Kirk McKusick wrote:
> Adding it to all the panic's will be a lot of work,
> but I agree would be useful. I will look into doing so when I
> get a chance.
> > Kirk McKusick > How about disabling background fsck in a default install? It seems to be the consensus here, that my troubles were due to fsck not fixing the file-system properly reboot after reboot... Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 17:57:11 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E4501065676 for ; Tue, 20 Jul 2010 17:57:11 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [76.96.30.16]) by mx1.freebsd.org (Postfix) with ESMTP id 74D928FC0C for ; Tue, 20 Jul 2010 17:57:11 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta01.emeryville.ca.mail.comcast.net with comcast id kESA1e0071u4NiLA1HxBVm; Tue, 20 Jul 2010 17:57:11 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta21.emeryville.ca.mail.comcast.net with comcast id kHx91e00D3LrwQ28hHxA7k; Tue, 20 Jul 2010 17:57:10 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id AADC09B425; Tue, 20 Jul 2010 10:57:09 -0700 (PDT) Date: Tue, 20 Jul 2010 10:57:09 -0700 From: Jeremy Chadwick To: Kirk McKusick Message-ID: <20100720175709.GA52321@icarus.home.lan> References: <20100720134931.GA41352@icarus.home.lan> <201007201544.o6KFipaV084986@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201007201544.o6KFipaV084986@chez.mckusick.com> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: "Mikhail T." 
, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 17:57:11 -0000

On Tue, Jul 20, 2010 at 08:44:51AM -0700, Kirk McKusick wrote:
> You are on the right track with getting the filesystem information.
> Any place that one has an inode (say pointer ip), it is possible
> to get the filesystem information using ip->i_fs->fs_fsmnt. The
> mount point can also be found from the mount-point structure.

Kirk, thanks for chiming in!

The latter is the one I'm not familiar with. Are you referring to
"struct mount"? If so -- yeah I took a look at that one too, but found
it to be used predominantly in other functions and wasn't sure how to
"work backwards" to getting what was needed.

I also wasn't sure what the difference was between fs_fsmnt vs.
f_mntfromname (and f_mntonname too) in struct mount.

> So, it is almost always possible to make your way to fs_fsmnt.
> Since soft updates only runs on mounted filesystems, you will never
> have a case where the fs_fsmnt has not been filled in for you.

Oh, cool. I was taking the paranoid approach ("does this really point
to something that's valid? What if..."). :-)

> You are right that you do not want to add a global variable.
> Instead the fs_fsmnt should just be gathered from where-ever it
> is available. Adding it to all the panic's will be a lot of work,
> but I agree would be useful. I will look into doing so when I
> get a chance.

I'd like to volunteer to help implement this if at all possible. I
don't mind doing the dirty work and having you review it. There are
some functions which are fairly obvious when it comes to getting
mountpoint information, and I think a good starting point (for me
anyway) would be to enhance those.
Standardising the panic() output cosmetically (meaning where in the
string the mountpoint gets printed) would also be worthwhile. I'm of
the opinion that it should be printed first, e.g.:

    panic("%s: softdep_setup_inomapdep: dependency for new inode already exists",
        mountpoint);

The biggest problem (for me) is testing. I have no idea how to trigger
the error conditions in these functions. I assume it varies; maybe
through fsdb(8) or interactively dropping to DDB and forcing the
condition.

I tend to do all of my work of this sort on a VM of FreeBSD (using
VMware Workstation), but if testing on bare metal is required I have a
testbed as well.

Let me know if you could. Thanks!

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 20:50:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4465B1065673 for ; Tue, 20 Jul 2010 20:50:54 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196]) by mx1.freebsd.org (Postfix) with ESMTP id C38948FC26 for ; Tue, 20 Jul 2010 20:50:53 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c211-30-160-13.belrs4.nsw.optusnet.com.au [211.30.160.13]) by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o6KKonmG028607 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 21 Jul 2010 06:50:50 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o6KKonqq033449; Wed, 21 Jul 2010 06:50:49 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o6KKonLW033448; Wed, 21 Jul 2010 06:50:49 +1000 (EST)
(envelope-from peter) Date: Wed, 21 Jul 2010 06:50:48 +1000 From: Peter Jeremy To: Jeremy Chadwick Message-ID: <20100720205048.GA33373@server.vk2pj.dyndns.org> References: <20100720134931.GA41352@icarus.home.lan> <201007201544.o6KFipaV084986@chez.mckusick.com> <20100720175709.GA52321@icarus.home.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="OXfL5xGRrasGEqWY" Content-Disposition: inline In-Reply-To: <20100720175709.GA52321@icarus.home.lan> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 20:50:57 -0000
--OXfL5xGRrasGEqWY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On 2010-Jul-20 10:57:09 -0700, Jeremy Chadwick wrote:
>On Tue, Jul 20, 2010 at 08:44:51AM -0700, Kirk McKusick wrote:
>> So, it is almost always possible to make your way to fs_fsmnt.
>> Since soft updates only runs on mounted filesystems, you will never
>> have a case where the fs_fsmnt has not been filled in for you.
>
>Oh, cool. I was taking the paranoid approach ("does this really
>point to something that's valid? What if..."). :-)

In the case of panic messages, you probably should err on the side of
paranoia. By definition, getting to the point where you need to call
panic(9) means something unexpected has happened and it's probably not
being excessively paranoid to explicitly check each pointer before
de-referencing it.
--
Peter Jeremy
--OXfL5xGRrasGEqWY Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (FreeBSD) iEYEARECAAYFAkxGDCgACgkQ/opHv/APuIeb1wCfTrwsIlBuw8iNfpKDxvcywODK w7kAnRto1Ve3t1z8UyMgmEVZLNo17DMj =1llG -----END PGP SIGNATURE----- --OXfL5xGRrasGEqWY--

From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 21:47:54 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CAA35106566C for ; Tue, 20 Jul 2010 21:47:54 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta12.westchester.pa.mail.comcast.net (qmta12.westchester.pa.mail.comcast.net [76.96.59.227]) by mx1.freebsd.org (Postfix) with ESMTP id 789A28FC0A for ; Tue, 20 Jul 2010 21:47:53 +0000 (UTC) Received: from omta07.westchester.pa.mail.comcast.net ([76.96.62.59]) by qmta12.westchester.pa.mail.comcast.net with comcast id kEzg1e0081GhbT85CMnurV; Tue, 20 Jul 2010 21:47:54 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta07.westchester.pa.mail.comcast.net with comcast id kMnt1e0053LrwQ23TMntQk; Tue, 20 Jul 2010 21:47:54 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id F2B569B425; Tue, 20 Jul 2010 14:47:51 -0700 (PDT) Date: Tue, 20 Jul 2010 14:47:51 -0700 From: Jeremy Chadwick To: Kirk McKusick Message-ID: <20100720214751.GA58332@icarus.home.lan> References: <20100720134931.GA41352@icarus.home.lan> <201007201544.o6KFipaV084986@chez.mckusick.com> <20100720175709.GA52321@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100720175709.GA52321@icarus.home.lan> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: "Mikhail T."
, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 21:47:54 -0000 On Tue, Jul 20, 2010 at 10:57:09AM -0700, Jeremy Chadwick wrote: > On Tue, Jul 20, 2010 at 08:44:51AM -0700, Kirk McKusick wrote: > > You are on the right track with getting the filesystem information. > > Any place that one has an inode (say pointer ip), it is possible > > to get the filesystem information using ip->i_fs->fs_fsmnt. The > > mount point can also be found from the mount-point structure. > > [...] > The biggest problem (for me) is testing. I have no idea how to > trigger the error conditions in these functions. I assume it varies; > maybe through fsdb(8) or interactively dropping to DDB and forcing > the condition. > > I tend to do all of my work on this sort on a VM of FreeBSD (using > VMware Workstation), but if testing on bare metal is required I > have a testbed as well. I've written (what I believe to be) the first stage of getting this accomplished, and have been (slowly) testing each of the functions I modified in src/sys/ufs/ffs/ffs_softdep.c. The testing methodology I'm using is somewhat tedious: moving the panic() calls I modified up near the top of the function, building and installing kernel (using -DNOCLEAN to speed things up) + rebooting box and see if the panic() gets hit + shows the correct mountpoint in the panic string. Some of the functions only get triggered during certain things (ex. softdep_waitidle wouldn't fire for me until I unmounted a spare UFS2+SU filesystem), but I'll figure it out. The tediousness isn't as bad as it could be given that I'm using VMware Workstation (test, restore to previous VM snapshot, repeat). The diff so far, I think, is around 20KBytes. 
-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Jul 20 23:14:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F0E9106566C for ; Tue, 20 Jul 2010 23:14:54 +0000 (UTC) (envelope-from marco@tolstoy.tols.org) Received: from tolstoy.tols.org (tolstoy.tols.org [IPv6:2a02:898:0:20::57:1]) by mx1.freebsd.org (Postfix) with ESMTP id BCB948FC15 for ; Tue, 20 Jul 2010 23:14:53 +0000 (UTC) Received: from tolstoy.tols.org (localhost [127.0.0.1]) by tolstoy.tols.org (8.14.4/8.14.4) with ESMTP id o6KNEo75067162 for ; Tue, 20 Jul 2010 23:14:50 GMT (envelope-from marco@tolstoy.tols.org) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.96.1 at tolstoy.tols.org Received: (from marco@localhost) by tolstoy.tols.org (8.14.4/8.14.4/Submit) id o6KNEow9067161 for freebsd-fs@freebsd.org; Tue, 20 Jul 2010 23:14:50 GMT (envelope-from marco) Date: Tue, 20 Jul 2010 23:14:50 +0000 From: Marco van Tol To: freebsd-fs@freebsd.org Message-ID: <20100720231450.GA63895@tolstoy.tols.org> Mail-Followup-To: freebsd-fs@freebsd.org References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C10B136.3030404@kkip.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4C10B136.3030404@kkip.pl> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on tolstoy.tols.org Subject: Re: Do we want a periodic script for a zfs scrub? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2010 23:14:54 -0000

On Thu, Jun 10, 2010 at 11:32:38AM +0200, Bartosz Stec wrote:

[... periodic zfs scrubs and snapshots ...]

> Ross-at-neces-dot-com already did what you're searching for. I'm using
> his periodic scripts for some months now, check here:
> http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs.
> They're doing all necessary stuff, like checking for scrub in progress too.
> Hope you'll find them helpful.

I was just looking into these and noticed you probably have to enable
the "listsnapshots" property on your zpool, which is off by default. I
couldn't find a reference to it on the page mentioned above, but I may
have overlooked it. While listsnapshots is off, snapshots don't appear
in the output of the "zfs list -H -o name" commands the scripts use.

There haven't been any successful efforts to include these or similar
scripts in the distribution, right?

Marco

--
Do what you cannot leave undone; leave undone what you cannot do.
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 00:24:48 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E10C0106567C for ; Wed, 21 Jul 2010 00:24:48 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id A71868FC13 for ; Wed, 21 Jul 2010 00:24:48 +0000 (UTC) Received: by iwn35 with SMTP id 35so7865296iwn.13 for ; Tue, 20 Jul 2010 17:24:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=Ajez7zI9KNXrYFakpl6cmYwFVZPuBFN8ul2TN/jbyMk=; b=hZuPmkGFwKbXe0Zn1WSm/htI0k+W/Vb0iO6fFTc6LjtzGTzfXJymM4NYbW8IquIUBV POhkLWSEN97VkvHrBhMoOOXxTVCGSNq1xDqgO/iVMStzReRY/vqIiobaubl10nabUvUQ HJIBt+ndUZ/CJfleR6p695J2b+j9FS9I+aNeA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=b4jxsHi/VrSLE98WtjDmREPl44LxJ4W43Y8CIHvcVdRCDj0qFQ3rs62Je5vmXdH5vF fVmzA14xlYaQ7kt6Sxc5ahBhTuC0fpnuaEfM8BOxUG1U5YD0ee+QWPggtGBZ54HayWVN OPT6J3DKkI5Vf1K7RKIplya95BW/WE5zJad5o= MIME-Version: 1.0 Received: by 10.231.155.131 with SMTP id s3mr8649378ibw.2.1279671885861; Tue, 20 Jul 2010 17:24:45 -0700 (PDT) Received: by 10.231.169.206 with HTTP; Tue, 20 Jul 2010 17:24:45 -0700 (PDT) In-Reply-To: <20100720231450.GA63895@tolstoy.tols.org> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C10B136.3030404@kkip.pl> <20100720231450.GA63895@tolstoy.tols.org> Date: Tue, 20 Jul 2010 17:24:45 -0700 Message-ID: From: Xin LI To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: Do we want a 
periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 00:24:49 -0000 On Tue, Jul 20, 2010 at 4:14 PM, Marco van Tol wrote: > I was just looking into these and noticed you probably have to enable > "listsnapshots" on your zpool which is off by default. Couldn't find a > reference to it on the page mentioned above, but I may have overlooked > it. While listsnapshots is off, snapshots don't appear in the used > "zfs list -H -o name" commands. I think the zfs list should be spelled as 'zfs list -t snapshot' for this purpose. Setting listsnapshots=on might be useful but could break something else and I'd personally prefer to avoid. Cheers, -- Xin LI http://www.delphij.net From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 01:04:22 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 292641065672; Wed, 21 Jul 2010 01:04:22 +0000 (UTC) (envelope-from freebsd@jdc.parodius.com) Received: from sj-iport-5.cisco.com (sj-iport-5.cisco.com [171.68.10.87]) by mx1.freebsd.org (Postfix) with ESMTP id BFAB48FC0A; Wed, 21 Jul 2010 01:04:21 +0000 (UTC) Authentication-Results: sj-iport-5.cisco.com; dkim=neutral (message not signed) header.i=none Received: from sj-core-4.cisco.com ([171.68.223.138]) by sj-iport-5.cisco.com with ESMTP; 21 Jul 2010 01:00:58 +0000 Received: from xbh-sjc-211.amer.cisco.com (xbh-sjc-211.cisco.com [171.70.151.144]) by sj-core-4.cisco.com (8.13.8/8.14.3) with ESMTP id o6L10h8d011443; Wed, 21 Jul 2010 01:00:58 GMT Received: from mail pickup service by xbh-sjc-211.amer.cisco.com with Microsoft SMTPSVC; Tue, 20 Jul 2010 13:51:39 -0700 Received: from sj-iport-5.cisco.com ([171.68.10.87]) by xbh-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.4675); 
Tue, 20 Jul 2010 06:50:18 -0700 Received: from sj-core-1.cisco.com ([171.71.177.237]) by sj-iport-5.cisco.com with ESMTP; 20 Jul 2010 13:50:18 +0000 Received: from rtp-iport-1.cisco.com (rtp-iport-1.cisco.com [64.102.122.148]) by sj-core-1.cisco.com (8.13.8/8.14.3) with ESMTP id o6KDoFXP006094 for ; Tue, 20 Jul 2010 13:50:17 GMT Received: from unknown (HELO smtp2-outbound.ironport.com) ([173.37.8.45]) by rtp-iport-1.cisco.com with ESMTP; 20 Jul 2010 13:50:16 +0000 Received: from smtp.soma.ironport.com (HELO c601.soma.ironport.com) ([192.168.60.201]) by smtp2-colo-in.ironport.com with ESMTP/TLS/RC4-SHA; 20 Jul 2010 06:50:09 -0700 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjsBAGdGRUxFk1M1kWdsb2JhbACDH5xQFQEBAQEJCwoHEQUdrxyRdYEmgUSBVXMEhACDfVw X-IronPort-AV: E=McAfee;i="5400,1158,6048"; a="292464491" X-IronPort-AV: E=Sophos;i="4.55,232,1278313200"; d="scan'208";a="292464491" Received: from mx2.freebsd.org ([69.147.83.53]) by soma-c601.ironport.com with ESMTP; 20 Jul 2010 06:50:07 -0700 Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 3A592179B78; Tue, 20 Jul 2010 13:49:44 +0000 (UTC) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 7D64C1065728; Tue, 20 Jul 2010 13:49:42 +0000 (UTC) (envelope-from owner-freebsd-stable@freebsd.org) Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9549D1065675 for ; Tue, 20 Jul 2010 13:49:33 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id 7B1038FC1D for ; Tue, 20 Jul 2010 13:49:33 +0000 (UTC) Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27]) by qmta14.emeryville.ca.mail.comcast.net with comcast id kDWt1e0030b6N64AEDpYLe; Tue, 20 
Jul 2010 13:49:32 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta03.emeryville.ca.mail.comcast.net with comcast id kDpX1e00A3LrwQ28PDpYX8; Tue, 20 Jul 2010 13:49:32 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 86DDF9B425; Tue, 20 Jul 2010 06:49:31 -0700 (PDT) Date: Tue, 20 Jul 2010 06:49:31 -0700 From: Jeremy Chadwick To: "Mikhail T." Message-ID: <20100720134931.GA41352@icarus.home.lan> References: <4C43F35D.5020007@aldan.algebra.com> <20100719113147.GA4786@icarus.home.lan> <4C44758F.7080209@aldan.algebra.com> <20100719204124.GA21573@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20100719204124.GA21573@icarus.home.lan> User-Agent: Mutt/1.5.20 (2009-06-14) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-stable@freebsd.org Errors-To: owner-freebsd-stable@freebsd.org X-OriginalArrivalTime: 20 Jul 2010 13:50:18.0289 (UTC) FILETIME=[7D0FA210:01CB2812] Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 01:04:22 -0000 On Mon, Jul 19, 2010 at 01:41:24PM -0700, Jeremy Chadwick wrote: > On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. wrote: > > 19.07.2010 07:31, Jeremy Chadwick wrote: > > >If you boot the machine in single-user, and run fsck manually, are there > > >any errors? > > Thanks, Jeremy... I wish, there was a way to learn, /which/ > > file-system is giving trouble... However, after sending the question > > out last night, I tried to pkg_delete a package on the machine, and > > was very lucky to see a file-system error (inode something or other) > > before the panic struck. That, at least, told me, which file-system > > was in trouble (/var). > > [...] 
> > And, IMO, at the very least, *any panic related to a file-system > > must clearly identify the file-system in question*... What do you > > think? > > [...] > Assuming work tonight isn't that busy for me, I'll see if I can dedicate > some cycles to printing this information in the error string you saw. I spent some time on this tonight. It's not as simple as it sounds, for me anyway. Relevant source bits: src/sys/ufs/ffs/ffs_softdep.c src/sys/ufs/ffs/fs.h src/sys/ufs/ffs/softdep.h ffs_softdep.c, which is almost 6500 lines, contains a large number of inode-related functions which can call panic(). Functions which have easy access to the related inodedep struct are the ones which would be able to print this information easily. Sort of. struct inodedep (see softdep.h) contains a member called id_fs, which is a pointer to struct fs (see fs.h). struct fs contains a member called fs_fsmnt (a char buffer), which is the name of the mounted filesystem. fs_fsmnt[0] should be NUL ('\0') if the filesystem isn't mounted. So in the case of your panic within handle_written_inodeblock(), it would be as simple as something like: u_char *mntpt = NULL; if (inodedep->id_fs->fs_fsmnt[0] != '\0') mntpt = inodedep->id_fs->fs_fsmnt; else /* XXX do what here? */ Then, the panic() statements later have to do something like this (taken from real code): if (dp1->di_db[adp->ad_lbn]!=adp->ad_oldblkno) panic("%s: %s: %s #%jd mismatch %d != %jd", "handle_written_inodeblock", (mntpt ? (const char *)mntpt : ""), "direct pointer", (intmax_t)adp->ad_lbn, dp1->di_db[adp->ad_lbn], (intmax_t)adp->ad_oldblkno); The panic message would look like one of the following: panic: handle_written_inodeblock: /mnt: direct pointer #nnn mismatch nnn != nnn panic: handle_written_inodeblock: : direct pointer #nnn mismatch nnn != nnn The "" string there is a Bad Idea(tm); see below. Secondly, this brings up the question: what happens if someone is doing something like "fsck /var", where /var uses soft updates? 
/var isn't mounted when this happens. Can these inode-related functions get called during that time? If so, fs_fsmnt would (in theory -- I haven't tested in practise) be empty. So in that case, what should get printed as the filesystem? Well, this is where the "" string comes into play. My first answer was: "the name of the device/slice/etc. which the inode is associated with". The problem is that I couldn't find a way to get this information, as it's not stored in struct fs anywhere. One would have to pass this down the stack, which changes the kernel ABI and is not something I'm willing to do (plus there are performance implications, as you're passing something else on the stack on every call). Of course there may be a way to get this easily, but I don't see it or know of it. Thirdly, and this is equally important: given the repetitive nature of this code (it would have to be repeated in numerous functions), making a common function that populates a (global) variable with the fsname it's working on would be ideal. But I don't know the implication of this, nor do I see many (I think two?) global variables used within ffs_softdep.c. Extending one of the structs to get access to the necessary information is not as simple as "just do it" -- there are implications when it comes to memory usage and so on. This is not a piece of code to bang on lightly. This should probably be discussed on freebsd-hackers, but cross-posting across 3 separate mailing lists is rude. If you want to drive this, cool, but please start a new thread about the matter (wanting the filesystem or device printed in panic() when things like filesystem panics happen) on freebsd-hackers. I'm not subscribed to that list, so please CC me if you go this route. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 08:29:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 312E91065673 for ; Wed, 21 Jul 2010 08:29:21 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id CDCC38FC15 for ; Wed, 21 Jul 2010 08:29:20 +0000 (UTC) Received: from outgoing.leidinger.net (pD9E2C6A9.dip.t-dialin.net [217.226.198.169]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D4AE184405D; Wed, 21 Jul 2010 10:29:12 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 780571833; Wed, 21 Jul 2010 10:29:09 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1279700949; bh=Syg5nBLh5qhmoLtKWKt7CxC0LTSWMNVsnhr5acxKVHA=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; 
b=XpZihj9nmg2sIGTWjvV3RgBxWCMLq5yuKYQIFmuFEy6Qux2OAFCbi38Tu9aiVt9v7 aArdjlnxy/Sc4nbf3h0GYN9JsvkCRfQ8FIyyxmiB08SYnJznk8HyjWhscfkf+hn4WI hNH+aZcF3/5rASvpJpEsTOz7DYxGZlgl6kx2OcOiUoRK+uoOK1Jf0LG0YogX1pkwaT MkfmbOEOFqKWetZswP8ET3l0CACHqF2FuT2LxoW4GQGtMuQtQMIOJuuXV2aqb5fHpr 4J4x5tAc4AvMKRbwYEfC5TAoAiYXXwBI9i2IV/W69v0ipWQRIzAnvNVoE6DPKDnGVY yg/OzAU9CNrdw== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o6L8T9B0033209; Wed, 21 Jul 2010 10:29:09 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Wed, 21 Jul 2010 10:29:09 +0200 Message-ID: <20100721102909.908226zl2nxympck@webmail.leidinger.net> Date: Wed, 21 Jul 2010 10:29:09 +0200 From: Alexander Leidinger To: Marco van Tol References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C10B136.3030404@kkip.pl> <20100720231450.GA63895@tolstoy.tols.org> In-Reply-To: <20100720231450.GA63895@tolstoy.tols.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: D4AE184405D.A6DB4 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1280305758.02574@mz9+Ihwzo0hFJWmR6H+4cA X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 08:29:21 -0000 Quoting Marco van Tol (from Tue, 20 Jul 2010 23:14:50 +0000): > On Thu, Jun 10, 2010 at 11:32:38AM +0200, Bartosz Stec wrote: > > [... periodic zfs scrubs and snapshots ...] > >> Ross-at-neces-dot-com already did what you're searching for. I'm using >> his periodic scripts for some months now, check here: >> > http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs. >> They're doing all necessary stuff, like checking for scrub in progress too. >> Hope you'll find them helpful. > > I was just looking into these and noticed you probably have to enable > "listsnapshots" on your zpool which is off by default. Couldn't find a > reference to it on the page mentioned above, but I may have overlooked > it. While listsnapshots is off, snapshots don't appear in the used > "zfs list -H -o name" commands. The script in -current does not use any code from the link above. The script in -current does not do snapshots, only scrubs. The script in -current does only do "zpool list", not "zfs list". So all in all, the script in -current does not need any change (I would have preferred the "-t snapshot" way of listing snapshots, if it would have been necessary). Bye, Alexander. -- Democracy is also a form of worship. It is the worship of Jackals by Jackasses. -- H. L. 
Mencken http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 09:00:22 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8569F1065676 for ; Wed, 21 Jul 2010 09:00:22 +0000 (UTC) (envelope-from gljennjohn@googlemail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 0AE678FC15 for ; Wed, 21 Jul 2010 09:00:21 +0000 (UTC) Received: by wwe15 with SMTP id 15so1597338wwe.31 for ; Wed, 21 Jul 2010 02:00:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:reply-to:x-mailer:mime-version :content-type:content-transfer-encoding; bh=lcP2GFDOghLzUeEBedsrx/TVWBn8fbYBH8Elj+8xckc=; b=vPewa9uajcMLTjfU+e7oC4V7uh9C60ei05kTasBA59W545wex4lnOvkxnQ8kdYq13t N6OuFuE1/CffmqV4aSG0yvqJdagiBcTcZ8ner53w32d6iZoMBeOH8RMTc4tfxT7uMcB5 ftwXeiEIwpI49TV0nNY61Eqwe3RwYuO08No0s= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:reply-to :x-mailer:mime-version:content-type:content-transfer-encoding; b=iG/bjgX6y6eAdJGtwVC75eYL/29Lls1Yt9KIHsVAhT8dJTZX4TodMtqMvkvPwNh5dt VfXlQhuuKaxqKmd41KCIedAlPfVc2mUdTdsYjjRneqQCUvu0X0FY72VErubZRl8WNrKV XyiVqyJ+jzEo1yMVaE9kvaaAauHGt3Fnut09E= Received: by 10.227.157.68 with SMTP id a4mr465116wbx.159.1279701029909; Wed, 21 Jul 2010 01:30:29 -0700 (PDT) Received: from ernst.jennejohn.org (p578E21D0.dip.t-dialin.net [87.142.33.208]) by mx.google.com with ESMTPS id a1sm51185916wbb.2.2010.07.21.01.30.28 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 21 Jul 2010 01:30:29 -0700 (PDT) Date: Wed, 21 Jul 2010 10:30:27 +0200 From: Gary Jennejohn To: fs@freebsd.org 
Message-ID: <20100721103027.3345a5e6@ernst.jennejohn.org> In-Reply-To: <4C45D37A.5020304@aldan.algebra.com> References: <201007201544.o6KFipaV084986@chez.mckusick.com> <4C45D37A.5020304@aldan.algebra.com> X-Mailer: Claws Mail 3.7.5 (GTK+ 2.18.7; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gljennjohn@googlemail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 09:00:22 -0000 On Tue, 20 Jul 2010 12:48:58 -0400 "Mikhail T." wrote: > 20.07.2010 11:44, Kirk McKusick wrote: > > Adding it to all the panic's will be a lot of work, > > but I agree would be useful. I will look into doing so when I > > get a chance. > > > > Kirk McKusick > > > How about disabling background fsck in a default install? It seems to be > the consensus here, that my troubles were due to fsck not fixing the > file-system properly reboot after reboot... > [trimmed to fs@] Since we're discussing bg fsck... For those running -current I highly recommend SUJ. It recovers the file systems in fractions of a second after a crash and obviates the need for fsck. I've only had good results using it. 
-- Gary Jennejohn From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 09:11:00 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A66BE106566B for ; Wed, 21 Jul 2010 09:11:00 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 8C65A8FC18 for ; Wed, 21 Jul 2010 09:11:00 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta09.emeryville.ca.mail.comcast.net with comcast id kZAz1e0021wfjNsA9ZAzdp; Wed, 21 Jul 2010 09:10:59 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta23.emeryville.ca.mail.comcast.net with comcast id kZAy1e0033LrwQ28jZAzAj; Wed, 21 Jul 2010 09:10:59 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 9536E9B425; Wed, 21 Jul 2010 02:10:58 -0700 (PDT) Date: Wed, 21 Jul 2010 02:10:58 -0700 From: Jeremy Chadwick To: Kirk McKusick Message-ID: <20100721091058.GA74971@icarus.home.lan> References: <20100720134931.GA41352@icarus.home.lan> <201007201544.o6KFipaV084986@chez.mckusick.com> <20100720175709.GA52321@icarus.home.lan> <20100720214751.GA58332@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100720214751.GA58332@icarus.home.lan> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: "Mikhail T." 
, fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 09:11:00 -0000 On Tue, Jul 20, 2010 at 02:47:51PM -0700, Jeremy Chadwick wrote: > On Tue, Jul 20, 2010 at 10:57:09AM -0700, Jeremy Chadwick wrote: > > On Tue, Jul 20, 2010 at 08:44:51AM -0700, Kirk McKusick wrote: > > > You are on the right track with getting the filesystem information. > > > Any place that one has an inode (say pointer ip), it is possible > > > to get the filesystem information using ip->i_fs->fs_fsmnt. The > > > mount point can also be found from the mount-point structure. > > > > [...] > > The biggest problem (for me) is testing. I have no idea how to > > trigger the error conditions in these functions. I assume it varies; > > maybe through fsdb(8) or interactively dropping to DDB and forcing > > the condition. > > > > I tend to do all of my work on this sort on a VM of FreeBSD (using > > VMware Workstation), but if testing on bare metal is required I > > have a testbed as well. > > I've written (what I believe to be) the first stage of getting this > accomplished, and have been (slowly) testing each of the functions I > modified in src/sys/ufs/ffs/ffs_softdep.c. > [...] > The diff so far, I think, is around 20KBytes. I finished preliminary testing tonight. There were two functions which I couldn't verify work because I couldn't get the kernel to call them no matter what I tried: softdep_setup_allocext() request_cleanup() All the other functions I modified were tested by moving the panic() call near the top of the function and doing whatever was needed. Sometimes mounting a filesystem was all that was required to trigger it, other times I had to make a new filesystem + sync, or umount. In one case I had to make a UFS1+SU filesystem. 
It's not a 100% accurate test, hence "preliminary". :-) I also fixed a couple of cosmetic items in the code (things not lining up right, some strings having ":" at the end of them when it should have been within the initial formatting string itself, and one function which used %s for "softdep" instead of just using the actual string itself). These were few and far between. Below is the patch/diff for RELENG_8 I've come up with. I only tested this on amd64: http://jdc.parodius.com/freebsd/ffs_softdep.c.patch Kirk, if you could review this I'd appreciate it. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 20:15:59 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 20207106566C for ; Wed, 21 Jul 2010 20:15:59 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id F398F8FC08 for ; Wed, 21 Jul 2010 20:15:58 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6LKFp9Y066176; Wed, 21 Jul 2010 13:15:51 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007212015.o6LKFp9Y066176@chez.mckusick.com> To: "Mikhail T." In-reply-to: <4C45D37A.5020304@aldan.algebra.com> Date: Wed, 21 Jul 2010 13:15:51 -0700 From: Kirk McKusick Cc: fs@freebsd.org Subject: Re: background fsck considered harmful?
(Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 20:15:59 -0000 > Date: Tue, 20 Jul 2010 12:48:58 -0400 > From: "Mikhail T." > To: Kirk McKusick > CC: Jeremy Chadwick , fs@freebsd.org > Subject: background fsck considered harmful? (Re: panic: handle_written_inodeblock: > bad size) > X-ASK-Info: Message Queued (2010/07/20 09:49:10) > X-ASK-Info: Confirmed by User (2010/07/20 10:28:39) > > 20.07.2010 11:44, Kirk McKusick wrote: > > Adding it to all the panics will be a lot of work, > > but I agree would be useful. I will look into doing so when I > > get a chance. > > > > Kirk McKusick > > > How about disabling background fsck in a default install? It seems to be > the consensus here, that my troubles were due to fsck not fixing the > file-system properly reboot after reboot... > > Yours, > > -mi Certainly disabling background fsck will eliminate that from your possible set of issues and may prevent a recurrence. It does mean that after a crash you will have to wait while your filesystems are checked before your system will come up. If your filesystems are below 0.5TB that should be tolerable. The longer term solution is to use journaled soft updates when they become available in 9.0.
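For readers following along, disabling background fsck on a single machine is a one-line configuration change. The fragment below is a minimal sketch that assumes the stock background_fsck knob described in rc.conf(5); it is not taken from this thread and has not been tested against any particular release.

```sh
# /etc/rc.conf fragment (sh syntax); a sketch, not a tested configuration.
# Assumption: the standard rc.conf(5) knob "background_fsck" is honoured.
# Run fsck in the foreground at boot instead of deferring it to the
# background after the system is already up.
background_fsck="NO"
```

The cost is the one described above: after a crash the boot waits for the check to finish before the system comes up.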
Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 20:25:42 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF263106566B for ; Wed, 21 Jul 2010 20:25:41 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id C7FB08FC29 for ; Wed, 21 Jul 2010 20:25:41 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6LKPdpD068355; Wed, 21 Jul 2010 13:25:39 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007212025.o6LKPdpD068355@chez.mckusick.com> To: Jeremy Chadwick In-reply-to: <20100721091058.GA74971@icarus.home.lan> Date: Wed, 21 Jul 2010 13:25:39 -0700 From: Kirk McKusick Cc: "Mikhail T." , fs@freebsd.org Subject: Re: panic: handle_written_inodeblock: bad size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 20:25:42 -0000 > Date: Wed, 21 Jul 2010 02:10:58 -0700 > From: Jeremy Chadwick > To: Kirk McKusick > Cc: "Mikhail T." , fs@freebsd.org > Subject: Re: panic: handle_written_inodeblock: bad size > > On Tue, Jul 20, 2010 at 02:47:51PM -0700, Jeremy Chadwick wrote: > > On Tue, Jul 20, 2010 at 10:57:09AM -0700, Jeremy Chadwick wrote: > > > On Tue, Jul 20, 2010 at 08:44:51AM -0700, Kirk McKusick wrote: > > > > You are on the right track with getting the filesystem information. > > > > Any place that one has an inode (say pointer ip), it is possible > > > > to get the filesystem information using ip->i_fs->fs_fsmnt. The > > > > mount point can also be found from the mount-point structure. > > > > > > [...] > > > The biggest problem (for me) is testing. 
I have no idea how to > > > trigger the error conditions in these functions. I assume it varies; > > > maybe through fsdb(8) or interactively dropping to DDB and forcing > > > the condition. > > > > > > I tend to do all of my work on this sort on a VM of FreeBSD (using > > > VMware Workstation), but if testing on bare metal is required I > > > have a testbed as well. > > > > I've written (what I believe to be) the first stage of getting this > > accomplished, and have been (slowly) testing each of the functions I > > modified in src/sys/ufs/ffs/ffs_softdep.c. > > [...] > > The diff so far, I think, is around 20KBytes. > > I finished preliminary testing tonight. There were two functions which > I couldn't verify work because I couldn't get the kernel to call them no > matter what I tried: > > softdep_setup_allocext() > request_cleanup() The softdep_setup_allocext() function is only called when you set external attributes such as ACL parameters (try setfacl(1)). The request_cleanup() function is only called when your kernel is memory stressed from too many filesystem operations. You might get it to trigger by using sysctl to cut debug.max_softdeps to say 5000 then try removing a tree with at least 10000 files in it. > All the other functions I modified were tested by moving the panic() > call near the top of the function and doing whatever was needed. > Sometimes mounting a filesystem was all that was required to trigger it, > other times I had to make a new filesystem + sync, or umount. In one > case I had to make a UFS1+SU filesystem. It's not a completely 100% > accurate test, hence "preliminary". :-) Sounds good. > I also fixed a couple cosmetical items with the code (things not lining > up right, some strings having ":" at the end of them when it should have > been within the initial formatting string itself, and one function which > used %s for "softdep" instead of just using the actual string itself). > These were few and far between. 
> > Below is the patch/diff for RELENG_8 I've come up with. I only tested > this on amd64: > > http://jdc.parodius.com/freebsd/ffs_softdep.c.patch > > Kirk, if you could review this I'd appreciate it. I have grabbed your patch and will look it over in the next few days. > -- > | Jeremy Chadwick jdc@parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, USA | > | Making life hard for others since 1977. PGP: 4BD6C0CB | Thanks for taking the time to go through all those panics! Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Wed Jul 21 21:15:31 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19EF1106564A for ; Wed, 21 Jul 2010 21:15:31 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30]) by mx1.freebsd.org (Postfix) with ESMTP id BA7B18FC0C for ; Wed, 21 Jul 2010 21:15:30 +0000 (UTC) Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166]) by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o6LLFTIm032188 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 21 Jul 2010 17:15:29 -0400 Received: from ws-mteterin.dev.pathfinder.com (ws-mteterin.dev.pathfinder.com [209.251.223.173]) by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o6LLFSbU024824; Wed, 21 Jul 2010 17:15:29 -0400 Message-ID: <4C476370.6030907@aldan.algebra.com> Date: Wed, 21 Jul 2010 17:15:28 -0400 From: "Mikhail T." Organization: Virtual Estates, Inc.
User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Kirk McKusick References: <201007212015.o6LKFp9Y066176@chez.mckusick.com> In-Reply-To: <201007212015.o6LKFp9Y066176@chez.mckusick.com> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 8bit X-Mailman-Approved-At: Wed, 21 Jul 2010 22:25:11 +0000 Cc: fs@freebsd.org Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Jul 2010 21:15:31 -0000 21.07.2010 16:15, Kirk McKusick (): > Certainly disabling background fsck will eliminate that from your > possible set of issues and may prevent a recurrance. It does mean > that after a crash you will have to wait while your filesystems > are checked before your system will come up. If your filesystems > are below 0.5Tb that should be tolerable. > > The longer term solution is to use journaled soft updates when they > become available in 9.0. > We are about to ship 8.1 -- with background fsck enabled by default possibly causing problems requiring far more admin time (and involving real data-loss). If the existing fsck can not be improved to properly fix the fs, when running in background mode, just as well as when it is running pre-mount, then, IMHO, it should not be enabled by default. Crashes are quite rare and waiting once in a while for fsck to rumble through would be better, than to have some people enter into a vicious circle of mysterious panics (even if Jeremy's ongoing work makes them slightly less mysterious). 
Respectfully yours, -mi From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 03:35:11 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67891106564A for ; Thu, 22 Jul 2010 03:35:11 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 2E7148FC12 for ; Thu, 22 Jul 2010 03:35:10 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6M3Z1ZT062733; Wed, 21 Jul 2010 20:35:02 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007220335.o6M3Z1ZT062733@chez.mckusick.com> To: "Mikhail T." In-reply-to: <4C476370.6030907@aldan.algebra.com> Date: Wed, 21 Jul 2010 20:35:01 -0700 From: Kirk McKusick Cc: fs@freebsd.org Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 03:35:11 -0000 > Date: Wed, 21 Jul 2010 17:15:28 -0400 > From: "Mikhail T." > Organization: Virtual Estates, Inc. > To: Kirk McKusick > Cc: fs@freebsd.org > Subject: Re: background fsck considered harmful? > > 21.07.2010 16:15, Kirk McKusick: > > Certainly disabling background fsck will eliminate that from your > > possible set of issues and may prevent a recurrance. It does mean > > that after a crash you will have to wait while your filesystems > > are checked before your system will come up. If your filesystems > > are below 0.5Tb that should be tolerable. > > > > The longer term solution is to use journaled soft updates when they > > become available in 9.0. 
> > We are about to ship 8.1 -- with background fsck enabled by default > possibly causing problems requiring far more admin time (and involving > real data-loss). > > If the existing fsck can not be improved to properly fix the fs, when > running in background mode, just as well as when it is running > pre-mount, then, IMHO, it should not be enabled by default. > > Crashes are quite rare and waiting once in a while for fsck to rumble > through would be better, than to have some people enter into a vicious > circle of mysterious panics (even if Jeremy's ongoing work makes them > slightly less mysterious). > > Respectfully yours, > > -mi I believe that you are being excessively harsh on background fsck. Generally the problems are caused by hard-disk errors. Because background fsck only checks a small subset of the disk it does not find them and so when they eventually accumulate enough they cause difficult problems. Foreground fsck checks all the disk metadata every time, so hard disk errors are captured immediately before they have had a chance to accumulate. But background fsck users blame it because it has not found them. If you have small disk systems, running foreground fsck is an acceptable solution (and indeed I would recommend it). But when you are running systems with 20Tb of disks, you are not willing to have your system down for 10 hours after every crash. A reasonable intermediate solution is to use background fsck by default, but schedule down time to run a full fsck once a month or so to check for accumulated hard disk errors. 
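The "full fsck once a month" routine suggested above needs some record of when the last full check ran. The following is only a sketch of one way to track that: the function name full_fsck_due, the stamp file holding an epoch timestamp, and the 30-day default are all hypothetical, and nothing like this ships with fsck or rc(8).

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when a full fsck is due.
#   $1  stamp file holding the epoch seconds of the last full fsck (assumed)
#   $2  current time in epoch seconds, e.g. "$(date +%s)"
#   $3  maximum age in days before a full check is due (default 30)
full_fsck_due() {
    stamp_file=$1
    now=$2
    max_age_days=${3:-30}

    # No readable record of a previous full check: treat it as due.
    [ -r "$stamp_file" ] || return 0
    last=$(cat "$stamp_file")

    # An empty or non-numeric stamp is treated the same as a missing one.
    case $last in
        ''|*[!0-9]*) return 0 ;;
    esac

    # Whole days elapsed since the recorded full check.
    age_days=$(( (now - last) / 86400 ))
    [ "$age_days" -ge "$max_age_days" ]
}
```

An administrator could call this from a periodic job to get a reminder, then write the current epoch time back to the stamp file after the scheduled downtime in which the full check is actually run.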
Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 04:10:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 91BC41065672 for ; Thu, 22 Jul 2010 04:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 682068FC18 for ; Thu, 22 Jul 2010 04:10:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6M4A3eX092611 for ; Thu, 22 Jul 2010 04:10:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6M4A3hv092610; Thu, 22 Jul 2010 04:10:03 GMT (envelope-from gnats) Date: Thu, 22 Jul 2010 04:10:03 GMT Message-Id: <201007220410.o6M4A3hv092610@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Shigeya Suzuki Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Shigeya Suzuki List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 04:10:03 -0000 The following reply was made to PR kern/148709; it has been noted by GNATS. From: Shigeya Suzuki To: bug-followup@FreeBSD.org, shigeya@wide.ad.jp Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id Date: Thu, 22 Jul 2010 13:08:45 +0900 I forgot to mention that the current 'zfs' command does not allow the sharesmb attribute to be modified. Thus, if a user wants to turn off the sharesmb attribute, they have to boot from OpenSolaris to do so. While that may be possible, it is not an easy fix, especially if there are a lot of filesystems with sharesmb=on.
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 09:00:14 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76DDD1065674 for ; Thu, 22 Jul 2010 09:00:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 47A3E8FC0C for ; Thu, 22 Jul 2010 09:00:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6M90EMi010579 for ; Thu, 22 Jul 2010 09:00:14 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6M90E2H010574; Thu, 22 Jul 2010 09:00:14 GMT (envelope-from gnats) Date: Thu, 22 Jul 2010 09:00:14 GMT Message-Id: <201007220900.o6M90E2H010574@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Martin Matuska Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Martin Matuska List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 09:00:14 -0000 The following reply was made to PR kern/148709; it has been noted by GNATS. From: Martin Matuska To: bug-followup@FreeBSD.org, shigeya@wide.ad.jp Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id Date: Thu, 22 Jul 2010 10:54:49 +0200 Can you reproduce this with the latest ZFS v15 code? 
You can try to boot using the ISO file found at: http://mfsbsd.vx.sk/ From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 11:50:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D537D1065673 for ; Thu, 22 Jul 2010 11:50:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id AAE658FC18 for ; Thu, 22 Jul 2010 11:50:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6MBo3Ai079214 for ; Thu, 22 Jul 2010 11:50:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6MBo3NS079212; Thu, 22 Jul 2010 11:50:03 GMT (envelope-from gnats) Date: Thu, 22 Jul 2010 11:50:03 GMT Message-Id: <201007221150.o6MBo3NS079212@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Shigeya Suzuki Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Shigeya Suzuki List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 11:50:03 -0000 The following reply was made to PR kern/148709; it has been noted by GNATS. From: Shigeya Suzuki To: Martin Matuska Cc: bug-followup@FreeBSD.org Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id Date: Thu, 22 Jul 2010 20:48:09 +0900 I still have the box I used for test as is. I will try tonight, if possible. 
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 13:02:24 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 910621065672 for ; Thu, 22 Jul 2010 13:02:24 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 512708FC1D for ; Thu, 22 Jul 2010 13:02:24 +0000 (UTC) Received: by iwn35 with SMTP id 35so9982215iwn.13 for ; Thu, 22 Jul 2010 06:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:date:from:to:cc :subject:in-reply-to:message-id:references:user-agent :x-openpgp-key-id:x-openpgp-key-fingerprint:mime-version :content-type; bh=umKXl8c41mYoIPeRgRhSZyDLMTlcNGGXyts6k/XR468=; b=E7dBXIh+PrursQSdVNnl+MCLF5eEQJd4+GnbMAuwaRtToUWZ+ChCGx7QOUQOBaMwGg GQXc1NDkwXbzSGv9PP9tZjbm9xMF1x7kw2bpw2b+/ueuLDGPYxKZK9cOqjaiQmNbebYi I78hcc2XD856EN3Kh/kV1W61oCzqZ2HXW1eV8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:x-openpgp-key-id:x-openpgp-key-fingerprint:mime-version :content-type; b=JLrEDnF2xW3b7FSdBOr7a6L1MKFKglBRlpO4qvzwVkwVlW0DnTxWKKgQE8xl0KLYep DrV8qPZ1zVucqzalPzR31XBNAHRbX3PIhLgm2ZYZNwqKOeucWwtrMO+UtZkEOPH8bM9Q +5Q3SjvxZYguKa10oPg2Q1VGirCcX7J9axdoU= Received: by 10.231.159.203 with SMTP id k11mr1837907ibx.115.1279803743427; Thu, 22 Jul 2010 06:02:23 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-133-55.dsl.klmzmi.sbcglobal.net [99.181.133.55]) by mx.google.com with ESMTPS id e8sm29962634ibb.20.2010.07.22.06.02.21 (version=TLSv1/SSLv3 cipher=RC4-MD5); Thu, 22 Jul 2010 06:02:21 -0700 (PDT) Sender: "J. 
Hellenthal" Date: Thu, 22 Jul 2010 09:02:14 -0400 From: jhell To: Kirk McKusick In-Reply-To: <201007212015.o6LKFp9Y066176@chez.mckusick.com> Message-ID: References: <201007212015.o6LKFp9Y066176@chez.mckusick.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-OpenPGP-Key-Id: 0x89D8547E X-OpenPGP-Key-Fingerprint: 85EF E26B 07BB 3777 76BE B12A 9057 8789 89D8 547E MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: "Mikhail T." , fs@freebsd.org Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 13:02:24 -0000 On Wed, 21 Jul 2010 16:15, Kirk McKusick wrote: In Message-Id: <201007212015.o6LKFp9Y066176@chez.mckusick.com> >> Date: Tue, 20 Jul 2010 12:48:58 -0400 >> From: "Mikhail T." >> To: Kirk McKusick >> CC: Jeremy Chadwick , fs@freebsd.org >> Subject: background fsck considered harmful? (Re: panic: handle_written_inodeblock: >> bad size) >> X-ASK-Info: Message Queued (2010/07/20 09:49:10) >> X-ASK-Info: Confirmed by User (2010/07/20 10:28:39) >> >> 20.07.2010 11:44, Kirk McKusick wrote: >>> Adding it to all the panics will be a lot of work, >>> but I agree would be useful. I will look into doing so when I >>> get a chance. >>> >>> Kirk McKusick >>> >> How about disabling background fsck in a default install? It seems to >> be the consensus here, that my troubles were due to fsck not fixing the >> file-system properly reboot after reboot... >> >> Yours, >> >> -mi > > Certainly disabling background fsck will eliminate that from your > possible set of issues and may prevent a recurrence. It does mean that > after a crash you will have to wait while your filesystems are checked > before your system will come up. If your filesystems are below 0.5TB > that should be tolerable.
> This had me thinking about having fsck write and read a small piece of metadata on the disk it is checking, say a counter with a value somewhere between 1 and 10, or whatever range would be acceptable. When the counter hits the highest predetermined (hard-coded) value, fsck could return an error code that an rc script could check in order to switch to a full check. Along those lines, maybe fsck already returns something like this? If so, does it do so early enough for a script to catch it? This would ultimately remove the need for a background_fsck_enable variable, and would allow some file systems to be checked in the background without user intervention while others are checked in the foreground. Thinking about it again, maybe this could instead be handled by the initiating script: it would set a variable that is held until the system is writable and then stored somewhere on disk once the system is up, so that it could be read the next time around. Personally I prefer the former, as it seems to be the stronger solution. > The longer term solution is to use journaled soft updates when they > become available in 9.0.
> > Kirk McKusick > Regards, -- jhell From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 14:40:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8F871065672 for ; Thu, 22 Jul 2010 14:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A039D8FC29 for ; Thu, 22 Jul 2010 14:40:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6MEe3mH046222 for ; Thu, 22 Jul 2010 14:40:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6MEe31a046221; Thu, 22 Jul 2010 14:40:03 GMT (envelope-from gnats) Date: Thu, 22 Jul 2010 14:40:03 GMT Message-Id: <201007221440.o6MEe31a046221@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Shigeya Suzuki Cc: Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Shigeya Suzuki List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 14:40:03 -0000 The following reply was made to PR kern/148709; it has been noted by GNATS. From: Shigeya Suzuki To: Martin Matuska Cc: bug-followup@FreeBSD.org Subject: Re: kern/148709: [zfs] [panic] running du with zfs filesystem with sharesmb=on cause panic zfs_fuid_map_id Date: Thu, 22 Jul 2010 23:38:26 +0900 Tested. 
Gotten same panic: "zfs_fuid_map_id" From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 15:21:47 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC76D1065672 for ; Thu, 22 Jul 2010 15:21:47 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30]) by mx1.freebsd.org (Postfix) with ESMTP id 46CC18FC0A for ; Thu, 22 Jul 2010 15:21:46 +0000 (UTC) Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166]) by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o6MFLja6015620 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 22 Jul 2010 11:21:45 -0400 Received: from ws-mteterin.dev.pathfinder.com (ws-mteterin.dev.pathfinder.com [209.251.223.173]) by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o6MFLjcW014010; Thu, 22 Jul 2010 11:21:45 -0400 Message-ID: <4C486209.7050402@aldan.algebra.com> Date: Thu, 22 Jul 2010 11:21:45 -0400 From: "Mikhail T." Organization: Virtual Estates, Inc. User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Kirk McKusick References: <201007220335.o6M3Z1ZT062733@chez.mckusick.com> In-Reply-To: <201007220335.o6M3Z1ZT062733@chez.mckusick.com> X-Mailman-Approved-At: Thu, 22 Jul 2010 15:44:25 +0000 Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@freebsd.org Subject: Re: background fsck considered harmful? 
(Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 15:21:47 -0000 21.07.2010 23:35, Kirk McKusick wrote: > Foreground fsck checks all the disk > metadata every time, so hard disk errors are captured immediately > before they have had a chance to accumulate. But background fsck > users blame it because it has not found them. > I don't blame the program itself -- if it was deliberately /designed/ to only do partial checking. However, I was under the impression that the background fsck was meant to do the same job as the "real" one, and that, whenever it did not, it was simply a bug in the /implementation/. I suspect this misconception is shared by plenty of other users... Indeed, even if an inquisitive admin wanted to find out, fsck(8) gives absolutely no warning to that effect -- it simply states that background fsck will be attempted whenever possible. > If you have small disk systems, running foreground fsck is an > acceptable solution (and indeed I would recommend it). But when > you are running systems with 20Tb of disks, you are not willing > to have your system down for 10 hours after every crash. > > A reasonable intermediate solution is to use background fsck by > default, but schedule down time to run a full fsck once a month > or so to check for accumulated hard disk errors. > Maybe filesystems smaller than, say, 100GB (default threshold, subject to the admin's adjustment) should always be fsck-ed in the foreground? This should, at least, cover the system file-systems (such as / and /var) on typical installations... And a stern warning should be issued when a background fsck is attempted -- for whatever reason. Something like: background fsck, although faster, may be unable to detect certain rare forms of filesystem corruption.
You are advised to perform a full fsck on %s on a regular basis. See fsck(8). should go into the right place under fsck_ffs/ -- not sure, where exactly... Below is a simple patch for the top-level fsck(8). Somebody more knowledgeable of the details should augment fsck_ffs(8) -- it currently gives the lists of inconsistencies checked for without mentioning the difference in coverage between full and background modes... diff -U 2 -r1.38.2.1 fsck.8 --- fsck.8 3 Aug 2009 08:13:06 -0000 1.38.2.1 +++ fsck.8 22 Jul 2010 15:19:25 -0000 @@ -170,4 +170,12 @@ When running in background mode, only one file system at a time will be checked. + +.Sy Warning: +because background fsck is performed while the filesystem +is in use, it is limited to checking for only the most commonly +occuring filesystem abnormalities. Under certain circumstances, +some errors can escape background fsck. It is recommended, that you +perform full fsck on your systems once in a while -- or whenever +you encounter filesystem-related panics. .It Fl t Ar fstype Invoke Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 16:50:41 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F21D61065676 for ; Thu, 22 Jul 2010 16:50:41 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id BAAD48FC12 for ; Thu, 22 Jul 2010 16:50:41 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6MGoY9V039222; Thu, 22 Jul 2010 09:50:34 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007221650.o6MGoY9V039222@chez.mckusick.com> To: "Mikhail T." In-reply-to: <4C486209.7050402@aldan.algebra.com> Date: Thu, 22 Jul 2010 09:50:34 -0700 From: Kirk McKusick Cc: fs@freebsd.org Subject: Re: background fsck considered harmful? 
(Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 16:50:42 -0000 > Date: Thu, 22 Jul 2010 11:21:45 -0400 > From: "Mikhail T." > Organization: Virtual Estates, Inc. > To: Kirk McKusick > CC: fs@freebsd.org > Subject: Re: background fsck considered harmful? > > 21.07.2010 23:35, Kirk McKusick > > Foreground fsck checks all the disk > > metadata every time, so hard disk errors are captured immediately > > before they have had a chance to accumulate. But background fsck > > users blame it because it has not found them. > > I don't blame the program itself -- if it was deliberately /designed/ to > only do partial checking. However, I was under the impression, that the > background fsck was meant to do the same job as the "real" one, and > that, whenever it did not, it was simply a bug in the /implementation/. > > I suspect, this misconception is shared by plenty of other users... > Indeed, even if a inquisitive admin wanted to find out, fsck(8) gives > absolutely no warning to that effect -- it simply states, that > background fsck will be attempted, whenever possible. > > > If you have small disk systems, running foreground fsck is an > > acceptable solution (and indeed I would recommend it). But when > > you are running systems with 20Tb of disks, you are not willing > > to have your system down for 10 hours after every crash. > > > > A reasonable intermediate solution is to use background fsck by > > default, but schedule down time to run a full fsck once a month > > or so to check for accumulated hard disk errors. > > Maybe, filesystems less than, say, 100Gb (default threshold, subject to > admin's adjustment) in size should always be foreground fsck-ed? This > should, at least, cover the system file-systems (such as / and /var) on > typical installations... 
If we did not have a better solution in the pipeline (journaled soft updates), I would agree with you that always doing a full check on small filesystems would be a useful enhancement. However, since we do have a solution that will work well for all sizes of filesystems in -current and expected out of the box with 9.0, I do not think that it would be useful to add this extra complexity at this time.

> And a stern warning issued, when a background fsck is attempted -- for
> whatever reason. Something like:
>
> background fsck, although faster, may be unable to detect certain
> rare forms of filesystem corruption. You are advised to perform a
> full fsck on %s on a regular basis. See fsck(8).
>
> should go into the right place under fsck_ffs/ -- not sure, where exactly...

Since most folks do not look at the output from background fsck and with the changes noted above, I do not feel that adding this message would be all that helpful at this time.

> Below is a simple patch for the top-level fsck(8). Somebody more
> knowledgeable of the details should augment fsck_ffs(8) -- it currently
> gives the lists of inconsistencies checked for without mentioning the
> difference in coverage between full and background modes...
>
> diff -U 2 -r1.38.2.1 fsck.8
> --- fsck.8 3 Aug 2009 08:13:06 -0000 1.38.2.1
> +++ fsck.8 22 Jul 2010 15:19:25 -0000
> @@ -170,4 +170,12 @@
>  When running in background mode,
>  only one file system at a time will be checked.
> +.Sy Warning:
> +because background fsck is performed while the filesystem
> +is in use, it is limited to checking for only the most commonly
> +occuring filesystem abnormalities. Under certain circumstances,
> +some errors can escape background fsck. It is recommended, that you
> +perform full fsck on your systems once in a while -- or whenever
> +you encounter filesystem-related panics.
>  .It Fl t Ar fstype
>  Invoke
>
> Yours,
>
> -mi

I concur that adding a note to fsck(8) would be a good idea as best practice is to run a full fsck after a disk-related panic. I would be happy with your checking in:

diff -U 2 -r1.38.2.1 fsck.8
--- fsck.8 3 Aug 2009 08:13:06 -0000 1.38.2.1
+++ fsck.8 22 Jul 2010 15:19:25 -0000
@@ -170,4 +170,12 @@
 When running in background mode,
 only one file system at a time will be checked.
+.Sy Warning:
+background fsck is limited to checking for only the most commonly
+occuring filesystem abnormalities. Under certain circumstances,
+some errors can escape background fsck. It is recommended, that you
+perform full fsck on your systems once in a while -- or whenever
+you encounter filesystem-related panics.
 .It Fl t Ar fstype
 Invoke

Does this work for you?

Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 18:37:19 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EFC51065678 for ; Thu, 22 Jul 2010 18:37:19 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 085308FC16 for ; Thu, 22 Jul 2010 18:37:18 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o6MIbFqv062887; Thu, 22 Jul 2010 11:37:15 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201007221837.o6MIbFqv062887@chez.mckusick.com> To: "Mikhail T." In-reply-to: <4C487C73.9070709@aldan.algebra.com> Date: Thu, 22 Jul 2010 11:37:15 -0700 From: Kirk McKusick Cc: fs@freebsd.org Subject: Re: background fsck considered harmful?
(Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 18:37:19 -0000 > Date: Thu, 22 Jul 2010 13:14:27 -0400 > From: "Mikhail T." > To: Kirk McKusick > CC: fs@freebsd.org > Subject: Re: background fsck considered harmful? > > When there is a problem with frequent FS-related panics, more attention > is paid to the start-up messages, I think... People are more likely to > see that error message, for example, than they are to study the man-page > (unless something directs them there). > > Being "only" a ports-committer, I can not update fsck.8 -- someone else > would have to do that. > > Also, what about updating fsck_ffs.8 -- to specify, which of the > inconsistencies are and aren't checked by background fsck? > > Yours, > > -mi I have updated fsck(8) and will MFC it to 8 in a week. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 17:14:29 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8360A1065673 for ; Thu, 22 Jul 2010 17:14:29 +0000 (UTC) (envelope-from mi+thun@aldan.algebra.com) Received: from mail2.timeinc.net (mail2.timeinc.net [64.236.74.30]) by mx1.freebsd.org (Postfix) with ESMTP id 458DB8FC14 for ; Thu, 22 Jul 2010 17:14:28 +0000 (UTC) Received: from mail.timeinc.net (mail.timeinc.net [64.12.55.166]) by mail2.timeinc.net (8.13.8/8.13.8) with ESMTP id o6MHES5K007244 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 22 Jul 2010 13:14:28 -0400 Received: from ws-mteterin.dev.pathfinder.com (ws-mteterin.dev.pathfinder.com [209.251.223.173]) by mail.timeinc.net (8.13.8/8.13.8) with SMTP id o6MHERFc020819; Thu, 22 Jul 2010 13:14:28 -0400 Message-ID: <4C487C73.9070709@aldan.algebra.com> Date: Thu, 22 Jul 2010 
13:14:27 -0400 From: "Mikhail T." Organization: Virtual Estates, Inc. User-Agent: Mozilla/5.0 (X11; U; Linux i686; uk; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Kirk McKusick References: <201007221650.o6MGoY9V039222@chez.mckusick.com> In-Reply-To: <201007221650.o6MGoY9V039222@chez.mckusick.com> X-Mailman-Approved-At: Thu, 22 Jul 2010 21:45:10 +0000 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: fs@freebsd.org Subject: Re: background fsck considered harmful? (Re: panic: handle_written_inodeblock: bad size) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 17:14:29 -0000

22.07.2010 12:50, Kirk McKusick wrote:
> If we did not have a better solution in the pipeline (journaled
> soft updates), I would agree with you that always doing a full
> check on small filesystems would be a useful enhancement. However,
> since we do have a solution that will work well for all sizes of
> filesystems in -current and expected out of the box with 9.0, I do
> not think that it would be useful to add this extra complexity
> at this time.
>

The production-ready 9.x is at least a year away... Even when it ships, the journaled soft updates will not get into wide use immediately -- even if newfs enables that by default, people upgrading existing installations will, likely, leave the filesystem unchanged for a while. And the 7.x and 8.x installs currently in use will be around for many more years to come -- they should get this enhancement, in my opinion.

>> > And a stern warning issued, when a background fsck is attempted -- for
>> > whatever reason. Something like:
>> >
>> > background fsck, although faster, may be unable to detect certain
>> > rare forms of filesystem corruption.
You are advised to perform a >> > full fsck on %s on a regular basis. See fsck(8). >> > >> > should go into the right place under fsck_ffs/ -- not sure, where exactly... >> > Since most folks do not look at the output from background fsck and with > the changes noted above, I do not feel that adding this message would > be all that helpful at this time. > When there is a problem with frequent FS-related panics, more attention is paid to the start-up messages, I think... People are more likely to see that error message, for example, than they are to study the man-page (unless something directs them there). Being "only" a ports-committer, I can not update fsck.8 -- someone else would have to do that. Also, what about updating fsck_ffs.8 -- to specify, which of the inconsistencies are and aren't checked by background fsck? Yours, -mi From owner-freebsd-fs@FreeBSD.ORG Thu Jul 22 23:40:05 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B575F106566B for ; Thu, 22 Jul 2010 23:40:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8BEA98FC19 for ; Thu, 22 Jul 2010 23:40:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6MNe5Rd073107 for ; Thu, 22 Jul 2010 23:40:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6MNe5r1073105; Thu, 22 Jul 2010 23:40:05 GMT (envelope-from gnats) Date: Thu, 22 Jul 2010 23:40:05 GMT Message-Id: <201007222340.o6MNe5r1073105@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/145778: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2010 23:40:05 -0000 The following reply was made to PR kern/145778; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/145778: commit references a PR Date: Thu, 22 Jul 2010 23:30:35 +0000 (UTC) Author: mm Date: Thu Jul 22 23:30:24 2010 New Revision: 210398 URL: http://svn.freebsd.org/changeset/base/210398 Log: Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY - fixes panics when Solaris/OpenSolaris pools that contain files uploaded with the SMB protocol are accessed Enable seting/unsetting the sharesmb property (dummy action) - allows users who import pools from Solaris/Opensolaris to unset the sharesmb property and get rid of annoying messages PR: kern/145778, kern/148709 Approved by: pjd, delphij (mentor) MFC after: 7 weeks Modified: head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c Modified: head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c ============================================================================== --- head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c Thu Jul 22 23:23:39 2010 (r210397) +++ head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c Thu Jul 22 23:30:24 2010 (r210398) @@ -1265,7 +1265,6 @@ zfs_prop_set(zfs_handle_t *zhp, const ch case ZFS_PROP_XATTR: case ZFS_PROP_VSCAN: case ZFS_PROP_NBMAND: - case ZFS_PROP_SHARESMB: (void) snprintf(errbuf, sizeof (errbuf), "property '%s' not supported on FreeBSD", propname); ret = zfs_error(hdl, EZFS_PERM, errbuf); Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c ============================================================================== --- 
head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c Thu Jul 22 23:23:39 2010 (r210397) +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c Thu Jul 22 23:30:24 2010 (r210398) @@ -410,7 +410,7 @@ zfs_fuid_map_id(zfsvfs_t *zfsvfs, uint64 domain = zfs_fuid_find_by_idx(zfsvfs, index); ASSERT(domain != NULL); -#ifdef TODO +#ifdef sun if (type == ZFS_OWNER || type == ZFS_ACE_USER) { (void) kidmap_getuidbysid(crgetzone(cr), domain, FUID_RID(fuid), &id); @@ -418,9 +418,9 @@ zfs_fuid_map_id(zfsvfs_t *zfsvfs, uint64 (void) kidmap_getgidbysid(crgetzone(cr), domain, FUID_RID(fuid), &id); } -#else - panic(__func__); -#endif +#else /* sun */ + id = UID_NOBODY; +#endif /* sun */ return (id); } @@ -514,21 +514,21 @@ zfs_fuid_create_cred(zfsvfs_t *zfsvfs, z if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id)) return ((uint64_t)id); -#ifdef TODO +#ifdef sun ksid = crgetsid(cr, (type == ZFS_OWNER) ? KSID_OWNER : KSID_GROUP); VERIFY(ksid != NULL); rid = ksid_getrid(ksid); domain = ksid_getdomain(ksid); - +#else /* sun */ + rid = UID_NOBODY; + domain = nulldomain; +#endif /* sun */ idx = zfs_fuid_find_by_domain(zfsvfs, domain, &kdomain, B_TRUE); zfs_fuid_node_add(fuidp, kdomain, rid, idx, id, type); return (FUID_ENCODE(idx, rid)); -#else - panic(__func__); -#endif } /* @@ -597,7 +597,7 @@ zfs_fuid_create(zfsvfs_t *zfsvfs, uint64 }; domain = fuidp->z_domain_table[idx -1]; } else { -#ifdef TODO +#ifdef sun if (type == ZFS_OWNER || type == ZFS_ACE_USER) status = kidmap_getsidbyuid(crgetzone(cr), id, &domain, &rid); @@ -606,6 +606,7 @@ zfs_fuid_create(zfsvfs_t *zfsvfs, uint64 &domain, &rid); if (status != 0) { +#endif /* sun */ /* * When returning nobody we will need to * make a dummy fuid table entry for logging @@ -613,10 +614,9 @@ zfs_fuid_create(zfsvfs_t *zfsvfs, uint64 */ rid = UID_NOBODY; domain = nulldomain; +#ifdef sun } -#else - panic(__func__); -#endif +#endif /* sun */ } idx = zfs_fuid_find_by_domain(zfsvfs, domain, &kdomain, B_TRUE); 
_______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 10:43:57 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org
(mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAD771065679 for ; Fri, 23 Jul 2010 10:43:57 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 42F9B8FC08 for ; Fri, 23 Jul 2010 10:43:56 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 200D6399B21; Fri, 23 Jul 2010 12:28:10 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000005, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 8.8555] X-CRM114-CacheID: sfid-20100723_12280_11C1914B X-CRM114-Status: Good ( pR: 8.8555 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 12:28:09 2010 X-DSPAM-Confidence: 0.7614 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c496eb9825253771820593 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00125, else, 0.00905, files+with, 0.01000, bits+are, 0.01000, iso, 0.01000, iso, 0.01000, reproduce, 0.01000, dd, 0.01000, dd, 0.01000, for+testing, 0.01000, ZFS, 0.01000, and+when, 0.01000, in+0, 0.01000, in+0, 0.01000, issue+with, 0.01000, zero, 0.01000, mount, 0.01000, mount, 0.01000, ONLINE, 0.99000, ONLINE, 0.99000, cache, 0.01000, blocks, 0.01000, (in, 0.01000, compatibility, 0.01000, /dev/null, 0.01000, /dev/null, 0.01000, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id A1F01399B14 for ; Fri, 23 Jul 2010 12:28:01 +0200 (CEST) Message-ID: <4C496EB0.7050004@fsn.hu> Date: Fri, 23 Jul 2010 12:28:00 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: ZFS makes SSDs faster than memory! 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 10:43:58 -0000

Hi,

I've come across a strange issue. On a file server (ftp/http/rsync) there is a dual SSD based L2ARC configured for a pool of 24 disks:

        NAME                 STATE     READ WRITE CKSUM
        data                 ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            label/disk20-01  ONLINE       0     0     0
            label/disk20-02  ONLINE       0     0     0
            label/disk20-03  ONLINE       0     0     0
            label/disk20-04  ONLINE       0     0     0
            label/disk20-05  ONLINE       0     0     0
            label/disk20-06  ONLINE       0     0     0
            label/disk20-07  ONLINE       0     0     0
            label/disk20-08  ONLINE       0     0     0
            label/disk20-09  ONLINE       0     0     0
            label/disk20-10  ONLINE       0     0     0
            label/disk20-11  ONLINE       0     0     0
            label/disk20-12  ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            label/disk21-01  ONLINE       0     0     0
            label/disk21-02  ONLINE       0     0     0
            label/disk21-03  ONLINE       0     0     0
            label/disk21-04  ONLINE       0     0     0
            label/disk21-05  ONLINE       0     0     0
            label/disk21-06  ONLINE       0     0     0
            label/disk21-07  ONLINE       0     0     0
            label/disk21-08  ONLINE       0     0     0
            label/disk21-09  ONLINE       0     0     0
            label/disk21-10  ONLINE       0     0     0
            label/disk21-11  ONLINE       0     0     0
            label/disk21-12  ONLINE       0     0     0
        cache
          ad4s2              ONLINE       0     0     0
          ad6s2              ONLINE       0     0     0

and there is about 6GB of ARC (in memory).

The strange thing is that when I fetch a file with HTTP, it comes from disks:
fetch -o /dev/null -4 http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso
/dev/null 100% of 493 MB 11 MBps 00m00s

a second fetch right after the above comes from in-memory ARC:
/dev/null 100% of 493 MB 28 MBps 00m00s

If I wait some minutes, the blocks are evicted from memory to SSD, so the next fetch comes from the L2ARC on the two SSDs:
/dev/null 100% of 493 MB 43 MBps 00m00s

and if I re-fetch it right after the above (blocks will be served from memory again), I get:
/dev/null 100% of 493 MB 27 MBps 00m00s

And this is very consistent.
I watch L2ARC activity with gstat/zpool iostat and when the bits are served from that, the throughput is consistently higher by at least 10 MBps, and the next download will always be slower, even though it comes from memory.

The effect can't be seen on localhost, with dd:
# dd if=osol-0906-106a-ai-sparc.iso of=/dev/null bs=1M
493+1 records in
493+1 records out
516968448 bytes transferred in 3.310415 secs (156164240 bytes/sec)
(comes from L2ARC)
# dd if=osol-0906-106a-ai-sparc.iso of=/dev/null bs=1M
493+1 records in
493+1 records out
516968448 bytes transferred in 0.861610 secs (600002886 bytes/sec)
(comes from memory)

The daemons run in jails and they see the data through a read only nullfs mount. However dd-ing from that nullfs mount also gives fast throughput:
# dd if=osol-0906-106a-ai-sparc.iso of=/dev/null bs=1M
493+1 records in
493+1 records out
516968448 bytes transferred in 0.718028 secs (719983859 bytes/sec)

I'm lost. Can anyone else reproduce this?

BTW, I see another strange issue with ZFS. On the site there are sparse files for testing clients' compatibility with big files. Fetching these files is slower than fetching the ones which contain real data and therefore move the disks! What could cause this? It seems very unnatural to fetch a zero byte file at 10 MBps, which doesn't involve even a bit of disk IO, while fetching real files with real disk IO is faster...
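
The dd figures above can be cross-checked with a few lines of Python. This is only a sketch and not part of the original message: the byte count and elapsed times are copied from the dd output, while the labels and the helper name rate_mb_per_s are illustrative, and MB is assumed here to mean 10^6 bytes.

```python
# Figures copied from the three dd runs quoted above.
BYTES = 516_968_448

# Elapsed seconds per run; the labels are illustrative, not dd output.
runs = {
    "L2ARC (SSD)": 3.310415,
    "ARC (memory)": 0.861610,
    "nullfs mount": 0.718028,
}


def rate_mb_per_s(nbytes, seconds):
    """Return throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    return nbytes / seconds / 1_000_000


for label, secs in runs.items():
    print(f"{label:13s} {rate_mb_per_s(BYTES, secs):6.1f} MB/s")
```

Under that assumption the local reads come out at roughly 156, 600 and 720 MB/s, all far above the 11 to 43 MBps seen over HTTP, so the local read path does not look like the limiting factor for the fetch results.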
Thanks, From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 10:57:09 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4CA941065673 for ; Fri, 23 Jul 2010 10:57:09 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.emeryville.ca.mail.comcast.net (qmta04.emeryville.ca.mail.comcast.net [76.96.30.40]) by mx1.freebsd.org (Postfix) with ESMTP id 375228FC08 for ; Fri, 23 Jul 2010 10:57:08 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta04.emeryville.ca.mail.comcast.net with comcast id lNrp1e0051u4NiLA4Nx8MH; Fri, 23 Jul 2010 10:57:08 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta21.emeryville.ca.mail.comcast.net with comcast id lNx71e0053LrwQ28hNx78S; Fri, 23 Jul 2010 10:57:08 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 3FA8D9B425; Fri, 23 Jul 2010 03:57:07 -0700 (PDT) Date: Fri, 23 Jul 2010 03:57:07 -0700 From: Jeremy Chadwick To: Attila Nagy Message-ID: <20100723105707.GA47221@icarus.home.lan> References: <4C496EB0.7050004@fsn.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4C496EB0.7050004@fsn.hu> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 10:57:09 -0000 On Fri, Jul 23, 2010 at 12:28:00PM +0200, Attila Nagy wrote: > I've came across a strange issue. > [...] Can you please provide uname -a output? Thanks. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 11:05:48 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADDAC106568B for ; Fri, 23 Jul 2010 11:05:48 +0000 (UTC) (envelope-from andrew@modulus.org) Received: from email.octopus.com.au (email.octopus.com.au [122.100.2.232]) by mx1.freebsd.org (Postfix) with ESMTP id 72C088FC0A for ; Fri, 23 Jul 2010 11:05:47 +0000 (UTC) Received: by email.octopus.com.au (Postfix, from userid 1002) id D433E5CB966; Fri, 23 Jul 2010 20:57:29 +1000 (EST) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on spamkiller X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=unavailable version=3.3.1 Received: from [10.20.30.103] (60.218.233.220.static.exetel.com.au [220.233.218.60]) (Authenticated sender: admin@email.octopus.com.au) by email.octopus.com.au (Postfix) with ESMTP id 875655CB8E8; Fri, 23 Jul 2010 20:57:28 +1000 (EST) Message-ID: <4C497782.8020704@modulus.org> Date: Fri, 23 Jul 2010 21:05:38 +1000 From: Andrew Snow User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100528 Thunderbird/3.0.5 MIME-Version: 1.0 To: Attila Nagy , freebsd-fs@freebsd.org References: <4C496EB0.7050004@fsn.hu> In-Reply-To: <4C496EB0.7050004@fsn.hu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 11:05:48 -0000

> 100% of 493 MB 11 MBps 00m00s

Hi Attila,

fetch is a poor cache benchmark. Your "tests" are taking a split second and fetch's output is gibberish - 493 MB @ 11 MBps should take more than 0 seconds. Thus, the results are not useful.
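
The arithmetic behind that objection can be written out as a short Python sketch. It assumes, without confirmation from fetch(1), that the MBps figure is an average over the whole transfer; the size and rates are the ones Attila reported, and the labels and the helper name expected_seconds are illustrative.

```python
# Size and rates as reported by fetch in the original message.
SIZE_MB = 493

# Labels are illustrative; rates are the reported MBps values.
rates_mbps = {
    "disks": 11,
    "ARC (first)": 28,
    "L2ARC": 43,
    "ARC (second)": 27,
}


def expected_seconds(size_mb, rate_mb_per_s):
    """Transfer time implied by a size and an average rate."""
    return size_mb / rate_mb_per_s


for source, rate in rates_mbps.items():
    secs = expected_seconds(SIZE_MB, rate)
    print(f"{source:13s} {rate:3d} MBps -> about {secs:5.1f} s")
```

On these numbers each transfer would take somewhere between about 11 and 45 seconds, so the 00m00s field cannot be the elapsed time.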
- Andrew From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 11:17:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0ECD106567A for ; Fri, 23 Jul 2010 11:17:42 +0000 (UTC) (envelope-from andrew@modulus.org) Received: from email.octopus.com.au (email.octopus.com.au [122.100.2.232]) by mx1.freebsd.org (Postfix) with ESMTP id 63CC68FC18 for ; Fri, 23 Jul 2010 11:17:41 +0000 (UTC) Received: by email.octopus.com.au (Postfix, from userid 1002) id 3A9FA5CB95F; Fri, 23 Jul 2010 21:09:25 +1000 (EST) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on spamkiller X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=unavailable version=3.3.1 Received: from [10.20.30.103] (60.218.233.220.static.exetel.com.au [220.233.218.60]) (Authenticated sender: admin@email.octopus.com.au) by email.octopus.com.au (Postfix) with ESMTP id AFB9E5CB8CD; Fri, 23 Jul 2010 21:09:24 +1000 (EST) Message-ID: <4C497A4E.8050307@modulus.org> Date: Fri, 23 Jul 2010 21:17:34 +1000 From: Andrew Snow User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100528 Thunderbird/3.0.5 MIME-Version: 1.0 To: blttll , Attila Nagy , freebsd-fs@freebsd.org References: <4C496EB0.7050004@fsn.hu> <4C497782.8020704@modulus.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 11:17:42 -0000 > it is countdown timer, so it is 00m00s when download finished. Whoops, thanks. Which webserver software is in use? Is it using sendfile? 
- Andrew From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 11:27:00 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7AD2C106566C for ; Fri, 23 Jul 2010 11:27:00 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 2BE888FC1F for ; Fri, 23 Jul 2010 11:26:59 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 60E5939A3D8; Fri, 23 Jul 2010 13:26:58 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000003, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 9.3336] X-CRM114-CacheID: sfid-20100723_13265_FCAEC340 X-CRM114-Status: Good ( pR: 9.3336 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 13:26:58 2010 X-DSPAM-Confidence: 0.9900 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c497c81432705673680053 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00123, FreeBSD, 0.00262, FreeBSD, 0.00262, wrote, 0.00274, wrote, 0.00274, wrote+>, 0.00304, wrote+>, 0.00304, >+On, 0.00943, CEST, 0.01000, Jeremy, 0.01000, #13, 0.01000, patches, 0.01000, at+12, 0.01000, 2010+root, 0.01000, 23+2010, 0.01000, Nagy, 0.01000, Thanks+>, 0.01000, FreeBSD+8, 0.01000, I've, 0.01076, 2010+at, 0.01326, >+>>, 0.01408, root, 0.01605, Url*freebsd, 0.01684, >>+>, 0.02034, Received*ESMTPSA, 0.02776, Received*ESMTPSA+id, 0.02776, X-Spambayes-Classification: ham; 0.02 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 505B839A3CC; Fri, 23 Jul 2010 13:26:52 +0200 (CEST) Message-ID: <4C497C7C.4080601@fsn.hu> Date: Fri, 23 Jul 2010 13:26:52 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: Jeremy Chadwick References: <4C496EB0.7050004@fsn.hu> 
<20100723105707.GA47221@icarus.home.lan> In-Reply-To: <20100723105707.GA47221@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 11:27:00 -0000 On 07/23/10 12:57, Jeremy Chadwick wrote: > On Fri, Jul 23, 2010 at 12:28:00PM +0200, Attila Nagy wrote: > >> I've came across a strange issue. >> [...] >> > Can you please provide uname -a output? Thanks. > Sure: FreeBSD ftpfe1.fsn.hu 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13: Fri Jul 23 12:16:12 CEST 2010 root@ftpfe1.fsn.hu:/usr/obj/usr/src/sys/FTP amd64 BTW, it contains the following patches from mm@: http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch http://mfsbsd.vx.sk/zfs/head-9701.patch http://mfsbsd.vx.sk/zfs/head-9909.patch http://mfsbsd.vx.sk/zfs/head-9981-10143-10232-10250-10269.patch You are right that I should try without them. 
:) From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 12:15:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66DAD106567D for ; Fri, 23 Jul 2010 12:15:30 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 228898FC19 for ; Fri, 23 Jul 2010 12:15:29 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OcHA1-0002cB-4R for freebsd-fs@freebsd.org; Fri, 23 Jul 2010 14:15:29 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 23 Jul 2010 14:15:29 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 23 Jul 2010 14:15:29 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 23 Jul 2010 14:15:44 +0200 Lines: 16 Message-ID: References: <4C496EB0.7050004@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.9) Gecko/20100518 Thunderbird/3.0.4 In-Reply-To: <4C496EB0.7050004@fsn.hu> X-Enigmail-Version: 1.0.1 Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 12:15:30 -0000 On 07/23/10 12:28, Attila Nagy wrote: > Hi, > > I've came across a strange issue. 
On a file server (ftp/http/rsync) > there is a dual SSD based L2ARC configured for a pool of 24 disks: > fetch -o /dev/null -4 > http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso > /dev/null 100% of 493 MB 11 MBps If I understand your setup and your benchmark correctly, you are saying you have achieved 11 megabytes / s performance out of a volume of 24 RAIDZ2 drives split into two parts (so it's like RAID 60). Doesn't this number seem extremely low to you, considering that (if recent models) each of your drives can probably pull at least 70 MB/s? From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 12:50:57 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 97286106567B; Fri, 23 Jul 2010 12:50:57 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 1D6738FC14; Fri, 23 Jul 2010 12:50:56 +0000 (UTC) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id o6NCosFH007198 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 23 Jul 2010 14:50:54 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.3/8.14.3) with ESMTP id o6NCopdi048740 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 23 Jul 2010 14:50:51 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id o6NCoplG058158; Fri, 23 Jul 2010 14:50:51 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.14.2/8.14.2/Submit) id o6NCop8p058157; Fri, 23 Jul 2010 14:50:51 +0200 (CEST) (envelope-from ticso) Date: Fri, 23 Jul 2010 14:50:51 +0200 From: Bernd Walter To: 
Ivan Voras Message-ID: <20100723125051.GM53114@cicely7.cicely.de> References: <4C496EB0.7050004@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, T_RP_MATCHES_RCVD=-0.01 autolearn=ham version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 12:50:57 -0000

On Fri, Jul 23, 2010 at 02:15:44PM +0200, Ivan Voras wrote:
> On 07/23/10 12:28, Attila Nagy wrote:
> > Hi,
> >
> > I've came across a strange issue. On a file server (ftp/http/rsync)
> > there is a dual SSD based L2ARC configured for a pool of 24 disks:
>
> > fetch -o /dev/null -4
> > http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso
> > /dev/null 100% of 493 MB 11 MBps
>
> If I understand your setup and your benchmark correctly, you are saying
> you have achieved 11 megabytes / s performance out of a volume of 24
> RAIDZ2 drives split into two parts (so it's like RAID 60). Doesn't this
> number seem extremely low to you, considering that (if recent models)
> each of your drives can probably pull at least 70 MB/s?

It is also quite strange that a linearly read file gets stored in L2ARC, which usually holds randomly accessed data. Maybe the file is very fragmented on disk. L2ARC on MLC drives is usually much slower than modern disks when it comes to linear reads.

Are there any facts to back up your assumption that the data is really read from memory, SSD, and disk in the respective cases? For example, ARC/L2ARC and I/O statistics.
-- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:02:03 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 627911065670 for ; Fri, 23 Jul 2010 13:02:03 +0000 (UTC) (envelope-from rafaelhfaria@cenadigital.com.br) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 02E278FC17 for ; Fri, 23 Jul 2010 13:02:02 +0000 (UTC) Received: by wyj26 with SMTP id 26so194538wyj.13 for ; Fri, 23 Jul 2010 06:02:02 -0700 (PDT) Received: by 10.227.146.142 with SMTP id h14mr3469767wbv.25.1279888561410; Fri, 23 Jul 2010 05:36:01 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.30.72 with HTTP; Fri, 23 Jul 2010 05:35:31 -0700 (PDT) In-Reply-To: References: <4C496EB0.7050004@fsn.hu> From: Rafael Henrique Faria Date: Fri, 23 Jul 2010 09:35:31 -0300 Message-ID: To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:02:03 -0000 On Fri, Jul 23, 2010 at 09:15, Ivan Voras wrote: > On 07/23/10 12:28, Attila Nagy wrote: > > Hi, > > > > I've came across a strange issue. 
On a file server (ftp/http/rsync)
> > there is a dual SSD based L2ARC configured for a pool of 24 disks:
>
> > fetch -o /dev/null -4
> > http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso
> > /dev/null 100% of 493 MB 11 MBps
>
> If I understand your setup and your benchmark correctly, you are saying
> you have achieved 11 megabytes / s performance out of a volume of 24
> RAIDZ2 drives split into two parts (so it's like RAID 60). Doesn't this
> number seem extremely low to you, considering that (if recent models)
> each of your drives can probably pull at least 70 MB/s?
>

Hi,

I'm not entirely sure, but some time ago, while reading Sun's ZFS documentation, I noticed that with a lot of disks it is better to use several raidz groups as well. With your 24 disks, I think you could get better data safety (I don't know about performance) with 4 raidz2 groups of 6 disks each.

And, again, I'm not sure here, but I think that a pool with more than one raidz group behaves like a JBOD, not like a stripe (RAID 60).
-- Rafael Henrique da Silva Faria From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:16:57 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6026A106566C; Fri, 23 Jul 2010 13:16:57 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id A90E98FC16; Fri, 23 Jul 2010 13:16:56 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id B305839B102; Fri, 23 Jul 2010 15:16:54 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 8.4695] X-CRM114-CacheID: sfid-20100723_15155_1BBBEC24 X-CRM114-Status: Good ( pR: 8.4695 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 15:16:53 2010 X-DSPAM-Confidence: 0.9910 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c499644562341620720745 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00119, wrote, 0.00274, wrote, 0.00274, wrote+>, 0.00304, wrote+>, 0.00304, >+On, 0.00943, this+>, 0.01000, CPUs, 0.01000, I+understand, 0.01000, hardware, 0.01000, benchmark, 0.01000, saying, 0.01000, understand+your, 0.01000, Nagy, 0.01000, checksum, 0.01000, on+it, 0.01000, the+machine, 0.01000, it's+not, 0.01000, /dev/null, 0.01000, /dev/null, 0.01000, 246, 0.01000, I've, 0.01076, >>+>>, 0.01253, >>+>>, 0.01253, >+>>, 0.01408, >+>>, 0.01408, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 7CF2A39B0EB; Fri, 23 Jul 2010 15:15:35 +0200 (CEST) Message-ID: <4C4995F7.2080107@fsn.hu> Date: Fri, 23 Jul 2010 15:15:35 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: Ivan Voras References: <4C496EB0.7050004@fsn.hu> In-Reply-To: 
Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:16:57 -0000

On 07/23/10 14:15, Ivan Voras wrote:
> On 07/23/10 12:28, Attila Nagy wrote:
>
>> Hi,
>>
>> I've came across a strange issue. On a file server (ftp/http/rsync)
>> there is a dual SSD based L2ARC configured for a pool of 24 disks:
>>
>
>> fetch -o /dev/null -4
>> http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso
>> /dev/null 100% of 493 MB 11 MBps
>>
> If I understand your setup and your benchmark correctly, you are saying
> you have achieved 11 megabytes / s performance out of a volume of 24
> RAIDZ2 drives split into two parts (so it's like RAID 60). Doesn't this
> number seem extremely low to you, considering that (if recent models)
> each of your drives can probably pull at least 70 MB/s?
>
First of all, it's not an isolated system; there are active users on it. But you are right, 11 MiBps is far from the maximum this hardware can deliver, even considering that the CPUs are somewhat old (2x AMD Opteron 246, 2 GHz).

When pulling this amount of data out of the machine, the disks aren't saturated; they are at around 10-20% utilization according to gstat.

BTW, remember that two RAIDZ2 groups in a stripe aren't RAID60. In RAIDZ2 every read involves a full stripe (er, block) read for checksum validation, which means that at a 128 kiB blocksize and with 12 disks in a RAIDZ2 group, all disks provide their part of that 128k read. That's why a RAIDZ2 group's IO performance equals that of one disk.

Under a normal 20-30 MiBps network load the disks do about 30-40 read IOPS; you are right that they are capable of more (around 100-120).
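
The arithmetic behind the RAIDZ2 argument above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not a measurement: it takes the thread's statement that every block read touches the whole group at face value, and it assumes the 24 disks form two 12-disk RAIDZ2 groups with 10 data and 2 parity disks each. The function name and parameters are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def raidz_random_read_estimate(groups, disks_per_group, parity_disks,
                               disk_iops, block_bytes):
    """Rough model: one block read occupies every data disk of a group,
    so each group completes about `disk_iops` block reads per second."""
    data_disks = disks_per_group - parity_disks
    bytes_per_disk = block_bytes / data_disks      # share of one block per disk
    block_reads_per_s = groups * disk_iops         # whole-pool block reads
    throughput = block_reads_per_s * block_bytes   # bytes per second
    return bytes_per_disk, block_reads_per_s, throughput


# Assumed layout: 2 groups x 12 disks, 2 parity disks each, 128 KiB blocks.
for iops in (35, 100):
    chunk, reads, tput = raidz_random_read_estimate(2, 12, 2, iops, 128 * KIB)
    print(f"{iops} IOPS/disk: {chunk / KIB:.1f} KiB per disk per block, "
          f"{reads} block reads/s, {tput / MIB:.2f} MiB/s")
```

Under these assumptions, fully random 128 KiB reads at about 100 IOPS per disk come out at roughly 25 MiB/s for the whole pool, which is why the per-disk IOPS figure matters more here than the per-disk sequential bandwidth.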
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:21:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76A091065672 for ; Fri, 23 Jul 2010 13:21:30 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 3140A8FC12 for ; Fri, 23 Jul 2010 13:21:29 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OcIBs-00009u-AO for freebsd-fs@freebsd.org; Fri, 23 Jul 2010 15:21:28 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 23 Jul 2010 15:21:28 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 23 Jul 2010 15:21:28 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 23 Jul 2010 15:21:42 +0200 Lines: 20 Message-ID: References: <4C496EB0.7050004@fsn.hu> <4C4995F7.2080107@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.9) Gecko/20100518 Thunderbird/3.0.4 In-Reply-To: <4C4995F7.2080107@fsn.hu> X-Enigmail-Version: 1.0.1 Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:21:30 -0000 On 07/23/10 15:15, Attila Nagy wrote: > When pulling this amount of data out of the machine, the disks aren't > saturated, they are at around 10-20% of utilization according to gstat. > BTW, remember that two RAIDZ2 in stripe isn't RAID60. 
In RAIDZ2 every > read involves a full stripe (er, block) read for checksum validation, > which means at a 128 kiB blocksize and with 12 disks in a RAIDZ2 pool, > all disks provide their part of that 128k read. > That's why a RAIDZ2 pool's IO performance equals of one disk's. Yes, in case of random IOPS you are correct - and in your case it would mean that the files are horribly fragmented (torrent downloads? :)). For sequential IO, even RAIDZ/1/2 will give N-1/2/3 times the performance of a single drive because prefetching will kick in. > The disks in a normal 20-30 MiBps network load do about 30-40 read IOPS, > you are right that they are capable of more (around 100-120). Except for the possible fragmentation issue, I think you should get much better throughput even with 30-40 IOPS per drive. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:23:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2CF041065689; Fri, 23 Jul 2010 13:23:47 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 677B58FC1D; Fri, 23 Jul 2010 13:23:45 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 629AA39B1A8; Fri, 23 Jul 2010 15:23:44 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 13.9824] X-CRM114-CacheID: sfid-20100723_15230_DDD0F203 X-CRM114-Status: Good ( pR: 13.9824 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 15:23:44 2010 X-DSPAM-Confidence: 0.9916 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c4997df898121049039089 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00116, wrote, 0.00274, wrote, 0.00274, wrote+>, 0.00304, wrote+>, 0.00304, 8+0, 0.00631, 0+8, 0.00871, >+On, 0.00943, 9+0, 0.01000, >+It, 0.01000, data+is, 0.01000, 
>+There, 0.01000, stuff, 0.01000, 0+4, 0.01000, I+understand, 0.01000, reading, 0.01000, wrote+>>, 0.01000, benchmark, 0.01000, stream, 0.01000, saying, 0.01000, understand+your, 0.01000, but+I'm, 0.01000, 23+2010, 0.01000, >+it, 0.01000, I'm+not, 0.01000, 4+1, 0.01000, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 7FAF139B16E; Fri, 23 Jul 2010 15:20:52 +0200 (CEST) Message-ID: <4C499733.5000104@fsn.hu> Date: Fri, 23 Jul 2010 15:20:51 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: ticso@cicely.de References: <4C496EB0.7050004@fsn.hu> <20100723125051.GM53114@cicely7.cicely.de> In-Reply-To: <20100723125051.GM53114@cicely7.cicely.de> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Bernd Walter , Ivan Voras Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:23:47 -0000 On 07/23/10 14:50, Bernd Walter wrote: > On Fri, Jul 23, 2010 at 02:15:44PM +0200, Ivan Voras wrote: > >> On 07/23/10 12:28, Attila Nagy wrote: >> >>> Hi, >>> >>> I've came across a strange issue. On a file server (ftp/http/rsync) >>> there is a dual SSD based L2ARC configured for a pool of 24 disks: >>> >> >>> fetch -o /dev/null -4 >>> http://ftp.fsn.hu/pub/CDROM-Images/opensolaris/osol-0906-106a-ai-sparc.iso >>> /dev/null 100% of 493 MB 11 MBps >>> >> If I understand your setup and your benchmark correctly, you are saying >> you have achieved 11 megabytes / s performance out of a volume of 24 >> RAIDZ2 drives split into two parts (so it's like RAID 60). 
Doesn't this >> number seem extremely low to you, considering that (if recent models) >> each of your drives can probably pull at least 70 MB/s? >> > It is also quite strange that a linear read file gets stored in L2ARC, > which usually holds random accessed data. > Maybe it is very fragmented on disks. > L2ARC with MLC drives usually is much slower than modern disks when > it comes to linear reads. > There is no linear reads here from the PoV of the disks. Exactly one stream of linear read is linear read, but two streams are not. :) Maybe I should have written this first, but I'm not the only one reading from the machine. For random reads even the cheapest MLC outperforms a 7k2 SATA disk (only reads), and this is an Intel stuff, which can do 3000 RIOPS easily. > Are there any facts backup your assumption that data is really > read from memory, SSD, disk in the named cases? > E.g. by ARC/L2ARC and IO statistics. > Yes. When downloading from L2ARC: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 174 174 21505 0.8 0 0 0.0 13.3| ad4 0 169 169 21479 0.9 0 0 0.0 15.0| ad6 when downloading from ARC: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 26 19 1129 0.6 7 78 0.4 1.3| ad4 0 19 12 1436 1.1 7 78 0.3 1.4| ad6 From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:29:41 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 565ED1065672 for ; Fri, 23 Jul 2010 13:29:41 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 100618FC08 for ; Fri, 23 Jul 2010 13:29:40 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OcIJk-0004AN-H4 for freebsd-fs@freebsd.org; Fri, 23 Jul 2010 15:29:36 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 
23 Jul 2010 15:29:36 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 23 Jul 2010 15:29:36 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Fri, 23 Jul 2010 15:29:51 +0200 Lines: 25 Message-ID: References: <4C496EB0.7050004@fsn.hu> <20100723125051.GM53114@cicely7.cicely.de> <4C499733.5000104@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.9) Gecko/20100518 Thunderbird/3.0.4 In-Reply-To: <4C499733.5000104@fsn.hu> X-Enigmail-Version: 1.0.1 Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:29:41 -0000 On 07/23/10 15:20, Attila Nagy wrote: > Maybe I should have written this first, but I'm not the only one reading > from the machine. You probably realize this makes all your performance data of suspicious validity :) > For random reads even the cheapest MLC outperforms a 7k2 SATA disk (only > reads), and this is an Intel stuff, which can do 3000 RIOPS easily. >> Are there any facts backup your assumption that data is really >> read from memory, SSD, disk in the named cases? >> E.g. by ARC/L2ARC and IO statistics. >> > Yes. When downloading from L2ARC: > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 0 174 174 21505 0.8 0 0 0.0 13.3| ad4 > 0 169 169 21479 0.9 0 0 0.0 15.0| ad6 > when downloading from ARC: > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 0 26 19 1129 0.6 7 78 0.4 1.3| ad4 > 0 19 12 1436 1.1 7 78 0.3 1.4| ad6 So it looks like you encountered a problem where the memory-based ARC cache read performance is incredibly bad? 
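
The two gstat samples quoted above can be compared by summing the read-throughput column of each. A minimal Python sketch follows; it assumes the column order shown in the pasted header (L(q), ops/s, r/s, kBps, ms/r, w/s, kBps, ms/w, %busy, Name), and the helper name is hypothetical.

```python
def total_read_kbps(gstat_rows):
    """Sum the read kBps column (4th field) over gstat data rows."""
    total = 0
    for row in gstat_rows:
        # The %busy value is glued to a '|' separator in gstat output.
        fields = row.replace("|", " ").split()
        # fields: L(q), ops/s, r/s, read kBps, ms/r, w/s, write kBps,
        #         ms/w, %busy, device name
        total += int(fields[3])
    return total


# Rows copied from the message above.
l2arc_sample = [
    "0 174 174 21505 0.8 0 0 0.0 13.3| ad4",
    "0 169 169 21479 0.9 0 0 0.0 15.0| ad6",
]
arc_sample = [
    "0 26 19 1129 0.6 7 78 0.4 1.3| ad4",
    "0 19 12 1436 1.1 7 78 0.3 1.4| ad6",
]

print("served from L2ARC:", total_read_kbps(l2arc_sample), "kBps read")
print("served from ARC:  ", total_read_kbps(arc_sample), "kBps read")
```

The large gap between the two sums only shows that ad4 and ad6 are being read heavily in the first case and hardly at all in the second; on its own it does not explain why the ARC-served download is slower.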
From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:34:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D36A0106566C; Fri, 23 Jul 2010 13:34:42 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 1D9A98FC1A; Fri, 23 Jul 2010 13:34:41 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 6E1F839B2DE; Fri, 23 Jul 2010 15:34:40 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.001883, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 13.7865] X-CRM114-CacheID: sfid-20100723_15343_31086577 X-CRM114-Status: Good ( pR: 13.7865 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 15:34:40 2010 X-DSPAM-Confidence: 0.9920 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c499a70132352015961825 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00115, >+>, 0.00271, wrote, 0.00274, wrote, 0.00274, wrote+>, 0.00304, wrote+>, 0.00304, 8+0, 0.00631, >+I, 0.00709, 0+8, 0.00871, >+On, 0.00943, encountered, 0.01000, 9+0, 0.01000, data+is, 0.01000, it+), 0.01000, a+computer, 0.01000, stuff, 0.01000, 0+4, 0.01000, reproduce, 0.01000, reproduce, 0.01000, reading, 0.01000, reading, 0.01000, >+You, 0.01000, wouldn't, 0.01000, worse, 0.01000, cache, 0.01000, cache, 0.01000, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id D407E39B2D2; Fri, 23 Jul 2010 15:34:32 +0200 (CEST) Message-ID: <4C499A67.9080707@fsn.hu> Date: Fri, 23 Jul 2010 15:34:31 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: Ivan Voras References: <4C496EB0.7050004@fsn.hu> <20100723125051.GM53114@cicely7.cicely.de> <4C499733.5000104@fsn.hu> 
In-Reply-To: Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:34:42 -0000 On 07/23/10 15:29, Ivan Voras wrote: > On 07/23/10 15:20, Attila Nagy wrote: > > >> Maybe I should have written this first, but I'm not the only one reading >> from the machine. >> > You probably realize this makes all your performance data of suspicious > validity :) > Yes, this is the same I would write to this e-mail, but I can reproduce it. :) Fetching the same file not in the cache three times make the first the slowest, the second (after waiting a little to fall out of RAM) the fastest and the third the second fastest. I can consistently reproduce this behaviour, but only via network (ftpd/httpd) not from localhost. >> For random reads even the cheapest MLC outperforms a 7k2 SATA disk (only >> reads), and this is an Intel stuff, which can do 3000 RIOPS easily. >> >>> Are there any facts backup your assumption that data is really >>> read from memory, SSD, disk in the named cases? >>> E.g. by ARC/L2ARC and IO statistics. >>> >>> >> Yes. When downloading from L2ARC: >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name >> 0 174 174 21505 0.8 0 0 0.0 13.3| ad4 >> 0 169 169 21479 0.9 0 0 0.0 15.0| ad6 >> when downloading from ARC: >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name >> 0 26 19 1129 0.6 7 78 0.4 1.3| ad4 >> 0 19 12 1436 1.1 7 78 0.3 1.4| ad6 >> > So it looks like you encountered a problem where the memory-based ARC > cache read performance is incredibly bad? > I wouldn't call it incredibly bad, but it's worse than reading from L2ARC (2xSSD), which is pretty strange and not sane, at least to what I know about how things work in a computer. 
:) From owner-freebsd-fs@FreeBSD.ORG Fri Jul 23 13:43:35 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 440AB106566B; Fri, 23 Jul 2010 13:43:35 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 8C2AE8FC08; Fri, 23 Jul 2010 13:43:33 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id E6CC539B3A4; Fri, 23 Jul 2010 15:43:32 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 10.9541] X-CRM114-CacheID: sfid-20100723_15432_4291F35F X-CRM114-Status: Good ( pR: 10.9541 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Jul 23 15:43:32 2010 X-DSPAM-Confidence: 0.9915 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4c499c84976601156211266 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00114, >+>, 0.00271, wrote, 0.00274, wrote, 0.00274, wrote+>, 0.00304, wrote+>, 0.00304, >+On, 0.00943, it+doesn't, 0.01000, previously, 0.01000, mean+that, 0.01000, >+a, 0.01000, in+>, 0.01000, of+course, 0.01000, 15+15, 0.01000, case+of, 0.01000, worse, 0.01000, of+>, 0.01000, >+This, 0.01000, Nagy, 0.01000, )+>, 0.01000, checksum, 0.01000, the+machine, 0.01000, much+>, 0.01000, I've, 0.01076, >+>>, 0.01408, >+>>, 0.01408, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id A5FB239B396; Fri, 23 Jul 2010 15:43:21 +0200 (CEST) Message-ID: <4C499C79.8040001@fsn.hu> Date: Fri, 23 Jul 2010 15:43:21 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.10) Gecko/20100629 Thunderbird/3.0.5 MIME-Version: 1.0 To: Ivan Voras References: <4C496EB0.7050004@fsn.hu> <4C4995F7.2080107@fsn.hu> In-Reply-To: Content-Type: text/plain; 
charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS makes SSDs faster than memory! X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Jul 2010 13:43:35 -0000 On 07/23/10 15:21, Ivan Voras wrote: > On 07/23/10 15:15, Attila Nagy wrote: > > >> When pulling this amount of data out of the machine, the disks aren't >> saturated, they are at around 10-20% of utilization according to gstat. >> BTW, remember that two RAIDZ2 in stripe isn't RAID60. In RAIDZ2 every >> read involves a full stripe (er, block) read for checksum validation, >> which means at a 128 kiB blocksize and with 12 disks in a RAIDZ2 pool, >> all disks provide their part of that 128k read. >> That's why a RAIDZ2 pool's IO performance equals of one disk's. >> > Yes, in case of random IOPS you are correct - and in your case it would > mean that the files are horribly fragmented (torrent downloads? :)). For > sequential IO, even RAIDZ/1/2 will give N-1/2/3 times the performance of > a single drive because prefetching will kick in. > Especially when prefetching is disabled. :) I've had problems with it previously and it was left in this state. Turning it off makes that 11 MiBps 30 MiBps, and the disk utilization increase, but of course it doesn't affect the fact that SSDs are faster than RAM with FreeBSD/zfs. In fact, it made things worse. SSDs AND HDDs are now faster than RAM. :-O Damn. ;) > >> The disks in a normal 20-30 MiBps network load do about 30-40 read IOPS, >> you are right that they are capable of more (around 100-120). >> > Except for the possible fragmentation issue, I think you should get much > better throughput even with 30-40 IOPS per drive. > This is an FTP/HTTP server, like ftp.freebsd.org (well, exactly like that). 
If the network load is only 20-30 MiBps, that may simply be because nobody wants to fetch more data. :)

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 24 19:59:04 2010
Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27B451065673; Sat, 24 Jul 2010 19:59:04 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 267AA8FC0A; Sat, 24 Jul 2010 19:59:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o6OJx3aH007544; Sat, 24 Jul 2010 19:59:03 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o6OJx32H007540; Sat, 24 Jul 2010 19:59:03 GMT (envelope-from mm) Date: Sat, 24 Jul 2010 19:59:03 GMT Message-Id: <201007241959.o6OJx32H007540@freefall.freebsd.org> To: mm@FreeBSD.org, mm@FreeBSD.org, freebsd-fs@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/145424: [zfs] [patch] move source closer to v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 24 Jul 2010 19:59:04 -0000

Synopsis: [zfs] [patch] move source closer to v15

State-Changed-From-To: suspended->closed
State-Changed-By: mm
State-Changed-When: Sat Jul 24 19:59:03 UTC 2010
State-Changed-Why: ZFS version 15 committed to head in revision 209962

http://www.freebsd.org/cgi/query-pr.cgi?pr=145424

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 24 23:08:31 2010
Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13ED2106564A for ; Sat, 24 Jul 2010 23:08:31 +0000 (UTC) (envelope-from
    rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44])
    by mx1.freebsd.org (Postfix) with ESMTP id C4EBB8FC0A
    for ; Sat, 24 Jul 2010 23:08:30 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AnUHALcOS0yDaFvH/2dsb2JhbACTFQEBjElxvm+FNgQ
X-IronPort-AV: E=Sophos;i="4.55,255,1278302400"; d="scan'208";a="88008076"
Received: from danube.cs.uoguelph.ca ([131.104.91.199])
    by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 24 Jul 2010 19:08:27 -0400
Received: from localhost (localhost.localdomain [127.0.0.1])
    by danube.cs.uoguelph.ca (Postfix) with ESMTP id E67BA1078164
    for ; Sat, 24 Jul 2010 19:08:29 -0400 (EDT)
X-Virus-Scanned: amavisd-new at danube.cs.uoguelph.ca
Received: from danube.cs.uoguelph.ca ([127.0.0.1])
    by localhost (danube.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id sPLkhRschW3L
    for ; Sat, 24 Jul 2010 19:08:29 -0400 (EDT)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
    by danube.cs.uoguelph.ca (Postfix) with ESMTP id 0C099107815C
    for ; Sat, 24 Jul 2010 19:08:29 -0400 (EDT)
Received: from localhost (rmacklem@localhost)
    by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o6ONQQB26925
    for ; Sat, 24 Jul 2010 19:26:26 -0400 (EDT)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs
Date: Sat, 24 Jul 2010 19:26:26 -0400 (EDT)
From: Rick Macklem
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: freebsd-fs@freebsd.org
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: Packrats: NFSv4 client side disk caching
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 24 Jul 2010 23:08:31 -0000

I have just put some patches up on
http://people.freebsd.org/~rmacklem/packrat-patches
that implement Packrat kernel threads that
cache a copy of small files on a client's local disk when an NFSv4
delegation is issued to the client. These only work for NFSv4 mounts
where delegations are enabled, and the kernel patch expects a
post-FreeBSD 8.1 FreeBSD-current kernel source tree. (I will try to
track the FreeBSD-current kernel with the patch. Please let me know
if the patch won't apply.)

There is no recovery from client or server crashes implemented yet and,
as such, the patches are definitely meant for experimentation only.
They do significantly reduce the number of Read/Write RPCs in the
initial experiments I've done with them and will hopefully improve
performance, particularly when working across WAN networks (larger
latency).

If you choose to play with it, please let me know how it goes, rick