From: Jeremy Chadwick
To: Mikolaj Golub
Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky
Date: Wed, 12 Jun 2013 03:41:35 -0700
Subject: Re: hast: can't restore after disk failure
Message-ID: <20130612104135.GA11495@icarus.home.lan>
In-Reply-To: <20130612100332.GB55502@gmail.com>
On Wed, Jun 12, 2013 at 01:03:33PM +0300, Mikolaj Golub wrote:
> On Wed, Jun 12, 2013 at 02:36:39AM -0700, Jeremy Chadwick wrote:
>
> > I honestly cannot see how nv->nv_error (which is what nv_error()
> > returns) gets set to ENOENT within the function call stack:
> >
> > - metadata_read() is what prints the error (line 152 in nv.c)
> > - Error printing done by pjdlog_errno(), which uses the global errno
> >   to print its errors
> > - nv = nv_ntoh(eb)
> > - nv_ntoh() sets nv->nv_error to 0 initially, but then calls
> >   nv_validate() later on which can modify nv->error
> > - nv_validate() explicitly sets error (which later can get assigned
> >   to nv->nv_error) to EINVAL in many cases, but not ENOENT.
> >
> > Therefore, I am honestly not sure how ENOENT gets returned to the user
> > in this case. It looks like it's a misleading errno and is probably
> > meant to be something else. If it's correct, I would absolutely love
> > for someone to show me how/where.
>
> nv_find() (which is used by nv_get_* functions) sets ENOENT when it
> fails.

How wonderful. When I reviewed the code, I thought "Oh, surely those
can't be responsible...". I did see nv_find(), but I did not think
nv_get_*() would call it. My fault/failure.

> "No such file or directory" really looks confusing in this case. I am
> not sure what a code from errno.h would be better here though. ENOATTR?

Sorry to make this longer than it needs to be, but I'm brain dumping:

What exactly is the error condition that is happening in the above
case? All I read was that the partition size differed between nodes
and that this caused the issue?
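To make the ENOENT path concrete, here is a heavily simplified C model. It is a sketch, not hastd's actual nv.c. The structs, model_find(), model_get_uint64(), and any key names used with them are hypothetical. The model assumes, per Mikolaj's note, that a failed name lookup records ENOENT in the container's error field. It further assumes that this field is what ends up in the logged message, which strerror() renders as "No such file or directory".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT hastd's struct nv or its real API. */
struct kv {
	const char		*name;
	unsigned long long	 value;
};

struct nv_model {
	const struct kv	*items;
	size_t		 nitems;
	int		 nv_error;	/* first error recorded, 0 if none */
};

/*
 * Linear lookup by name.  On a miss it records ENOENT (only if no
 * earlier error is pending), modeling the behaviour described for
 * nv_find() in this thread.  That modeling is an assumption.
 */
static const struct kv *
model_find(struct nv_model *nv, const char *name)
{
	size_t i;

	assert(nv != NULL && name != NULL);
	for (i = 0; i < nv->nitems; i++) {
		if (strcmp(nv->items[i].name, name) == 0)
			return (&nv->items[i]);
	}
	if (nv->nv_error == 0)
		nv->nv_error = ENOENT;
	return (NULL);
}

/* Getter built on the lookup: a missing key quietly yields 0. */
static unsigned long long
model_get_uint64(struct nv_model *nv, const char *name)
{
	const struct kv *item;

	item = model_find(nv, name);
	return (item == NULL ? 0 : item->value);
}
```

Suppose a caller asks model_get_uint64() for a key the peer never sent. The model then leaves nv_error == ENOENT, and strerror(ENOENT) is "No such file or directory", even though no file is involved. The errno describes the failed lookup, not whatever caused the key to be missing.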
IMO, that condition should be checked and handled gracefully, and the
error message should not use an errno at all. Instead, it should tell
the user about the device size mismatch between the nodes (for that
specific device). The device sizes must match on both nodes, correct?
There must be some kind of communication protocol between the nodes
that can indicate something along those lines.

If an errno is really needed, ENOATTR isn't relevant (it refers to
extended filesystem attributes). See intro(2) for the official
explanation of all of them. I would choose EIO, ENXIO, ENOSPC,
EOPNOTSUPP, or EPROTO.

I have not looked at what OpenBSD and NetBSD have in errno.h; that
might be good to do first. Otherwise, Linux has some errno.h entries
that look like they might be more relevant, such as EBADFD, EREMOTEIO,
or EMEDIUMTYPE (this one might be a bit misleading):

http://www.virtsync.com/c-error-codes-include-errno

Some of these are even part of our recent BSM audit(2) stuff; check out
include/bsm/audit_errno.h (some are Solaris-specific but look like they
might help, and I see some duplicates between those and what Linux has
too).

Important: I do not know the implications of adding or changing errno
values. POSIX is involved, so it would be wise to ask Bruce Evans.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |