Date:      Wed, 12 Jun 2013 14:55:41 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        freebsd-fs@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
Subject:   Re: hast: can't restore after disk failure
Message-ID:  <20130612115540.GC55502@gmail.com>
In-Reply-To: <20130612104135.GA11495@icarus.home.lan>
References:  <alpine.BSF.2.00.1306101700300.69113@woozle.rinet.ru> <20130610201650.GA2823@gmail.com> <alpine.BSF.2.00.1306110038010.96502@woozle.rinet.ru> <20130611060741.GA42231@gmail.com> <alpine.BSF.2.00.1306120022580.96502@woozle.rinet.ru> <20130612084453.GA55502@gmail.com> <20130612093639.GA9219@icarus.home.lan> <20130612100332.GB55502@gmail.com> <20130612104135.GA11495@icarus.home.lan>

On Wed, Jun 12, 2013 at 03:41:35AM -0700, Jeremy Chadwick wrote:
> On Wed, Jun 12, 2013 at 01:03:33PM +0300, Mikolaj Golub wrote:
> > On Wed, Jun 12, 2013 at 02:36:39AM -0700, Jeremy Chadwick wrote:
> > 
> > > I honestly cannot see how nv->nv_error (which is what nv_error()
> > > returns) gets set to ENOENT within the function call stack:
> > > 
> > > - metadata_read() is what prints the error (line 152 in nv.c)
> > > - Error printing done by pjdlog_errno(), which uses the global errno
> > >   to print its errors
> > > - nv = nv_ntoh(eb)
> > > - nv_ntoh() sets nv->nv_error to 0 initially, but then calls
> > >   nv_validate() later on which can modify nv->error
> > > - nv_validate() explicitly sets error (which later can get assigned
> > >   to nv->nv_error) to EINVAL in many cases, but not ENOENT.
> > > 
> > > Therefore, I am honestly not sure how ENOENT gets returned to the user
> > > in this case.  It looks like it's a misleading errno and is probably
> > > meant to be something else.  If it's correct, I would absolutely love
> > > for someone to show me how/where.
> > 
> > nv_find() (which is used by nv_get_* functions) sets ENOENT when it
> > fails.
> 
> How wonderful -- when I reviewed the code, I thought "Oh surely those
> can't be responsible...".  I did see nv_find(), but I did not think
> nv_get_*() would call that.  My fault/failure.
> 
> > "No such file or directory" really looks confusing in this case. I am
> > not sure what a code from errno.h would be better here though. ENOATTR?
> 
> Sorry to make this longer than it needs to be, but I'm brain dumping:
> 
> What exactly is the error condition that is happening in the above case?
> All I read was that the partition size differed between nodes and that
> this caused the issue?

As I wrote before, the error was that hastd failed to parse the
metadata it had read from the local disk (it failed to find some entry
in the metadata structure). Usually this happens when the metadata is
not properly initialized for a new disk, or is corrupted.
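
To illustrate how a missing entry can surface as "No such file or
directory", here is a minimal sketch. It is not the hastd code: the
struct layout, the sketch_* names and the entry names "datasize" and
"resuid" are made up for the example, and recording the failure in
nv_error inside the lookup is an assumption about where it happens.
The only parts taken from this thread are that the lookup fails with
ENOENT and that the caller later reports nv->nv_error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real nv structures. */
struct nv_entry {
	const char *name;
	int value;
};

struct nv {
	int nv_error;			/* first error recorded, 0 if none */
	const struct nv_entry *entries;
	size_t nentries;
};

/* Look up an entry by name; a miss is reported as ENOENT. */
static const struct nv_entry *
sketch_find(struct nv *nv, const char *name)
{
	size_t i;

	assert(nv != NULL && name != NULL);
	for (i = 0; i < nv->nentries; i++) {
		if (strcmp(nv->entries[i].name, name) == 0)
			return (&nv->entries[i]);
	}
	/* Assumption: keep the first error, do not overwrite it. */
	if (nv->nv_error == 0)
		nv->nv_error = ENOENT;
	errno = ENOENT;
	return (NULL);
}

/* Getter built on the lookup; returns 0 when the entry is missing. */
static int
sketch_get_int(struct nv *nv, const char *name)
{
	const struct nv_entry *e;

	e = sketch_find(nv, name);
	if (e == NULL)
		return (0);
	return (e->value);
}

/* What the caller checks after reading all the fields it needs. */
static int
sketch_error(const struct nv *nv)
{
	return (nv->nv_error);
}
```

With an nv that holds only "datasize", sketch_get_int(&nv, "resuid")
returns 0 and sketch_error(&nv) then returns ENOENT, which strerror()
renders as the confusing message above.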

Different data sizes should trigger the error "Data size differs
between nodes ..." on primary.

Unfortunately I have not seen the full logs from the primary and the
secondary, so it is difficult for me to guess what was going on there.

-- 
Mikolaj Golub


