Date: Wed, 12 Jun 2013 02:36:39 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Cc: freebsd-fs@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
Subject: Re: hast: can't restore after disk failure
Message-ID: <20130612093639.GA9219@icarus.home.lan>
In-Reply-To: <20130612084453.GA55502@gmail.com>
References: <alpine.BSF.2.00.1306101700300.69113@woozle.rinet.ru>
 <20130610201650.GA2823@gmail.com>
 <alpine.BSF.2.00.1306110038010.96502@woozle.rinet.ru>
 <20130611060741.GA42231@gmail.com>
 <alpine.BSF.2.00.1306120022580.96502@woozle.rinet.ru>
 <20130612084453.GA55502@gmail.com>
On Wed, Jun 12, 2013 at 11:44:54AM +0300, Mikolaj Golub wrote:
> On Wed, Jun 12, 2013 at 12:23:52AM +0400, Dmitry Morozovsky wrote:
> > On Tue, 11 Jun 2013, Mikolaj Golub wrote:
> >
> > > On Tue, Jun 11, 2013 at 12:40:08AM +0400, Dmitry Morozovsky wrote:
> > > > On Mon, 10 Jun 2013, Mikolaj Golub wrote:
> > > >
> > > > [snipall]
> > > >
> > > > > > Jun 10 16:56:20 <console.info> cthulhu3 kernel: Jun 10 16:56:20 <daemon.err>
> > > > > > cthulhu3 hastd[765]: [d1] (secondary) Worker process exited ungracefully
> > > > > > (pid=14380, exitcode=66).
> > > > > >
> > > > > > Any hints? Thanks!
> > > > >
> > > > > Have you run hastctl create to initialize metadata?
> > > >
> > > > Yes, but did it naively:
> > > >
> > > > hastctl create d1
> > >
> > > No errors?
> >
> > no visible, but hast instance ungracefully exits
> >
> > > > and status still reported 0 as provider size...
> > >
> > > I assume /dev/ada1p1 is present and readable/writable?
> > >
> > > Symptoms are like if it did not exist.
> >
> > nope, it does:
> >
> > root@cthulhu3:/# diskinfo /dev/ada1p1
> > /dev/ada1p1 512 999654686720 1952450560 0 1048576 1936954 16 63
> > root@cthulhu3:/# diskinfo /dev/ada0p1
> > /dev/ada0p1 512 999653638144 1952448512 0 1048576 1936952 16 63
> >
>
> Hm, looking in the source where this error is generated:
>
> cthulhu3 hastd[14379]: [d1] (secondary) Unable to read metadata from /dev/ada1p1: No such file or directory.
>
> it looks like hastd successfully read metadata from disk but failed to
> parse it (did not found an entry). This usually happens when metadata
> is not initialized by `hastctl create`.
>
> Does `hastctl dump d1' not work too?

Note up front: I have zero familiarity with hast stuff.  I'm just
looking at source code, because your comment seems to indicate that
ENOENT (errno 2; No such file or directory) is actually false/incorrect.

I did spend almost 30 minutes digging through the hastd code.

This is hard to follow -- very specifically, the error/errno handling
code.  It's a very deep rabbit hole.  Variable names are common or
re-used (legitimately due to local scope), and the actual error that
gets printed comes directly from the global errno variable.

I honestly cannot see how nv->nv_error (which is what nv_error()
returns) gets set to ENOENT within the function call stack:

- metadata_read() is what prints the error (line 152 in nv.c)
- Error printing is done by pjdlog_errno(), which uses the global errno
  to print its errors
- nv = nv_ntoh(eb)
- nv_ntoh() sets nv->nv_error to 0 initially, but then calls
  nv_validate() later on, which can modify nv->nv_error
- nv_validate() explicitly sets error (which later can get assigned to
  nv->nv_error) to EINVAL in many cases, but not ENOENT.

Therefore, I am honestly not sure how ENOENT gets returned to the user
in this case.  It looks like it's a misleading errno and is probably
meant to be something else.  If it's correct, I would absolutely love
for someone to show me how/where.

The code is here:

http://svnweb.freebsd.org/base/stable/9/sbin/hastd/

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
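
For illustration only, here is a minimal sketch of one general way a
parse step could end up reporting ENOENT even though the read from the
device itself succeeded: a name/value container keeps a "sticky" error
field, a failed lookup records ENOENT there, and the caller copies that
field into the global errno before printing.  Every name below
(kv_table, kv_get, kv_error, parse_metadata, and the "datasize" and
"extentsize" keys) is hypothetical.  This is not the hastd code, and
whether nv.c actually works this way is an assumption that would need
to be checked against the source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value table; NOT hastd's struct nv. */
struct kv_entry {
	const char *name;
	int value;
};

struct kv_table {
	const struct kv_entry *entries;
	size_t count;
	int error;	/* sticky: the first failure is remembered */
};

/*
 * Look up a name.  On a miss, record ENOENT in the table (unless an
 * earlier error is already stored) and return 0 as a placeholder.
 */
int
kv_get(struct kv_table *t, const char *name)
{
	size_t i;

	assert(t != NULL && name != NULL);
	for (i = 0; i < t->count; i++) {
		if (strcmp(t->entries[i].name, name) == 0)
			return (t->entries[i].value);
	}
	if (t->error == 0)
		t->error = ENOENT;
	return (0);
}

/* Return the sticky error recorded in the table (0 if none). */
int
kv_error(const struct kv_table *t)
{

	assert(t != NULL);
	return (t->error);
}

/*
 * Fetch several fields and check the sticky error once at the end.
 * Returns 0 on success, or -1 with errno set to the recorded error.
 */
int
parse_metadata(struct kv_table *t, int *datasize, int *extentsize)
{

	*datasize = kv_get(t, "datasize");
	*extentsize = kv_get(t, "extentsize");
	if (kv_error(t) != 0) {
		errno = kv_error(t);
		return (-1);
	}
	return (0);
}
```

With a table that lacks "extentsize", parse_metadata() returns -1 and
leaves errno set to ENOENT, so an errno-based logger would print "No
such file or directory" although no file was ever missing.  If hastd
follows a similar pattern, that would match the "did not found an
entry" explanation quoted above, but that is a guess on the strength of
this sketch alone.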