Date: Thu, 21 Nov 2013 22:37:12 +0200
From: Mikolaj Golub <trociny@FreeBSD.org>
To: Pete French <petefrench@ingresso.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: Hast locking up under 9.2
Message-ID: <20131121203711.GA3736@gmail.com>
In-Reply-To: <E1VjSsY-000PXy-GC@dilbert.ingresso.co.uk>
References: <E1VjSsY-000PXy-GC@dilbert.ingresso.co.uk>

On Thu, Nov 21, 2013 at 11:57:02AM +0000, Pete French wrote:

> I have had to (hopefully temporarily) disable hast on
> our systems as under 9.2 I am finding that it locks up under
> high disc load. This has only started being a problem after we moved
> from 8-STABLE to 9-STABLE, there was no locking up before.

I remember asking you which replication mode you were using, but I
don't recall an answer. One of the significant changes is memsync
mode, which is the default in 9.2 (it was fullsync in earlier
versions). So if you are using the default settings, you can try
switching to fullsync as a workaround.

> I don't have any useful debugging unfortunately, and I do
> realise that "it locks up" is unhelpful! The only thing
> I see in the syslog are statements like this:
>
> Nov 14 13:51:59 <daemon.err> serpentine-active hastd[1258]: [serp1] (primary) Worker process killed (pid=1520, signal=6).
> Nov 14 13:51:59 <daemon.err> serpentine-passive hastd[14307]: [serp1] (secondary) Worker process exited ungracefully (pid=14638, exitcode=75).

signal=6 (SIGABRT) means that hastd crashed because an assertion
failed. Usually an "Assertion failed ..." message precedes this line in
the logs. Don't you see such a message? It might be very helpful. Do
you always see this error when it gets stuck?

Unfortunately the crash did not generate a core (due to capsicum).
When I want to get a coredump I rebuild hastd with
CFLAGS+=-DHAVE_CAPSICUM removed from the Makefile (and with debugging
symbols). There might be an easier method, but I don't know of one.

If you don't find the assertion message and the crashes are
reproducible, it would be helpful to rebuild hastd with symbols and
capsicum disabled, so that it dumps core, and to provide the backtrace.

Also, when hastd gets stuck, you can generate a core of the live
process with gcore(1).

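To check whether an "Assertion failed" message precedes each signal=6
crash, something like the following sketch might help. It is only an
illustration: the function name find_assertions and the 20-line
look-back window are made up here, and it assumes the assertion text
reaches the same log file as the "Worker process killed" line.

```python
import re

# Matches the crash line quoted above; captures the worker pid.
CRASH_RE = re.compile(r"Worker process killed \(pid=(\d+), signal=6\)")


def find_assertions(lines, window=20):
    """For every signal=6 worker crash, return (pid, assertion_line).

    assertion_line is the nearest preceding line (within `window` lines)
    that contains "Assertion failed", or None if there is no such line.
    """
    results = []
    for i, line in enumerate(lines):
        match = CRASH_RE.search(line)
        if not match:
            continue
        assertion = None
        # Walk backwards from the crash line to find the closest assertion.
        for prev in reversed(lines[max(0, i - window):i]):
            if "Assertion failed" in prev:
                assertion = prev.strip()
                break
        results.append((int(match.group(1)), assertion))
    return results
```

Pass it the lines of whichever file receives daemon messages on your
system (often /var/log/messages, but that depends on syslog.conf). A
result of None for a crash means no assertion message was logged nearby,
in which case a core dump is the next thing to try.
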
> That's about all the info I have - currently I have taken hast out of the stack
> and am trying to cobble something together manually using
> iscsi, but I would prefer to go back to hast if possible. Has anyone seen
> anything similar, or have any suggestions?

What revision are you using? Recently there was a fix for crashes
triggered by this failed assertion:

  Assertion failed: (amp->am_memtab[ext] > 0), function activemap_write_complete, file activemap.c, line 351.

It was merged to STABLE/9 in r257470 (2013-10-31).

-- 
Mikolaj Golub