Date:      Mon, 28 Nov 2016 14:00:47 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        David Cross <dcrosstech@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: FreeBSD 11 i386 disk deadlock (I think) (now with reproduction steps!)
Message-ID:  <20161128120046.GP54029@kib.kiev.ua>
In-Reply-To: <20161128041847.GA65249@charmander>
References:  <CAM9edeMYMhnkWid7Lig5D-FjhahniFm0VbFRm8ysyb85h29wXg@mail.gmail.com> <20161128041847.GA65249@charmander>

On Sun, Nov 27, 2016 at 08:18:47PM -0800, Mark Johnston wrote:
> On Sun, Nov 27, 2016 at 03:17:13PM -0500, David Cross wrote:
> > So, narrowing this down, I think it has something to do with geli swap
> > (since I can easily reproduce it with geli swap, but have yet to reproduce
> > it without), and I have a somewhat convoluted way for almost anyone to
> > reproduce it with bhyve.  (Note: I haven't been able to get a crashdump,
> > since apparently the VM system being locked up prevents that, but with
> > watchdogd I have been able to get into DDB.)
> > 
> > Anyway, my reproduction steps: I used the 11.0 Retail DVD, but I fully
> > suspect the 11.0-RELEASE image will be fine to install an i386 image into
> > bhyve. I install to vtbd disks (even though my 'real' case is an ada
> > device; the fact that this can be reproduced across such different
> > "hardware" really reduces the likelihood of a device driver issue).
> > 
> > After it's installed, I start my VM with the following (dropping memory to
> > the floor, well below my "real" machine, but the emulated machine is much
> > faster and I suspect this is a race condition somewhere). Note the options
> > to the virtio-blk device that pin it to the "real" disk so it does not hit
> > the host vmcache; again, speed seems to be key here, and slowing things
> > down makes it more likely to happen.
> > 
> > bhyveload -m 64M -d /usr/bhyve/11.0.1-i386.img fbsd11-i386
> > bhyve -u -A -c 1 -H -m 64M -C -s 0,hostbridge -s 1,lpc -s 2,virtio-net,tap0 \
> >     -s 3,virtio-blk,/usr/bhyve/11.0.1-i386.img,nocache,direct \
> >     -l com1,/dev/nmdm0A fbsd11-i386
> > 
> > At this point:
> > Log into the VM
> > cd /usr/src
> > /usr/bin/make buildkernel
> > <wait>
> > 
> > For me this has hung 99% of the time at:
> > objcopy --strip-debug kernel
> > 
> > After getting here once, I have been able to skip the rest of the
> > compile, cd to /usr/obj/usr/src/sys/GENERIC, run that command directly,
> > and trigger the condition.
> > 
> > What I have at this point is the following DDB ps list:
> > 
> > db> ps
> >   pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
> > ...
> >    50     0     0     0  DL      vmwait   0xc1c4f6d8 [g_eli[0] vtbd0p3]
> > ...
> > 100043                   D       wswbuf0  0xc1bf30d4 [pagedaemon]
> > ...
> > 
> > I note that the swapper and geli are both in vmwait, a bunch of other
> > processes are in pfault, and the "crypto" drivers are in disk wait??
> 
> This is a low memory deadlock: the pagedaemon is attempting to reclaim
> memory by freeing pages from the inactive queue, and here is waiting for
> the swap pager to finish writing out a page. However, the GELI thread is
> blocked waiting for the pagedaemon to free up some pages.
> 
> Some recent work that's gone into HEAD ought to address this scenario.
> In particular, with r308474 swapping is performed by a separate thread,
> so even if that thread blocks waiting for the GELI thread, the
> pagedaemon is able to continue freeing clean pages or at least kill
> memory-hogging processes. Could you try your scenario in a VM running a
> HEAD kernel?

Neither geli providers nor zvols can be used as swap, exactly because they
allocate memory on the write path.  In fact, zfs has trouble with the
normal pageout of files as well, for the same reason.

It is very easy to trigger a situation where everything is dirty, and even
worse, it is possible for all dirty pages to belong to one vnode. The
laundry work is great, but it cannot completely solve the situation
where the producer of free or clean pages itself allocates memory.


