Date:      Mon, 12 Dec 2016 12:27:43 +0000
From:      Ian Campbell <ian.campbell@docker.com>
To:        freebsd-virtualization@freebsd.org
Cc:        anil@recoil.org
Subject:   Query about bhyve's blockif_cancel and the signalling mechanisms
Message-ID:  <CAOc2ZU0Hqvctv767ESu6fKwXJ3W35ReqM3=ud6G3MzKLSEY=Bw@mail.gmail.com>

Hello,

Recently I've been investigating a block I/O hang/deadlock[0] in Hyperkit
(a bhyve-derived hypervisor for Macs using the OSX Hypervisor
framework).

While I think I understand that issue (TL;DR: on OSX, kevent
EVFILT_SIGNAL does not receive signals directed at specific threads,
only those directed at the whole process, unlike on FreeBSD, where it
does), there are some aspects of the code whose mechanisms on FreeBSD
I am unsure about.

To recap my understanding of the mechanisms at work (glossing over the
queue handling and condvars involved etc), the bhyve block_if
infrastructure registers a callback for SIGCONT with the mevent
subsystem, a kevent/kqueue-based mechanism which delivers events to the
main thread (mevent_dispatch is the last thing in main()). It also sets
SIGCONT to SIG_IGN. When a disk controller device model wants to
cancel a block request (e.g. in ahci_port_stop) it calls
blockif_cancel which sends a SIGCONT to the blkio thread which has
claimed the request, notionally to kick it out of whatever blocking
system call it is in and cause it to return an error to the device
model. The signal handler callback (in the mevent thread) then kicks
the thread which sent the signal to indicate completion of the
cancellation. I go into some more detail of how I think things work
in [0], but that's the gist of it.
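
The arrangement recapped above can be sketched roughly as follows. This
is not bhyve's code: sigcont_seen_by_kqueue and idle_worker are
hypothetical names, the worker merely sleeps instead of doing I/O, and
the kqueue part is only compiled on FreeBSD and OSX (elsewhere the
function just returns -1). It registers EVFILT_SIGNAL for SIGCONT, sets
SIGCONT to SIG_IGN, sends a thread-directed SIGCONT and reports whether
the kqueue saw it.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1

/* Hypothetical stand-in for a blkio thread: sleeps instead of doing I/O. */
static void *idle_worker(void *arg)
{
    struct timespec nap = { 0, 500 * 1000 * 1000 };   /* 500 ms */
    (void)arg;
    nanosleep(&nap, NULL);
    return NULL;
}
#endif

/*
 * Returns 1 if the kqueue reported the thread-directed SIGCONT,
 * 0 if it did not within the timeout, and -1 if kqueue is not
 * available on this platform or setup failed.
 */
int sigcont_seen_by_kqueue(void)
{
#ifdef HAVE_KQUEUE
    struct kevent change, event;
    struct timespec timeout = { 0, 200 * 1000 * 1000 };   /* 200 ms */
    void (*old)(int);
    pthread_t worker;
    int kq, n;

    kq = kqueue();
    if (kq < 0)
        return -1;

    /* Ask the kqueue to report SIGCONT, as an mevent-style loop would. */
    EV_SET(&change, SIGCONT, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0) {
        close(kq);
        return -1;
    }

    /* The ordinary disposition is "ignore". */
    old = signal(SIGCONT, SIG_IGN);
    if (old == SIG_ERR || pthread_create(&worker, NULL, idle_worker, NULL) != 0) {
        close(kq);
        return -1;
    }

    /* The "cancel": a signal directed at one specific thread. */
    pthread_kill(worker, SIGCONT);

    /* Did the event loop's kqueue hear about it? */
    n = kevent(kq, NULL, 0, &event, 1, &timeout);
    assert(n <= 1);   /* we asked for at most one event */

    pthread_join(worker, NULL);
    signal(SIGCONT, old);
    close(kq);
    return n < 0 ? -1 : n;
#else
    return -1;
#endif
}
```

Going by the behaviour described earlier in this message one would
expect 1 on FreeBSD and 0 on OSX, but that is the sort of thing the
sketch is meant to check rather than assert.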

The main thing I do not follow is whether or not the blkio thread is
actually interrupted at all when the signal has been configured to be
delivered via the kevent/kqueue mechanisms to a third, unrelated thread.
I've dug around in the FreeBSD kevent and signal man pages but I
cannot find any part which describes the semantics which bhyve seems
to be relying on (namely that the system call in the target thread
will return EINTR at some point before the thread which is "handling"
the signal via kevent/kqueue sees that event).

Have I missed something here or is bhyve relying on some subtle
underlying semantics?
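
One way to probe the first half of that question on a given system is
a small experiment along these lines. It is only a sketch under stated
assumptions: ignored_signal_interrupts_read, blocking_reader and struct
probe are hypothetical names, a pipe stands in for the disk, and the
100ms pauses are a crude way of letting the reader block before the
signal is sent. It leaves kqueue out entirely and only asks whether a
thread-directed SIGCONT, with SIGCONT set to SIG_IGN, makes a blocked
read() return EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

struct probe {
    int read_fd;       /* read end of the pipe the thread blocks on */
    int interrupted;   /* set to 1 if read() failed with EINTR */
};

/* Hypothetical stand-in for a blkio thread stuck in a blocking call. */
static void *blocking_reader(void *arg)
{
    struct probe *p = arg;
    char byte;
    ssize_t n = read(p->read_fd, &byte, 1);

    if (n < 0 && errno == EINTR)
        p->interrupted = 1;
    return NULL;
}

/*
 * Returns 1 if the blocked read() was interrupted with EINTR,
 * 0 if it stayed blocked until released, -1 on setup failure.
 */
int ignored_signal_interrupts_read(void)
{
    struct timespec pause = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    struct probe p = { -1, 0 };
    void (*old)(int);
    pthread_t reader;
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    p.read_fd = fds[0];

    old = signal(SIGCONT, SIG_IGN);
    if (old == SIG_ERR || pthread_create(&reader, NULL, blocking_reader, &p) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    nanosleep(&pause, NULL);        /* let the reader block in read() */
    pthread_kill(reader, SIGCONT);  /* the "cancel" */
    nanosleep(&pause, NULL);        /* give the signal time to act */

    /* Release the reader if it is still blocked: read() then sees EOF. */
    close(fds[1]);
    pthread_join(reader, NULL);

    signal(SIGCONT, old);
    close(fds[0]);
    return p.interrupted;
}
```

A result of 0 would mean the reader was never kicked out of its system
call by the signal itself; the answer may well differ between
platforms, so it is worth running on each system of interest.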

I have a secondary concern which is what happens if the IO thread is
on its way to making a blocking system call in blockif_proc but has
not actually done so when the signal is delivered. It seems like it
would simply carry on and make the blocking call with perhaps
unexpected consequences (i/o getting wedged, perhaps only until a
second reset attempt). I've not actually seen this happening though
and there's a chance I'm simply overthinking things after staring at
them for so long!

I should say that I've mostly been looking at the hyperkit code here
but AFAICT it has not diverged in any relevant ways from the bhyve
code (although it may have to in future, given the differing
semantics of EVFILT_SIGNAL on OSX).

Thanks,
Ian.

[0] https://github.com/docker/hyperkit/issues/94


