Date: Mon, 12 Dec 2016 12:27:43 +0000
From: Ian Campbell <ian.campbell@docker.com>
To: freebsd-virtualization@freebsd.org
Cc: anil@recoil.org
Subject: Query about bhyve's blockif_cancel and the signalling mechanisms
Message-ID: <CAOc2ZU0Hqvctv767ESu6fKwXJ3W35ReqM3=ud6G3MzKLSEY=Bw@mail.gmail.com>
Hello,

Recently I've been investigating a block I/O hang/deadlock[0] in Hyperkit (a bhyve-derived hypervisor for Macs using the OSX Hypervisor framework). While I think I understand that issue (TL;DR: on OSX, kevent's EVFILT_SIGNAL does not receive signals directed at specific threads, only those directed at the entire process, whereas on FreeBSD it receives both), there are some aspects of the code where I'm unsure of the mechanisms by which they work on FreeBSD.

To recap my understanding of the mechanisms at work (glossing over the queue handling, condvars etc.): the bhyve block_if infrastructure registers a callback for SIGCONT with the mevent subsystem, a kevent/kqueue-based loop that delivers events to the main thread (mevent_dispatch is the last thing called in main()). It also sets SIGCONT to SIG_IGN.

When a disk controller device model wants to cancel a block request (e.g. in ahci_port_stop), it calls blockif_cancel, which sends SIGCONT to the blkio thread that has claimed the request, notionally to kick it out of whatever blocking system call it is in and make it return an error to the device model. The signal handler callback (running in the mevent thread) then wakes the thread that sent the signal, to indicate that the cancellation has completed. I go into more detail about how I think things work in [0], but that's the gist of it.

The main thing I do not follow is whether the blkio thread is actually interrupted at all when the signal has been configured to be delivered via kevent/kqueue to a third, unrelated thread. I've dug around in the FreeBSD kevent and signal man pages but cannot find anything describing the semantics bhyve seems to rely on, namely that the system call in the target thread will return EINTR at some point before the thread "handling" the signal via kevent/kqueue sees the event. Have I missed something here, or is bhyve relying on some subtle underlying semantics?
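For comparison, here is a minimal, portable POSIX sketch of the interruption that blockif_cancel is assumed to achieve. It is not bhyve's code and does not answer the kevent question: it assumes a real no-op handler installed with sigaction() and no SA_RESTART (rather than SIG_IGN plus an EVFILT_SIGNAL kqueue watch), uses SIGUSR1 instead of SIGCONT, and the names cancel_blocked_read, worker and kick_handler are hypothetical. Under those assumptions, pthread_kill() makes the worker's read() on an empty pipe fail with EINTR. The sketch re-sends the signal until the worker reports back, because a signal that lands before the worker enters read() is consumed by the handler and the read then blocks anyway.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "kick" signal for this demo (bhyve uses SIGCONT). */
#define KICK_SIGNAL SIGUSR1

static atomic_int worker_done;  /* set once the worker's read() has returned */
static atomic_int worker_errno; /* errno seen by the worker, 0 if none */

/* No-op handler: its only job is to make a blocking syscall return EINTR. */
static void kick_handler(int sig) { (void)sig; }

static void *worker(void *arg) {
    int fd = *(int *)arg;
    char buf[1];
    /* Nothing is ever written to the pipe, so only a signal ends this read. */
    ssize_t n = read(fd, buf, sizeof buf);
    atomic_store(&worker_errno, n < 0 ? errno : 0);
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Hypothetical helper: returns the errno the worker's read() failed with
   (EINTR expected), or -1 on setup failure. *kicks_out gets the number of
   signals that were sent before the worker reported completion. */
int cancel_blocked_read(int *kicks_out) {
    assert(kicks_out != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART, so read() is not restarted */
    if (sigaction(KICK_SIGNAL, &sa, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    atomic_store(&worker_done, 0);
    atomic_store(&worker_errno, 0);
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* A kick that arrives before the worker is inside read() is lost, so keep
       kicking (every 10 ms) until the worker says its read() has returned. */
    const struct timespec pause = {0, 10 * 1000 * 1000};
    int kicks = 0;
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, KICK_SIGNAL); /* directed at the worker thread only */
        kicks++;
        nanosleep(&pause, NULL);
    }
    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    *kicks_out = kicks;
    return atomic_load(&worker_errno);
}
```

Calling `int kicks; int err = cancel_blocked_read(&kicks);` should leave `err == EINTR`. The retry loop is only one possible way to cover the window between deciding to block and actually blocking; it is an assumption of this sketch, not something taken from bhyve.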
I have a secondary concern: what happens if the I/O thread is on its way to making a blocking system call in blockif_proc but has not yet made it when the signal is delivered? It seems it would simply carry on and make the blocking call, with potentially unexpected consequences (I/O getting wedged, perhaps only until a second reset attempt). I haven't actually seen this happen, though, and there's a chance I'm simply overthinking things after staring at them for so long!

I should say that I've mostly been looking at the Hyperkit code here, but AFAICT it has not diverged in any relevant way from the bhyve code (although it may have to in the future, given the differing semantics of EVFILT_SIGNAL on OSX).

Thanks,
Ian.

[0] https://github.com/docker/hyperkit/issues/94