From owner-freebsd-arch Fri Mar 21 14:45:54 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 18EE337B401 for ; Fri, 21 Mar 2003 14:45:52 -0800 (PST) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 250BF43F93 for ; Fri, 21 Mar 2003 14:45:51 -0800 (PST) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.8/8.12.8) with ESMTP id h2LMjl1f011640 for ; Fri, 21 Mar 2003 23:45:48 +0100 (CET) (envelope-from phk@phk.freebsd.dk) To: arch@freebsd.org Subject: GEOM ioctl issue. From: Poul-Henning Kamp Date: Fri, 21 Mar 2003 23:45:47 +0100 Message-ID: <11639.1048286747@critter.freebsd.dk> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I am trying to fix some issues relating to removable devices and GEOM, and have run into an either/or issue with GEOM's state-engine. A bit of background: ioctl will automagically do copyin(9)/copyout(9) of the primary argument which often is a struct. The way this works is that the ioctl identifier, which is an integer, has 13 bits dedicated to the length, so primary arguments can be up to 8191 bytes long. (The alert reader will notice that this is one byte short of being able to accomodate a UFS bootblock.) The ioctl syscall handler will extract this length and two flags which encode if the primary argument should be copied into the kernel before and out of the kernel after handling the ioctl, and do the appropriate copyin(9) and copyout(9). (See for how this encoding is done.) Think of this as a zeroth order marshalling layer. If the primary argument is a struct containing pointer(s) to something else, a string or maybe another structure, these secondary arguments are not handled in the generic ioctl syscall code, it becomes the responsibility of the ioctl implementor to explicity call copyin(9)/copyout(9) on these arguments. The way we have implemented copyin(9) and copyout(9) depends strongly on the fact that they are executed in the corresponding thread, it is not possible to access userland for one thread with copyin/out from another thread, userland or kernel. GEOM abandons the original thread as the very first thing, there are a multitude of reasons for this, the primary being prevention of kernel stack overrun and the second realible and simple serialization. If an ioctl is called on /dev/da0, what will happen is that the ioctl gets as far as to geom_dev.c (which is a normal device driver), where the ioctl arguments are packed into a special kind of struct bio, which then is shipped down-stream in the dedicated GEOM io-path threads. It will relatively fast hit geom_disk.c which is the GEOM class which interfaces to disk device drivers. geom_disk.c recognizing the potential need for copyin/copyout will return the request with a special error code and a function pointer. The special struct bio will eventually get delivered to geom_dev.c into the original userland thread, and special case code on seeing the magic error code will call the provided function pointer to really execute the ioctl. If the disk device driver at this point gets an interrupt and finds out that its hardware has departed the system, things explode in a nice display of fireworks, because there is no way to lock this scenario without causing deadlocks. So I am faced with an Either/Or decision: Either I support ioctls in the disk device drivers Or we get removable devices working robustly. Three drivers today use the ioctl interface, (notice that we are only talking GEOM disk device drivers here, CDROM drives for instance are not affected). "afd" uses ioctl to implement two primitivs for opening and closing the device tray. "mlx" uses it for management tasks and "da" uses it to find the corresponding "pass" device for a given "da" device. Close inspection of libcam has not shown that this ioctl is actually used however. Finally RaidFrame uses ioctl for configuration issues. For all I can tell, the code in "da" can be dropped and nobody will notice. The "mlx" and "raidframe" can probably more logically implement a "mlx.ctl" and "rf.ctl" control device which is used soleley for OAM issues, this would be in line with the md(4) and ccd(4) drivers. That leaves the "afd" drivers "open/close tray" ioctls, which currently are #ifdef'ed out pending further study of how we actually want to interface with removable devices in general (this subject is out of scope for this email). On the other hand, the SANE2002 conference handed out "USB-keys" to all in attendance and everybody and his dad has a digital camera these days, so getting removable devices working robustly is not even open for discussion. So the decision is really a foregone conclusion: We cannot possibly support the copyin/copyout semantics of ioctls in GEOM disk device drivers. One option would be to simply drop the support for doing copyin/copyout in the ioctl handing code, but this would mean that we suddenly have two kinds of ioctls in the system, and it would leave another issue unresolved: the serialization between regular I/O and ioctls. The replacement mechanism will therefore be to pass the BIO_GETATTR/BIO_SETATTR struct bio requests through to the device drivers and marshal ioctls into these at the geom_dev.c level is required. I have not completed the patches to carry this transformation out yet, but this is the next item on my TODO list, so expect a patch to be floated for test and review in a few days time. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message