From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 00:56:14 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5FC8C7CB for ; Fri, 18 Jul 2014 00:56:14 +0000 (UTC) Received: from mail-we0-x22b.google.com (mail-we0-x22b.google.com [IPv6:2a00:1450:400c:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id ED1452424 for ; Fri, 18 Jul 2014 00:56:13 +0000 (UTC) Received: by mail-we0-f171.google.com with SMTP id p10so3864627wes.30 for ; Thu, 17 Jul 2014 17:56:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=2UHo1D+wgmjmu1YOK/F1+2Rrx/2YN6qgEqtOk0tTVrA=; b=JfbiY9XbjHfNQ88DlOiJCF8JfuoS77p1VDcQxy6nhYpR3IBf0ao7nN5EYhWPqbIcDm 0j0qPdMR5GKoE8foEl+A/W5U8kg/o+xrO/N/CNe77Em2WfEf44WUxmP8ZdV9k7bJhgls zOzjgTiIQXf7Lw34+Uig+7bvoD0jhX8EZuayBR6wIX7FC0nk3GyZsK1eMmZen8F83RO4 dDh5hrxpqPAqNq9H+TuMmhLY5vCKxLf4RI+1b1D1hn64zeCaTLPSI5+7ndvEhYJPvlm1 Ywdn802zCEghYejVmYB7qLisMMZ6ryyXu2cxx3jvpCP1tHBx/7QllPSBRA0kfADeYG8N FLAA== X-Received: by 10.194.10.167 with SMTP id j7mr1398572wjb.100.1405644972211; Thu, 17 Jul 2014 17:56:12 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id jb16sm493326wic.10.2014.07.17.17.56.10 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 17 Jul 2014 17:56:10 -0700 (PDT) Date: Fri, 18 Jul 2014 02:56:08 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: Re: current fd allocation idiom Message-ID: <20140718005608.GB15714@dft-labs.eu> References: <20140717235538.GA15714@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20140717235538.GA15714@dft-labs.eu> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 00:56:14 -0000 On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote: > The kernel has to deal with concurrent attempts to open, close, dup2 and > use the same file descriptor. > > I start with stating a rather esoteric bug affecting this area, then I > follow with a short overview of what is happening in general and a > proposal how to change to get rid of the bug and get some additional > enchancements. Interestingly enough turns out Linux is doing pretty much > the same thing. > > ============================ > THE BUG: > /* > * Initialize the file pointer with the specified properties. > * > * The ops are set with release semantics to be certain that the flags, type, > * and data are visible when ops is. This is to prevent ops methods from being > * called with bad data. > */ > void > finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops) > { > fp->f_data = data; > fp->f_flag = flag; > fp->f_type = type; > atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops); > } > > This itself is fine, but it assumes all code obtaining fp from fdtable places > a read memory barrier after reading f_ops and before reading anything else. > As you could guess no code does that and I don't believe placing rmb's > in several new places is the way to go. > > ============================ > GENERAL OVERVIEW OF CURRENT STATE: > > fps are obtained and installed as follows: > > struct file *fp; > int fd; > > if (error = falloc(&fp, &fd)) > return (error); > if (error = something(....)) { > fdclose(fp, fd); > fdrop(fp); > return (error); > } > > finit(fp, ....); > fdrop(fp); > return (0); > > After falloc fp is installed in fdtable, it has refcount 2 and ops set to > 'badfileops'. > > if something() failed: > fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp > fdrop() clears the second reference, everything is cleared up > > if something() succeeded: > finit() finishes initialization of fp > fdrop() cleares the second reference. fp now has expected refcount of 1. > > Now a little complication: > parallel close() execution: > fd is recognizes as used. it is cleared and fdrop(fp) is called. > > if something() succeeded after close: > fdrop() kills fp > > if something() failed after close: > fdclose() concludes nothing to do > fdrop() kill fp > > Same story with dup2. > > What readers need to do: > - rmb() after reading fp_ops > - check fp_ops for badfileops > > ============================ > PROPOSAL: > > struct file *fp; > int fd; > > if (error = falloc(&fp, &fd)) > return (error); > if (error = something(....)) { > fdclose(fp, fd); > return (error); > } > > finit(fp, ....); > factivate(fd, fp); > return (0); > > After falloc fd is only marked as used, fp is NOT installed. > fp is returned with refcount of 1 and is invisible to anyone but > curthread. > > if something() failed: > fdclose() marks fd as unused and kills fp > > if something() succeeded: > finit() finishes initialization of fp > factivate() sets fp to non-null with a barrier > > Now a little complication: > parallel close() execution: > since fp is null, fd is recognized as unused. EBADF is returned. > > The only problem is with dup2 and I believe it is actually a step > forward. > > Let's assume fd was marked as used by falloc, but fp was not installed yet. > dup2(n, fd) will see that fd is used. With current code there is no > problem since there is fp to fdrop and it can proceed. With the proposal > however, there is nothing to fdrop. Linux returns EBADF in this case > which deals with the problem and does not seem to provide any drawbacks > for behaving processes. > > So, differences to current approach: > 1. fewer barriers and atomic operations > 2. no need to check for f_ops type > 3. new case when dup2 can return an error > One has to note that fdtable can be reallocated at any time and factivate would have to make sure it updated the current table. The simplest thing would be to take filedesc lock, which diminishes advantages to some extent. Maybe switching sx lock to other kind would remedy this a little. > Note that 3 should not be a problem since Linux is doing this already. > > Also note current approach is not implemented correctly at the moment as > it misses rmbs, although I'm unsure how much this matters in practice. > > Thoughts? > -- > Mateusz Guzik -- Mateusz Guzik