From: Mikolaj Golub
To: "Robert N. M. Watson"
Cc: arch@freebsd.org, Kostik Belousov
Subject: Re: unix domain sockets on nullfs(5)
Date: Tue, 10 Jan 2012 22:30:06 +0200
Message-ID: <86fwfnti5t.fsf@kopusha.home.net>
References: <86sjjobzmn.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
List-Id: Discussion related to FreeBSD architecture

On Tue, 10 Jan 2012 14:02:34 +0000 Robert N. M. Watson wrote:

RNMW> On 9 Jan 2012, at 16:37, Mikolaj Golub wrote:

>> kib@ suggested that the issue could be fixed if one added new VOP_*
>> operations for setting and accessing vnode's v_socket field.

RNMW> I like the philosophy of the proposed approach significantly better than
RNMW> the previously discussed approaches. Some thoughts:

RNMW> (1) I don't think the new behaviour should be optional -- it was always
RNMW> the intent that nullfs pass through all behaviours to the underlying
RNMW> layer; it's just that certain edge cases didn't appear in the original
RNMW> implementation. Memory mapping was fixed a few years ago using similar
RNMW> techniques. This will significantly reduce the complexity of your
RNMW> patch, and also avoid user confusion, since it will now behave "as
RNMW> expected". Certainly, a mention in future release notes would be
RNMW> appropriate, however.

I don't mind having only the new behavior, as I can't imagine where I would
need nullfs mounted with the nosobypass option, and I also like it when
things are simple :-). On the other hand, there might be people who relied
on the old behavior and who would be surprised if it changed. So, if other
people agree, I will remove the old behaviour to make the patch simpler.
Another option would be to make sobypass the default, with the possibility
to (re)mount the filesystem with nosobypass.
RNMW> (2) I'd like to think (as John also mentioned?) that we could use a
RNMW> shared vnode lock when doing read-only access (i.e., connect).

Thanks, I am trying this.

RNMW> (3) With this patch, an rwlock is held over vnode operations --
RNMW> required due to the interlocked synchronisation of vnode and unix
RNMW> domain sockets. We often try to avoid doing this for reasons of lock
RNMW> order (and principle). It appears that it is likely fine in this case
RNMW> but it makes me slightly nervous.

Well, I have not noticed any issues, but that might be because I don't have
much experience here.

RNMW> (4) I'm slightly puzzled by the bind(2) case and interactions with
RNMW> layering -- possibly there is a bug here. If I issue bind(2) against
RNMW> the top layer, it looks like vp->v_socket will be set in the bottom
RNMW> layer, but unp->unp_vnode will be assigned to the top-layer vnode? My
RNMW> assumption was that you would want unp_vnode to always point to the
RNMW> bottom (real) vnode, which suggests to me that the VOPs shouldn't just
RNMW> assign v_socket, but should also assign unp_vnode. This has
RNMW> implications elsewhere in uipc_usrreq.c as well. Could you clarify
RNMW> whether you think this could be an issue?

I intentionally made unp_vnode always point to the vnode returned in bind()
at create time. On bind, v_usecount is increased for this vnode, and in
uipc_detach() we have to call vrele() on that same vnode. I believe that in
uipc_usrreq.c, via unp->unp_vnode, we should reference only the upper vnode,
accessing the lower vnode only via VOP_* operations. I don't see issues with
this approach so far, whereas with the suggested way (unp_vnode always
pointing to the vnode that holds the socket reference) I don't see how to
decrease the usecount of the upper vnode on detach. We want the usecount of
the upper vnode increased on bind, so that e.g. umount would fail with EBUSY
when trying to unmount nullfs if somebody has bound to an upper path.
RNMW> It may also be worth KASSERTing that the top-level vnode's v_socket
RNMW> never points at anything but NULL, to catch bugs like this. This may
RNMW> mean the VOPs have to have a bit of "test-and-set" to them to get
RNMW> atomicity properties right when it comes to bind(2).

RNMW> In general, I think this is the right thing to do, and I'm very pleased
RNMW> you're doing it -- but the patch requires some further work.

Thanks.

-- 
Mikolaj Golub