From owner-freebsd-net  Fri Mar  8  5:30:59 2002
Delivered-To: freebsd-net@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP
	id C0CFA37B423; Fri,  8 Mar 2002 05:30:21 -0800 (PST)
Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.6/8.11.5) with SMTP id g28DThD67687;
	Fri, 8 Mar 2002 08:29:43 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Fri, 8 Mar 2002 08:29:42 -0500 (EST)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Bill Fumerola <billf@elvis.mu.org>
Cc: Julian Elischer <julian@elischer.org>,
	Terry Lambert <tlambert2@mindspring.com>, green@freebsd.org,
	net@freebsd.org, hackers@freebsd.org
Subject: Re: in_pcblookup_hash() called multiple times
In-Reply-To: <20020308051520.GB803@elvis.mu.org>
Message-ID: <Pine.NEB.3.96L.1020308081302.67015A-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-net.FreeBSD.ORG>
X-Loop: FreeBSD.org


On Thu, 7 Mar 2002, Bill Fumerola wrote:

> On Thu, Mar 07, 2002 at 11:03:19PM -0500, Robert Watson wrote:
> > A couple of comments:
> > 
> > - You can always cache the pcb the first time it's used, and then have it
> >   available for future use.  I agree with your concerns about generating
> >   it every time -- that would be a disaster for routers where no packets
> >   are even delivered locally. :-) 
> 
> you can't cache it and make it available for future use without making
> the invasive changes that i mention: 

Ah, I misread your e-mail.  I was interested in caching it within ipfw,
but not exporting the cached entry to be reused in ip_input().  My
personal feeling is that the notion of uid/gid rules doesn't really fit
the model very well at all, but it falls into the category of
"interesting hack".

> with ipfw caching the pcb lookup + credential check and w/o terry's
> patch, the worst case would be a ruleset with any uid/gid rules: a pcb
> lookup being done twice (once in ipfw, once in the protocol
> handler). 
> 
> that's really not so bad compared with the current behavior with uid/gid
> rules, where the lookup is done many times in ipfw (once for each uid/gid
> rule you walk through before you match) and once in the protocol
> handler. 

Sounds like we agree.

> > - The uid/gid code is broken for a number of important applications,
> >   including SSH forwarding, because SSHd binds the socket using a root
> >   credential rather than the user credential.  Arguably, this is a bug
> >   with SSHd, and it also breaks a number of other things including the MAC
> >   support we're adding to 5.0-CURRENT.  Also, it had some *evil* bugs
> >   involving NFS that I recently fixed in 5.0-CURRENT, where sockets were
> >   rebound using the credential of the user making the VFS operation,
> >   resulting in ipfw uid/gid rules dropping/rate-limiting file system
> >   requests for all users.  For those running into the whole sshd tunnel
> >   and ident problem, it's the same cause. 
> 
> i would like to make my cache have the proper credential(s) rather than
> just cache the current socket credential's cr_uid, if that's wrong. 
> 
> please let me know privately just what exactly i should be comparing
> against (or functions i should be using, if an API exists now) in
> -current with the changes to credentials. 
> 
> when i mfc the cache, i'll just keep the current uid comparing behavior. 

I don't think there is a "proper" set of credentials in most cases.
Consider the following cases:

(1) Socket-to-socket communication (loop-back packet to another socket)
(2) Stack-to-socket communication (icmp version of EHOSTUNREACH)
(3) Socket-to-stack communication (resulting in stack response such as
    icmp version of EHOSTUNREACH)
(4) Socket to interface communication (outbound packet to remote host)
(5) Interface to socket communication (inbound packet from remote host)
(6) Stack to interface communication (connection refused icmp)
(7) Interface to stack communication (tcp SYN that will be refused)
(8) Interface to interface communication (routed packet)

The only situations in which you could argue a credential might be
involved are the cases including a socket.  Then the question arises: what
credential is the right credential?  Observe that a socket may be in use
by a number of processes, and even the kernel.  A credential is always
available when the socket is created, and that's generally cached as
so->so_cred.  However, after that point, the notion of an active
credential is ambiguous.  When a packet is created on the socket by a
process (kernel or otherwise), then potentially that is the "proper"
credential for out-going authorization.  However, when a packet is
received for a socket, you don't know yet which process will be receiving
the data, since that's done asynchronously by one of potentially many
different credentials listening on the socket.  In fact, it may *never* be
read.  And it gets even more confusing with stream sockets, where
different credentials might potentially read the same data.  Another
pointer: for loopback communication, potentially *two* sockets will match
the same packet.

Part of what's going on here is that a socket isn't really a subject
(credential).  It's an object.  Subjects put data (either as part of a
stream or as datagrams) into the object, and that generates new objects
that can't be directly referenced by a subject except through another
object (be it another socket, bpf device, whatever).  In the MAC
implementation, we recognize that a socket is an object, and provide it
with a label, which may be managed using socket options.  Datagrams
generated from the socket generally inherit the same label, although
individual policies might do different things.  When a datagram is being
considered for delivery to a socket, the labels of the mbuf and socket can
be used in two ways: (1) to affect the pcb match, and (2) to perform
access control on the delivery.  This allows policies maximum flexibility
in both defining the notion of a "match", and in blocking inappropriate
but matching traffic.  Likewise, we've hypothesized having ipfw rules that
use the labels.

That said, I think moving to having uids/gids on sockets may not be a good
idea, because most applications would simply not understand how to change
them (and the mode on the socket).  Part of the confusion would come also
from the fact that some sockets have filesystem representations: because a
socket is not the same as its filesystem representation, there would be
two separate owner/group/mode sets, and that can be confusing for the
developer.  We accept that confusion for the MAC code, because it's
required, but for uid/gid rules, just using a cached credential may be a
lot easier.  So what I'm suggesting is that you stick with the so_cred
model, because it's easy, but that we fix applications like SSH, which Do
The Wrong Thing, and kernel facilities that fall into the same boat.

I've already fixed the NFS client so it caches the mount-time credential
and uses that to rebind the socket, which differs from the old behavior,
where the VOP cred was used to rebind the socket.  If you had ipfw uid/gid
rules that affected the uid or gid that happened to be on hand, from then
until the point when the socket got disconnected, you'd have your NFS
mount impacted by the rules.  We ran into this with the MAC code: if the
NFS mount was over a high integrity interface, and the connection broke
(TCP, or whatever), and the first user to read from the filesystem was low
integrity, NFS would use the low integrity credential to rebind the
socket, and then packets would no longer be allowed out the high integrity
interface.  So NFS would keel over with EPERM.  This also, until recently,
caused an mbuf leak because NFS took mbuf delivery failures poorly. :-)

I am concerned netsmb might have some of the same issues, and need to look
at it, although I haven't yet.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

