From owner-freebsd-arch  Tue May  7  6:47:29 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3033337B405; Tue,  7 May 2002 06:47:23 -0700 (PDT)
Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.12.3/8.12.3) with SMTP id g47Dkvb5076683;
	Tue, 7 May 2002 09:46:57 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Tue, 7 May 2002 09:46:56 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Poul-Henning Kamp <phk@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: syscall changes to deal with 32->64 changes.
In-Reply-To: <200205070815.g478Fn180961@apollo.backplane.com>
Message-ID: <Pine.NEB.3.96L.1020507093445.76283B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-arch.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-arch>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-arch>
X-Loop: FreeBSD.ORG


On Tue, 7 May 2002, Matthew Dillon wrote:

>     I tried to add new syscalls to the existing vector but so many
>     syscalls had to be changed to support 64 bit time_t's that it
>     became a huge mess... so much so that I would expect the other
>     BSD's to cry foul on us if we tried to do it with the existing
>     vector.  It will be far, far cleaner to simply implement an
>     entirely new syscall vector / ELF identifier.

Sounds like we actually have relatively firm consensus on this point [thus
far].  The only real concern has been the level of effort and the chances
of completing it in a reasonable time.

>     In regards to other things that may need to change size:  A complete
>     audit will have to be performed.  I would be happy to take a run through.
>     Getting it right the first time is extremely important.  A bunch of 
>     things starting leaking out of the woodwork as I was playing around 
>     with 64 bit time_t's.  At the very least I would pad the structures 
>     to handle things like 64 bit dev_t, ino_t, and file flags, and I would
>     even consider padding now 96 bit structures like timespec (on IA32 with
>     64 bit time_t's + nanoseconds long = 96 bits) to 128 bits across the 
>     board.  It might also be worthwhile to make uid_t, gid_t, and pid_t
>     64 bits to support probable future work in those areas.

It's certainly worth talking about, in that it would be a good breaking
point for any of these.  One thing to keep in mind is how much inode
growth is acceptable -- we get some as part of the natural process of
moving to 64-bit block pointers, but we shouldn't let it get out of
control or risk serious reduction in cached inode information for the same
memory footprint. 
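
To make the size arithmetic in the quoted text concrete, here is a rough
sketch of a timespec padded from 96 to 128 bits.  The struct and field
names are hypothetical, not a proposed ABI; the only point is that explicit
padding gives the same 16-byte layout on 32-bit and 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: 64-bit seconds plus a
 * 32-bit nanosecond field is 96 bits; an explicit reserved word brings
 * it to 128 bits regardless of the platform's natural alignment.
 */
struct padded_timespec {
	int64_t		tv_sec;		/* 64-bit time_t */
	int32_t		tv_nsec;	/* nanoseconds, fits in 32 bits */
	uint32_t	tv_pad;		/* reserved, assumed to be zero */
};

/* Compile-time checks that the layout is what the comment claims. */
static_assert(sizeof(struct padded_timespec) == 16,
    "padded timespec should be 128 bits");
static_assert(offsetof(struct padded_timespec, tv_nsec) == 8,
    "nanoseconds should follow the 64-bit seconds field");
```

The same padding habit is what costs space if it is applied to every
on-disk inode field, which is the growth concern above.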

My understanding from talking with Kirk and Poul-Henning is that ino_t is
definitely on the list already, as are a number of the others at the file
system image layer (such as block pointers, et al).  They should already
have much of this underway, and the temptation is to allow them to do the
grunt work if they're already doing it :-).

dev_t I don't really have an opinion about. 

For file flags, last I checked, the current leaning was to add a new flags
field to the inode for internal system use, and possibly to break out the
current field into two fields.  All of this is at the fs image level,
however.  This would allow moving the snapshot flag out of the normal
field's range and reducing the amount of hard-to-read flag masking in the
UFS2 code.  The system flags would also allow for some EA information
caching, improving performance for ACLs and related things.
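
As a rough illustration of why a separate system flags field cuts down on
masking, consider the sketch below.  All names here (hyp_inode_flags,
HYP_SYSFLAG_SNAPSHOT, and the helpers) are made up for the example and are
not the actual UFS2 fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical system-only flag bit; not a real UFS2 constant. */
#define	HYP_SYSFLAG_SNAPSHOT	0x0001u

/* Hypothetical split: one word settable by users, one for the system. */
struct hyp_inode_flags {
	uint32_t	user_flags;	/* what a chflags-style call may set */
	uint32_t	sys_flags;	/* internal state, e.g. snapshot */
};

/* Replacing the user flags needs no mask to protect system bits. */
static void
hyp_set_user_flags(struct hyp_inode_flags *f, uint32_t flags)
{
	f->user_flags = flags;
}

/* System state is tested in its own word. */
static int
hyp_is_snapshot(const struct hyp_inode_flags *f)
{
	return ((f->sys_flags & HYP_SYSFLAG_SNAPSHOT) != 0);
}
```

With a single shared field, every user-initiated update would have to mask
out and then restore the system bits; with two fields that bookkeeping
goes away.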

No opinion on the time type stuff, but I'm sure phk has an opinion :-). 

WRT uid_t and gid_t: I'm not sure if there's enough benefit here.  The
temptation is certainly there, but unlike things like ino_t, this stuff
tends to get mixed up in data files a lot.  Same with time type stuff. 
This is more likely to cause problems with persistent application state --
for example, password databases.
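
A small sketch of the persistence problem, using made-up record layouts
rather than any real database format: if an application writes a struct
containing the uid straight to disk, widening the type changes the record
size and old files no longer parse.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uid type before and after widening. */
typedef uint32_t hyp_old_uid_t;
typedef uint64_t hyp_new_uid_t;

/* The same application record compiled against each type. */
struct hyp_rec_old {
	hyp_old_uid_t	uid;
	uint32_t	shell_id;	/* illustrative second field */
};

struct hyp_rec_new {
	hyp_new_uid_t	uid;
	uint32_t	shell_id;
};

/* The records differ in size, so files written by one build cannot be
 * read record-by-record by the other. */
static_assert(sizeof(struct hyp_rec_old) == 8,
    "old record is two 32-bit words");
static_assert(sizeof(struct hyp_rec_new) > sizeof(struct hyp_rec_old),
    "widening the uid grows the on-disk record");
```

Applications that already store ids in an explicit fixed-width format are
not affected this way, but many do not.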

>     In regards to the 5.0R question that comes up later in the thread... 
>     I just don't know.  I will say that creating a new syscall vector
>     cannot be done piecemeal... you have to get it *all* in from the
>     get-go or you create huge issues with things like bootstrapping new
>     systems and general compatibility and useability, etc.. 

Well, I think it's possible to start doing back-end stuff in the kernel
and just "cast down", but I haven't given it too much think-through yet.
Where we'll really need a flag day is with the default compiled ABI for
user binaries.  It depends how we handle the change to some of the types,
I suppose -- whether we use "holding types" during a changeover, etc.  One
of the reasons I mentioned the "one week" figure in my earlier responses
is that when we do the cut-over, we need to do it decisively and without a
lot of lag in getting stuff working.  At the filesystem level, introducing
new types doesn't hurt us too much (ufs_ino_t, ufs2_ino_t, et al), but at
the system call layer it could be that doing so would hurt too much.
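
By "cast down" I mean something along these lines.  This is only a sketch
under my own assumptions: the type names and the helper are hypothetical,
and returning EOVERFLOW on a value that does not fit is one possible
policy, not a decided one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef int64_t hyp_ktime_t;	/* hypothetical wide kernel-internal type */
typedef int32_t hyp_abi32_time_t; /* hypothetical legacy ABI type */

/*
 * Narrow a kernel-internal value for the existing system call ABI.
 * Returns 0 on success, or EOVERFLOW if the value cannot be
 * represented in the narrower type (an assumed policy).
 */
static int
hyp_cast_down_time(hyp_ktime_t in, hyp_abi32_time_t *out)
{
	if (in < INT32_MIN || in > INT32_MAX)
		return (EOVERFLOW);
	*out = (hyp_abi32_time_t)in;
	return (0);
}
```

The back end can then move to the wide type ahead of time, with the
narrowing confined to the system call boundary until the user ABI flips.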

I guess the one opinion I haven't heard yet, and am a little surprised not
to have heard is:

  No, we shouldn't do this on architectural grounds. 

We've heard "yes" in various flavors, including moderated "yes if we can
manage it by the release".  Not to invite a bikeshed, but if there's going
to be a strong argument against such a change, it would be nice to hear it
sooner.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

