Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 11 Jan 2012 00:04:56 +0100
From:      Giovanni Trematerra <gianni@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        jilles@freebsd.org, Attilio Rao <attilio@freebsd.org>, flo@freebsd.org, Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: pipe/fifo code merged.
Message-ID:  <CACfq091AT7OP6Bxd3g5me8=HR%2BykUrVA4zEaBw8qtBxb9Ue1fQ@mail.gmail.com>
In-Reply-To: <20120110153807.H943@besplex.bde.org>
References:  <CACfq093o9iVZKxCj58OR2hpCLDYTUTdxg_re_bEMYn2SrNrLCQ@mail.gmail.com> <20120110005155.S2378@besplex.bde.org> <CACfq09225iMYLe6p8jSiVhsDw_rqTyEHsvPdtZXLrQYT0-skzg@mail.gmail.com> <20120110153807.H943@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jan 10, 2012 at 10:41 AM, Bruce Evans <brde@optusnet.com.au> wrote:
> On Mon, 9 Jan 2012, Giovanni Trematerra wrote:
>
>> On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans <brde@optusnet.com.au> wrote=
:
>>>
>>>
>>> I would go the other way, and pessimize pipes to be like fifos. =A0Then
>>> optimize the socket layer under both. =A0Fifos are not important, but
>>> they are implemented on top of the socket layer which is important.
>>> Pipes are important. ...
>>> ...
>>>
>>> Linux-2.6.10 implements fifos as a small wrapper around pipes, while
>>> FreeBSD implements them as a large wrapper around sockets. =A0I hope th=
e
>>> former is what you do -- share most pipe code, without making it more
>>> complicated, and with making the fifo wrapper much simpler. =A0The Linu=
x
>>> code is much simpler and smaller, since for pipes it it doesn't
>>> implement direct mode, and for sockets it doesn't have to interact with
>>> the complicated socket layer.
>>
>>
>> If you read the patch, as I think you didn't, you'd see that there's no
>> wrapper
>> at all. fifo's code is just fifo_open, fifo_close and another couple of
>> helper
>> functions to deal with VFS, all the remaining code is shared with pipes
>> and
>> no complicated code was added.
>
>
> I think you don't want me to read the patch, since I would see too much
> detail starting with style bugs. =A0Anyway..

Thanks a lot for your review. I really appreciated it.

[skip]

I'll do my best to fix style bugs.

>
> In this file, I have most experience fixing this function (and open
> and close so that select and poll work). =A0The above looks simple, but
> has a complex interaction with layers above and below it. =A0Most of the
> details are in the socket layer. =A0You had to reimplement these in the
> pipe layer. =A0The most delicate point involving fs_wgen seems to be
> reimplemented correctly in fifo_iseof(). =A0Before I fixed this for
> fifos, poll and select on pipes (especially for EOF) was less broken
> than for fifos, partly because pipes are simpler -- they can't be
> reopened. =A0My tests in /usr/src/tools/regression/poll/ are hopefully
> enough to detect any regressions. =A0Some of the tests are intentionally
> left broken and/or expected to fail, to be bug for bug compatible with
> old kernel bugs.
>

ok. I'll try that regression test

>
> % @@ -295,42 +328,133 @@ pipe_zone_ctor(void *mem, int size, void
> % =A0static int
> % =A0pipe_zone_init(void *mem, int size, int flags)
> % =A0{
> % - =A0 =A0 struct pipepair *pp;
> % + =A0 =A0 struct umapipe *up;
> % + =A0 =A0 struct pipeinfo *pip;
> % + =A0 =A0 struct timespec ctime;
> % % - =A0 KASSERT(size =3D=3D sizeof(*pp), ("pipe_zone_init: wrong size")=
);
> % + =A0 =A0 KASSERT(size =3D=3D sizeof(*up), ("pipe_zone_init: wrong size=
"));
> % % - =A0 pp =3D (struct pipepair *)mem;
> % + =A0 =A0 up =3D (struct umapipe *)mem;
> % + =A0 =A0 vfs_timestamp(&ctime);
> % + =A0 =A0 pip =3D &up->pip[0];
> % + =A0 =A0 pip->pi_atime =3D pip->pi_mtime =3D pip->pi_ctime =3D ctime;
> % + =A0 =A0 pip->pi_ino =3D -1;
> % % - =A0 mtx_init(&pp->pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE=
);
> % + =A0 =A0 pip =3D &up->pip[1];
> % + =A0 =A0 pip->pi_atime =3D pip->pi_mtime =3D pip->pi_ctime =3D ctime;
> % + =A0 =A0 pip->pi_ino =3D -1;
> % +
> % + =A0 =A0 mtx_init(&up->pp.pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RE=
CURSE);
> % =A0 =A0 =A0 return (0);
>
> Timestamps seem to be broken. =A0jhb pointed out a problem in them, witho=
ut
> much detail (but I forget the exact detail). =A0Here it can't be right to
> have timestamps on both sides, since timestamps are a property of the
> file at its lowest level, not the file descriptor or even the file at the
> open file (fcntl) level. =A0For fifos, timestamps are even more a propert=
y
> of the file.
>



> % ...
> % +static int
> % +fifo_zone_init(void *mem, int size, int flags)
> % +{
> % + =A0 =A0 struct umafifo *up;
> % + =A0 =A0 struct pipeinfo *pip;
> % +
> % + =A0 =A0 KASSERT(size =3D=3D sizeof(*up), ("fifo_zone_init: wrong size=
"));
> % +
> % + =A0 =A0 up =3D (struct umafifo *)mem;
> % + =A0 =A0 pip =3D &up->pip[0];
> % + =A0 =A0 vfs_timestamp(&pip->pi_ctime);
> % + =A0 =A0 pip->pi_atime =3D pip->pi_mtime =3D pip->pi_ctime;
>
> For fifos, is wrong to have even 1 timestamp at this level (except
> unused ones won't hurt). =A0Fifos and their timstamps persist as disk
> files, and the timestamps for these disk files are managed by the
> underlying file system.

> Any timestamps at this level can only give
> possibilities for inconsistencies. =A0For example, this level uses
> vfs_timestamp(), but the file system level might use a different
> timestamp method, either because it is buggy or because it cannot
> represent timestamps with the granularity that vfs_timestamp() gives.
> (It is a bug that vfs_timestamp() is global.)
>
> % @@ -1219,7 +1403,7 @@ pipe_write(fp, uio, active_cred, flags, % =A0 =A0=
 =A0 }
> % % =A0 =A0 if (error =3D=3D 0)
> % - =A0 =A0 =A0 =A0 =A0 =A0 vfs_timestamp(&wpipe->pipe_mtime);
> % + =A0 =A0 =A0 =A0 =A0 =A0 vfs_timestamp(&pip->pi_mtime);
> % % =A0 =A0 /*
> % =A0 =A0 =A0 =A0* We have something to offer,
>
> It's doing timstamps in the same way for fifos as for pipes. =A0There mus=
t
> by a problem for stat() too. =A0The old fifo code uses fo_stat() to get
> back to the underlying file system which knows all the attributes (not
> just timestamps). =A0I can't see anything like that here. =A0The new fifo
> code seems to just use pipe_stat() which gives many fake attributes
> which are likely to differ from ones in the file system.
>
> % @@ -1492,12 +1726,12 @@ pipe_free_kmem(cpipe)
> % =A0 * shutdown the pipe
> % =A0 */
> % =A0static void
> % -pipeclose(cpipe)
> % +pipeclose(cpipe, isfifo)
> % =A0 =A0 =A0 struct pipe *cpipe;
> % + =A0 =A0 int isfifo;
> % =A0{
>
> I don't see any reason to have a different zone for fifos. =A0This
> complicates some interfaces ...

Just to not waste a sizeof(struct pipeinfo ) bytes.

>
> % @@ -1570,21 +1798,34 @@ pipeclose(cpipe)
> % =A0#ifdef MAC
> % =A0 =A0 =A0 =A0 =A0 =A0 =A0 mac_pipe_destroy(pp);
> % =A0#endif
> % - =A0 =A0 =A0 =A0 =A0 =A0 uma_zfree(pipe_zone, cpipe->pipe_pair);
> % + =A0 =A0 =A0 =A0 =A0 =A0 uma_zfree(isfifo ? fifo_zone : pipe_zone, cpi=
pe->pipe_pair);
>
> The new isfifo parameter is only used here.

That's the only way to understand which zone I need to use.

>
> % diff -r fee0771aad22 sys/sys/pipe.h
> % --- a/sys/sys/pipe.h =A0Sun Jan 01 19:00:13 2012 +0100
> % +++ b/sys/sys/pipe.h =A0Mon Jan 09 00:13:55 2012 +0100
> % @@ -28,6 +28,8 @@
> % =A0#error "no user-servicable parts inside"
> % =A0#endif
> % % +#include <sys/selinfo.h>
> % +
>
> This namespace pollution was intentionally left out.
>
> % =A0/*
> % =A0 * Pipe buffer size, keep moderate in value, pipes take kva space.
> % =A0 */
> % @@ -103,16 +105,12 @@ struct pipe {
> % =A0 =A0 =A0 struct =A0pipebuf pipe_buffer; =A0 =A0/* data storage */
> % =A0 =A0 =A0 struct =A0pipemapping pipe_map; =A0 /* pipe mapping for dir=
ect I/O */
> % =A0 =A0 =A0 struct =A0selinfo pipe_sel; =A0 =A0 =A0 /* for compat with =
select */
>
> <sys/selinfo.h> was a prerequisite for this file.

I'll fix it.

>
> % - =A0 =A0 struct =A0timespec pipe_atime; =A0 =A0/* time of last access =
*/
> % - =A0 =A0 struct =A0timespec pipe_mtime; =A0 =A0/* time of last modify =
*/
> % - =A0 =A0 struct =A0timespec pipe_ctime; =A0 =A0/* time of status chang=
e */
> % =A0 =A0 =A0 struct =A0sigio *pipe_sigio; =A0 =A0 =A0/* information for =
async I/O */
> % =A0 =A0 =A0 struct =A0pipe *pipe_peer; =A0 =A0 =A0 =A0/* link with othe=
r direction */
> % =A0 =A0 =A0 struct =A0pipepair *pipe_pair; =A0 =A0/* container structur=
e pointer */
> % =A0 =A0 =A0 u_int =A0 pipe_state; =A0 =A0 =A0 =A0 =A0 =A0 /* pipe statu=
s info */
> % =A0 =A0 =A0 int =A0 =A0 pipe_busy; =A0 =A0 =A0 =A0 =A0 =A0 =A0/* busy f=
lag, mostly to handle
> rundown sanely */
> % =A0 =A0 =A0 int =A0 =A0 pipe_present; =A0 =A0 =A0 =A0 =A0 /* still pres=
ent? */
> % - =A0 =A0 ino_t =A0 pipe_ino; =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* fake inode=
 for stat(2) */
>
> Both this and the timestamps should never have been here, since they
> are per-"disk"-file but they were per-pipe-endpoint (see below).
>
> % =A0};
> % % =A0/*
> % @@ -138,5 +136,24 @@ struct pipepair {
> % =A0#define PIPE_UNLOCK(pipe) =A0 =A0mtx_unlock(PIPE_MTX(pipe))
> % =A0#define PIPE_LOCK_ASSERT(pipe, type) =A0mtx_assert(PIPE_MTX(pipe), (=
type))
> % % +#define PIPE_CNT(pipe) =A0 =A0 ((pipe->pipe_state & PIPE_DIRECTW) ? =
\
> % + =A0 =A0 =A0 =A0 =A0 =A0 pipe->pipe_map.cnt : pipe->pipe_buffer.cnt)
> % +
> % +/*
> % + * =A0Per-file descriptor structure.
> % + */
>
> I was very confused by this comment. =A0It is sort of backwards. =A0This
> structure is for the lowest level, which is the pipepair level for
> pipes and the disk level for for fifos.
> =A0(But we have to expand the disk level, first from dinodes/directory
> =A0entries to inodes/directory blocks, then from inodes to vnodes, then
> =A0append pipepairs). =A0The open file level is a level or two above that=
,
> =A0and the file descriptor level is further above.
>
> =A0The levels are stacked even more confusingly for pipes. =A0Above the
> =A0pipepair level, there is the pipe level, but this is modes sideways
> =A0than fully above. =A0Open files are more at the level of pipes than
> =A0pipepairs. =A0"pipe" in normal usage can mean either "pipe" or
> =A0"pipepair" in this implementation.)
> The timestamps and inode were at a wrong level. =A0You made a step toward=
s
> fixing this by moving them down, but the comment says that they are now
> at the highest level.

The comment is wrong I'll change it.

>
> % +struct pipeinfo {
> % + =A0 =A0 struct =A0pipe =A0 =A0*pi_rpipe; =A0 =A0 =A0/* pipe we read f=
rom */
> % + =A0 =A0 struct =A0pipe =A0 =A0*pi_wpipe; =A0 =A0 =A0/* pipe we write =
to */
> % + =A0 =A0 struct =A0timespec pi_atime; =A0 =A0 =A0/* time of last acces=
s */
> % + =A0 =A0 struct =A0timespec pi_mtime; =A0 =A0 =A0/* time of last modif=
y */
> % + =A0 =A0 struct =A0timespec pi_ctime; =A0 =A0 =A0/* time of status cha=
nge */
> % + =A0 =A0 ino_t =A0 pi_ino; =A0 =A0 =A0 =A0 /* fake pipe inode for stat=
(2) */
>
> Indentation error.
>
> % +};
>
> I was confused by the layering for this struct too. =A0This struct seems
> to be needed only to swap rpipe with wpipe for the 2 ends of a pipe.
> This is confusing, but I can't see any better way at the moment.
> Putting the other fields in it just gives confusion which leads to
> bugs and minor resource wastage. =A0All the other fields must be per-file
> at the lowest level, so any duplication of them gives either bugs
> (if they are different) or just wastes resources time to write them
> and check that they are the same, and and space to hold copies).
> These fields belong in the pipepair struct. =A0This moves them down
> (more sideways) another level.
>


> POSIX is fuzzy about whether the attributes are unique for the 2 ends
> of a pipe. =A0It requires all st_ timestamps and some other attributes
> to be "meaningful" unless otherwise specified and doesn't specify
> anything otherwise for pipes, at least in an old draft. =A0But for pipe()
> it says:
>
> 27884 =A0 =A0 =A0 =A0 =A0 =A0Upon successful completion, pipe( ) shall ma=
rk for update
> the st_atime, st_ctime, and st_mtime
> 27885 =A0 =A0 =A0 =A0 =A0 =A0fields of the pipe.
>
> Assuming that this part is not fuzzy, "the ... st_atime... fields of
> the pipe" in it must refer to unique fields.
>
> Similarly for pi_ino.
>
> These fields shouldn't be used at all for fifos, as mentioned above.
> File systems still maintain separate non-copies which pipe_stat()
> hides. =A0This doesn't matter much for timestamps and st_ino. =A0It
> matters for modes and permissions, etc.

I'll fix it
Just to be clear. timestamps of last access or last modify aren't saved
back into file system for fifo.
it's hard to fix this now so I might look at it later.

>
> After moving timestamps and pi_ino into the pipepair struct, the new
> info struct is reduced to 2 pointers into the pipepair struct.
> Unfortunately, the pointer to the pipeinfo struct cannot be simply
> replaced by a pointer to the pipepair struct (and dereferencing the
> latter instead of the former) because of complications for the
> separate ends of a pipe:
>
> @ @@ -372,17 +554,17 @@ kern_pipe(struct thread *td, int fildes[
> @ =A0 =A0 =A0 =A0* to avoid races against processes which manage to dup()=
 the read
> @ =A0 =A0 =A0 =A0* side while we are blocked trying to allocate the write=
 side.
> @ =A0 =A0 =A0 =A0*/
> @ - =A0 =A0 finit(rf, FREAD | FWRITE, DTYPE_PIPE, rpipe, &pipeops);
> @ + =A0 =A0 finit(rf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[0], &pipeops);
> @ =A0 =A0 =A0 error =3D falloc(td, &wf, &fd, 0);
> @ =A0 =A0 =A0 if (error) {
> @ =A0 =A0 =A0 =A0 =A0 =A0 =A0 fdclose(fdp, rf, fildes[0], td);
> @ =A0 =A0 =A0 =A0 =A0 =A0 =A0 fdrop(rf, td);
> @ =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* rpipe has been closed by fdrop(). */
> @ - =A0 =A0 =A0 =A0 =A0 =A0 pipeclose(wpipe);
> @ + =A0 =A0 =A0 =A0 =A0 =A0 pipeclose(wpipe, 0);
> @ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (error);
> @ =A0 =A0 =A0 }
> @ =A0 =A0 =A0 /* An extra reference on `wf' has been held for us by fallo=
c(). */
> @ - =A0 =A0 finit(wf, FREAD | FWRITE, DTYPE_PIPE, wpipe, &pipeops);
> @ + =A0 =A0 finit(wf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[1], &pipeops);
>
> One end used to get rpipe and the other end wpipe. =A0This is used mainly
> by read/write to go in the correct direction. =A0Now, the 2 ends get
> pointers to different pipeinfo structs, with the only differences in
> the pipeinfo structs being:
> - swap rpipe and wpipe. =A0This is used mainly by read/write as before

rpipe and wpipe aren't swapped.

for pipes:
pip[0].pi_rpipe =3D=3D pip[0].pi_wpipe
pip[1].pi_rpipe =3D=3D pip[1].pi_wpipe

pipe[0].pi_rpipe !=3D pipe[1].pi_rpipe
pipe[0].pi_wpipe !=3D pipe[1].pi_wpipe

for fifos
pip[0].pi_rpipe !=3D pip[0].pi_wpipe

> - different timestamps (to implement bugs and resource wastage)
> - pi_ino should be the same in both (just waste space).

Well posix says:
"The st_ino and st_dev fields taken together uniquely identify the file
within the system."
So that's a matter of what a file is: pipepair or and end of if?

>
> For fifos, rpipe =3D=3D wpipe so the 2 pipeinfo structs should be the sam=
e
> and the complications are not needed.
>

Err, that's true for pipes.

>
> There are some more complications resource wastages related to having
> 2 pipe ends when only 1 is needed:
> - struct pipe is in struct pipepair twice. =A0This can't quite be fixed
> =A0by moving it to struct pipeinfo. =A0If both pipe structs for malloc()e=
d,
> =A0then the 2 pointers in struct pipeinfo would be enough for accessing
> =A0them, but I think the space wastage is too small to fix like this.
> - some code was symmetrical relative to rpipe and wpipe and it didn't
> =A0care in which order they were in. =A0Now rpipe can be equal to wpipe,
> =A0some of this code is no longer symmetrical because it does things
> =A0like free(rpipe), while the rest of it is still symmetrical because
> =A0it does things like initalizing rpipe->foo to the same value as
> =A0wpipe->foo. =A0You had to make some changes in this area. =A0Example
> =A0of a change in this area:
>
> % @@@ -349,18 +473,76 @@ kern_pipe(struct thread *td, int fildes[
> % @ =A0 =A0 /* Only the forward direction pipe is backed by default */
> % @ =A0 =A0 if ((error =3D pipe_create(rpipe, 1)) !=3D 0 ||
> % @ =A0 =A0 =A0 =A0 (error =3D pipe_create(wpipe, 0)) !=3D 0) {
> % @- =A0 =A0 =A0 =A0 =A0 =A0pipeclose(rpipe);
> % @- =A0 =A0 =A0 =A0 =A0 =A0pipeclose(wpipe);
> % @+ =A0 =A0 =A0 =A0 =A0 =A0pipeclose(rpipe, isfifo);
> % @+ =A0 =A0 =A0 =A0 =A0 =A0pipeclose(wpipe, isfifo);
>
> =A0Not really symmetrical. =A0There must be 2 closes if there are 2 open
> =A0fd's (1 in each direction). =A0But what if there is only 1 open fd for
> =A0a fifo with rpipe =3D wpipe? =A0pipe_dtor() is more obviously correct,
> =A0since it only does 1 pipeclose() for fifos. =A0The new isfifo flag is
> =A0only used in pipeclose() to select the zone.

That's needed to free UMA allocated memory.

>
> % @ =A0 =A0 =A0 =A0 =A0 =A0 return (error);
> % @ =A0 =A0 }
> % @ % @ =A0 =A0 =A0 =A0 rpipe->pipe_state |=3D PIPE_DIRECTOK;
> % @ =A0 =A0 wpipe->pipe_state |=3D PIPE_DIRECTOK;
> % @ % @+ =A0 =A0 =A0 =A0if (isfifo) {
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[0].pi_rpipe =3D rpipe;
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[0].pi_wpipe =3D wpipe;
> % @+ =A0 =A0} else {
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[0].pi_rpipe =3D rpipe;
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[0].pi_wpipe =3D rpipe;
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[1].pi_rpipe =3D wpipe;
> % @+ =A0 =A0 =A0 =A0 =A0 =A0up->pip[1].pi_wpipe =3D wpipe;
> % @+ =A0 =A0}
>
> =A0This seems slightly broken. =A0Fifos should only use 1 pipeinfo struct=
,
> =A0but other fifo code initializes both up->pip[0] and up->pip[1].

your interpretation seems wrong.

> Having only 1 pipe end initialized would destroy any remaining symmetry
> but would make it clear which 1 is actually used.
>
> % +
> % +extern struct fileops pipeops;
> % +
> % +int pipe_ctor(struct pipeinfo **ppip, struct thread *td);
> % +void pipe_dtor(struct pipeinfo *pip);
>
> Indentation errors.
>
> % % =A0#endif /* !_SYS_PIPE_H_ */
>
> Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CACfq091AT7OP6Bxd3g5me8=HR%2BykUrVA4zEaBw8qtBxb9Ue1fQ>