Date: Mon, 25 Aug 2014 20:04:17 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: atomic_load_acq_int in sequential_heuristic
Message-ID: <20140825180417.GB23088@dft-labs.eu>
In-Reply-To: <20140825172755.GD2737@kib.kiev.ua>
References: <20140824162331.GW2737@kib.kiev.ua> <20140824164236.GX2737@kib.kiev.ua>
 <20140825005659.GA14344@dft-labs.eu> <20140825073404.GZ2737@kib.kiev.ua>
 <20140825081526.GB14344@dft-labs.eu> <20140825083539.GB2737@kib.kiev.ua>
 <20140825091056.GC14344@dft-labs.eu> <20140825111000.GC2737@kib.kiev.ua>
 <20140825130433.GD14344@dft-labs.eu> <20140825172755.GD2737@kib.kiev.ua>
On Mon, Aug 25, 2014 at 08:27:55PM +0300, Konstantin Belousov wrote:
> On Mon, Aug 25, 2014 at 03:04:33PM +0200, Mateusz Guzik wrote:
> > On Mon, Aug 25, 2014 at 02:10:01PM +0300, Konstantin Belousov wrote:
> > > On Mon, Aug 25, 2014 at 11:10:56AM +0200, Mateusz Guzik wrote:
> > > > On Mon, Aug 25, 2014 at 11:35:39AM +0300, Konstantin Belousov wrote:
> > > > > > +			atomic_set_int(&fp->f_flag, FHASLOCK);
> > > > > You misspelled FRDAHEAD as FHASLOCK, below as well.
> > > > > Was this tested?
> > > > >
> > > >
> > > > Oops, damn copy-pasto. Sorry.
> > > >
> > > > > > +			VOP_UNLOCK(vp, 0);
> > > > > >  		} else {
> > > > > > -			do {
> > > > > > -				new = old = fp->f_flag;
> > > > > > -				new &= ~FRDAHEAD;
> > > > > > -			} while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
> > > > > > +			atomic_clear_int(&fp->f_flag, FHASLOCK);
> > > > > So what about extending the vnode lock to cover the flag reset?
> > > > >
> > > >
> > > > Sure.
> > > >
> > > > So this time I tested it properly and found out it is impossible to
> > > > disable the hint. The test is:
> > > >
> > > > -1 is truncated and then read from intptr_t, which yields a big positive
> > > > number instead. Other users in the function use int tmp to work around
> > > > this issue.
> > > Could you provide me with the test case which demonstrates the problem?
> > >
> >
> > Nothing special:
> > https://people.freebsd.org/~mjg/patches/F_READAHEAD.c
> And how did you verify that fcntl(F_READAHEAD, -1) did not work?
> I ended up adding printf() to kern_fcntl() to see the arg value.
>

3 uprintfs: one with the value, and then one in each if branch.

> >
> > > The fcntl(2) prototype in sys/fcntl.h is variadic, so an int arg
> > > is not promoted. On the other hand, syscalls.master declares arg as long.
> > > Did you try passing -1L as the third argument to disable?
> > >
> >
> > Yes, -1L deals with the problem. I would still argue that using 'tmp'
> > like the rest of the function would not hurt as a cheap solution.
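The truncation discussed above can be modeled in user space. The sketch below is a hypothetical model, not kernel code, and it rests on two assumptions. First, it assumes an LP64 platform where long is 64 bits. Second, it assumes the common amd64 behavior in which a caller passing an int -1 leaves the upper half of the 64-bit argument register zeroed, so a callee reading it as long sees 0x00000000ffffffff. That behavior is typical but not guaranteed. Actually reading a variadic int with va_arg(ap, long) is undefined behavior in C, so the sketch simulates the register contents instead. The helper names are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): what a 64-bit callee reads when the
 * caller passed a 32-bit int through a variadic call and the upper half of
 * the register happened to be zero (typical on amd64, not guaranteed).
 * Assumes LP64, i.e. long is 64 bits.
 */
static long
arg_from_variadic_int(int passed)
{
	uint64_t reg = (uint32_t)passed;	/* zero-extended, not sign-extended */

	return ((long)reg);
}

/*
 * The "int tmp" pattern used by other commands in kern_fcntl(): truncate
 * the argument back to 32 bits before looking at its sign.  Converting an
 * out-of-range value to int is implementation-defined; GCC and Clang wrap.
 */
static int
arg_as_int(long arg)
{
	int tmp = (int)arg;

	return (tmp);
}
```

Under these assumptions, arg_from_variadic_int(-1) yields 4294967295, so an `arg >= 0` test takes the "enable" branch. Truncating through an int recovers -1. A caller that writes -1L instead puts a properly sign-extended 64-bit value in the register, which is why -1L works.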
> This would deliberately break the current ABI (which takes the argument
> as long for F_READAHEAD), which is not acceptable.
>

Ok.

> I do think that there is a bug in the "-1" stuff, but it is in the compat32
> shims. The compat/freebsd32/syscalls.master does not provide the compat
> for fcntl(2), which means that 32-bit fcntl(2) does not work when either
> sign extension is needed (the F_READAHEAD case), or on big-endian
> machines. On i386, it did not practically matter before F_READAHEAD,
> since x86 is little-endian and flags passed as arg did not touch the
> sign bit.
>
> Note that the fcntl(2) man page is wrong; it claims that the optional argument
> arg is int. It cannot be true, since a pointer on an LP64 platform cannot
> fit into an int. SUSv4 is explicit in describing which command
> takes which type; our man page must be fixed, but this is for later.
>
> See the patch at the end of the reply for the fix. It needs a sysent
> regen for the actual build.
>

I tested the patch and it fixes the problem.

> > /*
> >  * Exclusive lock synchronizes against f_seqcount reads and writes in
> >  * sequential_heuristic().
> >  */
> >
> > > Another place to add the locking annotation is the struct file in
> > > sys/file.h. Now f_seqcount is 'protected' by the vnode lock.
> > > I am not sure how to express the locking model concisely.
> > >
> >
> > /*
> >  * (a) f_vnode lock required (shared allows both reads and writes)
> >  */
> Ok.
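The compat32 point above can be shown with a per-command widening step. A 32-bit process hands the kernel a raw 32-bit value. If that value is a flag word, it should be zero-extended. If it is a signed quantity like the F_READAHEAD byte count, it should be sign-extended. The sketch below is hypothetical. The command names and values are made up and are not the real `<sys/fcntl.h>` constants. It is not the patch referred to above, which is not reproduced in this message. It also does not model the byte-order problem on big-endian machines.

```c
#include <stdint.h>

/*
 * Hypothetical command numbers for illustration only; these are not the
 * real F_* values from <sys/fcntl.h>.
 */
enum demo_cmd {
	DEMO_F_SETFL,		/* argument is a flag word: zero-extend */
	DEMO_F_READAHEAD,	/* argument is a signed byte count: sign-extend */
};

/*
 * Sketch of how a 32-bit compat shim could widen the raw 32-bit argument
 * before handing it to a 64-bit kern_fcntl().  The (int32_t) conversion of
 * values above INT32_MAX is implementation-defined; GCC and Clang wrap.
 */
static long
widen_compat32_arg(enum demo_cmd cmd, uint32_t raw)
{
	switch (cmd) {
	case DEMO_F_READAHEAD:
		return ((long)(int32_t)raw);	/* 0xffffffff becomes -1 */
	default:
		return ((long)raw);		/* 0xffffffff stays 4294967295 */
	}
}
```

The design choice is that the shim, not kern_fcntl(), knows that the argument came from a 32-bit process. The native LP64 path therefore keeps taking a full long, which preserves the ABI concern raised above.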
>

diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
index 7abdca0..52fc01a 100644
--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -476,7 +476,6 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
 	struct vnode *vp;
 	cap_rights_t rights;
 	int error, flg, tmp;
-	u_int old, new;
 	uint64_t bsize;
 	off_t foffset;
 
@@ -760,26 +759,24 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
 			error = EBADF;
 			break;
 		}
+		vp = fp->f_vnode;
+		/*
+		 * Exclusive lock synchronizes against f_seqcount reads and
+		 * writes in sequential_heuristic().
+		 */
+		error = vn_lock(vp, LK_EXCLUSIVE);
+		if (error != 0) {
+			fdrop(fp, td);
+			break;
+		}
 		if (arg >= 0) {
-			vp = fp->f_vnode;
-			error = vn_lock(vp, LK_SHARED);
-			if (error != 0) {
-				fdrop(fp, td);
-				break;
-			}
 			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
-			VOP_UNLOCK(vp, 0);
 			fp->f_seqcount = (arg + bsize - 1) / bsize;
-			do {
-				new = old = fp->f_flag;
-				new |= FRDAHEAD;
-			} while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
+			atomic_set_int(&fp->f_flag, FRDAHEAD);
 		} else {
-			do {
-				new = old = fp->f_flag;
-				new &= ~FRDAHEAD;
-			} while (!atomic_cmpset_rel_int(&fp->f_flag, old, new));
+			atomic_clear_int(&fp->f_flag, FRDAHEAD);
 		}
+		VOP_UNLOCK(vp, 0);
 		fdrop(fp, td);
 		break;
 
diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index f1d19ac..98823f3 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -438,7 +438,8 @@ static int
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
-	if (atomic_load_acq_int(&(fp->f_flag)) & FRDAHEAD)
+	ASSERT_VOP_LOCKED(fp->f_vnode, __func__);
+	if (fp->f_flag & FRDAHEAD)
 		return (fp->f_seqcount << IO_SEQSHIFT);
 
 	/*
diff --git a/sys/sys/file.h b/sys/sys/file.h
index b7d358b..856f799 100644
--- a/sys/sys/file.h
+++ b/sys/sys/file.h
@@ -143,6 +143,7 @@ struct fileops {
  *
  * Below is the list of locks that protects members in struct file.
  *
+ * (a) f_vnode lock required (shared allows both reads and writes)
  * (f) protected with mtx_lock(mtx_pool_find(fp))
  * (d) cdevpriv_mtx
  * none	not locked
@@ -168,7 +169,7 @@ struct file {
 	/*
 	 *  DTYPE_VNODE specific fields.
 	 */
-	int		f_seqcount;	/* Count of sequential accesses. */
+	int		f_seqcount;	/* (a) Count of sequential accesses. */
 	off_t		f_nextoff;	/* next expected read/write offset. */
 	union {
 		struct cdev_privdata *fvn_cdevpriv;

-- 
Mateusz Guzik <mjguzik gmail.com>