Date: Tue, 24 May 2016 22:20:40 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Kevin Lo <kevlo@freebsd.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r300423 - in head/sys: fs/ext2fs ufs/ffs
Message-ID: <20160524195127.C931@besplex.bde.org>
In-Reply-To: <201605221431.u4MEVKXC007524@repo.freebsd.org>
References: <201605221431.u4MEVKXC007524@repo.freebsd.org>
On Sun, 22 May 2016, Kevin Lo wrote:

> Log:
>   arc4random() returns 0 to (2**32)−1, use an alternative to initialize
>   i_gen if it's zero rather than a divide by 2.
>
>   With inputs from delphij, mckusick, rmacklem
>
>   Reviewed by:	mckusick
> ...
> Modified: head/sys/fs/ext2fs/ext2_alloc.c
> ==============================================================================
> --- head/sys/fs/ext2fs/ext2_alloc.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/fs/ext2fs/ext2_alloc.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -408,7 +408,8 @@ ext2_valloc(struct vnode *pvp, int mode,
> 	/*
> 	 * Set up a new generation number for this inode.
> 	 */
> -	ip->i_gen = arc4random();
> +	while (ip->i_gen == 0 || ++ip->i_gen == 0)
> +		ip->i_gen = arc4random();

This is a correct implementation of ffs's intended method, but ffs's intended method is wrong (see below for its wrongness).

Correctness depends on i_gen having type uint32_t in ext2fs. This makes the ++ip->i_gen == 0 test undead, so i_gen is re-randomized occasionally (averaged over all inodes, once for every ~2 billionth reuse of an inode, which is practically never). Bugs in ffs prevent it from being done even that often there. So the re-randomization is almost useless. I think it is slightly worse than useless, since it may give the same i_gen immediately, while always incrementing (but skipping 0) always gives a new i_gen.

ext2fs might not need this at all. For ffs, the special case of i_gen == 0 must be handled because we still pretend to support file systems created by newfs versions almost 25 years old. newfs didn't initialize di_gen back then, so all inodes started with di_gen == 0. ext2fs is less than 25 years old, so it has less history to support.
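To make the uint32_t behaviour of the new ext2_valloc() loop concrete, here is a small userland sketch. It is not kernel code: fake_random() is a hypothetical, deterministic stand-in for arc4random() so that the wrap and the "skip 0" path can be shown reproducibly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for arc4random(): a short fixed sequence so the
 * behaviour is reproducible.  Its first value is 0, to exercise the
 * "skip 0" path.
 */
static const uint32_t fake_seq[] = { 0, 12345, 67890 };
static unsigned fake_pos;

static uint32_t
fake_random(void)
{
	return (fake_seq[fake_pos++ % 3]);
}

/*
 * The r300423 ext2_valloc() loop applied to a 32-bit generation number.
 * A nonzero value is incremented; 0 (uninitialized) or a wrap from
 * UINT32_MAX to 0 triggers a new draw.  Because the loop condition is
 * re-evaluated after each draw, a nonzero draw is also incremented once
 * (and a draw of UINT32_MAX wraps and is drawn again).
 */
static uint32_t
next_gen32(uint32_t gen)
{
	while (gen == 0 || ++gen == 0)
		gen = fake_random();
	assert(gen != 0);	/* 0 is reserved for "uninitialized" */
	return (gen);
}
```

For example, next_gen32(41) returns 42. With fake_pos reset to 0, next_gen32(UINT32_MAX) wraps to 0, draws 0 (rejected), then draws 12345 and returns 12346, so the wrap is the only place where 32-bit re-randomization happens.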
>
> 	vfs_timestamp(&ts);
> 	ip->i_birthtime = ts.tv_sec;
>
> Modified: head/sys/fs/ext2fs/ext2_vfsops.c
> ==============================================================================
> --- head/sys/fs/ext2fs/ext2_vfsops.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/fs/ext2fs/ext2_vfsops.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -998,7 +998,8 @@ ext2_vget(struct mount *mp, ino_t ino, i
> 	 * already have one. This should only happen on old filesystems.
> 	 */
> 	if (ip->i_gen == 0) {
> -		ip->i_gen = random() + 1;
> +		while (ip->i_gen == 0)
> +			ip->i_gen = arc4random();

This is correct, but might be unnecessary (see above). "on old filesystems" was copied from ffs, where it meant "on file systems created by versions of newfs that didn't initialize di_gen". Such file systems are restricted to ffs1. But i_gen == 0 still occurs due to bugs in newfs: it is missing this loop, so it sometimes sets i_gen to the random value 0, and we can't/don't tell the difference between this and uninitialized.

I think ext2fs is more like ffs2 in a relevant way here. Both have the feature of speeding up newfs by only writing a few inodes. Thus most inode allocations occur in the kernel, and it hardly matters if newfs initialized i_gen for the few inodes that it wrote. extfs and ext2fs also have an inode non-clearing feature on unlink that might be relevant. I forget if ffs2 has this.

So the code should be simplified by never expecting newfs to initialize i_gen to nonzero. It already almost does this, except in the comment. For this, it is necessary to skip i_gen == 0 (mod 2**32).
> 		if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0)
> 			ip->i_flag |= IN_MODIFIED;
> 	}
>
> Modified: head/sys/ufs/ffs/ffs_alloc.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_alloc.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/ufs/ffs/ffs_alloc.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -1102,8 +1102,8 @@ dup_alloc:
> 	/*
> 	 * Set up a new generation number for this inode.
> 	 */
> -	if (ip->i_gen == 0 || ++ip->i_gen == 0)
> -		ip->i_gen = arc4random() / 2 + 1;
> +	while (ip->i_gen == 0 || ++ip->i_gen == 0)
> +		ip->i_gen = arc4random();

This is broken due to an old type error. In ffs, i_gen has type uint64_t, so it is physically impossible for ++ip->i_gen to wrap to 0. (i_gen is initialized from di_gen, which has type uint32_t, so i_gen is initially <= UINT32_MAX, and it would take 584 years to wrap with the modest inode recycling period of 1 nanosecond.) So the above is an obfuscated way of writing:

	if (ip->i_gen == 0) {
		/*
		 * This value means uninitialized (or a bug).  Init now.
		 * The loop is to not have the usual bug here.
		 */
		do
			ip->i_gen = arc4random();
		while (ip->i_gen == 0);
	} else
		ip->i_gen++;

Now it is clear that i_gen can grow far above UINT32_MAX. But usually it doesn't. Growth above UINT32_MAX gets truncated when the vnode is recycled. Overflow occurs when i_gen is stored to di_gen, and growth resumes at a small truncated value.

The type error gives truncation on most uses of i_gen: di_gen, ufid_gen and ueh_i_gen are all 32 bits. va_gen is 32 bits on 32-bit arches but 64 bits on 64-bit arches. Various bugs result. The bugs are mostly features.
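The type error above can be illustrated with a hypothetical userland model: a 64-bit in-core counter paired with a 32-bit on-disk copy. The struct and function names below are illustrative only, not the ffs definitions; the year length of 365.25 days is an assumption of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the two fields: the in-core generation is 64 bits
 * wide (as described above for ffs's i_gen) while the on-disk copy, like
 * di_gen, is 32 bits.
 */
struct gen_model {
	uint64_t i_gen;		/* in-core, 64 bits */
	uint32_t di_gen;	/* on-disk, 32 bits */
};

/* Increment a nonzero generation and store it, as the ffs path does. */
static void
bump(struct gen_model *m)
{
	m->i_gen++;			/* cannot realistically wrap to 0 */
	m->di_gen = (uint32_t)m->i_gen;	/* keeps only the low 32 bits */
}

/* Years needed to wrap a 64-bit counter at one increment per nanosecond. */
static double
years_to_wrap_at_1ns(void)
{
	const double secs_per_year = 365.25 * 24 * 60 * 60;

	return ((double)UINT64_MAX / 1e9 / secs_per_year);
}
```

Starting from i_gen == UINT32_MAX, bump() leaves i_gen at 0x100000000 (never 0, so the "++ip->i_gen == 0" test is dead code) while the stored 32-bit copy becomes 0. years_to_wrap_at_1ns() comes out at about 584.5, matching the figure above.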
It is not very useful to re-randomize on reaching the 32-bit boundary. The bugfeature normally avoids this. If i_gen were not truncated to 32 bits when the vnode is recycled (or on unmount), and if it were consistently truncated (that is, truncated to 32 bits in va_gen on 64-bit arches too), then the bugfeature would work perfectly. The top 32 bits in i_gen would then be unused except to record history for a few trillion years for most inodes.

> 	DIP_SET(ip, i_gen, ip->i_gen);
> 	if (fs->fs_magic == FS_UFS2_MAGIC) {
> 		vfs_timestamp(&ts);
> @@ -2080,7 +2080,8 @@ gotit:
> 		bzero(ibp->b_data, (int)fs->fs_bsize);
> 		dp2 = (struct ufs2_dinode *)(ibp->b_data);
> 		for (i = 0; i < INOPB(fs); i++) {
> -			dp2->di_gen = arc4random() / 2 + 1;
> +			while (dp2->di_gen == 0)
> +				dp2->di_gen = arc4random();

This seems to be correct. It is only for the ffs2 case, and di_gen was initialized to 0 by the bzero(), but the while loop is easier to read than the more optimal do-while loop that I wrote above.

> 			dp2++;
> 		}
> 		/*
>
> Modified: head/sys/ufs/ffs/ffs_vfsops.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_vfsops.c	Sun May 22 14:13:20 2016	(r300422)
> +++ head/sys/ufs/ffs/ffs_vfsops.c	Sun May 22 14:31:20 2016	(r300423)
> @@ -1768,7 +1768,8 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags
> 	 * already have one. This should only happen on old filesystems.
> 	 */
> 	if (ip->i_gen == 0) {
> -		ip->i_gen = arc4random() / 2 + 1;
> +		while (ip->i_gen == 0)
> +			ip->i_gen = arc4random();

This also seems to be correct. Now the compiler can easily optimize the while loop to a do-while loop, since a previous check for i_gen == 0 is visible.
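A userland sketch of the ffs2 hunk at the gotit: label may help show why the while loop always draws at least once there. The stripped-down dinode struct, NINODES (standing in for INOPB(fs)) and the LCG-based stub_random() (standing in for arc4random()) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NINODES 8	/* hypothetical stand-in for INOPB(fs) */

/* Hypothetical, stripped-down stand-in for struct ufs2_dinode. */
struct dinode_stub {
	uint32_t di_gen;
	/* other fields omitted */
};

/* Hypothetical deterministic stand-in for arc4random(); yields 0 first. */
static uint32_t stub_state;

static uint32_t
stub_random(void)
{
	uint32_t v = stub_state;

	stub_state = stub_state * 1664525u + 1013904223u;	/* LCG step */
	return (v);
}

/*
 * Zero the block, then give every inode a nonzero generation.  After the
 * memset() every di_gen is 0, so the first test of the while loop always
 * succeeds; a do-while loop would skip that first test.
 */
static void
init_inode_block(struct dinode_stub *dp, size_t n)
{
	memset(dp, 0, n * sizeof(*dp));		/* userland bzero() */
	for (size_t i = 0; i < n; i++, dp++) {
		while (dp->di_gen == 0)
			dp->di_gen = stub_random();
	}
}
```

With stub_state starting at 0, the first inode's first draw is 0 and is rejected, so it ends up with 1013904223; every inode in the block is left with a nonzero di_gen.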
"should" in the comment is correct, except that this should never happen now, since file systems that were old when it was written shouldn't exist any more. However, it does happen now, mostly for new file systems, due to a bug in newfs: newfs is missing this while loop, so it sometimes initializes di_gen to 0. Then we can't/don't tell if di_gen was initialized to a random value, so we re-randomize it.

> 		if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
> 			ip->i_flag |= IN_MODIFIED;
> 			DIP_SET(ip, i_gen, ip->i_gen);

All versions of newfs seem to have had buggy initialization:
- they didn't initialize di_gen (except to 0) until 1997.
- the first 1997 version used /dev/urandom to initialize "long ret;". Assignment of this to "int32_t di_gen;" overflowed on 64-bit arches (alpha?). This gave a negative value, which gave further overflows or surprising sign extensions, mainly when assigned to "u_long va_gen;"; most other types were consistently signed.
- the next 1997 version used random(). This fixed the generation of negative values. 0 was still generated.
- the 2003 version used arc4random(). This gave negative values again. 0 was still generated. This is essentially the current version. The current version uses a wrapper function like the first 1997 version.
- newfs doesn't seem to have ever had the version that did *random() / 2 + 1 like the kernel did. It thus escaped having the overflow/sign extension bugs that the kernel had. Dividing by 2 was apparently supposed to avoid these bugs. It worked with random(), since random() returns at most INT32_MAX. But arc4random() returns at most UINT32_MAX. Dividing this by 2 gives (u_int)INT32_MAX, and adding 1 gives a value that overflows when assigned to int32_t. This was later fixed by changing lots of int32_t to uint32_t.

I'm not sure about the security aspects of randomizing i_gen. The re-randomization is so accidental and infrequent that security must be unimportant.
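The arithmetic behind the last item in the list above can be checked directly. This sketch only evaluates the expressions with arc4random()'s largest possible result; the helper names are hypothetical. Note that converting an out-of-range value to int32_t is implementation-defined in C; GCC and Clang reduce it modulo 2**32, which is what the comments assume.

```c
#include <assert.h>
#include <stdint.h>

/* The old kernel formula with arc4random()'s largest possible result. */
static uint32_t
old_formula_max(void)
{
	uint32_t r = UINT32_MAX;	/* worst-case arc4random() result */

	return (r / 2 + 1);		/* 0x80000000, one more than INT32_MAX */
}

/*
 * Storing that into a signed 32-bit field.  Implementation-defined;
 * with GCC and Clang the result is INT32_MIN.
 */
static int32_t
store_in_int32(uint32_t v)
{
	return ((int32_t)v);
}

/*
 * Widening a negative int32_t to a 64-bit unsigned type (as with a
 * u_long va_gen on a 64-bit arch) sign-extends: the high 32 bits
 * become all ones.
 */
static uint64_t
widen_to_u64(int32_t v)
{
	return ((uint64_t)v);
}
```

So the worst case of arc4random() / 2 + 1 becomes INT32_MIN in an int32_t field and 0xFFFFFFFF80000000 once widened, whereas random() / 2 + 1 never exceeds 0x40000000 and stays in range.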
But I think a unique generation number (over all inodes on all file systems) would be better than any randomness. FreeBSD-1 was closer to having that: i_gen was initialized to the global nextgennumber++ if it was zero; it was not incremented on inode reuse. The global would have to be 64 bits now. Ensuring uniqueness is not easy, since it means that you have to check inode numbers on not-very-trusted newly mounted file systems against all inode numbers already in use. Perhaps it works to always use a new set for every new mount (ignore the ones on disk).

Bruce
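The FreeBSD-1 scheme described at the end of the message above, widened to 64 bits, might look roughly like the following sketch. It is hypothetical and not FreeBSD code: the names are illustrative, C11 atomics stand in for whatever locking a kernel would use, and it does nothing about the problem of untrusted on-disk values that Bruce raises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global 64-bit generation counter; 0 stays "uninitialized". */
static _Atomic uint64_t nextgennumber = 1;

/*
 * Assign a generation only if the inode does not have one yet, in the
 * style of "i_gen = nextgennumber++ if it was zero".  Existing nonzero
 * generations are left alone (not incremented on reuse).
 */
static void
gen_init_if_zero(uint64_t *gen)
{
	if (*gen == 0)
		*gen = atomic_fetch_add(&nextgennumber, 1);
}
```

Two fresh inodes get 1 and 2, and an inode that already has a generation keeps it; with 64 bits the counter cannot realistically wrap back to 0.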
From: Alexander Motin <mav@FreeBSD.org>
Date: Tue, 24 May 2016 12:40:03 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r300610 - head/sys/dev/ntb/if_ntb
Message-Id: <201605241240.u4OCe3Gc076065@repo.freebsd.org>

Author: mav
Date: Tue May 24 12:40:03 2016
New Revision: 300610
URL: https://svnweb.freebsd.org/changeset/base/300610

Log:
  Re-enable write combining, disabled by default at r295486.

  if_ntb(4) strongly benefits from WC, improving throughput from 350Mbit/s
  to 8-10Gbit/s on my tests.

  MFC after:	1 week

Modified:
  head/sys/dev/ntb/if_ntb/if_ntb.c

Modified: head/sys/dev/ntb/if_ntb/if_ntb.c
==============================================================================
--- head/sys/dev/ntb/if_ntb/if_ntb.c	Tue May 24 12:20:23 2016	(r300609)
+++ head/sys/dev/ntb/if_ntb/if_ntb.c	Tue May 24 12:40:03 2016	(r300610)
@@ -616,6 +616,10 @@ ntb_transport_probe(struct ntb_softc *nt
 		mw->xlat_size = 0;
 		mw->virt_addr = NULL;
 		mw->dma_addr = 0;
+
+		rc = ntb_mw_set_wc(nt->ntb, i, VM_MEMATTR_WRITE_COMBINING);
+		if (rc)
+			ntb_printf(0, "Unable to set mw%d caching\n", i);
 	}
 
 	qp_bitmap = ntb_db_valid_mask(ntb);