Date: Sat, 30 Mar 2013 20:57:35 +0000 (UTC)
From: Kirk McKusick <mckusick@FreeBSD.org>
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org
Subject: svn commit: r248936 - in stable/8/sys: . amd64/amd64 dev/sound geom i386/i386 kern sys ufs/ffs
Message-ID: <201303302057.r2UKvZWW065301@svn.freebsd.org>

Author: mckusick
Date: Sat Mar 30 20:57:35 2013
New Revision: 248936
URL: http://svnweb.freebsd.org/changeset/base/248936

Log:
  MFC of 246876, 246877, and 247387:
  MFC reviewed by: kib

  MFC 246876:

  Add barrier write capability to the VFS buffer interface. A barrier
  write is a disk write request that tells the disk that the buffer
  being written must be committed to the media along with any writes
  that preceded it before any future blocks may be written to the drive.

  Barrier writes are provided by adding the functions bbarrierwrite
  (bwrite with barrier) and babarrierwrite (bawrite with barrier).

  Following a bbarrierwrite the client knows that the requested buffer
  is on the media. It does not ensure that buffers written before that
  buffer are on the media. It only ensures that buffers written before
  that buffer will get to the media before any buffers written after
  that buffer. A flush command must be sent to the disk to ensure that
  all earlier written buffers are on the media.

  Reviewed by: kib
  Tested by:   Peter Holm

  MFC 246877:

  The UFS2 filesystem allocates new blocks of inodes as they are needed.
  When a cylinder group runs short of inodes, a new block for inodes is
  allocated, zero'ed, and written to the disk. The zero'ed inodes must
  be on the disk before the cylinder group can be updated to claim them.
  If the cylinder group claiming the new inodes were written before the
  zero'ed block of inodes, the system could crash with the filesystem in
  an unrecoverable state.

  Rather than adding a soft updates dependency to ensure that the new
  inode block is written before it is claimed by the cylinder group
  map, we just do a barrier write of the zero'ed inode block to ensure
  that it will get written before the updated cylinder group map can
  be written. This change should only slow down bulk loading of newly
  created filesystems since that is the primary time that new inode
  blocks need to be created.

  Reported by: Robert Watson
  Reviewed by: kib
  Tested by:   Peter Holm

  MFC 247387:

  An inode block must not be read in blocking mode while the cg block
  is owned. The lock order is inode buffer lock -> snaplk -> cg buffer
  lock; reversing the order causes deadlocks.

  An inode block must not be written while the cg block buffer is
  owned. The FFS copy-on-write needs to allocate a block to copy the
  content of the inode block, and the cylinder group selected for the
  allocation might be the same as the owned cg block. The reserved
  block detection code in ffs_copyonwrite() and ffs_bp_snapblk() is
  unable to detect this situation, because the locked cg buffer is not
  exposed to it.

  In order to maintain the dependency between the initialized inode
  block and the cg_initediblk pointer, look up the inode buffer in
  non-blocking mode. If that succeeds, brelse the cg block, initialize
  the inode block, and write it. After the write is finished, reread
  the cg block and update cg_initediblk. If the inode block is already
  locked by another thread, let the other thread initialize it. If
  another thread raced with us after we started writing the inode
  block, the situation is detected by an update of cg_initediblk. Note
  that double initialization of the inode block is harmless; the block
  cannot be used until cg_initediblk is incremented.

Sponsored by: The FreeBSD Foundation In collaboration with: pho Reviewed by: mckusick X-MFC-note: after r246877 Modified: stable/8/sys/geom/geom_vfs.c stable/8/sys/kern/vfs_bio.c stable/8/sys/kern/vfs_cluster.c stable/8/sys/sys/buf.h stable/8/sys/ufs/ffs/ffs_alloc.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/Makefile (props changed) stable/8/sys/amd64/ (props changed) stable/8/sys/amd64/amd64/intr_machdep.c (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/arm/ (props changed) stable/8/sys/boot/ (props changed) stable/8/sys/bsm/ (props changed) stable/8/sys/cam/ (props changed) stable/8/sys/cddl/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/compat/ (props changed) stable/8/sys/conf/ (props changed) stable/8/sys/contrib/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) stable/8/sys/crypto/ (props changed) stable/8/sys/ddb/ (props changed) stable/8/sys/dev/ (props changed) stable/8/sys/dev/aac/ (props changed) stable/8/sys/dev/acpi_support/ (props changed) stable/8/sys/dev/acpica/ (props changed) stable/8/sys/dev/adb/ (props changed) stable/8/sys/dev/adlink/ (props changed) stable/8/sys/dev/advansys/ (props changed) stable/8/sys/dev/ae/ (props changed) stable/8/sys/dev/age/ (props changed) stable/8/sys/dev/agp/ (props changed) stable/8/sys/dev/aha/ (props changed) stable/8/sys/dev/ahb/ (props changed) stable/8/sys/dev/ahci/ (props changed) stable/8/sys/dev/aic/ (props changed) stable/8/sys/dev/aic7xxx/ (props changed) stable/8/sys/dev/alc/ (props changed) stable/8/sys/dev/ale/ (props changed) stable/8/sys/dev/amd/ (props changed) stable/8/sys/dev/amdsbwd/ (props changed) stable/8/sys/dev/amdtemp/ (props changed) stable/8/sys/dev/amr/ (props changed) stable/8/sys/dev/an/ (props changed) stable/8/sys/dev/arcmsr/ (props changed) stable/8/sys/dev/asmc/ (props changed) stable/8/sys/dev/asr/ (props changed) stable/8/sys/dev/ata/ 
(props changed) stable/8/sys/dev/ath/ (props changed) stable/8/sys/dev/atkbdc/ (props changed) stable/8/sys/dev/auxio/ (props changed) stable/8/sys/dev/bce/ (props changed) stable/8/sys/dev/bfe/ (props changed) stable/8/sys/dev/bge/ (props changed) stable/8/sys/dev/bktr/ (props changed) stable/8/sys/dev/bm/ (props changed) stable/8/sys/dev/buslogic/ (props changed) stable/8/sys/dev/bwi/ (props changed) stable/8/sys/dev/bwn/ (props changed) stable/8/sys/dev/cardbus/ (props changed) stable/8/sys/dev/cas/ (props changed) stable/8/sys/dev/ce/ (props changed) stable/8/sys/dev/cfe/ (props changed) stable/8/sys/dev/cfi/ (props changed) stable/8/sys/dev/ciss/ (props changed) stable/8/sys/dev/cm/ (props changed) stable/8/sys/dev/cmx/ (props changed) stable/8/sys/dev/coretemp/ (props changed) stable/8/sys/dev/cp/ (props changed) stable/8/sys/dev/cpuctl/ (props changed) stable/8/sys/dev/cpufreq/ (props changed) stable/8/sys/dev/cs/ (props changed) stable/8/sys/dev/ct/ (props changed) stable/8/sys/dev/ctau/ (props changed) stable/8/sys/dev/cx/ (props changed) stable/8/sys/dev/cxgb/ (props changed) stable/8/sys/dev/cxgbe/ (props changed) stable/8/sys/dev/cy/ (props changed) stable/8/sys/dev/dc/ (props changed) stable/8/sys/dev/dcons/ (props changed) stable/8/sys/dev/de/ (props changed) stable/8/sys/dev/digi/ (props changed) stable/8/sys/dev/dpms/ (props changed) stable/8/sys/dev/dpt/ (props changed) stable/8/sys/dev/drm/ (props changed) stable/8/sys/dev/e1000/ (props changed) stable/8/sys/dev/ed/ (props changed) stable/8/sys/dev/eisa/ (props changed) stable/8/sys/dev/en/ (props changed) stable/8/sys/dev/ep/ (props changed) stable/8/sys/dev/esp/ (props changed) stable/8/sys/dev/et/ (props changed) stable/8/sys/dev/ex/ (props changed) stable/8/sys/dev/exca/ (props changed) stable/8/sys/dev/fatm/ (props changed) stable/8/sys/dev/fb/ (props changed) stable/8/sys/dev/fdc/ (props changed) stable/8/sys/dev/fe/ (props changed) stable/8/sys/dev/firewire/ (props changed) 
stable/8/sys/dev/flash/ (props changed) stable/8/sys/dev/fxp/ (props changed) stable/8/sys/dev/gem/ (props changed) stable/8/sys/dev/glxsb/ (props changed) stable/8/sys/dev/hatm/ (props changed) stable/8/sys/dev/hifn/ (props changed) stable/8/sys/dev/hme/ (props changed) stable/8/sys/dev/hpt27xx/ (props changed) stable/8/sys/dev/hptiop/ (props changed) stable/8/sys/dev/hptmv/ (props changed) stable/8/sys/dev/hptrr/ (props changed) stable/8/sys/dev/hwpmc/ (props changed) stable/8/sys/dev/ic/ (props changed) stable/8/sys/dev/ichsmb/ (props changed) stable/8/sys/dev/ichwd/ (props changed) stable/8/sys/dev/ida/ (props changed) stable/8/sys/dev/ie/ (props changed) stable/8/sys/dev/ieee488/ (props changed) stable/8/sys/dev/if_ndis/ (props changed) stable/8/sys/dev/iicbus/ (props changed) stable/8/sys/dev/iir/ (props changed) stable/8/sys/dev/io/ (props changed) stable/8/sys/dev/ipmi/ (props changed) stable/8/sys/dev/ips/ (props changed) stable/8/sys/dev/ipw/ (props changed) stable/8/sys/dev/isci/ (props changed) stable/8/sys/dev/iscsi/ (props changed) stable/8/sys/dev/isp/ (props changed) stable/8/sys/dev/ispfw/ (props changed) stable/8/sys/dev/iwi/ (props changed) stable/8/sys/dev/iwn/ (props changed) stable/8/sys/dev/ixgb/ (props changed) stable/8/sys/dev/ixgbe/ (props changed) stable/8/sys/dev/jme/ (props changed) stable/8/sys/dev/joy/ (props changed) stable/8/sys/dev/kbd/ (props changed) stable/8/sys/dev/kbdmux/ (props changed) stable/8/sys/dev/ksyms/ (props changed) stable/8/sys/dev/le/ (props changed) stable/8/sys/dev/led/ (props changed) stable/8/sys/dev/lge/ (props changed) stable/8/sys/dev/lindev/ (props changed) stable/8/sys/dev/lmc/ (props changed) stable/8/sys/dev/malo/ (props changed) stable/8/sys/dev/mc146818/ (props changed) stable/8/sys/dev/mca/ (props changed) stable/8/sys/dev/mcd/ (props changed) stable/8/sys/dev/md/ (props changed) stable/8/sys/dev/mem/ (props changed) stable/8/sys/dev/mfi/ (props changed) stable/8/sys/dev/mge/ (props changed) 
stable/8/sys/dev/mii/ (props changed) stable/8/sys/dev/mk48txx/ (props changed) stable/8/sys/dev/mlx/ (props changed) stable/8/sys/dev/mly/ (props changed) stable/8/sys/dev/mmc/ (props changed) stable/8/sys/dev/mn/ (props changed) stable/8/sys/dev/mps/ (props changed) stable/8/sys/dev/mpt/ (props changed) stable/8/sys/dev/mse/ (props changed) stable/8/sys/dev/msk/ (props changed) stable/8/sys/dev/mvs/ (props changed) stable/8/sys/dev/mwl/ (props changed) stable/8/sys/dev/mxge/ (props changed) stable/8/sys/dev/my/ (props changed) stable/8/sys/dev/ncv/ (props changed) stable/8/sys/dev/netmap/ (props changed) stable/8/sys/dev/nfe/ (props changed) stable/8/sys/dev/nge/ (props changed) stable/8/sys/dev/nmdm/ (props changed) stable/8/sys/dev/nsp/ (props changed) stable/8/sys/dev/null/ (props changed) stable/8/sys/dev/nve/ (props changed) stable/8/sys/dev/nvram/ (props changed) stable/8/sys/dev/nxge/ (props changed) stable/8/sys/dev/oce/ (props changed) stable/8/sys/dev/ofw/ (props changed) stable/8/sys/dev/patm/ (props changed) stable/8/sys/dev/pbio/ (props changed) stable/8/sys/dev/pccard/ (props changed) stable/8/sys/dev/pccbb/ (props changed) stable/8/sys/dev/pcf/ (props changed) stable/8/sys/dev/pci/ (props changed) stable/8/sys/dev/pcn/ (props changed) stable/8/sys/dev/pdq/ (props changed) stable/8/sys/dev/powermac_nvram/ (props changed) stable/8/sys/dev/ppbus/ (props changed) stable/8/sys/dev/ppc/ (props changed) stable/8/sys/dev/pst/ (props changed) stable/8/sys/dev/puc/ (props changed) stable/8/sys/dev/quicc/ (props changed) stable/8/sys/dev/ral/ (props changed) stable/8/sys/dev/random/ (props changed) stable/8/sys/dev/rc/ (props changed) stable/8/sys/dev/re/ (props changed) stable/8/sys/dev/rndtest/ (props changed) stable/8/sys/dev/rp/ (props changed) stable/8/sys/dev/safe/ (props changed) stable/8/sys/dev/sbni/ (props changed) stable/8/sys/dev/scc/ (props changed) stable/8/sys/dev/scd/ (props changed) stable/8/sys/dev/sdhci/ (props changed) 
stable/8/sys/dev/sec/ (props changed) stable/8/sys/dev/sf/ (props changed) stable/8/sys/dev/sge/ (props changed) stable/8/sys/dev/si/ (props changed) stable/8/sys/dev/siba/ (props changed) stable/8/sys/dev/siis/ (props changed) stable/8/sys/dev/sio/ (props changed) stable/8/sys/dev/sis/ (props changed) stable/8/sys/dev/sk/ (props changed) stable/8/sys/dev/smbus/ (props changed) stable/8/sys/dev/smc/ (props changed) stable/8/sys/dev/sn/ (props changed) stable/8/sys/dev/snc/ (props changed) stable/8/sys/dev/snp/ (props changed) stable/8/sys/dev/sound/ (props changed) stable/8/sys/dev/sound/chip.h (props changed) stable/8/sys/dev/sound/clone.c (props changed) stable/8/sys/dev/sound/clone.h (props changed) stable/8/sys/dev/sound/driver.c (props changed) stable/8/sys/dev/sound/isa/ (props changed) stable/8/sys/dev/sound/macio/ (props changed) stable/8/sys/dev/sound/midi/ (props changed) stable/8/sys/dev/sound/pci/ (props changed) stable/8/sys/dev/sound/pcm/ (props changed) stable/8/sys/dev/sound/sbus/ (props changed) stable/8/sys/dev/sound/unit.c (props changed) stable/8/sys/dev/sound/unit.h (props changed) stable/8/sys/dev/sound/usb/ (props changed) stable/8/sys/dev/sound/version.h (props changed) stable/8/sys/dev/speaker/ (props changed) stable/8/sys/dev/spibus/ (props changed) stable/8/sys/dev/ste/ (props changed) stable/8/sys/dev/stg/ (props changed) stable/8/sys/dev/stge/ (props changed) stable/8/sys/dev/streams/ (props changed) stable/8/sys/dev/sym/ (props changed) stable/8/sys/dev/syscons/ (props changed) stable/8/sys/dev/tdfx/ (props changed) stable/8/sys/dev/ti/ (props changed) stable/8/sys/dev/tl/ (props changed) stable/8/sys/dev/tpm/ (props changed) stable/8/sys/dev/trm/ (props changed) stable/8/sys/dev/tsec/ (props changed) stable/8/sys/dev/twa/ (props changed) stable/8/sys/dev/twe/ (props changed) stable/8/sys/dev/tws/ (props changed) stable/8/sys/dev/tx/ (props changed) stable/8/sys/dev/txp/ (props changed) stable/8/sys/dev/uart/ (props changed) 
stable/8/sys/dev/ubsec/ (props changed) stable/8/sys/dev/usb/ (props changed) stable/8/sys/dev/utopia/ (props changed) stable/8/sys/dev/vge/ (props changed) stable/8/sys/dev/viawd/ (props changed) stable/8/sys/dev/virtio/ (props changed) stable/8/sys/dev/vkbd/ (props changed) stable/8/sys/dev/vr/ (props changed) stable/8/sys/dev/vte/ (props changed) stable/8/sys/dev/vx/ (props changed) stable/8/sys/dev/watchdog/ (props changed) stable/8/sys/dev/wb/ (props changed) stable/8/sys/dev/wbwd/ (props changed) stable/8/sys/dev/wds/ (props changed) stable/8/sys/dev/wi/ (props changed) stable/8/sys/dev/wl/ (props changed) stable/8/sys/dev/wpi/ (props changed) stable/8/sys/dev/xe/ (props changed) stable/8/sys/dev/xen/ (props changed) stable/8/sys/dev/xl/ (props changed) stable/8/sys/fs/ (props changed) stable/8/sys/gdb/ (props changed) stable/8/sys/geom/ (props changed) stable/8/sys/gnu/ (props changed) stable/8/sys/i386/ (props changed) stable/8/sys/i386/i386/intr_machdep.c (props changed) stable/8/sys/ia64/ (props changed) stable/8/sys/isa/ (props changed) stable/8/sys/kern/ (props changed) stable/8/sys/kgssapi/ (props changed) stable/8/sys/libkern/ (props changed) stable/8/sys/mips/ (props changed) stable/8/sys/modules/ (props changed) stable/8/sys/net/ (props changed) stable/8/sys/net80211/ (props changed) stable/8/sys/netatalk/ (props changed) stable/8/sys/netgraph/ (props changed) stable/8/sys/netinet/ (props changed) stable/8/sys/netinet6/ (props changed) stable/8/sys/netipsec/ (props changed) stable/8/sys/netipx/ (props changed) stable/8/sys/netnatm/ (props changed) stable/8/sys/netncp/ (props changed) stable/8/sys/netsmb/ (props changed) stable/8/sys/nfs/ (props changed) stable/8/sys/nfsclient/ (props changed) stable/8/sys/nfsserver/ (props changed) stable/8/sys/nlm/ (props changed) stable/8/sys/opencrypto/ (props changed) stable/8/sys/pc98/ (props changed) stable/8/sys/pci/ (props changed) stable/8/sys/powerpc/ (props changed) stable/8/sys/rpc/ (props changed) 
stable/8/sys/security/ (props changed) stable/8/sys/sparc64/ (props changed) stable/8/sys/sun4v/ (props changed) stable/8/sys/sys/ (props changed) stable/8/sys/tools/ (props changed) stable/8/sys/ufs/ (props changed) stable/8/sys/vm/ (props changed) stable/8/sys/x86/ (props changed) stable/8/sys/xdr/ (props changed) stable/8/sys/xen/ (props changed)

Modified: stable/8/sys/geom/geom_vfs.c
==============================================================================
--- stable/8/sys/geom/geom_vfs.c	Sat Mar 30 17:46:03 2013	(r248935)
+++ stable/8/sys/geom/geom_vfs.c	Sat Mar 30 20:57:35 2013	(r248936)
@@ -127,6 +127,10 @@ g_vfs_strategy(struct bufobj *bo, struct
 	bip->bio_done = g_vfs_done;
 	bip->bio_caller2 = bp;
 	bip->bio_length = bp->b_bcount;
+	if (bp->b_flags & B_BARRIER) {
+		bip->bio_flags |= BIO_ORDERED;
+		bp->b_flags &= ~B_BARRIER;
+	}
 	g_io_request(bip, cp);
 }
 

Modified: stable/8/sys/kern/vfs_bio.c
==============================================================================
--- stable/8/sys/kern/vfs_bio.c	Sat Mar 30 17:46:03 2013	(r248935)
+++ stable/8/sys/kern/vfs_bio.c	Sat Mar 30 20:57:35 2013	(r248936)
@@ -206,6 +206,9 @@ SYSCTL_INT(_vfs, OID_AUTO, flushbufqtarg
 static long notbufdflashes;
 SYSCTL_LONG(_vfs, OID_AUTO, notbufdflashes, CTLFLAG_RD, &notbufdflashes, 0,
     "Number of dirty buffer flushes done by the bufdaemon helpers");
+static long barrierwrites;
+SYSCTL_LONG(_vfs, OID_AUTO, barrierwrites, CTLFLAG_RW, &barrierwrites, 0,
+    "Number of barrier writes");
 
 /*
  * Wakeup point for bufdaemon, as well as indicator of whether it is already
@@ -870,6 +873,9 @@ bufwrite(struct buf *bp)
 		return (0);
 	}
 
+	if (bp->b_flags & B_BARRIER)
+		barrierwrites++;
+
 	oldflags = bp->b_flags;
 
 	BUF_ASSERT_HELD(bp);
@@ -989,6 +995,8 @@ bdwrite(struct buf *bp)
 
 	CTR3(KTR_BUF, "bdwrite(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags);
 	KASSERT(bp->b_bufobj != NULL, ("No b_bufobj %p", bp));
+	KASSERT((bp->b_flags & B_BARRIER) == 0,
+	    ("Barrier request in delayed write %p", bp));
 	BUF_ASSERT_HELD(bp);
 
 	if (bp->b_flags & B_INVAL) {
@@ -1149,6 +1157,40 @@ bawrite(struct buf *bp)
 }
 
 /*
+ *	babarrierwrite:
+ *
+ *	Asynchronous barrier write. Start output on a buffer, but do not
+ *	wait for it to complete. Place a write barrier after this write so
+ *	that this buffer and all buffers written before it are committed to
+ *	the disk before any buffers written after this write are committed
+ *	to the disk. The buffer is released when the output completes.
+ */
+void
+babarrierwrite(struct buf *bp)
+{
+
+	bp->b_flags |= B_ASYNC | B_BARRIER;
+	(void) bwrite(bp);
+}
+
+/*
+ *	bbarrierwrite:
+ *
+ *	Synchronous barrier write. Start output on a buffer and wait for
+ *	it to complete. Place a write barrier after this write so that
+ *	this buffer and all buffers written before it are committed to
+ *	the disk before any buffers written after this write are committed
+ *	to the disk. The buffer is released when the output completes.
+ */
+int
+bbarrierwrite(struct buf *bp)
+{
+
+	bp->b_flags |= B_BARRIER;
+	return (bwrite(bp));
+}
+
+/*
  *	bwillwrite:
  *
  *	Called prior to the locking of any vnodes when we are expecting to

Modified: stable/8/sys/kern/vfs_cluster.c
==============================================================================
--- stable/8/sys/kern/vfs_cluster.c	Sat Mar 30 17:46:03 2013	(r248935)
+++ stable/8/sys/kern/vfs_cluster.c	Sat Mar 30 20:57:35 2013	(r248936)
@@ -944,11 +944,17 @@ cluster_wbuild(vp, size, start_lbn, len)
 			}
 			bp->b_bcount += size;
 			bp->b_bufsize += size;
-			bundirty(tbp);
-			tbp->b_flags &= ~B_DONE;
-			tbp->b_ioflags &= ~BIO_ERROR;
+			/*
+			 * If any of the clustered buffers have their
+			 * B_BARRIER flag set, transfer that request to
+			 * the cluster.
+			 */
+			bp->b_flags |= (tbp->b_flags & B_BARRIER);
+			tbp->b_flags &= ~(B_DONE | B_BARRIER);
 			tbp->b_flags |= B_ASYNC;
+			tbp->b_ioflags &= ~BIO_ERROR;
 			tbp->b_iocmd = BIO_WRITE;
+			bundirty(tbp);
 			reassignbuf(tbp);		/* put on clean list */
 			bufobj_wref(tbp->b_bufobj);
 			BUF_KERNPROC(tbp);

Modified: stable/8/sys/sys/buf.h
==============================================================================
--- stable/8/sys/sys/buf.h	Sat Mar 30 17:46:03 2013	(r248935)
+++ stable/8/sys/sys/buf.h	Sat Mar 30 20:57:35 2013	(r248936)
@@ -205,7 +205,7 @@ struct buf {
 #define	B_00000800	0x00000800	/* Available flag. */
 #define	B_00001000	0x00001000	/* Available flag. */
 #define	B_INVAL		0x00002000	/* Does not contain valid info. */
-#define	B_00004000	0x00004000	/* Available flag. */
+#define	B_BARRIER	0x00004000	/* Write this and all preceeding first. */
 #define	B_NOCACHE	0x00008000	/* Do not cache block after use. */
 #define	B_MALLOC	0x00010000	/* malloced b_data */
 #define	B_CLUSTEROK	0x00020000	/* Pagein op, so swap() can count it. */
@@ -485,6 +485,8 @@ int	breadn(struct vnode *, daddr_t, int,
 	    struct ucred *, struct buf **);
 void	bdwrite(struct buf *);
 void	bawrite(struct buf *);
+void	babarrierwrite(struct buf *);
+int	bbarrierwrite(struct buf *);
 void	bdirty(struct buf *);
 void	bundirty(struct buf *);
 void	bufstrategy(struct bufobj *, struct buf *);

Modified: stable/8/sys/ufs/ffs/ffs_alloc.c
==============================================================================
--- stable/8/sys/ufs/ffs/ffs_alloc.c	Sat Mar 30 17:46:03 2013	(r248935)
+++ stable/8/sys/ufs/ffs/ffs_alloc.c	Sat Mar 30 20:57:35 2013	(r248936)
@@ -1706,6 +1706,17 @@ fail:
 	return (0);
 }
 
+static inline struct buf *
+getinobuf(struct inode *ip, u_int cg, u_int32_t cginoblk, int gbflags)
+{
+	struct fs *fs;
+
+	fs = ip->i_fs;
+	return (getblk(ip->i_devvp, fsbtodb(fs, ino_to_fsba(fs,
+	    cg * fs->fs_ipg + cginoblk)), (int)fs->fs_bsize, 0, 0,
+	    gbflags));
+}
+
 /*
  * Determine whether an inode can be allocated.
  *
@@ -1729,9 +1740,11 @@ ffs_nodealloccg(ip, cg, ipref, mode)
 	u_int8_t *inosused;
 	struct ufs2_dinode *dp2;
 	int error, start, len, loc, map, i;
+	u_int32_t old_initediblk;
 
 	fs = ip->i_fs;
 	ump = ip->i_ump;
+check_nifree:
 	if (fs->fs_cs(fs, cg).cs_nifree == 0)
 		return (0);
 	UFS_UNLOCK(ump);
@@ -1743,13 +1756,13 @@ ffs_nodealloccg(ip, cg, ipref, mode)
 		return (0);
 	}
 	cgp = (struct cg *)bp->b_data;
+restart:
 	if (!cg_chkmagic(cgp) || cgp->cg_cs.cs_nifree == 0) {
 		brelse(bp);
 		UFS_LOCK(ump);
 		return (0);
 	}
 	bp->b_xflags |= BX_BKGRDWRITE;
-	cgp->cg_old_time = cgp->cg_time = time_second;
 	inosused = cg_inosused(cgp);
 	if (ipref) {
 		ipref %= fs->fs_ipg;
@@ -1777,26 +1790,83 @@ ffs_nodealloccg(ip, cg, ipref, mode)
 		panic("ffs_nodealloccg: block not in map");
 	}
 	ipref = i * NBBY + ffs(map) - 1;
-	cgp->cg_irotor = ipref;
 gotit:
 	/*
 	 * Check to see if we need to initialize more inodes.
 	 */
-	ibp = NULL;
 	if (fs->fs_magic == FS_UFS2_MAGIC &&
 	    ipref + INOPB(fs) > cgp->cg_initediblk &&
 	    cgp->cg_initediblk < cgp->cg_niblk) {
-		ibp = getblk(ip->i_devvp, fsbtodb(fs,
-		    ino_to_fsba(fs, cg * fs->fs_ipg + cgp->cg_initediblk)),
-		    (int)fs->fs_bsize, 0, 0, 0);
+		old_initediblk = cgp->cg_initediblk;
+
+		/*
+		 * Free the cylinder group lock before writing the
+		 * initialized inode block. Entering the
+		 * babarrierwrite() with the cylinder group lock
+		 * causes lock order violation between the lock and
+		 * snaplk.
+		 *
+		 * Another thread can decide to initialize the same
+		 * inode block, but whichever thread first gets the
+		 * cylinder group lock after writing the newly
+		 * allocated inode block will update it and the other
+		 * will realize that it has lost and leave the
+		 * cylinder group unchanged.
+		 */
+		ibp = getinobuf(ip, cg, old_initediblk, GB_LOCK_NOWAIT);
+		brelse(bp);
+		if (ibp == NULL) {
+			/*
+			 * The inode block buffer is already owned by
+			 * another thread, which must initialize it.
+			 * Wait on the buffer to allow another thread
+			 * to finish the updates, with dropped cg
+			 * buffer lock, then retry.
+			 */
+			ibp = getinobuf(ip, cg, old_initediblk, 0);
+			brelse(ibp);
+			UFS_LOCK(ump);
+			goto check_nifree;
+		}
 		bzero(ibp->b_data, (int)fs->fs_bsize);
 		dp2 = (struct ufs2_dinode *)(ibp->b_data);
 		for (i = 0; i < INOPB(fs); i++) {
 			dp2->di_gen = arc4random() / 2 + 1;
 			dp2++;
 		}
-		cgp->cg_initediblk += INOPB(fs);
+		/*
+		 * Rather than adding a soft updates dependency to ensure
+		 * that the new inode block is written before it is claimed
+		 * by the cylinder group map, we just do a barrier write
+		 * here. The barrier write will ensure that the inode block
+		 * gets written before the updated cylinder group map can be
+		 * written. The barrier write should only slow down bulk
+		 * loading of newly created filesystems.
+		 */
+		babarrierwrite(ibp);
+
+		/*
+		 * After the inode block is written, try to update the
+		 * cg initediblk pointer. If another thread beat us
+		 * to it, then leave it unchanged as the other thread
+		 * has already set it correctly.
+		 */
+		error = bread(ip->i_devvp, fsbtodb(fs, cgtod(fs, cg)),
+		    (int)fs->fs_cgsize, NOCRED, &bp);
+		UFS_LOCK(ump);
+		ACTIVECLEAR(fs, cg);
+		UFS_UNLOCK(ump);
+		if (error != 0) {
+			brelse(bp);
+			return (error);
+		}
+		cgp = (struct cg *)bp->b_data;
+		if (cgp->cg_initediblk == old_initediblk)
+			cgp->cg_initediblk += INOPB(fs);
+		goto restart;
 	}
+	cgp->cg_old_time = cgp->cg_time = time_second;
+	cgp->cg_irotor = ipref;
 	UFS_LOCK(ump);
 	ACTIVECLEAR(fs, cg);
 	setbit(inosused, ipref);
@@ -1813,8 +1883,6 @@ gotit:
 	if (DOINGSOFTDEP(ITOV(ip)))
 		softdep_setup_inomapdep(bp, ip, cg * fs->fs_ipg + ipref);
 	bdwrite(bp);
-	if (ibp != NULL)
-		bawrite(ibp);
 	return ((ino_t)(cg * fs->fs_ipg + ipref));
 }
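
The last step of the new ffs_nodealloccg() path (reread the cg block, then advance cg_initediblk only if it still holds the value sampled before the write) can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the committed code. struct toy_cg and toy_claim_inoblk() are hypothetical names introduced here, and the sketch assumes the caller already holds whatever lock protects the cylinder group, as the kernel code does once bread() has returned the cg buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the one struct cg field used here. */
struct toy_cg {
	uint32_t initediblk;	/* models cg_initediblk */
};

/*
 * Advance the initialized-inode pointer by inopb, but only if nobody
 * else advanced it since the caller sampled old_initediblk.
 * Returns 1 if this caller made the update, 0 if it lost the race.
 * Assumption: the caller holds the lock that protects *cg.
 */
int
toy_claim_inoblk(struct toy_cg *cg, uint32_t old_initediblk, uint32_t inopb)
{
	assert(cg != NULL);
	if (cg->initediblk != old_initediblk)
		return (0);	/* another thread already claimed the block */
	cg->initediblk += inopb;
	return (1);
}
```

Two callers that sampled the same old value may both zero and write the inode block, which the log notes is harmless, but only the first one to reach this check moves the pointer. The second sees a changed value and leaves the cylinder group alone.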
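
The ordering guarantee that the log describes for MFC 246876 can be stated as a small checkable rule. The sketch below is a toy user-space model and not part of the commit: struct toy_write, TOY_MAXQ, and toy_order_ok() are hypothetical names, and the model assumes a drive that may otherwise commit queued writes in any order. It accepts a commit order only if every barrier write lands after all writes queued before it and before all writes queued after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	TOY_MAXQ	16	/* arbitrary queue limit for this sketch */

/* Hypothetical model of one queued write; not a kernel structure. */
struct toy_write {
	int	blkno;		/* block number, for illustration only */
	bool	barrier;	/* models B_BARRIER on the buffer */
};

/*
 * order[k] is the queue index of the write committed k-th.  Return true
 * if order is a permutation of 0..n-1 in which every barrier write is
 * committed after all writes queued before it and before all writes
 * queued after it.
 */
bool
toy_order_ok(const struct toy_write *q, size_t n, const size_t *order)
{
	size_t pos[TOY_MAXQ];
	bool seen[TOY_MAXQ] = { false };
	size_t b, i, k;

	assert(q != NULL && order != NULL);
	if (n > TOY_MAXQ)
		return (false);
	for (k = 0; k < n; k++) {
		if (order[k] >= n || seen[order[k]])
			return (false);	/* not a permutation */
		seen[order[k]] = true;
		pos[order[k]] = k;	/* commit position of that write */
	}
	for (b = 0; b < n; b++) {
		if (!q[b].barrier)
			continue;
		for (i = 0; i < n; i++) {
			if (i < b && pos[i] > pos[b])
				return (false);	/* earlier write crossed barrier */
			if (i > b && pos[i] < pos[b])
				return (false);	/* later write crossed barrier */
		}
	}
	return (true);
}
```

With four queued writes and a barrier on the third, the drive may swap the first two but may not move the fourth ahead of the third. This is the property ffs_nodealloccg() relies on: the zeroed inode block is queued with the barrier, so the cylinder group map queued afterwards cannot reach the media first.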