From owner-svn-src-all@FreeBSD.ORG Tue Jan 7 01:32:27 2014 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 145E7200; Tue, 7 Jan 2014 01:32:27 +0000 (UTC) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EFF98115B; Tue, 7 Jan 2014 01:32:26 +0000 (UTC) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.7/8.14.7) with ESMTP id s071WQPB048315; Tue, 7 Jan 2014 01:32:26 GMT (envelope-from scottl@svn.freebsd.org) Received: (from scottl@localhost) by svn.freebsd.org (8.14.7/8.14.7/Submit) id s071WO8N048296; Tue, 7 Jan 2014 01:32:24 GMT (envelope-from scottl@svn.freebsd.org) Message-Id: <201401070132.s071WO8N048296@svn.freebsd.org> From: Scott Long Date: Tue, 7 Jan 2014 01:32:24 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r260385 - in stable/10/sys: cam/ata cam/scsi cddl/contrib/opensolaris/uts/common/fs/zfs dev/md geom geom/concat geom/gate geom/mirror geom/multipath geom/nop geom/part geom/raid geom/st... X-SVN-Group: stable-10 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jan 2014 01:32:27 -0000 Author: scottl Date: Tue Jan 7 01:32:23 2014 New Revision: 260385 URL: http://svnweb.freebsd.org/changeset/base/260385 Log: MFC Alexander Motin's GEOM direct dispatch work: r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used. r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more. r256607: Fix passing uninitialized bio_resid argument to g_trace(). r256610: Add unmapped I/O support to GEOM RAID. r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping temporary mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods). r256880: Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems. Submitted by: mav Sponsored by: iX Systems Modified: stable/10/sys/cam/ata/ata_da.c stable/10/sys/cam/scsi/scsi_da.c stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c stable/10/sys/dev/md/md.c stable/10/sys/geom/concat/g_concat.c stable/10/sys/geom/concat/g_concat.h stable/10/sys/geom/gate/g_gate.c stable/10/sys/geom/geom.h stable/10/sys/geom/geom_dev.c stable/10/sys/geom/geom_disk.c stable/10/sys/geom/geom_disk.h stable/10/sys/geom/geom_int.h stable/10/sys/geom/geom_io.c stable/10/sys/geom/geom_kern.c stable/10/sys/geom/geom_slice.c stable/10/sys/geom/geom_vfs.c stable/10/sys/geom/mirror/g_mirror.c stable/10/sys/geom/mirror/g_mirror.h stable/10/sys/geom/multipath/g_multipath.c stable/10/sys/geom/nop/g_nop.c stable/10/sys/geom/nop/g_nop.h stable/10/sys/geom/part/g_part.c stable/10/sys/geom/raid/g_raid.c stable/10/sys/geom/raid/g_raid.h stable/10/sys/geom/raid/md_ddf.c stable/10/sys/geom/raid/md_intel.c stable/10/sys/geom/raid/md_jmicron.c stable/10/sys/geom/raid/md_nvidia.c stable/10/sys/geom/raid/md_promise.c stable/10/sys/geom/raid/md_sii.c stable/10/sys/geom/raid/tr_concat.c stable/10/sys/geom/raid/tr_raid0.c stable/10/sys/geom/raid/tr_raid1.c stable/10/sys/geom/raid/tr_raid1e.c stable/10/sys/geom/raid/tr_raid5.c stable/10/sys/geom/stripe/g_stripe.c stable/10/sys/geom/stripe/g_stripe.h stable/10/sys/geom/zero/g_zero.c stable/10/sys/kern/subr_devstat.c stable/10/sys/kern/vfs_bio.c stable/10/sys/sys/devicestat.h stable/10/sys/sys/proc.h Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/cam/ata/ata_da.c ============================================================================== --- stable/10/sys/cam/ata/ata_da.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/cam/ata/ata_da.c Tue Jan 7 01:32:23 2014 (r260385) @@ -1254,7 +1254,7 @@ adaregister(struct cam_periph *periph, v maxio = min(maxio, 256 * softc->params.secsize); softc->disk->d_maxsize = maxio; softc->disk->d_unit = periph->unit_number; - softc->disk->d_flags = 0; + softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if (softc->flags & ADA_FLAG_CAN_TRIM) { Modified: stable/10/sys/cam/scsi/scsi_da.c ============================================================================== --- stable/10/sys/cam/scsi/scsi_da.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/cam/scsi/scsi_da.c Tue Jan 7 01:32:23 2014 (r260385) @@ -2133,7 +2133,7 @@ daregister(struct cam_periph *periph, vo else softc->disk->d_maxsize = cpi.maxio; softc->disk->d_unit = periph->unit_number; - softc->disk->d_flags = 0; + softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if ((softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if ((cpi.hba_misc & PIM_UNMAPPED) != 0) Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Tue Jan 7 01:32:23 2014 (r260385) @@ -147,6 +147,7 @@ vdev_geom_attach(struct g_provider *pp) ZFS_LOG(1, "Used existing consumer for %s.", pp->name); } } + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; return (cp); } Modified: stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c ============================================================================== --- stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Tue Jan 7 01:32:23 2014 (r260385) @@ -2153,6 +2153,7 @@ zvol_geom_create(const char *name) gp->start = zvol_geom_start; gp->access = zvol_geom_access; pp = g_new_providerf(gp, "%s/%s", ZVOL_DRIVER, name); + pp->flags |= G_PF_DIRECT_RECEIVE | G_PF_DIRECT_SEND; pp->sectorsize = DEV_BSIZE; zv = kmem_zalloc(sizeof(*zv), KM_SLEEP); @@ -2256,18 +2257,20 @@ zvol_geom_start(struct bio *bp) zvol_state_t *zv; boolean_t first; + zv = bp->bio_to->private; + ASSERT(zv != NULL); switch (bp->bio_cmd) { + case BIO_FLUSH: + if (!THREAD_CAN_SLEEP()) + goto enqueue; + zil_commit(zv->zv_zilog, ZVOL_OBJ); + g_io_deliver(bp, 0); + break; case BIO_READ: case BIO_WRITE: - case BIO_FLUSH: - zv = bp->bio_to->private; - ASSERT(zv != NULL); - mtx_lock(&zv->zv_queue_mtx); - first = (bioq_first(&zv->zv_queue) == NULL); - bioq_insert_tail(&zv->zv_queue, bp); - mtx_unlock(&zv->zv_queue_mtx); - if (first) - wakeup_one(&zv->zv_queue); + if (!THREAD_CAN_SLEEP()) + goto enqueue; + zvol_strategy(bp); break; case BIO_GETATTR: case BIO_DELETE: @@ -2275,6 +2278,15 @@ zvol_geom_start(struct bio *bp) g_io_deliver(bp, EOPNOTSUPP); break; } + return; + +enqueue: + mtx_lock(&zv->zv_queue_mtx); + first = (bioq_first(&zv->zv_queue) == NULL); + bioq_insert_tail(&zv->zv_queue, bp); + mtx_unlock(&zv->zv_queue_mtx); + if (first) + wakeup_one(&zv->zv_queue); } static void @@ -2449,6 +2461,7 @@ zvol_rename_minor(struct g_geom *gp, con g_wither_provider(pp, ENXIO); pp = g_new_providerf(gp, "%s/%s", ZVOL_DRIVER, newname); + pp->flags |= G_PF_DIRECT_RECEIVE | G_PF_DIRECT_SEND; pp->sectorsize = DEV_BSIZE; pp->mediasize = zv->zv_volsize; pp->private = zv; Modified: stable/10/sys/dev/md/md.c ============================================================================== --- stable/10/sys/dev/md/md.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/dev/md/md.c Tue Jan 7 01:32:23 2014 (r260385) @@ -189,6 +189,7 @@ struct md_s { LIST_ENTRY(md_s) list; struct bio_queue_head bio_queue; struct mtx queue_mtx; + struct mtx stat_mtx; struct cdev *dev; enum md_types type; off_t mediasize; @@ -415,8 +416,11 @@ g_md_start(struct bio *bp) struct md_s *sc; sc = bp->bio_to->geom->softc; - if ((bp->bio_cmd == BIO_READ) || (bp->bio_cmd == BIO_WRITE)) + if ((bp->bio_cmd == BIO_READ) || (bp->bio_cmd == BIO_WRITE)) { + mtx_lock(&sc->stat_mtx); devstat_start_transaction_bio(sc->devstat, bp); + mtx_unlock(&sc->stat_mtx); + } mtx_lock(&sc->queue_mtx); bioq_disksort(&sc->bio_queue, bp); mtx_unlock(&sc->queue_mtx); @@ -987,6 +991,7 @@ mdnew(int unit, int *errp, enum md_types sc->type = type; bioq_init(&sc->bio_queue); mtx_init(&sc->queue_mtx, "md bio queue", NULL, MTX_DEF); + mtx_init(&sc->stat_mtx, "md stat", NULL, MTX_DEF); sc->unit = unit; sprintf(sc->name, "md%d", unit); LIST_INSERT_HEAD(&md_softc_list, sc, list); @@ -994,6 +999,7 @@ mdnew(int unit, int *errp, enum md_types if (error == 0) return (sc); LIST_REMOVE(sc, list); + mtx_destroy(&sc->stat_mtx); mtx_destroy(&sc->queue_mtx); free_unr(md_uh, sc->unit); free(sc, M_MD); @@ -1011,6 +1017,7 @@ mdinit(struct md_s *sc) gp = g_new_geomf(&g_md_class, "md%d", sc->unit); gp->softc = sc; pp = g_new_providerf(gp, "md%d", sc->unit); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE; pp->mediasize = sc->mediasize; pp->sectorsize = sc->sectorsize; switch (sc->type) { @@ -1206,6 +1213,7 @@ mddestroy(struct md_s *sc, struct thread while (!(sc->flags & MD_EXITING)) msleep(sc->procp, &sc->queue_mtx, PRIBIO, "mddestroy", hz / 10); mtx_unlock(&sc->queue_mtx); + mtx_destroy(&sc->stat_mtx); mtx_destroy(&sc->queue_mtx); if (sc->vnode != NULL) { vn_lock(sc->vnode, LK_EXCLUSIVE | LK_RETRY); Modified: stable/10/sys/geom/concat/g_concat.c ============================================================================== --- stable/10/sys/geom/concat/g_concat.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/concat/g_concat.c Tue Jan 7 01:32:23 2014 (r260385) @@ -239,6 +239,27 @@ g_concat_kernel_dump(struct bio *bp) } static void +g_concat_done(struct bio *bp) +{ + struct g_concat_softc *sc; + struct bio *pbp; + + pbp = bp->bio_parent; + sc = pbp->bio_to->geom->softc; + mtx_lock(&sc->sc_lock); + if (pbp->bio_error == 0) + pbp->bio_error = bp->bio_error; + pbp->bio_completed += bp->bio_completed; + pbp->bio_inbed++; + if (pbp->bio_children == pbp->bio_inbed) { + mtx_unlock(&sc->sc_lock); + g_io_deliver(pbp, pbp->bio_error); + } else + mtx_unlock(&sc->sc_lock); + g_destroy_bio(bp); +} + +static void g_concat_flush(struct g_concat_softc *sc, struct bio *bp) { struct bio_queue_head queue; @@ -250,23 +271,19 @@ g_concat_flush(struct g_concat_softc *sc for (no = 0; no < sc->sc_ndisks; no++) { cbp = g_clone_bio(bp); if (cbp == NULL) { - for (cbp = bioq_first(&queue); cbp != NULL; - cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) g_destroy_bio(cbp); - } if (bp->bio_error == 0) bp->bio_error = ENOMEM; g_io_deliver(bp, bp->bio_error); return; } bioq_insert_tail(&queue, cbp); - cbp->bio_done = g_std_done; + cbp->bio_done = g_concat_done; cbp->bio_caller1 = sc->sc_disks[no].d_consumer; cbp->bio_to = sc->sc_disks[no].d_consumer->provider; } - for (cbp = bioq_first(&queue); cbp != NULL; cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) { G_CONCAT_LOGREQ(cbp, "Sending request."); cp = cbp->bio_caller1; cbp->bio_caller1 = NULL; @@ -320,7 +337,10 @@ g_concat_start(struct bio *bp) offset = bp->bio_offset; length = bp->bio_length; - addr = bp->bio_data; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) + addr = NULL; + else + addr = bp->bio_data; end = offset + length; bioq_init(&queue); @@ -338,11 +358,8 @@ g_concat_start(struct bio *bp) cbp = g_clone_bio(bp); if (cbp == NULL) { - for (cbp = bioq_first(&queue); cbp != NULL; - cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) g_destroy_bio(cbp); - } if (bp->bio_error == 0) bp->bio_error = ENOMEM; g_io_deliver(bp, bp->bio_error); @@ -352,11 +369,21 @@ g_concat_start(struct bio *bp) /* * Fill in the component buf structure. */ - cbp->bio_done = g_std_done; + if (len == bp->bio_length) + cbp->bio_done = g_std_done; + else + cbp->bio_done = g_concat_done; cbp->bio_offset = off; - cbp->bio_data = addr; - addr += len; cbp->bio_length = len; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + cbp->bio_ma_offset += (uintptr_t)addr; + cbp->bio_ma += cbp->bio_ma_offset / PAGE_SIZE; + cbp->bio_ma_offset %= PAGE_SIZE; + cbp->bio_ma_n = round_page(cbp->bio_ma_offset + + cbp->bio_length) / PAGE_SIZE; + } else + cbp->bio_data = addr; + addr += len; cbp->bio_to = disk->d_consumer->provider; cbp->bio_caller1 = disk; @@ -366,8 +393,7 @@ g_concat_start(struct bio *bp) KASSERT(length == 0, ("Length is still greater than 0 (class=%s, name=%s).", bp->bio_to->geom->class->name, bp->bio_to->geom->name)); - for (cbp = bioq_first(&queue); cbp != NULL; cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) { G_CONCAT_LOGREQ(cbp, "Sending request."); disk = cbp->bio_caller1; cbp->bio_caller1 = NULL; @@ -379,7 +405,7 @@ static void g_concat_check_and_run(struct g_concat_softc *sc) { struct g_concat_disk *disk; - struct g_provider *pp; + struct g_provider *dp, *pp; u_int no, sectorsize = 0; off_t start; @@ -388,20 +414,27 @@ g_concat_check_and_run(struct g_concat_s return; pp = g_new_providerf(sc->sc_geom, "concat/%s", sc->sc_name); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE | + G_PF_ACCEPT_UNMAPPED; start = 0; for (no = 0; no < sc->sc_ndisks; no++) { disk = &sc->sc_disks[no]; + dp = disk->d_consumer->provider; disk->d_start = start; - disk->d_end = disk->d_start + - disk->d_consumer->provider->mediasize; + disk->d_end = disk->d_start + dp->mediasize; if (sc->sc_type == G_CONCAT_TYPE_AUTOMATIC) - disk->d_end -= disk->d_consumer->provider->sectorsize; + disk->d_end -= dp->sectorsize; start = disk->d_end; if (no == 0) - sectorsize = disk->d_consumer->provider->sectorsize; - else { - sectorsize = lcm(sectorsize, - disk->d_consumer->provider->sectorsize); + sectorsize = dp->sectorsize; + else + sectorsize = lcm(sectorsize, dp->sectorsize); + + /* A provider underneath us doesn't support unmapped */ + if ((dp->flags & G_PF_ACCEPT_UNMAPPED) == 0) { + G_CONCAT_DEBUG(1, "Cancelling unmapped " + "because of %s.", dp->name); + pp->flags &= ~G_PF_ACCEPT_UNMAPPED; } } pp->sectorsize = sectorsize; @@ -468,6 +501,7 @@ g_concat_add_disk(struct g_concat_softc fcp = LIST_FIRST(&gp->consumer); cp = g_new_consumer(gp); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); if (error != 0) { g_destroy_consumer(cp); @@ -557,6 +591,7 @@ g_concat_create(struct g_class *mp, cons for (no = 0; no < sc->sc_ndisks; no++) sc->sc_disks[no].d_consumer = NULL; sc->sc_type = type; + mtx_init(&sc->sc_lock, "gconcat lock", NULL, MTX_DEF); gp->softc = sc; sc->sc_geom = gp; @@ -605,6 +640,7 @@ g_concat_destroy(struct g_concat_softc * KASSERT(sc->sc_provider == NULL, ("Provider still exists? (device=%s)", gp->name)); free(sc->sc_disks, M_CONCAT); + mtx_destroy(&sc->sc_lock); free(sc, M_CONCAT); G_CONCAT_DEBUG(0, "Device %s destroyed.", gp->name); Modified: stable/10/sys/geom/concat/g_concat.h ============================================================================== --- stable/10/sys/geom/concat/g_concat.h Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/concat/g_concat.h Tue Jan 7 01:32:23 2014 (r260385) @@ -83,6 +83,7 @@ struct g_concat_softc { struct g_concat_disk *sc_disks; uint16_t sc_ndisks; + struct mtx sc_lock; }; #define sc_name sc_geom->name #endif /* _KERNEL */ Modified: stable/10/sys/geom/gate/g_gate.c ============================================================================== --- stable/10/sys/geom/gate/g_gate.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/gate/g_gate.c Tue Jan 7 01:32:23 2014 (r260385) @@ -91,6 +91,7 @@ static struct mtx g_gate_units_lock; static int g_gate_destroy(struct g_gate_softc *sc, boolean_t force) { + struct bio_queue_head queue; struct g_provider *pp; struct g_consumer *cp; struct g_geom *gp; @@ -113,21 +114,22 @@ g_gate_destroy(struct g_gate_softc *sc, pp->flags |= G_PF_WITHER; g_orphan_provider(pp, ENXIO); callout_drain(&sc->sc_callout); + bioq_init(&queue); mtx_lock(&sc->sc_queue_mtx); - while ((bp = bioq_first(&sc->sc_inqueue)) != NULL) { - bioq_remove(&sc->sc_inqueue, bp); + while ((bp = bioq_takefirst(&sc->sc_inqueue)) != NULL) { sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request canceled."); - g_io_deliver(bp, ENXIO); + bioq_insert_tail(&queue, bp); } - while ((bp = bioq_first(&sc->sc_outqueue)) != NULL) { - bioq_remove(&sc->sc_outqueue, bp); + while ((bp = bioq_takefirst(&sc->sc_outqueue)) != NULL) { sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request canceled."); - g_io_deliver(bp, ENXIO); + bioq_insert_tail(&queue, bp); } mtx_unlock(&sc->sc_queue_mtx); g_topology_unlock(); + while ((bp = bioq_takefirst(&queue)) != NULL) { + G_GATE_LOGREQ(1, bp, "Request canceled."); + g_io_deliver(bp, ENXIO); + } mtx_lock(&g_gate_units_lock); /* One reference is ours. */ sc->sc_ref--; @@ -334,6 +336,7 @@ g_gate_getunit(int unit, int *errorp) static void g_gate_guard(void *arg) { + struct bio_queue_head queue; struct g_gate_softc *sc; struct bintime curtime; struct bio *bp, *bp2; @@ -341,24 +344,27 @@ g_gate_guard(void *arg) sc = arg; binuptime(&curtime); g_gate_hold(sc->sc_unit, NULL); + bioq_init(&queue); mtx_lock(&sc->sc_queue_mtx); TAILQ_FOREACH_SAFE(bp, &sc->sc_inqueue.queue, bio_queue, bp2) { if (curtime.sec - bp->bio_t0.sec < 5) continue; bioq_remove(&sc->sc_inqueue, bp); sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request timeout."); - g_io_deliver(bp, EIO); + bioq_insert_tail(&queue, bp); } TAILQ_FOREACH_SAFE(bp, &sc->sc_outqueue.queue, bio_queue, bp2) { if (curtime.sec - bp->bio_t0.sec < 5) continue; bioq_remove(&sc->sc_outqueue, bp); sc->sc_queue_count--; + bioq_insert_tail(&queue, bp); + } + mtx_unlock(&sc->sc_queue_mtx); + while ((bp = bioq_takefirst(&queue)) != NULL) { G_GATE_LOGREQ(1, bp, "Request timeout."); g_io_deliver(bp, EIO); } - mtx_unlock(&sc->sc_queue_mtx); if ((sc->sc_flags & G_GATE_FLAG_DESTROY) == 0) { callout_reset(&sc->sc_callout, sc->sc_timeout * hz, g_gate_guard, sc); @@ -542,6 +548,7 @@ g_gate_create(struct g_gate_ctl_create * if (ropp != NULL) { cp = g_new_consumer(gp); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, ropp); if (error != 0) { G_GATE_DEBUG(1, "Unable to attach to %s.", ropp->name); @@ -560,6 +567,7 @@ g_gate_create(struct g_gate_ctl_create * ggio->gctl_unit = sc->sc_unit; pp = g_new_providerf(gp, "%s", name); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE; pp->mediasize = ggio->gctl_mediasize; pp->sectorsize = ggio->gctl_sectorsize; sc->sc_provider = pp; @@ -636,6 +644,7 @@ g_gate_modify(struct g_gate_softc *sc, s return (EINVAL); } cp = g_new_consumer(sc->sc_provider->geom); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); if (error != 0) { G_GATE_DEBUG(1, "Unable to attach to %s.", Modified: stable/10/sys/geom/geom.h ============================================================================== --- stable/10/sys/geom/geom.h Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom.h Tue Jan 7 01:32:23 2014 (r260385) @@ -177,6 +177,8 @@ struct g_consumer { int flags; #define G_CF_SPOILED 0x1 #define G_CF_ORPHAN 0x4 +#define G_CF_DIRECT_SEND 0x10 +#define G_CF_DIRECT_RECEIVE 0x20 struct devstat *stat; u_int nstart, nend; @@ -206,6 +208,8 @@ struct g_provider { #define G_PF_WITHER 0x2 #define G_PF_ORPHAN 0x4 #define G_PF_ACCEPT_UNMAPPED 0x8 +#define G_PF_DIRECT_SEND 0x10 +#define G_PF_DIRECT_RECEIVE 0x20 /* Two fields for the implementing class to use */ void *private; @@ -393,6 +397,8 @@ g_free(void *ptr) }; \ DECLARE_MODULE(name, name##_mod, SI_SUB_DRIVERS, SI_ORDER_FIRST); +int g_is_geom_thread(struct thread *td); + #endif /* _KERNEL */ /* geom_ctl.c */ Modified: stable/10/sys/geom/geom_dev.c ============================================================================== --- stable/10/sys/geom/geom_dev.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom_dev.c Tue Jan 7 01:32:23 2014 (r260385) @@ -222,6 +222,7 @@ g_dev_taste(struct g_class *mp, struct g mtx_init(&sc->sc_mtx, "g_dev", NULL, MTX_DEF); cp = g_new_consumer(gp); cp->private = sc; + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); KASSERT(error == 0, ("g_dev_taste(%s) failed to g_attach, err=%d", pp->name, error)); @@ -485,16 +486,16 @@ g_dev_done(struct bio *bp2) sc = cp->private; bp = bp2->bio_parent; bp->bio_error = bp2->bio_error; - if (bp->bio_error != 0) { + bp->bio_completed = bp2->bio_completed; + bp->bio_resid = bp->bio_length - bp2->bio_completed; + if (bp2->bio_error != 0) { g_trace(G_T_BIO, "g_dev_done(%p) had error %d", - bp2, bp->bio_error); + bp2, bp2->bio_error); bp->bio_flags |= BIO_ERROR; } else { g_trace(G_T_BIO, "g_dev_done(%p/%p) resid %ld completed %jd", - bp2, bp, bp->bio_resid, (intmax_t)bp2->bio_completed); + bp2, bp, bp2->bio_resid, (intmax_t)bp2->bio_completed); } - bp->bio_resid = bp->bio_length - bp2->bio_completed; - bp->bio_completed = bp2->bio_completed; g_destroy_bio(bp2); destroy = 0; mtx_lock(&sc->sc_mtx); Modified: stable/10/sys/geom/geom_disk.c ============================================================================== --- stable/10/sys/geom/geom_disk.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom_disk.c Tue Jan 7 01:32:23 2014 (r260385) @@ -66,6 +66,7 @@ struct g_disk_softc { struct sysctl_oid *sysctl_tree; char led[64]; uint32_t state; + struct mtx start_mtx; }; static g_access_t g_disk_access; @@ -229,6 +230,7 @@ g_disk_setstate(struct bio *bp, struct g static void g_disk_done(struct bio *bp) { + struct bintime now; struct bio *bp2; struct g_disk_softc *sc; @@ -237,19 +239,40 @@ g_disk_done(struct bio *bp) bp2 = bp->bio_parent; sc = bp2->bio_to->private; bp->bio_completed = bp->bio_length - bp->bio_resid; + binuptime(&now); mtx_lock(&sc->done_mtx); if (bp2->bio_error == 0) bp2->bio_error = bp->bio_error; bp2->bio_completed += bp->bio_completed; if ((bp->bio_cmd & (BIO_READ|BIO_WRITE|BIO_DELETE)) != 0) - devstat_end_transaction_bio(sc->dp->d_devstat, bp); - g_destroy_bio(bp); + devstat_end_transaction_bio_bt(sc->dp->d_devstat, bp, &now); bp2->bio_inbed++; if (bp2->bio_children == bp2->bio_inbed) { + mtx_unlock(&sc->done_mtx); bp2->bio_resid = bp2->bio_bcount - bp2->bio_completed; g_io_deliver(bp2, bp2->bio_error); + } else + mtx_unlock(&sc->done_mtx); + g_destroy_bio(bp); +} + +static void +g_disk_done_single(struct bio *bp) +{ + struct bintime now; + struct g_disk_softc *sc; + + bp->bio_completed = bp->bio_length - bp->bio_resid; + bp->bio_done = (void *)bp->bio_to; + bp->bio_to = LIST_FIRST(&bp->bio_disk->d_geom->provider); + if ((bp->bio_cmd & (BIO_READ|BIO_WRITE|BIO_DELETE)) != 0) { + binuptime(&now); + sc = bp->bio_to->private; + mtx_lock(&sc->done_mtx); + devstat_end_transaction_bio_bt(sc->dp->d_devstat, bp, &now); + mtx_unlock(&sc->done_mtx); } - mtx_unlock(&sc->done_mtx); + g_io_deliver(bp, bp->bio_error); } static int @@ -277,7 +300,7 @@ g_disk_start(struct bio *bp) struct disk *dp; struct g_disk_softc *sc; int error; - off_t off; + off_t d_maxsize, off; sc = bp->bio_to->private; if (sc == NULL || (dp = sc->dp) == NULL || dp->d_destroyed) { @@ -294,6 +317,22 @@ g_disk_start(struct bio *bp) /* fall-through */ case BIO_READ: case BIO_WRITE: + d_maxsize = (bp->bio_cmd == BIO_DELETE) ? + dp->d_delmaxsize : dp->d_maxsize; + if (bp->bio_length <= d_maxsize) { + bp->bio_disk = dp; + bp->bio_to = (void *)bp->bio_done; + bp->bio_done = g_disk_done_single; + bp->bio_pblkno = bp->bio_offset / dp->d_sectorsize; + bp->bio_bcount = bp->bio_length; + mtx_lock(&sc->start_mtx); + devstat_start_transaction_bio(dp->d_devstat, bp); + mtx_unlock(&sc->start_mtx); + g_disk_lock_giant(dp); + dp->d_strategy(bp); + g_disk_unlock_giant(dp); + break; + } off = 0; bp3 = NULL; bp2 = g_clone_bio(bp); @@ -302,10 +341,6 @@ g_disk_start(struct bio *bp) break; } do { - off_t d_maxsize; - - d_maxsize = (bp->bio_cmd == BIO_DELETE) ? - dp->d_delmaxsize : dp->d_maxsize; bp2->bio_offset += off; bp2->bio_length -= off; if ((bp->bio_flags & BIO_UNMAPPED) == 0) { @@ -346,7 +381,9 @@ g_disk_start(struct bio *bp) bp2->bio_pblkno = bp2->bio_offset / dp->d_sectorsize; bp2->bio_bcount = bp2->bio_length; bp2->bio_disk = dp; + mtx_lock(&sc->start_mtx); devstat_start_transaction_bio(dp->d_devstat, bp2); + mtx_unlock(&sc->start_mtx); g_disk_lock_giant(dp); dp->d_strategy(bp2); g_disk_unlock_giant(dp); @@ -402,15 +439,11 @@ g_disk_start(struct bio *bp) error = EOPNOTSUPP; break; } - bp2 = g_clone_bio(bp); - if (bp2 == NULL) { - g_io_deliver(bp, ENOMEM); - return; - } - bp2->bio_done = g_disk_done; - bp2->bio_disk = dp; + bp->bio_disk = dp; + bp->bio_to = (void *)bp->bio_done; + bp->bio_done = g_disk_done_single; g_disk_lock_giant(dp); - dp->d_strategy(bp2); + dp->d_strategy(bp); g_disk_unlock_giant(dp); break; default: @@ -515,17 +548,24 @@ g_disk_create(void *arg, int flag) g_topology_assert(); dp = arg; sc = g_malloc(sizeof(*sc), M_WAITOK | M_ZERO); + mtx_init(&sc->start_mtx, "g_disk_start", NULL, MTX_DEF); mtx_init(&sc->done_mtx, "g_disk_done", NULL, MTX_DEF); sc->dp = dp; gp = g_new_geomf(&g_disk_class, "%s%d", dp->d_name, dp->d_unit); gp->softc = sc; pp = g_new_providerf(gp, "%s", gp->name); + devstat_remove_entry(pp->stat); + pp->stat = NULL; + dp->d_devstat->id = pp; pp->mediasize = dp->d_mediasize; pp->sectorsize = dp->d_sectorsize; pp->stripeoffset = dp->d_stripeoffset; pp->stripesize = dp->d_stripesize; if ((dp->d_flags & DISKFLAG_UNMAPPED_BIO) != 0) pp->flags |= G_PF_ACCEPT_UNMAPPED; + if ((dp->d_flags & DISKFLAG_DIRECT_COMPLETION) != 0) + pp->flags |= G_PF_DIRECT_SEND; + pp->flags |= G_PF_DIRECT_RECEIVE; if (bootverbose) printf("GEOM: new disk %s\n", gp->name); sysctl_ctx_init(&sc->sysctl_ctx); @@ -574,6 +614,7 @@ g_disk_providergone(struct g_provider *p pp->private = NULL; pp->geom->softc = NULL; mtx_destroy(&sc->done_mtx); + mtx_destroy(&sc->start_mtx); g_free(sc); } Modified: stable/10/sys/geom/geom_disk.h ============================================================================== --- stable/10/sys/geom/geom_disk.h Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom_disk.h Tue Jan 7 01:32:23 2014 (r260385) @@ -107,6 +107,7 @@ struct disk { #define DISKFLAG_CANDELETE 0x4 #define DISKFLAG_CANFLUSHCACHE 0x8 #define DISKFLAG_UNMAPPED_BIO 0x10 +#define DISKFLAG_DIRECT_COMPLETION 0x20 struct disk *disk_alloc(void); void disk_create(struct disk *disk, int version); Modified: stable/10/sys/geom/geom_int.h ============================================================================== --- stable/10/sys/geom/geom_int.h Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom_int.h Tue Jan 7 01:32:23 2014 (r260385) @@ -39,6 +39,9 @@ LIST_HEAD(class_list_head, g_class); TAILQ_HEAD(g_tailq_head, g_geom); extern int g_collectstats; +#define G_STATS_PROVIDERS 1 /* Collect I/O stats for providers */ +#define G_STATS_CONSUMERS 2 /* Collect I/O stats for consumers */ + extern int g_debugflags; /* * 1 G_T_TOPOLOGY Modified: stable/10/sys/geom/geom_io.c ============================================================================== --- stable/10/sys/geom/geom_io.c Tue Jan 7 01:17:27 2014 (r260384) +++ stable/10/sys/geom/geom_io.c Tue Jan 7 01:32:23 2014 (r260385) @@ -65,6 +65,8 @@ __FBSDID("$FreeBSD$"); #include #include +static int g_io_transient_map_bio(struct bio *bp); + static struct g_bioq g_bio_run_down; static struct g_bioq g_bio_run_up; static struct g_bioq g_bio_run_task; @@ -310,6 +312,8 @@ g_io_check(struct bio *bp) { struct g_consumer *cp; struct g_provider *pp; + off_t excess; + int error; cp = bp->bio_from; pp = bp->bio_to; @@ -354,11 +358,44 @@ g_io_check(struct bio *bp) return (EIO); if (bp->bio_offset > pp->mediasize) return (EIO); + + /* Truncate requests to the end of providers media. */ + excess = bp->bio_offset + bp->bio_length; + if (excess > bp->bio_to->mediasize) { + KASSERT((bp->bio_flags & BIO_UNMAPPED) == 0 || + round_page(bp->bio_ma_offset + + bp->bio_length) / PAGE_SIZE == bp->bio_ma_n, + ("excess bio %p too short", bp)); + excess -= bp->bio_to->mediasize; + bp->bio_length -= excess; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + bp->bio_ma_n = round_page(bp->bio_ma_offset + + bp->bio_length) / PAGE_SIZE; + } + if (excess > 0) + CTR3(KTR_GEOM, "g_down truncated bio " + "%p provider %s by %d", bp, + bp->bio_to->name, excess); + } + + /* Deliver zero length transfers right here. */ + if (bp->bio_length == 0) { + CTR2(KTR_GEOM, "g_down terminated 0-length " + "bp %p provider %s", bp, bp->bio_to->name); + return (0); + } + + if ((bp->bio_flags & BIO_UNMAPPED) != 0 && + (bp->bio_to->flags & G_PF_ACCEPT_UNMAPPED) == 0 && + (bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE)) { + if ((error = g_io_transient_map_bio(bp)) >= 0) + return (error); + } break; default: break; } - return (0); + return (EJUSTRETURN); } /* @@ -422,7 +459,8 @@ void g_io_request(struct bio *bp, struct g_consumer *cp) { struct g_provider *pp; - int first; + struct mtx *mtxp; + int direct, error, first; KASSERT(cp != NULL, ("NULL cp in g_io_request")); KASSERT(bp != NULL, ("NULL bp in g_io_request")); @@ -472,48 +510,81 @@ g_io_request(struct bio *bp, struct g_co KASSERT(!(bp->bio_flags & BIO_ONQUEUE), ("Bio already on queue bp=%p", bp)); - bp->bio_flags |= BIO_ONQUEUE; - - if (g_collectstats) + if ((g_collectstats & G_STATS_CONSUMERS) != 0 || + ((g_collectstats & G_STATS_PROVIDERS) != 0 && pp->stat != NULL)) binuptime(&bp->bio_t0); else getbinuptime(&bp->bio_t0); +#ifdef GET_STACK_USAGE + direct = (cp->flags & G_CF_DIRECT_SEND) && + (pp->flags & G_PF_DIRECT_RECEIVE) && + !g_is_geom_thread(curthread) && + (((pp->flags & G_PF_ACCEPT_UNMAPPED) == 0 && + (bp->bio_flags & BIO_UNMAPPED) != 0) || THREAD_CAN_SLEEP()); + if (direct) { + /* Block direct execution if less then half of stack left. */ + size_t st, su; + GET_STACK_USAGE(st, su); + if (su * 2 > st) + direct = 0; + } +#else + direct = 0; +#endif + + if (!TAILQ_EMPTY(&g_classifier_tailq) && !bp->bio_classifier1) { + g_bioq_lock(&g_bio_run_down); + g_run_classifiers(bp); + g_bioq_unlock(&g_bio_run_down); + } + /* * The statistics collection is lockless, as such, but we * can not update one instance of the statistics from more * than one thread at a time, so grab the lock first. - * - * We also use the lock to protect the list of classifiers. */ - g_bioq_lock(&g_bio_run_down); - - if (!TAILQ_EMPTY(&g_classifier_tailq) && !bp->bio_classifier1) - g_run_classifiers(bp); - - if (g_collectstats & 1) + mtxp = mtx_pool_find(mtxpool_sleep, pp); + mtx_lock(mtxp); + if (g_collectstats & G_STATS_PROVIDERS) devstat_start_transaction(pp->stat, &bp->bio_t0); - if (g_collectstats & 2) + if (g_collectstats & G_STATS_CONSUMERS) devstat_start_transaction(cp->stat, &bp->bio_t0); - pp->nstart++; cp->nstart++; - first = TAILQ_EMPTY(&g_bio_run_down.bio_queue); - TAILQ_INSERT_TAIL(&g_bio_run_down.bio_queue, bp, bio_queue); - g_bio_run_down.bio_queue_length++; - g_bioq_unlock(&g_bio_run_down); + mtx_unlock(mtxp); - /* Pass it on down. */ - if (first) - wakeup(&g_wait_down); + if (direct) { + error = g_io_check(bp); + if (error >= 0) { + CTR3(KTR_GEOM, "g_io_request g_io_check on bp %p " + "provider %s returned %d", bp, bp->bio_to->name, + error); + g_io_deliver(bp, error); + return; + } + bp->bio_to->geom->start(bp); + } else { + g_bioq_lock(&g_bio_run_down); + first = TAILQ_EMPTY(&g_bio_run_down.bio_queue); + TAILQ_INSERT_TAIL(&g_bio_run_down.bio_queue, bp, bio_queue); + bp->bio_flags |= BIO_ONQUEUE; + g_bio_run_down.bio_queue_length++; + g_bioq_unlock(&g_bio_run_down); + /* Pass it on down. */ + if (first) + wakeup(&g_wait_down); + } } void g_io_deliver(struct bio *bp, int error) { + struct bintime now; struct g_consumer *cp; struct g_provider *pp; - int first; + struct mtx *mtxp; + int direct, first; KASSERT(bp != NULL, ("NULL bp in g_io_deliver")); pp = bp->bio_to; @@ -559,31 +630,55 @@ g_io_deliver(struct bio *bp, int error) bp->bio_bcount = bp->bio_length; bp->bio_resid = bp->bio_bcount - bp->bio_completed; +#ifdef GET_STACK_USAGE + direct = (pp->flags & G_PF_DIRECT_SEND) && + (cp->flags & G_CF_DIRECT_RECEIVE) && + !g_is_geom_thread(curthread); + if (direct) { + /* Block direct execution if less then half of stack left. */ + size_t st, su; + GET_STACK_USAGE(st, su); + if (su * 2 > st) + direct = 0; + } +#else + direct = 0; +#endif + /* * The statistics collection is lockless, as such, but we * can not update one instance of the statistics from more * than one thread at a time, so grab the lock first. */ - g_bioq_lock(&g_bio_run_up); - if (g_collectstats & 1) - devstat_end_transaction_bio(pp->stat, bp); - if (g_collectstats & 2) - devstat_end_transaction_bio(cp->stat, bp); - + if ((g_collectstats & G_STATS_CONSUMERS) != 0 || + ((g_collectstats & G_STATS_PROVIDERS) != 0 && pp->stat != NULL)) + binuptime(&now); + mtxp = mtx_pool_find(mtxpool_sleep, cp); + mtx_lock(mtxp); + if (g_collectstats & G_STATS_PROVIDERS) + devstat_end_transaction_bio_bt(pp->stat, bp, &now); + if (g_collectstats & G_STATS_CONSUMERS) + devstat_end_transaction_bio_bt(cp->stat, bp, &now); cp->nend++; pp->nend++; + mtx_unlock(mtxp); + if (error != ENOMEM) { bp->bio_error = error; - first = TAILQ_EMPTY(&g_bio_run_up.bio_queue); - TAILQ_INSERT_TAIL(&g_bio_run_up.bio_queue, bp, bio_queue); - bp->bio_flags |= BIO_ONQUEUE; - g_bio_run_up.bio_queue_length++; - g_bioq_unlock(&g_bio_run_up); - if (first) - wakeup(&g_wait_up); + if (direct) { + biodone(bp); + } else { + g_bioq_lock(&g_bio_run_up); + first = TAILQ_EMPTY(&g_bio_run_up.bio_queue); + TAILQ_INSERT_TAIL(&g_bio_run_up.bio_queue, bp, bio_queue); + bp->bio_flags |= BIO_ONQUEUE; + g_bio_run_up.bio_queue_length++; + g_bioq_unlock(&g_bio_run_up); + if (first) + wakeup(&g_wait_up); + } return; } - g_bioq_unlock(&g_bio_run_up); if (bootverbose) printf("ENOMEM %p on %p(%s)\n", bp, pp, pp->name); @@ -639,11 +734,10 @@ retry: if (vmem_alloc(transient_arena, size, M_BESTFIT | M_NOWAIT, &addr)) { if (transient_map_retries != 0 && retried >= transient_map_retries) { - g_io_deliver(bp, EDEADLK/* XXXKIB */); CTR2(KTR_GEOM, "g_down cannot map bp %p provider %s", bp, bp->bio_to->name); atomic_add_int(&transient_map_hard_failures, 1); - return (1); + return (EDEADLK/* XXXKIB */); } else { /* * Naive attempt to quisce the I/O to get more @@ -663,14 +757,13 @@ retry: bp->bio_data = (caddr_t)addr + bp->bio_ma_offset; bp->bio_flags |= BIO_TRANSIENT_MAPPING; bp->bio_flags &= ~BIO_UNMAPPED; - return (0); + return (EJUSTRETURN); } void g_io_schedule_down(struct thread *tp __unused) { struct bio *bp; - off_t excess; int error; for(;;) { @@ -689,59 +782,15 @@ g_io_schedule_down(struct thread *tp __u pause("g_down", hz/10); pace--; } + CTR2(KTR_GEOM, "g_down processing bp %p provider %s", bp, + bp->bio_to->name); error = g_io_check(bp); - if (error) { + if (error >= 0) { CTR3(KTR_GEOM, "g_down g_io_check on bp %p provider " "%s returned %d", bp, bp->bio_to->name, error); g_io_deliver(bp, error); continue; } - CTR2(KTR_GEOM, "g_down processing bp %p provider %s", bp, - bp->bio_to->name); - switch (bp->bio_cmd) { - case BIO_READ: - case BIO_WRITE: - case BIO_DELETE: - /* Truncate requests to the end of providers media. */ *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***