From owner-svn-src-all@FreeBSD.ORG Tue Oct 22 08:22:23 2013 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8C315ECC; Tue, 22 Oct 2013 08:22:23 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 76D75279D; Tue, 22 Oct 2013 08:22:23 +0000 (UTC) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.7/8.14.7) with ESMTP id r9M8MNjC032129; Tue, 22 Oct 2013 08:22:23 GMT (envelope-from mav@svn.freebsd.org) Received: (from mav@localhost) by svn.freebsd.org (8.14.7/8.14.5/Submit) id r9M8MKXR032099; Tue, 22 Oct 2013 08:22:20 GMT (envelope-from mav@svn.freebsd.org) Message-Id: <201310220822.r9M8MKXR032099@svn.freebsd.org> From: Alexander Motin Date: Tue, 22 Oct 2013 08:22:20 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r256880 - in head/sys: cam/ata cam/scsi cddl/contrib/opensolaris/uts/common/fs/zfs dev/md geom geom/concat geom/gate geom/mirror geom/multipath geom/nop geom/part geom/raid geom/stripe ... X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Oct 2013 08:22:23 -0000 Author: mav Date: Tue Oct 22 08:22:19 2013 New Revision: 256880 URL: http://svnweb.freebsd.org/changeset/base/256880 Log: Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more then twice increases peak block storage performance on systems with manu CPUs, together with earlier CAM locking changes reaching more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months Modified: head/sys/cam/ata/ata_da.c head/sys/cam/scsi/scsi_da.c head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c head/sys/dev/md/md.c head/sys/geom/concat/g_concat.c head/sys/geom/concat/g_concat.h head/sys/geom/gate/g_gate.c head/sys/geom/geom.h head/sys/geom/geom_dev.c head/sys/geom/geom_disk.c head/sys/geom/geom_disk.h head/sys/geom/geom_int.h head/sys/geom/geom_io.c head/sys/geom/geom_kern.c head/sys/geom/geom_slice.c head/sys/geom/geom_vfs.c head/sys/geom/mirror/g_mirror.c head/sys/geom/mirror/g_mirror.h head/sys/geom/multipath/g_multipath.c head/sys/geom/nop/g_nop.c head/sys/geom/nop/g_nop.h head/sys/geom/part/g_part.c head/sys/geom/raid/g_raid.c head/sys/geom/raid/md_ddf.c head/sys/geom/raid/md_intel.c head/sys/geom/raid/md_jmicron.c head/sys/geom/raid/md_nvidia.c head/sys/geom/raid/md_promise.c head/sys/geom/raid/md_sii.c head/sys/geom/stripe/g_stripe.c head/sys/geom/stripe/g_stripe.h head/sys/geom/zero/g_zero.c head/sys/kern/subr_devstat.c head/sys/sys/proc.h Modified: head/sys/cam/ata/ata_da.c ============================================================================== --- head/sys/cam/ata/ata_da.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/cam/ata/ata_da.c Tue Oct 22 08:22:19 2013 (r256880) @@ -1253,7 +1253,7 @@ adaregister(struct cam_periph *periph, v maxio = min(maxio, 256 * softc->params.secsize); softc->disk->d_maxsize = maxio; softc->disk->d_unit = periph->unit_number; - softc->disk->d_flags = 0; + softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if (softc->flags & ADA_FLAG_CAN_TRIM) { Modified: head/sys/cam/scsi/scsi_da.c ============================================================================== --- head/sys/cam/scsi/scsi_da.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/cam/scsi/scsi_da.c Tue Oct 22 08:22:19 2013 (r256880) @@ -2125,7 +2125,7 @@ daregister(struct cam_periph *periph, vo else softc->disk->d_maxsize = cpi.maxio; softc->disk->d_unit = periph->unit_number; - softc->disk->d_flags = 0; + softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if ((softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if ((cpi.hba_misc & PIM_UNMAPPED) != 0) Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c ============================================================================== --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Tue Oct 22 08:22:19 2013 (r256880) @@ -147,6 +147,7 @@ vdev_geom_attach(struct g_provider *pp) ZFS_LOG(1, "Used existing consumer for %s.", pp->name); } } + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; return (cp); } Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c ============================================================================== --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Tue Oct 22 08:22:19 2013 (r256880) @@ -2153,6 +2153,7 @@ zvol_geom_create(const char *name) gp->start = zvol_geom_start; gp->access = zvol_geom_access; pp = g_new_providerf(gp, "%s/%s", ZVOL_DRIVER, name); + pp->flags |= G_PF_DIRECT_RECEIVE | G_PF_DIRECT_SEND; pp->sectorsize = DEV_BSIZE; zv = kmem_zalloc(sizeof(*zv), KM_SLEEP); @@ -2256,18 +2257,20 @@ zvol_geom_start(struct bio *bp) zvol_state_t *zv; boolean_t first; + zv = bp->bio_to->private; + ASSERT(zv != NULL); switch (bp->bio_cmd) { + case BIO_FLUSH: + if (!THREAD_CAN_SLEEP()) + goto enqueue; + zil_commit(zv->zv_zilog, ZVOL_OBJ); + g_io_deliver(bp, 0); + break; case BIO_READ: case BIO_WRITE: - case BIO_FLUSH: - zv = bp->bio_to->private; - ASSERT(zv != NULL); - mtx_lock(&zv->zv_queue_mtx); - first = (bioq_first(&zv->zv_queue) == NULL); - bioq_insert_tail(&zv->zv_queue, bp); - mtx_unlock(&zv->zv_queue_mtx); - if (first) - wakeup_one(&zv->zv_queue); + if (!THREAD_CAN_SLEEP()) + goto enqueue; + zvol_strategy(bp); break; case BIO_GETATTR: case BIO_DELETE: @@ -2275,6 +2278,15 @@ zvol_geom_start(struct bio *bp) g_io_deliver(bp, EOPNOTSUPP); break; } + return; + +enqueue: + mtx_lock(&zv->zv_queue_mtx); + first = (bioq_first(&zv->zv_queue) == NULL); + bioq_insert_tail(&zv->zv_queue, bp); + mtx_unlock(&zv->zv_queue_mtx); + if (first) + wakeup_one(&zv->zv_queue); } static void @@ -2449,6 +2461,7 @@ zvol_rename_minor(struct g_geom *gp, con g_wither_provider(pp, ENXIO); pp = g_new_providerf(gp, "%s/%s", ZVOL_DRIVER, newname); + pp->flags |= G_PF_DIRECT_RECEIVE | G_PF_DIRECT_SEND; pp->sectorsize = DEV_BSIZE; pp->mediasize = zv->zv_volsize; pp->private = zv; Modified: head/sys/dev/md/md.c ============================================================================== --- head/sys/dev/md/md.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/dev/md/md.c Tue Oct 22 08:22:19 2013 (r256880) @@ -189,6 +189,7 @@ struct md_s { LIST_ENTRY(md_s) list; struct bio_queue_head bio_queue; struct mtx queue_mtx; + struct mtx stat_mtx; struct cdev *dev; enum md_types type; off_t mediasize; @@ -415,8 +416,11 @@ g_md_start(struct bio *bp) struct md_s *sc; sc = bp->bio_to->geom->softc; - if ((bp->bio_cmd == BIO_READ) || (bp->bio_cmd == BIO_WRITE)) + if ((bp->bio_cmd == BIO_READ) || (bp->bio_cmd == BIO_WRITE)) { + mtx_lock(&sc->stat_mtx); devstat_start_transaction_bio(sc->devstat, bp); + mtx_unlock(&sc->stat_mtx); + } mtx_lock(&sc->queue_mtx); bioq_disksort(&sc->bio_queue, bp); mtx_unlock(&sc->queue_mtx); @@ -987,6 +991,7 @@ mdnew(int unit, int *errp, enum md_types sc->type = type; bioq_init(&sc->bio_queue); mtx_init(&sc->queue_mtx, "md bio queue", NULL, MTX_DEF); + mtx_init(&sc->stat_mtx, "md stat", NULL, MTX_DEF); sc->unit = unit; sprintf(sc->name, "md%d", unit); LIST_INSERT_HEAD(&md_softc_list, sc, list); @@ -994,6 +999,7 @@ mdnew(int unit, int *errp, enum md_types if (error == 0) return (sc); LIST_REMOVE(sc, list); + mtx_destroy(&sc->stat_mtx); mtx_destroy(&sc->queue_mtx); free_unr(md_uh, sc->unit); free(sc, M_MD); @@ -1011,6 +1017,7 @@ mdinit(struct md_s *sc) gp = g_new_geomf(&g_md_class, "md%d", sc->unit); gp->softc = sc; pp = g_new_providerf(gp, "md%d", sc->unit); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE; pp->mediasize = sc->mediasize; pp->sectorsize = sc->sectorsize; switch (sc->type) { @@ -1206,6 +1213,7 @@ mddestroy(struct md_s *sc, struct thread while (!(sc->flags & MD_EXITING)) msleep(sc->procp, &sc->queue_mtx, PRIBIO, "mddestroy", hz / 10); mtx_unlock(&sc->queue_mtx); + mtx_destroy(&sc->stat_mtx); mtx_destroy(&sc->queue_mtx); if (sc->vnode != NULL) { vn_lock(sc->vnode, LK_EXCLUSIVE | LK_RETRY); Modified: head/sys/geom/concat/g_concat.c ============================================================================== --- head/sys/geom/concat/g_concat.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/concat/g_concat.c Tue Oct 22 08:22:19 2013 (r256880) @@ -239,6 +239,27 @@ g_concat_kernel_dump(struct bio *bp) } static void +g_concat_done(struct bio *bp) +{ + struct g_concat_softc *sc; + struct bio *pbp; + + pbp = bp->bio_parent; + sc = pbp->bio_to->geom->softc; + mtx_lock(&sc->sc_lock); + if (pbp->bio_error == 0) + pbp->bio_error = bp->bio_error; + pbp->bio_completed += bp->bio_completed; + pbp->bio_inbed++; + if (pbp->bio_children == pbp->bio_inbed) { + mtx_unlock(&sc->sc_lock); + g_io_deliver(pbp, pbp->bio_error); + } else + mtx_unlock(&sc->sc_lock); + g_destroy_bio(bp); +} + +static void g_concat_flush(struct g_concat_softc *sc, struct bio *bp) { struct bio_queue_head queue; @@ -250,23 +271,19 @@ g_concat_flush(struct g_concat_softc *sc for (no = 0; no < sc->sc_ndisks; no++) { cbp = g_clone_bio(bp); if (cbp == NULL) { - for (cbp = bioq_first(&queue); cbp != NULL; - cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) g_destroy_bio(cbp); - } if (bp->bio_error == 0) bp->bio_error = ENOMEM; g_io_deliver(bp, bp->bio_error); return; } bioq_insert_tail(&queue, cbp); - cbp->bio_done = g_std_done; + cbp->bio_done = g_concat_done; cbp->bio_caller1 = sc->sc_disks[no].d_consumer; cbp->bio_to = sc->sc_disks[no].d_consumer->provider; } - for (cbp = bioq_first(&queue); cbp != NULL; cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) { G_CONCAT_LOGREQ(cbp, "Sending request."); cp = cbp->bio_caller1; cbp->bio_caller1 = NULL; @@ -320,7 +337,10 @@ g_concat_start(struct bio *bp) offset = bp->bio_offset; length = bp->bio_length; - addr = bp->bio_data; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) + addr = NULL; + else + addr = bp->bio_data; end = offset + length; bioq_init(&queue); @@ -338,11 +358,8 @@ g_concat_start(struct bio *bp) cbp = g_clone_bio(bp); if (cbp == NULL) { - for (cbp = bioq_first(&queue); cbp != NULL; - cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) g_destroy_bio(cbp); - } if (bp->bio_error == 0) bp->bio_error = ENOMEM; g_io_deliver(bp, bp->bio_error); @@ -352,11 +369,21 @@ g_concat_start(struct bio *bp) /* * Fill in the component buf structure. */ - cbp->bio_done = g_std_done; + if (len == bp->bio_length) + cbp->bio_done = g_std_done; + else + cbp->bio_done = g_concat_done; cbp->bio_offset = off; - cbp->bio_data = addr; - addr += len; cbp->bio_length = len; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + cbp->bio_ma_offset += (uintptr_t)addr; + cbp->bio_ma += cbp->bio_ma_offset / PAGE_SIZE; + cbp->bio_ma_offset %= PAGE_SIZE; + cbp->bio_ma_n = round_page(cbp->bio_ma_offset + + cbp->bio_length) / PAGE_SIZE; + } else + cbp->bio_data = addr; + addr += len; cbp->bio_to = disk->d_consumer->provider; cbp->bio_caller1 = disk; @@ -366,8 +393,7 @@ g_concat_start(struct bio *bp) KASSERT(length == 0, ("Length is still greater than 0 (class=%s, name=%s).", bp->bio_to->geom->class->name, bp->bio_to->geom->name)); - for (cbp = bioq_first(&queue); cbp != NULL; cbp = bioq_first(&queue)) { - bioq_remove(&queue, cbp); + while ((cbp = bioq_takefirst(&queue)) != NULL) { G_CONCAT_LOGREQ(cbp, "Sending request."); disk = cbp->bio_caller1; cbp->bio_caller1 = NULL; @@ -379,7 +405,7 @@ static void g_concat_check_and_run(struct g_concat_softc *sc) { struct g_concat_disk *disk; - struct g_provider *pp; + struct g_provider *dp, *pp; u_int no, sectorsize = 0; off_t start; @@ -388,20 +414,27 @@ g_concat_check_and_run(struct g_concat_s return; pp = g_new_providerf(sc->sc_geom, "concat/%s", sc->sc_name); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE | + G_PF_ACCEPT_UNMAPPED; start = 0; for (no = 0; no < sc->sc_ndisks; no++) { disk = &sc->sc_disks[no]; + dp = disk->d_consumer->provider; disk->d_start = start; - disk->d_end = disk->d_start + - disk->d_consumer->provider->mediasize; + disk->d_end = disk->d_start + dp->mediasize; if (sc->sc_type == G_CONCAT_TYPE_AUTOMATIC) - disk->d_end -= disk->d_consumer->provider->sectorsize; + disk->d_end -= dp->sectorsize; start = disk->d_end; if (no == 0) - sectorsize = disk->d_consumer->provider->sectorsize; - else { - sectorsize = lcm(sectorsize, - disk->d_consumer->provider->sectorsize); + sectorsize = dp->sectorsize; + else + sectorsize = lcm(sectorsize, dp->sectorsize); + + /* A provider underneath us doesn't support unmapped */ + if ((dp->flags & G_PF_ACCEPT_UNMAPPED) == 0) { + G_CONCAT_DEBUG(1, "Cancelling unmapped " + "because of %s.", dp->name); + pp->flags &= ~G_PF_ACCEPT_UNMAPPED; } } pp->sectorsize = sectorsize; @@ -468,6 +501,7 @@ g_concat_add_disk(struct g_concat_softc fcp = LIST_FIRST(&gp->consumer); cp = g_new_consumer(gp); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); if (error != 0) { g_destroy_consumer(cp); @@ -557,6 +591,7 @@ g_concat_create(struct g_class *mp, cons for (no = 0; no < sc->sc_ndisks; no++) sc->sc_disks[no].d_consumer = NULL; sc->sc_type = type; + mtx_init(&sc->sc_lock, "gconcat lock", NULL, MTX_DEF); gp->softc = sc; sc->sc_geom = gp; @@ -605,6 +640,7 @@ g_concat_destroy(struct g_concat_softc * KASSERT(sc->sc_provider == NULL, ("Provider still exists? (device=%s)", gp->name)); free(sc->sc_disks, M_CONCAT); + mtx_destroy(&sc->sc_lock); free(sc, M_CONCAT); G_CONCAT_DEBUG(0, "Device %s destroyed.", gp->name); Modified: head/sys/geom/concat/g_concat.h ============================================================================== --- head/sys/geom/concat/g_concat.h Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/concat/g_concat.h Tue Oct 22 08:22:19 2013 (r256880) @@ -83,6 +83,7 @@ struct g_concat_softc { struct g_concat_disk *sc_disks; uint16_t sc_ndisks; + struct mtx sc_lock; }; #define sc_name sc_geom->name #endif /* _KERNEL */ Modified: head/sys/geom/gate/g_gate.c ============================================================================== --- head/sys/geom/gate/g_gate.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/gate/g_gate.c Tue Oct 22 08:22:19 2013 (r256880) @@ -91,6 +91,7 @@ static struct mtx g_gate_units_lock; static int g_gate_destroy(struct g_gate_softc *sc, boolean_t force) { + struct bio_queue_head queue; struct g_provider *pp; struct g_consumer *cp; struct g_geom *gp; @@ -113,21 +114,22 @@ g_gate_destroy(struct g_gate_softc *sc, pp->flags |= G_PF_WITHER; g_orphan_provider(pp, ENXIO); callout_drain(&sc->sc_callout); + bioq_init(&queue); mtx_lock(&sc->sc_queue_mtx); - while ((bp = bioq_first(&sc->sc_inqueue)) != NULL) { - bioq_remove(&sc->sc_inqueue, bp); + while ((bp = bioq_takefirst(&sc->sc_inqueue)) != NULL) { sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request canceled."); - g_io_deliver(bp, ENXIO); + bioq_insert_tail(&queue, bp); } - while ((bp = bioq_first(&sc->sc_outqueue)) != NULL) { - bioq_remove(&sc->sc_outqueue, bp); + while ((bp = bioq_takefirst(&sc->sc_outqueue)) != NULL) { sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request canceled."); - g_io_deliver(bp, ENXIO); + bioq_insert_tail(&queue, bp); } mtx_unlock(&sc->sc_queue_mtx); g_topology_unlock(); + while ((bp = bioq_takefirst(&queue)) != NULL) { + G_GATE_LOGREQ(1, bp, "Request canceled."); + g_io_deliver(bp, ENXIO); + } mtx_lock(&g_gate_units_lock); /* One reference is ours. */ sc->sc_ref--; @@ -334,6 +336,7 @@ g_gate_getunit(int unit, int *errorp) static void g_gate_guard(void *arg) { + struct bio_queue_head queue; struct g_gate_softc *sc; struct bintime curtime; struct bio *bp, *bp2; @@ -341,24 +344,27 @@ g_gate_guard(void *arg) sc = arg; binuptime(&curtime); g_gate_hold(sc->sc_unit, NULL); + bioq_init(&queue); mtx_lock(&sc->sc_queue_mtx); TAILQ_FOREACH_SAFE(bp, &sc->sc_inqueue.queue, bio_queue, bp2) { if (curtime.sec - bp->bio_t0.sec < 5) continue; bioq_remove(&sc->sc_inqueue, bp); sc->sc_queue_count--; - G_GATE_LOGREQ(1, bp, "Request timeout."); - g_io_deliver(bp, EIO); + bioq_insert_tail(&queue, bp); } TAILQ_FOREACH_SAFE(bp, &sc->sc_outqueue.queue, bio_queue, bp2) { if (curtime.sec - bp->bio_t0.sec < 5) continue; bioq_remove(&sc->sc_outqueue, bp); sc->sc_queue_count--; + bioq_insert_tail(&queue, bp); + } + mtx_unlock(&sc->sc_queue_mtx); + while ((bp = bioq_takefirst(&queue)) != NULL) { G_GATE_LOGREQ(1, bp, "Request timeout."); g_io_deliver(bp, EIO); } - mtx_unlock(&sc->sc_queue_mtx); if ((sc->sc_flags & G_GATE_FLAG_DESTROY) == 0) { callout_reset(&sc->sc_callout, sc->sc_timeout * hz, g_gate_guard, sc); @@ -542,6 +548,7 @@ g_gate_create(struct g_gate_ctl_create * if (ropp != NULL) { cp = g_new_consumer(gp); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, ropp); if (error != 0) { G_GATE_DEBUG(1, "Unable to attach to %s.", ropp->name); @@ -560,6 +567,7 @@ g_gate_create(struct g_gate_ctl_create * ggio->gctl_unit = sc->sc_unit; pp = g_new_providerf(gp, "%s", name); + pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE; pp->mediasize = ggio->gctl_mediasize; pp->sectorsize = ggio->gctl_sectorsize; sc->sc_provider = pp; @@ -636,6 +644,7 @@ g_gate_modify(struct g_gate_softc *sc, s return (EINVAL); } cp = g_new_consumer(sc->sc_provider->geom); + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); if (error != 0) { G_GATE_DEBUG(1, "Unable to attach to %s.", Modified: head/sys/geom/geom.h ============================================================================== --- head/sys/geom/geom.h Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom.h Tue Oct 22 08:22:19 2013 (r256880) @@ -177,6 +177,8 @@ struct g_consumer { int flags; #define G_CF_SPOILED 0x1 #define G_CF_ORPHAN 0x4 +#define G_CF_DIRECT_SEND 0x10 +#define G_CF_DIRECT_RECEIVE 0x20 struct devstat *stat; u_int nstart, nend; @@ -206,6 +208,8 @@ struct g_provider { #define G_PF_WITHER 0x2 #define G_PF_ORPHAN 0x4 #define G_PF_ACCEPT_UNMAPPED 0x8 +#define G_PF_DIRECT_SEND 0x10 +#define G_PF_DIRECT_RECEIVE 0x20 /* Two fields for the implementing class to use */ void *private; @@ -393,6 +397,8 @@ g_free(void *ptr) }; \ DECLARE_MODULE(name, name##_mod, SI_SUB_DRIVERS, SI_ORDER_FIRST); +int g_is_geom_thread(struct thread *td); + #endif /* _KERNEL */ /* geom_ctl.c */ Modified: head/sys/geom/geom_dev.c ============================================================================== --- head/sys/geom/geom_dev.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_dev.c Tue Oct 22 08:22:19 2013 (r256880) @@ -222,6 +222,7 @@ g_dev_taste(struct g_class *mp, struct g mtx_init(&sc->sc_mtx, "g_dev", NULL, MTX_DEF); cp = g_new_consumer(gp); cp->private = sc; + cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); KASSERT(error == 0, ("g_dev_taste(%s) failed to g_attach, err=%d", pp->name, error)); Modified: head/sys/geom/geom_disk.c ============================================================================== --- head/sys/geom/geom_disk.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_disk.c Tue Oct 22 08:22:19 2013 (r256880) @@ -66,6 +66,7 @@ struct g_disk_softc { struct sysctl_oid *sysctl_tree; char led[64]; uint32_t state; + struct mtx start_mtx; }; static g_access_t g_disk_access; @@ -255,6 +256,25 @@ g_disk_done(struct bio *bp) g_destroy_bio(bp); } +static void +g_disk_done_single(struct bio *bp) +{ + struct bintime now; + struct g_disk_softc *sc; + + bp->bio_completed = bp->bio_length - bp->bio_resid; + bp->bio_done = (void *)bp->bio_to; + bp->bio_to = LIST_FIRST(&bp->bio_disk->d_geom->provider); + if ((bp->bio_cmd & (BIO_READ|BIO_WRITE|BIO_DELETE)) != 0) { + binuptime(&now); + sc = bp->bio_to->private; + mtx_lock(&sc->done_mtx); + devstat_end_transaction_bio_bt(sc->dp->d_devstat, bp, &now); + mtx_unlock(&sc->done_mtx); + } + g_io_deliver(bp, bp->bio_error); +} + static int g_disk_ioctl(struct g_provider *pp, u_long cmd, void * data, int fflag, struct thread *td) { @@ -280,7 +300,7 @@ g_disk_start(struct bio *bp) struct disk *dp; struct g_disk_softc *sc; int error; - off_t off; + off_t d_maxsize, off; sc = bp->bio_to->private; if (sc == NULL || (dp = sc->dp) == NULL || dp->d_destroyed) { @@ -297,6 +317,22 @@ g_disk_start(struct bio *bp) /* fall-through */ case BIO_READ: case BIO_WRITE: + d_maxsize = (bp->bio_cmd == BIO_DELETE) ? + dp->d_delmaxsize : dp->d_maxsize; + if (bp->bio_length <= d_maxsize) { + bp->bio_disk = dp; + bp->bio_to = (void *)bp->bio_done; + bp->bio_done = g_disk_done_single; + bp->bio_pblkno = bp->bio_offset / dp->d_sectorsize; + bp->bio_bcount = bp->bio_length; + mtx_lock(&sc->start_mtx); + devstat_start_transaction_bio(dp->d_devstat, bp); + mtx_unlock(&sc->start_mtx); + g_disk_lock_giant(dp); + dp->d_strategy(bp); + g_disk_unlock_giant(dp); + break; + } off = 0; bp3 = NULL; bp2 = g_clone_bio(bp); @@ -305,10 +341,6 @@ g_disk_start(struct bio *bp) break; } do { - off_t d_maxsize; - - d_maxsize = (bp->bio_cmd == BIO_DELETE) ? - dp->d_delmaxsize : dp->d_maxsize; bp2->bio_offset += off; bp2->bio_length -= off; if ((bp->bio_flags & BIO_UNMAPPED) == 0) { @@ -349,7 +381,9 @@ g_disk_start(struct bio *bp) bp2->bio_pblkno = bp2->bio_offset / dp->d_sectorsize; bp2->bio_bcount = bp2->bio_length; bp2->bio_disk = dp; + mtx_lock(&sc->start_mtx); devstat_start_transaction_bio(dp->d_devstat, bp2); + mtx_unlock(&sc->start_mtx); g_disk_lock_giant(dp); dp->d_strategy(bp2); g_disk_unlock_giant(dp); @@ -405,15 +439,11 @@ g_disk_start(struct bio *bp) error = EOPNOTSUPP; break; } - bp2 = g_clone_bio(bp); - if (bp2 == NULL) { - g_io_deliver(bp, ENOMEM); - return; - } - bp2->bio_done = g_disk_done; - bp2->bio_disk = dp; + bp->bio_disk = dp; + bp->bio_to = (void *)bp->bio_done; + bp->bio_done = g_disk_done_single; g_disk_lock_giant(dp); - dp->d_strategy(bp2); + dp->d_strategy(bp); g_disk_unlock_giant(dp); break; default: @@ -518,17 +548,24 @@ g_disk_create(void *arg, int flag) g_topology_assert(); dp = arg; sc = g_malloc(sizeof(*sc), M_WAITOK | M_ZERO); + mtx_init(&sc->start_mtx, "g_disk_start", NULL, MTX_DEF); mtx_init(&sc->done_mtx, "g_disk_done", NULL, MTX_DEF); sc->dp = dp; gp = g_new_geomf(&g_disk_class, "%s%d", dp->d_name, dp->d_unit); gp->softc = sc; pp = g_new_providerf(gp, "%s", gp->name); + devstat_remove_entry(pp->stat); + pp->stat = NULL; + dp->d_devstat->id = pp; pp->mediasize = dp->d_mediasize; pp->sectorsize = dp->d_sectorsize; pp->stripeoffset = dp->d_stripeoffset; pp->stripesize = dp->d_stripesize; if ((dp->d_flags & DISKFLAG_UNMAPPED_BIO) != 0) pp->flags |= G_PF_ACCEPT_UNMAPPED; + if ((dp->d_flags & DISKFLAG_DIRECT_COMPLETION) != 0) + pp->flags |= G_PF_DIRECT_SEND; + pp->flags |= G_PF_DIRECT_RECEIVE; if (bootverbose) printf("GEOM: new disk %s\n", gp->name); sysctl_ctx_init(&sc->sysctl_ctx); @@ -577,6 +614,7 @@ g_disk_providergone(struct g_provider *p pp->private = NULL; pp->geom->softc = NULL; mtx_destroy(&sc->done_mtx); + mtx_destroy(&sc->start_mtx); g_free(sc); } Modified: head/sys/geom/geom_disk.h ============================================================================== --- head/sys/geom/geom_disk.h Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_disk.h Tue Oct 22 08:22:19 2013 (r256880) @@ -107,6 +107,7 @@ struct disk { #define DISKFLAG_CANDELETE 0x4 #define DISKFLAG_CANFLUSHCACHE 0x8 #define DISKFLAG_UNMAPPED_BIO 0x10 +#define DISKFLAG_DIRECT_COMPLETION 0x20 struct disk *disk_alloc(void); void disk_create(struct disk *disk, int version); Modified: head/sys/geom/geom_int.h ============================================================================== --- head/sys/geom/geom_int.h Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_int.h Tue Oct 22 08:22:19 2013 (r256880) @@ -39,6 +39,9 @@ LIST_HEAD(class_list_head, g_class); TAILQ_HEAD(g_tailq_head, g_geom); extern int g_collectstats; +#define G_STATS_PROVIDERS 1 /* Collect I/O stats for providers */ +#define G_STATS_CONSUMERS 2 /* Collect I/O stats for consumers */ + extern int g_debugflags; /* * 1 G_T_TOPOLOGY Modified: head/sys/geom/geom_io.c ============================================================================== --- head/sys/geom/geom_io.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_io.c Tue Oct 22 08:22:19 2013 (r256880) @@ -65,6 +65,8 @@ __FBSDID("$FreeBSD$"); #include #include +static int g_io_transient_map_bio(struct bio *bp); + static struct g_bioq g_bio_run_down; static struct g_bioq g_bio_run_up; static struct g_bioq g_bio_run_task; @@ -310,6 +312,8 @@ g_io_check(struct bio *bp) { struct g_consumer *cp; struct g_provider *pp; + off_t excess; + int error; cp = bp->bio_from; pp = bp->bio_to; @@ -354,11 +358,44 @@ g_io_check(struct bio *bp) return (EIO); if (bp->bio_offset > pp->mediasize) return (EIO); + + /* Truncate requests to the end of providers media. */ + excess = bp->bio_offset + bp->bio_length; + if (excess > bp->bio_to->mediasize) { + KASSERT((bp->bio_flags & BIO_UNMAPPED) == 0 || + round_page(bp->bio_ma_offset + + bp->bio_length) / PAGE_SIZE == bp->bio_ma_n, + ("excess bio %p too short", bp)); + excess -= bp->bio_to->mediasize; + bp->bio_length -= excess; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + bp->bio_ma_n = round_page(bp->bio_ma_offset + + bp->bio_length) / PAGE_SIZE; + } + if (excess > 0) + CTR3(KTR_GEOM, "g_down truncated bio " + "%p provider %s by %d", bp, + bp->bio_to->name, excess); + } + + /* Deliver zero length transfers right here. */ + if (bp->bio_length == 0) { + CTR2(KTR_GEOM, "g_down terminated 0-length " + "bp %p provider %s", bp, bp->bio_to->name); + return (0); + } + + if ((bp->bio_flags & BIO_UNMAPPED) != 0 && + (bp->bio_to->flags & G_PF_ACCEPT_UNMAPPED) == 0 && + (bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE)) { + if ((error = g_io_transient_map_bio(bp)) >= 0) + return (error); + } break; default: break; } - return (0); + return (EJUSTRETURN); } /* @@ -422,7 +459,8 @@ void g_io_request(struct bio *bp, struct g_consumer *cp) { struct g_provider *pp; - int first; + struct mtx *mtxp; + int direct, error, first; KASSERT(cp != NULL, ("NULL cp in g_io_request")); KASSERT(bp != NULL, ("NULL bp in g_io_request")); @@ -472,40 +510,71 @@ g_io_request(struct bio *bp, struct g_co KASSERT(!(bp->bio_flags & BIO_ONQUEUE), ("Bio already on queue bp=%p", bp)); - bp->bio_flags |= BIO_ONQUEUE; - - if (g_collectstats) + if ((g_collectstats & G_STATS_CONSUMERS) != 0 || + ((g_collectstats & G_STATS_PROVIDERS) != 0 && pp->stat != NULL)) binuptime(&bp->bio_t0); else getbinuptime(&bp->bio_t0); +#ifdef GET_STACK_USAGE + direct = (cp->flags & G_CF_DIRECT_SEND) && + (pp->flags & G_PF_DIRECT_RECEIVE) && + !g_is_geom_thread(curthread) && + (((pp->flags & G_PF_ACCEPT_UNMAPPED) == 0 && + (bp->bio_flags & BIO_UNMAPPED) != 0) || THREAD_CAN_SLEEP()); + if (direct) { + /* Block direct execution if less then half of stack left. */ + size_t st, su; + GET_STACK_USAGE(st, su); + if (su * 2 > st) + direct = 0; + } +#else + direct = 0; +#endif + + if (!TAILQ_EMPTY(&g_classifier_tailq) && !bp->bio_classifier1) { + g_bioq_lock(&g_bio_run_down); + g_run_classifiers(bp); + g_bioq_unlock(&g_bio_run_down); + } + /* * The statistics collection is lockless, as such, but we * can not update one instance of the statistics from more * than one thread at a time, so grab the lock first. - * - * We also use the lock to protect the list of classifiers. */ - g_bioq_lock(&g_bio_run_down); - - if (!TAILQ_EMPTY(&g_classifier_tailq) && !bp->bio_classifier1) - g_run_classifiers(bp); - - if (g_collectstats & 1) + mtxp = mtx_pool_find(mtxpool_sleep, pp); + mtx_lock(mtxp); + if (g_collectstats & G_STATS_PROVIDERS) devstat_start_transaction(pp->stat, &bp->bio_t0); - if (g_collectstats & 2) + if (g_collectstats & G_STATS_CONSUMERS) devstat_start_transaction(cp->stat, &bp->bio_t0); - pp->nstart++; cp->nstart++; - first = TAILQ_EMPTY(&g_bio_run_down.bio_queue); - TAILQ_INSERT_TAIL(&g_bio_run_down.bio_queue, bp, bio_queue); - g_bio_run_down.bio_queue_length++; - g_bioq_unlock(&g_bio_run_down); + mtx_unlock(mtxp); - /* Pass it on down. */ - if (first) - wakeup(&g_wait_down); + if (direct) { + error = g_io_check(bp); + if (error >= 0) { + CTR3(KTR_GEOM, "g_io_request g_io_check on bp %p " + "provider %s returned %d", bp, bp->bio_to->name, + error); + g_io_deliver(bp, error); + return; + } + bp->bio_to->geom->start(bp); + } else { + g_bioq_lock(&g_bio_run_down); + first = TAILQ_EMPTY(&g_bio_run_down.bio_queue); + TAILQ_INSERT_TAIL(&g_bio_run_down.bio_queue, bp, bio_queue); + bp->bio_flags |= BIO_ONQUEUE; + g_bio_run_down.bio_queue_length++; + g_bioq_unlock(&g_bio_run_down); + /* Pass it on down. */ + if (first) + wakeup(&g_wait_down); + } } void @@ -514,7 +583,8 @@ g_io_deliver(struct bio *bp, int error) struct bintime now; struct g_consumer *cp; struct g_provider *pp; - int first; + struct mtx *mtxp; + int direct, first; KASSERT(bp != NULL, ("NULL bp in g_io_deliver")); pp = bp->bio_to; @@ -560,33 +630,55 @@ g_io_deliver(struct bio *bp, int error) bp->bio_bcount = bp->bio_length; bp->bio_resid = bp->bio_bcount - bp->bio_completed; +#ifdef GET_STACK_USAGE + direct = (pp->flags & G_PF_DIRECT_SEND) && + (cp->flags & G_CF_DIRECT_RECEIVE) && + !g_is_geom_thread(curthread); + if (direct) { + /* Block direct execution if less then half of stack left. */ + size_t st, su; + GET_STACK_USAGE(st, su); + if (su * 2 > st) + direct = 0; + } +#else + direct = 0; +#endif + /* * The statistics collection is lockless, as such, but we * can not update one instance of the statistics from more * than one thread at a time, so grab the lock first. */ - if (g_collectstats) + if ((g_collectstats & G_STATS_CONSUMERS) != 0 || + ((g_collectstats & G_STATS_PROVIDERS) != 0 && pp->stat != NULL)) binuptime(&now); - g_bioq_lock(&g_bio_run_up); - if (g_collectstats & 1) + mtxp = mtx_pool_find(mtxpool_sleep, cp); + mtx_lock(mtxp); + if (g_collectstats & G_STATS_PROVIDERS) devstat_end_transaction_bio_bt(pp->stat, bp, &now); - if (g_collectstats & 2) + if (g_collectstats & G_STATS_CONSUMERS) devstat_end_transaction_bio_bt(cp->stat, bp, &now); - cp->nend++; pp->nend++; + mtx_unlock(mtxp); + if (error != ENOMEM) { bp->bio_error = error; - first = TAILQ_EMPTY(&g_bio_run_up.bio_queue); - TAILQ_INSERT_TAIL(&g_bio_run_up.bio_queue, bp, bio_queue); - bp->bio_flags |= BIO_ONQUEUE; - g_bio_run_up.bio_queue_length++; - g_bioq_unlock(&g_bio_run_up); - if (first) - wakeup(&g_wait_up); + if (direct) { + biodone(bp); + } else { + g_bioq_lock(&g_bio_run_up); + first = TAILQ_EMPTY(&g_bio_run_up.bio_queue); + TAILQ_INSERT_TAIL(&g_bio_run_up.bio_queue, bp, bio_queue); + bp->bio_flags |= BIO_ONQUEUE; + g_bio_run_up.bio_queue_length++; + g_bioq_unlock(&g_bio_run_up); + if (first) + wakeup(&g_wait_up); + } return; } - g_bioq_unlock(&g_bio_run_up); if (bootverbose) printf("ENOMEM %p on %p(%s)\n", bp, pp, pp->name); @@ -642,11 +734,10 @@ retry: if (vmem_alloc(transient_arena, size, M_BESTFIT | M_NOWAIT, &addr)) { if (transient_map_retries != 0 && retried >= transient_map_retries) { - g_io_deliver(bp, EDEADLK/* XXXKIB */); CTR2(KTR_GEOM, "g_down cannot map bp %p provider %s", bp, bp->bio_to->name); atomic_add_int(&transient_map_hard_failures, 1); - return (1); + return (EDEADLK/* XXXKIB */); } else { /* * Naive attempt to quisce the I/O to get more @@ -666,14 +757,13 @@ retry: bp->bio_data = (caddr_t)addr + bp->bio_ma_offset; bp->bio_flags |= BIO_TRANSIENT_MAPPING; bp->bio_flags &= ~BIO_UNMAPPED; - return (0); + return (EJUSTRETURN); } void g_io_schedule_down(struct thread *tp __unused) { struct bio *bp; - off_t excess; int error; for(;;) { @@ -692,59 +782,15 @@ g_io_schedule_down(struct thread *tp __u pause("g_down", hz/10); pace--; } + CTR2(KTR_GEOM, "g_down processing bp %p provider %s", bp, + bp->bio_to->name); error = g_io_check(bp); - if (error) { + if (error >= 0) { CTR3(KTR_GEOM, "g_down g_io_check on bp %p provider " "%s returned %d", bp, bp->bio_to->name, error); g_io_deliver(bp, error); continue; } - CTR2(KTR_GEOM, "g_down processing bp %p provider %s", bp, - bp->bio_to->name); - switch (bp->bio_cmd) { - case BIO_READ: - case BIO_WRITE: - case BIO_DELETE: - /* Truncate requests to the end of providers media. */ - /* - * XXX: What if we truncate because of offset being - * bad, not length? - */ - excess = bp->bio_offset + bp->bio_length; - if (excess > bp->bio_to->mediasize) { - KASSERT((bp->bio_flags & BIO_UNMAPPED) == 0 || - round_page(bp->bio_ma_offset + - bp->bio_length) / PAGE_SIZE == bp->bio_ma_n, - ("excess bio %p too short", bp)); - excess -= bp->bio_to->mediasize; - bp->bio_length -= excess; - if ((bp->bio_flags & BIO_UNMAPPED) != 0) { - bp->bio_ma_n = round_page( - bp->bio_ma_offset + - bp->bio_length) / PAGE_SIZE; - } - if (excess > 0) - CTR3(KTR_GEOM, "g_down truncated bio " - "%p provider %s by %d", bp, - bp->bio_to->name, excess); - } - /* Deliver zero length transfers right here. */ - if (bp->bio_length == 0) { - g_io_deliver(bp, 0); - CTR2(KTR_GEOM, "g_down terminated 0-length " - "bp %p provider %s", bp, bp->bio_to->name); - continue; - } - break; - default: - break; - } - if ((bp->bio_flags & BIO_UNMAPPED) != 0 && - (bp->bio_to->flags & G_PF_ACCEPT_UNMAPPED) == 0 && - (bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE)) { - if (g_io_transient_map_bio(bp)) - continue; - } THREAD_NO_SLEEPING(); CTR4(KTR_GEOM, "g_down starting bp %p provider %s off %ld " "len %ld", bp, bp->bio_to->name, bp->bio_offset, Modified: head/sys/geom/geom_kern.c ============================================================================== --- head/sys/geom/geom_kern.c Tue Oct 22 07:59:14 2013 (r256879) +++ head/sys/geom/geom_kern.c Tue Oct 22 08:22:19 2013 (r256880) @@ -124,6 +124,13 @@ g_event_procbody(void *arg) /* NOTREACHED */ } +int +g_is_geom_thread(struct thread *td) *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***