From owner-svn-src-stable-11@freebsd.org Tue Nov 28 00:19:06 2017 Return-Path: Delivered-To: svn-src-stable-11@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 382BFDEE193; Tue, 28 Nov 2017 00:19:06 +0000 (UTC) (envelope-from asomers@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0ECE16BAB9; Tue, 28 Nov 2017 00:19:05 +0000 (UTC) (envelope-from asomers@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id vAS0J5BW051838; Tue, 28 Nov 2017 00:19:05 GMT (envelope-from asomers@FreeBSD.org) Received: (from asomers@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id vAS0J4CC051834; Tue, 28 Nov 2017 00:19:04 GMT (envelope-from asomers@FreeBSD.org) Message-Id: <201711280019.vAS0J4CC051834@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: asomers set sender to asomers@FreeBSD.org using -f From: Alan Somers Date: Tue, 28 Nov 2017 00:19:04 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r326298 - in stable/11/cddl: contrib/opensolaris/lib/libzfs/common usr.sbin/zfsd X-SVN-Group: stable-11 X-SVN-Commit-Author: asomers X-SVN-Commit-Paths: in stable/11/cddl: contrib/opensolaris/lib/libzfs/common usr.sbin/zfsd X-SVN-Commit-Revision: 326298 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable-11@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for only the 11-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Nov 2017 00:19:06 -0000 Author: asomers Date: Tue Nov 28 00:19:04 2017 New Revision: 326298 URL: https://svnweb.freebsd.org/changeset/base/326298 Log: MFC r322854, r323995, r324568, r324991 r322854: zfsd(8): Close a race condition when onlining a disk paritition When inserting a partitioned disk, devfs and geom will announce the whole disk before they announce the partition. If the partition containing ZFS extends to one of the disk's extents, then zfsd will see a ZFS label on the whole disk and attempt to online it. ZFS is smart enough to activate the partition instead of the whole disk, but only if GEOM has already created the partition's provider. cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c Add a zpool_read_all_labels method. It's similar to zpool_read_label, but it will return the number of labels found. cddl/usr.sbin/zfsd/zfsd_event.cc When processing a DevFS CREATE event, only online a VDEV if we can read all four ZFS labels. Reviewed by: mav Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D11920 r323995: Close a memory leak when using zpool_read_all_labels X-MFC-With: 322854 Sponsored by: Spectra Logic Corp r324568: Optimize zpool_read_all_labels with AIO Read all labels in parallel instead of sequentially X-MFC-With: 322854 Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12495 r324991: Fix zpool_read_all_labels when vfs.aio.enable_unsafe=0 Previously, zpool_read_all_labels was trying to do 256KB reads, which are greater than the default MAXPHYS and therefore must go through the slow, unsafe AIO path. Shrink these reads to 112KB so they can use the safe, fast AIO path instead. X-MFC-With: 324568 Sponsored by: Spectra Logic Corp Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c stable/11/cddl/usr.sbin/zfsd/zfsd_event.cc Directory Properties: stable/11/ (props changed) Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h ============================================================================== --- stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h Tue Nov 28 00:12:14 2017 (r326297) +++ stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h Tue Nov 28 00:19:04 2017 (r326298) @@ -774,6 +774,7 @@ extern int zpool_in_use(libzfs_handle_t *, int, pool_s * Label manipulation. */ extern int zpool_read_label(int, nvlist_t **); +extern int zpool_read_all_labels(int, nvlist_t **); extern int zpool_clear_label(int); /* is this zvol valid for use as a dump device? */ Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c ============================================================================== --- stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c Tue Nov 28 00:12:14 2017 (r326297) +++ stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c Tue Nov 28 00:19:04 2017 (r326298) @@ -42,6 +42,7 @@ * using our derived config, and record the results. */ +#include #include #include #include @@ -913,6 +914,90 @@ zpool_read_label(int fd, nvlist_t **config) free(label); *config = NULL; return (-1); +} + +/* + * Given a file descriptor, read the label information and return an nvlist + * describing the configuration, if there is one. + * returns the number of valid labels found + * If a label is found, returns it via config. The caller is responsible for + * freeing it. + */ +int +zpool_read_all_labels(int fd, nvlist_t **config) +{ + struct stat64 statbuf; + struct aiocb aiocbs[VDEV_LABELS]; + struct aiocb *aiocbps[VDEV_LABELS]; + int l; + vdev_phys_t *labels; + uint64_t state, txg, size; + int nlabels = 0; + + *config = NULL; + + if (fstat64(fd, &statbuf) == -1) + return (0); + size = P2ALIGN_TYPED(statbuf.st_size, sizeof (vdev_label_t), uint64_t); + + if ((labels = calloc(VDEV_LABELS, sizeof (vdev_phys_t))) == NULL) + return (0); + + memset(aiocbs, 0, sizeof(aiocbs)); + for (l = 0; l < VDEV_LABELS; l++) { + aiocbs[l].aio_fildes = fd; + aiocbs[l].aio_offset = label_offset(size, l) + VDEV_SKIP_SIZE; + aiocbs[l].aio_buf = &labels[l]; + aiocbs[l].aio_nbytes = sizeof(vdev_phys_t); + aiocbs[l].aio_lio_opcode = LIO_READ; + aiocbps[l] = &aiocbs[l]; + } + + if (lio_listio(LIO_WAIT, aiocbps, VDEV_LABELS, NULL) != 0) { + if (errno == EAGAIN || errno == EINTR || errno == EIO) { + for (l = 0; l < VDEV_LABELS; l++) { + errno = 0; + int r = aio_error(&aiocbs[l]); + if (r != EINVAL) + (void)aio_return(&aiocbs[l]); + } + } + free(labels); + return (0); + } + + for (l = 0; l < VDEV_LABELS; l++) { + nvlist_t *temp = NULL; + + if (aio_return(&aiocbs[l]) != sizeof(vdev_phys_t)) + continue; + + if (nvlist_unpack(labels[l].vp_nvlist, + sizeof (labels[l].vp_nvlist), &temp, 0) != 0) + continue; + + if (nvlist_lookup_uint64(temp, ZPOOL_CONFIG_POOL_STATE, + &state) != 0 || state > POOL_STATE_L2CACHE) { + nvlist_free(temp); + temp = NULL; + continue; + } + + if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE && + (nvlist_lookup_uint64(temp, ZPOOL_CONFIG_POOL_TXG, + &txg) != 0 || txg == 0)) { + nvlist_free(temp); + temp = NULL; + continue; + } + if (temp) + *config = temp; + + nlabels++; + } + + free(labels); + return (nlabels); } typedef struct rdsk_node { Modified: stable/11/cddl/usr.sbin/zfsd/zfsd_event.cc ============================================================================== --- stable/11/cddl/usr.sbin/zfsd/zfsd_event.cc Tue Nov 28 00:12:14 2017 (r326297) +++ stable/11/cddl/usr.sbin/zfsd/zfsd_event.cc Tue Nov 28 00:19:04 2017 (r326298) @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -93,21 +94,32 @@ DevfsEvent::ReadLabel(int devFd, bool &inUse, bool &de pool_state_t poolState; char *poolName; boolean_t b_inuse; + int nlabels; inUse = false; degraded = false; poolName = NULL; if (zpool_in_use(g_zfsHandle, devFd, &poolState, &poolName, &b_inuse) == 0) { - nvlist_t *devLabel; + nvlist_t *devLabel = NULL; inUse = b_inuse == B_TRUE; if (poolName != NULL) free(poolName); - if (zpool_read_label(devFd, &devLabel) != 0 - || devLabel == NULL) + nlabels = zpool_read_all_labels(devFd, &devLabel); + /* + * If we find a disk with fewer than the maximum number of + * labels, it might be the whole disk of a partitioned disk + * where ZFS resides on a partition. In that case, we should do + * nothing and wait for the partition to appear. Or, the disk + * might be damaged. In that case, zfsd should do nothing and + * wait for the sysadmin to decide. + */ + if (nlabels != VDEV_LABELS || devLabel == NULL) { + nvlist_free(devLabel); return (NULL); + } try { Vdev vdev(devLabel); @@ -121,6 +133,7 @@ DevfsEvent::ReadLabel(int devFd, bool &inUse, bool &de exp.GetString().insert(0, context); exp.Log(); + nvlist_free(devLabel); } } return (NULL);