From: Julian Elischer
To: Pete French, stable@freebsd.org
Subject: Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0
Date: Mon, 10 Apr 2017 12:42:26 +0800
Message-ID: <9f9bbb0e-2824-700f-1eac-8b904f91618b@freebsd.org>
In-Reply-To: <20170408110100.GB14604@brick>
References: <20170408110100.GB14604@brick>
List-Id: Production branch of FreeBSD source code

On 8/4/17 7:01 pm, Edward Tomasz Napierała wrote:
> On 0313T1206, Pete French wrote:
>> I have a number of machines in Azure, all booting from ZFS and, until
>> the weekend, running 10.3 perfectly happily.
>>
>> I started upgrading these to 11. The first went fine, the second would
>> not boot. Looking at the boot diagnostics it is having problems finding the
>> root pool to mount. I see this in the diagnostic output:
>>
>> storvsc0: on vmbus0
>> Solaris: NOTICE: Cannot find the pool label for 'rpool'
>> Mounting from zfs:rpool/ROOT/default failed with error 5.
>> Root mount waiting for: storvsc
>> (probe0:blkvsc0:0:storvsc1: 0:0): on vmbus0
>> storvsc scsi_status = 2
>> (da0:blkvsc0:0:0:0): UNMAPPED
>> (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
>> hvheartbeat0: on vmbus0
>> da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
>>
>> As you can see, the drive da0 only appears after it has tried, and failed,
>> to mount the root pool.
> Does the same problem still happen with recent 11-STABLE?

There is a fix for this floating around, which we applied at work.

Our systems are 10.3, but I think it wouldn't be a bad thing to add
generally, as it could (if we let it) solve the problem we sometimes see
with NFS as well as with Azure.

p4 diff2 -du //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3
==== //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#1 (text) - //depot/bugatti/FreeBSD-PZ/10.3/sys/kern/vfs_mountroot.c#3 (text) ==== content
@@ -126,8 +126,8 @@
 static int root_mount_mddev;
 static int root_mount_complete;
 
-/* By default wait up to 3 seconds for devices to appear. */
-static int root_mount_timeout = 3;
+/* By default wait up to 30 seconds for devices to appear. */
+static int root_mount_timeout = 30;
 TUNABLE_INT("vfs.mountroot.timeout", &root_mount_timeout);
 
 struct root_hold_token *
@@ -690,7 +690,7 @@
 	char *errmsg;
 	struct mntarg *ma;
 	char *dev, *fs, *opts, *tok;
-	int delay, error, timeout;
+	int delay, error, timeout, err_stride;
 
 	error = parse_token(conf, &tok);
 	if (error)
@@ -727,11 +727,20 @@
 		goto out;
 	}
 
+	/*
+	 * For ZFS we can't simply wait for a specific device
+	 * as we only know the pool name. To work around this,
+	 * parse_mount() will retry the mount later on.
+	 *
+	 * While retrying for NFS could be implemented similarly
+	 * it is currently not supported.
+	 */
+	delay = hz / 10;
+	timeout = root_mount_timeout * hz;
+
 	if (strcmp(fs, "zfs") != 0 && strstr(fs, "nfs") == NULL &&
 	    dev[0] != '\0' && !parse_mount_dev_present(dev)) {
 		printf("mountroot: waiting for device %s ...\n", dev);
-		delay = hz / 10;
-		timeout = root_mount_timeout * hz;
 		do {
 			pause("rmdev", delay);
 			timeout -= delay;
@@ -741,16 +750,34 @@
 			goto out;
 		}
 	}
+	/* Timeout keeps counting down */
 
-	ma = NULL;
-	ma = mount_arg(ma, "fstype", fs, -1);
-	ma = mount_arg(ma, "fspath", "/", -1);
-	ma = mount_arg(ma, "from", dev, -1);
-	ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
-	ma = mount_arg(ma, "ro", NULL, 0);
-	ma = parse_mountroot_options(ma, opts);
-	error = kernel_mount(ma, MNT_ROOTFS);
+	err_stride=0;
+	do {
+		ma = NULL;
+		ma = mount_arg(ma, "fstype", fs, -1);
+		ma = mount_arg(ma, "fspath", "/", -1);
+		ma = mount_arg(ma, "from", dev, -1);
+		ma = mount_arg(ma, "errmsg", errmsg, ERRMSGL);
+		ma = mount_arg(ma, "ro", NULL, 0);
+		ma = parse_mountroot_options(ma, opts);
+		error = kernel_mount(ma, MNT_ROOTFS);
+		/* UFS only does it once */
+		if (strcmp(fs, "zfs") != 0)
+			break;
+		timeout -= delay;
+		if (timeout > 0 && error) {
+			if (err_stride <= 0 ) {
+				printf("Mounting from %s:%s failed with error %d. "
+				    "%d seconds left. Retrying.\n", fs, dev, error,
+				    timeout / hz);
+			}
+			err_stride += 1;
+			err_stride %= 50;
+			pause("rmzfs", delay);
+		}
+	} while (timeout > 0 && error);
 
 out:
 	if (error) {
 		printf("Mounting from %s:%s failed with error %d",
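
For anyone who wants to see the retry behaviour of the last hunk without
building a kernel, it can be roughly sketched in user space as below. This
is only an illustration under stated assumptions, not the patch itself:
HZ, LOG_STRIDE, struct fake_pool, fake_mount() and retry_mount() are all
hypothetical names, fake_mount() stands in for kernel_mount(), and advancing
a counter stands in for pause("rmzfs", delay).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HZ		1000	/* assumed tick rate; the kernel uses its own hz */
#define LOG_STRIDE	50	/* print one message per 50 failed attempts */

/* Hypothetical stand-in for a pool whose disk shows up after ready_at ticks. */
struct fake_pool {
	int elapsed;	/* simulated ticks spent waiting so far */
	int ready_at;	/* tick count at which the mount starts to succeed */
};

/* Hypothetical stand-in for kernel_mount(): error 5 until the disk appears. */
static int
fake_mount(const struct fake_pool *p)
{
	return (p->elapsed >= p->ready_at ? 0 : 5);
}

/*
 * Retry the mount every HZ/10 ticks until it succeeds or the timeout
 * runs out, logging only every LOG_STRIDE-th failure. File systems
 * other than "zfs" get a single attempt, as in the patch.
 * Returns the last error; *logged receives the number of messages printed.
 */
static int
retry_mount(const char *fs, struct fake_pool *p, int timeout_secs, int *logged)
{
	int delay, error, timeout, err_stride;

	assert(timeout_secs > 0);
	delay = HZ / 10;
	timeout = timeout_secs * HZ;
	err_stride = 0;
	*logged = 0;
	do {
		error = fake_mount(p);
		if (strcmp(fs, "zfs") != 0)
			break;
		timeout -= delay;
		if (timeout > 0 && error) {
			if (err_stride == 0) {
				printf("Mount of %s failed with error %d. "
				    "%d seconds left. Retrying.\n", fs, error,
				    timeout / HZ);
				(*logged)++;
			}
			err_stride = (err_stride + 1) % LOG_STRIDE;
			p->elapsed += delay;	/* simulated pause */
		}
	} while (timeout > 0 && error);
	return (error);
}
```

With a 30 second timeout and a tenth-of-a-second delay, a pool that never
shows up is tried about 300 times but only logged a handful of times, which
is the point of err_stride. The timeout itself comes from the
vfs.mountroot.timeout tunable shown in the first hunk.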