Date: Wed, 21 Mar 2018 10:28:48 +0100
From: Markus Wild
To: freebsd-current@freebsd.org
Subject: Re: ZFS i/o error in recent 12.0
Message-ID: <20180321102848.20a9f48a@mwoffice.virtualtec.office>
References: <201803192300.w2JN04fx007127@kx.openedu.org> <20180320085028.0b15ff40@mwoffice.virtualtec.office>

Hello Thomas,

> > I had faced the exact same issue on a HP
> > Microserver G8 with 8TB disks and a 16TB zpool on FreeBSD 11 about a year
> > ago.
>
> I will ask you the same question as I asked the OP:
>
> Has this pool had new vdevs added to it since the server was installed?

No. This is a microserver with only 4 (not even hotplug) trays. It was originally set up using the FreeBSD installer. I had to apply the BTX loader fix that retries a failed read (a patch at the time; I don't know whether it is included as standard now) to work around BIOS bugs on that server, but after that the server booted fine. It was only after a bit of use and a kernel update that things went south.

I tried many different things at that time, but the only approach that worked for me was to steal 2 of the 4 swap partitions (I had initially placed one on every disk) and build a mirrored boot zpool from those. The loader had no problem loading the kernel from that pool, and once the kernel took over, it had no problem using the original root pool (the one the boot loader wasn't able to find/load). Hence my conclusion that the 2nd-stage boot loader has a problem (probably due to yet another BIOS bug on that server) loading blocks beyond a certain limit, which could be 2TB or 4TB.

> What does a "zpool status" look like when the pool is imported?

$ zpool status
  pool: zboot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Mar 21 03:58:36 2018
config:

        NAME               STATE     READ WRITE CKSUM
        zboot              ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            gpt/zfs-boot0  ONLINE       0     0     0
            gpt/zfs-boot1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 6h49m with 0 errors on Sat Mar 10 10:17:49 2018
config:

        NAME            STATE     READ WRITE CKSUM
        zroot           ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            gpt/zfs0    ONLINE       0     0     0
            gpt/zfs1    ONLINE       0     0     0
          mirror-1      ONLINE       0     0     0
            gpt/zfs2    ONLINE       0     0     0
            gpt/zfs3    ONLINE       0     0     0

errors: No known data errors

Please note: this server is now in use at a customer, and it's working fine with this workaround.
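For what it's worth, the 2TB figure would be consistent with a loader or BIOS that addresses disk sectors with 32-bit numbers: 2^32 sectors of 512 bytes each is exactly 2 TiB. The Python sketch below only illustrates that arithmetic. The 32-bit/512-byte assumption and the helper names `lba_limit_bytes` and `beyond_limit` are hypothetical and not taken from the FreeBSD loader. It checks whether a given byte offset on an 8TB disk would fall past such a boundary.

```python
# Hypothetical illustration: can a byte offset on disk be reached by a
# loader/BIOS that uses lba_bits-wide sector numbers (LBAs)?
# Assumption: 512-byte sectors and 32-bit LBAs; not taken from FreeBSD code.


def lba_limit_bytes(sector_size: int = 512, lba_bits: int = 32) -> int:
    """Number of bytes addressable with lba_bits-wide sector numbers."""
    return (2 ** lba_bits) * sector_size


def beyond_limit(byte_offset: int, sector_size: int = 512, lba_bits: int = 32) -> bool:
    """True if byte_offset lies past what lba_bits-wide sector numbers can reach."""
    return byte_offset >= lba_limit_bytes(sector_size, lba_bits)


if __name__ == "__main__":
    limit = lba_limit_bytes()
    print(limit, limit / 2**40)  # limit in bytes and in TiB
    disk_size = 8 * 10**12  # an "8TB" disk as sold (decimal terabytes)
    for offset in (10**12, 3 * 10**12, disk_size - 1):
        print(offset, beyond_limit(offset))
```

Under this assumption, roughly the upper three quarters of each 8TB disk would be out of reach, so whether a given kernel file is readable would depend on where ZFS happened to allocate its blocks. A different hypothesis (e.g. whatever would produce a 4TB limit) can be explored by changing `sector_size` or `lba_bits`.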
I just brought it up to offer a possible explanation for the problem observed by the original poster: it _might_ have nothing to do with a newer version of the current kernel, but rather be due to the updated kernel being written to a new location on disk that the boot loader can't read properly.

Cheers,
Markus