From: Glen Barber
Date: Thu, 1 Sep 2016 19:30:52 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org
Subject: svn commit: r305232 - stable/11/sys/boot/i386/libi386
Message-Id: <201609011930.u81JUqls022322@repo.freebsd.org>
X-SVN-Group: stable-11

Author: gjb
Date: Thu Sep 1 19:30:52 2016
New Revision: 305232
URL: https://svnweb.freebsd.org/changeset/base/305232

Log:
MFC r304966 (peter):

The read-ahead code from r298230 made it likely the boot code would read beyond the end
of disk. r298900 added code to prevent this. Some BIOSes cause significant delays if asked to read past end-of-disk.

We never trusted the BIOS to accurately report the sectorsize of disks before this set of changes. Unfortunately, these changes interact badly with the infamous >2TB wraparound bugs. We have a number of relatively recent machines in the FreeBSD.org cluster where the BIOS reports 3TB disks as 1TB. With pre-r298900 boot code they work just fine. After r298900 they stop working if the boot environment attempts to access anything outside the first 1TB of the disk, failing with errors such as 'ZFS: I/O error, all block copies unavailable'. This affects both UFS and ZFS when booting from large volumes.

This change replaces the blind trust of the BIOS end-of-disk reporting with a read-ahead clip to prevent reads crossing the end-of-disk boundary. Since truncation of the reported size at 2^32 sectors (2TB) is not uncommon, the clipping is done on 2TB aliases of the reported end-of-disk. For example, a 3TB disk reported as 1TB has read-ahead clipped at 1TB, 3TB, 5TB, and so on, as one of them is likely to be the real end-of-disk.

This should make the loader on these broken machines behave the same as the traditional pre-r298900 loader, without disabling read-ahead.
PR: 212139
Sponsored by: The FreeBSD Foundation

Modified:
  stable/11/sys/boot/i386/libi386/biosdisk.c
Directory Properties:
  stable/11/ (props changed)

Modified: stable/11/sys/boot/i386/libi386/biosdisk.c
==============================================================================
--- stable/11/sys/boot/i386/libi386/biosdisk.c	Thu Sep 1 19:18:26 2016	(r305231)
+++ stable/11/sys/boot/i386/libi386/biosdisk.c	Thu Sep 1 19:30:52 2016	(r305232)
@@ -495,7 +495,7 @@ bd_realstrategy(void *devdata, int rw, d
     char *buf, size_t *rsize)
 {
     struct disk_devdesc *dev = (struct disk_devdesc *)devdata;
-    int blks;
+    int blks, remaining;
 #ifdef BD_SUPPORT_FRAGS /* XXX: sector size */
     char fragbuf[BIOSDISK_SECSIZE];
     size_t fragsize;
@@ -511,14 +511,15 @@ bd_realstrategy(void *devdata, int rw, d
     if (rsize)
         *rsize = 0;
 
-    if (dblk >= BD(dev).bd_sectors) {
-        DEBUG("IO past disk end %llu", (unsigned long long)dblk);
-        return (EIO);
-    }
-
-    if (dblk + blks > BD(dev).bd_sectors) {
-        /* perform partial read */
-        blks = BD(dev).bd_sectors - dblk;
+    /*
+     * Perform partial read to prevent read-ahead crossing
+     * the end of disk - or any 32 bit aliases of the end.
+     * Signed arithmetic is used to handle wrap-around cases
+     * like we do for TCP sequence numbers.
+     */
+    remaining = (int)(BD(dev).bd_sectors - dblk); /* truncate */
+    if (remaining > 0 && remaining < blks) {
+        blks = remaining;
         size = blks * BD(dev).bd_sectorsize;
         DEBUG("short read %d", blks);
     }
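
To illustrate how the signed, truncated subtraction clips reads at every 2^32-sector alias of the reported end-of-disk, here is a minimal stand-alone sketch of the same arithmetic. It is not the loader code: clip_blocks() and SECTORS_1TIB are hypothetical names introduced for this example, 512-byte sectors are assumed, and the fixed-width types stand in for the plain int used in biosdisk.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: 1 TiB expressed in 512-byte sectors (2^31). */
#define SECTORS_1TIB (UINT64_C(1) << 31)

/*
 * Hypothetical stand-alone version of the clipping step in the diff.
 * reported_sectors is the disk size as reported by the BIOS, which may be
 * the real size truncated modulo 2^32. Returns the number of blocks to read.
 */
static int
clip_blocks(uint64_t reported_sectors, uint64_t dblk, int blks)
{
    int32_t remaining;

    assert(blks > 0);

    /*
     * Keep only the low 32 bits of the difference and reinterpret them as
     * signed. Out-of-range unsigned-to-signed conversion is
     * implementation-defined in C; this sketch assumes the usual
     * two's-complement wrap that mainstream compilers provide.
     */
    remaining = (int32_t)(uint32_t)(reported_sectors - dblk);

    /* Clip only when an alias of the end lies inside the requested range. */
    if (remaining > 0 && remaining < blks)
        blks = remaining;
    return (blks);
}
```

Under these assumptions, a 3 TiB disk reported as 1 TiB behaves as the log describes. A 64-block request starting 8 sectors before the reported end, clip_blocks(SECTORS_1TIB, SECTORS_1TIB - 8, 64), is clipped to 8 blocks, and so is the same request 8 sectors before the real end at 3 TiB. A request starting 100 sectors past the reported end yields a negative remainder and is passed through unclipped, instead of being rejected with EIO as it was after r298900.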