To: Jeff Roberson
From: "Poul-Henning Kamp"
Cc: arch@FreeBSD.org, Pawel Jakub Dawidek
Subject: Re: simplify disksort, please review.
In-Reply-To: Your message of "Fri, 10 Jun 2005 04:05:52 EDT." <20050610040302.S16943@mail.chesapeake.net>
Date: Fri, 10 Jun 2005 10:10:47 +0200
Message-ID: <17437.1118391047@critter.freebsd.dk>

In message <20050610040302.S16943@mail.chesapeake.net>, Jeff Roberson writes:
>On Thu, 9 Jun 2005, Poul-Henning Kamp wrote:
>
>> In message <20050609193008.GB837@darkness.comp.waw.pl>, Pawel Jakub Dawidek writes:
>>
>> >The one example of how the order can be broken (write(offset, size)):
>> >
>> >	write(1024, 512)
>> >	write(0, 2048)
>>
>> If you issue these two requests just like that, you get no guarantee
>> which order they get written in.
>>
>> It's not just disksort which might surprise you, tagged queuing and
>> write caches may mess up your day as well.
>
>Anyway, it's really not important to solve at the disk queue layer,
>because it's only an issue with raw devices, and it's very irregular to
>have multiple processes opening the same raw device to issue a lot of io
>of different sizes.

It will not be solved at any level, because it would require the
GEOM layer to either make all writes strictly synchronous, a total
no-go, or keep very expensive housekeeping of which requests are
outstanding so we can block conflicting writes etc.

This is something that belongs 100% in the code issuing the I/O
requests and if some of that code does something stupid, it loses
and that is how it should be.

Jeff is correct that the buffer-cache mostly will prevent people
from shooting their feet off, but nothing prevents a GEOM class
from making a fool of itself.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
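
For illustration only, a minimal sketch in C of the check that the
code issuing the I/O would have to make before it lets two writes be
in flight at the same time. The name writes_overlap() and its
parameters are hypothetical; this is not a GEOM or FreeBSD interface,
just the byte-range arithmetic behind "conflicting writes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a GEOM or FreeBSD API: report whether two
 * byte ranges [off, off + len) on the same device share any byte.
 */
static bool
writes_overlap(uint64_t off_a, uint64_t len_a, uint64_t off_b, uint64_t len_b)
{
	/* Assumed precondition: neither range wraps past UINT64_MAX. */
	assert(len_a <= UINT64_MAX - off_a);
	assert(len_b <= UINT64_MAX - off_b);

	/* Empty ranges touch nothing. */
	if (len_a == 0 || len_b == 0)
		return (false);

	/* Overlap iff each range starts before the other one ends. */
	return (off_a < off_b + len_b && off_b < off_a + len_a);
}
```

With the example from the thread, writes_overlap(1024, 512, 0, 2048)
is true, so an issuer that cares which data ends up on disk would
have to wait for the first write to complete before sending the
second. Two adjacent ranges such as (0, 512) and (512, 512) do not
overlap and can be issued together.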