Date: Wed, 19 Jun 2002 21:37:56 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200206200437.g5K4buMu036454@apollo.backplane.com>
To: Bruce Evans
Cc: cvs-committers@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/ufs/ufs ufs_readwrite.c

:>  Log:
:>  In rev 1.72 a situation related to write/mmap was fixed which could result
:>  in a user process gaining visibility into the 'old' contents of a filesystem
:>  block.  There were two cases: (1) when uiomove() fails (user process issues
:>  illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
:>  the same offset (fault -> recursive buffer I/O reads contents of old block).
:
:I fixed (1) in FreeBSD-1 by always backing out the write in the EFAULT case:

Yah, #1 is fairly easy to deal with.  #2 is a real mess.  Even with the
fix I originally had in there (and just moved around a little in this
commit), Tor was able to write a little two line program to demonstrate
that there are still issues with fragment extension.  My little fix
doesn't actually change the outstanding issues at all, it just hacks
around the read-before-write that the original fix introduced.
It's necessary because the read-before-write kills rewrite performance
by 75% (e.g. 20 MBytes/sec -> 5 MBytes/sec), and on hardware RAID
systems it can be 80-90% write performance *LOSS*, depending on the
configuration.

We still have serious issues with fragment extension (which wasn't
covered by the original fix or this commit)...  Tor has a two-line
program which demonstrates data visibility during fragment extension.
Kirk, Tor and I (mainly Tor and I) are exploring options for a real fix.
Constructive comments are welcome but this particular area of the
codebase is extremely complex and I doubt more than a handful of people
even understand how it works.

The case in question here is write()ing an overlapped mmap()'d buffer
to the same descriptor.  The code path is:

    write() -> ufs_readwrite() -> uiomove() -> (fault) -> ffs_getpages()
    ... I/O, (fault return) resume-uiomove() -> bdwrite().

Approximately.  Very, very nasty.

-Matt
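
For readers who want to see the user-side pattern behind the code path described above (write()ing a file's own mmap()'d pages back to the same descriptor at the same offset), here is a minimal sketch. It is not Tor's program, which this mail does not reproduce, and it does not attempt to demonstrate the data-visibility problem itself. The function name write_own_mapping, the path argument and the length are hypothetical and chosen for illustration only. It uses pwrite() rather than write() so the file offset is explicit.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): map a file and then
 * write the mapping back to the same descriptor at the same offset, so
 * the kernel's copy-in from the user buffer can fault on pages that
 * belong to the file being written.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
write_own_mapping(const char *path, size_t len)
{
	ssize_t n = -1;
	int fd;

	assert(len > 0);	/* a zero-length mapping is not valid */

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Extend the file so the whole mapping is backed by the file. */
	if (ftruncate(fd, (off_t)len) == 0) {
		char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
		if (map != MAP_FAILED) {
			/*
			 * Source buffer: the file's own pages at offset 0.
			 * Destination: the same file at offset 0.
			 */
			n = pwrite(fd, map, len, 0);
			munmap(map, len);
		}
	}

	close(fd);
	unlink(path);	/* remove the scratch file */
	return n;
}
```

Calling write_own_mapping() with a scratch path on the filesystem of interest and a length of a few blocks exercises the overlapped case. Whether the copy actually faults depends on which pages are already resident, so this only sets up the conditions; it does not guarantee the recursive path is taken.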