From owner-freebsd-fs Mon Jan 20 10:30:44 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
 by hub.freebsd.org (Postfix) with ESMTP id 1E84E37B401
 for ; Mon, 20 Jan 2003 10:30:41 -0800 (PST)
Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15])
 by mx1.FreeBSD.org (Postfix) with ESMTP id 5D69343EB2
 for ; Mon, 20 Jan 2003 10:30:40 -0800 (PST)
 (envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
 by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0KIKa019756
 for ; Mon, 20 Jan 2003 11:20:37 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101])
 by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
 id DH1PYH51; Mon, 20 Jan 2003 11:30:51 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
 id NAA0000018700; Mon, 20 Jan 2003 13:30:19 -0500 (EST)
Date: Mon, 20 Jan 2003 13:30:14 -0500
Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
From: Steve Byan
To: freebsd-fs@FreeBSD.ORG
Content-Transfer-Encoding: 7bit
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>
Message-Id: <37CA8FF0-2CA5-11D7-962B-00306548867E@maxtor.com>
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

On Friday, January 17, 2003, at 05:27 AM, Terry Lambert wrote:

> No, the worst case following a power failure is a screwed disk
> track.

I'm skeptical of this claim, unless you mean it in a way that strikes
me as rather unusual.

> Modern disk drives read and write a track at a time; this is to
> avoid rotational latency that would happen if you waited for a
> hard "sector start" marker to come around, and it avoids the need
> for "low level formatting".

I'm familiar with drives which will re-order their queue of writes for
a track (i.e. SCSI disks with write cache enabled, SCSI disks with
command-queued writes without an "ordered task" tag, or ATA disks with
caching enabled). But you seem to be implying by your mention of
"avoiding rotational latency ... waiting for a ... sector start
marker" and mention of "low level formatting" that there exists a
modern SCSI or ATA disk which writes by simply blasting a whole new
track whenever it writes, starting at the current rotational position.
This would certainly open the possibility of making the remainder of
the track unreadable. Perhaps one of Maxtor's competitors has such a
disk, but I don't believe so, because the benchmark performance of such
a disk would be abysmal due to the need to read/modify/write the entire
track whenever a single sector changes.

Are you saying that this whole-track-write mode happens conditionally,
only if the queued writes happen to cover the entire track? Perhaps
such a disk could be marketable, but the rotational latency advantages
in this case are very small compared to the alternative of simply
waiting until one of the sectors to be written comes under the head.

I know that neither Maxtor's SCSI disks nor their ATA disks blast an
entire track in one fell swoop.

> For a very small window of time in
> the late 1990's, two manufacturers, IBM and Quantum, created disk
> drives which were capable of using rotational energy as a power
> source (regenerative braking) to complete a write in progress,
> following a DC failure (this provided a small post-failure
> hold-up time).
>
> Modern disk drives no longer do this, because disk manufacturers
> are morons (or one was a moron, and the others had to compete on
> price, which amounts to the same thing).

See below - changes made for higher capacity and higher RPM have made
it impossible to use the regenerative braking trick on modern drives.

>
> The net result is that a DC failure can result in an entire track
> getting trashed, if it happens at the right time.

I'll agree that it can result in partial completion of a queue of
writes, with the order of completion being essentially unknowable, and
with at most one sector being corrupted, and hence having an invalid
ECC (and therefore returning a hard error if read).

If that is your definition of "trashing an entire track", I'll accept
it. But if you are implying that more than one sector could be
unreadable, or that any sector would return data that had not been
written to it without giving an error indication, I disagree. The
remaining sectors of the track may have new data or old data, depending
on the disk scheduling algorithm, but they would not be "corrupt" in
the sense of being unreadable, or of returning bogus data without also
returning an error indication.

If you wish to have writes complete to the media in the order in which
you issued them, then you must either

a) disable write caching and not use SCSI command queuing for ordered
   writes
or
b) enable write caching but do not use SCSI command queuing, and either
   b1) set the FUA bit in the SCSI CDB and not use command queuing for
       ordered writes, or
   b2) follow the ATA write command with a "flush cache" command
or
c) enable write caching and SCSI command queuing, but
   c1) set the FUA bit in the SCSI CDB and ensure the command has the
       "ordered task" attribute in its task tag, so that the command
       will not be reordered.

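The effect of these options can be illustrated with a toy model. The
Python sketch below is not drive firmware and not a FreeBSD or SCSI/ATA
interface: the class CachedDisk, its methods, the sector names, and the
"random subset survives" power-failure rule are all assumptions made for
illustration. It only shows why a flush (or an FUA write) between two
dependent writes keeps the second from reaching the media without the
first.

```python
import random

DATA, POINTER = "data-sector", "pointer-sector"  # hypothetical sector names


class CachedDisk:
    """Toy model of a disk with a volatile write-behind cache (assumption)."""

    def __init__(self, seed=0):
        self.media = {}   # sector -> contents that survive power loss
        self.cache = {}   # sector -> contents acknowledged but not yet on media
        self.rng = random.Random(seed)

    def write(self, sector, data, fua=False):
        if fua:
            # Modelled FUA: the data is on the media before the write completes.
            self.cache.pop(sector, None)
            self.media[sector] = data
        else:
            # Write-behind: acknowledged immediately, destaged at some later time.
            self.cache[sector] = data

    def flush(self):
        """Modelled "flush cache": all acknowledged writes reach the media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        """Destage an arbitrary subset of cached sectors, lose the rest."""
        pending = list(self.cache.items())
        self.rng.shuffle(pending)
        survivors = pending[: self.rng.randint(0, len(pending))]
        self.media.update(survivors)
        self.cache.clear()


def commit(disk, mode):
    """Write a data block, then a pointer that depends on it."""
    disk.write(DATA, "new data", fua=(mode == "fua"))
    if mode == "flush":
        disk.flush()      # barrier: data is durable before the pointer is issued
    disk.write(POINTER, "points at new data")


def count_violations(mode, trials=1000):
    """Count power failures that leave the pointer on media without its data."""
    bad = 0
    for seed in range(trials):
        disk = CachedDisk(seed)
        commit(disk, mode)
        disk.power_fail()
        if POINTER in disk.media and DATA not in disk.media:
            bad += 1
    return bad
```

In this model count_violations("flush") and count_violations("fua") are
always 0, while count_violations("none") is nonzero, because the cached
pointer can be destaged ahead of the cached data. The model says nothing
about performance, which is the other half of the trade-off.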
Upon reflection, I suppose it is possible that if the DC voltage were
to remain at the threshold for write-enable for an extended period of
time and if the DC-low circuitry for the drive in question did not have
hysteresis, then write-gate might toggle off and then back on a few
times and as a result corrupt multiple sectors (all of which will show
up as hard errors when read). But this would be the result of a design
error, not a design intent, and would not apply to all makes and models
of disk drives.

I agree that it is a shame that drive manufacturers do not offer an
"atomic write" feature for a sector. Convince the system manufacturers
to supply a "power-fail" warning signal a few milliseconds in advance
of the loss of DC power, and I think the drive manufacturers would be
happy to provide an atomic write feature. We can no longer use the
rotational energy in the platters to keep up the power, because the
platter count and media diameter have both steadily decreased - as a
result, there is no longer enough rotational inertia to provide the
hold-up times needed. Note that it is this reduced platter count and
smaller disks which have enabled 10K and 15K RPM disks within the power
envelope allotted to a 3.5 inch disk drive.

Regards,
-Steve (not speaking officially for his employer)
--------
Steve Byan
Design Engineer
Maxtor Corp.
MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508) 770-3414 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Jan 20 11:43:21 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2267037B401 for ; Mon, 20 Jan 2003 11:43:16 -0800 (PST) Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8269643F18 for ; Mon, 20 Jan 2003 11:43:15 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by rwcrmhc51.attbi.com (rwcrmhc51) with ESMTP id <2003012019431405100pnm8ke>; Mon, 20 Jan 2003 19:43:14 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA40050; Mon, 20 Jan 2003 11:43:13 -0800 (PST) Date: Mon, 20 Jan 2003 11:43:12 -0800 (PST) From: Julian Elischer To: Steve Byan Cc: freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) In-Reply-To: <37CA8FF0-2CA5-11D7-962B-00306548867E@maxtor.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I hate to enter this argument but.... On Mon, 20 Jan 2003, Steve Byan wrote: > > On Friday, January 17, 2003, at 05:27 AM, Terry Lambert wrote: > > > No, the worst case following a power failure is a screwed disk > > track. > > I'm skeptical of this claim, unless you mean it in a way that strikes > me a rather unusual. 
>
> > Modern disk drives read and write a track at a time; this is to
> > avoid rotational latency that would happen if you waited for a
> > hard "sector start" marker to come around, and it avoids the need
> > for "low level formatting".
>
> I'm familiar with drives which will re-order their queue of writes for
> a track (i.e. SCSI disks with write cache enabled, SCSI disks with
> command-queued writes without an "ordered task" tag, or ATA disks with
> caching enabled). But you seem to be implying by your mention of
> "avoiding rotational latency ... waiting for a ... sector start
> marker" and mention of "low level formatting" that there exists a
> modern SCSI or ATA disk which writes by simply blasting a whole new
> track whenever it writes, starting at the current rotational position.

When I was reading Quantum manuals a few years ago I got the distinct
impression that this might possibly be happening; however, they did say
that if you lost power on a drive you could lose the sector that you
were writing at the time. They didn't say anything about other
sectors.

[...]

> I know that neither Maxtor's SCSI disks nor their ATA disks blast an
> entire track in one fell swoop.

Well, that's good to know.

>
> See below - changes made for higher capacity and higher RPM have made
> it impossible to use the regenerative braking trick on modern drives.
>
> >
> > The net result is that a DC failure can result in an entire track
> > getting trashed, if it happens at the right time.
>
> I'll agree that it can result in partial completion of a queue of
> writes, with the order of completion being essentially unknowable, and
> with at most one sector being corrupted, and hence having an invalid
> ECC (and therefore returning a hard error if read).

It would be nice if the drive had enough NVRAM to hold that one trashed
block so it could rewrite it on powerup.

>
> If that is your definition of "trashing an entire track", I'll accept
> it.
> But if you are implying that more than one sector could be
> unreadable, or that any sector would return data that had not been
> written to it without giving an error indication, I disagree. The
> remaining sectors of the track may have new data or old data, depending
> on the disk scheduling algorithm, but they would not be "corrupt" in
> the sense of being unreadable, or of returning bogus data without also
> returning an error indication.

For us the problem is that the drive reports the write as having
happened when it hasn't, so the filesystem dependencies end up being
smashed, because the filesystem is writing out data in dependency
order, but if the data is written in a different order to the drive,
the drive can end up being in error in the case of failure.

>
> If you wish to have writes complete to the media in the order in which
> you issued them, then you must either
> a) disable write caching and not use SCSI command queuing for ordered
> writes
> or
> b) enable write caching but do not use SCSI command queuing, and either
> b1) set the FUA bit in the SCSI CDB and not use command queuing for
> ordered writes, or
> b2) follow the ATA write command with a "flush cache" command
> or
> c) enable write caching and SCSI command queuing, but
> c1) set the FUA bit in the SCSI CDB and ensure the command has the
> "ordered task" attribute in its task tag, so that the command will not
> be reordered.

That is good information.
Maybe the SCSI and ATA guys can experiment on whether any of these modes
gives us acceptable performance.

>
>
> I agree that it is a shame that drive manufacturers do not offer an
> "atomic write" feature for a sector. Convince the system manufacturers
> to supply a "power-fail" warning signal a few milliseconds in advance
> of the loss of DC power, and I think the drive manufacturers would be
> happy to provide an atomic write feature.

We did this on the Whistle InterJet II.
We couldn't trust the drive manufacturer, however, so we had a 70 ms
hold-up built in, which gave us enough time to do things from the
kernel. (The hardest place was Japan, where some places only have AC at
about 90 VAC, so that was where the 70 ms was measured. At 240 VAC
(Australia) we had almost 200 ms, from memory :-)

The holdup had to give the drive time to complete a seek, write to a
track, discover that there was a bad sector in that range, reseek and
write the bad part, and reseek back and complete the original write,
possibly overflowing to the next track.

It would be interesting if you could tell us what the minimum hold-up
would be for a drive to complete any particular given write where the
write could be up to 128KB in size, all worst cases.

> We can no longer use the
> rotational energy in the platters to keep up the power, because the
> platter count and media diameter have both steadily decreased - as a
> result, there is no longer enough rotational inertia to provide the
> hold-up times needed. Note that it is this reduced platter count and
> smaller disks which have enabled 10K and 15K RPM disks within the power
> envelope allotted to a 3.5 inch disk drive.
>
> Regards,
> -Steve (not speaking officially for his employer)
> --------
> Steve Byan
> Design Engineer
> Maxtor Corp.
> MS 1-3/E23 > 333 South Street > Shrewsbury, MA 01545 > (508) 770-3414 > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Jan 20 21:34:17 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 87AD437B401; Mon, 20 Jan 2003 21:34:15 -0800 (PST) Received: from obsecurity.dyndns.org (adsl-64-169-106-179.dsl.lsan03.pacbell.net [64.169.106.179]) by mx1.FreeBSD.org (Postfix) with ESMTP id BB1DB43F13; Mon, 20 Jan 2003 21:34:14 -0800 (PST) (envelope-from kris@obsecurity.org) Received: from rot13.obsecurity.org (rot13.obsecurity.org [10.0.0.5]) by obsecurity.dyndns.org (Postfix) with ESMTP id 4CC4566B60; Mon, 20 Jan 2003 21:34:14 -0800 (PST) Received: by rot13.obsecurity.org (Postfix, from userid 1000) id 37DF91621; Mon, 20 Jan 2003 21:34:14 -0800 (PST) Date: Mon, 20 Jan 2003 21:34:14 -0800 From: Kris Kennaway To: fs@FreeBSD.org, current@FreeBSD.org Subject: "panic: softdep_disk_io_initiation: read" on 5.0-R Message-ID: <20030121053414.GA8146@rot13.obsecurity.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ReaqsoxgOBHFXBhH" Content-Disposition: inline User-Agent: Mutt/1.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --ReaqsoxgOBHFXBhH Content-Type: text/plain; charset=us-ascii Content-Disposition: inline One of the gohan machines panicked today [1] with the following: panic: softdep_disk_io_initiation: read Debugger("panic") Stopped at Debugger+0x54: xchgl %ebx,in_Debugger.0 db> trace Debugger(c04167bc,c048aee0,c0428d55,d8e61b68,1) at Debugger+0x54 
panic(c0428d55,0,c042c07f,d8e61b80,ce4fd9a8) at panic+0xab softdep_disk_io_initiation(ce4fd9a8,ce4dc090,0,313,c041c03d) at softdep_disk_io_initiation+0xac cluster_wbuild(c6727000,4000,b,0,4) at cluster_wbuild+0x32d vfs_bio_awrite(ce4fd9a8,12,c0429a9d,d1,c02a9f36) at vfs_bio_awrite+0x2ac ffs_fsync(d8e61c90,12,c4160a80,477,0) at ffs_fsync+0x2ca ffs_sync(c44a3800,2,c55f7900,c4160a80,c44a3800) at ffs_sync+0x18e sync(c4160a80,d8e61d10,c0430849,407,0) at sync+0xeb syscall(2f,2f,2f,bfbff950,3) at syscall+0x28e Xint0x80_syscall() at Xint0x80_syscall+0x1d --- syscall (36, FreeBSD ELF32, sync), eip = 0x804b3cb, esp = 0xbfbff8ac, ebp = 0xbfbff928 --- It's running 5.0-RC from about a month ago. Kris [1] Now that 5.0-R is out, the panics have begun again. --ReaqsoxgOBHFXBhH Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+LNvVWry0BWjoQKURAtp4AKCS8XA5LxB4/8aL5QaoHRp0L6WFHACfdbyU AL1Dpakd9z8tBeWCPg2a+Uk= =nJ5C -----END PGP SIGNATURE----- --ReaqsoxgOBHFXBhH-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 1: 2:18 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4F72F37B401 for ; Tue, 21 Jan 2003 01:02:17 -0800 (PST) Received: from thebsh.namesys.com (thebsh.namesys.com [212.16.7.65]) by mx1.FreeBSD.org (Postfix) with SMTP id 54AC143E4A for ; Tue, 21 Jan 2003 01:02:15 -0800 (PST) (envelope-from Nikita@Namesys.COM) Received: (qmail 20450 invoked from network); 21 Jan 2003 09:02:14 -0000 Received: from laputa.namesys.com.7.16.212.in-addr.arpa (HELO laputa.namesys.com) (212.16.7.124) by thebsh.namesys.com with SMTP; 21 Jan 2003 09:02:14 -0000 Received: by laputa.namesys.com (Postfix on SuSE Linux 8.0 (i386), from userid 511) id C553CBAE6; Tue, 21 Jan 2003 12:02:12 +0300 (MSK) From: Nikita Danilov 
MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15917.3220.652605.965883@laputa.namesys.com> Date: Tue, 21 Jan 2003 12:02:12 +0300 X-PGP-Fingerprint: 43CE 9384 5A1D CD75 5087 A876 A1AA 84D0 CCAA AC92 X-PGP-Key-ID: CCAAAC92 X-PGP-Key-At: http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=0xCCAAAC92 To: Julian Elischer Cc: Steve Byan , freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) In-Reply-To: References: <37CA8FF0-2CA5-11D7-962B-00306548867E@maxtor.com> X-Mailer: VM 7.07 under 21.5 (beta9) "brussels sprouts" XEmacs Lucid Tomato: Mauve Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Julian Elischer writes: > I hate to enter this argument but.... > > > On Mon, 20 Jan 2003, Steve Byan wrote: > [...] > > > > > If you wish to have writes complete to the media in the order in which > > you issued then, then you must either > > a) disable write caching and not use SCSI command queuing for ordered > > writes > > or > > b) enable write caching but do not use SCSI command queuing, and either > > b1) set the FUA bit in the SCSI CDB and not use command queuing for > > ordered writes, or > > b2) follow the ATA write command with a "flush cache" command > > or > > c) enable write caching and SCSI command queuing, but > > c1) set the FUA bit in the SCSI CDB and ensure the command has the > > "ordered task" attribute in its task tag, so that the command will not > > be reordered. > > > > that is good information > maybe the SCSI and ATA guys can experiment on whether any of these modes > gives us acceptable performance. > Linux reiserfs on SCSI devices can run with write-behind caching, and uses write barriers to write transaction commit records. 
It has been found that performance in this case is identical to just running (unsafely) with write-behind caching, which is much better than using write-through cache. Sorry, I don't have any numbers on this. > > > > [...] Nikita. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 6:41:15 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2299937B401 for ; Tue, 21 Jan 2003 06:41:13 -0800 (PST) Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5B62843F5B for ; Tue, 21 Jan 2003 06:41:12 -0800 (PST) (envelope-from stephen_byan@maxtor.com) Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1]) by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0LEV7w05802 for ; Tue, 21 Jan 2003 07:31:07 -0700 Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id DH1PZRTY; Tue, 21 Jan 2003 07:41:15 -0700 Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM) id JAA0000023521; Tue, 21 Jan 2003 09:41:00 -0500 (EST) Date: Tue, 21 Jan 2003 09:40:52 -0500 Subject: Re: JFS vs. 
 Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
From: Steve Byan
To: freebsd-fs@FreeBSD.ORG
Content-Transfer-Encoding: 7bit
In-Reply-To:
Message-Id: <5777A7A4-2D4E-11D7-962B-00306548867E@maxtor.com>
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

On Monday, January 20, 2003, at 02:43 PM, Julian Elischer wrote:

>> I'll agree that it can result in partial completion of a queue of
>> writes, with the order of completion being essentially unknowable, and
>> with at most one sector being corrupted, and hence having an invalid
>> ECC (and therefore returning a hard error if read).
>
> It would be nice if the drive had enough NVRAM to hold that one trashed
> block so it could rewrite it on powerup.

If enough customers show up waving dollar bills in their hands ...

> For us the problem is that the drive reports the write as having
> happened when it hasn't, so the filesystem dependencies end up being
> smashed, because the filesystem is writing out data in dependency
> order,
> but if the data is written in a different order to the drive,
> the drive can end up being in error in the case of failure.

That's the cost of write-behind caching. SCSI gives you enough control
to avoid this problem. ATA disks don't, but at least they're
inexpensive.

> We did this on the Whistle InterJet II. We couldn't trust the drive
> manufacturer, however, so we had a 70 ms hold-up built in,
> which gave us enough time to do things from the kernel.
> (The hardest place was Japan, where some places only have AC at about
> 90 VAC, so that was where the 70 ms was measured.
> At 240 VAC (Australia) we had almost 200 ms, from memory :-)
>
> The holdup had to give the drive time to complete a seek, write to a
> track, discover that there was a bad sector in that range, reseek and
> write the bad part, and reseek back and complete the original write,
> possibly overflowing to the next track.
>
> It would be interesting if you could tell us what the minimum hold-up
> would be for a drive to complete any particular given write where the
> write could be up to 128KB in size, all worst cases.

Ick, that could be a big number, maybe a couple of seconds in the very
worst case, I dunno for sure. I think you're probably talking about a
UPS rather than a large filter cap in the power supply.

I think it's technically better to accept that you're not going to get
all the data on the disk when power fails, and supply a "power fail"
signal to the drive a few sector-times in advance of the power going
out of the spec-limits. That way the drive could guarantee that it
won't partially overwrite a sector.

Regards,
-Steve
--------
Steve Byan
Design Engineer
Maxtor Corp.
MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508) 770-3414 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7: 5:21 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 853F637B401 for ; Tue, 21 Jan 2003 07:05:20 -0800 (PST) Received: from mail.eecs.harvard.edu (bowser.eecs.harvard.edu [140.247.60.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5A25543F1E for ; Tue, 21 Jan 2003 07:05:16 -0800 (PST) (envelope-from ellard@eecs.harvard.edu) Received: by mail.eecs.harvard.edu (Postfix, from userid 465) id DCA3F54C6D1; Tue, 21 Jan 2003 10:05:10 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by mail.eecs.harvard.edu (Postfix) with ESMTP id DA69754C580 for ; Tue, 21 Jan 2003 10:05:10 -0500 (EST) Date: Tue, 21 Jan 2003 10:05:10 -0500 (EST) From: Dan Ellard To: freebsd-fs@freebsd.org Subject: how to submit ideas/code for NFS changes? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org As part of some work related to my thesis, I've made some experimental changes to the FreeBSD NFS server code. Some of the changes improved end-to-end read performance substantially. For example, improving the read-ahead heuristic to recognize and handle some predictable but non-sequential access patterns can boost performance by at least 40%. How do I go about submitting my changes to the FreeBSD developers so that they can review them, and decide if they want to incorporate them, make them an optional patch, or ignore them, etc? 
Thanks, -Dan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7:15:12 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 28ABB37B401 for ; Tue, 21 Jan 2003 07:15:11 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id E1C0843EB2 for ; Tue, 21 Jan 2003 07:15:09 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id AEA4B536E; Tue, 21 Jan 2003 16:15:07 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: fs@freebsd.org Subject: RAID alternatives From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 16:15:06 +0100 Message-ID: Lines: 32 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I'm planning a new system to replace my aging 350 MHz K6-2, and am considering various options for increasing disk performance and reliability. I'm thinking of running RAID level 5 across three or four identical IDE disks. The question is what RAID solution to pick: - Software RAID (free, but can't boot from it, and possibly not 100% reliable). Options include: - RAIDframe: "should be considered highly experimental". Any experiences with this? Does it work? Is it fast? Is it reliable? I see it requires wiring down the drive IDs, which vinum doesn't IIRC. - Vinum: I've had mixed experiences with this. 
      There have been some embarrassing bugs, particularly in the
      recovery code, and it has had a tendency to crash the system.
      Has it improved with age?

    - GEOM: might be an option in a year or so, but not now.

 - Hardware RAID (more expensive, but less hassle and possibly higher
   performance). The problem here is that IDE RAID controllers don't
   seem to support RAID 4 or 5; they only support RAID 0, which is
   pointless on its own; RAID 1, which is horribly wasteful; and JBOD,
   which is just a fancy name for disk concatenation, and is even more
   pointless than RAID 0. The exception seems to be the 3ware 7500
   series - which FreeBSD doesn't seem to support.

I'd be happy to be contradicted :)

DES
--
Dag-Erling Smorgrav - des@ofug.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Jan 21 7:20:27 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
 by hub.freebsd.org (Postfix) with ESMTP id 7D32837B401
 for ; Tue, 21 Jan 2003 07:20:26 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
 by mx1.FreeBSD.org (Postfix) with ESMTP id 649FF43E4A
 for ; Tue, 21 Jan 2003 07:20:25 -0800 (PST)
 (envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
 by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0LFJs6Z015006;
 Tue, 21 Jan 2003 16:19:54 +0100 (CET)
 (envelope-from phk@freebsd.org)
To: Dag-Erling Smorgrav
Cc: fs@freebsd.org
Subject: Re: RAID alternatives
From: phk@freebsd.org
In-Reply-To: Your message of "Tue, 21 Jan 2003 16:15:06 +0100."
Date: Tue, 21 Jan 2003 16:19:54 +0100 Message-ID: <15005.1043162394@critter.freebsd.dk> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org In message , Dag-Erling Smorgrav writes: > - Hardware RAID (more expensive, but less hassle and possibly higher > performance). I have a promise SX6000 which supports RAID5 and so far it has been behaving itself and seems to be giving good performance. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7:37:42 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F1D837B401 for ; Tue, 21 Jan 2003 07:37:41 -0800 (PST) Received: from loopingstar.vidampark.hu (loopingstar.vidampark.hu [195.228.49.130]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6F0C443E4A for ; Tue, 21 Jan 2003 07:37:40 -0800 (PST) (envelope-from mico@vnet.hu) Received: by loopingstar.vidampark.hu (Postfix, from userid 1001) id 6276DBA6E; Tue, 21 Jan 2003 16:31:33 +0100 (CET) Date: Tue, 21 Jan 2003 16:31:32 +0100 From: Miklos Niedermayer To: Dag-Erling Smorgrav Cc: fs@freebsd.org Subject: Re: RAID alternatives Message-ID: <20030121153132.GD50232@bsd.hu> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hi, On Tue, Jan 21, 2003 at 04:15:06PM +0100, Dag-Erling Smorgrav wrote: > - Hardware RAID (more expensive, but less 
hassle and possibly higher > performance). The problem here is that IDE RAID controllers don't > seem to support RAID 4 or 5; they only support RAID 0, which is > pointless on its own; RAID 1, which is horribly wasteful; and JBOD, Actually, those cheap IDE "RAID controllers" without RAID4/5 aren't RAID controllers. Mirroring is done by the operating system (you can check this, install the OS on a mirror, then type iostat 2 - the OS will be handling the disks separately). Avoid using these. Real IDE RAID controllers cost almost as much as the SCSI ones. I've only met the Adaptec 2400, it runs fine... bye mico -- Miklos Niedermayer tel +36203891185 icq 161267913 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7:44:52 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A8D8337B401 for ; Tue, 21 Jan 2003 07:44:49 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id DBA4943F1E for ; Tue, 21 Jan 2003 07:44:48 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id 55256536E; Tue, 21 Jan 2003 16:44:44 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. 
To: fs@freebsd.org Subject: Re: RAID alternatives References: From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 16:44:44 +0100 In-Reply-To: (Dag-Erling Smorgrav's message of "Tue, 21 Jan 2003 16:15:06 +0100") Message-ID: Lines: 58 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

--=-=-=

Dag-Erling Smorgrav writes:
> - RAIDframe: "should be considered highly experimental". Any
>   experiences with this? Does it work? Is it fast? Is it
>   reliable? I see it requires wiring down the drive IDs, which
>   vinum doesn't IIRC.

Ho, hum, well, RAIDframe fails to impress me:

root@dsa ~# raidctl -c raid0.conf
Kernelized RAIDframe activated
Searching for raid components...
RAIDFRAME: protectedSectors is 64
Waiting for DAG engine to start
raid0: rebuilding: raidlookup on device: failed: 2!
raidlookup on device: failed!
raid0: rebuilding: raidlookup on device: failed: 2!
raidlookup on device: failed!
raid0: rebuilding: raidlookup on device: failed: 2!
raidlookup on device: failed!
raid0: rebuilding: raidlookup on device: failed: 2!
raidlookup on device: failed!
Error: block size on disks (0) must be a power of 2
RAIDFRAME: failed rf_ConfigureDisks with 22

fatal kernel trap:

    trap entry = 0x2 (memory management fault)
    faulting va = 0xdeadc0dedeadc0de
    type = access violation
    cause = load instructon
    pc = 0xfffffc00004be3a4
    ra = 0xfffffc00004703c4
    sp = 0xfffffe000bda1a90
    curthread = 0xfffffc000f79f170
    pid = 635, comm = raid

Stopped at strlen+0x4: ldbu t0,0(a0) <0xdeadc0dedeadc0de>
db> trace
strlen() at strlen+0x4
kvprintf() at kvprintf+0x634
vsnprintf() at vsnprintf+0x40
panic() at panic+0xb4
_mtx_lock_flags() at _mtx_lock_flags+0x70
msleep() at msleep+0x6e0
DAGExecutionThread() at DAGExecutionThread+0x260
fork_exit() at fork_exit+0xf0
exception_return() at exception_return
--- root of call graph ---

(see attached configuration file)

DES
-- 
Dag-Erling Smorgrav - des@ofug.org

--=-=-=
Content-Disposition: attachment; filename=raid0.conf

#
# Experimental RAID5 configuration
#

# 1 row, 4 columns, 0 spares
START array
1 4 0

# 4 "disks"
START disks
/dev/da1d
/dev/da1e
/dev/da1f
/dev/da1g

# RAID level 5 with 32-sector interleave factor
START layout
32 1 1 5

# 100-deep FIFO queue
START queue
fifo 100

--=-=-=--

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Jan 21 7:50:45 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 31EC337B401 for ; Tue, 21 Jan 2003 07:50:45 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5E0BE43F3F for ; Tue, 21 Jan 2003 07:50:44 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id A368E536E; Tue, 21 Jan 2003 16:50:42 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have
been affiliated. To: Miklos Niedermayer Cc: fs@freebsd.org Subject: Re: RAID alternatives References: <20030121153132.GD50232@bsd.hu> From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 16:50:42 +0100 In-Reply-To: <20030121153132.GD50232@bsd.hu> (Miklos Niedermayer's message of "Tue, 21 Jan 2003 16:31:32 +0100") Message-ID: Lines: 9 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Miklos Niedermayer writes: > Real IDE RAID controllers cost almost as much as the SCSI ones. I've only > met the Adaptec 2400, it runs fine... $400 is a little steep though :( DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7:52: 1 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6763037B401; Tue, 21 Jan 2003 07:52:00 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id D4D4F43EB2; Tue, 21 Jan 2003 07:51:59 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id B410F536E; Tue, 21 Jan 2003 16:51:58 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. 
To: phk@freebsd.org Cc: fs@freebsd.org Subject: Re: RAID alternatives References: <15005.1043162394@critter.freebsd.dk> From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 16:51:58 +0100 In-Reply-To: <15005.1043162394@critter.freebsd.dk> (phk@freebsd.org's message of "Tue, 21 Jan 2003 16:19:54 +0100") Message-ID: Lines: 10 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org phk@freebsd.org writes: > I have a promise SX6000 which supports RAID5 and so far it has been > behaving itself and seems to be giving good performance. Aha! It seems they also have a four-channel version which is almost affordable. Thanks! DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 7:56:57 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0FC1237B401 for ; Tue, 21 Jan 2003 07:56:57 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6B22543F3F for ; Tue, 21 Jan 2003 07:56:56 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id DC4B5536E; Tue, 21 Jan 2003 16:56:54 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. 
To: "Cheen Liao" Cc: Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 16:56:54 +0100 In-Reply-To: <001401c2be93$c36c7490$681adf3d@homexp> ("Cheen Liao"'s message of "Sat, 18 Jan 2003 09:48:55 +0800") Message-ID: Lines: 8 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org "Cheen Liao" writes: > . develop a prototype on FreeBSD 4.x. Don't bother. Save yourselves a lot of pain by going directly to 5.0. DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 9:49: 0 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 56C8F37B401 for ; Tue, 21 Jan 2003 09:48:59 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0AB0343F5B for ; Tue, 21 Jan 2003 09:48:59 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id F2FCCAE1C1; Tue, 21 Jan 2003 09:48:54 -0800 (PST) Date: Tue, 21 Jan 2003 09:48:54 -0800 From: Alfred Perlstein To: Dan Ellard Cc: freebsd-fs@freebsd.org Subject: Re: how to submit ideas/code for NFS changes? 
Message-ID: <20030121174854.GF33821@elvis.mu.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * Dan Ellard [030121 07:05] wrote: > > As part of some work related to my thesis, I've made some experimental > changes to the FreeBSD NFS server code. Some of the changes improved > end-to-end read performance substantially. For example, improving the > read-ahead heuristic to recognize and handle some predictable but > non-sequential access patterns can boost performance by at least 40%. > > How do I go about submitting my changes to the FreeBSD developers so > that they can review them, and decide if they want to incorporate > them, make them an optional patch, or ignore them, etc? The best thing to do is to open a bug report, then post here about it. It's probably best if you CC Matt Dillon (dillon@freebsd.org) and myself on it as well. Thank you for taking an interest! -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 10:25:27 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8FEDF37B401 for ; Tue, 21 Jan 2003 10:25:25 -0800 (PST) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 104AD43F13 for ; Tue, 21 Jan 2003 10:25:25 -0800 (PST) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.3/8.12.3) with ESMTP id h0LIPI6F031448; Tue, 21 Jan 2003 10:25:18 -0800 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.3/8.12.3/Submit) id h0LIPH7X031441; Tue, 21 Jan 2003 10:25:17 -0800 Date: Tue, 21 Jan 2003 10:25:17 -0800 From: Brooks Davis To: Dag-Erling Smorgrav Cc: fs@FreeBSD.ORG Subject: Re: RAID alternatives Message-ID: <20030121102517.B10617@Odin.AC.HMC.Edu> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="V0207lvV8h4k8FAm" Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from des@ofug.org on Tue, Jan 21, 2003 at 04:15:06PM +0100 X-Virus-Scanned: by amavisd-milter (http://amavis.org/) on odin.ac.hmc.edu Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --V0207lvV8h4k8FAm Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 21, 2003 at 04:15:06PM +0100, Dag-Erling Smorgrav wrote: > - Hardware RAID (more expensive, but less hassle and possibly higher > performance). 
The problem here is that IDE RAID controllers don't
>   seem to support RAID 4 or 5; they only support RAID 0, which is
>   pointless on its own; RAID 1, which is horribly wasteful; and JBOD,
>   which is just a fancy name for disk concatenation, and is even more
>   pointless than RAID 0. The exception seems to be the 3ware 7500
>   series - which FreeBSD doesn't seem to support. I'd be happy to be
>   contradicted :)

FreeBSD definitely supports the 3ware 7000 series and I'm fairly sure it
supports the newer ones as well. 3ware seems to have done the right
thing and maintained the API and device IDs across revs so the driver
doesn't need changes.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4

--V0207lvV8h4k8FAm
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE+LZCMXY6L6fI4GtQRAl0uAJ4ndCdsi/cJZQdc13pbJwMSYbr/1QCfYtrq
fu444M+GK8Z32ot4XALXMJQ=
=5rOU
-----END PGP SIGNATURE-----

--V0207lvV8h4k8FAm--

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Jan 21 10:48:18 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0A60237B401 for ; Tue, 21 Jan 2003 10:48:17 -0800 (PST) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3F76E43E4A for ; Tue, 21 Jan 2003 10:48:16 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc03.attbi.com (sccrmhc03) with ESMTP id <200301211848150030005ovke>; Tue, 21 Jan 2003 18:48:15 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id
KAA49410; Tue, 21 Jan 2003 10:48:14 -0800 (PST) Date: Tue, 21 Jan 2003 10:48:13 -0800 (PST) From: Julian Elischer To: Dag-Erling Smorgrav Cc: fs@freebsd.org Subject: Re: RAID alternatives In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

On Tue, 21 Jan 2003, Dag-Erling Smorgrav wrote:

> I'm planning a new system to replace my aging 350 MHz K6-2, and am
> considering various options for increasing disk performance and
> reliability. I'm thinking of running RAID level 5 across three or
> four identical IDE disks. The question is what RAID solution to pick:
[...]
> - Hardware RAID (more expensive, but less hassle and possibly higher
>   performance). The problem here is that IDE RAID controllers don't
>   seem to support RAID 4 or 5; they only support RAID 0, which is
>   pointless on its own; RAID 1, which is horribly wasteful; and JBOD,
>   which is just a fancy name for disk concatenation, and is even more
>   pointless than RAID 0. The exception seems to be the 3ware 7500
>   series - which FreeBSD doesn't seem to support. I'd be happy to be
>   contradicted :)

twe0: <3ware Storage Controller> port 0x8400-0x840f mem 0xf3800000-0xf3ffffff,0xf4000000-0xf400000f irq 10 at device 11.0 on pci0
twe0: 8 ports, Firmware FE7X 1.05.00.034, BIOS BE7X 1.08.00.038
twe1: <3ware Storage Controller> port 0x8000-0x800f mem 0xf2800000-0xf2ffffff,0xf3000000-0xf300000f irq 11 at device 12.0 on pci0
twe1: 8 ports, Firmware FE7X 1.05.00.034, BIOS BE7X 1.08.00.038
twed0: on twe0
twed0: 1526248MB (3125755904 sectors)
twed1: on twe1
twed1: 1526248MB (3125755904 sectors)

that's two TWE 7500-8 cards, each with 8 x 200GB drives.
so far no problems.
(except you need to partition them with disklabel as sysinstall can't
handle the size..)
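
As a sanity check on the sizes in the probe output above, the sector count can be converted by hand. The sketch below assumes 512-byte sectors and that the "MB" figure means 2^20 bytes; neither is stated in the message, and the variable names are made up for illustration.

```python
# Assumptions (not stated in the message): 512-byte sectors, "MB" = 2**20 bytes.
SECTOR_BYTES = 512
sectors = 3125755904            # from the twed0/twed1 lines

size_bytes = sectors * SECTOR_BYTES
size_mib = size_bytes // 2**20  # matches the figure printed as "MB"
size_gb = size_bytes / 10**9    # decimal gigabytes, as used on drive labels

print(size_mib)                 # 1526248
print(round(size_gb))           # 1600
```

Under those assumptions each unit exposes roughly 1600 decimal GB, which is the full 8 x 200GB. That would fit a striped or concatenated unit; RAID 5 over the same eight drives would give up one drive's worth of space to parity and leave roughly 1400GB.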
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 10:56:40 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 809E737B401 for ; Tue, 21 Jan 2003 10:56:39 -0800 (PST) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id E810643E4A for ; Tue, 21 Jan 2003 10:56:38 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc02.attbi.com (sccrmhc02) with ESMTP id <2003012118563700200mlmshe>; Tue, 21 Jan 2003 18:56:38 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA49482; Tue, 21 Jan 2003 10:56:37 -0800 (PST) Date: Tue, 21 Jan 2003 10:56:35 -0800 (PST) From: Julian Elischer To: Dag-Erling Smorgrav Cc: fs@freebsd.org Subject: Re: RAID alternatives In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 21 Jan 2003, Julian Elischer wrote: > twed1: on twe1 prior to this I was running raid5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 11:28:48 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5868E37B401 for ; Tue, 21 Jan 2003 11:28:47 -0800 (PST) Received: from 66-162-33-178.gen.twtelecom.net (66-162-33-181.gen.twtelecom.net [66.162.33.181]) by mx1.FreeBSD.org (Postfix) with ESMTP id DC9A943E4A for ; Tue, 21 Jan 2003 11:28:46 -0800 (PST) (envelope-from 
jeff@expertcity.com) Received: from [10.4.1.134] (helo=expertcity.com) by 66-162-33-178.gen.twtelecom.net with esmtp (Exim 3.22 #4) id 18b44k-0003wf-00; Tue, 21 Jan 2003 11:28:46 -0800 Message-ID: <3E2D9F6C.7050600@expertcity.com> Date: Tue, 21 Jan 2003 11:28:44 -0800 From: Jeff Behl User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021212 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Dag-Erling Smorgrav Cc: fs@FreeBSD.ORG Subject: Re: RAID alternatives References: <20030121102517.B10617@Odin.AC.HMC.Edu> In-Reply-To: <20030121102517.B10617@Odin.AC.HMC.Edu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org We've been using a 3ware 7500 for months in a raid5 configuration; works great... Brooks Davis wrote: > On Tue, Jan 21, 2003 at 04:15:06PM +0100, Dag-Erling Smorgrav wrote: > >> - Hardware RAID (more expensive, but less hassle and possibly higher >> performance). The problem here is that IDE RAID controllers don't >> seem to support RAID 4 or 5; they only support RAID 0, which is >> pointless on its own; RAID 1, which is horribly wasteful; and JBOD, >> which is just a fancy name for disk concatenation, and is even more >> pointless than RAID 0. The exception seems to be the 3ware 7500 >> series - which FreeBSD doesn't seem to support. I'd be happy to be >> contradicted :) > > > FreeBSD definatly support the 3ware 7000 series and I'm fairly sure it > support the newer ones as well. 3Ware seems to have done the right > thing and maintained the API and device IDs across revs so the driver > doesn't need changes. 
> > -- Brooks > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 12: 4:29 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EC68D37B401 for ; Tue, 21 Jan 2003 12:04:28 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 72C0E43F13 for ; Tue, 21 Jan 2003 12:04:24 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id 3C80C536E; Tue, 21 Jan 2003 21:04:20 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: Jeff Behl Cc: fs@FreeBSD.ORG Subject: Re: RAID alternatives References: <20030121102517.B10617@Odin.AC.HMC.Edu> <3E2D9F6C.7050600@expertcity.com> From: Dag-Erling Smorgrav Date: Tue, 21 Jan 2003 21:04:20 +0100 In-Reply-To: <3E2D9F6C.7050600@expertcity.com> (Jeff Behl's message of "Tue, 21 Jan 2003 11:28:44 -0800") Message-ID: Lines: 10 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Jeff Behl writes: > We've been using a 3ware 7500 for months in a raid5 configuration; > works great... You're right, I misread the driver source and thought it wouldn't attach to a 7000 or higher. 
DES
-- 
Dag-Erling Smorgrav - des@ofug.org

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Jan 21 12:27: 8 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A952837B401 for ; Tue, 21 Jan 2003 12:27:07 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 619AF43ED8 for ; Tue, 21 Jan 2003 12:27:07 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 2E850AE2B7; Tue, 21 Jan 2003 12:27:02 -0800 (PST) Date: Tue, 21 Jan 2003 12:27:02 -0800 From: Alfred Perlstein To: Dag-Erling Smorgrav Cc: Jeff Behl , fs@FreeBSD.ORG Subject: Re: RAID alternatives Message-ID: <20030121202702.GI33821@elvis.mu.org> References: <20030121102517.B10617@Odin.AC.HMC.Edu> <3E2D9F6C.7050600@expertcity.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

* Dag-Erling Smorgrav [030121 12:04] wrote:
> Jeff Behl writes:
> > We've been using a 3ware 7500 for months in a raid5 configuration;
> > works great...
>
> You're right, I misread the driver source and thought it wouldn't
> attach to a 7000 or higher.

I'm using a 7500 here. I _was_ using RAID-5, but it was very, very slow
(not as slow as the Adaptec one, though), so I switched to RAID-10 and it
flies.

The only other problem is that the dump routine has been broken by the
dump API under -current, and I'm too clueless and lazy to fix it atm.
-Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 14:40:44 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0AAA037B401 for ; Tue, 21 Jan 2003 14:40:41 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 46F4143E4A for ; Tue, 21 Jan 2003 14:40:40 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0183.cvx40-bradley.dialup.earthlink.net ([216.244.42.183] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18b74L-0005AO-00; Tue, 21 Jan 2003 14:40:34 -0800 Message-ID: <3E2DCC0C.FCAB2EFF@mindspring.com> Date: Tue, 21 Jan 2003 14:39:08 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Steve Byan Cc: freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) References: <5777A7A4-2D4E-11D7-962B-00306548867E@maxtor.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4cb781d7f3dfca911418ea9d5c0b57511387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Steve Byan wrote: > On Monday, January 20, 2003, at 02:43 PM, Julian Elischer wrote: > > it would be nice if the drive had enough NVram to hold that one trashed > > block so it could rewrite it on powerup. > > If enough customers show up waving dollar bills in their hands ... 
The disk manufacturers have historically not recognized new markets
until after their competition has already entered them. This was true
for 8 inch floppies, 5 inch floppies, 3 inch floppies, 14 inch fixed
disks, 5 inch fixed disks, 3 inch fixed disks, 2.5 inch fixed disks,
and the new, quarter-sized fixed disks. In all cases, the market for
these products was a new market (minicomputer, desktop, luggable,
minicomputer, desktop, luggable, laptop, cameras/etc.) for which the
companies were unprepared, and which they therefore failed to pursue.
How many people remember Shugart or Miniscribe?

The problem is that everyone is trying to sell up-market from where
they started, as their companies become fat, dumb, and lazy, and
therefore require higher margins for the same products. Old companies
do not innovate: the decisions that make them profitable make them
incapable of anything other than evolutionary (as opposed to
revolutionary) advances.

Don't worry, Open Source Software projects are in exactly the same
boat: it requires an entirely different skill set to enter a new
market.

There's a great book on this:

    The Innovator's Dilemma
    Clayton M. Christensen
    HarperBusiness
    ISBN: 0-06-662069-4

The hard disk industry is one of his three major examples. 8-).

FWIW: the major market you are not seeing here is ATA RAID arrays that
can compete with SCSI RAID arrays from other disk vendors, where you
can leverage the ATA economies of scale that make ATA disks cheaper
than SCSI disks in the first place. Basically, the first ATA disk
manufacturer to do this will spike much of their competition's SCSI
market, as soon as the software types become aware of the change (see
below).
> > For us the problem is that the drive reports the write as having
> > happened when it hasn't, so the filesystem dependencies end up being
> > smashed, because the filesystem is writing out data in dependency
> > order, but if the data is written in a different order to the drive,
> > the drive can end up being in error in the case of failure.
>
> That's the cost of write-behind caching. SCSI gives you enough control
> to avoid this problem. ATA disks don't, but at least they're
> inexpensive.

Which is why people call ATA drives "crap", and disk manufacturers get
upset about it: they are competing on size and spindle speed, and
somehow seem to have forgotten that one of the purposes of their
products is to _reliably store data_.

The funny thing is that it would cost them nearly nothing, now that
they have tagged command queues for ATA drives, to put this feature
into ATA drives as well... in fact, it may even be no more than a
firmware hack.

> Ick, that could be a big number, maybe a couple of seconds in the very
> worst-case, I dunno for sure. I think you're probably talking a UPS
> rather than a large filter cap in the power supply. I think it's
> technically better to accept that you're not going to get all the data
> on the disk when power fails, and supply a "power fail" signal to the
> drive a few sector-times in advance of the power going out of the
> spec-limits. That way the drive could guarantee that it won't partially
> overwrite a sector.

That's a really annoying point of view. 8-).

The problem with this approach is that it requires cable changes to the
drive interface, unless you designate one of the "spare" grounds as an
inverse "AC present" signal; even so, you would not be guaranteed that
the motherboard/controller manufacturers have all tied this pin active
low in their designs, if it's truly a "spare". That means the disks
would not work with some motherboards, which is death in a commodity
market.
I suspect that this is a good reason why, despite the design being
available in your head, no manufacturer has implemented this, even if
there were no computer hardware support for it.

Basically, this means that we (filesystems engineers) have two wishlist
items for disk manufacturers:

1)  Add logic to the ATA disks to provide the same control over
    the ordering of operations (e.g. barriers and completion
    notification) that SCSI disks have (per the above, this may
    be nothing more than a firmware hack).

2)  Provide the ability to obtain physical geometry information
    from ATA disks, similar to the information that is returned
    in SCSI mode page 2.

The first can be a "must enable, disabled by default" item, and the
second could be a vendor-private command, which keeps both of them from
being visible to ignorant users of the disks.

If you want to address throwing a chock in the wheels and/or dumping
the write queue to on-board NVRAM, assuming an inverse AC fail
notification, if it's turned on (off by default to account for floating
cable pins, rather than active low, on some motherboards, to avoid
sabotaging your existing market), that would be nice too. ;^).
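
The ordering problem behind the first wishlist item can be shown with a toy model. This is only a sketch: ToyDisk, barrier() and destage() are hypothetical names, not ATA or SCSI commands, and the "lowest LBA first" policy is an assumed stand-in for whatever reordering a real drive's write cache does.

```python
# Toy model of a write-behind cache; all names here are hypothetical.
class ToyDisk:
    def __init__(self, honor_barriers):
        self.honor_barriers = honor_barriers
        self.cache = []   # (lba, name): acknowledged to the host, still volatile
        self.media = []   # names that reached stable storage, in write order

    def write(self, lba, name):
        # The write is acknowledged immediately, before it is on the platter.
        self.cache.append((lba, name))

    def barrier(self):
        # A drive that honors barriers drains its cache before going on.
        if self.honor_barriers:
            self.destage(len(self.cache))

    def destage(self, count):
        # Assumed policy: the drive writes the lowest LBA first.
        self.cache.sort()
        for _lba, name in self.cache[:count]:
            self.media.append(name)
        del self.cache[:count]


def media_after_power_loss(honor_barriers):
    disk = ToyDisk(honor_barriers)
    disk.write(900, "data")    # the filesystem wants this on disk first
    disk.barrier()
    disk.write(10, "inode")    # points at the data block
    # Power fails once exactly one block in total is on the platter.
    disk.destage(1 - len(disk.media))
    return disk.media


print(media_after_power_loss(False))  # ['inode']: pointer without its data
print(media_after_power_loss(True))   # ['data']: merely an unreferenced block
```

Without the barrier the surviving state has an inode that references a block which was never written; with it, the worst case is a written block that nothing references yet, which is the kind of state a dependency-ordered filesystem is designed to tolerate.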
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 14:51:13 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BC6A037B401 for ; Tue, 21 Jan 2003 14:51:12 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4F59E43E4A for ; Tue, 21 Jan 2003 14:51:12 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0183.cvx40-bradley.dialup.earthlink.net ([216.244.42.183] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18b7EX-0006u5-00; Tue, 21 Jan 2003 14:51:06 -0800 Message-ID: <3E2DCE86.4C416E28@mindspring.com> Date: Tue, 21 Jan 2003 14:49:42 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Dag-Erling Smorgrav Cc: Cheen Liao , freebsd-fs@freebsd.org Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a48e482ca2dfbe5f483e6a9e250402ce9093caf27dac41a8fd350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Dag-Erling Smorgrav wrote: > "Cheen Liao" writes: > > . develop a prototype on FreeBSD 4.x. > > Don't bother. Save yourselves a lot of pain by going directly to 5.0. 
4.x is a stable system, and unlikely to change interfaces out from
under a developer. Not so 5.x.

It's much easier to get something working, pick a flag day, and do a
port than it is to try to track changes (I made this mistake when John
Dyson was revving the VM system, when I did a FreeBSD port to the
Motorola PPC "PowerStack" systems, back in 1996).

From personal commercial experience: *never* try to track a moving
target if you are using the code as a platform for research and/or
development, rather than as an end in itself.

FreeBSD people seem to forget that the purpose of most FreeBSD users is
not simply "to make FreeBSD better". 8-).

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Jan 21 15:49:59 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E8E0E37B401 for ; Tue, 21 Jan 2003 15:49:58 -0800 (PST) Received: from mail.allcaps.org (allcaps.org [216.240.173.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 90DFA43F3F for ; Tue, 21 Jan 2003 15:49:58 -0800 (PST) (envelope-from bsder@allcaps.org) Received: from mail.allcaps.org (localhost [127.0.0.1]) by mail.allcaps.org (Postfix) with ESMTP id B4EBC92FA9; Tue, 21 Jan 2003 18:49:55 -0500 (EST) Received: from localhost (bsder@localhost) by mail.allcaps.org (8.12.5/8.12.5/Submit) with ESMTP id h0LNnt09006528; Tue, 21 Jan 2003 15:49:55 -0800 X-Authentication-Warning: mail.allcaps.org: bsder owned process doing -bs Date: Tue, 21 Jan 2003 15:49:54 -0800 (PST) From: "Andrew P. Lentvorski, Jr."
To: Dag-Erling Smorgrav Cc: fs@freebsd.org Subject: Re: RAID alternatives In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 21 Jan 2003, Dag-Erling Smorgrav wrote: > Ho, hum, well, RAIDframe fails to impress me: I have two boxes with software RAID. Both are running 4.7. One runs RAIDFrame for RAID 1 and the other runs vinum for RAID 1. Both work. I even managed to get RAIDFrame to boot and mirror the root filesystem (I stored a kernel on alternate boot media). It was significant pain, though, and I hope that GEOM makes much of the pain go away in 5.0. I'm waiting for 5.0 to release before trying to do RAIDFrame in the 5.X series. Especially since Scott Long is *WAY* too overloaded right now with the 5.0 release to respond to anything about RAIDFrame. I prefer RAIDFrame because its goals (RAID Framework) are smaller in scope than vinum (full volume manager mechanisms). YMMV. 
-a To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 16: 8:42 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C5B4037B401 for ; Tue, 21 Jan 2003 16:08:41 -0800 (PST) Received: from mail.allcaps.org (allcaps.org [216.240.173.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C6C343EB2 for ; Tue, 21 Jan 2003 16:08:41 -0800 (PST) (envelope-from bsder@allcaps.org) Received: from mail.allcaps.org (localhost [127.0.0.1]) by mail.allcaps.org (Postfix) with ESMTP id CD54992FA9; Tue, 21 Jan 2003 19:08:43 -0500 (EST) Received: from localhost (bsder@localhost) by mail.allcaps.org (8.12.5/8.12.5/Submit) with ESMTP id h0M08fgH006594; Tue, 21 Jan 2003 16:08:43 -0800 X-Authentication-Warning: mail.allcaps.org: bsder owned process doing -bs Date: Tue, 21 Jan 2003 16:08:41 -0800 (PST) From: "Andrew P. Lentvorski, Jr." To: Alfred Perlstein Cc: Dag-Erling Smorgrav , Jeff Behl , Subject: Re: RAID alternatives In-Reply-To: <20030121202702.GI33821@elvis.mu.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * Dag-Erling Smorgrav [030121 12:04] wrote: > > You're right, I misread the driver source and thought it wouldn't > attach to a 7000 or higher. As a side note, I don't use RAID 5 anymore, period. While RAID 10 is wasteful, a double disk failure normally doesn't take it out. That's not true for RAID5. ATA drives have gotten so crappy that I have had drive failures during the process of rebuilding from a drive failure. Maybe we need a RAID 55 which provides resilience against 2 drive failures ... In addition, full RAID 10 provides a very nice method for creating backups. 
Shut down the system, pull one half the drives, put in all new drives, and rebuild the array. Voila! Instant backup (and instant recovery, if required). -a To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 16:17:40 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8209737B401 for ; Tue, 21 Jan 2003 16:17:38 -0800 (PST) Received: from mercury.ccmr.cornell.edu (mercury.ccmr.cornell.edu [128.84.231.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 678EB43F6B for ; Tue, 21 Jan 2003 16:17:37 -0800 (PST) (envelope-from mitch@ccmr.cornell.edu) Received: from ori.ccmr.cornell.edu (ori.ccmr.cornell.edu [128.84.231.243]) by mercury.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id TAA19122; Tue, 21 Jan 2003 19:17:36 -0500 Received: from localhost (mitch@localhost) by ori.ccmr.cornell.edu (8.12.3/8.12.3) with ESMTP id h0M0HaSc029141; Tue, 21 Jan 2003 19:17:36 -0500 X-Authentication-Warning: ori.ccmr.cornell.edu: mitch owned process doing -bs Date: Tue, 21 Jan 2003 19:17:36 -0500 (EST) From: Mitch Collinsworth To: "Andrew P. Lentvorski, Jr." Cc: fs@FreeBSD.ORG Subject: Re: RAID alternatives In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 21 Jan 2003, Andrew P. Lentvorski, Jr. wrote: > In addition, full RAID 10 provides a very nice method for creating > backups. Shut down the system, pull one half the drives, put in all new > drives, and rebuild the array. Voila! Instant backup (and instant > recovery, if required). It's not instant backup if you have to shut down the system. Lots of fileservers do not have the luxury of daily shutdowns for backup. 
Filesystems that allow for snapshotting are much more useful in this respect. -Mitch To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 16:52:23 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C104137B401 for ; Tue, 21 Jan 2003 16:52:21 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1D47143F43 for ; Tue, 21 Jan 2003 16:52:16 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0M0q10L009543; Tue, 21 Jan 2003 16:52:01 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0M0q0aL009542; Tue, 21 Jan 2003 16:52:00 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Tue, 21 Jan 2003 16:52:00 -0800 From: David Schultz To: "Andrew P. Lentvorski, Jr." Cc: Alfred Perlstein , Dag-Erling Smorgrav , Jeff Behl , fs@FreeBSD.ORG Subject: Re: RAID alternatives Message-ID: <20030122005200.GA9416@HAL9000.homeunix.com> Mail-Followup-To: "Andrew P. Lentvorski, Jr." , Alfred Perlstein , Dag-Erling Smorgrav , Jeff Behl , fs@FreeBSD.ORG References: <20030121202702.GI33821@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Thus spake Andrew P. Lentvorski, Jr. : > * Dag-Erling Smorgrav [030121 12:04] wrote: > > > > You're right, I misread the driver source and thought it wouldn't > > attach to a 7000 or higher. 
> > As a side note, I don't use RAID 5 anymore, period. While RAID 10 is > wasteful, a double disk failure normally doesn't take it out. That's not > true for RAID5. > > ATA drives have gotten so crappy that I have had drive failures during the > process of rebuilding from a drive failure. Maybe we need a RAID 55 which > provides resilience against 2 drive failures ... The organization that protects against a two-disk failure is usually called RAID Level 6, actually. It is particularly useful in very large disk arrays where the probability of a two-disk failure is nonnegligible. In smaller disk arrays where high levels of reliability are required it can also be useful because drives from the same batch under similar load conditions tend to fail at about the same time, and because the additional load during recovery can cause further failures. However, the overhead is fairly high for small arrays, to the point where it often makes more sense to simply use mirroring. BTW, David Patterson has pointed out that marketing people have invented all sorts of new terms such as ``RAID 9'' (and maybe even ``RAID 55'' as you suggest) that he never coined. Maybe he should have left it at ``mirroring'', ``striping'', ``P+Q redundancy'', etc. 
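To put rough numbers on the two-disk failure cases discussed above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two failures are treated as simultaneous (no rebuild window), every pair of disks is equally likely to fail, and the RAID 10 layout pairs disks (0,1), (2,3), and so on. The function names are made up for this example and do not come from vinum, RAIDframe, or any other tool.

```python
from itertools import combinations


def survives(level, failed_pair):
    """Return True if the array keeps all data after both disks in failed_pair die."""
    a, b = failed_pair
    if level == "raid5":
        return False              # single parity tolerates only one failure
    if level == "raid6":
        return True               # P+Q redundancy tolerates any two failures
    if level == "raid10":
        # Assumed layout: disks (0,1), (2,3), ... form the mirror pairs.
        # Data is lost only when both failures hit the same mirror pair.
        return a // 2 != b // 2
    raise ValueError(f"unknown level: {level}")


def two_failure_survival(level, n_disks):
    """Fraction of all possible two-disk failures that the array survives."""
    pairs = list(combinations(range(n_disks), 2))
    return sum(survives(level, p) for p in pairs) / len(pairs)


def usable_fraction(level, n_disks):
    """Fraction of raw capacity left for data (the rest is redundancy)."""
    data_disks = {
        "raid5": n_disks - 1,     # one disk's worth of parity
        "raid6": n_disks - 2,     # two disks' worth of parity (P and Q)
        "raid10": n_disks // 2,   # every block is stored twice
    }
    return data_disks[level] / n_disks


if __name__ == "__main__":
    for n in (4, 8, 16):
        for level in ("raid5", "raid6", "raid10"):
            print(f"{n:2d} disks {level:6s} "
                  f"survives 2 failures: {two_failure_survival(level, n):.2f}  "
                  f"usable: {usable_fraction(level, n):.2f}")
```

With four disks, this model gives RAID 6 the same usable capacity as mirroring, which is one way to read the remark that the overhead is high for small arrays. The model says nothing about rebuild load or correlated failures within a batch, both of which the message above names as real-world factors.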
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Jan 21 19:10: 8 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0A81A37B401 for ; Tue, 21 Jan 2003 19:10:07 -0800 (PST) Received: from mail.synology.com (dns1.synology.com [210.58.106.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 55EFE43ED8 for ; Tue, 21 Jan 2003 19:10:05 -0800 (PST) (envelope-from cheen@synology.com) Received: (from root@localhost) by mail.synology.com (8.12.5/8.12.5) id h0M39n5o089224; Wed, 22 Jan 2003 11:09:49 +0800 (CST) Received: from cheennotebook ([192.168.1.187]) (authenticated bits=0) by mail.synology.com (8.12.5/8.12.5av) with ESMTP id h0M39bOT089204; Wed, 22 Jan 2003 11:09:48 +0800 (CST) Message-ID: <000e01c2c1c3$f7c0b3e0$bb01a8c0@cheennotebook> From: "Cheen Liao" To: "Terry Lambert" , "Dag-Erling Smorgrav" Cc: References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> Subject: Re: Transaction File System - a replacement of JFS Date: Wed, 22 Jan 2003 11:11:31 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 X-Virus-Scanned: by AMaViS perl-11 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Thanks for sharing your experience. This is exactly our concerns. We will not work on 5.0 kernel till its code are "stable". 
Thanks, Cheen ----- Original Message ----- From: "Terry Lambert" To: "Dag-Erling Smorgrav" Cc: "Cheen Liao" ; Sent: Wednesday, January 22, 2003 6:49 AM Subject: Re: Transaction File System - a replacement of JFS > Dag-Erling Smorgrav wrote: > > "Cheen Liao" writes: > > > . develop a prototype on FreeBSD 4.x. > > > > Don't bother. Save yourselves a lot of pain by going directly to 5.0. > > 4.x is a stable sytem, and unlikely to change interfaces out > from under a developer. Not so, 5.x. > > It's much easier to get something working, pick a flag day, and > do a port, than it is to try and track changes (I made this > mistake when John Dyson was revving the VM system, when I did > a FreeBSD port to the Motorolla PPC "PowerStack" systems, back > in 1996). > > >From personal commercial experience, *never* try to track a > moving target, if what you are using the code for is as a > platform for research and/or developement, rather than as an > ends in itself. FreeBSD people seem to forget that the purpose > of most FreeBSD users is not simply "to make FreeBSD better". > > 8-). 
> > -- Terry > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 7:43:31 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3731437B401 for ; Wed, 22 Jan 2003 07:43:26 -0800 (PST) Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15]) by mx1.FreeBSD.org (Postfix) with ESMTP id 41B4743F5F for ; Wed, 22 Jan 2003 07:43:25 -0800 (PST) (envelope-from stephen_byan@maxtor.com) Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1]) by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0MFXGq15755 for ; Wed, 22 Jan 2003 08:33:17 -0700 Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id DH1P6M5X; Wed, 22 Jan 2003 08:43:23 -0700 Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM) id KAA0000028005; Wed, 22 Jan 2003 10:43:08 -0500 (EST) Date: Wed, 22 Jan 2003 10:42:59 -0500 Subject: Re: JFS vs. 
Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v551) From: Steve Byan To: freebsd-fs@FreeBSD.ORG Content-Transfer-Encoding: 7bit In-Reply-To: <3E2DCC0C.FCAB2EFF@mindspring.com> Message-Id: <2F16E2F0-2E20-11D7-962B-00306548867E@maxtor.com> X-Mailer: Apple Mail (2.551) Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tuesday, January 21, 2003, at 05:39 PM, Terry Lambert wrote: > Steve Byan wrote: >> On Monday, January 20, 2003, at 02:43 PM, Julian Elischer wrote: >>> it would be nice if the drive had enough NVram to hold that one >>> trashed >>> block so it could rewrite it on powerup. >> >> If enough customers show up waving dollar bills in their hands ... > > The disk manufacturers have historically not recognized new > markets, until after their competition has already entered them. It's a low margin, high volume, capital-intensive, technically challenging industry. Blowing one product generation means losing money big-time for a year. Adding a little cost that doesn't return value can blow your entire margin on a product. These considerations lead to risk-averse management, and I don't blame them a bit. (well, really, not very much :-) [snip] > There's a great book on this: > > The Innovator's Dilemma > Clayton M. Christensen > HarperBusiness > ISBN: 0-06-662069-4 > > The hard disk industry is one of his three major examples. 8-). > Yeah, it's required reading here :-) > FWIW: the major market you are not seeing here is ATA RAID arrays > that can compete with SCSI RAID arrays from other disk vendors, > where you can leverage the ATA economyies of scale that make SCSI > disks more expensive than ATA disks, in the first place. 
Basically, > the first ATA disk manufacturer to do this will spike much of their > competitions SCSI market,as soon as the software types become aware > of the change (see below). I think the industry is aware of this. However, as you note below, ATA disks are not yet quite up to snuff in this application. Regarding economies of scale and price of SCSI vs ATA, note that while the majority of the price differential is simply economies of scale, most of the remaining price differential between ATA and SCSI disks is due to the performance difference of the mechanics (5400 and 7200 vs 10K and 15K, 1.5 ms short-stroke seek rather than 0.750 ms), rather than the cost of the electronics for the host interface. > > >>> For us the problem is that the drive reports the write as having >>> happenned when it hasn't, so teh filesystem dependencies end up being >>> smashed, because teh filesystem is writing out data in dependency >>> order, >>> but if the data is written in a different order to the drive, >>> the drive can end up being in error in the case of failure. >> >> That's the cost of write-behind caching. SCSI gives you enough control >> to avoid this problem. ATA disks don't, but at least they're >> inexpensive. > > Which is why people call ATA drives "crap", and disk manufacturers > get upset about it: they are competing on size and spindle speed, > and somehow seem to have forgotten one of the purposes of their > products is to _reliably store data_. Seems to me some OS vendors also have forgotten this; one non-Unix file system of considerable popularity uses delayed-writes for all its metadata in order to achieve reasonable speed. As an unfortunate side-effect, chunks of your filesystem might disappear after a power failure. Come to think of it, doesn't Linux ext2fs make the same trade-off? > > The funny thing is that it would cost them nearly nothing, now > that they have tagged command queues for ATA drives, to put this > feature into ATA drives, as well... 
in fact, it may even be no > more than a firmware hack. While I'm not intimately familiar with the ATA firmware, I get push-back when talking with the folks who do the ATA products because they have a small code-base with very scarce CPU cycles and memory, so they're concerned about the resource cost of this extra code-path. (Note that this function affects only writes, which are not part of the tagged command queues in ATA - ATA queuing is only useful for reads; ATA gets write-queuing by delayed-write caching. ATA queuing only allows disconnecting between sending the command and transferring the data; it doesn't allow disconnecting between transferring the data and transferring the status. Hence ATA queuing is useless for writes.) They are also concerned about complexity; ATA product cycles are very short, so there's a desire to keep things simple, to minimize the risk of bugs. > > >> Ick, that could be a big number, maybe a couple of seconds in the very >> worst-case, I dunno for sure. I think you're probably talking a UPS >> rather than a large filter cap in the power supply. I think it's >> technically better to accept that you're not going to get all the data >> on the disk when power fails, and supply a "power fail" signal to the >> drive a few sector-times in advance of the power going out of the >> spec-limits. That way the drive could guarantee that it won't >> partially >> overwrite a sector. > > That's a really annoying point of view. 8-). It's an unfortunate reality of physics :-) > > The problem with this approach is that it requires cable changes > to the drive interface, unless you designate one of the "spare" > grounds as being inverse AC present signal; even so, you would > not be guaranteed that the motherboard/controller manufacturers > have all tied this pin active low in their designs, if it's > truly a "spare". That means the disks would not work with some > motherboards, which is death in a commodity market. 
Dunno about the ATA ASICs, but the SCSI ASICs have some GPIO pins that could profitably be mined to provide this functionality via a firmware change, using one of the pins on the option connector. The OEM would need an extra cable to connect to the drive, but this isn't a big deal unless you're into SCA connectors. Write it into your next purchase spec, wave dollar bills in front of the sales representative, and you could get this function. > I suspect that this is a good reason that, despite the design > being available in your head, no manufacturer has implemented > this, even if there was not computer hardware support for it. Actually, none of the big OEM's are interested, because they'd rather give you atomic writes by selling you a big expensive hardware RAID box. That's why the functionality hasn't been implemented. (joke: How to make big money in storage: 1. put price pressure on drive suppliers so that they are forced to manufacture crap. 2. design expensive storage system to rectify problems caused by step 1. 3. Profit! ) Show up with a reasonably-sized market and a feature request for something that can be implemented in firmware, and you can negotiate to get your feature. > > Basically, this means that we (filesystems engineers) have two > wishlist items for disk manufacturers: > > 1) Add logic to the ATA disks to provide the same control > over the ordering of operations (e.g. barriers and > completion notification) that SCSI disks have (per the > above, this may be nothing more than a firmware hack). As noted above, this is a rather large firmware hack. More like a re-write of significant portions of the code. Portions are not even implementable in ATA-land (i.e. write-queuing is broken in the interface definition). By the time you are done, you have special electronics requirements (more SRAM, faster CPU) that are too expensive to go into commodity drives. 
One could hypothesize a low-volume ATA drive with special electronics, but such low-volume ATA drives probably would cost only slightly less than higher-volume high-performance SCSI drives. Why not just buy the SCSI drive in the first place? > > 2) Provide the ability to obtain physical geometry > information from ATA disks, similar to the information > that is returned in SCSI mode page 2. Write this into your purchase spec and wave dollar bills in front of your sales rep. Such info was available from Quantum ATA disks; it's probably available from Maxtor's, though I don't know for sure. It's solely a firmware change to provide this functionality. > The first can be a "must enable, disabled by default" item, and > the second could be a vendor-private command, which keeps both > of them from being visible to ignorant users of the disks. > > If you want to address throwing a chock in the wheels and/or > dumping the write queue to on-board NVRAM, assuming an inverse > AC fail notification, if it's turned on (off by default to > account for floating cable pins, rather than active low, on > some motherboards, to avoid sabotaging your existing market), > that would be nice too. ;^). How much extra would you pay? Would you buy sole-sourced drives to get these features? These are all do-able. Negotiate with your vendors. Ask to talk to the drive marketing folks, to get your message heard back at the plant. Regards, -Steve (not speaking for his employer) -------- Steve Byan Design Engineer Maxtor Corp. 
MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508) 770-3414 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 10:27:34 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4DCAE37B4E3 for ; Wed, 22 Jan 2003 10:27:32 -0800 (PST) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9AB7543F1E for ; Wed, 22 Jan 2003 10:27:31 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc03.attbi.com (sccrmhc03) with ESMTP id <2003012218273000300074f7e>; Wed, 22 Jan 2003 18:27:30 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA58790; Wed, 22 Jan 2003 10:27:29 -0800 (PST) Date: Wed, 22 Jan 2003 10:27:27 -0800 (PST) From: Julian Elischer To: Steve Byan Cc: freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) In-Reply-To: <2F16E2F0-2E20-11D7-962B-00306548867E@maxtor.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Wed, 22 Jan 2003, Steve Byan wrote: > > On Tuesday, January 21, 2003, at 05:39 PM, Terry Lambert wrote: > > > > If you want to address throwing a chock in the wheels and/or > > dumping the write queue to on-board NVRAM, assuming an inverse > > AC fail notification, if it's turned on (off by default to > > account for floating cable pins, rather than active low, on > > some motherboards, to avoid sabotaging your existing market), > > that would be nice too. ;^). 
If you had a way for the CPU to throw the chock in, that would be cheapest; it would be just firmware. (It may already exist, but it needs to be documented as part of the 'power fail procedure'.) It needs to be able to interrupt whatever is already going on on the drive. Maybe some version of the ATA reset? > > How much extra would you pay? Would you buy sole-sourced drives to get > these features? These are all do-able. Negotiate with your vendors. Ask > to talk to the drive marketing folks, to get your message heard back at > the plant. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 10:47:32 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BA07137B401 for ; Wed, 22 Jan 2003 10:47:30 -0800 (PST) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1EFE243ED8 for ; Wed, 22 Jan 2003 10:47:30 -0800 (PST) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.3/8.12.3) with ESMTP id h0MIlJ6F005454; Wed, 22 Jan 2003 10:47:19 -0800 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.3/8.12.3/Submit) id h0MIlIYP005452; Wed, 22 Jan 2003 10:47:18 -0800 Date: Wed, 22 Jan 2003 10:47:18 -0800 From: Brooks Davis To: Cheen Liao Cc: Terry Lambert , Dag-Erling Smorgrav , freebsd-fs@FreeBSD.ORG Subject: Re: Transaction File System - a replacement of JFS Message-ID: <20030122104718.A23298@Odin.AC.HMC.Edu> References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> <000e01c2c1c3$f7c0b3e0$bb01a8c0@cheennotebook> Mime-Version: 1.0 
Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="d6Gm4EdcadzBjdND" Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <000e01c2c1c3$f7c0b3e0$bb01a8c0@cheennotebook>; from cheen@synology.com on Wed, Jan 22, 2003 at 11:11:31AM +0800 X-Virus-Scanned: by amavisd-milter (http://amavis.org/) on odin.ac.hmc.edu Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --d6Gm4EdcadzBjdND Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 22, 2003 at 11:11:31AM +0800, Cheen Liao wrote: > Thanks for sharing your experience. This is exactly our concerns. We will > not work on 5.0 kernel till its code are "stable". Frankly, if you use 5.0-RELEASE (not HEAD), its ABI/API is as stable as any other fixed point, but unlike 4.x it's _much_ closer to what 5-STABLE will be, so this argument is a bit backwards. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE. 
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --d6Gm4EdcadzBjdND Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+Luc1XY6L6fI4GtQRAmNrAKDetk7fgklTjyU6HUPlmgvrNDMp2QCcDoGS pWhAtfZGhWBMlkyl73Ztg6o= =mv/j -----END PGP SIGNATURE----- --d6Gm4EdcadzBjdND-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 13:19:55 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C961E37B401 for ; Wed, 22 Jan 2003 13:19:53 -0800 (PST) Received: from newsguy.com (smtp.newsguy.com [129.250.170.69]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0182B43F18 for ; Wed, 22 Jan 2003 13:19:53 -0800 (PST) (envelope-from dcs@newsguy.com) Received: from newsguy.com (200-163-023-225.bsace7016.dsl.brasiltelecom.net.br [200.163.23.225]) by newsguy.com (8.9.1a/8.9.1) with ESMTP id NAA20272; Wed, 22 Jan 2003 13:19:27 -0800 (PST) Message-ID: <3E2F0ADB.ACBAA0D6@newsguy.com> Date: Wed, 22 Jan 2003 19:19:23 -0200 From: "Daniel C. 
Sobral" X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en,pt-BR,pt,en-GB,en-US,ja MIME-Version: 1.0 To: Terry Lambert Cc: Dag-Erling Smorgrav , Cheen Liao , freebsd-fs@FreeBSD.ORG Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Terry Lambert wrote: > > Dag-Erling Smorgrav wrote: > > "Cheen Liao" writes: > > > . develop a prototype on FreeBSD 4.x. > > > > Don't bother. Save yourselves a lot of pain by going directly to 5.0. > > 4.x is a stable sytem, and unlikely to change interfaces out > from under a developer. Not so, 5.x. > > It's much easier to get something working, pick a flag day, and > do a port, than it is to try and track changes (I made this > mistake when John Dyson was revving the VM system, when I did > a FreeBSD port to the Motorolla PPC "PowerStack" systems, back > in 1996). > > >From personal commercial experience, *never* try to track a > moving target, if what you are using the code for is as a > platform for research and/or developement, rather than as an > ends in itself. FreeBSD people seem to forget that the purpose > of most FreeBSD users is not simply "to make FreeBSD better". Since the suggestion was to go directly to 5.0, which will be MUCH closer to the interfaces expected to remain stable on 5.x, the above remarks are not particularly relevant, however true they might be. -- Daniel C. Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org capo@professional.bsdconspiracy.net Spellng is overated anywy. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 15: 5: 8 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EBBAB37B401 for ; Wed, 22 Jan 2003 15:05:00 -0800 (PST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 13D1443ED8 for ; Wed, 22 Jan 2003 15:04:55 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0182.cvx40-bradley.dialup.earthlink.net ([216.244.42.182] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18bTvB-0005Vh-00; Wed, 22 Jan 2003 15:04:39 -0800 Message-ID: <3E2F2330.F7A46C6E@mindspring.com> Date: Wed, 22 Jan 2003 15:03:12 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Steve Byan Cc: freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) References: <2F16E2F0-2E20-11D7-962B-00306548867E@maxtor.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4900de4bcde31f93b006524fd51b0d68a387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Steve Byan wrote: > > FWIW: the major market you are not seeing here is ATA RAID arrays > > that can compete with SCSI RAID arrays from other disk vendors, > > where you can leverage the ATA economyies of scale that make SCSI > > disks more expensive than ATA disks, in the first place. 
Basically, > > the first ATA disk manufacturer to do this will spike much of their > > competition's SCSI market, as soon as the software types become aware > > of the change (see below). > > I think the industry is aware of this. However, as you note below, ATA > disks are not yet quite up to snuff in this application. At this point, it is apparently a matter of firmware. Most ATA firmware is flashable. If a manufacturer were to release source code for its firmware, the disk drive manufacturers would see the same sort of things happen for that disk drive that happened for the Broadcom Tigon II product, which is able to sustain over 500,000 packets a second and 250,000 connections a second, if you are willing to rewrite the firmware on the card -- and if the manufacturer lets you do it. I don't expect this to happen, but it would certainly, over time, end up reducing your firmware costs, since you would then find that your hardware designs were constrained "to run the *good* firmware". > Regarding economies of scale and price of SCSI vs ATA, note that while > the majority of the price differential is simply economies of scale, > most of the remaining price differential between ATA and SCSI disks is > due to the performance difference of the mechanics (5400 and 7200 vs > 10K and 15K, 1.5 ms short-stroke seek rather than 0.750 ms), rather > than the cost of the electronics for the host interface. This is the spindle speed competition. I think they've already lost the size one, and both are at parity (in fact, SCSI tends to follow, rather than lead, in size, these days). I have to say that the thing that motivates people to buy SCSI is not the speed of the disk. The ATA burst transfer rates are significantly higher than SCSI these days, and with interleaved commands from the controller possible on both, the SCSI command overhead and latency is becoming significant.
The most meaningful SCSI features are those you have already identified, which all basically boil down to being able to keep the speed without sacrificing reliability, by avoiding the introduction of what would otherwise be stall barriers. For ATA, you effectively have to disable write caching to achieve the same thing. SCSI is at the "acceptable speed" point, and pushing it to "much more than acceptable speed" isn't really useful. An applications engineer can make a decision these days on the speed tradeoff, and ignore the disk manufacturers entirely, if they decide to do so. The applications that needed a lot of disk can live with write through caching with a minimum of barriers, as required. That, in fact, is the basis for soft updates, and it's the basis for the earlier technology of DOW (Delayed Ordered Writes) out of USL/Novell, and used in Reiser FS. The net effect of this is that it's possible to stuff 64G of RAM into the box you care about access speed on, and use hard drives as nothing more than a non-volatile mirror that's much slower. The next meaningful change that's going to come in SCSI is the ability to assert range locks on the device from multiple host masters. When this happens, it will break down one of the main barriers to scaling applications by throwing hardware at them; this is already the case with GFS (in a really primitive way, using a network lock manager, with significantly higher latency than is achievable with hardware support), and is one of the fundamental drivers for network attached storage, and for NFS servers from companies like NetApp and Auspex, for that matter, at this point. Notice that the drive to up-market has passed the "fast enough" point, as noted by Christensen in the book I already referenced, and which you noted was "required reading".
One of the ironic things about "required reading" assignments in industries like the disk industry is that it's very much like the person who believes they can be rich, merely by buying what the rich buy: reading "The Innovator's Dilemma" will not make the industry any less likely to auger-in in an up-market spiral than anything in the past, because the economics of disruptive products remains the same. So the bottom line is that it's possible to add the features you think are not the market drivers to the ATA drives, and if you are right, then they simply will not be used, and the cost will be some time paid to a firmware engineer. If, on the other hand, you're wrong, then you will capture significant market from your competitors. I imagine that your SCSI products division would fight this; I don't know where your margins are now, but I expect that a lot of them are in up-scale SCSI. The thing to do would be to talk to your bean-counters, and do a cost analysis, in the case that your ATA drive marginalized everyone else's SCSI drives -- even your own -- and see where that leaves you. It's possible that you could do this on one disk line, and if it starts selling well, let scarcity drive up the price, after which you can keep the price high, and the firmware difference could give you (and other manufacturers who follow you) the differential margin that you're now getting from SCSI. If this happens, you would be very happy, since you will have reduced your costs while not damaging your profitability, by way of pushing SCSI out (I suspect that what would happen instead is that SCSI would be pushed up-market with the locking, multiple mastering, and other advanced features, which can't be safely duplicated in ATA, for lack of connectors and multihost ATA interconnects). > >> That's the cost of write-behind caching. SCSI gives you enough control > >> to avoid this problem. ATA disks don't, but at least they're > >> inexpensive.
> > > > Which is why people call ATA drives "crap", and disk manufacturers > > get upset about it: they are competing on size and spindle speed, > > and somehow seem to have forgotten one of the purposes of their > > products is to _reliably store data_. > > Seems to me some OS vendors also have forgotten this; one non-Unix file > system of considerable popularity uses delayed-writes for all its > metadata in order to achieve reasonable speed. As an unfortunate > side-effect, chunks of your filesystem might disappear after a power > failure. Come to think of it, doesn't Linux ext2fs make the same > trade-off? Windows didn't "invent" virtual memory and they didn't "invent" protected mode operation, until the market forced it on them, and/or until they were forced to look to the server market for new market, after saturating the desktop. Microsoft is very much an "Innovator's dilemma" company, where you will not see innovation or new technology until it affects their market share to not have it. They are an evolutionary product company, at this point, and they will never deal with an issue until they have to deal with it, because doing otherwise detracts from their bottom line. Note that Windows NT *did* address this with NTFS, and while it took a couple of false starts to get there, NTFS is now the default in Windows XP systems, from the factory. The EXT2FS *does* have the same problem, but realize that this is because there's a speed issue, and that issue comes from outside; I would argue that it has been addressed by faster transfer rates and tagged command queueing. I'd also argue that Linux knows this, and is trying to address it with all the myriad GFS/XFS/EXT3FS/ReiserFS/etc. projects, which all seek to not have those attributes, even if the authors don't seem to know *why* they are pursuing the goal, or if they do, *why* people are finally getting behind the cart and helping them push, when they struggled alone and forsaken for such a long time.
It boils down to "market pressure from Windows XP". > > The funny thing is that it would cost them nearly nothing, now > > that they have tagged command queues for ATA drives, to put this > > feature into ATA drives, as well... in fact, it may even be no > > more than a firmware hack. > > While I'm not intimately familiar with the ATA firmware, I get > push-back when talking with the folks who do the ATA products because > they have a small code-base with very scarce CPU cycles and memory, so > they're concerned about the resource cost of this extra code-path. Worst case, you can make it a feature set (default off) that is followed by a soft reset, to put the drive into the mode. Then the only people who eat the cost are the people who turn it on, in the knowledge that they eat the cost. The funny thing is, this is the same argument you could have used to justify not putting in the knob to turn off write caching -- yet that knob is there. 8-). > (Note that this function affects only writes, which are not part of the > tagged command queues in ATA - ATA queuing is only useful for reads; > ATA gets write-queuing by delayed-write caching. ATA queuing only > allows disconnecting between sending the command and transferring the > data; it doesn't allow disconnecting between transferring the data and > transferring the status. Hence ATA queuing is useless for writes.) They > are also concerned about complexity; ATA product cycles are very short, > so there's a desire to keep things simple, to minimize the risk of bugs. Well, having it "off by default, separate firmware image after soft reset when on" completely addresses these concerns, I think 8-). I was aware of the tagged command queueing/writing issue; it's very unfortunate that those issues aren't corrected, too. 8-). > > That's a really annoying point of view. 8-). > > It's an unfortunate reality of physics :-) You know, I keep bumping my head on physics; we should do something about that, don't you think? 8-) 8-).
> > The problem with this approach is that it requires cable changes > > to the drive interface, unless you designate one of the "spare" > > grounds as being inverse AC present signal; even so, you would > > not be guaranteed that the motherboard/controller manufacturers > > have all tied this pin active low in their designs, if it's > > truly a "spare". That means the disks would not work with some > > motherboards, which is death in a commodity market. > > Dunno about the ATA ASICs, but the SCSI ASICs have some GPIO pins that > could profitably be mined to provide this functionality via a firmware > change, using one of the pins on the option connector. The OEM would > need an extra cable to connect to the drive, but this isn't a big deal > unless you're into SCA connectors. Write it into your next purchase > spec, wave dollar bills in front of the sales representative, and you > could get this function. For cheap devices, I'm not allowed to spec SCSI. 8-(. My own opinion here is that the companies I did the work for didn't really have an expectation of selling 100,000 units, despite their claims in the company meetings, and so single unit cost at the expense of repeat sales was an acceptable tradeoff for them. 8-( 8-(. > Actually, none of the big OEM's are interested, because they'd rather > give you atomic writes by selling you a big expensive hardware RAID > box. That's why the functionality hasn't been implemented. > > (joke: How to make big money in storage: > 1. put price pressure on drive suppliers so that they are forced to > manufacture crap. > 2. design expensive storage system to rectify problems caused by step 1. > 3. Profit! ) Cynical, cynical... > Show up with a reasonably-sized market and a feature request for > something that can be implemented in firmware, and you can negotiate to > get your feature. Anything short of building a multimillion dollar company that I can do?
It's not that I'm averse to that, you understand, it's just that I'd have to delay my gratification about 3 or 4 years... > > Basically, this means that we (filesystems engineers) have two > > wishlist items for disk manufacturers: > > > > 1) Add logic to the ATA disks to provide the same control > > over the ordering of operations (e.g. barriers and > > completion notification) that SCSI disks have (per the > > above, this may be nothing more than a firmware hack). > > As noted above, this is a rather large firmware hack. More like a > re-write of significant portions of the code. Portions are not even > implementable in ATA-land (i.e. write-queuing is broken in the > interface definition). By the time you are done, you have special > electronics requirements (more SRAM, faster CPU) that are too expensive > to go into commodity drives. One could hypothesize a low-volume ATA > drive with special electronics, but such low-volume ATA drives probably > would cost only slightly less than higher-volume high-performance SCSI > drives. Why not just buy the SCSI drive in the first place? The main answer here? I want a minimum level of functionality assurance from all disk drives, not just SCSI, so I can design software systems that don't have to care about the disks that someone slots into the chassis. It reduces overall software complexity to do this. Remember back when IDE drives could not do DMA, and had to use the host CPU for data transfers? That was Bad(tm). I'd like to get away from similar problems, now that that one has been solved. > > 2) Provide the ability to obtain physical geometry > > information from ATA disks, similar to the information > > that is returned in SCSI mode page 2. > > Write this into your purchase spec and wave dollar bills in front of > your sales rep. Such info was available from Quantum ATA disks; it's > probably available from Maxtor's, though I don't know for sure.
It's > solely a firmware change to provide this functionality. Yep; I *knew* that an ATA manufacturer had supported it, but I couldn't point at the one, so I stayed away from that earlier when someone claimed ATA did not support it. Thanks for the ammo. 8-) 8-). > > The first can be a "must enable, disabled by default" item, and > > the second could be a vendor-private command, which keeps both > > of them from being visible to ignorant users of the disks. > > > > If you want to address throwing a chock in the wheels and/or > > dumping the write queue to on-board NVRAM, assuming an inverse > > AC fail notification, if it's turned on (off by default to > > account for floating cable pins, rather than active low, on > > some motherboards, to avoid sabotaging your existing market), > > that would be nice too. ;^). > > How much extra would you pay? Would you buy sole-sourced drives to get > these features? These are all do-able. Negotiate with your vendors. Ask > to talk to the drive marketing folks, to get your message heard back at > the plant. We *would* have paid this at Whistle, the chock-in-the-wheels, to avoid having an overly complex power supply turn into a standard supply, a triac, a cap, two regulators, an op-amp, and an optoisolator. 8-). Would have saved us maybe $35-$50 on COGS. I'll see what I can do about finding/creating a similar situation in the future, and using that to leverage the change, via purchases. What are the chances that, once it's written, this stuff will go into the standard production models?
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 15:32:22 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0F8B537B401 for ; Wed, 22 Jan 2003 15:32:20 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B64643E4A for ; Wed, 22 Jan 2003 15:32:19 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0182.cvx40-bradley.dialup.earthlink.net ([216.244.42.182] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18bULg-00042T-00; Wed, 22 Jan 2003 15:32:01 -0800 Message-ID: <3E2F299B.A919A49F@mindspring.com> Date: Wed, 22 Jan 2003 15:30:35 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Brooks Davis Cc: Cheen Liao , Dag-Erling Smorgrav , freebsd-fs@FreeBSD.ORG Subject: %.9 or 4.x for developement (was Re: Transaction File System - a replacement of JFS) References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> <000e01c2c1c3$f7c0b3e0$bb01a8c0@cheennotebook> <20030122104718.A23298@Odin.AC.HMC.Edu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a9a86572c270ea46156f9f5f1ee90704a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Brooks Davis wrote: > On Wed, Jan 22, 2003 at 11:11:31AM +0800, Cheen 
Liao wrote: > > Thanks for sharing your experience. This is exactly our concerns. We will > > not work on 5.0 kernel till its code are "stable". > > Frankly, if you use 5.0-RELEASE (not HEAD) its ABI/API is as stable as > any other fixed point, but unlike using 4.x it's _much_ closer to what > 5-STABLE will be so this argument is a bit backwards. I personally expect "the right method to use for locking" to change significantly, over time, as people realize that some of the locking code in there now is bogus. That will have a significant impact on the kernel APIs. I also think that a lot of the work to "make things work", without changing interface relationships is going to prove to be futile in the long run, and *that* will have a significant impact on the kernel APIs. As one example, ask yourself these questions about the recent (and ongoing) M_NOWAIT discussion: o Why can't all callers wait for the operation to be completed? o Why can't all callers handle an allocation failure gracefully, without blocking in the allocation routine? The answer to these questions is that some code has the ability to unwind state at the layer at which the allocation is attempted, while other code has state spread across several abstraction layers, and so can't successfully unwind it (e.g. hold a lock, call a function, hold another lock...). Such code is legacy code. I expect that FreeBSD will eventually have to do what Solaris, SVR4, AIX, and, now, Linux, have done, and simply bite the bullet, and change the abstraction layering so that all resource references get held at the same stack level. Then the answer to the second question becomes "They can.", and the problem goes away because the idea of the flag to signal variant behaviour between code that can and code that can't unwind state, goes away. But what this boils down to is that interfaces are in serious flux in 5.x, and are likely to remain so for a very long time.
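(Illustration only, not part of the original posting.) The "all resource references get held at the same stack level" idea can be sketched in user-space C. Everything below is hypothetical: try_alloc() merely stands in for a non-blocking, M_NOWAIT-style allocation that is allowed to fail, and struct conn, conn_create() and the counters are invented for the example. Because every acquisition happens in one function, a failure at any step can be unwound locally and simply reported to the caller, with no need to block.

```c
#include <stdlib.h>

/* Hypothetical failure-injecting allocator: each call consumes one unit of
 * budget and returns NULL once the budget is exhausted, the way a
 * non-blocking allocation may fail under memory pressure. */
static int alloc_budget = 0;   /* allocations still allowed to succeed */
static int live_allocs = 0;    /* allocations currently outstanding    */

static void *try_alloc(size_t n)
{
    void *p;

    if (alloc_budget <= 0)
        return NULL;
    p = malloc(n);
    if (p != NULL) {
        alloc_budget--;
        live_allocs++;
    }
    return p;
}

static void release(void *p)
{
    free(p);
    live_allocs--;
}

/* Hypothetical object that needs three separate allocations. */
struct conn {
    char *rxbuf;
    char *txbuf;
};

/* All resources are acquired at this one stack level, so any failure is
 * unwound right here, in reverse order of acquisition. */
static struct conn *conn_create(void)
{
    struct conn *c = try_alloc(sizeof(*c));

    if (c == NULL)
        goto fail;
    c->rxbuf = try_alloc(256);
    if (c->rxbuf == NULL)
        goto fail_conn;
    c->txbuf = try_alloc(256);
    if (c->txbuf == NULL)
        goto fail_rx;
    return c;

fail_rx:
    release(c->rxbuf);
fail_conn:
    release(c);
fail:
    return NULL;
}

static void conn_destroy(struct conn *c)
{
    release(c->txbuf);
    release(c->rxbuf);
    release(c);
}
```

With a budget of two allocations, conn_create() fails on the third request, releases the first two, and returns NULL with nothing left outstanding. Code whose state is spread across several call layers has no single place to do this, which is the situation the two questions above are about.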
It's a very bad idea, from the standpoint of ever getting done with something, to put yourself in a position of not knowing if it's your driver that's causing the problem, or if it's a problem that someone introduced into the host OS, and you're among the first people to notice it. It makes it damn hard to know where to go to look for a bug you are experiencing. This is why most people who are doing research in academic or industrial settings pick a stable platform, and do their research there. The sole exception is if the platform *is* your research, which happens only rarely with established platforms in academia, and almost never, in industry. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Jan 22 15:39:49 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 305C837B401 for ; Wed, 22 Jan 2003 15:39:48 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9827543F13 for ; Wed, 22 Jan 2003 15:39:47 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0182.cvx40-bradley.dialup.earthlink.net ([216.244.42.182] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18bUT7-0005Al-00; Wed, 22 Jan 2003 15:39:42 -0800 Message-ID: <3E2F2B64.A3F382AB@mindspring.com> Date: Wed, 22 Jan 2003 15:38:12 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Daniel C.
Sobral" Cc: Dag-Erling Smorgrav , Cheen Liao , freebsd-fs@FreeBSD.ORG Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> <3E2F0ADB.ACBAA0D6@newsguy.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a9a86572c270ea46ed416e057fb22e2b667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org "Daniel C. Sobral" wrote: > Since the suggestion was to go directly to 5.0, which will be MUCH > closer to the interfaces expected to remain stable on 5.x, the above > remarks are not particularly relevant, however true they might be. See other posting. The 5.x kernel API is still in great flux. At least with 4.x, there is a market for the product for a year or more, which gives some breathing room for a port. Given the developement effort involved, it's not worth having to always look over your shoulder to see if there's a 5.1. When 4.8 comes out, a 4.7->4.8 port will be trivial. When 5.1 comes out, a 5.0->5.1 port is going to be a living hell. Mark my words. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Jan 23 0:31:22 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E2CAF37B401 for ; Thu, 23 Jan 2003 00:31:20 -0800 (PST) Received: from sydney.worldwide.lemis.com (snat-2.public.linux.conf.au [130.95.169.98]) by mx1.FreeBSD.org (Postfix) with ESMTP id 884C643EB2 for ; Thu, 23 Jan 2003 00:31:19 -0800 (PST) (envelope-from grog@sydney.worldwide.lemis.com) Received: from sydney.worldwide.lemis.com (grog@localhost [127.0.0.1]) by sydney.worldwide.lemis.com (8.12.6/8.12.6) with ESMTP id h0N8VH99001010; Thu, 23 Jan 2003 16:31:17 +0800 (WST) (envelope-from grog@sydney.worldwide.lemis.com) Received: (from grog@localhost) by sydney.worldwide.lemis.com (8.12.6/8.12.6/Submit) id h0N8VF2A001009; Thu, 23 Jan 2003 16:31:15 +0800 (WST) Date: Thu, 23 Jan 2003 16:31:14 +0800 From: Greg Lehey To: Dag-Erling Smorgrav Cc: fs@freebsd.org Subject: Vinum stability (was: RAID alternatives) Message-ID: <20030123083114.GB899@sydney.worldwide.lemis.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Organization: The FreeBSD Project Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.FreeBSD.org/ X-PGP-Fingerprint: 9A1B 8202 BCCE B846 F92F 09AC 22E6 F290 507A 4223 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tuesday, 21 January 2003 at 16:15:06 +0100, Dag-Erling Smorgrav wrote: > I'm planning a new system to replace my aging 350 MHz K6-2, and am > considering various options for increasing disk performance and > reliability. 
I'm thinking of running RAID level 5 across three or > four identical IDE disks. The question is what RAID solution to pick: > > ... > > - Vinum: I've had mixed experiences with this. There have been > some embarassing bugs, particularly in the recovery code, and it > has had a tendency to crash the system. Has it improved with > age? I suppose that depends on the bugs you've seen. I don't know of anything serious still left in there. There are some issues where incorrect configuration requests are not adequately parsed, and can lead to things like incorrect counts of objects or free space. Under certain circumstances multiple starts can also cause problems. I'm planning to look at least at the second issue Real Soon Now. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Jan 23 1:38:13 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8B0B337B405 for ; Thu, 23 Jan 2003 01:38:09 -0800 (PST) Received: from newsguy.com (smtp.newsguy.com [129.250.170.69]) by mx1.FreeBSD.org (Postfix) with ESMTP id A507D43F3F for ; Thu, 23 Jan 2003 01:38:08 -0800 (PST) (envelope-from dcs@newsguy.com) Received: from newsguy.com (200-163-023-225.bsace7016.dsl.brasiltelecom.net.br [200.163.23.225]) by newsguy.com (8.9.1a/8.9.1) with ESMTP id BAA59840; Thu, 23 Jan 2003 01:37:48 -0800 (PST) Message-ID: <3E2FB7EC.A7C21C29@newsguy.com> Date: Thu, 23 Jan 2003 07:37:48 -0200 From: "Daniel C. 
Sobral" X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en,pt-BR,pt,en-GB,en-US,ja MIME-Version: 1.0 To: Terry Lambert Cc: Dag-Erling Smorgrav , Cheen Liao , freebsd-fs@FreeBSD.ORG Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> <3E2F0ADB.ACBAA0D6@newsguy.com> <3E2F2B64.A3F382AB@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Terry Lambert wrote: > > "Daniel C. Sobral" wrote: > > Since the suggestion was to go directly to 5.0, which will be MUCH > > closer to the interfaces expected to remain stable on 5.x, the above > > remarks are not particularly relevant, however true they might be. > > See other posting. The 5.x kernel API is still in great flux. > At least with 4.x, there is a market for the product for a year > or more, which gives some breathing room for a port. Given > the developement effort involved, it's not worth having to always > look over your shoulder to see if there's a 5.1. When 4.8 comes > out, a 4.7->4.8 port will be trivial. When 5.1 comes out, a > 5.0->5.1 port is going to be a living hell. Mark my words. And, in a year, they'll have to do 4.8->5.1 or 4.8->5.2, which will be MUCH more of a hell than 5.0->5.1 or 5.2. The 5.x branch _will_ become stable. Anything not fitting the timeline will be bumped to 6.x. The question is... will their product be finished in three or four months, for the remaining life of 4.x to be of any use? -- Daniel C. 
Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org capo@professional.bsdconspiracy.net Spellng is overated anywy. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Jan 23 3:33:53 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B966C37B401 for ; Thu, 23 Jan 2003 03:33:51 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0406C43F6B for ; Thu, 23 Jan 2003 03:33:51 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0347.cvx21-bradley.dialup.earthlink.net ([209.179.193.92] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18bfbv-0006ju-00; Thu, 23 Jan 2003 03:33:32 -0800 Message-ID: <3E2FCC74.9485901C@mindspring.com> Date: Thu, 23 Jan 2003 03:05:24 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Daniel C. 
Sobral" Cc: Dag-Erling Smorgrav , Cheen Liao , freebsd-fs@FreeBSD.ORG Subject: Re: Transaction File System - a replacement of JFS References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> <3E2DCE86.4C416E28@mindspring.com> <3E2F0ADB.ACBAA0D6@newsguy.com> <3E2F2B64.A3F382AB@mindspring.com> <3E2FB7EC.A7C21C29@newsguy.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a5e9f1aa64c50cd1e2f3ffe720bcddb6548b785378294e88350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org "Daniel C. Sobral" wrote: > And, in a year, they'll have to do 4.8->5.1 or 4.8->5.2, which will be > MUCH more of a hell than 5.0->5.1 or 5.2. But MUCH less hell than porting 4.7->5.0, then 5.0->5.1, then 5.1->5.2... > The 5.x branch _will_ become stable. Anything not fitting the timeline > will be bumped to 6.x. Be sure and announce it, so that people targeting 4.x can consider starting their porting efforts. 8-). > The question is... will their product be finished in three or four > months, for the remaining life of 4.x to be of any use? People asked that same question in 2001, when 5.x was supposed to be imminent, too. 5.x releases, if they come with any speed at all, will be bug fixes, and features that weren't permitted in 5.0 because waiting almost two years for 5.0 would have been way too long. The answer is that 4.x, as a development platform, is a better choice, even if you plan on shipping on some other platform.
I did tons of development on FreeBSD that shipped on other OS's; the approach worked because I got my code stable, without having to really give a damn about the underlying platform past the point of "yeah, it's there, let me concentrate on my product". Look, I understand why the project wants vendors to target 5.x; the project needs to understand why vendors want to target 4.x. Personally, I would not bet my job on meeting a schedule for a product that was intended to ship on 5.x, if the product was intended to be released in a year or less, at this point... would you? I'll stop now, so that this doesn't turn into a "what's wrong with 5.x" diatribe. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Jan 23 13:52:44 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 643F137B401; Thu, 23 Jan 2003 13:52:43 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id 63E3843F3F; Thu, 23 Jan 2003 13:52:42 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id 8B41F536F; Thu, 23 Jan 2003 22:52:40 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated.
To: Greg Lehey Cc: fs@freebsd.org Subject: Re: Vinum stability (was: RAID alternatives) References: <20030123083114.GB899@sydney.worldwide.lemis.com> From: Dag-Erling Smorgrav Date: Thu, 23 Jan 2003 22:52:40 +0100 In-Reply-To: <20030123083114.GB899@sydney.worldwide.lemis.com> (Greg Lehey's message of "Thu, 23 Jan 2003 16:31:14 +0800") Message-ID: Lines: 22 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2 (i386--freebsd) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Greg Lehey writes: > I suppose that depends on the bugs you've seen. I don't know of > anything serious still left in there. There are some issues where > incorrect configuration requests are not adequately parsed, and can > lead to things like incorrect counts of objects or free space. Under > certain circumstances multiple starts can also cause problems. I'm > planning to look at least at the second issue Real Soon Now. I don't remember having had trouble once vinum was actually configured and running (except for the one bug in the recovery code which I fixed way back when, two years ago?), but I've run into a lot of trouble getting it configured. It may have been a combination of pilot error with the "multiple starts" problem you mention. I haven't touched vinum in quite a while, and I'm perfectly willing to believe that it's gotten better since then. Have all the devfs- and geom-related issues been ironed out btw? 
(except for the growfs thing, which isn't really a vinum problem) DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Jan 25 0:12:45 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F241B37B401; Sat, 25 Jan 2003 00:12:43 -0800 (PST) Received: from obsecurity.dyndns.org (adsl-64-169-104-205.dsl.lsan03.pacbell.net [64.169.104.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id EC55443E4A; Sat, 25 Jan 2003 00:12:34 -0800 (PST) (envelope-from kris@obsecurity.org) Received: from rot13.obsecurity.org (rot13.obsecurity.org [10.0.0.5]) by obsecurity.dyndns.org (Postfix) with ESMTP id 52F5966CFB; Sat, 25 Jan 2003 00:12:34 -0800 (PST) Received: by rot13.obsecurity.org (Postfix, from userid 1000) id 1F333170F; Sat, 25 Jan 2003 00:12:34 -0800 (PST) Date: Sat, 25 Jan 2003 00:12:34 -0800 From: Kris Kennaway To: current@FreeBSD.org, fs@FreeBSD.org Subject: INVARIANTS-related fs panic on alpha Message-ID: <20030125081234.GA11722@rot13.obsecurity.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="0F1p//8PRICkK4MW" Content-Disposition: inline User-Agent: Mutt/1.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --0F1p//8PRICkK4MW Content-Type: text/plain; charset=us-ascii Content-Disposition: inline One of the alpha package clients panicked with this. 
It was under very high load at the time (25 simultaneous package builds): fatal kernel trap: trap entry = 0x2 (memory management fault) faulting va = 0xdeadc0dedeadc0e6 type = access violation cause = store instruction pc = 0xfffffc000053453c ra = 0xfffffc000053b2a8 sp = 0xfffffe001da15b30 curthread = 0xfffffc003e33b930 pid = 3, comm = g_up Stopped at add_to_worklist+0xac: stq a0,0x8(t0) <0xdeadc0dedeadc0e6> db> trace add_to_worklist() at add_to_worklist+0xac handle_written_inodeblock() at handle_written_inodeblock+0x5e8 softdep_disk_write_complete() at softdep_disk_write_complete+0xac bufdone() at bufdone+0x19c bufdonebio() at bufdonebio+0x1c biodone() at biodone+0x28 g_dev_done() at g_dev_done+0xd8 biodone() at biodone+0x28 g_io_schedule_up() at g_io_schedule_up+0x4c g_up_procbody() at g_up_procbody+0x9c fork_exit() at fork_exit+0x100 exception_return() at exception_return --- root of call graph --- db> --0F1p//8PRICkK4MW Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+MkbxWry0BWjoQKURAmHCAJ4ztPgyniQSbIGk/Eh7WJufN6MzzwCghXil hSqnjctQo/Lt80doeEOZAA8= =npeP -----END PGP SIGNATURE----- --0F1p//8PRICkK4MW-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Jan 25 18:40:44 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CA8B037B405 for ; Sat, 25 Jan 2003 18:40:43 -0800 (PST) Received: from wabakimi.chat.carleton.ca (wabakimi.chat.carleton.ca [134.117.1.98]) by mx1.FreeBSD.org (Postfix) with ESMTP id E6F1343F13 for ; Sat, 25 Jan 2003 18:40:41 -0800 (PST) (envelope-from creyenga@connectmail.carleton.ca) Received: from fireball (resnet-89-057.cavern.carleton.ca [134.117.89.57]) by wabakimi.chat.carleton.ca (8.11.1/8.11.1) with SMTP id h0Q2eZk05826 for ; Sat, 25 Jan 2003 21:40:35 -0500 (EST) 
Message-ID: <001101c2c4e4$51686960$0200000a@sewer.org> From: "Craig Reyenga" To: Subject: What about a case insensitive Filesystem? Date: Sat, 25 Jan 2003 21:40:41 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Is there any way, either now or in the future, for FreeBSD to be able to have a UFS-based case-insensitive filesystem? It would be great for many applications, such as Samba servers, web servers catered to the general public (angelfire, geocities) and places where the user just doesn't care. Is this at all possible? (I'm not on the list, so CC'ing would be great) -Craig To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Jan 25 20:13:30 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89AC037B401 for ; Sat, 25 Jan 2003 20:13:29 -0800 (PST) Received: from sydney.lemis.com (dhcp80.trinity.linux.conf.au [130.95.169.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0C13D43E4A for ; Sat, 25 Jan 2003 20:13:28 -0800 (PST) (envelope-from grog@sydney.worldwide.lemis.com) Received: from sydney.worldwide.lemis.com (grog@localhost [127.0.0.1]) by sydney.lemis.com (8.12.6/8.12.6) with ESMTP id h0Q4Cx4c004341; Sun, 26 Jan 2003 12:12:59 +0800 (WST) (envelope-from grog@sydney.worldwide.lemis.com) Received: (from grog@localhost) by sydney.worldwide.lemis.com (8.12.6/8.12.6/Submit) id h0Q4CwXA004340; Sun, 26 Jan 2003 12:12:58 +0800 (WST) Date: Sun, 26 Jan 2003 12:12:58 +0800 From: Greg Lehey To: Craig Reyenga Cc: 
freebsd-fs@freebsd.org Subject: Re: What about a case insensitive Filesystem? Message-ID: <20030126041258.GD3818@sydney.worldwide.lemis.com> References: <001101c2c4e4$51686960$0200000a@sewer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <001101c2c4e4$51686960$0200000a@sewer.org> User-Agent: Mutt/1.4i Organization: The FreeBSD Project Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.FreeBSD.org/ X-PGP-Fingerprint: 9A1B 8202 BCCE B846 F92F 09AC 22E6 F290 507A 4223 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Saturday, 25 January 2003 at 21:40:41 -0500, Craig Reyenga wrote: > > Is there any way, either now or in the future, for FreeBSD to be able to > have a UFS-based case-insensitive filesystem? It would be great for many > applications, such as Samba servers, web servers catered to the general > public (angelfire, geocities) > Is this at all possible? > I'm sure there are ways. I'm not sure that anybody wants to do it. Firstly, what do you mean by "case"? Many languages don't have case. German has one letter (ß) which only exists in lower case. Other ISO 8859-1 languages have different letters and thus different case pairs. Some Microsoft casing conventions are so baroque that it depends on how you access them as to how they get upper-cased. It's a can of worms. > and places where the user just doesn't care. If the user doesn't care, there's no problem with a case-sensitive file system. 
Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Jan 25 20:14:38 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3CCC637B401 for ; Sat, 25 Jan 2003 20:14:37 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id EACE843ED8 for ; Sat, 25 Jan 2003 20:14:36 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id AB1B9AE2AE; Sat, 25 Jan 2003 20:14:31 -0800 (PST) Date: Sat, 25 Jan 2003 20:14:31 -0800 From: Alfred Perlstein To: Craig Reyenga Cc: freebsd-fs@freebsd.org Subject: Re: What about a case insensitive Filesystem? Message-ID: <20030126041431.GE85104@elvis.mu.org> References: <001101c2c4e4$51686960$0200000a@sewer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001101c2c4e4$51686960$0200000a@sewer.org> User-Agent: Mutt/1.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * Craig Reyenga [030125 18:40] wrote: > > Is there any way, either now or in the future, for FreeBSD to be able to > have a UFS-based case-insensitive filesystem? It would be great for many > applications, such as Samba servers, web servers catered to the general > public (angelfire, geocities) and places where the user just doesn't care. > Is this at all possible? > > > (I'm not on the list, so CC'ing would be great) It should not be very hard to do; the only issue might be some interoperability with directory hashing, but that can probably be fixed by just lower/uppercasing the names before building the hash and doing the lookup. 
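To make the "fold the case before hashing and before lookup" idea concrete, here is a rough sketch in C. It is not the UFS directory hashing code. The names ci_fold, ci_name_hash and ci_name_equal are hypothetical, and the FNV-1a hash is used here only for illustration. The big assumption is that only the ASCII letters A-Z are folded, which sidesteps (rather than solves) the locale questions raised elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte to lower case. Assumption: ASCII letters only. */
unsigned char
ci_fold(unsigned char c)
{
	return ((c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c);
}

/*
 * Hypothetical helper: hash a name after folding each byte, so that
 * names differing only in ASCII case land in the same hash slot.
 * Uses 32-bit FNV-1a purely as an example hash.
 */
uint32_t
ci_name_hash(const char *name, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= ci_fold((unsigned char)name[i]);
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return (h);
}

/*
 * Hypothetical helper: the comparison done at lookup time must use
 * the same folding as the hash, or entries become unreachable.
 * Returns 1 if the names match ignoring ASCII case, 0 otherwise.
 */
int
ci_name_equal(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return (0);
	for (i = 0; i < alen; i++) {
		if (ci_fold((unsigned char)a[i]) !=
		    ci_fold((unsigned char)b[i]))
			return (0);
	}
	return (1);
}
```

With this, "README" and "readme" hash to the same value and compare equal, while the name stored in the directory entry can keep whatever case it was created with. Bytes outside A-Z are left alone, so anything beyond plain ASCII would need a real case-folding policy on top of this.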
-- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message