From owner-freebsd-current Fri May 10 02:55:31 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id CAA06973
          for current-outgoing; Fri, 10 May 1996 02:55:31 -0700 (PDT)
Received: from silvia.HIP.Berkeley.EDU (silvia.HIP.Berkeley.EDU [136.152.64.181])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id CAA06963
          for ; Fri, 10 May 1996 02:55:24 -0700 (PDT)
Received: (from asami@localhost)
          by silvia.HIP.Berkeley.EDU (8.7.5/8.6.9) id CAA00908;
          Fri, 10 May 1996 02:54:32 -0700 (PDT)
Date: Fri, 10 May 1996 02:54:32 -0700 (PDT)
Message-Id: <199605100954.CAA00908@silvia.HIP.Berkeley.EDU>
To: jgreco@brasil.moneng.mei.com
CC: bde@zeta.org.au, current@freebsd.org, nisha@cs.berkeley.edu
In-reply-to: <199605091422.JAA00522@brasil.moneng.mei.com>
             (message from Joe Greco on Thu, 9 May 1996 09:22:11 -0500 (CDT))
Subject: Re: more than 32 scsi disks on a single machine ?
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Bruce Evans:
 * > This limits us to 16777216 disks, or only 8388606 disks if we avoid using
 * > the high bit to avoid sign extension bugs.  The limit is in dkunit() in
 * >
 * > There are lots of things to change.  The encoding would have to be really
 * > ugly to preserve compatibility with existing device nodes.

Joe Greco:
 * Hmm.
 *
 * What if we don't give a damn about existing device nodes?  :-)  "Take the
 * plunge".

Well, we may try that too, I guess, but I didn't want to spend hours
trying to figure out the right mix in the /dev directory (oh devfs,
weren't you supposed to get rid of the major/minor numbers?), so I
went in with the quick and dirty hack.

===
Index: sys/sys/disklabel.h
===================================================================
RCS file: /usr/cvs/src/sys/sys/disklabel.h,v
retrieving revision 1.21
diff -u -r1.21 disklabel.h
--- 1.21	1996/05/03 05:38:34
+++ disklabel.h	1996/05/10 00:12:30
@@ -374,17 +374,19 @@
 	-----------------------------------------------------------------
 	|      TYPE     |PART2|  SLICE  |  MAJOR?       |  UNIT   |PART |	<-soon
 	-----------------------------------------------------------------
+	|  TYPE | UNIT2 |PART2|  SLICE  |  MAJOR?       |  UNIT   |PART |	<-new
+	-----------------------------------------------------------------
 
 	I want 3 more part bits (taken from 'TYPE' (useless as it is) (JRE)
 */
 #define	dkmakeminor(unit, slice, part) \
-				(((slice) << 16) | ((unit) << 3) | (part))
+				(((slice) << 16) | (((unit) & 0x1f) << 3) | (((unit) & 0x1e0) << 19) | (part))
 #define	dkmodpart(dev, part)	(((dev) & ~(dev_t)7) | (part))
 #define	dkmodslice(dev, slice)	(((dev) & ~(dev_t)0x1f0000) | ((slice) << 16))
 #define	dkpart(dev)		(minor(dev) & 7)
 #define	dkslice(dev)		((minor(dev) >> 16) & 0x1f)
 #define	dktype(dev)		((minor(dev) >> 21) & 0x7ff)
-#define	dkunit(dev)		((minor(dev) >> 3) & 0x1f)
+#define	dkunit(dev)		(((minor(dev) >> 3) & 0x1f) | ((minor(dev) >> 19) & 0x1e0))
 
 #ifdef KERNEL
 /*
Index: sys/etc/etc.i386/MAKEDEV
===================================================================
RCS file: /usr/cvs/src/etc/etc.i386/MAKEDEV,v
retrieving revision 1.118
diff -u -r1.118 MAKEDEV
--- 1.118	1996/05/03 05:37:34
+++ MAKEDEV	1996/05/10 04:47:22
@@ -131,7 +131,7 @@
 # Convert disk (type, unit, slice, partition) to minor number
 dkminor()
 {
-	echo $((32 * 65536 * $1 + 8 * $2 + 65536 * $3 + $4))
+	echo $((32 * 65536 * $1 + 8 * ($2 % 32) + 256 * 65536 * ($2 / 32) + 65536 * $3 + $4))
 }
 
 # Convert the last character of a tty name to a minor number.
@@ -254,7 +254,7 @@
 	slice=`expr $i : '..[0-9]*s\([0-9]*\)'`
 	part=`expr $i : '..[0-9]*s[0-9]*\(.*\)'`
 	case $unit in
-	[0-9]|[1-2][0-9]|30|31)
+	[0-9]|[1-9][0-9]|[1-4][0-9][0-9]|50[0-9]|510|511)
 		case $slice in
 		[0-9]|[1-2][0-9]|30)
 			oldslice=$slice
@@ -414,15 +414,16 @@
 	esac
 	unit=`expr $i : '..\(.*\)'`
 	case $unit in
-	[0-9]|[1-2][0-9]|30|31)
+	[0-9]|[1-9][0-9]|[1-4][0-9][0-9]|50[0-9]|510|511)
 		for slicepartname in s0h s1 s2 s3 s4
 		do
 			sh MAKEDEV $name$unit$slicepartname
 		done
+		nunit=$((($unit / 32) * 32 * 65536 + ($unit % 32)))
 		case $name in
 		od|sd)
 			rm -f r${name}${unit}.ctl
-			mknod r${name}${unit}.ctl c $chr `expr $unit '*' 8 + $scsictl `
+			mknod r${name}${unit}.ctl c $chr `expr $nunit '*' 8 + $scsictl `
 			chmod 600 r${name}${unit}.ctl
 			;;
 		esac
===

I grabbed 4 bits from the ever-shrinking "type" field (did Julian
actually grab the "part2"?  the commit for "reservation" was on 04/95)
and changed the macros to reflect that.  Then I compiled the kernel,
rebooted the machine, modified MAKEDEV to handle the >32 disk case (up
to 512 now, since we have 9 bits) and poof!  It worked!  I now have a
160GB (40 x 4GB) filesystem! :)

However, further review showed that only 256 of them are actually
usable.  With the first disk (bus 0, target 0) wired down to sd500,
what I got was:

===
# This listing automatically generated by lsdev(1)
 :
ahc0	at pci2:4 # int a irq 9
sd244	at SCSI bus 0:0:0
 :
sd255	at SCSI bus 0:15:0
ahc1	at pci2:5 # int a irq 10
sd0	at SCSI bus 1:0:0
 :
sd7	at SCSI bus 1:7:0
ahc2	at pci1:4 # int a irq 10
sd8	at SCSI bus 2:4:0
 :
sd27	at SCSI bus 3:7:0
===

Note that 244 = 500 (mod 256).  It seems like it's wrapping around at
256, which probably means the unit number is being kept in an 8-bit
quantity (an unsigned char, say) somewhere in the sd (or generic disk)
code.

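To make the arithmetic above easier to check, here is a minimal sh sketch of the new encoding.  dkminor() is the patched function from the MAKEDEV diff.  dkunit() and unit8() are hypothetical helpers written only for this illustration: dkunit() repeats the arithmetic of the patched dkunit() macro in disklabel.h, and unit8() assumes the unit number is being stored in 8 bits somewhere, which is a guess at the cause of the wrap and not something found in the driver source.

```shell
#!/bin/sh
# Patched dkminor() from the MAKEDEV diff:
# (type, unit, slice, partition) -> minor number.
dkminor()
{
	echo $((32 * 65536 * $1 + 8 * ($2 % 32) + 256 * 65536 * ($2 / 32) + 65536 * $3 + $4))
}

# Hypothetical helper (not in MAKEDEV): same arithmetic as the patched
# dkunit() macro.  Low 5 unit bits sit at bit 3, high 4 unit bits at bit 24.
dkunit()
{
	echo $(( (($1 >> 3) & 0x1f) | (($1 >> 19) & 0x1e0) ))
}

# Hypothetical helper: what is left of a unit number if it is stored
# in an 8-bit quantity (an assumption about where the wrap comes from).
unit8()
{
	echo $(($1 & 0xff))
}

# Round-trip every unit the new MAKEDEV patterns accept (0..511),
# using type 0, slice 2, partition 1 as arbitrary sample values.
bad=0
unit=0
while [ $unit -le 511 ]; do
	m=$(dkminor 0 $unit 2 1)
	[ "$(dkunit $m)" -eq $unit ] || bad=$((bad + 1))
	unit=$((unit + 1))
done

echo "round-trip failures: $bad"
echo "sd500 stored in 8 bits: sd$(unit8 500)"
```

Run with sh, this should report zero round-trip failures, so all 9 unit bits survive the split encoding, and it should print sd244 for a unit of 500, matching the lsdev listing above.  That is consistent with the minor number layout being fine and the truncation happening elsewhere.
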
So I wired the first disk to 216, and got:

===
# This listing automatically generated by lsdev(1)
 :
ahc0	at pci2:4 # int a irq 9
sd216	at SCSI bus 0:0:0
 :
sd227	at SCSI bus 0:15:0
ahc1	at pci2:5 # int a irq 10
sd228	at SCSI bus 1:0:0
 :
sd235	at SCSI bus 1:7:0
ahc2	at pci1:4 # int a irq 10
sd236	at SCSI bus 2:4:0
 :
sd247	at SCSI bus 2:15:0
ahc3	at pci1:5 # int a irq 11
sd248	at SCSI bus 3:0:0
 :
sd255	at SCSI bus 3:7:0
 :
===

I actually tested this whole array and it worked fine, so disks up to
255 are usable.

What do people think?  I'm sure people will agree with Joe that
limiting power users to 32 disks (or even fewer if not contiguous)
will seriously damage FreeBSD's reputation as a "server" OS.

With Julian and Peter's modified driver coming up some time in the
future (how long til that?  one year?), maybe we shouldn't try to move
things around too much.  I don't think someone who tries to stick more
than 32 disks into a machine would attempt to mknod them herself, so I
guess the apparent ugliness is ok as long as we ship the matching
MAKEDEV....

Satoshi