Date: Fri, 10 Jun 2005 09:07:18 -0600
From: "Kenneth D. Merry" <ken@freebsd.org>
To: "Raphael H. Becker" <rabe@p-i-n.com>
Cc: freebsd-scsi@freebsd.org, Paul Mather <paul@gromit.dlib.vt.edu>, freebsd-current@freebsd.org, "Matthew D. Fuller" <fullermd@over-yonder.net>, Brian Candler <B.Candler@pobox.com>
Subject: Re: Accessing SCSI-Devices >2TB
Message-ID: <20050610150718.GA7005@nargothrond.kdm.org>
In-Reply-To: <20050610162814.A25098@p-i-n.com>
References: <20050608152459.BF24E16A45C@hub.freebsd.org> <1118248386.7479.10.camel@zappa.Chelsea-Ct.Org> <20050608171130.GA64736@over-yonder.net> <1118252322.7479.28.camel@zappa.Chelsea-Ct.Org> <20050609113616.I41471@p-i-n.com> <20050609130511.GA732@uk.tiscali.com> <20050610162814.A25098@p-i-n.com>
On Fri, Jun 10, 2005 at 16:28:14 +0200, Raphael H. Becker wrote:
> On Thu, Jun 09, 2005 at 02:05:11PM +0100, Brian Candler wrote:
> > On Thu, Jun 09, 2005 at 11:36:16AM +0200, Raphael H. Becker wrote:
> > > The first idea was to have just one large logical drive (LD1) with 12
> > > physical discs (PD1 - PD12), where P1 is HotSpare. The RAID wants to talk
> > > a LBA64 dialect of SCSI AFAIK and FreeBSD isn't able to talk this with
> > > the RAID --> no /dev/daX!
> >
> > SCSI has always used a Linear (or Logical) Block Address offset from the
> > start of the disk. What you probably mean is that the controller is issuing
> > a READ(16) command instead of a READ(10), for example. See the SCSI
> > documentation: e.g.
> > http://www.t10.org/ftp/t10/drafts/sbc2/sbc2r16.pdf
> >
> > Now, setting aside the ccd workarounds for now, IIUC the fundamental problem
> > is that you cannot attach your drive array when it presents itself as a
> > single volume with more than 2^31 blocks.
> >
> > This means that either:
> > (1) there's a problem with your drive array under this condition; or
> > (2) there's a problem with your SCSI controller under this condition; or
> > (3) there's a problem with FreeBSD under this condition.
> >
> > To prove which it is, I think you need to show the actual problematic SCSI
> > command sent to the drive, and the actual response (if any) which comes
> > back.
> >
> > According to your log at
> > http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051163.html
> > it says that FreeBSD is objecting to the response from the drive array
> > (protocol violation in Message In phase)
> >
> > Perhaps someone here can say what's the best way to enable this level of
> > debugging? From the 5.4 source tree it looks like you can define CAMDEBUG
> > when building the kernel, and then use "camcontrol debug" to enable
> > debugging for a particular target (or "all")
> >
> > Just a suggestion...
> >
> > Brian.
> > _______________________________________________
> > freebsd-current@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
> Hi all,
>
> thank you for the hints about debugging.
>
> ( I think this should go over to freebsd-scsi@. I've archived
> the thread on -current in
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/RAID_2TB.mbox.gz )
>
> I've done some testing.
>
> First was to boot another OS with the RAID in two equal partitions, I
> tried with knoppix 3.9 (Linux 2.6.11):
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_partition.txt
>
> ... and with the RAID configured as one big drive:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_onebig.txt
>
> and here the relevant diffs:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_diff.txt

This is quite interesting:

===================================================================
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec 3960D Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
(scsi1:A:0): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
Vendor: IFT Model: A12U-G2421 Rev: 342D
Type: Direct-Access ANSI SCSI revision: 03
scsi1:A:0:0: Tagged Queuing enabled. Depth 253
sdb : very big device. try to use READ CAPACITY(16).
sdb : READ CAPACITY(16) failed.
sdb : status=0, message=00, host=5, driver=00
sdb : use 0xffffffff as device size
SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
SCSI device sdb: drive cache: write back
sdb : very big device. try to use READ CAPACITY(16).
sdb : READ CAPACITY(16) failed.
sdb : status=0, message=00, host=5, driver=00
sdb : use 0xffffffff as device size
SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
SCSI device sdb: drive cache: write back
sdb: unknown partition table
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
===================================================================

Linux notices that the device returned 0xffffffff as the capacity in
response to a READ CAPACITY(10) command, so it tries a READ CAPACITY(16)
command, which *fails*.

So even under Linux you aren't getting the full capacity of your device,
you're only getting 2TB.

> Second I rebooted FreeBSD with CAMDEBUG in kernel and enabled it via
> "camcontrol debug ..." and did a "camcontrol rescan 1" then:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/freebsd54_camdebug.txt

camcontrol debug -I isn't quite what we need in this situation. Instead,
you should try 'camcontrol debug -c'.

> A complete dmesg.boot of 5.4 can be found under
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.freebsd54_onebig.txt
>
> I will have to try a SuSE Linux Live System where this should work
> according to the support of the RAID.
>
> The workaround (2 partitions mapped to two LUNs and merged into a RAID
> in FreeBSD) might work. I have some days for playing around with the
> RAID before I need to set it in production.
>
> Any idea, what's wrong with it?

From what I can see, it's likely the device is misbehaving. The fact that
the 16 byte read capacity fails under Linux is telling. If you've got a
device that supports a LUN size greater than 2TB, it must support the 16
byte read capacity and read/write commands.

Here are some more things you can try. Does your system boot? If so, we
can try sending a few commands to the device via the pass(4) driver and
see what happens.

First, run 'camcontrol devlist' and see if the array is there and whether
there is a pass device attached.
If so, try this:

camcontrol cmd passX -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"

That will send a standard 10 byte read capacity command to the device.

Next, try a 16 byte read capacity. This is where things are likely failing
in the da(4) driver attach, and apparently where things are failing under
Linux:

camcontrol cmd passX -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"

If that works, there is some other problem. If it fails, then we're fairly
close to the problem.

Ken
--
Kenneth Merry
ken@FreeBSD.ORG
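For readers unfamiliar with the hex byte strings passed to `camcontrol cmd -c`, the Python sketch below rebuilds the same two CDBs and decodes the parameter data each command returns. It assumes the READ CAPACITY(10) and READ CAPACITY(16) layouts from the SBC drafts (8 bytes of returned data for the 10-byte command, and the first 12 bytes for the 16-byte one, matching the allocation length of 0x0c used above). All function names are hypothetical, and the code never talks to a device; the sample reply bytes are made up for illustration. It also shows the arithmetic behind the Linux log: a READ CAPACITY(10) last LBA of 0xffffffff is the "too big, ask with READ CAPACITY(16)" value, and treating it as the real size gives 2^32 blocks of 512 bytes, i.e. the 2199023 MB Linux reported.

```python
import struct
from typing import Tuple

# Hypothetical helpers (not part of camcontrol or any library): build the
# READ CAPACITY CDBs used above and decode the data they return.

RC10_OVERFLOW = 0xFFFFFFFF  # RC10 last-LBA value meaning "use READ CAPACITY(16)"


def read_capacity10_cdb() -> bytes:
    # Opcode 0x25; LBA, PMI and control fields all zero (10 bytes total).
    return bytes([0x25]) + bytes(9)


def read_capacity16_cdb(alloc_len: int = 12) -> bytes:
    # Opcode 0x9E with service action 0x10 selects READ CAPACITY(16).
    # Bytes 2-9: LBA (0); bytes 10-13: allocation length (big-endian);
    # byte 14: PMI (0); byte 15: control (0). 16 bytes total.
    return bytes([0x9E, 0x10]) + bytes(8) + struct.pack(">I", alloc_len) + bytes(2)


def camcontrol_cdb(cdb: bytes) -> str:
    # Render the CDB as space-separated hex, as written on the command line.
    return " ".join(f"{b:x}" for b in cdb)


def parse_rc10(data: bytes) -> Tuple[int, int]:
    # 8 bytes: 4-byte last LBA, 4-byte block length, both big-endian.
    last_lba, block_len = struct.unpack(">II", data[:8])
    return last_lba, block_len


def parse_rc16(data: bytes) -> Tuple[int, int]:
    # First 12 bytes: 8-byte last LBA, 4-byte block length, both big-endian.
    last_lba, block_len = struct.unpack(">QI", data[:12])
    return last_lba, block_len


def capacity_bytes(last_lba: int, block_len: int) -> int:
    # The device reports the *last* LBA, so the block count is last_lba + 1.
    return (last_lba + 1) * block_len


def needs_rc16(rc10_last_lba: int) -> bool:
    return rc10_last_lba == RC10_OVERFLOW


if __name__ == "__main__":
    print(camcontrol_cdb(read_capacity10_cdb()))  # 25 0 0 0 0 0 0 0 0 0
    print(camcontrol_cdb(read_capacity16_cdb()))  # 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0
    # Made-up RC10 reply from a >2TB LUN with 512-byte blocks:
    lba, blen = parse_rc10(bytes.fromhex("ffffffff00000200"))
    print(needs_rc16(lba), capacity_bytes(lba, blen) // 10**6)  # True 2199023
```

When reading the `-i 12 "i4 i4 i4"` output from the second command, the first two values printed are the high and low 32 bits of the 8-byte last LBA (last LBA = (first << 32) | second), and the third is the block length, which is what `parse_rc16` computes in one step.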