Date:      Fri, 10 Jun 2005 09:07:18 -0600
From:      "Kenneth D. Merry" <ken@freebsd.org>
To:        "Raphael H. Becker" <rabe@p-i-n.com>
Cc:        freebsd-scsi@freebsd.org, Paul Mather <paul@gromit.dlib.vt.edu>, freebsd-current@freebsd.org, "Matthew D. Fuller" <fullermd@over-yonder.net>, Brian Candler <B.Candler@pobox.com>
Subject:   Re: Accessing SCSI-Devices >2TB
Message-ID:  <20050610150718.GA7005@nargothrond.kdm.org>
In-Reply-To: <20050610162814.A25098@p-i-n.com>
References:  <20050608152459.BF24E16A45C@hub.freebsd.org> <1118248386.7479.10.camel@zappa.Chelsea-Ct.Org> <20050608171130.GA64736@over-yonder.net> <1118252322.7479.28.camel@zappa.Chelsea-Ct.Org> <20050609113616.I41471@p-i-n.com> <20050609130511.GA732@uk.tiscali.com> <20050610162814.A25098@p-i-n.com>

On Fri, Jun 10, 2005 at 16:28:14 +0200, Raphael H. Becker wrote:
> On Thu, Jun 09, 2005 at 02:05:11PM +0100, Brian Candler wrote:
> > On Thu, Jun 09, 2005 at 11:36:16AM +0200, Raphael H. Becker wrote:
> > > The first idea was to have just one large logical drive (LD1) with 12 
> > > physical discs (PD1 - PD12), where P1 is HotSpare. The RAID wants to talk 
> > > a LBA64 dialect of SCSI AFAIK and FreeBSD isn't able to talk this with 
> > > the RAID  --> no /dev/daX!
> > 
> > SCSI has always used a Linear (or Logical) Block Address offset from the
> > start of the disk. What you probably mean is that the controller is issuing
> > a READ(16) command instead of a READ(10), for example. See the SCSI
> > documentation: e.g.
> > http://www.t10.org/ftp/t10/drafts/sbc2/sbc2r16.pdf
> > 
> > Now, setting aside the ccd workarounds for now, IIUC the fundamental problem
> > is that you cannot attach your drive array when it presents itself as a
> > single volume with more than 2^32 blocks (2TB at 512 bytes per block).
> > 
> > This means that either:
> > (1) there's a problem with your drive array under this condition; or
> > (2) there's a problem with your SCSI controller under this condition; or
> > (3) there's a problem with FreeBSD under this condition.
> > 
> > To prove which it is, I think you need to show the actual problematic SCSI
> > command sent to the drive, and the actual response (if any) which comes
> > back.
> > 
> > According to your log at
> > http://lists.freebsd.org/pipermail/freebsd-current/2005-June/051163.html
> > it says that FreeBSD is objecting to the response from the drive array
> > (protocol violation in Message In phase)
> > 
> > Perhaps someone here can say what's the best way to enable this level of
> > debugging? From the 5.4 source tree it looks like you can define CAMDEBUG
> > when building the kernel, and then use "camcontrol debug" to enable
> > debugging for a particular target (or "all")
> > 
> > Just a suggestion...
> > 
> > Brian.
> 
> Hi all,
> 
> thank you for the hints about debugging. 
> 
> ( I think this should go over to freebsd-scsi@. I've archived 
> the thread on -current in 
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/RAID_2TB.mbox.gz )
> 
> 
> 
> I've done some testing.
> 
> First was to boot another OS with the RAID in two equal partitions, I
> tried with knoppix 3.9 (Linux 2.6.11):
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_partition.txt
> 
> ... and with the RAID configured as one big drive:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_onebig.txt
> 
> and here the relevant diffs:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_diff.txt

This is quite interesting:

===================================================================
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
        <Adaptec 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

(scsi1:A:0): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
  Vendor: IFT       Model: A12U-G2421        Rev: 342D
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:0:0: Tagged Queuing enabled.  Depth 253
sdb : very big device. try to use READ CAPACITY(16).
sdb : READ CAPACITY(16) failed.
sdb : status=0, message=00, host=5, driver=00 
sdb : use 0xffffffff as device size
SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
SCSI device sdb: drive cache: write back
sdb : very big device. try to use READ CAPACITY(16).
sdb : READ CAPACITY(16) failed.
sdb : status=0, message=00, host=5, driver=00 
sdb : use 0xffffffff as device size
SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
SCSI device sdb: drive cache: write back
 sdb: unknown partition table
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
===================================================================

Linux notices that the device returned 0xffffffff as the capacity in
response to a READ CAPACITY(10) command, so it tries a READ CAPACITY(16)
command, which *fails*.

So even under Linux you aren't getting the full capacity of your device,
you're only getting 2TB.
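
The fallback Linux is attempting can be sketched as follows. This is only an illustration of the logic, not the Linux sd driver's code; the function names are made up for the example, and the sample payload is simply the value implied by the log above (last LBA 0xffffffff, 512-byte blocks).

```python
import struct

RC10_MAX_LBA = 0xFFFFFFFF  # largest value the 4-byte LBA field can hold


def parse_read_capacity_10(data: bytes):
    """Decode the 8-byte READ CAPACITY(10) payload.

    Both fields are big-endian 32-bit integers:
    the address of the last block, then the block length in bytes.
    """
    last_lba, block_len = struct.unpack(">II", data[:8])
    return last_lba, block_len


def needs_read_capacity_16(last_lba: int) -> bool:
    """0xffffffff means the real size does not fit; ask again with the 16-byte command."""
    return last_lba == RC10_MAX_LBA


def size_bytes(last_lba: int, block_len: int) -> int:
    """The device reports the LAST block address, so the block count is one more."""
    return (last_lba + 1) * block_len


if __name__ == "__main__":
    # Hypothetical payload matching the dmesg output: 0xffffffff, 512-byte blocks.
    payload = bytes.fromhex("ffffffff00000200")
    last_lba, block_len = parse_read_capacity_10(payload)
    print("need READ CAPACITY(16):", needs_read_capacity_16(last_lba))
    print("size if we give up and use 0xffffffff:",
          size_bytes(last_lba, block_len) // 10**6, "MB")
```

With the 16-byte command failing, the only number left is the 0xffffffff placeholder, which is where the 4294967296 sectors in the log come from.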

> Second I rebooted FreeBSD with CAMDEBUG in kernel and enabled it via
> "camcontrol debug ..." and did a "camcontrol rescan 1" then:
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/freebsd54_camdebug.txt

camcontrol debug -I isn't quite what we need in this situation.  Instead,
you should try 'camcontrol debug -c', which prints the SCSI CDBs sent to
the device.

> A complete dmesg.boot of 5.4 can be found under 
> http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.freebsd54_onebig.txt
> 
> I will have to try a SuSE Linux Live System where this should work
> according to the support of the RAID.
> 
> The workaround (2 partitions mapped to two LUNs and merged into a RAID
> in FreeBSD) might work. I have some days for playing around with the
> RAID before I need to set it in production. 
> 
> Any idea what's wrong with it?

From what I can see, it's likely the device is misbehaving.  The fact that
the 16 byte read capacity fails under Linux is telling.  If you've got a
device that supports a LUN size greater than 2TB, it must support the 16
byte read capacity and read/write commands.
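
To illustrate why the 16 byte commands are required, here is a rough sketch of the two READ CDB layouts as I understand them from SBC-2. The helper names are invented for the example, and the flag, group and control bytes are simply left at zero.

```python
import struct


def read10_cdb(lba: int, nblocks: int) -> bytes:
    """READ(10): opcode 0x28, 4-byte LBA, 2-byte transfer length."""
    # opcode, flags, LBA (32 bit), group number, transfer length (16 bit), control
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, nblocks, 0)


def read16_cdb(lba: int, nblocks: int) -> bytes:
    """READ(16): opcode 0x88, 8-byte LBA, 4-byte transfer length."""
    # opcode, flags, LBA (64 bit), transfer length (32 bit), group number, control
    return struct.pack(">BBQIBB", 0x88, 0, lba, nblocks, 0, 0)


if __name__ == "__main__":
    first_block_past_2tb = 2**32  # with 512-byte blocks

    print(read16_cdb(first_block_past_2tb, 8).hex())
    try:
        read10_cdb(first_block_past_2tb, 8)
    except struct.error as exc:
        # The address simply does not fit in the 10-byte command.
        print("READ(10) cannot address this block:", exc)
```

Anything at or beyond block 2^32 has no representation in the 10 byte form, so a device exporting a larger LUN has to accept the 16 byte variants.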

Here are some more things you can try.  Does your system boot?  If so, we
can try sending a few commands to the device via the pass(4) driver and see
what happens.

First, run 'camcontrol devlist' and see if the array is there and whether
there is a pass device attached.  If so, try this:

camcontrol cmd passX -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"

That will send a standard 10 byte read capacity command to the device.
Next, try a 16 byte read capacity.  This is where things are likely failing
in the da(4) driver attach, and apparently where things are failing under
Linux:

camcontrol cmd passX -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"

If that works, there is some other problem.  If it fails, then we're
fairly close to the problem.
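
For reference, here is a small sketch of how the two CDB strings above are put together and how the integers that come back could be turned into a size. It assumes the "i4" fields are decoded as big-endian 32-bit values; the helper names and the sample return values are hypothetical, not output from a real array.

```python
import struct


def cdb_string(cdb: bytes) -> str:
    """Format a CDB the way it is written on the camcontrol command line."""
    return " ".join(format(byte, "x") for byte in cdb)


def read_capacity_10_cdb() -> bytes:
    # opcode 0x25, reserved, LBA (4 bytes), reserved (2 bytes), PMI, control
    return struct.pack(">BBIHBB", 0x25, 0, 0, 0, 0, 0)


def read_capacity_16_cdb(alloc_len: int = 12) -> bytes:
    # opcode 0x9e (SERVICE ACTION IN), service action 0x10,
    # LBA (8 bytes), allocation length (4 bytes), PMI, control
    return struct.pack(">BBQIBB", 0x9E, 0x10, 0, alloc_len, 0, 0)


def last_lba_from_rc16(high: int, low: int) -> int:
    """Join the first two "i4" values of the 16-byte reply into one 64-bit LBA."""
    return (high << 32) | low


def capacity_bytes(last_lba: int, block_len: int) -> int:
    """The reply carries the last block address, so add one to get the count."""
    return (last_lba + 1) * block_len


if __name__ == "__main__":
    print(cdb_string(read_capacity_10_cdb()))
    print(cdb_string(read_capacity_16_cdb()))
    # Hypothetical reply values for an array slightly larger than 2TB:
    high, low, block_len = 1, 0x5D3FFFFF, 512
    print(capacity_bytes(last_lba_from_rc16(high, low), block_len), "bytes")
```

If the 16 byte command succeeds, the first two numbers it prints are the high and low halves of the last block address, and the third is the block length.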

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


