Date: Sat, 11 Jun 2005 21:54:40 -0600
From: "Kenneth D. Merry"
To: "Raphael H. Becker"
Cc: freebsd-scsi@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Accessing SCSI-Devices >2TB
Message-ID: <20050612035440.GA21262@nargothrond.kdm.org>
In-Reply-To: <20050612002508.B25098@p-i-n.com>
References: <20050608152459.BF24E16A45C@hub.freebsd.org>
 <1118248386.7479.10.camel@zappa.Chelsea-Ct.Org>
 <20050608171130.GA64736@over-yonder.net>
 <1118252322.7479.28.camel@zappa.Chelsea-Ct.Org>
 <20050609113616.I41471@p-i-n.com>
 <20050609130511.GA732@uk.tiscali.com>
 <20050610162814.A25098@p-i-n.com>
 <20050610150718.GA7005@nargothrond.kdm.org>
 <20050612002508.B25098@p-i-n.com>

On Sun, Jun 12, 2005 at 00:25:08 +0200, Raphael H. Becker wrote:
> On Fri, Jun 10, 2005 at 09:07:18AM -0600, Kenneth D. Merry wrote:
>
> > > and here the relevant diffs:
> > > http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_diff.txt
> >
> > This is quite interesting:
> [....]
> > Linux notices that the device returned 0xffffffff as the capacity in
> > response to a READ CAPACITY(10) command, so it tries a READ CAPACITY(16)
> > command, which *fails*.
> >
> > So even under Linux you aren't getting the full capacity of your device,
> > you're only getting 2TB.
>
> The support told me, SuSE Linux is known to work with >2TB in one device,
> means they might have some patches to work around. I will try a SuSE
> live system next days just to get sure it works. But the System won't
> be SuSE in future.

It would be interesting to see whether that works.  That would help narrow
down the problem slightly.

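The fallback quoted above (READ CAPACITY(10) reports 0xffffffff, so the
initiator retries with READ CAPACITY(16)) can be sketched roughly as follows.
This is only an illustration in Python, not the da(4) or Linux sd code; the
function names and the sample response bytes are made up for the example.

```python
import struct

# A last-LBA value of 0xffffffff in a READ CAPACITY(10) response means
# "the capacity does not fit in 32 bits, ask again with READ CAPACITY(16)".
RC10_SENTINEL = 0xFFFFFFFF


def parse_read_capacity_10(data):
    """Decode the 8-byte response: big-endian last LBA, then block length."""
    last_lba, block_len = struct.unpack(">II", data[:8])
    return last_lba, block_len


def needs_read_capacity_16(last_lba):
    """True when the 10-byte command could not express the real capacity."""
    return last_lba == RC10_SENTINEL


def capacity_bytes(last_lba, block_len):
    """Capacity in bytes; the device reports the LAST block, not the count."""
    return (last_lba + 1) * block_len


# Hypothetical response bytes: last LBA 0xffffffff, 512-byte blocks.
response = bytes.fromhex("ffffffff00000200")
last_lba, block_len = parse_read_capacity_10(response)
if needs_read_capacity_16(last_lba):
    print("device is larger than READ CAPACITY(10) can report")
else:
    print(capacity_bytes(last_lba, block_len), "bytes")
```

With 512-byte blocks, the largest capacity the 10-byte command can describe
is capacity_bytes(0xFFFFFFFE, 512), just under 2TB, which is why a device
above that size has to answer the 16-byte command correctly.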
> > > Second I rebooted FreeBSD with CAMDEBUG in kernel and enabled it via
> > > "camcontrol debug ..." and did a "camcontrol rescan 1" then:
> > > http://rabe.uugrn.org/temp/FreeBSD/bigraid/freebsd54_camdebug.txt
> >
> > camcontrol debug -I isn't quite what we need in this situation.  Instead,
> > you should try 'camcontrol debug -c'.
>
> # camcontrol debug -c 1:0
> # camcontrol rescan 1
> Re-scan of bus 1 was successful
>
> in /var/log/messages:
>
> kernel: (probe0:ahc1:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 0 0 0 24 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 0 0 0 fc 0
> kernel: (probe0:ahc1:0:0:0): MODE SENSE(06). CDB: 1a 0 a 0 14 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0
> kernel: (probe0:ahc1:0:0:1): INQUIRY. CDB: 12 20 0 0 24 0
> kernel: (probe0:ahc1:0:0:2): INQUIRY. CDB: 12 40 0 0 24 0
> kernel: (probe0:ahc1:0:0:3): INQUIRY. CDB: 12 60 0 0 24 0
> kernel: (probe0:ahc1:0:0:4): INQUIRY. CDB: 12 80 0 0 24 0
> kernel: (probe0:ahc1:0:0:5): INQUIRY. CDB: 12 a0 0 0 24 0
> kernel: (probe0:ahc1:0:0:6): INQUIRY. CDB: 12 c0 0 0 24 0
> kernel: (probe0:ahc1:0:0:7): INQUIRY. CDB: 12 e0 0 0 24 0
>
> Does not say anything to me.

Hmm, well, you're not going to see the problem CDB that way, because the
probe has already happened.  To see it, you either need to compile in the
debugging flags, or do the following:

 - unplug the cable from the machine to the RAID array
 - camcontrol rescan 1
 - plug the cable back in
 - camcontrol rescan 1

> > > Any idea, what's wrong with it?
> >
> > From what I can see, it's likely the device is misbehaving.  The fact that
> > the 16 byte read capacity fails under Linux is telling.  If you've got a
> > device that supports a LUN size greater than 2TB, it must support the 16
> > byte read capacity and read/write commands.
>
> So you would say this is a misbehaviour of the RAID's firmware/controller?

It's either the RAID box or the ahc driver from what I can see at this point.
See below.

> > Here are some more things you can try.  Does your system boot?
> Well, that RAID is just one of 3 RAIDs, the system is on the internal PERC-RAID.
>
> > If so, we
> > can try sending a few commands to the device via the pass(4) driver and see
> > what happens.
>
> > First, run 'camcontrol devlist' and see if the array is there and whether
> > there is a pass device attached.  If so, try this:
> >
> > camcontrol cmd passX -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
>
> at scbus1 target 0 lun 0 (pass3)
> # camcontrol cmd pass3 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
> -1 512

Okay, that's good.  The -1 means that the RAID box is telling us that we
need to send the 16 byte read capacity command to get the true capacity.
(That's what a capacity of 0xffffffff, or -1, means.)

> > That will send a standard 10 byte read capacity command to the device.
> > Next, try a 16 byte read capacity.  This is where things are likely failing
> > in the da(4) driver attach, and apparently where things are failing under
> > Linux:
> >
> > camcontrol cmd passX -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"
>
> # camcontrol cmd pass3 -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"
> camcontrol: error sending command
> (pass3:ahc1:0:0:0): SERVICE ACTION IN(16). CDB: 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0
> (pass3:ahc1:0:0:0): CAM Status: Target Bus Phase Sequence Failure
>
> dmesg:
> (pass3:ahc1:0:0:0): No or incomplete CDB sent to device.
> (pass3:ahc1:0:0:0): Protocol violation in Message-in phase.  Attempting to abort.
> (pass3:ahc1:0:0:0): Abort Tag Message Sent
> (pass3:ahc1:0:0:0): SCB 8 - Abort Tag Completed.

Hmm, okay, at this point, we have a SCSI protocol violation.  (Which is the
same thing you saw before.)

So this pretty much means it is the 16 byte read capacity that is
triggering the problem.

The question is, is this problem on the RAID box or in the ahc driver?
I would lean towards saying the RAID box has the issue, but Justin (CCed)
may be able to give a little more insight.

> > If that works, there is some other problem.  If it fails, then we're
> > fairly close to the problem.
>
> So, if it's a problem with the RAID's firmware and/or maybe hardware,
> do you expect there's a workaround in FreeBSD for it?

It's either a problem with the firmware on the RAID controller or with the
ahc driver.  If it turns out that the RAID controller is at fault, then
you'll need to get fixed firmware.

It'll be interesting to see whether a SuSE live system works with it.  (And
reports a capacity that is greater than 2TB.)

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
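
For reference, the two CDB strings passed to 'camcontrol cmd' in this message
can be derived from the standard READ CAPACITY(10) and READ CAPACITY(16)
layouts.  The following is a rough Python sketch; the helper names are
invented for illustration, and the sample response at the end is a made-up
value, not output from the RAID box discussed here.

```python
import struct


def read_capacity_10_cdb():
    """10-byte CDB: opcode 0x25, reserved, 4-byte LBA, 2 reserved, PMI, control."""
    return struct.pack(">BBIHBB", 0x25, 0, 0, 0, 0, 0)


def read_capacity_16_cdb(alloc_len):
    """16-byte CDB: opcode 0x9e (SERVICE ACTION IN(16)), service action 0x10,
    8-byte LBA, 4-byte allocation length, PMI/reserved byte, control."""
    return struct.pack(">BBQIBB", 0x9E, 0x10, 0, alloc_len, 0, 0)


def cdb_to_camcontrol(cdb):
    """Render a CDB as the space-separated hex string used with -c above."""
    return " ".join(f"{b:x}" for b in cdb)


def parse_read_capacity_16(data):
    """Decode the first 12 response bytes: 8-byte last LBA, 4-byte block length."""
    last_lba, block_len = struct.unpack(">QI", data[:12])
    return last_lba, block_len


print(cdb_to_camcontrol(read_capacity_10_cdb()))
print(cdb_to_camcontrol(read_capacity_16_cdb(12)))

# Hypothetical 12-byte response from a device with 2**32 + 1 blocks of 512 bytes.
sample = struct.pack(">QI", 2**32, 512)
last_lba, block_len = parse_read_capacity_16(sample)
print((last_lba + 1) * block_len, "bytes")
```

The allocation length of 12 (0xc) matches the '-i 12' argument: it asks only
for the 8-byte last LBA and the 4-byte block length.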