From owner-freebsd-bugs@FreeBSD.ORG Thu Jan 24 02:00:01 2013
Date: Thu, 24 Jan 2013 02:00:01 GMT
Message-Id: <201301240200.r0O201qh087946@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Joshua Sirrine
Subject: Re: kern/154299: [arcmsr] arcmsr fails to detect all attached drives

The following reply was made to PR kern/154299; it has been noted by GNATS.

From: Joshua Sirrine
To: bug-followup@FreeBSD.org, Rincebrain@gmail.com
Cc:
Subject: Re: kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Date: Wed, 23 Jan 2013 19:59:23 -0600
First, I'd like to apologize right now if this email is not being routed correctly.  This is not the same as the ticket system FreeNAS uses, so I'm in new territory.  I've been using FreeNAS (FreeBSD) for about a year, but I am a quick learner.  If I need to provide this information in a form other than email to get this issue fixed, please let me know.

I believe I have found the cause of the disks not being usable, as seen in kern/154299.  Here's what I see on my system.  It uses an Areca 1280ML-24 with firmware 1.49 (the latest) and runs FreeNAS 8.3.0 x64 (based on FreeBSD 8.3) with areca-cli Version 1.84, Arclib: 300, Date: Nov 9 2010 (FreeBSD).  I found this issue while swapping out backplanes for my hard drives.
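
In case it helps anyone compare against their own setup, this is roughly how the version information above can be pulled from the command line (a sketch; the "sys info" subcommand name is from Areca's CLI documentation as I remember it, so double-check it on your build):

  areca-cli sys info     # controller model and firmware version (1.49 here)
  areca-cli disk info    # per-port drive listing, like the table below
  uname -a               # the FreeBSD/FreeNAS kernel version underneath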

I had drives populating RAID controller ports 1 through 14.  Due to a failed backplane, I moved the two drives that were connected to ports 13 and 14 to ports 21 and 22, respectively.  All of these disks are in a ZFS RAIDZ3 zpool.  Note that I have not had any problems with ZFS scrubs or SMART long tests on these drives, and they have been running for more than a year, so infant mortality is not an issue.  Also, the RAID controller is in Non-RAID mode, so all disks are JBOD by default.

Physical Drive Information
  # Ch# ModelName                       Capacity  Usage
===============================================================================
  1  1  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  2  2  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  3  3  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  4  4  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  5  5  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  6  6  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  7  7  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  8  8  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
  9  9  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 10 10  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 11 11  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 12 12  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 13 13  N.A.                               0.0GB  N.A.     
 14 14  N.A.                               0.0GB  N.A.     
 15 15  N.A.                               0.0GB  N.A.     
 16 16  N.A.                               0.0GB  N.A.     
 17 17  N.A.                               0.0GB  N.A.     
 18 18  N.A.                               0.0GB  N.A.     
 19 19  N.A.                               0.0GB  N.A.     
 20 20  N.A.                               0.0GB  N.A.     
 21 21  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 22 22  WDC WD20EARS-00S8B1             2000.4GB  JBOD     
 23 23  N.A.                               0.0GB  N.A.     
 24 24  N.A.                               0.0GB  N.A.     
===============================================================================

With this configuration, disks 21 and 22 were not available to me (only 12 of the disks were available).  I was using a ZFS RAIDZ3 across all of these disks, so I immediately lost two disks' worth of redundancy.  The disks showed up in the RAID controller BIOS as well as in areca-cli (as you can see), but /dev was missing two disks and 'zpool status' showed two missing drives.  As soon as I swapped cables so that the disks were back in ports 13 and 14 on the RAID controller, everything went back to normal.
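
For anyone trying to confirm the same symptom, the comparison I did boils down to a few commands (a sketch; "tank" is just a placeholder for my pool name, and the da numbering will differ per system):

  areca-cli disk info    # controller-side view: 14 disks in JBOD
  camcontrol devlist     # CAM-side view: which targets/LUNs actually attached
  ls /dev/da*            # only 12 da devices showed up for me
  zpool status tank      # the two moved disks were listed as missing here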

Knowing that something was wrong, I grabbed some spare drives and started experimenting.  I wanted to know what was actually wrong, because I am trusting this system with my data for production use.  Please examine the following VolumeSet Information:

VolumeSet Information
  # Name             Raid Name       Level   Capacity Ch/Id/Lun  State        
===============================================================================
  1 WD20EARS-00S8B1  Raid Set # 00   JBOD    2000.4GB 00/00/00   Normal
  2 WD20EARS-00S8B1  Raid Set # 01   JBOD    2000.4GB 00/00/01   Normal
  3 WD20EARS-00S8B1  Raid Set # 02   JBOD    2000.4GB 00/00/02   Normal
  4 WD20EARS-00S8B1  Raid Set # 03   JBOD    2000.4GB 00/00/03   Normal
  5 WD20EARS-00S8B1  Raid Set # 04   JBOD    2000.4GB 00/00/04   Normal
  6 WD20EARS-00S8B1  Raid Set # 05   JBOD    2000.4GB 00/00/05   Normal
  7 WD20EARS-00S8B1  Raid Set # 06   JBOD    2000.4GB 00/00/06   Normal
  8 WD20EARS-00S8B1  Raid Set # 07   JBOD    2000.4GB 00/00/07   Normal
  9 WD20EARS-00S8B1  Raid Set # 08   JBOD    2000.4GB 00/01/00   Normal
 10 WD20EARS-00S8B1  Raid Set # 09   JBOD    2000.4GB 00/01/01   Normal
 11 WD20EARS-00S8B1  Raid Set # 10   JBOD    2000.4GB 00/01/02   Normal
 12 WD20EARS-00S8B1  Raid Set # 11   JBOD    2000.4GB 00/01/03   Normal
 13 WD20EARS-00S8B1  Raid Set # 12   JBOD    2000.4GB 00/01/04   Normal
 14 WD20EARS-00S8B1  Raid Set # 13   JBOD    2000.4GB 00/01/05   Normal
===============================================================================
GuiErrMsg<0x00>: Success.

This is my normal configuration and all disks work.  After experimenting, it turns out that if I want to use ports 1 through 8, I MUST have a disk in port 1.  For ports 9 through 16, I MUST have a disk in port 9.  For ports 17 through 24, I MUST have a disk in port 17.  It appears there may be something special about Ch/Id/Lun = XX/XX/00.  If there is no disk at LUN 00, then that entire ID is not available for use by FreeBSD, even though areca-cli properly identifies the disks on it.
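
Put another way, the tables above suggest the controller exposes eight LUNs per SCSI target ID: ports 1-8 sit on ID 0, ports 9-16 on ID 1, and ports 17-24 on ID 2.  Here is a small sketch of that mapping; the divide-by-eight rule is my assumption from the VolumeSet output above, not something I have confirmed in the driver or firmware:

  # Map an Areca port number (1-24) to the Ch/Id/Lun that areca-cli reports,
  # assuming 8 LUNs per target ID.
  port=21
  id=$(( (port - 1) / 8 ))     # port 21 -> ID 2
  lun=$(( (port - 1) % 8 ))    # port 21 -> LUN 4
  echo "port $port -> 00/0$id/0$lun"
  # If LUN 00 of an ID has no disk (nothing in port 1, 9, or 17), that whole
  # ID seems to be skipped, which matches what I saw with ports 21 and 22.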

If you look at the original report in kern/154299:

arcmsr fails to detect all attached drives. It may or may not have something to do with a failed device attached and e.g. PR 148502 or 150390.

c.f.:

[root@manticore ~]# areca-cli disk info;ls /dev/da* /dev/ad*;
  # Ch# ModelName                       Capacity  Usage
===============================================================================
  1  1  N.A.                               0.0GB  N.A.
  2  2  N.A.                               0.0GB  N.A.
  3  3  N.A.                               0.0GB  N.A.
  4  4  N.A.                               0.0GB  N.A.
  5  5  N.A.                               0.0GB  N.A.
  6  6  N.A.                               0.0GB  N.A.
  7  7  N.A.                               0.0GB  N.A.
  8  8  N.A.                               0.0GB  N.A.
  9  9  ST31500341AS                    1500.3GB  JBOD
 10 10  N.A.                               0.0GB  N.A.
 11 11  ST31500341AS                    1500.3GB  JBOD
 12 12  ST31500341AS                    1500.3GB  JBOD
 13 13  ST31500341AS                    1500.3GB  JBOD
 14 14  N.A.                               0.0GB  N.A.
 15 15  ST31500341AS                    1500.3GB  JBOD
 16 16  ST31500341AS                    1500.3GB  JBOD
 17 17  N.A.                               0.0GB  N.A.
 18 18  N.A.                               0.0GB  N.A.
 19 19  ST31500341AS                    1500.3GB  JBOD
 20 20  ST31500341AS                    1500.3GB  JBOD
 21 21  ST31500341AS                    1500.3GB  JBOD
 22 22                                     0.0GB  Failed
 23 23  ST31500341AS                    1500.3GB  JBOD
 24 24  ST31500341AS                    1500.3GB  JBOD
===============================================================================
GuiErrMsg<0x00>: Success.
/dev/ad4 /dev/ad4s1 /dev/ad4s1a /dev/ad4s1b /dev/ad4s1d /dev/da0 /dev/da1 /dev/da1p1 /dev/da1p9 /dev/da2 /dev/da3 /dev/da4 /dev/da5

I count 11 drives attached via the arc1280ml, not including the failed drive, and I see 6 appearing.

camcontrol rescan all and reboots do not help the issue. I am running firmware 1.49.

If you take what I observed and apply it to his post, you will see that only disks 9, 11, 12, 13, 15, and 16 would be available to the system.  That is in line with the poster saying he has only 6 disks available.  I am writing this email in hopes that someone can find and fix the issue.  I do not have any failed disks to experiment with, but based on 4 hours of experimenting last night I am convinced that failed disks only come into play if a disk fails in port 1, 9, or 17.
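
If a developer wants to dig into this on an affected system, one thing that might be worth trying is a targeted rescan of the LUNs the controller reports but FreeBSD never attaches, to see whether CAM can probe them at all.  A sketch (the bus:target:lun values are placeholders based on my divide-by-eight guess above, and the bus number will differ):

  camcontrol devlist -v     # what CAM currently sees, grouped by bus
  camcontrol rescan all     # full rescan; reportedly did not help above
  camcontrol rescan 0:2:4   # a single missing LUN (port 21 on my layout)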

