From owner-freebsd-scsi@freebsd.org Fri Jun  3 21:49:33 2016
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
To: freebsd-scsi@freebsd.org
From: Steven Hartland <killing@multiplay.co.uk>
Date: Fri, 3 Jun 2016 22:49:29 +0100
In-Reply-To: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com>

First thing would be to run gstat with -d to see if you're actually
stacking up deletes, a symptom of which can be reads and writes
dropping to zero. If you are seeing significant deletes, it could be a
firmware issue on the drives. If you can eliminate that, you could
still be looking at a cabling / backplane issue.

SSDs stress the signalling much more than HDDs, and on several
occasions we've had to replace a backplane more than once before the
problem was fixed. The other thing to check is the firmware version on
the controller and on the drives: are you up to date?
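Concretely, something along these lines (a sketch, untested on your
exact setup; dev.mpr.0.firmware_version assumes the mpr(4) driver, and
camcontrol identify assumes the SATA drives answer ATA passthrough
behind the HBA):

# gstat -d -f 'da(3|5|7|8)$'
(adds delete ops next to reads/writes, filtered to just the pool disks)

# sysctl dev.mpr.0.firmware_version
(controller firmware)

# camcontrol identify da3 | grep -i firmware
(drive firmware revision; repeat for each disk)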
On 03/06/2016 21:36, list-news wrote:
> Greetings!
>
> For the last month I've been fighting with SAS 3008 issues on a
> SuperMicro 2028TP-DECR (2-node Twin server). The system is running
> 10.3-RELEASE #1 r298749.
>
> The errors we see on this system look very similar to the errors
> posted here back in March:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2016-March/007000.html
>
> The system hosts a large postgresql database and an application which
> spends most of its days calculating log entries on a couple hundred GB
> of tables. Initially this system had 2 WD HDD mirrored boot drives,
> and 8 Intel 730 240GB SSDs in a ZFS pool with 4 mirrored vdevs.
> Shortly after install/setup, a couple of those Intel 730s began
> showing READ and WRITE errors in zpool status; additionally, mpr0
> messages were showing drive timeouts when attempting READ(10),
> WRITE(10) and SYNCHRONIZE CACHE(10), such as this:
>
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 15
> 5a 18 c0 00 00 08 00 length 4096 SMID 240 command timeout cm
> 0xfffffe0000cefb00 ccb 0xfffff805dfeca800 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: timedout cm 0xfffffe0000cefb00
> allocated tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (noperiph:mpr0:0:4294967295:0): SMID 15
> Aborting command 0xfffffe0000cefb00
> Jun  2 19:48:42 s18 kernel: mpr0: Sending reset from mprsas_send_abort
> for target ID 16
> [...snipped about 50 more lines for brevity]
>
> As a solution, the immediate thought was that the drives were faulty.
> We decided to replace some of the drives with Intel 480GB DC3610s.
> After replacing 4 of the drives we noticed the 3610s were also
> throwing errors. Read/write error counts displayed by zpool status
> would climb into the thousands over a night of calculations, with
> dozens of mpr errors in the logs. This was happening to all of the
> drives in the array: some were more likely than others, but given
> enough time, all drives would display the errors. This provoked some
> suspicion that the problem was backplane and/or controller related and
> the drives were fine (or possibly the drive firmware is the same in
> the 730s and 3610s and both exhibit this problem, but the backplane
> did seem more likely at the time).
>
> We moved the Node A system sled (cpu/mem/sas/mb) into the Node B
> chassis slot to see if the errors would change at all (thinking some
> SATA ports on the backplane were possibly bad). We hit a bit of luck:
> the boot drives, when inserted into Node B drive sleds 1 & 2, wouldn't
> fire up and were undetectable by the SAS controller. SuperMicro
> concluded this was likely a backplane issue, and the backplane was
> replaced. With the new backplane in, the drives in Node B sleds 1/2
> are now visible to the controller during boot, confirming the
> backplane did have a problem. (This is important to mention, as the
> internet's commonly suggested fix for similar timeout errors is to
> replace the backplane and/or cables.)
>
> Unfortunately, the luck ran out: once we put the system back under
> load, the errors came back. I rebuilt the pool using just four 3610s
> as 2x mirrored vdevs and copied all the data over.
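(For readers following along: a pool of that shape would be created
roughly as below; the pool name and device names come from the zpool
status that follows, and the exact invocation is my assumption, not
necessarily the actual command used.)

# zpool create zp1 mirror da3 da5 mirror da7 da8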
> Here is the pool as of yesterday:
>
>   pool: zp1
>  state: ONLINE
>   scan: resilvered 107M in 0h0m with 0 errors on Thu Jun  2 14:52:43 2016
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zp1         ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>           mirror-1  ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>             da8     ONLINE       0     0     0
>
> When performing logging calculations, the system looks like this
> pretty much continuously (via gstat):
>
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>    7   8175   1374   7526    0.5   3000  48931    0.5    65.2| da3
>    6   8337   1542   8025    0.5   2995  48947    0.5    68.1| da5
>    2   8740   1529   8584    0.5   3138  52564    0.4    72.6| da7
>    3   8637   1427   7961    0.5   3136  52564    0.4    65.6| da8
>
> And zpool iostat:
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> zp1          414G   474G  10.9K  8.94K  50.2M  51.8M
>   mirror     207G   237G  5.44K  4.56K  25.0M  26.2M
>     da3         -      -  2.50K  2.37K  12.3M  26.2M
>     da5         -      -  2.65K  2.37K  13.3M  26.1M
>   mirror     207G   237G  5.42K  4.39K  25.3M  25.6M
>     da7         -      -  2.62K  2.40K  13.2M  25.6M
>     da8         -      -  2.57K  2.40K  12.5M  25.6M
>
> After a couple of minutes (usually between 2 and 10) with drive %busy
> and CPU load around 70%, the system hangs.
>
> During the hang, one of two scenarios plays out (possibly more, but
> these are the ones officially witnessed):
>
> Scenario 1) Processes continue to function as long as they don't touch
> the storage. One drive immediately drops to 0 r/s and 0 w/s; the rest
> of the drives quickly follow (maybe within 1 - 5 seconds), and the
> drive lights on the chassis go black. Any commands I enter in the
> terminal hang after the newline.
>
> During Scenario 1 - gstat output (notice the L(q) column for da7):
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>    0      0      0      0    0.0      0      0    0.0     0.0| da3
>    0      0      0      0    0.0      0      0    0.0     0.0| da5
>    8      0      0      0    0.0      0      0    0.0     0.0| da7
>    0      0      0      0    0.0      0      0    0.0     0.0| da8
>
> Scenario 2) Some processes continue to function in a read-only state,
> but writes on all drives fall to 0.
>
> During Scenario 2 - gstat output (again notice the L(q) column for da3):
> dT: 1.002s  w: 1.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>   69      0      0      0    0.0      0      0    0.0     0.0| da3
>    0   3207   3207  45727    0.2      0      0    0.0    37.8| da5
>    0   2753   2753  24673    0.2      0      0    0.0    23.1| da7
>    0   3027   3027  26425    0.2      0      0    0.0    28.6| da8
>
> About 60 seconds later the system becomes responsive again (I assume
> kern.cam.da.default_timeout is the 60 second timer, although I haven't
> changed this to confirm), and a bunch of error logs appear for the
> drive that had a number in the L(q) column > 0 (da7 in scenario 1 last
> night, da3 in scenario 2 today). Any commands I typed in the terminal
> finally fire off, and cpu and disk usage quickly climb back to
> 70-80%+. If the errors are bad enough (or possibly if two drives hang
> simultaneously?) read and/or write errors show up in zpool status.
>
> Failed attempts to repair: replaced the backplane; ran full memory
> tests; upgraded the 3610s to the latest firmware; used Node B in place
> of Node A; flashed the controller with the latest IT firmware from
> SuperMicro; swapped power supplies; pulled one proc and half the
> memory off Node A, put that into Node B, and tried the tests again;
> set kern.cam.da.X.delete_method=DISABLE; and probably tuned a couple
> dozen other cam and zfs sysctl/loader variables in testing.
>
> Nothing stopped these errors.
>
> Until I ran this:
> # camcontrol tags daX -N 1
> (where X is 3, 5, 7 and 8)
>
> With tags set to 1 for each device, all errors have ceased for the
> last 18 hours (the system ran all night at load). I've not yet spent
> the time to determine how much that affects performance, but it is
> definitely working.
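(A compact way to apply and verify that workaround across all four
pool members; the loop is my sketch, the camcontrol syntax is as used
above:)

# for d in da3 da5 da7 da8; do camcontrol tags $d -N 1; done
# camcontrol tags da3 -v
(dev_openings plus dev_active should then total 1, matching the -v
output shown further down)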
> To reproduce this bug, today I ran:
> # camcontrol tags daX -N 8
> on all drives. A few minutes later, a drive hung (see the gstat output
> for da3 above in scenario 2, which I had copied just seconds before
> the errors that follow):
>
> Jun  3 13:34:10 s18 kernel: (da3:mpr0:0:12:0): READ(10). CDB: 28 00 09
> cc 97 30 00 00 08 00 length 4096 SMID 961 command timeout cm
> 0xfffffe0000d2ad50 ccb 0xfffff807bfcef800 target 12, handle(0x000d)
> Jun  3 13:34:10 s18 kernel: mpr0: At enclosure level 0, slot 4,
> connector name ( )
> Jun  3 13:34:10 s18 kernel: mpr0: timedout cm 0xfffffe0000d2ad50
> allocated tm 0xfffffe0000cdc150
> Jun  3 13:34:10 s18 kernel: (noperiph:mpr0:0:4294967295:0): SMID 1
> Aborting command 0xfffffe0000d2ad50
> Jun  3 13:34:10 s18 kernel: mpr0: Sending reset from mprsas_send_abort
> for target ID 12
> [...snipped about 50 more lines for brevity]
>
> With 1 tag, gstat still shows L(q) values that jump into the 70s at
> times, but from what I understand of the source, devstat calculates
> the queued operations separately from camcontrol's simultaneous
> dev_openings. So I assume that's nothing to be concerned about, but I
> figured I'd mention it.
>
> # camcontrol tags da7 -v
>
> (pass7:mpr0:0:16:0): dev_openings  0
> (pass7:mpr0:0:16:0): dev_active    1
> (pass7:mpr0:0:16:0): allocated     1
> (pass7:mpr0:0:16:0): queued        0
> (pass7:mpr0:0:16:0): held          0
> (pass7:mpr0:0:16:0): mintags       2
> (pass7:mpr0:0:16:0): maxtags     255
>
> Here is a full error message from last night on da7 (gstat scenario 1
> above was copied right before these lines). Apologies for the
> verbosity, but it may be helpful to some; logged with
> dev.mpr.0.debug_level=0x037F:
>
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 15
> 5a 18 c0 00 00 08 00 length 4096 SMID 240 command timeout cm
> 0xfffffe0000cefb00 ccb 0xfffff805dfeca800 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: timedout cm 0xfffffe0000cefb00
> allocated tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (noperiph:mpr0:0:4294967295:0): SMID 15
> Aborting command 0xfffffe0000cefb00
> Jun  2 19:48:42 s18 kernel: mpr0: Sending reset from mprsas_send_abort
> for target ID 16
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 14
> 24 35 e0 00 00 08 00 length 4096 SMID 562 command timeout cm
> 0xfffffe0000d0a1a0 ccb 0xfffff80daeb84000 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: queued timedout cm
> 0xfffffe0000d0a1a0 for processing by tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): SYNCHRONIZE CACHE(10).
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 810 command timeout
> cm 0xfffffe0000d1e720 ccb 0xfffff80628c3d800 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: queued timedout cm
> 0xfffffe0000d1e720 for processing by tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 98 00 00 08 00 length 4096 SMID 571 command timeout cm
> 0xfffffe0000d0ad70 ccb 0xfffff8018a54b800 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: queued timedout cm
> 0xfffffe0000d0ad70 for processing by tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 90 00 00 08 00 length 4096 SMID 540 command timeout cm
> 0xfffffe0000d084c0 ccb 0xfffff80288e58000 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: queued timedout cm
> 0xfffffe0000d084c0 for processing by tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 88 00 00 08 00 length 4096 SMID 817 command timeout cm
> 0xfffffe0000d1f050 ccb 0xfffff8004c3ef800 target 16, handle(0x0011)
> Jun  2 19:48:42 s18 kernel: mpr0: At enclosure level 0, slot 8,
> connector name ( )
> Jun  2 19:48:42 s18 kernel: mpr0: queued timedout cm
> 0xfffffe0000d1f050 for processing by tm 0xfffffe0000cdd3b0
> Jun  2 19:48:42 s18 kernel: mpr0: EventReply :
> Jun  2 19:48:42 s18 kernel:    EventDataLength: 2
> Jun  2 19:48:42 s18 kernel:    AckRequired: 0
> Jun  2 19:48:42 s18 kernel:    Event: SasDiscovery (0x16)
> Jun  2 19:48:42 s18 kernel:    EventContext: 0x0
> Jun  2 19:48:42 s18 kernel:    Flags: 1
> Jun  2 19:48:42 s18 kernel:    ReasonCode: Discovery Started
> Jun  2 19:48:42 s18 kernel:    PhysicalPort: 0
> Jun  2 19:48:42 s18 kernel:    DiscoveryStatus: 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): WRITE(10). CDB: 2a 00
> 11 8d fe 50 00 00 10 00 length 8192 SMID 186 completed cm
> 0xfffffe0000ceb420 ccb 0xfffff80cd1d88800 during recovery ioc 804b
> scsi 0 state c xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): WRITE(10). CDB: 2a 00
> 11 8d fe 50 00 00 10 00 length 8192 SMID 186 terminated ioc 804b scsi
> 0 state c xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 98 00 00 08 00 length 4096 SMID 571 completed timedout cm
> 0xfffffe0000d0ad70 ccb 0xfffff8018a54b800 during recovery ioc 804b
> scsi 0 state c xf (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13 cd fc
> 98 00 00 08 00 length 4096 SMID 571 terminated ioc 804b scsi 0 state c
> xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 90 00 00 08 00 length 4096 SMID 540 completed timedout cm
> 0xfffffe0000d084c0 ccb 0xfffff80288e58000 during recovery ioc 804b
> scsi 0 state c xf (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13 cd fc
> 90 00 00 08 00 length 4096 SMID 540 terminated ioc 804b scsi 0 state c
> xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13
> cd fc 88 00 00 08 00 length 4096 SMID 817 completed timedout cm
> 0xfffffe0000d1f050 ccb 0xfffff8004c3ef800 during recovery ioc 804b
> scsi 0 state c xf (da7:mpr0:0:16:0): READ(10). CDB: 28 00 13 cd fc
> 88 00 00 08 00 length 4096 SMID 817 terminated ioc 804b scsi 0 state c
> xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 15
> 5a 18 c0 00 00 08 00 length 4096 SMID 240 completed timedout cm
> 0xfffffe0000cefb00 ccb 0xfffff805dfeca800 during recovery ioc 8048
> scsi 0 state c xf (da7:mpr0:0:16:0): READ(10). CDB: 28 00 14 24 35
> e0 00 00 08 00 length 4096 SMID 562 completed timedout cm
> 0xfffffe0000d0a1a0 ccb 0xfffff80daeb84000 during recovery ioc 804b
> scsi 0 state c xf (da7:mpr0:0:16:0): READ(10). CDB: 28 00 14 24 35
> e0 00 00 08 00 length 4096 SMID 562 terminated ioc 804b scsi 0 state c
> xfer 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): SYNCHRONIZE CACHE(10).
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 810 completed
> timedout cm 0xfffffe0000d1e720 ccb 0xfffff80628c3d800 during recovery
> ioc 804b scsi 0
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): SYNCHRONIZE CACHE(10).
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 810 terminated ioc
> 804b scsi 0 state c xfer 0
> Jun  2 19:48:42 s18 kernel: (xpt0:mpr0:0:16:0): SMID 15 abort TaskMID
> 240 status 0x0 code 0x0 count 7
> Jun  2 19:48:42 s18 kernel: (xpt0:mpr0:0:16:0): SMID 15 finished
> recovery after aborting TaskMID 240
> Jun  2 19:48:42 s18 kernel: mpr0: Unfreezing devq for target ID 16
> Jun  2 19:48:42 s18 kernel: mpr0: EventReply :
> Jun  2 19:48:42 s18 kernel:    EventDataLength: 4
> Jun  2 19:48:42 s18 kernel:    AckRequired: 0
> Jun  2 19:48:42 s18 kernel:    Event: SasTopologyChangeList (0x1c)
> Jun  2 19:48:42 s18 kernel:    EventContext: 0x0
> Jun  2 19:48:42 s18 kernel:    EnclosureHandle: 0x2
> Jun  2 19:48:42 s18 kernel:    ExpanderDevHandle: 0x9
> Jun  2 19:48:42 s18 kernel:    NumPhys: 31
> Jun  2 19:48:42 s18 kernel:    NumEntries: 1
> Jun  2 19:48:42 s18 kernel:    StartPhyNum: 8
> Jun  2 19:48:42 s18 kernel:    ExpStatus: Responding (0x3)
> Jun  2 19:48:42 s18 kernel:    PhysicalPort: 0
> Jun  2 19:48:42 s18 kernel:    PHY[8].AttachedDevHandle: 0x0011
> Jun  2 19:48:42 s18 kernel:    PHY[8].LinkRate: 12.0Gbps (0xbb)
> Jun  2 19:48:42 s18 kernel:    PHY[8].PhyStatus: PHYLinkStatusChange
> Jun  2 19:48:42 s18 kernel: mpr0: (73)->(mprsas_fw_work) Working on
> Event: [16]
> Jun  2 19:48:42 s18 kernel: mpr0: (74)->(mprsas_fw_work) Event Free: [16]
> Jun  2 19:48:42 s18 kernel: mpr0: (74)->(mprsas_fw_work) Working on
> Event: [1c]
> Jun  2 19:48:42 s18 kernel: mpr0: (75)->(mprsas_fw_work) Event Free: [1c]
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): READ(10). CDB: 28 00 15
> 5a 18 c0 00 00 08 00
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): CAM status: Command
> timeout
> Jun  2 19:48:42 s18 kernel: (da7:mpr0:0:16:0): Retrying command
> Jun  2 19:48:43 s18 kernel: mpr0: EventReply :
> Jun  2 19:48:43 s18 kernel:    EventDataLength: 2
> Jun  2 19:48:43 s18 kernel:    AckRequired: 0
> Jun  2 19:48:43 s18 kernel:    Event: SasDiscovery (0x16)
> Jun  2 19:48:43 s18 kernel:    EventContext: 0x0
> Jun  2 19:48:43 s18 kernel:    Flags: 0
> Jun  2 19:48:43 s18 kernel:    ReasonCode: Discovery Complete
> Jun  2 19:48:43 s18 kernel:    PhysicalPort: 0
> Jun  2 19:48:43 s18 kernel:    DiscoveryStatus: 0
> Jun  2 19:48:43 s18 kernel: mpr0: (75)->(mprsas_fw_work) Working on
> Event: [16]
> Jun  2 19:48:43 s18 kernel: mpr0: (76)->(mprsas_fw_work) Event Free: [16]
> Jun  2 19:48:43 s18 kernel: (da7:mpr0:0:16:0): WRITE(10). CDB: 2a 00
> 11 8d fe 50 00 00 10 00
> Jun  2 19:48:43 s18 kernel: (da7:mpr0:0:16:0): CAM status: SCSI Status
> Error
> Jun  2 19:48:43 s18 kernel: (da7:mpr0:0:16:0): SCSI status: Check
> Condition
> Jun  2 19:48:43 s18 kernel: (da7:mpr0:0:16:0): SCSI sense: UNIT
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Jun  2 19:48:43 s18 kernel: (da7:mpr0:0:16:0): Retrying command (per
> sense data)
>
> I'm not sure if this is an Intel, CAM, or Avago issue (or something
> else?), but I'd be happy to help figure it out if anyone has
> suggestions for me to test.
>
> Thanks!
>
> -Kyle
>
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"