From owner-freebsd-scsi@FreeBSD.ORG Mon Apr 16 11:07:27 2012
Return-Path:
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 228F81065674
    for ; Mon, 16 Apr 2012 11:07:27 +0000 (UTC)
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 0C4CE8FC1A
    for ; Mon, 16 Apr 2012 11:07:27 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3GB7QJp022512
    for ; Mon, 16 Apr 2012 11:07:26 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3GB7QXN022510
    for freebsd-scsi@FreeBSD.org; Mon, 16 Apr 2012 11:07:26 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 16 Apr 2012 11:07:26 GMT
Message-Id: <201204161107.q3GB7QXN022510@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Apr 2012 11:07:27 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
o kern/163713  scsi       [aic7xxx] [patch] Add Adaptec29329LPE to aic79xx_pci.c
o kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/159412  scsi       [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153514  scsi       [cam] [panic] CAM related panic
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

50 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 17 23:54:46 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4A981106564A
    for ; Tue, 17 Apr 2012 23:54:46 +0000 (UTC)
    (envelope-from Terence_Telkamp@DELL.com)
Received: from aussmtpmrkpc120.us.dell.com (aussmtpmrkpc120.us.dell.com [143.166.82.159])
    by mx1.freebsd.org (Postfix) with ESMTP id A13738FC08
    for ; Tue, 17 Apr 2012 23:54:45 +0000 (UTC)
X-Loopcount0: from 64.238.244.148
X-IronPort-AV: E=Sophos;i="4.75,438,1330927200"; d="scan'208,217";a="499508222"
Received: from mail.compellent.com ([64.238.244.148])
    by aussmtpmrkpc120.us.dell.com with ESMTP; 17 Apr 2012 18:53:37 -0500
From: Terence Telkamp
To: "freebsd-scsi@freebsd.org"
Date: Tue, 17 Apr 2012 18:53:36 -0500
Thread-Topic: Impact of changes made to umass.c at r232358
Thread-Index: Ac0c9U2icIreXGSaTgqy/tC1W0GlHA==
Message-ID: <975552A94CBC0F4DA60ED7B36C949CBA03E63D25A1@shandy.Beer.Town>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Impact of changes made to umass.c at r232358
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 23:54:46 -0000

I am seeing a kernel panic in FreeBSD 8.1, which is reproduced after
physically attaching and detaching a USB device several times. The kernel
debugger shows that the panic happens in camisr where the cam_sim and its
associated mutex are clearly destroyed. sim->refcount is 0, sim->softc is 1
(UMASS_GONE), and the sim->mtx is destroyed (mtx_lock = 6).

This looks very similar to FreeBSD PR kern/153514, which is unfortunately
unresolved.
http://www.freebsd.org/cgi/query-pr.cgi?pr=153514

Is it possible that the changes made to umass.c at r232358 might fix this
issue?

I currently have a machine in this state, so I can gather information from
kdb if it will be helpful. Here is some debug information that I have
already collected:

db> show msgbuf
msgbufp = 0xffffffff84420fe0
magic = 63062, size = 65504, r= 53501, w = 54139, ptr = 0xffffffff84411000, cksum= 4373525
0:0): got CAM status 0xa
(da3:umass-sim0:0:0:0): fatal error, failed to attach to device
(da3:umass-sim0:0:0:0): removing device entry

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x290
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80284c71
stack pointer = 0x28:0xffffff800014daf0
frame pointer = 0x28:0xffffff800014db40
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (swi2: cambio)
Kernel debug trap
Tracing pid 11 tid 100037 td 0xffffff0009014ba0
_mtx_lock_sleep() at _mtx_lock_sleep+0x71
_mtx_lock_flags() at _mtx_lock_flags+0xb8
camisr() at camisr+0xc6
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800014dd30, rbp = 0 ---

db> show pcpu
cpuid = 3
dynamic pcpu = 0xffffff807fa22100
curthread = 0xffffff0009014ba0: pid 11 "swi2: cambio"
curpcb = 0xffffff800014dd40
fpcurthread = none
idlethread = 0xffffff0005f4f7c0: pid 10 "idle: cpu3"
curpmap = 0
tssp = 0xffffffff80848738
commontssp = 0xffffffff80848738
rsp0 = 0xffffff800014dd40
gs32p = 0xffffffff80847570
ldt = 0xffffffff808475b0
tss = 0xffffffff808475a0

db> show thread 100037
Thread 100037 at 0xffffff0009014ba0:
 proc (pid 11): 0xffffff0005f48460
 name: swi2: cambio
 stack: 0xffffff800014a000-0xffffff800014dfff
 flags: 0x10004 pflags: 0x210400
 state: RUNNING (CPU 3)
 priority: 44
 container lock: sched lock 3 (0xffffffff8064f180)

db> show lock 0xffffffff8064f180
 class: spin mutex
 name: sched lock 3
 flags: {SPIN, RECURSE}
 state: {UNOWNED}

db> show registers
cs 0x20 WAKEUP_efer
ds 0x3b WAKEUP_lstar+0x3
es 0x3b003b
fs 0x290001b0013
gs 0x290001b
ss 0x28 WAKEUP_pat
rax 0x6
rcx 0
rdx 0
rbx 0x4
rsp 0xffffff800014daf0
rbp 0xffffff800014db40
rsi 0xffffff0009014ba0
rdi 0xffffff017d0b5210
r8 0x1265 WAKEUP_cpu+0x1215
r9 0
r10 0
r11 0xffffffff80849ac8 __pcpu+0x7c8
r12 0xffffff017d0b5210
r13 0x1265 WAKEUP_cpu+0x1215
r14 0xffffff0009014ba0
r15 0x2
rip 0xffffffff80284c71 _mtx_lock_sleep+0x71
rflags 0x10246
_mtx_lock_sleep+0x71: movl 0x290(%rcx),%ebx

db> show irqs
irq0: (no thread)
irq1: atkbd0 (pid 11)
irq3: uart1 (no thread)
irq4: uart0 (no thread)
irq5: (no thread)
irq6: (no thread)
irq7: (no thread)
irq8: (no thread)
irq9: acpi0 (pid 11)
irq10: (no thread)
irq11: (no thread)
irq12: (no thread)
irq13: (no thread)
irq14: (no thread)
irq15: (no thread)
irq16: (no thread)
irq17: (no thread)
irq18: (no thread)
irq19: (no thread)
irq20: atapci0 (pid 11) {ENTROPY}
irq21: (no thread)
irq22: ehci1 (pid 11)
irq23: ehci0 (pid 11)
irq32: (no thread)
irq33: (no thread)
irq34: (no thread)
irq35: (no thread)
irq36: (no thread)
irq37: (no thread)
irq38: (no thread)
irq39: (no thread)
irq40: (no thread)
irq41: (no thread)
irq42: (no thread)
irq43: (no thread)
irq44: (no thread)
irq45: (no thread)
irq46: (no thread)
irq47: (no thread)
irq48: (no thread)
irq49: (no thread)
irq50: (no thread)
irq51: (no thread)
irq52: (no thread)
irq53: (no thread)
irq54: (no thread)
irq55: (no thread)
irq64: (no thread)
irq65: (no thread)
irq66: (no thread)
irq67: (no thread)
irq68: (no thread)
irq69: (no thread)
irq70: (no thread)
irq71: (no thread)
irq72: (no thread)
irq73: (no thread)
irq74: (no thread)
irq75: (no thread)
irq76: (no thread)
irq77: (no thread)
irq78: (no thread)
irq79: (no thread)
irq80: (no thread)
irq81: (no thread)
irq82: (no thread)
irq83: (no thread)
irq84: (no thread)
irq85: (no thread)
irq86: (no thread)
irq87: (no thread)
irq256: ix0:que 0 (pid 11)
irq257: ix0:que 1 (pid 11)
irq258: ix0:link (pid 11)
irq259: ix1:que 0 (pid 11)
irq260: ix1:que 1 (pid 11)
irq261: ix1:link (pid 11)
irq262: cmlpci0 (pid 11)
irq263: cmlpci1 (pid 11)
irq264: cmlpci2 (pid 11)
irq265: cmlpci3 (pid 11)
irq266: igb0:que 0 (pid 11)
irq267: igb0:que 1 (pid 11)
irq268: igb0:que 2 (pid 11)
irq269: igb0:que 3 (pid 11)
irq270: igb0:link (pid 11)
irq271: igb1:que 0 (pid 11)
irq272: igb1:que 1 (pid 11)
irq273: igb1:que 2 (pid 11)
irq274: igb1:que 3 (pid 11)
irq275: igb1:link (pid 11)

Terence Telkamp
Storage Development Associate Engineer II
Dell | Compellent

From owner-freebsd-scsi@FreeBSD.ORG Wed Apr 18 03:34:20 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id E234A106566B;
    Wed, 18 Apr 2012 03:34:20 +0000 (UTC)
    (envelope-from matt.thyer@gmail.com)
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172])
    by mx1.freebsd.org (Postfix) with ESMTP id E84AA8FC15;
    Wed, 18 Apr 2012 03:34:19 +0000 (UTC)
Received: by wibhj6 with SMTP id hj6so159069wib.13
    for ; Tue, 17 Apr 2012 20:34:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
    h=mime-version:in-reply-to:references:date:message-id:subject:from:to
    :cc:content-type;
    bh=31RcPziM1t0xt5LPzj+UTf7evwRTElp0/7c34MwbKQk=;
    b=OxAUQ4cCuFA0laOG6JuoOmq0B3ClOJyikqmLkNrOTvJS695GphzZJjhqogzGe5fEIZ
    q+g3PCo/oaqvYbPDlt4pb8Bwq27jP+sk9YB3tDWGNV1o3LzihxBwyMcHt4w5ZU17CmmU
    t6yBEmtIJavXELMMqoCMKxwaElNvbrF94PNvx5bSTSoeLsEzhlogeQJsc0HXtJmkrtxP
    uWv0DyCttHiaC+Xw7t5tJhA1nO6dRpUmrbFLul91dALos3deRR6I3ZgZVyIQAiMJ9W4U
    GrGemMD4OdPzISIfem8Gc/Qavv5+x9tSyld++oCeNWyf5qAsZJNm77x0CXF9GgfgA9Dw
    8gag==
MIME-Version: 1.0
Received: by 10.180.95.74 with SMTP id di10mr637779wib.1.1334720058832;
    Tue, 17 Apr 2012 20:34:18 -0700 (PDT)
Received: by 10.216.190.219 with HTTP; Tue, 17 Apr 2012 20:34:18 -0700 (PDT)
Received: by 10.216.190.219 with HTTP; Tue, 17 Apr 2012 20:34:18 -0700 (PDT)
In-Reply-To:
References: <20120120204459.GA51162@nargothrond.kdm.org>
    <72898EA27A61484885D72A06BD9CECE8@multiplay.co.uk>
    <20120120232841.GA71874@nargothrond.kdm.org>
    <20120326132558.GB76647@in-addr.com>
    <20120403134221.GA87802@in-addr.com>
Date: Wed, 18 Apr 2012 13:04:18 +0930
Message-ID:
From: Matt Thyer
To: Gary Palmer
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: Garrett Cooper , freebsd-scsi@freebsd.org, freebsd-current@freebsd.org,
    "Kenneth D. Merry" , Steven Hartland
Subject: Re: LSI supported mps(4) driver available
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Apr 2012 03:34:21 -0000

On Apr 4, 2012 10:02 PM, "Matt Thyer" wrote:
>
> On 3 April 2012 23:12, Gary Palmer wrote:
>>
>> I think you should contact either SuperMicro or LSI and open a support
>> case as it looks like there could be a problem with either the controller
>> or the firmware when presented with mixed speed devices. Either way I think
>> this needs to be escalated to the manufacturer.
>>
>> Regards,
>>
>> Gary
>
>
> I'm now having no problems since moving the SATA 3 drive to the on board
> Intel controller.
> I'll try to report this to Super Micro & LSI.

I spoke too soon.

The problem of the SATA 3 drive being FAULTED in the raidz2 pool has indeed
been solved by moving that drive from the Super Micro (SAS 6G) controller to
the onboard Intel (SATA 2) controller.

However, the 157k interrupts per second problem remained (it's not apparent
immediately after boot).

Even this problem has now been resolved by upgrading from 8-STABLE to
9-STABLE (as I reported on the freebsd-stable list).

So I'm happy now, but still no closer to understanding the cause.

I'm guessing that it was either USB related or something to do with the
on-package Intel graphics of the Core i3 530 CPU.

From owner-freebsd-scsi@FreeBSD.ORG Wed Apr 18 03:41:55 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 4491D106566B;
    Wed, 18 Apr 2012 03:41:55 +0000 (UTC)
    (envelope-from jjh@deterlab.net)
Received: from tardis.deterlab.net (tardis.deterlab.net [206.117.25.63])
    by mx1.freebsd.org (Postfix) with ESMTP id 25B228FC08;
    Wed, 18 Apr 2012 03:41:55 +0000 (UTC)
Received: from [192.168.1.128] (pod.isi.edu [128.9.168.186])
    by tardis.deterlab.net (Postfix) with ESMTPSA id EFF933C0282;
    Tue, 17 Apr 2012 20:41:47 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: text/plain; charset=us-ascii
From: John Hickey
In-Reply-To: <54373403-939F-4FC5-9A2E-40B2304EB518@deterlab.net>
Date: Tue, 17 Apr 2012 20:41:47 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <47976F0C-7786-4B2D-B898-6CE5A9A8EE96@deterlab.net>
References: <20120410015210.GI9589@deterlab.net>
    <4F848B93.10402@brockmann-consult.de>
    <4F85180D.5060104@brockmann-consult.de>
    <20120411073532.GC13315@deterlab.net>
    <54373403-939F-4FC5-9A2E-40B2304EB518@deterlab.net>
To: "Desai, Kashyap"
X-Mailer: Apple Mail (2.1257)
Cc: "freebsd-scsi@freebsd.org" , "Mankani, Krishnaraddi" , "Kenneth D.
    Merry" , "Reddy, Sreekanth"
Subject: Re: Write Timeouts with MPS
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Apr 2012 03:41:55 -0000

I have updated all the drives with the firmware provided by Seagate.
Performance is up and I don't see any timeouts when doing a zpool scrub.
I'm going to give the system more of a workout, but so far I think the
drive firmware did the trick.

Seatools for windows is a pain. It will let you select a firmware file
anywhere on your system, but silently fail if you don't put the firmware
update in its program directory. It also seems to have a hard display
limit of ~13 drives. Has anyone had success with using camcontrol
fwdownload with Seagate .LOD firmware files?

John

On Apr 12, 2012, at 1:16 PM, John Hickey wrote:

> I have a firmware update in hand for the drives. I am going to update
> my drives and see if I can still reproduce this.
>
> John
>
> On Apr 12, 2012, at 5:26 AM, Desai, Kashyap wrote:
>
>> We never see this issue on our test machines.
>> Adding Sreekanth and he will plan to reproduce this issue locally to
>> have further analysis on issue.
>>
>> Please help Sreekanth to reproduce it locally.
>>
>>
>> ~ Kashyap
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org
>>> [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of John Hickey
>>> Sent: Wednesday, April 11, 2012 1:06 PM
>>> To: freebsd-scsi@freebsd.org
>>> Subject: Re: Write Timeouts with MPS
>>>
>>> I pretty much did this and filed a ticket with Seagate this afternoon.
>>> They told me the latest firmware is 0006 (I am at 0001) and wanted
>>> the serial numbers of the other drives in the array (probably to
>>> confirm firmware compatibility). I suspect I'll have the update in
>>> hand tomorrow and see how that works. Running FreeBSD didn't seem to
>>> be an issue to them aside from concern about reading the serial numbers
>>> without seatools. Only issue with that was that I initially gave them
>>> the whole inquiry serial string, but only the first 8 (X) characters of
>>> inquiry are the serial number:
>>>
>>> $ sudo camcontrol inquiry da3
>>> pass3: Fixed Direct Access SCSI-6 device
>>> pass3: Serial Number XXXXXXXX0000YYYYYYYY
>>> pass3: 600.000MB/s transfers, Command Queueing Enabled
>>>
>>> John
>>>
>>> On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
>>>> Well, when I emailed some Seagate people, they just told me to use
>>>> supported ones. So I suggest you email them about it, telling them it
>>>> is on the compatibility list, and asking for an explanation and fix
>>>> (eg. firmware bug fix). You could also say it is fairly common on
>>>> seagate (and Samsung) disks, and very uncommon with other brands.
>>>>
>>>> Peter
>>>>
>>>> On 11.04.2012 00:26, John Hickey wrote:
>>>>> I have 19 drives in my array, so changing them isn't that easy. ;-)
>>>>> They are Seagate Constellation ES 2TB SAS drives (SEAGATE
>>>>> ST2000NM0001 0001) and according to LSI documents my whole setup
>>>>> should be supported. The drives at least aren't being marked as
>>>>> failed. I believe a change was made a while back to make FreeBSD
>>>>> less sensitive to these sorts of timeouts. I have had a panic or
>>>>> two on the system, but haven't tracked down the exact cause yet.
>>>>>
>>>>> John
>>>>>
>>>>> On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:
>>>>>
>>>>>> I found this only happens with specific disks / disk firmware...
>>>>>> but nobody seems to listen to me about it. They all seem to blame
>>>>>> the driver. (I blame both, but changing disks is a simple fix.)
>>>>>>
>>>>>> And looking around, most reports are with various Seagates
>>>>>> (including one that can cause this type of error with smartctl -a
>>>>>> with a SAS Seagate, but cannot reproduce with the binary LSI
>>>>>> driver) or Samsung Spinpoints. The only other disk I know of that
>>>>>> does this is a Crucial SSD with old firmware. One guy said he can
>>>>>> do a camcontrol rescan to get it back; I tried that and get either
>>>>>> panics, hangs, or nothing.
>>>>>>
>>>>>> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB
>>>>>> Seagate greens don't seem to have this problem. I have no idea if
>>>>>> different disks behave differently with different controllers. I
>>>>>> asked Seagate about it and they reply with marketing nonsense
>>>>>> about buying enterprise disks instead, and say I should buy disks
>>>>>> that are on the specific compatibility list for the HBA.
>>>>>>
>>>>>> I found that with the few disks that I have that fail randomly
>>>>>> (and others), I can reproduce the issue (not exact same symptoms
>>>>>> though) by hot pulling the disk while writing something, putting
>>>>>> it back, wait a few seconds (<10; less than enough for the SCSI
>>>>>> controller to rescan) pull and replace again. The old 2TB seagate
>>>>>> greens fail this test, but the 3TB ones pass. All 2 and 3 TB
>>>>>> Hitachis I tried pass this test, as well as 3TB WD greens. (all
>>>>>> enterprise disks I tried pass this test except the Toshiba 2TB
>>>>>> ones I tried)
>>>>>>
>>>>>> If I put a "failed" disk back in, it does not work. If I put it in
>>>>>> a different slot, same. But if I put any other disk in, it works
>>>>>> fine. So it is the disk, but it is also FreeBSD not being able to
>>>>>> reset/rescan it. But it is simple enough to blame both, and since
>>>>>> you can't get rid of the driver, get different disks (eg. swap
>>>>>> them with some different same sized ones in a different machine).
>>>>>>
>>>>>> Here is my forum thread about it, including disk product ids for
>>>>>> ones I tested, and a huge list of things that don't fix it.
>>>>>> http://forums.freebsd.org/showthread.php?t=28252
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> On 10.04.2012 03:52, John Hickey wrote:
>>>>>>> I've seen people having this problem before, but I don't think
>>>>>>> anyone has figured it out. I am running:
>>>>>>>
>>>>>>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>
>>>>>>> I have the latest LSI IT firmware 13 loaded:
>>>>>>>
>>>>>>> mps1: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
>>>>>>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
>>>>>>> mps1: IOCCapabilities: 1285c
>>>>>>>
>>>>>>> All disks are on a SuperMicro SAS II backplane:
>>>>>>>
>>>>>>> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
>>>>>>> at scbus0 target 0 lun 0 (da0,pass0)
>>>>>>> at scbus0 target 1 lun 0 (da1,pass1)
>>>>>>> at scbus1 target 8 lun 0 (da2,pass2)
>>>>>>> .... x16 more of the same
>>>>>>> at scbus1 target 46 lun 0 (da20,pass20)
>>>>>>> at scbus1 target 47 lun 0 (ses0,pass21)
>>>>>>>
>>>>>>> Essentially when putting the ZFS filesystem under load, I am
>>>>>>> getting these sorts of errors:
>>>>>>>
>>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> _______________________________________________
>>>>>>> freebsd-scsi@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@FreeBSD.ORG Fri Apr 20 01:40:10 2012
Return-Path:
Delivered-To: freebsd-scsi@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 77671106564A
    for ; Fri, 20 Apr 2012 01:40:10 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 4932F8FC0A
    for ; Fri, 20 Apr 2012 01:40:10 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3K1eALZ003920
    for ; Fri, 20 Apr 2012 01:40:10 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3K1eAWH003919;
    Fri, 20 Apr 2012 01:40:10 GMT
    (envelope-from gnats)
Date: Fri, 20 Apr 2012 01:40:10 GMT
Message-Id: <201204200140.q3K1eAWH003919@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Jason Wolfe
Cc:
Subject: Re: kern/154432: [xpt] run_interrupt_driven_hooks: still waiting after 60-300 seconds for xpt_config
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Jason Wolfe
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 20 Apr 2012 01:40:10 -0000

The following reply was made to PR kern/154432; it has been noted by GNATS.

From: Jason Wolfe
To: bug-followup@FreeBSD.org, robert@bsd.hu
Cc:
Subject: Re: kern/154432: [xpt] run_interrupt_driven_hooks: still waiting after 60-300 seconds for xpt_config
Date: Thu, 19 Apr 2012 18:33:18 -0700

Just wanted to make sure this one didn't slip through the cracks, as it's
still happening to quite a few boxes on my end.
If it would help, I'm happy to supply access to a machine that exhibits
the issue.

Jason

From owner-freebsd-scsi@FreeBSD.ORG Fri Apr 20 09:11:37 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 03F0D106566C
    for ; Fri, 20 Apr 2012 09:11:37 +0000 (UTC)
    (envelope-from gallasch@free.de)
Received: from smtp.free.de (smtp.free.de [91.204.6.103])
    by mx1.freebsd.org (Postfix) with ESMTP id 6B3218FC0C
    for ; Fri, 20 Apr 2012 09:11:36 +0000 (UTC)
Received: (qmail 11771 invoked from network); 20 Apr 2012 11:11:29 +0200
Received: from smtp.free.de (HELO orwell.free.de) (gallasch@free.de@[91.204.4.103])
    (envelope-sender )
    by smtp.free.de (qmail-ldap-1.03) with AES128-SHA encrypted SMTP
    for ; 20 Apr 2012 11:11:29 +0200
From: Kai Gallasch
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Fri, 20 Apr 2012 11:11:28 +0200
Message-Id: <3F81029C-D912-432F-B90A-67DB2A561F81@free.de>
To: freebsd-scsi
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Subject: 9-RELENG 20120418 compaq ciss - illegal request
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 20 Apr 2012 09:11:37 -0000

Hi.

I found this in the dmesg of a 9-STABLE (2012-04-18) server:

(probe0:ciss0:0:0:0): REPORT LUNS. CDB: a0 0 0 0 0 0 0 0 0 10 0 0
(probe0:ciss0:0:0:0): CAM status: SCSI Status Error
(probe0:ciss0:0:0:0): SCSI status: Check Condition
(probe0:ciss0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 572140MB (1171743324 512 byte sectors: 255H 32S/T 65535C)

If I do a camcontrol rescan, the warning reappears in dmesg.

Is this harmless noise that can be ignored, or a bug?

Regards,
Kai.

PS: Please cc me on replies; I am not subscribed to the freebsd-scsi list.
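
[Archive note] The CDB bytes in the REPORT LUNS dmesg line above can be decoded by hand. What follows is only a minimal Python sketch, not code from ciss(4) or CAM. It assumes the bytes are printed in hexadecimal and uses the 12-byte REPORT LUNS layout from the SCSI Primary Commands standard (opcode 0xA0, SELECT REPORT in byte 2, big-endian allocation length in bytes 6-9, control in byte 11). The helper names parse_cdb and decode_report_luns are made up for illustration.

```python
def parse_cdb(text):
    """Convert a space-separated hex byte list (as printed in dmesg) to bytes."""
    return bytes(int(token, 16) for token in text.split())


def decode_report_luns(cdb):
    """Decode the fields of a 12-byte REPORT LUNS CDB (opcode 0xA0)."""
    if len(cdb) != 12 or cdb[0] != 0xA0:
        raise ValueError("not a 12-byte REPORT LUNS CDB")
    return {
        "opcode": cdb[0],
        "select_report": cdb[2],
        # Bytes 6..9 hold the allocation length, most significant byte first.
        "allocation_length": int.from_bytes(cdb[6:10], "big"),
        "control": cdb[11],
    }


# CDB copied from the dmesg excerpt in the message above.
fields = decode_report_luns(parse_cdb("a0 0 0 0 0 0 0 0 0 10 0 0"))
print(fields)
```

For the CDB quoted above this yields an allocation length of 16 bytes (0x10) with SELECT REPORT and control both zero. The sketch only decodes the request; it says nothing about whether the ILLEGAL REQUEST response is harmless.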