From owner-freebsd-scsi@FreeBSD.ORG Sun Nov 9 17:20:09 2008 Return-Path: Delivered-To: freebsd-scsi@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1162B1065693 for ; Sun, 9 Nov 2008 17:20:09 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 010818FC1F for ; Sun, 9 Nov 2008 17:20:09 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id mA9HK8gM008366 for ; Sun, 9 Nov 2008 17:20:08 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id mA9HK8LH008361; Sun, 9 Nov 2008 17:20:08 GMT (envelope-from gnats) Date: Sun, 9 Nov 2008 17:20:08 GMT Message-Id: <200811091720.mA9HK8LH008361@freefall.freebsd.org> To: freebsd-scsi@FreeBSD.org From: Kirk Strauser Cc: Subject: Re: kern/128452: [sa] [panic] Accessing SCSI tape drive randomly crashes my amd64 system X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Kirk Strauser List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Nov 2008 17:20:09 -0000 The following reply was made to PR kern/128452; it has been noted by GNATS. From: Kirk Strauser To: bug-followup@FreeBSD.org, kirk@strauser.com Cc: Subject: Re: kern/128452: [sa] [panic] Accessing SCSI tape drive randomly crashes my amd64 system Date: Sun, 9 Nov 2008 11:16:30 -0600 I got another panic this morning when starting an Amanda "flush" from disk to tape. I had recompiled the kernel with SCHED_4BSD instead of SCHED_ULE for testing. Also, I've run memtest on this system for 8+ hours straight with no RAM errors. 
# kgdb /boot/kernel/kernel /var/crash/vmcore.10 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x258 fault code = supervisor read data, page not present instruction pointer = 0x8:0xffffffff8047d41a stack pointer = 0x10:0xffffffffaef6cac0 frame pointer = 0x10:0xffffff000443aa50 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 50 (syncer) trap number = 12 panic: page fault cpuid = 0 Uptime: 2d16h27m41s Physical memory: 6130 MB Dumping 675 MB: 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356 340 324 308 292 276 260 244 228 212 196 180 164 148 132 116 100 84 68 52 36 20 4 Reading symbols from /boot/kernel/if_re.ko...Reading symbols from / boot/kernel/if_re.ko.symbols...done. done. Loaded symbols for /boot/kernel/if_re.ko Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from / boot/kernel/coretemp.ko.symbols...done. done. Loaded symbols for /boot/kernel/coretemp.ko Reading symbols from /boot/kernel/cpufreq.ko...Reading symbols from / boot/kernel/cpufreq.ko.symbols...done. done. Loaded symbols for /boot/kernel/cpufreq.ko Reading symbols from /boot/kernel/pflog.ko...Reading symbols from / boot/kernel/pflog.ko.symbols...done. done. Loaded symbols for /boot/kernel/pflog.ko Reading symbols from /boot/kernel/pf.ko...Reading symbols from /boot/ kernel/pf.ko.symbols...done. done. 
Loaded symbols for /boot/kernel/pf.ko Reading symbols from /boot/kernel/linux.ko...Reading symbols from / boot/kernel/linux.ko.symbols...done. done. Loaded symbols for /boot/kernel/linux.ko Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from / boot/kernel/nullfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/nullfs.ko Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from / boot/kernel/fdescfs.ko.symbols...done. done. Loaded symbols for /boot/kernel/fdescfs.ko Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from / boot/kernel/accf_http.ko.symbols...done. done. Loaded symbols for /boot/kernel/accf_http.ko Reading symbols from /boot/kernel/green_saver.ko...Reading symbols from /boot/kernel/green_saver.ko.symbols...done. done. Loaded symbols for /boot/kernel/green_saver.ko #0 doadump () at pcpu.h:195 195 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xffffffff8047d41a 0xffffffff8047d41a is in _mtx_lock_sleep (/usr/src/sys/kern/ kern_mutex.c:341). 336 */ 337 v = m->mtx_lock; 338 if (v != MTX_UNOWNED) { 339 owner = (struct thread *)(v & ~MTX_FLAGMASK); 340 #ifdef ADAPTIVE_GIANT 341 if (TD_IS_RUNNING(owner)) { 342 #else 343 if (m != &Giant && TD_IS_RUNNING(owner)) { 344 #endif 345 if (LOCK_LOG_TEST(&m->lock_object, 0)) (kgdb) backtrace #0 doadump () at pcpu.h:195 #1 0x0000000000000004 in ?? () #2 0xffffffff80488821 in boot (howto=260) at /usr/src/sys/kern/ kern_shutdown.c:418 #3 0xffffffff80488c5c in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:574 #4 0xffffffff8073f1aa in trap_fatal (frame=0xffffff000443aa50, eva=Variable "eva" is not available. ) at /usr/src/sys/amd64/amd64/trap.c:764 #5 0xffffffff8073f551 in trap_pfault (frame=0xffffffffaef6ca10, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:680 #6 0xffffffff8073fe0f in trap (frame=0xffffffffaef6ca10) at /usr/src/ sys/amd64/amd64/trap.c:449 #7 0xffffffff8072685e in calltrap () at /usr/src/sys/amd64/amd64/ exception.S:209 #8 0xffffffff8047d41a in _mtx_lock_sleep (m=0xffffff003c1b74d8, tid=18446742974269467216, opts=Variable "opts" is not available. ) at /usr/src/sys/kern/kern_mutex.c:339 #9 0xffffffff804ff4e2 in vfs_msync (mp=0xffffff000445aa68, flags=2) at /usr/src/sys/kern/vfs_subr.c:2976 #10 0xffffffff804ff73b in sync_fsync (ap=Variable "ap" is not available. ) at /usr/src/sys/kern/vfs_subr.c:3225 #11 0xffffffff804ffebc in sched_sync () at vnode_if.h:538 #12 0xffffffff80468efd in fork_exit (callout=0xffffffff804ff8a7 , arg=0x0, frame=0xffffffffaef6cc80) at /usr/src/sys/kern/kern_fork.c:804 #13 0xffffffff80726c2e in fork_trampoline () at /usr/src/sys/amd64/ amd64/exception.S:455 #14 0x0000000000000000 in ?? () #15 0x0000000000000000 in ?? () #16 0x0000000000000001 in ?? () #17 0x0000000000000000 in ?? () #18 0x0000000000000000 in ?? () #19 0x0000000000000000 in ?? () #20 0x0000000000000000 in ?? () #21 0x0000000000000000 in ?? () #22 0x0000000000000000 in ?? () #23 0x0000000000000000 in ?? () #24 0x0000000000000000 in ?? () #25 0x0000000000000000 in ?? () #26 0x0000000000000000 in ?? () #27 0x0000000000000000 in ?? () #28 0x0000000000000000 in ?? () #29 0x0000000000000000 in ?? () #30 0x0000000000000000 in ?? () #31 0x0000000000000000 in ?? () #32 0x0000000000000000 in ?? () #33 0x0000000000000000 in ?? () #34 0x0000000000000000 in ?? () #35 0x0000000000000000 in ?? () #36 0x0000000000000000 in ?? () #37 0x0000000000000000 in ?? () #38 0x0000000000d04000 in ?? () #39 0x0000000000000002 in ?? 
() #40 0x0000000000000000 in ?? () #41 0xffffff00044428f0 in ?? () #42 0xffffff00044afa50 in ?? () #43 0xffffff000443aa50 in ?? () #44 0xffffffffaef6ca28 in ?? () #45 0xffffff000443aa50 in ?? () #46 0xffffffff804a7246 in sched_switch (td=0x0, newtd=0xffffffff804ff8a7, flags=1) at /usr/src/sys/kern/sched_4bsd.c:910 #47 0x0000000000000000 in ?? () #48 0x0000000000000000 in ?? () #49 0x0000000000000000 in ?? () #50 0x0000000000000000 in ?? () #51 0x0000000000000000 in ?? () #52 0x0000000000000000 in ?? () #53 0x0000000000000000 in ?? () #54 0x0000000000000000 in ?? () #55 0x0000000000000000 in ?? () #56 0x0000000000000000 in ?? () #57 0x0000000000000000 in ?? () #58 0x0000000000000000 in ?? () #59 0x0000000000000000 in ?? () #60 0x0000000000000000 in ?? () #61 0x0000000000000000 in ?? () #62 0x0000000000000000 in ?? () #63 0x0000000000000000 in ?? () #64 0x0000000000000000 in ?? () #65 0x0000000000000000 in ?? () #66 0x0000000000000000 in ?? () #67 0x0000000000000000 in ?? () #68 0x0000000000000000 in ?? () #69 0x0000000000000000 in ?? () #70 0x0000000000000000 in ?? () #71 0x0000000000000000 in ?? () #72 0x0000000000000000 in ?? () #73 0x0000000000000000 in ?? () #74 0x0000000000000000 in ?? () #75 0x0000000000000000 in ?? () #76 0x0000000000000000 in ?? () #77 0x0000000000000000 in ?? () #78 0x0000000000000000 in ?? () #79 0x0000000000000000 in ?? () #80 0x0000000000000000 in ?? () #81 0x0000000000000000 in ?? () #82 0x0000000000000000 in ?? () #83 0x0000000000000000 in ?? () #84 0x0000000000000000 in ?? () #85 0x0000000000000000 in ?? () #86 0x0000000000000000 in ?? () #87 0x0000000000000000 in ?? () #88 0x0000000000000000 in ?? () #89 0x0000000000000000 in ?? () #90 0x0000000000000000 in ?? () #91 0x0000000000000000 in ?? () #92 0x0000000000000000 in ?? () #93 0x0000000000000000 in ?? () #94 0x0000000000000000 in ?? () #95 0x0000000000000000 in ?? () #96 0x0000000000000000 in ?? () #97 0x0000000000000000 in ?? () #98 0x0000000000000000 in ?? 
() #99 0x0000000000000000 in ?? () #100 0x0000000000000000 in ?? () #101 0x0000000000000000 in ?? () #102 0x0000000000000000 in ?? () #103 0x0000000000000000 in ?? () #104 0x0000000000000000 in ?? () #105 0x0000000000000000 in ?? () #106 0x0000000000000000 in ?? () #107 0x0000000000000000 in ?? () #108 0x0000000000000000 in ?? () #109 0x0000000000000000 in ?? () #110 0x0000000000000000 in ?? () #111 0x0000000000000000 in ?? () #112 0x0000000000000000 in ?? () #113 0x0000000000000000 in ?? () #114 0x0000000000000000 in ?? () #115 0x0000000000000000 in ?? () #116 0x0000000000000000 in ?? () #117 0x0000000000000000 in ?? () #118 0x0000000000000000 in ?? () Cannot access memory at address 0xffffffffaef6d000 (kgdb) quit From owner-freebsd-scsi@FreeBSD.ORG Mon Nov 10 11:06:57 2008 Return-Path: Delivered-To: freebsd-scsi@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A945F1065673 for ; Mon, 10 Nov 2008 11:06:57 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 902048FC20 for ; Mon, 10 Nov 2008 11:06:57 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id mAAB6vnF049850 for ; Mon, 10 Nov 2008 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id mAAB6v4e049846 for freebsd-scsi@FreeBSD.org; Mon, 10 Nov 2008 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 10 Nov 2008 11:06:57 GMT Message-Id: <200811101106.mAAB6v4e049846@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-scsi@FreeBSD.org Cc: Subject: Current problem reports assigned to 
freebsd-scsi@FreeBSD.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Nov 2008 11:06:57 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker       Resp.  Description
--------------------------------------------------------------------------------
o kern/128452   scsi   [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245   scsi   [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927   scsi   [isp] isp(4) target driver crashes kernel when set up
o kern/127901   scsi   [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/126866   scsi   [isp] [panic] kernel panic on card initialization
o kern/124667   scsi   [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674   scsi   [ahc] ahc driver dumping
o kern/123666   scsi   [aac] attach fails with Adaptec SAS RAID 3805 controll
o sparc/121676  scsi   [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487   scsi   [sg] scsi_sg incompatible with scanners
o kern/120247   scsi   [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/119668   scsi   [cam] [patch] certain errors are too verbose comparing
o kern/114597   scsi   [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847   scsi   [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954    scsi   [ahc] reading from DVD failes on 6.x [regression]
o kern/94838    scsi   Kernel panic while mounting SD card with lock switch o
o kern/92798    scsi   [ahc] SCSI problem with timeouts
o kern/90282    scsi   [sym] SCSI bus resets cause loss of ch device
o kern/76178    scsi   [ahd] Problem with ahd and large SCSI Raid system
o kern/74627    scsi   [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165    scsi   [panic] kernel page fault after calling cam_send_ccb
o kern/60641    scsi   [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598    scsi   wire down of scsi devices conflicts with config
s kern/57398    scsi   [mly] Current fails to install on mly(4) based RAID di
o kern/52638    scsi   [panic] SCSI U320 on SMP server won't run faster than
o kern/44587    scsi   dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/40895    scsi   wierd kernel / device driver bug
o kern/39388    scsi   ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/38828    scsi   [dpt] [request] DPT PM2012B/90 doesn't work
o kern/35234    scsi   World access to /dev/pass? (for scanner) requires acce

30 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 12 22:04:17 2008 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D81C71065676 for ; Wed, 12 Nov 2008 22:04:17 +0000 (UTC) (envelope-from Carole.Macheret@ch.meggitt.com) Received: from gw.vibro-meter.com (gw.vibro-meter.com [62.2.232.101]) by mx1.freebsd.org (Postfix) with ESMTP id 40BC38FC1A for ; Wed, 12 Nov 2008 22:04:17 +0000 (UTC) (envelope-from Carole.Macheret@ch.meggitt.com) Received: from Vm-Fribourg-MTA by gw.vibro-meter.com with Novell_GroupWise; Wed, 12 Nov 2008 22:44:02 +0100 Message-Id: <491B5C2A.1F16.0013.0@ch.meggitt.com> X-Mailer: Novell GroupWise Internet Agent 7.0.3 Date: Wed, 12 Nov 2008 22:43:54 +0100 From: "Carole Macheret" To: "Scott Long" References: <4874F53A0200001300130DE3@gw.vibro-meter.com> <48A465B10200001300132295@gw.vibro-meter.com> <48A46586.1F16.0013.0@ch.meggitt.com><48A46586.1F16.0013.0@ch.meggitt.com> <48A4666C.6080008@samsco.org> In-Reply-To: <48A4666C.6080008@samsco.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Cc: freebsd-scsi@freebsd.org, Roland Rothen Subject: 
Re: g_vfs_done X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Nov 2008 22:04:17 -0000

Hi Scott,

Thanks a lot for your advice. We have finally run some tests with the following setting changed:

kern.cam.da.retry_count=100 (in /etc/sysctl.conf)

Now the FreeBSD virtual machines don't freeze anymore after losing the disks during the IPstor failover.

Best regards
Carole Macheret

>>> Scott Long 14.08.2008 19:07 >>>
Carole Macheret wrote:
> Hello,
>
> We are using FreeBSD 7.0-RELEASE #1 running Squid and Zabbix on vmware ESX 3.0.2 and our vmware ESX servers access our SAN through IpStor cluster (Storage virtualization and mirroring).
>
> We have 2 storages (EVA 6100) and the IpStor solution allows us to mirror disks on both EVAs.
>
> We have a problem with both the Zabbix and Squid FreeBSD virtual machines: when the virtual machine is losing its disks (EVA controller reboot or ipstor cluster failover), we have several "g_vfs_done() : da1s1d[WRITE(offset=2312431234, length=12453)] error= 5" errors, then the host is definitively frozen. The disk loss lasts 1-5 seconds. Windows virtual machines do freeze during the loss then continue working. On Windows we had to specify a longer timeout for local disk in registry.
>
> Does anybody have an idea what could be tuned to avoid this problem?
>
> Attached you can find the dmesg and a screenshot of the g_vfs_done error...
>
> Thanks in advance for your help
>

So the virtual disks that the FreeBSD images are using in VMWare are on an IpStor, and those periodically go away, yes? What's probably happening is that the VMWare host is triggering an event in the FreeBSD client VM that essentially is making the virtual disks go away.
Inside the FreeBSD VM, the SCSI layer tries to talk to the disk and gets a selection timeout since the disk is no longer there. It doesn't know that this is a temporary state, and it declares the I/O as failed. At that point, the BSD VM gets upset and everything gets bad.

There is a property called kern.cam.da.default_timeout. It's set to 60 seconds, but I don't think that it will help you in this case, since it's likely that the I/O is failing because of a selection timeout, not because the virtual disk is slow in completing the I/O. The kern.cam.da.retry_count property is set to 5, and changing it might help since it might be able to force enough retries to give time for the virtual disk to come back. Try the following command on a running system:

sysctl kern.cam.da.retry_count=100

This will allow for about 25 seconds worth of retries (a selection attempt takes 250ms, so you'll get about 4 retries per second). If this doesn't work, try configuring VMWare to give you a serial console that you can capture on the host, then set bootverbose during boot and send me the log once the problem happens.

Scott
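
The retry arithmetic in the reply above can be checked with a small sketch. It assumes, as the message states, that each retry costs roughly one 250 ms selection timeout; the function name retry_window_seconds is hypothetical and only illustrates the calculation, and the real CAM retry path may add other delays.

```python
# Rough estimate of how long the da(4) driver keeps retrying before an I/O
# is declared failed. Assumption (taken from the message, not measured):
# every retry costs one selection timeout of about 250 ms.

def retry_window_seconds(retry_count: int, selection_timeout_s: float = 0.25) -> float:
    """Return the approximate time covered by retry_count selection attempts."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    return retry_count * selection_timeout_s


if __name__ == "__main__":
    # 5 is the retry count quoted in the message; 100 is the suggested value.
    for retries in (5, 100):
        window = retry_window_seconds(retries)
        print(f"retry_count={retries}: about {window:.2f} s of retries")
```

Under these assumptions a retry count of 5 covers only a little over one second, which is shorter than most of the reported 1-5 second disk loss, while 100 retries cover about 25 seconds.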