From owner-freebsd-scsi@FreeBSD.ORG  Mon Oct  1 11:07:27 2012
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 61D5410656B4
	for <freebsd-scsi@FreeBSD.org>; Mon,  1 Oct 2012 11:07:27 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 4AF858FC15
	for <freebsd-scsi@FreeBSD.org>; Mon,  1 Oct 2012 11:07:27 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q91B7RYO025091
	for <freebsd-scsi@FreeBSD.org>; Mon, 1 Oct 2012 11:07:27 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q91B7QMb025089
	for freebsd-scsi@FreeBSD.org; Mon, 1 Oct 2012 11:07:26 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 1 Oct 2012 11:07:26 GMT
Message-Id: <201210011107.q91B7QMb025089@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 01 Oct 2012 11:07:27 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/169976  scsi       [cam] [patch] make scsi_da use sysctl values where app
o kern/169974  scsi       [cam] [patch] add Quirks for SSD that are 4k optimised
o kern/169835  scsi       [patch] remove some unused variables from scsi_da prob
o kern/169801  scsi       [cam] [patc] make changes to delete_method in scsi_da 
o kern/169403  scsi       [cam] [patch] CAM layer, I/O starvation, no fairness
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
o kern/163713  scsi       [aic7xxx] [patch] Add Adaptec29329LPE to aic79xx_pci.c
o kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/159412  scsi       [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after 
o kern/153514  scsi       [cam] [panic] CAM related panic
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase  CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up 
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than 
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

55 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Fri Oct  5 03:53:01 2012
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BAF7E106566B;
	Fri,  5 Oct 2012 03:53:01 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 8DAF18FC0C;
	Fri,  5 Oct 2012 03:53:01 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q953r1JA073175;
	Fri, 5 Oct 2012 03:53:01 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q953r0rA073170;
	Fri, 5 Oct 2012 03:53:00 GMT (envelope-from linimon)
Date: Fri, 5 Oct 2012 03:53:00 GMT
Message-Id: <201210050353.q953r0rA073170@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-scsi@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/171650: [da] da(4) driver does not recognize end of cciss
	(SmartArray) >volume reconstruction
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Oct 2012 03:53:01 -0000

Old Synopsis: ``da'' driver does not recognize end of cciss (SmartArray) >volume reconstruction
New Synopsis: [da] da(4) driver does not recognize end of cciss (SmartArray) >volume reconstruction

Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Oct 5 03:52:35 UTC 2012
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=171650

From owner-freebsd-scsi@FreeBSD.ORG  Fri Oct  5 13:13:56 2012
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 821)
	id C387E106564A; Fri,  5 Oct 2012 13:13:56 +0000 (UTC)
Date: Fri, 5 Oct 2012 13:13:56 +0000
From: John <jwd@FreeBSD.org>
To: FreeBSD-FS <freebsd-fs@freebsd.org>
Message-ID: <20121005131356.GA13888@FreeBSD.org>
References: <20121003032738.GA42140@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121003032738.GA42140@FreeBSD.org>
User-Agent: Mutt/1.4.2.1i
Cc: FreeBSD-SCSI <freebsd-scsi@freebsd.org>
Subject: Re: ZFS/istgt lockup
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Oct 2012 13:13:56 -0000

Copying this reply to -scsi. I'm not sure whether this is more of a ZFS
issue or an istgt one; more below...

----- John's Original Message -----
> Hi Folks,
> 
>    I've been chasing a problem that I'm not quite sure originates
> on the BSD side, but the system shouldn't lock up and require a power
> cycle to reboot.
> 
>    The config: I have a FreeBSD system running 9.1-RC handing out a
> 36TB volume to a Linux RHEL 6.1 system. The RHEL 6.1 system is
> doing heavy I/O and number crunching. Many hours into the job stream,
> the kernel on the RHEL side becomes quite unhappy:
> 
> kernel: __ratelimit: 27665 callbacks suppressed
> kernel: swapper: page allocation failure. order:1, mode:0x4020
> kernel: Pid: 0, comm: swapper Tainted: G           ---------------- T 2.6.32-131.0.15.el6.x86_64 #1
> kernel: Call Trace:
> kernel: <IRQ>  [<ffffffff811201e6>] ? __alloc_pages_nodemask+0x716/0x8b0
> kernel: [<ffffffff8115464a>] ? alloc_pages_current+0xaa/0x110
> kernel: [<ffffffffa0115cd5>] ? refill_fl+0x3d5/0x4a0 [cxgb3]
> kernel: [<ffffffff814200bd>] ? napi_frags_finish+0x6d/0xb0
> kernel: [<ffffffffa0116d33>] ? process_responses+0x653/0x1450 [cxgb3]
> kernel: [<ffffffff810e7f62>] ? ring_buffer_lock_reserve+0xa2/0x160
> kernel: [<ffffffffa0117b6c>] ? napi_rx_handler+0x3c/0x90 [cxgb3]
> kernel: [<ffffffff814225a3>] ? net_rx_action+0x103/0x2f0
> kernel: [<ffffffff8106f717>] ? __do_softirq+0xb7/0x1e0
> kernel: [<ffffffff810d69d6>] ? handle_IRQ_event+0xf6/0x170
> kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
> kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
> kernel: [<ffffffff8106f505>] ? irq_exit+0x85/0x90
> kernel: [<ffffffff814e3505>] ? do_IRQ+0x75/0xf0
> kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
> kernel: <EOI>  [<ffffffff810362ab>] ? native_safe_halt+0xb/0x10
> kernel: [<ffffffff81100826>] ? ftrace_raw_event_power_start+0x16/0x20
> kernel: [<ffffffff810142fd>] ? default_idle+0x4d/0xb0
> kernel: [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
> kernel: [<ffffffff814d493c>] ? start_secondary+0x202/0x245
> 
>    On the FreeBSD side, the istgt daemon appears to see that one of the
> connection threads is down and attempts to restart it. At this point,
> the istgt process size starts to grow.
> 
> USER   PID  %CPU %MEM     VSZ    RSS TT  STAT STARTED       TIME COMMAND
> root  1224   0.0  0.4 8041092 405472 v0- DL    4:59PM   15:28.72 /usr/local/bin/istgt
> root  1224   0.0  0.4 8041092 405472 v0- IL    4:59PM   63:18.34 /usr/local/bin/istgt
> root  1224   0.0  0.4 8041092 405472 v0- IL    4:59PM   61:13.80 /usr/local/bin/istgt
> root  1224   0.0  0.4 8041092 405472 v0- IL    4:59PM    0:00.00 /usr/local/bin/istgt
> 
>    There are more than 1400 threads reported.
> 
>    Also of interest, netstat shows:
> 
> tcp4       0      0 10.59.6.12.5010        10.59.25.113.54076     CLOSE_WAIT
> tcp4       0      0 10.60.6.12.5010        10.60.25.113.33345     CLOSED
> tcp4       0      0 10.59.6.12.5010        10.59.25.113.54074     CLOSE_WAIT
> tcp4       0      0 10.60.6.12.5010        10.60.25.113.33343     CLOSED
> tcp4       0      0 10.59.6.12.5010        10.59.25.113.54072     CLOSE_WAIT
> tcp4       0      0 10.60.6.12.5010        10.60.25.113.33341     CLOSED
> tcp4       0      0 10.60.6.12.5010        10.60.25.113.33339     CLOSED
> tcp4       0      0 10.59.6.12.5010        10.59.25.113.54070     CLOSE_WAIT
> tcp4       0      0 10.60.6.12.5010        10.60.25.113.53806     CLOSE_WAIT
> 
>    There are more than 1400 sockets in a CLOSE* state. What would
> prevent these sockets from being cleaned up in a reasonable timeframe?
> Both sides of the MPIO connection appear to be attempting reconnects.
> 
>    An attempt to gracefully kill istgt fails. A kill -9 does not clean
> things up either.
> 
>    A procstat -kk 1224 after the kill -9 shows:
> 
>   PID    TID COMM             TDNAME           KSTACK
>  1224 100959 istgt            sigthread        mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x4ff zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 vn_open_cred+0x3cb
>  1224 100960 istgt            luthread #1      mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 bwait+0x64 physio+0x246 devfs_write_f+0x8d dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
>  1224 103533 istgt            sendthread #1493 mi_switch+0x186 thread_suspend_switch+0xc9 thread_single+0x1b2 exit1+0x72 sigexit+0x7c postsig+0x3a4 ast+0x26c doreti_ast+0x1f
> 
> 
>    An attempt to forcefully export the pool also hangs. procstat
> shows:
> 
>   PID    TID COMM             TDNAME           KSTACK                       
>  4427 100991 zpool            -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 dbuf_read+0x30b dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e dsl_dir_open_spa+0x121 dsl_dataset_hold+0x3b dmu_objset_hold+0x23 zfs_ioc_objset_stats+0x2b zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7 
> 
> 
> 
>    If anyone has any ideas, please let me know. I know I've left a lot
> of config information out in an attempt to keep the email shorter.
> 
>    Random comments:
> 
>    This happens with or without multipathd enabled on the Linux client.
> 
>    If I catch the istgt daemon while it's creating threads and kill it,
> the system does not lock up.
> 
>    I see no errors in the istgt log file. One of my next things to try
> is to enable all debugging... The amount of debugging data captured
> is quite large :-(
> 
>    I am using Chelsio 10G cards on both client and server, which have
> been rock solid in all other cases.
> 
>    Thoughts welcome!
> 
> Thanks,
> John

Hi Folks,

   I've managed to replicate this problem once. Basically, it appears
the Linux client sends an ABORT TASK, which istgt processes here:

istgt_iscsi_op_task:

        switch (function) {
        case ISCSI_TASK_FUNC_ABORT_TASK:
                ISTGT_LOG("ABORT_TASK\n");
                SESS_MTX_LOCK(conn);
                rc = istgt_lu_clear_task_ITLQ(conn, conn->sess->lu, lun,
                    ref_CmdSN);
                SESS_MTX_UNLOCK(conn);
                if (rc < 0) {
                        ISTGT_ERRLOG("LU reset failed\n");
                }
                istgt_clear_transfer_task(conn, ref_CmdSN);
                break;

   At this point, the queue depth is 62. There appears to be one thread
in the ZFS code performing a read.

   No other processing occurs after this point. A zfs list hangs, the
pool cannot be exported, and the istgt daemon cannot be fully killed.
Rebooting requires a power reset (i.e., reboot hangs after flushing
buffers).

   The only thing that does appear to be happening is a growing list
of connections:

tcp4       0      0 10.60.6.12.5010        10.60.25.113.56577     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56576     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56575     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56574     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56573     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56572     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56571     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56570     CLOSE_WAIT
tcp4       0      0 10.60.6.12.5010        10.60.25.113.56569     CLOSE_WAIT

   There are currently about 390 of them, and the count is slowly going
up. This suggests to me that some sort of reconnect is occurring and
failing.

   On the client side, I think the problem is related to a Chelsio N320
10G NIC that is showing RX overflows. After about 40000 overflows, the
ABORT was received on the server side. I've never seen a Chelsio card
have overflow problems before. The server is using the same model of
Chelsio card with no issues.

   Again, any thoughts/comments are welcome!

Thanks,
John