From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 10:49:04 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 02F837E6;
 Mon,  3 Feb 2014 10:49:04 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id C9035181C;
 Mon,  3 Feb 2014 10:49:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s13An3HM016936;
 Mon, 3 Feb 2014 10:49:03 GMT
 (envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
 by freefall.freebsd.org (8.14.8/8.14.8/Submit) id s13An3uV016935;
 Mon, 3 Feb 2014 10:49:03 GMT (envelope-from linimon)
Date: Mon, 3 Feb 2014 10:49:03 GMT
Message-Id: <201402031049.s13An3uV016935@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-scsi@FreeBSD.org
From: linimon@FreeBSD.org
Subject: Re: kern/186258: [mps] Heap overrun in mps(4)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 10:49:04 -0000

Synopsis: [mps] Heap overrun in mps(4)

Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Feb 3 10:48:43 UTC 2014
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=186258

From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 11:06:53 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C82B219D
 for <freebsd-scsi@FreeBSD.org>; Mon,  3 Feb 2014 11:06:53 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id B3E821A5A
 for <freebsd-scsi@FreeBSD.org>; Mon,  3 Feb 2014 11:06:53 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s13B6rt0022765
 for <freebsd-scsi@FreeBSD.org>; Mon, 3 Feb 2014 11:06:53 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.8/8.14.8/Submit) id s13B6rZ0022763
 for freebsd-scsi@FreeBSD.org; Mon, 3 Feb 2014 11:06:53 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 3 Feb 2014 11:06:53 GMT
Message-Id: <201402031106.s13B6rZ0022763@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 11:06:53 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/186258  scsi       [mps] Heap overrun in mps(4)
o kern/184059  scsi       [mps] mps SCSI driver causes FreeBSD to hang during bo
o kern/179932  scsi       [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP 
o kern/178795  scsi       [mps] MSI for mps driver doesn't work under vmware
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
f kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
o kern/148083  scsi       [aac] Strange device reporting
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
f kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
f kern/123674  scsi       [ahc] ahc driver dumping
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc

16 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 11:20:13 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9D93075;
 Mon,  3 Feb 2014 11:20:13 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 712C41CC9;
 Mon,  3 Feb 2014 11:20:13 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id s13BKDJ8029039;
 Mon, 3 Feb 2014 11:20:13 GMT
 (envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
 by freefall.freebsd.org (8.14.8/8.14.8/Submit) id s13BKDWh029038;
 Mon, 3 Feb 2014 11:20:13 GMT (envelope-from linimon)
Date: Mon, 3 Feb 2014 11:20:13 GMT
Message-Id: <201402031120.s13BKDWh029038@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-scsi@FreeBSD.org
From: linimon@FreeBSD.org
Subject: Re: kern/184975: [ses] SCSI Environmental Services (ses) driver
 report wrong information
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 11:20:13 -0000

Old Synopsis: SCSI Environmental Services (ses) driver report wrong information
New Synopsis: [ses] SCSI Environmental Services (ses) driver report wrong information

Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Feb 3 11:19:56 UTC 2014
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=184975

From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 16:23:51 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 64B0F271;
 Mon,  3 Feb 2014 16:23:51 +0000 (UTC)
Received: from smtp.infotech.no (smtp.infotech.no [82.134.31.41])
 by mx1.freebsd.org (Postfix) with ESMTP id E3DB618B1;
 Mon,  3 Feb 2014 16:23:47 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by smtp.infotech.no (Postfix) with ESMTP id 30A282041C3;
 Mon,  3 Feb 2014 17:14:43 +0100 (CET)
X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no
Received: from smtp.infotech.no ([127.0.0.1])
 by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id IX0Pf4w+iqhg; Mon,  3 Feb 2014 17:14:43 +0100 (CET)
Received: from [10.7.0.30] (unknown [10.7.0.30])
 by smtp.infotech.no (Postfix) with ESMTPA id 651C22041AF;
 Mon,  3 Feb 2014 17:14:42 +0100 (CET)
Message-ID: <52EFC058.20404@interlog.com>
Date: Mon, 03 Feb 2014 11:14:16 -0500
From: Douglas Gilbert <dgilbert@interlog.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, 
 freebsd-scsi@FreeBSD.org
Subject: Re: kern/184975: [ses] SCSI Environmental Services (ses) driver report
 wrong information
References: <201402031120.s13BKDWh029038@freefall.freebsd.org>
In-Reply-To: <201402031120.s13BKDWh029038@freefall.freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
Reply-To: dgilbert@interlog.com
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 16:23:51 -0000

On 14-02-03 06:20 AM, linimon@FreeBSD.org wrote:
> Old Synopsis: SCSI Environmental Services (ses) driver report wrong information
> New Synopsis: [ses] SCSI Environmental Services (ses) driver report wrong information

s/Environmental/Enclosure/

In my experience, OSes (e.g. Linux and FreeBSD) can have
problems with ses drivers inside the kernel because the
enclosure vendors are so sloppy in following the various
SES standards. There are exceptions but on average the
current implementations destroy what otherwise would have
been a good idea.


Doug Gilbert

From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 16:32:30 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 730EB48C;
 Mon,  3 Feb 2014 16:32:30 +0000 (UTC)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4F272196A;
 Mon,  3 Feb 2014 16:32:30 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id s13GWJwA013452
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Mon, 3 Feb 2014 08:32:19 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id s13GWJsi013451;
 Mon, 3 Feb 2014 08:32:19 -0800 (PST) (envelope-from sgk)
Date: Mon, 3 Feb 2014 08:32:19 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: Instant panic CAM or USB subsystem
Message-ID: <20140203163219.GA13386@troutmask.apl.washington.edu>
References: <20140125172106.GA67590@troutmask.apl.washington.edu>
 <201401281232.21958.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201401281232.21958.jhb@freebsd.org>
User-Agent: Mutt/1.5.22 (2013-10-16)
Cc: Alexander Motin <mav@freebsd.org>, freebsd-current@freebsd.org,
 scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 16:32:30 -0000

On Tue, Jan 28, 2014 at 12:32:21PM -0500, John Baldwin wrote:
> On Saturday, January 25, 2014 12:21:06 pm Steve Kargl wrote:
> > If I plug my Samsung Intensity II cellphone into a usb port,
> > I get an instant panic.  This is 100% reproducible.  I have
> > the core and kernel for further debugging.  Dmesg.boot follows
> > my sig.
> > 
> > % kgdb /boot/kernel/kernel /vmcore.0
> > 
> > Unread portion of the kernel message buffer:
> > cd1 at umass-sim1 bus 1 scbus4 target 0 lun 0
> > cd1: <SAMSUNG CD-ROM 1.00> Removable CD-ROM SCSI-2 device 
> > cd1: Serial Number 000000000002
> > cd1: 1.000MB/s transfers
> > cd1: cd present [3840000 x 512 byte records]
> > cd1: quirks=0x10<10_BYTE_ONLY>
> > panic: mutex CAM device lock not owned at /usr/src/sys/cam/cam_periph.c:301
> > cpuid = 0
> > KDB: enter: panic
> 
> scsi@ might work better for this.  It looks like when cdasync() calls 
> cam_periph_alloc() it doesn't have its associated xpt_path locked.  All the 
> other async xpt callbacks I looked at don't lock the xpt path either.  It 
> seems they expect it to be locked by the caller when they are invoked.  It 
> seems xpt_async_process_dev() doesn't always lock xpt_lock, but sometimes
> locks the device instead:
> 
> 	/*
> 	 * If async for specific device is to be delivered to
> 	 * the wildcard client, take the specific device lock.
> 	 * XXX: We may need a way for client to specify it.
> 	 */
> 	if ((device->lun_id == CAM_LUN_WILDCARD &&
> 	     path->device->lun_id != CAM_LUN_WILDCARD) ||
> 	    (device->target->target_id == CAM_TARGET_WILDCARD &&
> 	     path->target->target_id != CAM_TARGET_WILDCARD) ||
> 	    (device->target->bus->path_id == CAM_BUS_WILDCARD &&
> 	     path->target->bus->path_id != CAM_BUS_WILDCARD)) {
> 		mtx_unlock(&device->device_mtx);
> 		xpt_path_lock(path);
> 		relock = 1;
> 	} else
> 		relock = 0;
> 
> 	(*(device->target->bus->xport->async))(async_code,
> 	    device->target->bus, device->target, device, async_arg);
> 	xpt_async_bcast(&device->asyncs, async_code, path, async_arg);
> 
> 	if (relock) {
> 		xpt_path_unlock(path);
> 		mtx_lock(&device->device_mtx);
> 	}
> 
> Maybe try going up to this frame (16) in your dump and do
> 'p *device->target'?  However, someone with more CAM knowledge needs to look 
> at this to see what is actually broken.
> 

I finally have time to look at this again.  Here's kgdb for frame 16
as you suggested and then frame 17.


Script started on Mon Feb  3 08:16:32 2014
% kgdb /dsk1/obj/usr/src/sys/MOBILE/kernel.debug vmcore.0

Unread portion of the kernel message buffer:
panic: mutex CAM device lock not owned at /usr/src/sys/cam/cam_periph.c:301
cpuid = 1
KDB: enter: panic

#16 0xc047d6a5 in xpt_async_process_dev (device=<value optimized out>, 
    arg=0xc70aa800) at /usr/src/sys/cam/cam_xpt.c:4208
#17 0xc047b346 in xpt_async_process (periph=0x0, ccb=0xc70aa800)
    at /usr/src/sys/cam/cam_xpt.c:4173
#18 0xc047bd15 in xpt_done_process (ccb_h=0xc70aa800)
    at /usr/src/sys/cam/cam_xpt.c:5249
#19 0xc047ef14 in xpt_done_td (arg=<value optimized out>)
    at /usr/src/sys/cam/cam_xpt.c:5276
#20 0xc0723daf in fork_exit (callout=0xc047edb0 <xpt_done_td>)
    at /usr/src/sys/kern/kern_fork.c:977
#21 0xc09fb3e4 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:278
Current language:  auto; currently minimal
(kgdb) frame 16
#16 0xc047d6a5 in xpt_async_process_dev (device=<value optimized out>, 
    arg=0xc70aa800) at /usr/src/sys/cam/cam_xpt.c:4208
4208				cur_entry->callback(cur_entry->callback_arg,
(kgdb) p *device
Cannot access memory at address 0x0
(kgdb) up 1
#17 0xc047b346 in xpt_async_process (periph=0x0, ccb=0xc70aa800)
    at /usr/src/sys/cam/cam_xpt.c:4173
4173			xpt_async_process_dev(xpt_periph->path->device, ccb);
(kgdb) p *xpt_periph->path->device->target
$2 = {ed_entries = {tqh_first = 0xc6f4b800, tqh_last = 0xc6f4b80c}, links = {
    tqe_next = 0x0, tqe_prev = 0xc6eaaa00}, bus = 0xc6eaaa00, 
  target_id = 4294967295, refcount = 2, generation = 1, last_reset = {
    tv_sec = 0, tv_usec = 0}, rpl_size = 0, luns = 0x0, luns_mtx = {
    lock_object = {lo_name = 0xc0a3f9bc "CAM LUNs lock", lo_flags = 16973824, 
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}}
(kgdb) p *xpt_periph->path->device->target->bus
$3 = {et_entries = {tqh_first = 0xc6eaa980, tqh_last = 0xc6eaa988}, links = {
    tqe_next = 0x0, tqe_prev = 0xc7690008}, path_id = 4294967295, 
  sim = 0xc6eaaa80, last_reset = {tv_sec = 0, tv_usec = 0}, flags = 0, 
  refcount = 3, generation = 3, parent_dev = 0x0, xport = 0xc0b2f568, 
  eb_mtx = {lock_object = {lo_name = 0xc0a3f85a "CAM bus lock", 
      lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}}
(kgdb) quit
% exit
exit

Script done on Mon Feb  3 08:20:44 2014
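
A note on reading the dump above: both target_id and path_id print as
4294967295, which is the all-ones 32-bit value.  The sketch below assumes
that the CAM wildcard ids are 32-bit unsigned and defined as all-ones;
the typedefs and macros are re-declared here for illustration and are not
copied from the kernel headers, and the helper names are hypothetical.
Under that assumption, the target and bus printed from frame 17 are the
wildcard ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit unsigned ids with an all-ones wildcard value. */
typedef uint32_t path_id_t;
typedef uint32_t target_id_t;

#define CAM_BUS_WILDCARD	((path_id_t)~0)
#define CAM_TARGET_WILDCARD	((target_id_t)~0)

/* Hypothetical helper: is this target id the wildcard? */
static int
is_wildcard_target(target_id_t id)
{
	return (id == CAM_TARGET_WILDCARD);
}

/* Hypothetical helper: is this bus (path) id the wildcard? */
static int
is_wildcard_bus(path_id_t id)
{
	return (id == CAM_BUS_WILDCARD);
}

/* Check the two values printed by kgdb in the transcript. */
static void
check_values_from_dump(void)
{
	assert(is_wildcard_target(4294967295u));	/* target_id */
	assert(is_wildcard_bus(4294967295u));		/* path_id */
}
```

If the real definitions differ, the comparison in the quoted
xpt_async_process_dev() code is still the thing to check against.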

-- 
Steve

From owner-freebsd-scsi@FreeBSD.ORG  Mon Feb  3 22:00:43 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A971F75D;
 Mon,  3 Feb 2014 22:00:43 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu
 [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 71E8A18E4;
 Mon,  3 Feb 2014 22:00:43 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
 by khavrinen.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s13M0fei087879
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
 Mon, 3 Feb 2014 17:00:41 -0500 (EST)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
 by khavrinen.csail.mit.edu (8.14.7/8.14.7/Submit) id s13M0fJq087876;
 Mon, 3 Feb 2014 17:00:41 -0500 (EST) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <21232.4489.544435.898780@khavrinen.csail.mit.edu>
Date: Mon, 3 Feb 2014 17:00:41 -0500
From: Garrett Wollman <wollman@csail.mit.edu>
To: "Kenneth D. Merry" <ken@freebsd.org>
Subject: Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 ==
 BOOM!)
In-Reply-To: <20140131003342.GA11755@nargothrond.kdm.org>
References: <21225.19508.683025.581620@khavrinen.csail.mit.edu>
 <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
 <20140129221514.GA47535@nargothrond.kdm.org>
 <21225.38749.179621.454579@khavrinen.csail.mit.edu>
 <20140131003342.GA11755@nargothrond.kdm.org>
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (khavrinen.csail.mit.edu [127.0.0.1]); Mon, 03 Feb 2014 17:00:41 -0500 (EST)
Cc: freebsd-scsi@freebsd.org, scottl@freebsd.org, freebsd-stable@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 Feb 2014 22:00:43 -0000

<<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken@freebsd.org> said:

> The attached patch should fix the leaked allocations.  I'm CCing Steve and
> Kashyap at LSI so that they can verify that this is the right place to do
> the mapping shutdown.

It does fix the leak.

> I don't know yet why that particular change is causing problems.  Perhaps
> it just moved things around and exposed an existing problem.

> The fact that the redzone code doesn't expose any problems makes it more
> likely that it is a problem other than a heap overflow.

> Since it is consistent, is there any chance you could hook up remote gdb to
> the box and poke around when it crashes?  Perhaps you'll see something
> interesting that will point to the problem.

No way to do a remote GDB, unfortunately.  However, I tried a few
other things:

- It makes no difference whether mps.ko is preloaded or loaded in
single-user mode.

- If I boot a kernel/modules without redzone, loading mps.ko
instapanics, in a very different place (apologies for the poor
transcription; I can either be up in the machine room to plug in USB
sticks or use the serial console, not both):

--- trap 0xc, rip = 0xffff....f807e934a, rsp = 0xff...94da4c48f0, rbp = 0xff...94da4c4950 ---
bzero() at bzero+0xa/frame 0xff...94da4c4af0
mpssas_add_device() at mpssas_add_device+0x78/frame 0xff..94da4c4af0
mpssas_firmware_event_work() at mpssas_firmware_event_work+0x437/frame 0xff....94da4c4b78
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xff..94da4c4bc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xff..94da4c4be0

Inspection of the code does not reveal any arc from mpssas_add_device
to bzero.  The return address in the frame is the location of the
first function call (to mpssas_startup_increment()) in
mpssas_add_device().

So I think it's fair to say that something is scribbling over memory
in quite a bad way.
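
For context on what a redzone-style check can and cannot see, here is a
minimal userland sketch of the general guard-byte technique.  It is not
the FreeBSD redzone(9) implementation; the function names, guard length,
and fill byte are all made up for illustration.  The point it shows is
that only writes landing in the guard bands next to an allocation are
detected; a wild write to unrelated memory never touches them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_LEN	16	/* holds the requested length, keeps alignment */
#define GUARD_LEN	16	/* hypothetical guard band size */
#define GUARD_BYTE	0x42	/* hypothetical fill pattern */

/* Allocate len bytes with a guard band before and after the user area. */
static void *
guarded_alloc(size_t len)
{
	unsigned char *raw = malloc(HEADER_LEN + 2 * GUARD_LEN + len);

	if (raw == NULL)
		return (NULL);
	memcpy(raw, &len, sizeof(len));
	memset(raw + HEADER_LEN, GUARD_BYTE, GUARD_LEN);
	memset(raw + HEADER_LEN + GUARD_LEN + len, GUARD_BYTE, GUARD_LEN);
	return (raw + HEADER_LEN + GUARD_LEN);
}

/* Return 0 if both guards are intact, -1 on underrun, 1 on overrun. */
static int
guarded_check(const void *p)
{
	const unsigned char *user = p;
	const unsigned char *raw = user - GUARD_LEN - HEADER_LEN;
	size_t len, i;

	memcpy(&len, raw, sizeof(len));
	for (i = 0; i < GUARD_LEN; i++)
		if (raw[HEADER_LEN + i] != GUARD_BYTE)
			return (-1);
	for (i = 0; i < GUARD_LEN; i++)
		if (user[len + i] != GUARD_BYTE)
			return (1);
	return (0);
}

/* Release an allocation made by guarded_alloc(). */
static void
guarded_free(void *p)
{
	free((unsigned char *)p - GUARD_LEN - HEADER_LEN);
}
```

A caller would run guarded_check() before guarded_free(); a write one
byte past the end of the buffer makes it return 1, while a store through
a stale or unrelated pointer leaves both guards intact.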

Two things that may be relevant: on boot, this server's MPT2 BIOS
always complains "adapter configuration may have changed", and I
haven't discovered anything in the configuration utility that changes
this.  Also, on boot, I always get the following messages:

failure at /usr/src-9-stable/sys/dev/mps/mps_sas_lsi.c:667/mpssas_add_device()! Could not get ID for device with handle 0x0010
mpssas_fw_work: failed to add device with handle 0x10

This has been true across mps(4) revisions, on all three copies of
this hardware that I have in service.

-GAWollman

From owner-freebsd-scsi@FreeBSD.ORG  Tue Feb  4 07:39:42 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 30C01A29;
 Tue,  4 Feb 2014 07:39:42 +0000 (UTC)
Received: from mail-ee0-f41.google.com (mail-ee0-f41.google.com [74.125.83.41])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 65F091C1F;
 Tue,  4 Feb 2014 07:39:40 +0000 (UTC)
Received: by mail-ee0-f41.google.com with SMTP id e51so1935040eek.28
 for <multiple recipients>; Mon, 03 Feb 2014 23:39:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=63JPcyGQBiTBLad94svl/IPeP2Dhluug8kcL3MhloBI=;
 b=lo4du05u/+Ny5joYVWxyACv33yJ2Lxe/6QE7D4a6cE+JzLAlR8ZuWUyyD4jbIdQjRW
 gW7ubPNX9sCSdxN5yjcQBAD9xDMuCsan+gWLZ2r9xuact0kKCG0ralz8OZiKuq9MOAey
 mizK1o2LHv5FcChcDhCNM9g+dlcmuNuvDd6CAzItcTf9t582MpI8pYPil4T/D1oDZpCE
 5P1TGMe3rFqENKDtn2dxMpa1YY8k3dCsj1LtXbEPfSQiUK+v26vo1vAi0sdm7n45WOnM
 GbakhJE2bQKqxW34uHmBCLk/JhOLMDchwDlHQW56qH02dWTWZb1ZJeiEjJni0EUEFJn1
 ZTaA==
X-Received: by 10.14.29.6 with SMTP id h6mr1034131eea.84.1391499544433;
 Mon, 03 Feb 2014 23:39:04 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([134.249.139.101])
 by mx.google.com with ESMTPSA id m9sm71924582eeh.3.2014.02.03.23.39.02
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 03 Feb 2014 23:39:03 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <52F09914.5040202@FreeBSD.org>
Date: Tue, 04 Feb 2014 09:39:00 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Steve Kargl <sgk@troutmask.apl.washington.edu>, 
 John Baldwin <jhb@freebsd.org>
Subject: Re: Instant panic CAM or USB subsystem
References: <20140125172106.GA67590@troutmask.apl.washington.edu>
 <201401281232.21958.jhb@freebsd.org>
 <20140128195842.GA83173@troutmask.apl.washington.edu>
In-Reply-To: <20140128195842.GA83173@troutmask.apl.washington.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@freebsd.org, scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 04 Feb 2014 07:39:42 -0000

On 28.01.2014 21:58, Steve Kargl wrote:
> On Tue, Jan 28, 2014 at 12:32:21PM -0500, John Baldwin wrote:
>> On Saturday, January 25, 2014 12:21:06 pm Steve Kargl wrote:
>>> If I plug my Samsung Intensity II cellphone into a usb port,
>>> I get an instant panic.  This is 100% reproducible.  I have
>>> the core and kernel for further debugging.  Dmesg.boot follows
>>> my sig.
>>>
>>> % kgdb /boot/kernel/kernel /vmcore.0
>>>
>>> Unread portion of the kernel message buffer:
>>> cd1 at umass-sim1 bus 1 scbus4 target 0 lun 0
>>> cd1: <SAMSUNG CD-ROM 1.00> Removable CD-ROM SCSI-2 device
>>> cd1: Serial Number 000000000002
>>> cd1: 1.000MB/s transfers
>>> cd1: cd present [3840000 x 512 byte records]
>>> cd1: quirks=0x10<10_BYTE_ONLY>
>>> panic: mutex CAM device lock not owned at /usr/src/sys/cam/cam_periph.c:301
>>> cpuid = 0
>>> KDB: enter: panic
>>
>> scsi@ might work better for this.  It looks like when cdasync() calls
>> cam_periph_alloc() it doesn't have its associated xpt_path locked.  All the
>> other async xpt callbacks I looked at don't lock the xpt path either.  It
>> seems they expect it to be locked by the caller when they are invoked.  It
>> seems xpt_async_process_dev() doesn't always lock xpt_lock, but sometimes
>> locks the device instead:
>>
>> 	/*
>> 	 * If async for specific device is to be delivered to
>> 	 * the wildcard client, take the specific device lock.
>> 	 * XXX: We may need a way for client to specify it.
>> 	 */
>> 	if ((device->lun_id == CAM_LUN_WILDCARD &&
>> 	     path->device->lun_id != CAM_LUN_WILDCARD) ||
>> 	    (device->target->target_id == CAM_TARGET_WILDCARD &&
>> 	     path->target->target_id != CAM_TARGET_WILDCARD) ||
>> 	    (device->target->bus->path_id == CAM_BUS_WILDCARD &&
>> 	     path->target->bus->path_id != CAM_BUS_WILDCARD)) {
>> 		mtx_unlock(&device->device_mtx);
>> 		xpt_path_lock(path);
>> 		relock = 1;
>> 	} else
>> 		relock = 0;
>>
>> 	(*(device->target->bus->xport->async))(async_code,
>> 	    device->target->bus, device->target, device, async_arg);
>> 	xpt_async_bcast(&device->asyncs, async_code, path, async_arg);
>>
>> 	if (relock) {
>> 		xpt_path_unlock(path);
>> 		mtx_lock(&device->device_mtx);
>> 	}
>>
>> Maybe try going up to this frame (16) in your dump and do
>> 'p *device->target'?  However, someone with more CAM knowledge needs to look
>> at this to see what is actually broken.
>>
>> It seems a bit odd that it thinks your phone is a CD player.
>
> Thanks for the follow-up.  I poked around a bit, but don't
> recall looking at *device->target.   Under Windows, 3
> filesystems show up, and the one causing problems is listed
> as CDFS.

I guess the problem may be not that the phone is reported as a CD, but
that it is reported as several CDs on one target. Note that you already
see cd1 reported, but another one was still trying to allocate when the
system panicked.

I think the CAM CD driver incorrectly assumes that your device is a CD
changer. I've pulled a real 5-disk SCSI CD changer from the depths of my
desk and got a panic very much like yours just on boot. It seems that
the respective changer code was not properly re-locked during the recent
CAM locking project.

I am going to analyze this case more deeply to fix it properly, while
for your case I can propose this quick quirk:

--- scsi_cd.c   (revision 261448)
+++ scsi_cd.c   (working copy)
@@ -223,6 +223,10 @@ static struct cd_quirk_entry cd_quirk_table[] =
         {
                 { T_CDROM, SIP_MEDIA_REMOVABLE, "CHINON", "CD-ROM CDS-535","*"},
                 /* quirks */ CD_Q_BCD_TRACKS
+       },
+       {
+               { T_CDROM, SIP_MEDIA_REMOVABLE, "SAMSUNG", "CD-ROM","1.00"},
+               /* quirks */ CD_Q_NO_CHANGER
         }
  };
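
To illustrate how a quirk entry like the one above gets selected, here
is a simplified, self-contained sketch of a table lookup keyed on the
inquiry vendor, product, and revision strings.  The struct, the flag
values, and the function names are hypothetical and much simpler than
the real CAM quirk matching code; in particular, this sketch treats only
a lone "*" as a wildcard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define Q_NONE		0x00
#define Q_BCD_TRACKS	0x01
#define Q_NO_CHANGER	0x02

struct quirk_entry {
	const char *vendor;
	const char *product;
	const char *revision;
	unsigned quirks;
};

/* Entries modelled on the two shown in the patch. */
static const struct quirk_entry demo_table[] = {
	{ "CHINON",  "CD-ROM CDS-535", "*",    Q_BCD_TRACKS },
	{ "SAMSUNG", "CD-ROM",         "1.00", Q_NO_CHANGER },
};

/* A pattern matches if it is "*" or equals the value exactly. */
static int
field_matches(const char *pattern, const char *value)
{
	return (strcmp(pattern, "*") == 0 || strcmp(pattern, value) == 0);
}

/* Return the quirks of the first matching entry, or Q_NONE. */
static unsigned
lookup_quirks(const struct quirk_entry *table, size_t n,
    const char *vendor, const char *product, const char *revision)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (field_matches(table[i].vendor, vendor) &&
		    field_matches(table[i].product, product) &&
		    field_matches(table[i].revision, revision))
			return (table[i].quirks);
	}
	return (Q_NONE);
}
```

With this table, a device reporting <SAMSUNG CD-ROM 1.00> picks up the
no-changer flag, while the same product string with a different revision
does not, so the quirk stays narrowly scoped.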


-- 
Alexander Motin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Feb  4 22:09:59 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A0F56B4C;
 Tue,  4 Feb 2014 22:09:59 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu
 [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 67B151210;
 Tue,  4 Feb 2014 22:09:59 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
 by khavrinen.csail.mit.edu (8.14.7/8.14.7) with ESMTP id s14M9v7P098695
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
 Tue, 4 Feb 2014 17:09:57 -0500 (EST)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
 by khavrinen.csail.mit.edu (8.14.7/8.14.7/Submit) id s14M9vMo098692;
 Tue, 4 Feb 2014 17:09:57 -0500 (EST) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <21233.25909.355102.743155@khavrinen.csail.mit.edu>
Date: Tue, 4 Feb 2014 17:09:57 -0500
From: Garrett Wollman <wollman@csail.mit.edu>
To: "Kenneth D. Merry" <ken@freebsd.org>
Subject: Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 ==
 BOOM!)
In-Reply-To: <20140131003342.GA11755@nargothrond.kdm.org>
References: <21225.19508.683025.581620@khavrinen.csail.mit.edu>
 <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
 <20140129221514.GA47535@nargothrond.kdm.org>
 <21225.38749.179621.454579@khavrinen.csail.mit.edu>
 <20140131003342.GA11755@nargothrond.kdm.org>
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (khavrinen.csail.mit.edu [127.0.0.1]); Tue, 04 Feb 2014 17:09:57 -0500 (EST)
Cc: freebsd-scsi@freebsd.org, scottl@freebsd.org, freebsd-stable@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 04 Feb 2014 22:09:59 -0000

<<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken@freebsd.org> said:

> The fact that the redzone code doesn't expose any problems makes it more
> likely that it is a problem other than a heap overflow.

So I built a new kernel with DEBUG_MEMGUARD.  When
vm.memguard.desc="mps", everything works fine both through two
load/unload cycles and statically compiled into the kernel.  When
vm.memguard.desc is not set, instapanic as before.  (I'm trying
memguard rather than redzone as it has much less of a performance
impact, so I can start doing some of the performance testing I was
originally intending to do.)

Are there any debugging options that I could usefully enable that
would show just what mps is doing when the fault happens?  I see that
there are lots of tracing options but I don't know what would actually
be useful.

-GAWollman

From owner-freebsd-scsi@FreeBSD.ORG  Tue Feb  4 23:34:57 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 82869720
 for <freebsd-scsi@freebsd.org>; Tue,  4 Feb 2014 23:34:57 +0000 (UTC)
Received: from nm28-vm3.bullet.mail.ne1.yahoo.com
 (nm28-vm3.bullet.mail.ne1.yahoo.com [98.138.91.158])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3F52A1A43
 for <freebsd-scsi@freebsd.org>; Tue,  4 Feb 2014 23:34:56 +0000 (UTC)
Received: from [98.138.101.132] by nm28.bullet.mail.ne1.yahoo.com with NNFMP;
 04 Feb 2014 23:34:50 -0000
Received: from [98.138.226.56] by tm20.bullet.mail.ne1.yahoo.com with NNFMP;
 04 Feb 2014 23:34:45 -0000
Received: from [127.0.0.1] by smtp207.mail.ne1.yahoo.com with NNFMP;
 04 Feb 2014 23:34:45 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
 t=1391556885; bh=rxZw2dsf4UHe/Sc+SnwRZg6sr1Ok71SgknlkGhVc8N0=;
 h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:Content-Transfer-Encoding:Message-Id:References:To:X-Mailer;
 b=h8R7Lax0flGdhDs/Q6iauYUmtUQHFFHtWsjZUXTGxA2Og0a+lyabQWWbRfD/a+3593mQ+fdircNba2hunngEsXUW0bXS4ISC/H2jFIjCvXQ8Uyn5WsmN+PpzvENS+x8Nn3uGJ2F8PdRT7tj7KaxVO9JRr136j1rIvbffu6oF9XU=
X-Yahoo-Newman-Id: 796784.87775.bm@smtp207.mail.ne1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: p1KzD_EVM1n5AsdNmuVw5Ea9tSJeRDQxjjXsagOgc6gBsPo
 d5YHDgWG6PTqT9rbcuJbL33iXpJw3xgvi5Uw9xN6yE8j9y_n0IuHfxLr3VQb
 yc7M0uyzSq3k_r.1B0f3_EQIb4uAN80z_RCG_1lHernfEZuVJocsg11iH6nW
 VBtto9FkfpH8lB0s_5fWNv.cvrn99MQaGRK6tN.IlGji3t.xLJpAHdsqmzhe
 hfb7GuBBZCyX5fqm5ilszP.44eAzqRhH2PPifotjIkBTfi1yVFwFK7NkmymU
 IcNrp6.kWZiL0ZlTbNNhoCJp_Z8OQylNMuI85X2bmYvtaJLzdrQF5QDcWxQw
 DW9Sfq0OG7pUI.dL4MynGbBLIRqWF8PqrVTGussdnnGZ3afsUdCojBWN8fQF
 gRAnr6QfOO7g0O1k.1bzZTTLx1rskrZGsKReIV6er31w0FJ_DrxYjQnZTPbk
 76SYQvpTBrfXA9dg26GMeKy_QJxSsM0sL9FHYNmU5n798YlP3jYoaF0LXsW8
 A4j_qQ21kVcwhvkNobm.KOePKH8yHlAk467Rx8gv86yqWZ2PncN8d210sSHa
 BmL.40IEV
X-Yahoo-SMTP: clhABp.swBB7fs.LwIJpv3jkWgo2NU8-
X-Rocket-Received: from lgmac-eding.corp.netflix.com (scott4long@69.53.236.251
 with plain [63.250.193.228])
 by smtp207.mail.ne1.yahoo.com with SMTP; 04 Feb 2014 15:34:40 -0800 PST
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\))
Subject: Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 ==
 BOOM!)
From: Scott Long <scott4long@yahoo.com>
In-Reply-To: <21233.25909.355102.743155@khavrinen.csail.mit.edu>
Date: Tue, 4 Feb 2014 16:34:36 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <E775293C-C688-4174-A224-F35090C36BF9@yahoo.com>
References: <21225.19508.683025.581620@khavrinen.csail.mit.edu>
 <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
 <20140129221514.GA47535@nargothrond.kdm.org>
 <21225.38749.179621.454579@khavrinen.csail.mit.edu>
 <20140131003342.GA11755@nargothrond.kdm.org>
 <21233.25909.355102.743155@khavrinen.csail.mit.edu>
To: Garrett Wollman <wollman@csail.mit.edu>
X-Mailer: Apple Mail (2.1827)
Cc: freebsd-stable@freebsd.org, "Kenneth D. Merry" <ken@freebsd.org>,
 "FreeBSD-scsi@freebsd.org" <freebsd-scsi@freebsd.org>
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 04 Feb 2014 23:34:57 -0000


On Feb 4, 2014, at 3:09 PM, Garrett Wollman <wollman@csail.mit.edu> wrote:

> <<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken@freebsd.org> said:
>
>> The fact that the redzone code doesn't expose any problems makes it more
>> likely that it is a problem other than a heap overflow.
>
> So I built a new kernel with DEBUG_MEMGUARD.  When
> vm.memguard.desc="mps", everything works fine both through two
> load/unload cycles and statically compiled into the kernel.  When
> vm.memguard.desc is not set, instapanic as before.  (I'm trying
> memguard rather than redzone as it has much less of a performance
> impact, so I can start doing some of the performance testing I was
> originally intending to do.
>
> Are there any debugging options that I could usefully enable that
> would show just what mps is doing when the fault happens?  I see that
> there are lots of tracing options but I don't know what would actually
> be useful.
>

Try the patch at http://people.freebsd.org/~scottl/mps.memguard.diff

I haven't even compile tested it, so hopefully any mistakes are easy to fix
and aren't too embarrassing.  The target array is an obvious culprit since
it's often indexed without bounds checking.  If this doesn't fix it then
I'll have to think of some other culprits.  Another next step would be to
further divide and test the M_MPT2 malloc allocation type.
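
As a rough illustration of what bounds-checking a target array means, here
is a minimal sketch. It is not the content of mps.memguard.diff, which is
not reproduced in this thread; the struct layout and the names softc,
target, maxtargets and target_lookup are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-target state; the real driver structure differs. */
struct target {
	unsigned int handle;
};

/* Hypothetical controller state holding a heap-allocated target array. */
struct softc {
	struct target *targets;
	unsigned int maxtargets;	/* number of elements in targets[] */
};

/*
 * Return a pointer to the target with the given id, or NULL if the id
 * falls outside the allocated array.  Callers that index targets[]
 * directly with a firmware-supplied id would skip this check.
 */
struct target *
target_lookup(struct softc *sc, unsigned int id)
{
	if (sc == NULL || sc->targets == NULL || id >= sc->maxtargets)
		return (NULL);
	return (&sc->targets[id]);
}
```

Funnelling every access through one checked helper turns an out-of-range
index into a NULL return that can be logged, instead of a silent write
past the end of the allocation.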

Scott


From owner-freebsd-scsi@FreeBSD.ORG  Wed Feb  5 02:08:14 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A944CEAD;
 Wed,  5 Feb 2014 02:08:14 +0000 (UTC)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 85F5F1A70;
 Wed,  5 Feb 2014 02:08:14 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id s152846x054127
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Tue, 4 Feb 2014 18:08:04 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id s15284XP054126;
 Tue, 4 Feb 2014 18:08:04 -0800 (PST) (envelope-from sgk)
Date: Tue, 4 Feb 2014 18:08:04 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: Instant panic CAM or USB subsystem
Message-ID: <20140205020804.GA54095@troutmask.apl.washington.edu>
References: <20140125172106.GA67590@troutmask.apl.washington.edu>
 <201401281232.21958.jhb@freebsd.org>
 <20140128195842.GA83173@troutmask.apl.washington.edu>
 <52F09914.5040202@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52F09914.5040202@FreeBSD.org>
User-Agent: Mutt/1.5.22 (2013-10-16)
Cc: freebsd-current@freebsd.org, John Baldwin <jhb@freebsd.org>,
 scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 05 Feb 2014 02:08:14 -0000

On Tue, Feb 04, 2014 at 09:39:00AM +0200, Alexander Motin wrote:
> 
> I guess the problem may be not that the phone is reported as a CD, but 
> that it is reported as several CDs on one target. Note that you already 
> see cd1 reported, but another one was still trying to allocate when the 
> system panicked.

Good guess; see below.

> I think the CAM CD driver incorrectly assumes that your device is a CD 
> changer. I pulled a real 5-disk SCSI CD changer from the depths of my 
> desk and got a panic very much like yours right at boot. It seems that 
> the respective changer code was not properly re-locked during the recent 
> CAM locking project.

If you come up with a patch, I can test it for you.

> I am going to analyze this case more deeply to fix it properly, but for 
> your case I can propose this quick quirk:
> 
> --- scsi_cd.c   (revision 261448)
> +++ scsi_cd.c   (working copy)
> @@ -223,6 +223,10 @@ static struct cd_quirk_entry cd_quirk_table[] =
>          {
>                  { T_CDROM, SIP_MEDIA_REMOVABLE, "CHINON", "CD-ROM CDS-535","*"},
>                  /* quirks */ CD_Q_BCD_TRACKS
> +       },
> +       {
> +               { T_CDROM, SIP_MEDIA_REMOVABLE, "SAMSUNG", "CD-ROM","1.00"},
> +               /* quirks */ CD_Q_NO_CHANGER
>          }
>   };
> 

With your quirk, the laptop booted and plugging in the cellphone
does not cause a panic.  :-)  

dmesg shows

ugen3.2: <Qualcomm, Incorporated> at usbus3
umass1: <Qualcomm, Incorporated USB MMC Storage, class 0/0, rev 1.10/0.00,\
         addr 2> on usbus3
cd1 at umass-sim1 bus 1 scbus5 target 0 lun 0
cd1: <SAMSUNG CD-ROM 1.00> Removable CD-ROM SCSI-2 device 
cd1: Serial Number 000000000002
cd1: 1.000MB/s transfers
cd1: cd present [3840000 x 512 byte records]
cd1: quirks=0x14<NO_CHANGER,10_BYTE_ONLY>
cd2 at umass-sim1 bus 1 scbus5 target 0 lun 1
cd2: <SAMSUNG CD-ROM 1.00> Removable CD-ROM SCSI-2 device 
cd2: Serial Number 000000000002
cd2: 1.000MB/s transfers
cd2: cd present [1084 x 512 byte records]
cd2: quirks=0x14<NO_CHANGER,10_BYTE_ONLY>

After a few seconds, the cellphone display shows

> Sync Music to Phone
> Sync Music to Card
> Copy/Move Files

and the following appears in dmesg

ugen3.2: <Qualcomm, Incorporated> at usbus3 (disconnected)
umass1: at uhub3, port 2, addr 2 (disconnected)
cd1 at umass-sim1 bus 1 scbus5 target 0 lun 0
cd1: <SAMSUNG CD-ROM 1.00> s/n 000000000002 detached
cd2 at umass-sim1 bus 1 scbus5 target 0 lun 1
cd2: <SAMSUNG CD-ROM 1.00> s/n 000000000002 detached
(cd2:umass-sim1:1:0:1): Periph destroyed
(cd1:umass-sim1:1:0:0): Periph destroyed
ugen3.2: <SAMSUNG Electronics Bo.,Ltd.> at usbus3

This is fine with me as I only use the laptop as a charging station.

-- 
Steve

From owner-freebsd-scsi@FreeBSD.ORG  Wed Feb  5 19:21:43 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id BD44CC87;
 Wed,  5 Feb 2014 19:21:43 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90])
 by mx1.freebsd.org (Postfix) with ESMTP id 8C0D71DB6;
 Wed,  5 Feb 2014 19:21:43 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com)
 ([192.168.1.2])
 by ironport.ambrisko.com with ESMTP; 05 Feb 2014 11:26:08 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
 by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id s15JLaUp072430;
 Wed, 5 Feb 2014 11:21:36 -0800 (PST)
 (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
 by ambrisko.com (8.14.4/8.14.4/Submit) id s15JLatT072429;
 Wed, 5 Feb 2014 11:21:36 -0800 (PST) (envelope-from ambrisko)
Date: Wed, 5 Feb 2014 11:21:36 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Mark Johnston <markj@freebsd.org>
Subject: Re: mfi(4) support for MegaRAID Fury cards
Message-ID: <20140205192136.GA71309@ambrisko.com>
References: <20131227220455.GA6027@charmander.home>
 <20140124190832.GB28724@ambrisko.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140124190832.GB28724@ambrisko.com>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-scsi@freebsd.org, ambrisko@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 05 Feb 2014 19:21:43 -0000

On Fri, Jan 24, 2014 at 11:08:32AM -0800, Doug Ambrisko wrote:
| On Fri, Dec 27, 2013 at 05:04:55PM -0500, Mark Johnston wrote:
| | Hello,
| | 
| | The patch here adds mfi(4) support for my LSI 9341-4i controller, which
| | has device ID 0x5f:
| | 
| | http://people.freebsd.org/~markj/patches/mfi_fury.diff
| | 
| | This diff was mostly obtained by going through the mrsas(4) code
| | specific to Invader (DID 0x5d) and Fury (DID 0x5f) controllers. The main
| | change is to add an end-of-list marker to scatter-gather DMA lists
| | before handing them to the firmware. Without this, large writes to an
| | mfi(4) volume result in a firmware crash loop, and the system needs to
| | be reset. The diff adds code for both Invader and Fury cards, as this is
| | what's done in mrsas(4); I haven't tested with an Invader card though,
| | as I don't have access to one. With this patch, I'm able to boot FreeBSD
| | 8.2 off of a RAID 1 volume on my 9341-4i.
| | 
| | Would anyone be able to review or test this patch? I'm particularly
| | interested if anyone could try it out with an Invader or Fury card
| | (there shouldn't be any differences in driver behaviour with other
| | cards).
| 
| The patch looks good.  I can test it out on an Invader card that I have.
| I don't have a Fury card.  I was holding off waiting to see how we
| should resolve the conflict with the mrsas(4) driver from LSI.  We have
| been looking at what needs to be done to get mrsas(4) into FreeBSD.  I
| posted a change to FreeBSD SCSI list to add a tunable to reduce
| the probe priority of mfi(4) for ThunderBolt and later cards.  This
| way they can both be in the GENERIC kernel etc. and not have an
| issue.  We'll need to do some minor updates to your patch to work
| with that since I added another flag in the ident area.

After fixing the merge conflict with my recent change it works with my
Invader card.  I don't see any issues with the patch.
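
For readers unfamiliar with the end-of-list marker mentioned in the quoted
description, the idea can be sketched as follows. This is only an
illustration under assumptions: the element layout, the flag name
SGE_FLAG_END_OF_LIST and its value are hypothetical, and the real ones are
defined by the controller's firmware interface, not by this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value marking the final scatter-gather element. */
#define SGE_FLAG_END_OF_LIST	0x40

/* Hypothetical scatter-gather element: bus address, length, flags. */
struct sge {
	uint64_t addr;
	uint32_t len;
	uint8_t  flags;
};

/*
 * Fill n scatter-gather elements from parallel address/length arrays
 * and flag the last one, so the consumer knows where the list ends
 * without relying on a separate element count.
 */
void
sgl_fill(struct sge *sgl, const uint64_t *addrs, const uint32_t *lens,
    size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		sgl[i].addr = addrs[i];
		sgl[i].len = lens[i];
		sgl[i].flags = 0;
	}
	if (n > 0)
		sgl[n - 1].flags |= SGE_FLAG_END_OF_LIST;
}
```

Only the final element carries the flag; a consumer that expects the
marker and never sees it has no reliable way to know where the list stops.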

Do you want to redo the patch and then commit it, or just commit once
you've made the change?  Please make sure you do it with -current.
After this we should plan to MFC these changes all the way back to
8-stable.

Thanks,

Doug A.

From owner-freebsd-scsi@FreeBSD.ORG  Thu Feb  6 02:57:40 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 605554DE;
 Thu,  6 Feb 2014 02:57:40 +0000 (UTC)
Received: from mail-ie0-x22f.google.com (mail-ie0-x22f.google.com
 [IPv6:2607:f8b0:4001:c03::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1EC281770;
 Thu,  6 Feb 2014 02:57:40 +0000 (UTC)
Received: by mail-ie0-f175.google.com with SMTP id ar20so298059iec.6
 for <multiple recipients>; Wed, 05 Feb 2014 18:57:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=mxPSjXfZovfRc76HObnv0UGLsgvN+fl3AN0wr2DClKo=;
 b=f1nK/rF6fClVT57T009/n2iMNw8Ho9Wr8qEOC+SWRIy/xSQPycXIH40OYxbxFsdTg2
 B3GWUyxYgH70IPyYBreLteWQgi0R1TpEqgMPggjODCGyK6LhQ9osMjsW4PG/m0fH5ph/
 719OaKhys9dUJfnYC658cHRm+upy78tIRdIUE3zeXWcAmoRt63XEGtq2/KYT1rXydQdq
 ICrOXwB8zwdeGQbGLIL/ZOwmYlfEC3m5r8HBYC52CbsxcezXVOpAun0LQHvfg0tQfaF2
 m7DQnKKNSI25TC8YTfWOzlUXCDs4lZU+P82Fb+GRA3TOPmdCI5ggL/XoHwUIio7ddpxq
 vkkQ==
X-Received: by 10.50.43.233 with SMTP id z9mr27520961igl.33.1391655459491;
 Wed, 05 Feb 2014 18:57:39 -0800 (PST)
Received: from raichu (198-84-185-216.cpe.teksavvy.com. [198.84.185.216])
 by mx.google.com with ESMTPSA id kt2sm62989701igb.1.2014.02.05.18.57.32
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 05 Feb 2014 18:57:38 -0800 (PST)
Sender: Mark Johnston <markjdb@gmail.com>
Date: Wed, 5 Feb 2014 21:57:10 -0500
From: Mark Johnston <markj@freebsd.org>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: mfi(4) support for MegaRAID Fury cards
Message-ID: <20140206025710.GA77280@raichu>
References: <20131227220455.GA6027@charmander.home>
 <20140124190832.GB28724@ambrisko.com>
 <20140205192136.GA71309@ambrisko.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140205192136.GA71309@ambrisko.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
Cc: freebsd-scsi@freebsd.org, ambrisko@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Feb 2014 02:57:40 -0000

On Wed, Feb 05, 2014 at 11:21:36AM -0800, Doug Ambrisko wrote:
> On Fri, Jan 24, 2014 at 11:08:32AM -0800, Doug Ambrisko wrote:
> | On Fri, Dec 27, 2013 at 05:04:55PM -0500, Mark Johnston wrote:
> | | Hello,
> | | 
> | | The patch here adds mfi(4) support for my LSI 9341-4i controller, which
> | | has device ID 0x5f:
> | | 
> | | http://people.freebsd.org/~markj/patches/mfi_fury.diff
> | | 
> | | This diff was mostly obtained by going through the mrsas(4) code
> | | specific to Invader (DID 0x5d) and Fury (DID 0x5f) controllers. The main
> | | change is to add an end-of-list marker to scatter-gather DMA lists
> | | before handing them to the firmware. Without this, large writes to an
> | | mfi(4) volume result in a firmware crash loop, and the system needs to
> | | be reset. The diff adds code for both Invader and Fury cards, as this is
> | | what's done in mrsas(4); I haven't tested with an Invader card though,
> | | as I don't have access to one. With this patch, I'm able to boot FreeBSD
> | | 8.2 off of a RAID 1 volume on my 9341-4i.
> | | 
> | | Would anyone be able to review or test this patch? I'm particularly
> | | interested if anyone could try it out with an Invader or Fury card
> | | (there shouldn't be any differences in driver behaviour with other
> | | cards).
> | 
> | The patch looks good.  I can test it out on an Invader card that I have.
> | I don't have a Fury card.  I was holding off waiting to see how we
> | should resolve the conflict with the mrsas(4) driver from LSI.  We have
> | been looking at what needs to be done to get mrsas(4) into FreeBSD.  I
> | posted a change to FreeBSD SCSI list to add a tunable to reduce
> | the probe priority of mfi(4) for ThunderBolt and later cards.  This
> | way they can both be in the GENERIC kernel etc. and not have an
> | issue.  We'll need to do some minor updates to your patch to work
> | with that since I added another flag in the ident area.
> 
> After fixing the merge conflict with my recent change it works with my
> Invader card.  I don't see any issues with the patch.

Thanks! I've committed the change as r261535.

> 
> Do you want to redo the patch and then commit it, or just commit once
> you've made the change?  Please make sure you do it with -current.
> After this we should plan to MFC these changes all the way back to
> 8-stable.

Sure, sounds good.

-Mark

From owner-freebsd-scsi@FreeBSD.ORG  Sat Feb  8 21:28:24 2014
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 41968205;
 Sat,  8 Feb 2014 21:28:24 +0000 (UTC)
Received: from pi.nmdps.net (pi.nmdps.net [109.61.102.5])
 by mx1.freebsd.org (Postfix) with ESMTP id CB24E10D6;
 Sat,  8 Feb 2014 21:28:23 +0000 (UTC)
Received: from pi.nmdps.net (localhost [127.0.0.1])
 (Authenticated sender: krichy@cflinux.hu)
 by pi.nmdps.net (Postfix) with ESMTPSA id 03B7115E6;
 Sat,  8 Feb 2014 22:28:15 +0100 (CET)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 8bit
Date: Sat, 08 Feb 2014 22:28:13 +0100
From: krichy@cflinux.hu
To: zfs-devel@freebsd.org, freebsd-scsi@freebsd.org
Subject: Re: Outage related to hard drive failure
Message-ID: <ba6c5381cce9cf6548c7ce1394be9d7a@cflinux.hu>
X-Sender: krichy@cflinux.hu
User-Agent: Roundcube Webmail/0.9.5
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Feb 2014 21:28:24 -0000

Dear Brian,

Unfortunately, I can only report that I ran into the same issue a few 
weeks ago. We run a FreeNAS server which hosts VM images and serves them 
over NFS. At one point I began receiving notifications that the virtual 
hosts served by this NAS had gone down. I checked the NAS and found that 
one drive attached to an mps/LSI HBA had stopped responding to the HBA 
entirely, and thus blocked the whole pool. What also seemed strange to me 
was that no timeout event occurred, so ZFS thought that all drives were 
fine while that one drive blocked all I/O to the pool. Unfortunately, I 
could not even offline that drive; only a hard reset helped. The drive 
was so unresponsive that booting FreeNAS also took a long time, but at 
least then ZFS noticed that the drive was missing and removed it from 
the pool. After that the pool worked fine, in a degraded state of 
course, but healthy, and we could initiate a replacement.

I only have the logs for the bootup:
mps0: mpssas_scsiio_timeout checking sc 0xffffff8000fd1000 cm 
0xffffff800100c0a0
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
500 command timeout cm 0xffffff800100c0a0 ccb 0xfffffe0010f03000
mps0: mpssas_alloc_tm freezing simq
mps0: timedout cm 0xffffff800100c0a0 allocated tm 0xffffff8000fe47b0
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
500 completed timedout cm 0xffffff800100c0a0 ccb 0xfffffe0010f03000 
during recovery ioc 8048 scsi 0 state c xfer 0
(noperiph:mps0:0:14:0): SMID 6 abort TaskMID 500 status 0x0 code 0x0 
count 1
(noperiph:mps0:0:14:0): SMID 6 finished recovery after aborting TaskMID 
500
mps0: mpssas_free_tm releasing simq
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe14:mps0:0:14:0): CAM status: Command timeout
(probe14:mps0:0:14:0): Retrying command
mps0: mpssas_scsiio_timeout checking sc 0xffffff8000fd1000 cm 
0xffffff8001032f50
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
986 command timeout cm 0xffffff8001032f50 ccb 0xfffffe0010f03000
mps0: mpssas_alloc_tm freezing simq
mps0: timedout cm 0xffffff8001032f50 allocated tm 0xffffff8000fe48f8
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
986 completed timedout cm 0xffffff8001032f50 ccb 0xfffffe0010f03000 
during recovery ioc 8048 scsi 0 state c xfer 0
(noperiph:mps0:0:14:0): SMID 7 abort TaskMID 986 status 0x0 code 0x0 
count 1
(noperiph:mps0:0:14:0): SMID 7 finished recovery after aborting TaskMID 
986
mps0: mpssas_free_tm releasing simq
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe14:mps0:0:14:0): CAM status: Command timeout
(probe14:mps0:0:14:0): Retrying command
mps0: mpssas_scsiio_timeout checking sc 0xffffff8000fd1000 cm 
0xffffff8001010c38
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
559 command timeout cm 0xffffff8001010c38 ccb 0xfffffe0010f03000
mps0: mpssas_alloc_tm freezing simq
mps0: timedout cm 0xffffff8001010c38 allocated tm 0xffffff8000fe4a40
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
559 completed timedout cm 0xffffff8001010c38 ccb 0xfffffe0010f03000 
during recovery ioc 8048 scsi 0 state c xfer 0
(noperiph:mps0:0:14:0): SMID 8 abort TaskMID 559 status 0x0 code 0x0 
count 1
(noperiph:mps0:0:14:0): SMID 8 finished recovery after aborting TaskMID 
559
mps0: mpssas_free_tm releasing simq
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe14:mps0:0:14:0): CAM status: Command timeout
(probe14:mps0:0:14:0): Retrying command
mps0: mpssas_scsiio_timeout checking sc 0xffffff8000fd1000 cm 
0xffffff8001007278
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
439 command timeout cm 0xffffff8001007278 ccb 0xfffffe0010f03000
mps0: mpssas_alloc_tm freezing simq
mps0: timedout cm 0xffffff8001007278 allocated tm 0xffffff8000fe4b88
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 
439 completed timedout cm 0xffffff8001007278 ccb 0xfffffe0010f03000 
during recovery ioc 8048 scsi 0 state c xfer 0
(noperiph:mps0:0:14:0): SMID 9 abort TaskMID 439 status 0x0 code 0x0 
count 1
(noperiph:mps0:0:14:0): SMID 9 finished recovery after aborting TaskMID 
439
mps0: mpssas_free_tm releasing simq
(probe14:mps0:0:14:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe14:mps0:0:14:0): CAM status: Command timeout
(probe14:mps0:0:14:0): Retrying command
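
The CDB bytes printed in these log lines can be decoded by hand. The
sketch below shows the standard field layout for the 6-byte INQUIRY
command and the 10-byte READ(10)/WRITE(10) commands; the function and
struct names (cdb_decode, cdb_info) are made up for this example and do
not come from the mps driver.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_INQUIRY	0x12
#define OP_READ_10	0x28
#define OP_WRITE_10	0x2a

struct cdb_info {
	uint8_t  opcode;
	uint32_t lba;		/* READ(10)/WRITE(10) only */
	uint32_t length;	/* bytes for INQUIRY, blocks for READ/WRITE(10) */
};

/*
 * Decode the fields of interest from a raw CDB.  Multi-byte fields are
 * big-endian.  Returns 0 on success, -1 for short or unhandled CDBs.
 */
int
cdb_decode(const uint8_t *cdb, size_t len, struct cdb_info *out)
{
	if (cdb == NULL || out == NULL || len < 6)
		return (-1);
	out->opcode = cdb[0];
	out->lba = 0;
	out->length = 0;

	switch (cdb[0]) {
	case OP_INQUIRY:
		/* Allocation length: byte 4, with byte 3 as the high byte. */
		out->length = ((uint32_t)cdb[3] << 8) | cdb[4];
		return (0);
	case OP_READ_10:
	case OP_WRITE_10:
		if (len < 10)
			return (-1);
		/* Bytes 2-5: logical block address. */
		out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		    ((uint32_t)cdb[4] << 8) | cdb[5];
		/* Bytes 7-8: transfer length in blocks. */
		out->length = ((uint32_t)cdb[7] << 8) | cdb[8];
		return (0);
	default:
		return (-1);
	}
}
```

For the INQUIRY CDB "12 00 00 00 24 00" above, this gives an allocation
length of 0x24 = 36 bytes, which agrees with the "length 36" printed on
the same log line.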


As a side note, the drive was removed from its hot-swap bay and 
reinserted, and it has been working fine since. It is a Seagate 
ST32000645SS, and it reports no errors.

Regards,

On 2014-02-08 21:02, Brian Gardner wrote:
> Last year I upgraded my production servers from UFS on Adaptec RAID
> controllers to ZFS raidz on LSI controllers.  Last week I had an
> outage, the culprit being a failed hard drive in my raidz.  My log was
> littered with kernel messages such as the ones below.  About 20
> minutes after the first message some parts of my host were hung,
> and I was notified that my site was down.  I noticed these messages in
> my log, but running zpool status showed only a few checksum
> errors, and the failed drive was not marked as degraded.  Manually
> degrading the drive solved my problems.  This seems very odd to me.
> It's almost as if ZFS wasn't getting messages from the LSI driver
> regarding these read/write failures.  Do I need to tune something in
> the mps/LSI or ZFS drivers to help them deal with failures?  I'm
> running FreeBSD 8.3-RELEASE-p3 with the GENERIC kernel, with nothing
> unusual about my setup other than that I use jails extensively.  There
> are four drives in the raidz configuration in question:
> 
> zpool status
>   pool: storage
>  state: ONLINE
>   scan: resilvered 67.6G in 2h21m with 0 errors on Tue Aug  6 00:44:14 
> 2013
> config:
> 
> 	NAME        STATE     READ WRITE CKSUM
> 	storage     ONLINE       0     0     0
> 	  raidz1-0  ONLINE       0     0     0
> 	    da0     ONLINE       0     0     0
> 	    da1     ONLINE       0     0     0
> 	    da2     ONLINE       0     0     0
> 	    da3     ONLINE       0     0     0
> 
> Jan 29 19:03:18 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 d8 13 0 0 1d 0
> Jan 29 19:03:18 host2 kernel: (da0:mpslsi0:0:0:0): CAM status: SCSI Status Error
> Jan 29 19:03:18 host2 kernel: (da0:mpslsi0:0:0:0): SCSI status: Check Condition
> Jan 29 19:03:18 host2 kernel: (da0:mpslsi0:0:0:0): SCSI sense: Deferred error: MEDIUM ERROR info:97ef13b asc:15,1 (Mechanical positioning error) actual retry count: 15
> Jan 29 19:04:00 host2 kernel: (da0:mpslsi0:0:0:0): READ(10). CDB: 28 0 9 75 b9 96 0 0 b 0
> Jan 29 19:04:00 host2 kernel: (da0:mpslsi0:0:0:0): CAM status: SCSI Status Error
> Jan 29 19:04:00 host2 kernel: (da0:mpslsi0:0:0:0): SCSI status: Check Condition
> Jan 29 19:04:00 host2 kernel: (da0:mpslsi0:0:0:0): SCSI sense: MEDIUM ERROR info:975b99b asc:15,1 (Mechanical positioning error) actual retry count: 15
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80005ee000 cm 0xffffff800060d408
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): READ(10). CDB: 28 0 2 e3 6a 1a 0 0 1 0 length 512 SMID 153 command timeout cm 0xffffff800060d408 ccb 0xffffff000a1ce800
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_alloc_tm freezing simq
> Jan 29 19:04:54 host2 kernel: mpslsi0: timedout cm 0xffffff800060d408 allocated tm 0xffffff8000604718
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): READ(10). CDB: 28 0 2 e3 6a 1a 0 0 1 0 length 512 SMID 153 completed timedout cm 0xffffff800060d408 ccb 0xffffff000a1ce800 during recovery ioc 8048 scsi 0 state c xfer 0
> Jan 29 19:04:54 host2 kernel: (noperiph:mpslsi0:0:0:0): SMID 43 abort TaskMID 153 status 0x0 code 0x0 count 1
> Jan 29 19:04:54 host2 kernel: (noperiph:mpslsi0:0:0:0): SMID 43 finished recovery after aborting TaskMID 153
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_free_tm releasing simq
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80005ee000 cm 0xffffff800060d928
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): READ(10). CDB: 28 0 9 41 c7 53 0 0 b 0 length 5632 SMID 157 command timeout cm 0xffffff800060d928 ccb 0xffffff0176ccb000
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_alloc_tm freezing simq
> Jan 29 19:04:54 host2 kernel: mpslsi0: timedout cm 0xffffff800060d928 allocated tm 0xffffff8000604860
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): READ(10). CDB: 28 0 9 41 c7 53 0 0 b 0 length 5632 SMID 157 completed timedout cm 0xffffff800060d928 ccb 0xffffff0176ccb000 during recovery ioc 8048 scsi 0 state c xfer 0(noperiph:mpslsi0:0:0:0): SMID 44 abort TaskMID 157 status 0x0 code 0x0 count 1
> Jan 29 19:04:54 host2 kernel: (noperiph:mpslsi0:0:0:0): SMID 44 finished recovery after aborting TaskMID 157
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_free_tm releasing simq
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80005ee000 cm 0xffffff80006235a8
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 e0 d3 0 1 0 0 length 131072 SMID 429 command timeout cm 0xffffff80006235a8 ccb 0xffffff000c600000
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_alloc_tm freezing simq
> Jan 29 19:04:54 host2 kernel: mpslsi0: timedout cm 0xffffff80006235a8 allocated tm 0xffffff80006049a8
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80005ee000 cm 0xffffff800063e2e0
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 e1 d3 0 0 3 0 length 1536 SMID 764 command timeout cm 0xffffff800063e2e0 ccb 0xffffff01dfa19800
> Jan 29 19:04:54 host2 kernel: mpslsi0: queued timedout cm 0xffffff800063e2e0 for processing by tm 0xffffff80006049a8
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 e0 d3 0 1 0 0 length 131072 SMID 429 completed timedout cm 0xffffff80006235a8 ccb 0xffffff000c600000 during recovery ioc 8048 scsi 0 state c xfe(noperiph:mpslsi0:0:0:0): SMID 45 abort TaskMID 429 status 0x0 code 0x0 count 1
> Jan 29 19:04:54 host2 kernel: (noperiph:mpslsi0:0:0:0): SMID 45 continuing recovery after aborting TaskMID 429
> Jan 29 19:04:54 host2 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80005ee000 cm 0xffffff8000610890
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 e3 31 0 0 be 0 length 97280 SMID 194 command timeout cm 0xffffff8000610890 ccb 0xffffff013d0f3800
> Jan 29 19:04:54 host2 kernel: mpslsi0: queued timedout cm 0xffffff8000610890 for processing by tm 0xffffff80006049a8
> Jan 29 19:04:54 host2 kernel: (da0:mpslsi0:0:0:0): WRITE(10). CDB: 2a 0 9 72 e1 d3 0 0 3 0 length 1536 SMID 764 completed timedout cm 0xffffff800063e2e0 ccb 0xffffff01dfa19800 during recovery ioc 8048 scsi 0 state c xfer (noperiph:mpslsi0:0:0:0): SMID 45 abort TaskMID 764 status 0x0 code 0x0 count 1
> _______________________________________________
> zfs-devel@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/zfs-devel
> To unsubscribe, send any mail to "zfs-devel-unsubscribe@freebsd.org"
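
For anyone trying to map the CDB bytes in the quoted log back to block
addresses on da0, here is a minimal sketch of a decoder.  It assumes the
standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 4-byte
big-endian LBA, group number, 2-byte transfer length, control) and
512-byte logical blocks, which is consistent with the "length" values
the driver prints.  decode_cdb10 is a hypothetical helper name, not
part of any FreeBSD tool.

```python
# Sketch only: decode_cdb10 is a hypothetical helper, not a FreeBSD utility.
# Assumes 512-byte logical blocks unless block_size says otherwise.

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_text, block_size=512):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    # The kernel prints each byte in hex without leading zeros ("b", "1d").
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10:
        raise ValueError("expected 10 CDB bytes, got %d" % len(cdb))
    if cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) opcode: 0x%02x" % cdb[0])

    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: starting LBA
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: block count
    return {
        "op": OPCODES[cdb[0]],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


if __name__ == "__main__":
    # CDBs copied from the quoted log.
    for text in ("28 0 9 41 c7 53 0 0 b 0", "2a 0 9 72 e0 d3 0 1 0 0"):
        info = decode_cdb10(text)
        print("%(op)s lba=0x%(lba)08x blocks=%(blocks)d bytes=%(bytes)d" % info)
```

The transfer lengths line up with what the driver reports: 0xb blocks is
11 * 512 = 5632 bytes for the READ with SMID 157, and 0x0100 blocks is
256 * 512 = 131072 bytes for the WRITE with SMID 429.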