Date:      Tue, 20 Oct 2009 11:36:19 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Andrew Thompson <thompsa@FreeBSD.org>
Cc:        FreeBSD-Current <freebsd-current@freebsd.org>, Scott Long <scottl@FreeBSD.org>
Subject:   Re: CAM problem
Message-ID:  <4ADD7683.7040907@FreeBSD.org>
In-Reply-To: <mailpost.1255999338.6409497.5480.mailing.freebsd.current@FreeBSD.cs.nctu.edu.tw>
References:  <mailpost.1255999338.6409497.5480.mailing.freebsd.current@FreeBSD.cs.nctu.edu.tw>

Andrew Thompson wrote:
> I have a cam problem that is noticeable with usb devices. It relates to
> the ordering of xpt_release_device() and the CAM_DEV_UNCONFIGURED flag
> when yanking a device that has stalled. This then causes a problem with
> the usb explore thread which will end up waiting on simfree forever,
> blocking any further usb attach/detach on the controller.
> 
> Hopefully my printfs can show the problem. I have replaced the pointers
> returned from xpt_alloc_device() with pretty names, <dev3> is the one in
> question.
> 
> <...unplug...>
> 
> ugen1.3: <KINGSTON> at usbus1 (disconnected)
> umass0: at uhub2, port 1, addr 3 (disconnected)
> umass_detach:
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> xpt_find_bus: ref=6 -> 7
> usb_cam_action, device GONE
> usb_cam_action, device GONE

As far as I can see, you are returning the CAM_TID_INVALID error here.
Unlike CAM_SEL_TIMEOUT, this error gets no special handling. If you
return CAM_SEL_TIMEOUT there instead, the device will be killed
immediately, which will probably work around this specific problem.

> xpt_release_device dev3 failed, ref=3 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=7 -> 6
> (da0:umass-sim0:0:0:0): got CAM status 0x39
> (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
> (da0:umass-sim0:0:0:0): lost device
> (da0:umass-sim0:0:0:0): removing device entry
> 
>  ^^^ USB disk had stalled on attach

This drops a reference because the periph driver detached itself, but
the XPT still treats the device as valid.

> xpt_release_device dev3 failed, ref=1 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=6 -> 5
> xpt_release_device dev3 failed, ref=0 unconf=0
> 
>  ^^^ last reference to dev3 dropped

From the deallocation point of view, the configured status is handled
the same as one more reference...

> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=5 -> 4
> xpt_release_device dev2 OK 
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> umass_cam_detach_sim: calling xpt_bus_deregister
> xpt_find_bus: ref=2 -> 3
> xpt_alloc_target: ref=3 -> 4
> xpt_alloc_device: device = dev4
> scsi_dev_async: set dev dev3 unconfigured
> 
>  ^^^ dev3 gets the CAM_DEV_UNCONFIGURED flag set here

... but removing the configured status does not trigger deallocation
the way dropping a reference does.
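
To make the ordering problem concrete, here is a minimal userland
sketch of the behaviour described above. It is a model only, not the
kernel code: struct model_dev and all model_* functions are
hypothetical names, and the "unconfigured" field merely stands in for
the CAM_DEV_UNCONFIGURED flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device entry; not the kernel structure. */
struct model_dev {
	int  refcount;      /* outstanding references */
	bool unconfigured;  /* stands in for CAM_DEV_UNCONFIGURED */
	bool freed;         /* set once the model "deallocates" */
};

/*
 * Release as described above: the device is freed only when the last
 * reference goes away AND the device is already unconfigured.
 */
void
model_release(struct model_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0 && dev->unconfigured)
		dev->freed = true;
}

/* Losing the device only sets the flag; nothing re-checks refcount. */
void
model_set_unconfigured(struct model_dev *dev)
{
	dev->unconfigured = true;
}

/* Replays the dev3 ordering from the log; returns whether it was freed. */
bool
model_replay_dev3(void)
{
	struct model_dev dev = { .refcount = 1, .unconfigured = false,
	    .freed = false };

	model_release(&dev);          /* ref=0 unconf=0: release "failed" */
	model_set_unconfigured(&dev); /* flag arrives too late */
	return (dev.freed);
}
```

In this model model_replay_dev3() returns false: the last reference is
dropped while the device still counts as configured, and the later flag
change never revisits the deallocation check.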

> xpt_bus_deregister: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_device dev4 OK 
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=2 -> 1
> umass_cam_detach_sim:
> umass-sim0: waiting... ref = 1
> 
>  ^^^ wait on "simfree" forever.

I think the correct solution is to additionally increment the
reference counter before clearing CAM_DEV_UNCONFIGURED and to decrement
it again after setting CAM_DEV_UNCONFIGURED back. The check for
CAM_DEV_UNCONFIGURED inside xpt_release_device() could then be removed
or turned into an assertion.
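
A rough userland sketch of that idea follows, under the assumption
that the configured state simply owns one reference. It is not a patch:
struct fix_dev and the fix_* functions are hypothetical names, and the
"unconfigured" field only stands in for CAM_DEV_UNCONFIGURED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device entry; not the kernel structure. */
struct fix_dev {
	int  refcount;      /* outstanding references */
	bool unconfigured;  /* stands in for CAM_DEV_UNCONFIGURED */
	bool freed;         /* set once the model "deallocates" */
};

/* Release now depends on the reference count alone. */
void
fix_release(struct fix_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		/* A configured device always holds a reference. */
		assert(dev->unconfigured);
		dev->freed = true;
	}
}

/* Take the extra reference before clearing the flag. */
void
fix_set_configured(struct fix_dev *dev)
{
	if (dev->unconfigured) {
		dev->refcount++;
		dev->unconfigured = false;
	}
}

/* Set the flag back first, then drop the extra reference. */
void
fix_set_unconfigured(struct fix_dev *dev)
{
	if (!dev->unconfigured) {
		dev->unconfigured = true;
		fix_release(dev);
	}
}

/* Same ordering as dev3 in the log; returns whether it was freed. */
bool
fix_replay_dev3(void)
{
	struct fix_dev dev = { .refcount = 1, .unconfigured = true,
	    .freed = false };

	fix_set_configured(&dev);   /* refcount 1 -> 2 */
	fix_release(&dev);          /* last outside reference: 2 -> 1 */
	fix_set_unconfigured(&dev); /* 1 -> 0, device is freed */
	return (dev.freed);
}
```

With the configured state counted as a reference, the order of the last
release and the flag change no longer matters in this model: whichever
happens second performs the deallocation.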

-- 
Alexander Motin


