From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 8 17:37:34 2004 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CF9D216A4CE for ; Wed, 8 Sep 2004 17:37:34 +0000 (GMT) Received: from gandalf.online.bg (gandalf.online.bg [217.75.128.9]) by mx1.FreeBSD.org (Postfix) with SMTP id 4572743D2D for ; Wed, 8 Sep 2004 17:37:06 +0000 (GMT) (envelope-from roam@ringlet.net) Received: (qmail 24120 invoked from network); 8 Sep 2004 17:34:19 -0000 Received: from unknown (HELO straylight.m.ringlet.net) (217.75.134.254) by gandalf.online.bg with SMTP; 8 Sep 2004 17:34:19 -0000 Received: (qmail 9290 invoked by uid 1000); 8 Sep 2004 17:36:43 -0000 Date: Wed, 8 Sep 2004 20:36:42 +0300 From: Peter Pentchev To: "M. Warner Losh" Message-ID: <20040908173642.GB1924@straylight.m.ringlet.net> Mail-Followup-To: "M. Warner Losh" , freebsd-hackers@freebsd.org References: <20040908150943.GA1924@straylight.m.ringlet.net> <20040908.111706.115908959.imp@bsdimp.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NklN7DEeGtkPCoo3" Content-Disposition: inline In-Reply-To: <20040908.111706.115908959.imp@bsdimp.com> User-Agent: Mutt/1.5.6i cc: freebsd-hackers@freebsd.org Subject: Re: [PATCH] Fix USB panics X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Sep 2004 17:37:35 -0000 --NklN7DEeGtkPCoo3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Sep 08, 2004 at 11:17:06AM -0600, M. Warner Losh wrote: > [[ redirected to hackers only ]] Thanks for looking over this so quickly! > In message: <20040908150943.GA1924@straylight.m.ringlet.net> > Peter Pentchev writes: [snip] > : So here's a How to Panic a RELENG_5 Kernel In Five Easy Steps :) > :=20 > : 1. Make sure usbd is not running so it cannot load drivers on demand. > :=20 > : 2. kldload usb, and do not load *any* USB device drivers, not even uge= n. > :=20 > : 3. Plug a device, any device. > : 3a. usbd_probe_and_attach() looks for a specific driver, and fails. > : 3b. usbd_probe_and_attach() looks for a generic driver, and fails. > : 3c. usbd_probe_and_attach() leaves the newly-allocated device_t > : structure in place (and added to the parent bus), but removes any > : subdevs traces in the usbd_device_handle. >=20 > Leaving it in place is the right thing to do. It looks like the > second half of this isn't. Ahh... hold on to this thought, now. > : So.. what should be done about it? Basically, the problem is that > : usbd_probe_and_attach() leaves an orphaned device_t pointing to the > : usbd_device_handle that will be freed when the hardware device is > : physically unplugged. >=20 > I'd use different terms to describe it, but what you are saying is > essentially correct.=20 Yeah, well, this is the first time I dip my fingers into both the USB code and the busdma framework at all, so I'm probably saying a lot of things the wrong way :) > : The attached patch seems to solve the problem - and a couple of related > : others - at least for me. The #ifdef's are effectively #if 0's, I've > : just left them in to keep a perspective on the original code. The idea > : is to keep the device_t pointer in the usbd_device_handle, but take care > : to check if the device is actually attached before dereferencing it. >=20 > Hmmmm That's the third chunk of the patch, where I added the device_is_attached() check to usbd_fill_deviceinfo(). Until now, the device could never possibly have been NOT attached, since usbd_probe_and_attach() would never have left a non-NULL subdev in place. How's that for a double.. no, quadruple negative in the last sentence, now? [snip] > : Is this even the right direction? IMHO, this situation ought to be > : resolved one way or another before 5.3 hits the shelves, even if the > : solution has nothing to do with my proposed patch :) >=20 > Yes. I don't like parts of it, but they are the parts that you > yourself don't like either :-(. >=20 > : Index: dev/usb/uhub.c > : =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > : RCS file: /home/ncvs/src/sys/dev/usb/uhub.c,v > : retrieving revision 1.62 > : diff -u -r1.62 uhub.c > : --- dev/usb/uhub.c 15 Aug 2004 23:39:18 -0000 1.62 > : +++ dev/usb/uhub.c 8 Sep 2004 14:06:45 -0000 > : @@ -707,7 +707,9 @@ > : continue; > : for (i =3D 0; dev->subdevs[i]; i++) { > : if (dev->subdevs[i] =3D=3D child) { > : +#ifdef ROAM_SKIP_USB_DEVICE_LEAK > : dev->subdevs[i] =3D NULL; > : +#endif > : return; > : } > : } >=20 > I agree with your assessment that the above really makes the routine: >=20 > Static void > uhub_child_detached(device_t self, device_t child) > { > struct uhub_softc *sc =3D device_get_softc(self); > usbd_device_handle devhub =3D sc->sc_hub; >=20 > if (!devhub->hub) > /* should never happen; children are only created after init */ > panic("hub not fully initialised, but child deleted?"); > } >=20 > which looks right to me (although having the panic there seems overly > defensive). Yep, that's why I wondered if the whole function should not be removed. Still, it might be a good thing to leave it in, just in case - after all, this is not an overly-often executed path. > : Index: dev/usb/usb_subr.c > : =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > : RCS file: /home/ncvs/src/sys/dev/usb/usb_subr.c,v > : retrieving revision 1.69 > : diff -u -r1.69 usb_subr.c > : --- dev/usb/usb_subr.c 15 Aug 2004 23:39:18 -0000 1.69 > : +++ dev/usb/usb_subr.c 8 Sep 2004 14:05:51 -0000 > : @@ -1016,9 +1016,11 @@ > : if (dv !=3D NULL) { > : return (USBD_NORMAL_COMPLETION); > : } > : +#ifdef ROAM_SKIP_USB_DEVICE_LEAK > : tmpdv =3D dev->subdevs; > : dev->subdevs =3D 0; > : free(tmpdv, M_USB); > : +#endif > : =20 > : /* > : * The generic attach failed, but leave the device as it is. >=20 > I'm not sure I understand why the above fixes anything. Is it because > it makes the proper references go away? If so, it can likely be > deleted. I'll need to thread through the code to see if any of the > other instances of those 3 lines of code can go. Remember the thought I told you to hold on to? :) This is the part where usbd_probe_and_attach() removes the usbd_device_handle's reference to the device_t - and now usb_disconnect_port() thinks it's safe to remove the usbd_device_handle. However, the device_t itself still holds a reference to it in the ivars. Removing these particular three lines of code right here makes usbd_probe_and_attach() keep the subdevs ref, as it very well should. It's true that these same lines are used in two other places, but it's okay - it's in usbd_probe_and_attach() too, it's just garbage collection after the previous unsuccessful attempts to attach a driver. At least I think it's okay, since there might be a fun failure case if a driver tried to attach and created another device_t, then failed to attach and for some reason did not remove the device_t. This sounds suspiciously like the situation we're up against right here... it makes me wonder now, although until now I was convinced it was okay to free the subdevs in the previous failed attempts. > : @@ -1346,7 +1348,7 @@ > : di->udi_speed =3D dev->speed; > : =20 > : if (dev->subdevs !=3D NULL) { > : - for (i =3D 0; dev->subdevs[i] && > : + for (i =3D 0; dev->subdevs[i] && device_is_attached(dev->subdevs[i])= && > : i < USB_MAX_DEVNAMES; i++) { > : strncpy(di->udi_devnames[i], USBDEVPTRNAME(dev->subdevs[i]), > : USB_MAX_DEVNAMELEN); >=20 > OK. I understand this as well. It likely is a good change. Well, it is a no-op unless the three lines above are removed, since if usbd_probe_and_attach() fails, there won't be any non-null subdevs. G'luck, Peter --=20 Peter Pentchev roam@ringlet.net roam@cnsys.bg roam@FreeBSD.org PGP key: http://people.FreeBSD.org/~roam/roam.key.asc Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553 Hey, out there - is it *you* reading me, or is it someone else? --NklN7DEeGtkPCoo3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (FreeBSD) iD8DBQFBP0Mq7Ri2jRYZRVMRApySAJ948KCQqXEpHpH8sGpnGR8gm/aSAgCgksyS FjkwSL+stWwvrCnCLv95F0Q= =/Hph -----END PGP SIGNATURE----- --NklN7DEeGtkPCoo3--