FreeBSD Mail Archives

Date:      Wed, 26 Sep 2012 20:53:39 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        "Kenneth D. Merry" <ken@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, jdp@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, phk@FreeBSD.org
Subject:   Re: svn commit: r240822 - head/sys/geom
Message-ID:  <20120926185339.GA1402@garage.freebsd.pl>
In-Reply-To: <20120926172917.GA71268@nargothrond.kdm.org>
References:  <201209221241.q8MCfnhJ067937@svn.freebsd.org> <20120925233712.GA26920@nargothrond.kdm.org> <20120926072005.GH1391@garage.freebsd.pl> <20120926172917.GA71268@nargothrond.kdm.org>


--PEIAKu/WMn1b1Hv9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Sep 26, 2012 at 11:29:17AM -0600, Kenneth D. Merry wrote:
> On Wed, Sep 26, 2012 at 09:20:06 +0200, Pawel Jakub Dawidek wrote:
> > On Tue, Sep 25, 2012 at 05:37:12PM -0600, Kenneth D. Merry wrote:
> > > On Sat, Sep 22, 2012 at 12:41:49 +0000, Pawel Jakub Dawidek wrote:
> > > > Log:
> > > >   Use the topology lock to protect list of providers while witherin=
g them.
> > > >   It is possible that provider is destroyed while we are iterating =
over the
> > > >   list.
> > [...]
> > > > --- head/sys/geom/geom_disk.c	Sat Sep 22 12:40:52 2012	(r240821)
> > > > +++ head/sys/geom/geom_disk.c	Sat Sep 22 12:41:49 2012	(r240822)
> > > > @@ -635,10 +635,13 @@ disk_gone(struct disk *dp)
> > > >  	struct g_geom *gp;
> > > >  	struct g_provider *pp;
> > > > =20
> > > > +	g_topology_lock();
> > > >  	gp =3D dp->d_geom;
> > > > -	if (gp !=3D NULL)
> > > > +	if (gp !=3D NULL) {
> > > >  		LIST_FOREACH(pp, &gp->provider, provider)
> > > >  			g_wither_provider(pp, ENXIO);
> > > > +	}
> > > > +	g_topology_unlock();
> > > >  }
> > [...]
> > > This breaks devices going away in CAM.
> > >=20
> > > When the da(4) driver calls disk_gone(), it is necessarily holding th=
e SIM
> > > lock, which is a regular MTX_DEF mutex.  The GEOM topology lock is an=
 sx
> > > lock, and of WITNESS blows up because of that:
> > [...]
> > > disk_gone() needs to be callable from an interrupt context.  So it ca=
nnot
> > > acquire the topology lock.
> >=20
> > I can reword what you said above a bit: disk_gone() modifies GEOM
> > topology, so it has to hold the topology lock, thus it cannot be called
> > from an interrupt context.
>=20
> Okay, but either way we need some way to inform GEOM that the device has
> gone away.
>=20
> > We might be able to change the topology lock to LIST_FOREACH_SAFE(), as
> > g_wither_provider() can only destroy current provider. This is because
> > CAM own the geom and nobody should be able to mess with its provider's
> > list, apart from CAM itself. So I'd need to know how CAM ensures that
> > two disk_gone() cannot be called for one geom. The answer might be that
> > those geoms have always only one provider, but then I cannot explain why
> > we have loop there. Maybe jdp@ will remember why.
>=20
> Well, disk_gone() will only be called once for each da(4) instance that
> goes away.  There is a check in cam_periph_invalidate()
> (sys/cam_cam_periph.c) that insures that the peripheral driver's
> invalidate routine only gets called once.
>=20
> I don't know whether there is more than one provider or not.  It may be
> that LIST_FOREACH_SAFE() is sufficient.  g_wither_provider() does set the
> G_PF_WITHER flag, and that wouldn't be protected by anything other than t=
he
> CAM SIM lock in this case.
>=20
> > I also read original commit message and I don't fully understand.
> >=20
> > It was 7 years ago so I'll quote entire commit message:
> >=20
> > 	Fix a bug that caused some /dev entries to continue to exist after
> > 	the underlying drive had been hot-unplugged from the system.  Here
> > 	is a specific example.  Filesystem code had opened /dev/da1s1e.
> > 	Subsequently, the drive was hot-unplugged.  This (correctly) caused
> > 	all of the associated /dev/da1* entries to be deleted.  When the
> > 	filesystem later realized that the drive was gone it closed the
> > 	device, reducing the write-access counts to 0 on the geom providers
> > 	for da1s1e, da1s1, and da1.  This caused geom to re-taste the
> > 	providers, resulting in the devices being created again.  When the
> > 	drive was hot-plugged back in, it resulted in duplicate /dev entries
> > 	for da1s1e, da1s1, and da1.
> >=20
> > 	This fix adds a new disk_gone() function which is called by CAM when a
> > 	drive goes away.  It orphans all of the providers associated with the
> > 	drive, setting an error condition of ENXIO in each one.  In addition,
> > 	we prevent a re-taste on last close for writing if an error condition
> > 	has been set in the provider.
> >=20
> > If disk is gone it should orphan its provider. This should cause
> > orphaning depended providers, including da1s1 and then da1s1e.
> > Orphaning sets provider's error and providers with the error set are not
> > retasted. This is from g_access():
> >=20
> > 		else if (pp->acw !=3D 0 && pp->acw =3D=3D -dcw && pp->error =3D=3D 0 =
&&
> > 		    !(pp->geom->flags & G_GEOM_WITHER))
> > 			g_post_event(g_new_provider_event, pp, M_WAITOK,
> > 			    pp, NULL);
> >=20
> > My question is: is disk_geom() really needed? Maybe there is some small
> > bug somewhere, but I think GEOM should handle all this without disk_gon=
e().
>=20
> CAM needs a three step process when a device goes away.  It is currently
> represented with disk_gone(), the d_gone() callback inside the disk class,
> and disk_destroy().
>=20
> Here is what CAM needs at each step:
>=20
> 1.  When a device goes away, we need a method to call from daoninvalidate=
()
>     (or any other peripheral driver invalidate routine) with these
>     properties:
>     - It tells GEOM that the device has gone away, and starts the process
>       of shutting down the device.  (i.e. withers/orphans the provider)
>     - It is callable from an interrupt context, with the SIM (MTX_DEF) lo=
ck
>       held, so it can't sleep.

Neither g_wither_provider() nor g_orphan_provider() require the topology
lock. They only acquire the event lock, but it is regular mutex, so this
is fine. Traversing geom's providers list looks like something that does
need the topology lock, but maybe traversing is not needed at all.
The reason for this change was a panic in iSCSI initiator where
disk_gone() was called and provider was destroyed before g_wither_geom()
returned.

> 2.  When GEOM has finished cleaning up the device (initiated in step 1),
>     it should call the d_gone() method (dadiskgonecb() in the da(4)
>     driver) to tell the CAM peripheral driver that the following is true:
>     - All I/O to the device has been completed.
>     - No more I/O will be sent to the device (e.g. via the strategy routi=
ne)
>     - No more open/close requests will be sent to the device.  (The final
>       close should have been sent to the peripheral driver at this point.)
>=20
>     CAM will release a reference on the device at that point, and in the
>     typical case (where that is the last reference) go to step 3.
>=20
> 3.  When CAM has finished cleaning up all of its state, it will tell GEOM
>     that it is okay to finalize the destruction of the device.  So it nee=
ds
>     a method to call with these properties:
>=20
>     - It is callable from an interrupt context, with the SIM (MTX_DEF) lo=
ck
>       held.  So it can't sleep.
>     - When it is called, the da(4) driver is indicating that it no longer
>       holds any references to that particular GEOM disk instance.
>=20
> So whether it is disk_gone() or something else, we do need a way to orphan
> the provider.  Otherwise, how does GEOM know that the disk is gone?

So maybe disk_destroy() should first orphan provider, which in turn will
set its error. If provider's error is set, all I/O requests will be
denied by GEOM by returning provider's error, so strategy method within
a driver won't be called.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--PEIAKu/WMn1b1Hv9
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlBjTzMACgkQForvXbEpPzQ/rgCeL+BtP4vEpYJVOPDm4wBv+x79
lugAniapLHRZM7sK30CpeMv9kq2Q9YfZ
=0CZs
-----END PGP SIGNATURE-----

--PEIAKu/WMn1b1Hv9--

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120926185339.GA1402>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation