Date: Wed, 26 Sep 2012 09:20:06 +0200 From: Pawel Jakub Dawidek <pjd@FreeBSD.org> To: "Kenneth D. Merry" <ken@FreeBSD.org> Cc: svn-src-head@FreeBSD.org, jdp@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, phk@FreeBSD.org Subject: Re: svn commit: r240822 - head/sys/geom Message-ID: <20120926072005.GH1391@garage.freebsd.pl> In-Reply-To: <20120925233712.GA26920@nargothrond.kdm.org> References: <201209221241.q8MCfnhJ067937@svn.freebsd.org> <20120925233712.GA26920@nargothrond.kdm.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--RwGu8mu1E+uYXPWP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Sep 25, 2012 at 05:37:12PM -0600, Kenneth D. Merry wrote: > On Sat, Sep 22, 2012 at 12:41:49 +0000, Pawel Jakub Dawidek wrote: > > Log: > > Use the topology lock to protect list of providers while withering th= em. > > It is possible that provider is destroyed while we are iterating over= the > > list. [...] > > --- head/sys/geom/geom_disk.c Sat Sep 22 12:40:52 2012 (r240821) > > +++ head/sys/geom/geom_disk.c Sat Sep 22 12:41:49 2012 (r240822) > > @@ -635,10 +635,13 @@ disk_gone(struct disk *dp) > > struct g_geom *gp; > > struct g_provider *pp; > > =20 > > + g_topology_lock(); > > gp =3D dp->d_geom; > > - if (gp !=3D NULL) > > + if (gp !=3D NULL) { > > LIST_FOREACH(pp, &gp->provider, provider) > > g_wither_provider(pp, ENXIO); > > + } > > + g_topology_unlock(); > > } [...] > This breaks devices going away in CAM. >=20 > When the da(4) driver calls disk_gone(), it is necessarily holding the SIM > lock, which is a regular MTX_DEF mutex. The GEOM topology lock is an sx > lock, and of WITNESS blows up because of that: [...] > disk_gone() needs to be callable from an interrupt context. So it cannot > acquire the topology lock. I can reword what you said above a bit: disk_gone() modifies GEOM topology, so it has to hold the topology lock, thus it cannot be called =66rom an interrupt context. We might be able to change the topology lock to LIST_FOREACH_SAFE(), as g_wither_provider() can only destroy current provider. This is because CAM own the geom and nobody should be able to mess with its provider's list, apart from CAM itself. So I'd need to know how CAM ensures that two disk_gone() cannot be called for one geom. The answer might be that those geoms have always only one provider, but then I cannot explain why we have loop there. Maybe jdp@ will remember why. I also read original commit message and I don't fully understand. It was 7 years ago so I'll quote entire commit message: Fix a bug that caused some /dev entries to continue to exist after the underlying drive had been hot-unplugged from the system. Here is a specific example. Filesystem code had opened /dev/da1s1e. Subsequently, the drive was hot-unplugged. This (correctly) caused all of the associated /dev/da1* entries to be deleted. When the filesystem later realized that the drive was gone it closed the device, reducing the write-access counts to 0 on the geom providers for da1s1e, da1s1, and da1. This caused geom to re-taste the providers, resulting in the devices being created again. When the drive was hot-plugged back in, it resulted in duplicate /dev entries for da1s1e, da1s1, and da1. This fix adds a new disk_gone() function which is called by CAM when a drive goes away. It orphans all of the providers associated with the drive, setting an error condition of ENXIO in each one. In addition, we prevent a re-taste on last close for writing if an error condition has been set in the provider. If disk is gone it should orphan its provider. This should cause orphaning depended providers, including da1s1 and then da1s1e. Orphaning sets provider's error and providers with the error set are not retasted. This is from g_access(): else if (pp->acw !=3D 0 && pp->acw =3D=3D -dcw && pp->error =3D=3D 0 && !(pp->geom->flags & G_GEOM_WITHER)) g_post_event(g_new_provider_event, pp, M_WAITOK, pp, NULL); My question is: is disk_geom() really needed? Maybe there is some small bug somewhere, but I think GEOM should handle all this without disk_gone(). --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --RwGu8mu1E+uYXPWP Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlBirKUACgkQForvXbEpPzS1hQCg0AwfF+cYiOLIEjd/ehDSxsNS Ge4AoJHk7+zvjZZkZ7uByTFQqcd4kAHx =zXue -----END PGP SIGNATURE----- --RwGu8mu1E+uYXPWP--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120926072005.GH1391>