From owner-freebsd-fs@FreeBSD.ORG Fri Mar 21 10:04:33 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DC62CEC0; Fri, 21 Mar 2014 10:04:33 +0000 (UTC) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id A0E172B0; Fri, 21 Mar 2014 10:04:32 +0000 (UTC) Received: from localhost (58.wheelsystems.com [83.12.187.58]) by mail.dawidek.net (Postfix) with ESMTPSA id 674F31E6; Fri, 21 Mar 2014 11:04:24 +0100 (CET) Date: Fri, 21 Mar 2014 11:06:33 +0100 From: Pawel Jakub Dawidek To: Andriy Gapon Subject: Re: g_mirror_access() dropping geom topology_lock [Was: Kernel crash trying to import a ZFS pool with log device] Message-ID: <20140321100633.GA1656@garage.freebsd.pl> References: <532B5A0C.1010008@incore.de> <532C085D.3020201@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="EVF5PPMfhYS0aIcm" Content-Disposition: inline In-Reply-To: <532C085D.3020201@FreeBSD.org> X-OS: FreeBSD 11.0-CURRENT amd64 User-Agent: Mutt/1.5.22 (2013-10-16) Cc: freebsd-fs@FreeBSD.org, Andreas Longwitz , freebsd-geom@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Mar 2014 10:04:33 -0000 --EVF5PPMfhYS0aIcm Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Mar 21, 2014 at 11:37:33AM +0200, Andriy Gapon wrote: > I see two issues here. > First, the ZFS tasting code could be made more robust. If it never tried= to > re-use the consumer and always created a new one, then most likely this c= rash > could be avoided. But there is no bug in the code. The code is correct = and it > it uses GEOM topology lock to avoid any concurrency issues. This is the problem, in my opinion. GEOM classes have to have the ability to drop the topology lock in the access method. Without such ability any more complex GEOM class cannot work or will require tons of hacks to do their job. Not only my GEOM classes do that - GRAID does the same. I'd much prefer for us to accept the fact that GEOM classes are allowed to drop the topology lock in their access methods and fix it in ZFS. > But GEOM mirror code breaks a contract on which the ZFS code relies. > g_access() must be called with the topology lock hold. > I extend this requirement to a requirement that access method of any GEOM > provider must operate under the topology lock and must never drop it. > In other words, if a caller must acquire g_topology_lock before calling > g_access, then in return it must have a guarantee that the GEOM topology = stays > unchanged across the call to g_access(). > g_mirror_access() breaks the above contract. >=20 > So, the code in vdev_geom_attach() obtains g_topology_lock, then it finds= an > existing valid consumer and calls g_access() on it. It reasonably expect= s that > the consumer remains valid, but because g_mirror_access() drops and requi= res the > topology lock, there is a chance that the topology can change and the con= sumer > may become invalid. >=20 > I am not very familiar with gmirror code, so I am not sure how to fix the > problem from that end. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com --EVF5PPMfhYS0aIcm Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iEYEARECAAYFAlMsDykACgkQForvXbEpPzT2/wCgikwhKj4jipMzxnUyD8EvW0Ag vWIAoK8QSmWe+fx5e7x99qfP3JqmlGCL =JY2h -----END PGP SIGNATURE----- --EVF5PPMfhYS0aIcm--