From: Pawel Jakub Dawidek
To: Mikolaj Golub
Cc: freebsd-fs@freebsd.org
Date: Wed, 22 Sep 2010 21:10:29 +0200
Subject: Re: hastd: parent got stuck in waitpid()
Message-ID: <20100922191028.GD2895@garage.freebsd.pl>
In-Reply-To: <868w2yaweh.fsf@kopusha.home.net>

On Sun, Sep 19, 2010 at 12:57:10PM +0300, Mikolaj Golub wrote:
> Hi,
>
> When trying to produce the scenario described in another thread (hastd: possible
> race when a worker is starting) I stepped on another issue. I was running the
> following script:
>
> #!/bin/sh
>
> for i in `jot 1000`; do
>     hastctl status storage > /dev/null
> done &
> for i in `jot 1000`; do
>     hastctl role init storage
>     hastctl role primary storage
> done
>
> Parent hastd got stuck but that time when changing the role to init and
> terminating the worker: in waitpid() after sending kill() to the worker. It
> looked like the signal was lost. I don't have a clue how this might happen but
> it is rather easy reproducible in my environment with the script above.

Could you try r213009? The problem was (I believe) that the signal mask
was configured after we forked, so there was a window in which a signal
could be delivered before we were able to handle it properly. Now the
signal mask is configured in the main process and the primary process
inherits it, so the window no longer exists.

Your test also triggered a different bug for me, a descriptor leak,
which is now also fixed.

Thanks for the reports!

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!