From: "Russell L. Carter"
To: FreeBSD Ports ML
Subject: hung poudriere bulk recovery
Date: Fri, 23 Oct 2015 09:34:13 -0700
Message-ID: <562A6185.5000305@pinyon.org>

Greetings,

Recently my nightly cron poudriere builds have been occasionally hanging.
For instance, here's last night's, with apparently no progress for
over 10 hours:

root@terpsichore> poudriere status
SET PORTS   JAIL            BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
-   default 10-stable-amd64 2015-10-22_22h30m08s parallel_build   488    34    0    0      0    454 10:45:56 /ssd1/poudriere/data/logs/bulk/10-stable-amd64-default/2015-10-22_22h30m08s
root@terpsichore>

htop now shows no significant activity for the specified 3 builders:

root@terpsichore> ps xa | grep poud
72482  -  Is   0:00.01 /bin/sh /root/poudriere/run-poudriere-bulk
73202  -  S    0:04.24 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73347  -  S    1:55.38 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73352  -  I    0:00.08 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
 6119  1  S+   0:00.00 grep poud
root@terpsichore>

If I reboot, so that the tmp zfs filesystems are unmounted, and
manually rerun the exact same script as the previous cron'd, hung
instance, poudriere has (so far) run to completion.

I'm not sure how to debug this, but in the interim, I'm very curious
how I can stop the hung bulk run, and either restart it, or clean up
the various mounted zfs filesystems and manually restart from the
beginning w/o rebooting.

Studying the man page, it's not clear at all the Right Way to do
this, so any pointers here would be appreciated. I'm leaving the
system untouched for now so that I can try out any suggestions for
cleanup and restart.

Thanks,
Russell