From: Li-Wen Hsu
Date: Tue, 7 Aug 2018 03:29:33 +0100
Subject: Re: A head buildworld race visible in the ci.freebsd.org build history
To: Mark Millard
Cc: Ed Maste, Bryan Drewery, FreeBSD Current, Alexander Motin

On Thu, Jun 21, 2018 at 10:49 PM Mark Millard wrote:
> Has the range r328278 < PROBLEM_START <= r330304 been narrowed down
> some more?
>
> (I'm just curious were the problem started.)

After several rounds of binary search, I found that it might have something to do with r329625. The only connection I can see between this commit and our situation is that it touched the unmount code, but I cannot confirm that it is the cause.

It is a bit tricky to reproduce. I will try to keep it concise.

We do builds for head in a jail (11.2-RELEASE) on a -CURRENT host. The jail lives on a dedicated ZFS dataset, and a daemon running outside of the jail does jail/zfs cleanup.

In some edge cases, that cleanup daemon tries to destroy the dataset of a jail in which a build is still running. On an earlier -CURRENT, when that happened the daemon would just get "cannot unmount '/jenkins/jails/test-ranlib': Device busy" and nothing serious would happen.
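As an aside, the binary search over revisions mentioned above can be pictured with a minimal Python sketch. This is not the procedure that was actually run; `first_bad_revision`, `fails_in_any_attempt`, and the `run_once` callback are hypothetical names introduced only for illustration. It assumes the lower bound is known good, the upper bound is known bad, and the problem stays present once introduced. Because the failure is a race, a single passing build does not prove a revision good, so the check repeats the build several times.

```python
def fails_in_any_attempt(run_once, revision, attempts=5):
    """Treat a revision as bad if any of `attempts` builds fails.

    `run_once(revision)` is a hypothetical callback that builds the given
    revision and returns True when the build succeeded.
    """
    return any(not run_once(revision) for _ in range(attempts))


def first_bad_revision(good, bad, is_bad):
    """Return the smallest revision in (good, bad] for which is_bad() is true.

    Invariant: `good` is always a revision that passed and `bad` is always
    a revision that failed, so the loop ends with bad == good + 1.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # problem already present at mid
        else:
            good = mid  # problem introduced after mid
    return bad
```

For the range quoted above (r328278 to r330304, about 2000 revisions) this takes about eleven rounds, each of which is itself several builds, so a flaky reproduction makes the search slow and its result uncertain.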
Recently, although the cleanup still does not destroy the busy dataset, it has started causing the build to error out with "ranlib: fatal: Failed to open 'libXXX.a'".

To reproduce this, create a zfs and use it as the root of a jail, then run this build script under /usr/src inside the jail:

https://gist.github.com/lwhsu/ae3b8b1f0c856837f93984ab2493f629#file-build-sh

Run this cleanup script on the host:

https://gist.github.com/lwhsu/ae3b8b1f0c856837f93984ab2493f629#file-clean-test-ranlib-sh

(The zfs path in it needs to be modified.)

I use powerpcspe as TARGET_ARCH here because one iteration takes less time. The problem should not be related to the architecture.

I am not sure what the next step is. Maybe modify ranlib so that it logs more about what it gets when it reports "fatal: Failed to open 'libXXX.a'"?

Any good ideas for debugging this?

Li-wen

--
Li-Wen Hsu
https://lwhsu.org
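P.S. To make the "log more" idea concrete, here is a rough Python sketch of the kind of information that could be recorded when an open fails. It is only an analogy: ranlib itself is not written in Python, and `describe_open_failure` and `_safe_getcwd` are hypothetical helpers, not part of any existing tool. The assumption is that the error code (for example ENOENT versus EBUSY or EIO) and the state of the working directory would help tell a missing file apart from a filesystem that is in the middle of being unmounted.

```python
import errno
import os


def _safe_getcwd():
    """Return the current directory, or a marker if it cannot be resolved."""
    try:
        return os.getcwd()
    except OSError as exc:
        return "<unavailable: %s>" % errno.errorcode.get(exc.errno, exc.errno)


def describe_open_failure(path):
    """Try to open `path` for reading.

    Returns None on success, otherwise a dict of diagnostics describing
    why the open failed and what the surroundings looked like.
    """
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        parent = os.path.dirname(path) or "."
        return {
            "path": path,
            # Symbolic errno name, falling back to the raw number.
            "errno": errno.errorcode.get(exc.errno, str(exc.errno)),
            "message": exc.strerror,
            "cwd": _safe_getcwd(),
            "parent_exists": os.path.isdir(parent),
        }
```

If the parent directory is reported as missing, or the working directory cannot be resolved, that would point at the mount rather than at the archive file itself.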