From: Guido Falsi
To: RW, ports@freebsd.org
Subject: Re: Poudriere question
Date: Tue, 10 May 2016 17:05:20 +0200
List-Id: Porting software to FreeBSD

On 05/10/16 16:25, RW via freebsd-ports wrote:
> On Tue, 10 May 2016 14:35:35 +0200
> Guido Falsi wrote:
>
>> On 05/10/16 13:35, RW via freebsd-ports wrote:
>>> On Mon, 9 May 2016 20:15:12 +0200
>>> Guido Falsi wrote:
>>>
>>>> On 05/09/16 19:52, Fernando Apesteguía wrote:
>>>>> Hi all,
>>>>>
>>>>> Is it safe to use different invocations of poudriere concurrently
>>>>> for different jails but using the same ports collection?
>>>>>
>>>>
>>>> Yes it is, or at least should be.
>>>>
>>>> The ports trees are mounted read only in the jails, the wrkdir is
>>>> defined at a different path.
>>>
>>> What about the distfiles directory?
>>>
>>> Having two "make checksums" running on the same file used to work
>>> fairly well, but not any more because the target now deletes an
>>> incomplete file rather than trying to resume it.
>>>
>>> This won't damage packages, but it can cause two "make checksums" to
>>> get locked in a cycle of deleting each other's files and end with
>>> one getting a failed checksum.
>>
>> Yes it happens, I even have used the same distfiles over NFS with more
>> than one machine/poudriere accessing it.
>>
>> The various instances do overwrite each other and checksums do fail
>> but usually in the end one of them "wins" and the correct file ends
>> up being completed, with other instances reading that one. I agree
>> this happens just by chance and not due to good design.
>
> Only the last process will terminate with a complete file and without
> error, when another process runs out of retries, the file with the
> directory entry is a download in progress which will fail the checksum.
>
> If it commonly ends-up working in poudriere that's probably a property
> of how poudriere orders things. But you still have the problem of
> wasted time and bandwidth. This problem is most likely with large
> distfiles and there's at least one that's 1 GB.

As I said, yes, this ends up working by chance most of the time, and not
without the problems you note.

My comment was just describing the situation. I don't have a solution,
but if you have an idea you can propose patches to poudriere.

Sharing a distfiles directory between processes is "racy" in itself
anyway. Any way to "fix" this that comes to my mind would require adding
special knowledge about the distfiles directory to the poudriere process
or the jails, which goes against some of the design principles behind
poudriere.

>
>
> The way this used to work is that the second process would try to
> resume the download which presumably involved getting a lock on the
> file. For smaller files it would just work. Worst case was that the
> second process would fail after a timeout.

It all depends on what you are trying to achieve. In my case I had at
most 3-4 simultaneous accesses, and since it "mostly worked" I never
investigated it further. If you need high concurrency you need to work
out some other solution. The distfile cache system was never designed
with concurrency in mind.
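
To illustrate the kind of per-file locking discussed above, here is a
minimal sketch in Python. It is not how the ports tree or poudriere
fetch distfiles. The names fetch_distfile, sha256_of and the download
callback are made up for the example, and it assumes that every fetcher
cooperates by taking the same lock and that flock() is honoured on the
filesystem holding the distfiles (which may not be the case over NFS).

```python
import fcntl
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_distfile(distdir: Path, name: str, expected_sha256: str,
                   download: Callable[[Path], None]) -> Path:
    """Hypothetical helper: fetch `name` once across cooperating processes."""
    target = distdir / name
    lock_path = distdir / (name + ".lock")
    with lock_path.open("w") as lock:
        # Blocks until any other cooperating fetcher of this file is done.
        # The lock is released when the lock file is closed.
        fcntl.flock(lock, fcntl.LOCK_EX)
        if target.exists() and sha256_of(target) == expected_sha256:
            return target  # somebody else already completed the download
        # Download to a private temporary name so that a partial file is
        # never visible under the final name.
        fd, tmp_name = tempfile.mkstemp(dir=distdir, prefix=name + ".")
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            download(tmp)
            if sha256_of(tmp) != expected_sha256:
                raise ValueError(f"checksum mismatch for {name}")
            os.replace(tmp, target)  # atomic rename within one filesystem
        finally:
            tmp.unlink(missing_ok=True)  # no-op after a successful rename
    return target
```

With something like this, a second process asking for the same file
waits for the first one and then reuses its result instead of deleting
it, at the price of teaching the fetch step about the shared directory.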

One possible solution is not to share a distfile cache between the
jails: give each jail its own and use MASTER_SITE_OVERRIDE to point to a
local server which acts as a cache. Then one needs a way to sync things
back to that machine after a successful download. Poudriere hooks come
to mind, but there isn't one for "post-fetch".

>
> I think the change came in to delete possible re-rolled distfiles
> automatically (a relatively minor problem), but in the process it
> created this problem and also broke resuming downloads.
>
> I don't see the reason for checking and deleting the file before
> attempting to resume it.

I have never seen poudriere remove distfiles, nor the ports tree do
that. What change are you referring to?

--
Guido Falsi