Date:      Tue, 10 May 2016 17:05:20 +0200
From:      Guido Falsi <mad@madpilot.net>
To:        RW <rwmaillists@googlemail.com>, ports@freebsd.org
Subject:   Re: Poudriere question
Message-ID:  <ea568927-b319-63e6-a804-78aced1e3495@madpilot.net>
In-Reply-To: <20160510152540.31793420@gumby.homeunix.com>
References:  <CAGwOe2Y7HjkK_QxocycmFcKzCUBAVU-87CWqOAzp6ZMUaJMbkA@mail.gmail.com> <3557cbcd-3992-5db5-c5dc-7912508e1956@madpilot.net> <20160510123517.2107653b@gumby.homeunix.com> <b2db2a55-49ae-c4b2-ec10-5ba91b1d5056@madpilot.net> <20160510152540.31793420@gumby.homeunix.com>

On 05/10/16 16:25, RW via freebsd-ports wrote:
> On Tue, 10 May 2016 14:35:35 +0200
> Guido Falsi wrote:
> 
>> On 05/10/16 13:35, RW via freebsd-ports wrote:
>>> On Mon, 9 May 2016 20:15:12 +0200
>>> Guido Falsi wrote:
>>>   
>>>> On 05/09/16 19:52, Fernando Apesteguía wrote:
>>>>> Hi all,
>>>>>
>>>>> Is it safe to use different invocations of poudriere concurrently
>>>>> for different jails but using the same ports collection?
>>>>>     
>>>>
>>>> Yes it is, or at least should be.
>>>>
>>>> The ports trees are mounted read only in the jails, the wrkdir is
>>>> defined at a different path.  
>>>
>>> What about the distfiles directory? 
>>>
>>> Having two "make checksums" running on the same file used to work
>>> fairly well, but not any more because the target now deletes an
>>> incomplete file rather than trying to resume it.
>>>
>>> This won't damage packages, but it can cause two "make checksums" to
>>> get locked in a cycle of deleting each other's files and end with
>>> one getting a failed checksum.
>>
>> Yes, it happens. I have even used the same distfiles over NFS with more
>> than one machine/poudriere accessing it.
>>
>> The various instances do overwrite each other and checksums do fail
>> but usually in the end one of them "wins" and the correct file ends
>> up being completed, with other instances reading that one. I agree
>> this happens just by chance and not due to good design.
> 
> Only the last process will terminate with a complete file and without
> error. When another process runs out of retries, the file with the
> directory entry is a download in progress, which will fail the checksum.
> 
> If it commonly ends up working in poudriere, that's probably a property
> of how poudriere orders things. But you still have the problem of
> wasted time and bandwidth. This problem is most likely with large
> distfiles and there's at least one that's 1 GB.

As I said, yes this ends up working by chance most of the time, and not
without the problems you note.

My comment was just stating the situation. I don't have a solution, but
if you have an idea you can propose patches to poudriere.

Sharing a distfiles directory between processes is inherently "racy" anyway.

Any "fix" that comes to mind would require giving poudriere itself, or
the jails, special knowledge about how the distfiles directory works,
which goes against some of the design principles behind poudriere.
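
To illustrate the kind of special knowledge meant here, this is a
minimal sketch of serializing fetches with a per-distfile advisory
lock. It is not something poudriere or the ports tree does; the names
distfile_lock and fetch_once and the download callable are hypothetical,
and advisory locks may not behave the same way on an NFS-mounted
directory.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def distfile_lock(distdir, name):
    """Hold an exclusive advisory lock on '<name>.lock' inside distdir."""
    lock_path = os.path.join(distdir, name + ".lock")
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def fetch_once(distdir, name, download):
    """Download 'name' unless another process already completed it.

    'download' is a hypothetical callable that writes the file to the
    path it is given. Returns True if this call did the download.
    """
    target = os.path.join(distdir, name)
    with distfile_lock(distdir, name):
        if os.path.exists(target):
            return False  # someone else finished while we waited
        partial = target + ".part"
        download(partial)
        # Rename only after the download is complete, so readers never
        # see a half-written file under the final name.
        os.replace(partial, target)
        return True
```

With this arrangement a second process waits for the first instead of
deleting its partial file, and then reuses the completed distfile.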

> 
> 
> The way this used to work is that the second process would try to
> resume the download which presumably involved getting a lock on the
> file. For smaller files it would just work. Worst case was that the
> second process would fail after a timeout.

It all depends on what you are trying to achieve. In my case I had at
most 3-4 simultaneous accesses, and since it "mostly worked" I never
investigated it further.

If you need high concurrency you need to work out some other solution.
The distfile cache system was never designed with concurrency in mind.

One possible solution is not to share a distfile cache between the
jails: give each jail its own and use MASTER_SITE_OVERRIDE to point to a
local server which acts as a cache. Then one needs a way to sync things
back to that machine after a successful download. Poudriere hooks come
to mind, but there isn't one for "post-fetch".
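
The sync-back step could be sketched roughly as follows, assuming the
cache server's directory is reachable as a local path. Since there is
no "post-fetch" hook, how and when this would be called is left open;
publish_distfile, sha256_of and cache_dir are hypothetical names, and
the expected checksum is assumed to be supplied by the caller.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish_distfile(src, cache_dir, expected_sha256):
    """Copy a verified distfile into cache_dir under its own basename.

    Returns False without touching the cache if the checksum does not
    match. The copy goes to a temporary name first and is renamed into
    place, so the cache never serves a partially written file.
    """
    if sha256_of(src) != expected_sha256:
        return False
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir, prefix=".incoming-")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.chmod(tmp, 0o644)  # mkstemp creates the file with mode 0600
        os.replace(tmp, os.path.join(cache_dir, os.path.basename(src)))
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

Because only checksum-verified files are renamed into the cache, two
jails publishing the same distfile at once simply replace one complete
copy with another.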

> 
> I think the change came in to delete possible re-rolled distfiles
> automatically (a relatively minor problem), but in the process it
> created this problem and also broke resuming downloads. 
> 
> I don't see the reason for checking and deleting the file before
> attempting to resume it.

I have never seen poudriere remove distfiles, nor the ports tree do
that. What change are you referring to?

-- 
Guido Falsi <mad@madpilot.net>


