Date:      Mon, 24 Jan 2022 15:46:39 +0100
From:      Jan Bramkamp <crest@rlwinm.de>
To:        ports@freebsd.org, Baptiste Daroussin <bapt@FreeBSD.org>
Subject:   Modular fetch design proposal: Was: [HEADSUP] Deprecation of the ftp support in pkg
Message-ID:  <dd853a98-b2af-4813-1f22-0c99598238d8@rlwinm.de>
In-Reply-To: <c36969e9fdd1772788562a06f4d53189@bsdforge.com>
References:  <20220120142519.a5juoe75oppmnyby@aniel.nours.eu> <f1aca07d3cedd30b9a1df6624e950ffb@bsdforge.com> <e10f85c4-ed28-4475-bcbf-d4e572a6b954@FreeBSD.org> <d284c4d5d415fc17d3d7fbed354ddc77@bsdforge.com> <c93d717c-a62e-44ab-b5bf-f109810d65c4@FreeBSD.org> <a517d06c2faeed9883d5da787e4307ed@bsdforge.com> <DEF3FF17-5F38-4E1E-A55C-7E7472826AB9@punkt.de> <c36969e9fdd1772788562a06f4d53189@bsdforge.com>

On 24.01.22 09:12, Chris wrote:
> On 2022-01-23 10:19, Patrick M. Hausen wrote:
>> Hi all,
>>
>> I did not really have an opinion on this, since we never used FTP,
>> but I was a bit surprised by the suggestion to use SSH instead.
>>
>> It never occurred to us that anything but HTTP(S) was possible.
>> We simply run Nginx in a jail serving the packages that Poudriere
>> produces for us. Setup time/effort: 5 minutes.
>>
>> Now after this comment:
>>
>>> Am 22.01.2022 um 09:35 schrieb Chris <portmaster@bsdforge.com>:
>>> I find it's less "housekeeping" to use ftp(1) setup through inetd(8)
>>> for pkg repos, than via ssh.
>>
>> I understand the appeal of FTP.
>> Maybe this discussion is focusing on the wrong topic. Perhaps
>> we should consider including a light weight way to serve HTTP(S)
>> in base? Like Lighttpd, which as far as I know comes with a BSD
>> 3-clause equivalent license.
>>
>> But then the general tendency has been to remove network services
>> from base rather than introduce them. Like e.g. BIND.
>>
>> So I really have no idea what the general opinion is, just wanted
>> to throw in that IMHO HTTPS is the best protocol to the task and
>> if some way to serve that could be included in base, I for one would
>> appreciate that.
>>
>> OTOH Chris, what's keeping you from installing a web server just
>> serving static files?
> Different environments/ different requirements. But habit as much as
> anything else.
> Ftp is trivial, has always been available. So I never even need to think
> about it.
> I perform mass installs/upgrades in large networks. There is no overhead
> using ftp either through a one-start | inetd. The clients are all
> started/used at will.
> It seems to me that removing features also removes value. IMHO the gain
> from the removal of transports as trivial as ftp(1) bring little to the
> table for all concerned. But that's just me. :-)

Have you ever looked into an FTP protocol parser and what's required to
get different FTP configurations through the NAT-infested networks of
today? FTP is an ugly protocol from the beginning of time that should
have been put down decades ago. Even without pipelining, HTTP saves
several network round trips, and poudriere already generates HTML and
JSON status updates during builds as a read-only web UI.


This thread has shown that users have deployed complex, fragile
workarounds for the limited protocol selection offered by pkg. I recommend
adding a clean and official extension interface that spawns fetch helper
processes from a well-known location outside of $PATH, derived from the
URI scheme (e.g. ${PREFIX}/libexec/pkg/fetch-${SCHEMA}). To keep helpers
simple and small, they would be started in an execution environment
(working directory, environment variables, minimal set of inherited file
descriptors) prepared by pkg, with the repository URI as the first
(and only?) argument. A helper would read a stream of lines from standard
input, each a pair of file name (e.g. the package hash stored in the
repository) and relative path to fetch, and store the fetched files in the
inherited working directory. This would allow users to add their own
transport helpers, similar to git.
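
To make the lookup and the request stream a bit more concrete, here is a
rough Python sketch. Nothing in it is settled by the proposal: I am
assuming that the scheme is whatever precedes the first colon of the
repository URI, that file name and relative path are separated by a single
space, and that file names contain no whitespace. The names helper_path
and parse_requests are made up for illustration.

```python
import io
import posixpath
import re
from urllib.parse import urlsplit

def helper_path(prefix, uri):
    """Map a repository URI to the helper pkg would exec (assumed layout)."""
    scheme = urlsplit(uri).scheme
    # Only accept RFC 3986 scheme characters, so the URI can never
    # select a file outside of libexec/pkg.
    if not re.fullmatch(r"[a-z][a-z0-9+.-]*", scheme):
        raise ValueError("unsupported URI scheme: %r" % scheme)
    return posixpath.join(prefix, "libexec", "pkg", "fetch-" + scheme)

def parse_requests(stream):
    """Yield (file_name, relative_path) pairs, one per input line."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        name, path = line.split(" ", 1)
        # Refuse absolute paths and ".." components (an assumption of
        # this sketch, the proposal only says "relative path").
        if path.startswith("/") or ".." in path.split("/"):
            raise ValueError("unsafe path: %r" % path)
        yield name, path

# Hypothetical repository URI and request line.
print(helper_path("/usr/local", "ssh://pkg.example.org/repo"))
print(list(parse_requests(io.StringIO("abc123 All/foo-1.0.pkg\n"))))
```

Keeping the scheme check strict means a new transport is added purely by
dropping an executable into the helper directory.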

To support progress updates and allow pkg to start the installation of
fetched packages as soon as possible, helpers could write lines with
"${BYTES_FETCHED} ${BYTES_TOTAL} ${FILE}" to standard output
periodically. A (permanent) transfer failure could be encoded by a
negative $BYTES_FETCHED and a successfully completed transfer as
$BYTES_FETCHED == $BYTES_TOTAL. If the helper doesn't know the file size,
it should be allowed to use negative $BYTES_TOTAL values in all but the
last progress update (per fetched file). All transfers not reported as
successfully completed or permanently failed are implicitly confirmed by
exiting with EX_OK. Other exit codes implicitly fail all unconfirmed
transfers. Pkg should clean up the working directory after the
helper has exited to delete partially transferred files (and anything
else the helper may have left, taking care not to follow symlinks). Pkg
should apply resource limits and drop privileges (when running as root)
before exec()ing into the helper. Well-written helpers can use Capsicum
to provide further defense in depth.
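
The progress and exit code rules above could be tracked on the pkg side
roughly as follows. This is only a sketch of one possible reading: the
state labels "pending", "done" and "failed" and the function names are
mine, and I assume the three fields are separated by single spaces, with
the file name last so that it may itself contain spaces.

```python
EX_OK = 0  # value of EX_OK in sysexits(3)

def parse_progress(line):
    """Split one progress line into (bytes_fetched, bytes_total, file)."""
    fetched, total, name = line.rstrip("\n").split(" ", 2)
    return int(fetched), int(total), name

def update(states, line):
    """Record the state of one file after a progress line."""
    fetched, total, name = parse_progress(line)
    if states.get(name) in ("done", "failed"):
        return  # a confirmed result is final in this sketch
    if fetched < 0:
        states[name] = "failed"   # permanent transfer failure
    elif fetched == total:
        states[name] = "done"     # pkg may start installing this file
    else:
        states[name] = "pending"  # includes unknown (negative) totals

def finalize(states, exit_code):
    """Resolve unconfirmed transfers once the helper has exited."""
    fallback = "done" if exit_code == EX_OK else "failed"
    return {name: fallback if state == "pending" else state
            for name, state in states.items()}

# Every requested file starts out unconfirmed (hypothetical names).
states = {"a.pkg": "pending", "b.pkg": "pending", "c.pkg": "pending"}
for line in ("100 100 a.pkg", "-1 100 b.pkg", "10 -1 c.pkg"):
    update(states, line)
print(finalize(states, EX_OK))
```

With this reading, a trivial helper never has to write a single progress
line: it fetches everything and exits with EX_OK.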

The package repository already contains the expected package sizes.
As an optimization for dealing with out-of-sync mirrors, the known file
sizes can be matched against positive file sizes reported by helpers to
fail quickly.
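
The early size check could be as small as this Python sketch. The
function name is hypothetical, and I am assuming that any non-negative
total reported by a helper counts as a known size.

```python
def size_mismatch(expected_size, bytes_total):
    """Return True if a helper-reported size contradicts the repository."""
    # Negative totals mean "size unknown" and cannot contradict anything.
    return bytes_total >= 0 and bytes_total != expected_size

# Hypothetical sizes: the repository says 1000 bytes, the mirror says 900.
if size_mismatch(1000, 900):
    print("mirror out of sync, abort this transfer early")
```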

Refactoring all supported protocols to use this interface would reduce 
the complexity of pkg itself.

This design can be further extended with more features (and potential 
for bugs) until we end up with something similar to the git annex 
external special remote protocol 
(https://git-annex.branchable.com/design/external_special_remote_protocol/) 
if there are enough relevant use cases justifying the additional 
complexity in pkg and its file transfer helpers.



