From: Marc Branchaud
Date: Wed, 2 Dec 2020 11:08:10 -0500
To: Warner Losh
Cc: git@freebsd.org, "portmgr@FreeBSD.org"
Subject: Using git branches for ports (was: Re: converting rmport to git)
List-Id: Discussion of git use in the FreeBSD project

On 2020-12-01 11:36 a.m., Warner Losh wrote:
>
> To be honest, though, I think this is an area where some experimentation to
> understand the alternatives is needed because this use case is relatively
> rare in the larger open source community.

OK, so I just have to ask (and I apologize if I'm opening a can of worms
that has already been discussed, or that nobody wants to look at; I'll
drop this if it's just noise):

Have you considered using a branch for each port?

Yes, I'm talking about 41,000+ branches.  Git should not have any
trouble dealing with that.

There are a few advantages to this approach:

* Each port's change history is fully isolated and easy to track.
  (Don't worry about having lots of near-duplicate files in different
  branches or directories, as git is very efficient at dealing with
  this.)

* MFCs are proper git merges, which means that it is very easy to
  understand which changes have landed where.

* Cases like removing/re-adding a port would take place on that port's
  branch, making it obvious just how that work was done.

If this sounds appealing, then the real question is whether or not this
approach trips up any important cases that arise when working on ports.
I can't answer that, but in the grand tradition of git branch ASCII-art,
here is a pretty picture to help understand what this approach might
look like.

In the following:

- "a" thru "f" represent commits of some work on a port (net/gsk, for
  example).
- "M#" represent git merge commits of some net/gsk changes.
- "m" represents a git merge commit of some other port's changes.

My proposed branch names are on the left.  Commit history proceeds from
left to right.

   main  ....--m---m--M1--m--M2--m--M3--m
                    /      /      /
net/gsk  ....-a--b---c--d--e---f
                  \
 2020Q4  ....---m---M4---m---m

So the net/gsk port evolves on its own "net/gsk" branch, with commits
a..f.  We see that the a and b changes were merged into the "main"
branch by merge-commit M1.  Merge commit M2 brought in changes c and d,
and then merge M3 brought changes e and f into "main".

Meanwhile, only changes a and b have been merged into the "2020Q4"
branch (commit M4).

Both the "main" and "2020Q4" branches also contain merges from other
ports' branches (the "m" commits).  The mainline branches ("main",
"2020Q4", etc.) would consist almost entirely of merge commits.

The net/gsk changes in the mainline branches can be easily obtained from
simple git commands.  To see the net/gsk work that has happened in a
mainline branch like 2020Q4, just do

    git log 2020Q4 -- net/gsk

That will list commits a and b.  (Git's default history simplification
hides a merge like M4 that takes the port's files unchanged from one
parent; add --full-history to list M4 as well.)  No need to do any
patch-level analysis.  That command will also work with the existing git
repo migrated from svn.
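
The first picture can be replayed in a throwaway repository.  The
following is only a minimal sketch, not a proposal for the real ports
tree: the two-port "tree", the commit subjects, the placeholder
committer identity and the port_commit helper are all made up for
illustration, and only two of the three merges into "main" are
reproduced.

```shell
#!/bin/sh
# Toy reconstruction of the first picture (assumed layout, not the real tree).
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main        # first branch is "main" on any git version
git config user.name "Example Committer"     # placeholder identity, local to this repo
git config user.email "committer@example.org"

# Hypothetical helper: record one change to the net/gsk port.
port_commit() {
    echo "change $1" >> net/gsk/Makefile
    git commit -q -a -m "$1"
}

# A tiny "ports tree" with two ports; every branch carries the full tree.
mkdir -p net/gsk www/foo
echo "PORTNAME=gsk" > net/gsk/Makefile
echo "PORTNAME=foo" > www/foo/Makefile
git add .
git commit -q -m "import"

git branch 2020Q4                  # quarterly mainline branch
git checkout -q -b net/gsk         # the port's own branch
port_commit a
port_commit b

git checkout -q main --
git merge -q --no-ff -m M1 net/gsk     # a and b land in main

git checkout -q 2020Q4 --
git merge -q --no-ff -m M4 net/gsk     # a and b land in 2020Q4

git checkout -q net/gsk --
port_commit c
port_commit d
git checkout -q main --
git merge -q --no-ff -m M2 net/gsk     # c and d land in main only

# The port's history as seen from 2020Q4, newest first.
default_log=$(git log --topo-order --format=%s 2020Q4 -- net/gsk | paste -s -d ' ' -)
# Same, but keeping merges that changed the path relative to any parent.
full_log=$(git log --topo-order --format=%s --full-history 2020Q4 -- net/gsk | paste -s -d ' ' -)

echo "default:        $default_log"
echo "--full-history: $full_log"
```

In this toy history the default log shows b, a and the initial import,
while --full-history additionally shows M4.  Commits c and d appear in
neither listing because they were never merged into 2020Q4.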

But the branch-based model has some additional power.  For example, a
command like

    git log --oneline --graph 2020Q4 -- net/gsk

will output an ASCII-art picture of the 2020Q4 branch's view of the
net/gsk port, similar to what I drew above (add --full-history to keep
the merge commits in the picture).

More importantly, it's easy to see where any particular piece of the
net/gsk work has landed:

    git branch -a --contains <commit>

would report the "main" and "2020Q4" branches, along with "net/gsk"
itself (here <commit> is the SHA-ID of commit b).  No need to deal with
"combined" MFCs or did-this-change-match-that-patch problems.

What about the rmport script?  The branches I'm describing contain the
full ports tree -- they're not "partial" or "sparse" in any way.  So to
remove the net/gsk port, rmport would just check out the "net/gsk"
branch and do the removal there.  Then that can be merged (manually or
automatically) into whatever mainline branch is desired.  There's no
need to remove the "net/gsk" branch though, and it's better to keep it
around in case someone wants to revive the net/gsk port in the future.

This branch-based model can be adopted atop the transitioned ports repo
as it stands today.  There's no need (nor is it possible) to
retroactively translate the svn history into this structure.  Sure, the
migrated svn history isn't amenable to tricks like "git branch
--contains", but that will become less important as time marches on.
And the migrated history can still be teased out using patch-level
commands like "git cherry".

Those are my main points, so you can stop reading here if you're already
annoyed!  I'm now going to delve into some of the flexibility that this
approach offers.

In this model the net/gsk port is free to evolve as it needs to in the
"net/gsk" branch.  From the above we see that changes a and b were
deemed good enough to put into 2020Q4, but changes c-f are still a bit
experimental and they're still being validated on the "main" branch.
(I'm making some assumptions here about how people develop the ports.
Apologies if I got it wrong; I'm sure this model can accommodate a
different workflow.)

In fact, that "net/gsk" branch can itself contain sub-branches for
special circumstances.  Let's say that commit b has a bug.  We'd like to
fix that bug in both "main" and "2020Q4", but if we just plop the fix
onto the tip of the "net/gsk" branch (as commit g, say) that change will
have commits c-f as part of its history:

   main  ....--m---m--M1--m--M2--m--M3--m
                    /      /      /
net/gsk  ....-a--b---c--d--e---f---g
                  \
 2020Q4  ....---m---M4---m---m

If we just merged g into 2020Q4, we'd also bring in the c-f changes,
which we do not want to have on the 2020Q4 branch.

So instead, we can fix the bug in a mini branch based on the b commit,
then merge that work where it's needed:

   main  ....--m---m--M1--m--M2--m--M3--m--M6
                    /      /      /      /
net/gsk  ....-a--b---c--d--e---f------g'
                 |\                  /
                 | \---------b'-----/
                  \            \
 2020Q4  ....---m---M4---m---m--M5

Here we've fixed the bug with commit b', which is based directly on
commit b, and so we can merge b' into the "2020Q4" branch (commit M5)
with the confidence that we're only bringing in the exact bug fix we
need.

Meanwhile, we also merge b' onto the tip of the "net/gsk" branch (commit
g'), fixing the bug on the port's own branch, and then merge g' into the
"main" branch as commit M6.

It is completely clear what happened to the net/gsk port, and how those
changes were brought into the mainline branches.

One last wrinkle about this picture: Note how I did not put a name on
the branch with the b' commit.  Git is perfectly happy to deal with this
kind of anonymous branching, and so there's no need to pollute the
central FreeBSD-ports repository with names for these kinds of branches.
But that does not prevent the net/gsk developer from having a *local*
name for that branch in their own, local clone of the repository.  The
developer can name their local branch whatever makes sense to them.
When they push one of the merge commits (M5, g' or M6) to the central
repo, the b' commit rides along but without the developer's local branch
name.  The history recorded in the central repository is as depicted,
with b' living on a nameless branch.

I can't believe you've read all of this!  Thanks!

		M.
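
P.S.  For anyone who wants to replay the b' bug-fix flow from the last
picture, here is a minimal sketch in a throwaway repository.  The file
layout, commit subjects, placeholder identity and the port_commit helper
are assumptions made for illustration; the merges of c-f into "main" are
compressed into a single merge, and b' is committed on a detached HEAD
to stand in for the nameless branch.

```shell
#!/bin/sh
# Toy replay of the bug-fix picture (assumed layout, not the real tree).
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Committer"     # placeholder identity
git config user.email "committer@example.org"

# Hypothetical helper: each change adds its own file, so merges never conflict.
port_commit() {
    echo "$1" > "net/gsk/$1.txt"
    git add net/gsk
    git commit -q -m "$1"
}

mkdir -p net/gsk
echo "PORTNAME=gsk" > net/gsk/Makefile
git add .
git commit -q -m "import"
git branch 2020Q4

git checkout -q -b net/gsk
port_commit a
port_commit b
b_sha=$(git rev-parse HEAD)

git checkout -q main --
git merge -q --no-ff -m M1 net/gsk
git checkout -q 2020Q4 --
git merge -q --no-ff -m M4 net/gsk

git checkout -q net/gsk --
port_commit c
c_sha=$(git rev-parse HEAD)
for x in d e f; do port_commit "$x"; done
git checkout -q main --
git merge -q --no-ff -m M3 net/gsk           # stands in for M2 and M3

# Fix the bug directly on top of b, without creating a branch name.
git checkout -q --detach "$b_sha"
echo "bug fix" >> net/gsk/b.txt
git commit -q -a -m "b'"
fix_sha=$(git rev-parse HEAD)

git checkout -q 2020Q4 --
git merge -q --no-ff -m M5 "$fix_sha"        # only the fix reaches 2020Q4
git checkout -q net/gsk --
git merge -q --no-ff -m "g'" "$fix_sha"      # fix joins the port's own branch
git checkout -q main --
git merge -q --no-ff -m M6 net/gsk           # and then main

# Where did the fix land, and where did c land?
with_fix=$(git branch --format='%(refname:short)' --contains "$fix_sha" | paste -s -d ' ' -)
with_c=$(git branch --format='%(refname:short)' --contains "$c_sha" | paste -s -d ' ' -)
names=$(git branch --format='%(refname:short)' | paste -s -d ' ' -)

echo "branches containing b': $with_fix"
echo "branches containing c:  $with_c"
echo "branch names:           $names"
```

In this toy history b' is reported on all three branches, c only on
"main" and "net/gsk", and no fourth branch name exists even though b' is
fully recorded.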