Date:      Thu, 19 Nov 2020 10:16:26 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Marc Branchaud <marcnarc@gmail.com>
Cc:        Dan Langille <dan@langille.org>, freebsd-git@freebsd.org
Subject:   Re: Monitoring commits on all branches
Message-ID:  <CANCZdfqiEMSrqHrwAk_YbJYk9AHDCQEhH1%2Bqg6Vb44ovn_envQ@mail.gmail.com>
In-Reply-To: <3c9f6285-ae7c-1062-2dd3-42f8c953a230@gmail.com>
References:  <197541CC-FEA7-4B4C-936E-66A5625BB64C@langille.org> <3c9f6285-ae7c-1062-2dd3-42f8c953a230@gmail.com>

Thanks Marc! This is great advice... more comments below...

On Thu, Nov 19, 2020 at 9:16 AM Marc Branchaud <marcnarc@gmail.com> wrote:

> On 2020-11-18 8:49 p.m., Dan Langille wrote:
> > How can a repo be monitored for commits on all branches?
> >
> > I know how to ask a given branch: do you have any commits after foo_hash?
> >
> > How do I:
> >
> > * get a list of all commits since foo_hash
>
> A quick note about Warner's reply:
>
> > git log $hash..HEAD
>
> "HEAD" is just a git nickname for "whatever you have currently
> checked-out" (which can be a branch, a tag, or a "detached" commit SHA ID).
>
> > * know which branch each of those commits was on (e.g. master,
> branches/2020Q4)
>
> Unfortunately you'll find most normal git advice to be a bit frustrating
> with the FreeBSD repos, because FreeBSD doesn't work the way most people
> use git.  Specifically, the FreeBSD project does not ever merge branches
> (in the git sense of the word "merge").  Things would be very, very much
> easier if the FreeBSD project were to use git-style merging.  I believe
> there are discussions underway about adjusting the whole MFC process for
> the git world.  I admit that part of my motivation in writing this
> message is to provide grist for that mill.
>

FreeBSD src will be doing cherry-picks. There's only pain and suffering
from merge commits in this environment. Git's tools are adequate to cope
with individual and squashed cherry picks.

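For Dan's original question, here is a minimal sketch of listing every
commit made since a known hash across all refs, together with the branches
that contain each one. The function name commits_since is made up for
illustration, and it assumes a clone where the refs of interest live under
refs/heads or refs/remotes. It runs git for-each-ref once per commit, so
treat it as a starting point rather than something tuned for a repo the
size of ports.

```shell
#!/bin/sh
# commits_since HASH (hypothetical helper)
# Print one line per commit that is reachable from any ref but not from HASH:
#   <commit sha> <branch> [<branch> ...]
commits_since() {
    for rev in $(git rev-list --all --not "$1"); do
        # List every local or remote-tracking branch that contains this commit.
        refs=$(git for-each-ref --format='%(refname:short)' \
                   --contains "$rev" refs/heads refs/remotes | tr '\n' ' ')
        # Strip the trailing space left behind by tr.
        echo "$rev ${refs% }"
    done
}
```

A commit made before a branch forked is contained in both branches, so it
is listed with both names; a commit made after the fork shows only the
branch it was made on.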

> Fortunately even without git-merged branches, there are still git tools
> that help, though they're not as precise as one would like.
>

They are for src. I suspect for ports they might not be.


> Let's look at a concrete example with the beta ports git repo (which I
> just cloned), and compare the 2020Q4 and main branches.  I'll start with
> some overall exploration, then address your specific question.
>
> There are 299 commits in the 2020Q4 branch.  I know this because
>      git merge-base origin/main origin/branches/2020Q4
> tells me where 2020Q4 branched off of main: commit 5dbe4e5f775ea2.  And
>      git rev-list 5dbe4e5f775ea2..origin/branches/2020Q4 | wc -l
> says "299".  (The "rev-list" command is a bare-bones version of "log"
> that only lists commit SHA IDs.)
>
> Meanwhile there have been 4538 commits to the main branch since commit
> 5dbe4e5f775ea2.
>
> As far as git is concerned, those 299 commits in 2020Q4 are *different*
> from anything in main.  Even though most of them made the exact same
> code changes, they were created at different times, often by different
> authors, and they have different commit messages.
>

True.


> But you can still ask git to look at the code-change level to see which
> 2020Q4 commits exactly replicated the code change from main:
>
>      git cherry -v origin/main origin/branches/2020Q4
>
> This little piece of magic looks at the 299 commits in 2020Q4 that are
> not in main and compares their code changes to the 4538 commits in main
> that are not in 2020Q4.  It prints out the 299 2020Q4 commit SHA IDs,
> prefixed with either a "- " or a "+ ".  The -v appends the commit
> message's first line:
>
>      - 394d9746e5eea73f56334b2e7ddbdc8f686d6541 MFH: r550869
>      + 1ac9571956759c91d852ee92859a12e52dcbde48 MFH: r550885 r550886
>      - fd411bdfda55488b84de75e6b043c513a281abf0 MFH: r551209
>      - 533cdaa97457b3318aebcc53f7a1a46ea66721da MFH: r551236
>      ......
>
> A "-" means that the commit matches the code change made by a commit in
> main, while a "+" means that the commit's code change does not *exactly*
> match any main commit since commit 5dbe4e5f775ea2.
>
> So
>      git cherry -v origin/main origin/branches/2020Q4 | grep ^-
> shows us the 234 2020Q4 commits that made the exact same change as a
> commit in main.
>
> And
>      git cherry -v origin/main origin/branches/2020Q4 | grep ^+
> shows us that there are 41 not-exactly-the-same-change commits in
> 2020Q4.  Mostly these are ones that combined two or more MFH's into one
> commit (e.g. 2020Q4 commit 1ac95719567), or that changed a file in a
> slightly different way (see the first patch hunk of 2020Q4 commit
> cbd002878f2, compared to its counterpart in main: commit a5d21ea16b6).
>
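A small sketch for tallying the two kinds of lines that "git cherry"
prints, assuming its output is piped in on stdin. The helper name
summarize_cherry is made up for illustration; it only counts prefixes and
does not look at the commits themselves.

```shell
#!/bin/sh
# summarize_cherry (hypothetical helper)
# Reads `git cherry` or `git cherry -v` output on stdin and reports how many
# lines are marked "-" (an equivalent change exists upstream) and how many
# are marked "+" (no exactly equivalent change found).
summarize_cherry() {
    awk '
        /^- /   { matched++ }
        /^[+] / { unmatched++ }
        END {
            printf "matched=%d unmatched=%d total=%d\n",
                   matched, unmatched, matched + unmatched
        }
    '
}
```

Usage, with the branch names from the example above:
git cherry -v origin/main origin/branches/2020Q4 | summarize_cherry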

Yes. These sorts of issues are why merge commits aren't always the right
way to go: we're not merging the entire history together (doing a join),
but rather just small subsets of it. Our ports tree is a tree of small,
mostly similar files, and git's guessing does a poor job on such a tree;
how to cope with that is an interesting problem to solve. Merge commits can
help with some of the issue, but they can create other issues as well when
done incorrectly.

Even so, great hints for how to find cherry picked items. I suspect we'll
need to have some tooling that embeds hash(es) into the commit message in
some stylized way to allow tracking the non-trivial patch changes that
sometimes happen: squashing several cherry picks, necessary differences due
to branch drift, etc. It's unclear how we should do this, though, in a way
that works well, is reliable and doesn't add undue friction to the
process...

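One existing convention along these lines is "git cherry-pick -x", which
appends a "(cherry picked from commit <sha>)" line to the new commit's
message. Whether the project adopts that or something else is still open;
the sketch below only shows how such lines could be read back out of a
message. The helper name extract_origin_hashes is hypothetical.

```shell
#!/bin/sh
# extract_origin_hashes (hypothetical helper)
# Reads a commit message on stdin and prints, one per line, each hash found
# in a "(cherry picked from commit <sha>)" line. A squashed commit that
# carries several such lines yields several hashes.
extract_origin_hashes() {
    sed -n 's/^(cherry picked from commit \([0-9a-f]\{7,40\}\))$/\1/p'
}
```

Usage, for a single commit held in $rev:
git log -1 --format=%B "$rev" | extract_origin_hashes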

> Now to your specific question: Given a commit, how can we tell which
> branches contain that code change?  Let's look at main commit
> 6a9a8389d609 which I've determined, through manual spelunking, matches
> 2020Q4's commit 02eba4048564.
>
> At a basic level, "git cherry" can tell us that *something* in 2020Q4
> made the same change as commit 6a9a8389d609.  Here I reversed the order
> of the branch names in the command:
>      git cherry origin/branches/2020Q4 origin/main | grep 6a9a8389d609
> This outputs:
>      - 6a9a8389d609ca0370c8c6eb8f993c1aa4071681
> and the "-" tells me that 6a9a8389d609's code change is *somewhere* in
> 2020Q4's 299 unique commits.
>
> Unfortunately there's no convenient git command that'll tell you *which*
> 2020Q4 commit replicated commit 6a9a8389d609.  For that, we need to do a
> bit of scripting:
>
> -----8<-----8<-----8<-----8<-----
>
> #!/bin/sh
>
> TARGET="6a9a8389d609"
>
> BASE=`git merge-base origin/branches/2020Q4 origin/main`
>
> TARGET_PATCH_ID=`git show -p $TARGET | git patch-id --stable | cut -f 1 -d ' '`
>
> for REV in `git rev-list $BASE..origin/branches/2020Q4`; do
>     PATCH_ID=`git show -p $REV | git patch-id --stable | cut -f 1 -d ' '`
>     if [ "$PATCH_ID" = "$TARGET_PATCH_ID" ]; then
>        echo "Found a commit that replicated target commit $TARGET:"
>        echo
>        git show -s $REV
>        exit 0
>     fi
> done
>
> echo "Did not find any commit that exactly replicated $TARGET."
> exit 1
>
> ----->8----->8----->8----->8-----
>
> This only looks at the 2020Q4 branch, but it's easily adapted to look at
> a user-specified branch, or multiple branches.  (In the above I used
> "git patch-id", which is what "git cherry" uses internally to identify a
> commit's code changes.)
>
> I hope all this helps a bit!
>
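One possible shape for the multi-branch adaptation mentioned above, as a
sketch rather than a finished tool. The function names patch_id_of and
find_replicas and the argument order are made up for illustration; the
approach is the same patch-id comparison as in the script, repeated for
each branch given.

```shell
#!/bin/sh
# patch_id_of REV (hypothetical helper): print the stable patch-id of REV.
patch_id_of() {
    git show -p "$1" | git patch-id --stable | cut -d ' ' -f 1
}

# find_replicas TARGET UPSTREAM BRANCH... (hypothetical helper)
# For each BRANCH, print "BRANCH SHA" for every commit made since BRANCH
# forked from UPSTREAM whose patch-id equals TARGET's patch-id.
# Returns 0 if at least one match was printed, 1 if none, 2 if TARGET has
# no patch-id (for example, an empty commit).
find_replicas() {
    target=$1
    upstream=$2
    shift 2
    target_id=$(patch_id_of "$target")
    [ -n "$target_id" ] || return 2
    status=1
    for branch in "$@"; do
        base=$(git merge-base "$upstream" "$branch") || continue
        for rev in $(git rev-list "$base..$branch"); do
            if [ "$(patch_id_of "$rev")" = "$target_id" ]; then
                echo "$branch $rev"
                status=0
            fi
        done
    done
    return $status
}
```

Usage, with the commit and branches from the example above:
find_replicas 6a9a8389d609 origin/main origin/branches/2020Q4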

It does. I thought I'd had my head deep into git, but hadn't stumbled upon
this.

It looks useful enough I'll try to add a section to my FAQ.

Thanks!

Warner


