Date: Thu, 19 Nov 2020 17:50:13 -0500
From: Marc Branchaud <marcnarc@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: Dan Langille <dan@langille.org>, freebsd-git@freebsd.org
Subject: Re: Monitoring commits on all branches
Message-ID: <6ead26a8-54e3-ed0e-d1b7-28d69753dea4@gmail.com>
In-Reply-To: <CANCZdfqiEMSrqHrwAk_YbJYk9AHDCQEhH1%2Bqg6Vb44ovn_envQ@mail.gmail.com>
References: <197541CC-FEA7-4B4C-936E-66A5625BB64C@langille.org>
 <3c9f6285-ae7c-1062-2dd3-42f8c953a230@gmail.com>
 <CANCZdfqiEMSrqHrwAk_YbJYk9AHDCQEhH1%2Bqg6Vb44ovn_envQ@mail.gmail.com>

On 2020-11-19 12:16 p.m., Warner Losh wrote:
>
> Thanks Marc!  This is great advice... more comments below...
>
> On Thu, Nov 19, 2020 at 9:16 AM Marc Branchaud <marcnarc@gmail.com>
> wrote:
>
> > On 2020-11-18 8:49 p.m., Dan Langille wrote:
> > > How can a repo be monitored for commits on all branches?
> > >
> > > I know how to ask a given branch: do you have any commits after
> > > foo_hash?
> > >
> > > How do I:
> > >
> > > * get a list of all commits since foo_hash
> >
> > A quick note about Warner's reply:
> >
> > > git log $hash..HEAD
> >
> > "HEAD" is just a git nickname for "whatever you have currently
> > checked-out" (which can be a branch, a tag, or "detached" commit SHA
> > ID).
> >
> > > * know which branch each of those commits was on (e.g. master,
> > > branches/2020Q4)
> >
> > Unfortunately you'll find most normal git advice to be a bit
> > frustrating with the FreeBSD repos, because FreeBSD doesn't work the
> > way most people use git.  Specifically, the FreeBSD project does not
> > ever merge branches (in the git sense of the word "merge").  Things
> > would be very, very much easier if the FreeBSD project were to use
> > git-style merging.  I believe there are discussions underway about
> > adjusting the whole MFC process for the git world.  I admit that part
> > of my motivation in writing this message is to provide grist for that
> > mill.
>
> FreeBSD src will be doing cherry-picks. There's only pain and suffering
> from merge commits in this environment. Git's tools are adequate to cope
> with individual and squashed cherry picks.

Fair enough.  I'm also sure that the git community would welcome patches
that help make FreeBSD's workflow a bit smoother.

> > Fortunately even without git-merged branches, there are still git
> > tools that help, though they're not as precise as one would like.
>
> They are for src. I suspect for ports they might not be.
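
For the original question (every commit made since a known hash, on any
branch rather than only on HEAD), one possible starting point is sketched
below.  It is only a sketch: the function name commits_since is made up
for illustration, and it assumes the known hash is an ancestor of the
branches being watched.  The --source option labels each commit with the
ref through which git reached it, which is only a rough answer to "which
branch was it on" when several branches share a commit.

```shell
#!/bin/sh
# commits_since is a hypothetical helper, not an existing git command.
# $1 is the last commit that has already been processed.
commits_since() {
    # --branches/--remotes start from every local and remote-tracking
    # branch; --not "$1" excludes everything reachable from $1;
    # --source shows the ref through which each commit was reached.
    git log --oneline --source --branches --remotes --not "$1"
}

# Example (run inside a clone): commits_since 5dbe4e5f775ea2
```

Each output line carries an abbreviated SHA ID, the ref name, and the
first line of the commit message.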
> > Let's look at a concrete example with the beta ports git repo (which
> > I just cloned), and compare the 2020Q4 and main branches.  I'll start
> > with some overall exploration, then address your specific question.
> >
> > There are 299 commits in the 2020Q4 branch.  I know this because
> >     git merge-base origin/main origin/branches/2020Q4
> > tells me where 2020Q4 branched off of main: commit 5dbe4e5f775ea2.  And
> >     git rev-list 5dbe4e5f775ea2..origin/branches/2020Q4 | wc -l
> > says "299".  (The "rev-list" command is a bare-bones version of "log"
> > that only lists commit SHA IDs.)
> >
> > Meanwhile there have been 4538 commits to the main branch since commit
> > 5dbe4e5f775ea2.
> >
> > As far as git is concerned, those 299 commits in 2020Q4 are
> > *different* from anything in main.  Even though most of them made the
> > exact same code changes, they were created at different times, often
> > by different authors, and they have different commit messages.
>
> True.
>
> > But you can still ask git to look at the code-change level to see
> > which 2020Q4 commits exactly replicated the code change from main:
> >
> >     git cherry -v origin/main origin/branches/2020Q4
> >
> > This little piece of magic looks at the 299 commits in 2020Q4 that are
> > not in main and compares their code changes to the 4538 commits in
> > main that are not in 2020Q4.  It prints out the 299 2020Q4 commit SHA
> > IDs, prefixed with either a "- " or a "+ ".  The -v appends the commit
> > message's first line:
> >
> >     - 394d9746e5eea73f56334b2e7ddbdc8f686d6541 MFH: r550869
> >     + 1ac9571956759c91d852ee92859a12e52dcbde48 MFH: r550885 r550886
> >     - fd411bdfda55488b84de75e6b043c513a281abf0 MFH: r551209
> >     - 533cdaa97457b3318aebcc53f7a1a46ea66721da MFH: r551236
> >     ......
> >
> > A "-" means that the commit matches the code change made by a commit
> > in main, while a "+" means that the commit's code change does not
> > *exactly* match any main commit since commit 5dbe4e5f775ea2.
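
The "-" versus "+" verdict can also be checked by hand for a single pair
of commits.  The sketch below compares the patch IDs of two commits; the
helper names patch_id_of and same_change are invented for illustration,
and an empty patch ID (for example from a commit with no diff) is treated
as "no match", which is a choice made here and not something git
prescribes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the patch ID of one commit: a hash of its diff that ignores the
# commit's SHA ID, author, date and message.
patch_id_of() {
    git show -p "$1" | git patch-id --stable | cut -d ' ' -f 1
}

# Succeed (exit status 0) when two commits made the same code change.
same_change() {
    a=$(patch_id_of "$1")
    b=$(patch_id_of "$2")
    # An empty patch ID never counts as a match.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (run inside a clone):
#   same_change 6a9a8389d609 02eba4048564 && echo "same code change"
```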
> > So
> >     git cherry -v origin/main origin/branches/2020Q4 | grep ^-
> > shows us the 234 2020Q4 commits that made the exact same change as a
> > commit in main.
> >
> > And
> >     git cherry -v origin/main origin/branches/2020Q4 | grep ^+
> > shows us that there are 41 not-exactly-the-same-change commits in
> > 2020Q4.  Mostly these are ones that combined two or more MFH's into
> > one commit (e.g. 2020Q4 commit 1ac95719567), or that changed a file in
> > a slightly different way (see the first patch hunk of 2020Q4 commit
> > cbd002878f2, compared to its counterpart in main: commit a5d21ea16b6).
>
> Yes. These sorts of issues are why doing merge commits isn't always the
> right way to go, because we're not merging the entire history together
> (doing a join), but rather just small subsets of it. How to cope with
> the mostly-the-same-small-files tree that is our ports tree, in the face
> of git's guessing (which does a poor job on such a tree), is an
> interesting problem to solve. Merge commits can help with some of the
> issues, but they can create other issues as well when done
> incorrectly....

I admit I don't quite follow you there, but I'm particularly ignorant of
the ports tree.  I have some quite-likely-stupid ideas after having
played with it for 10 minutes while composing my earlier message, but
even if the ideas are somehow clever I suspect they'd entail too much
workflow change to be palatable.

> Even so, great hints for how to find cherry picked items. I suspect
> we'll need to have some tooling that embeds hash(es) into the commit
> message in some stylized way to allow tracking the non-trivial patch
> changes that sometimes happen: squashing several cherry picks, necessary
> differences due to branch drift, etc. It's unclear how we should do
> this, though, in a way that works well, is reliable and doesn't add
> undue friction to the process...

It's traditional when doing a cherry-pick to add a
     Cherry-picked-from: <SHA ID>
line to the commit message.
The "cherry-pick" command even has a -x option that records this
automatically, by appending a "(cherry picked from commit <SHA ID>)"
line to the new commit's message.  (There's also a "git
interpret-trailers" command that is a general-purpose tool for
manipulating "Foo: blah blah" lines in commit messages.)

"git cherry-pick" might actually lead people away from squashing
together multiple changes into one commit, because you have to make a
bit of an effort to get cherry-pick to squash things up.  I personally
think the project would benefit from discouraging squashed-together
MFC's.

> > Now to your specific question: Given a commit, how can we tell which
> > branches contain that code change?  Let's look at main commit
> > 6a9a8389d609 which I've determined, through manual spelunking, matches
> > 2020Q4's commit 02eba4048564.
> >
> > At a basic level, "git cherry" can tell us that *something* in 2020Q4
> > made the same change as commit 6a9a8389d609.  Here I reversed the
> > order of the branch names in the command:
> >     git cherry origin/branches/2020Q4 origin/main | grep 6a9a8389d609
> > This outputs:
> >     - 6a9a8389d609ca0370c8c6eb8f993c1aa4071681
> > and the "-" tells me that 6a9a8389d609's code change is *somewhere* in
> > 2020Q4's unique 299 commits.
> >
> > Unfortunately there's no convenient git command that'll tell you
> > *which* 2020Q4 commit replicated commit 6a9a8389d609.  For that, we
> > need to do a bit of scripting:
> >
> > -----8<-----8<-----8<-----8<-----
> >
> > #!/bin/sh
> >
> > TARGET="6a9a8389d609"
> >
> > BASE=`git merge-base origin/branches/2020Q4 origin/main`
> >
> > TARGET_PATCH_ID=`git show -p $TARGET | git patch-id --stable | cut -f 1 -d ' '`
> >
> > for REV in `git rev-list $BASE..origin/branches/2020Q4`; do
> >     PATCH_ID=`git show -p $REV | git patch-id --stable | cut -f 1 -d ' '`
> >     if [ "$PATCH_ID" = "$TARGET_PATCH_ID" ]; then
> >        echo "Found a commit that replicated target commit $TARGET:"
> >        echo
> >        git show -s $REV
> >        exit 0
> >     fi
> > done
> >
> > echo "Did not find any commit that exactly replicated $TARGET."
> > exit 1
> >
> > ----->8----->8----->8----->8-----
> >
> > This only looks at the 2020Q4 branch, but it's easily adapted to look
> > at a user-specified branch, or multiple branches.  (In the above I
> > used "git patch-id", which is what "git cherry" uses internally to
> > identify a commit's code changes.)
> >
> > I hope all this helps a bit!
>
> It does. I thought I'd had my head deep into git, but hadn't stumbled
> upon this.

I've been using git for over 10 years, and I still discover new things.
This "git cherry" stuff, for example, I've only started using a little
bit in the last few months.

> It looks useful enough I'll try to add a section to my FAQ.

I'm honoured!

		M.
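
The adaptation to user-specified or multiple branches mentioned above
might look roughly like the following.  This is a sketch under
assumptions, not a tested tool: the function name find_replicas and its
output format are invented here, and it searches the range
TARGET..BRANCH directly instead of computing a merge base first.  It
also relies on "git patch-id" accepting a whole "git log -p" stream and
printing one "<patch ID> <commit ID>" pair per commit, which avoids
running "git show" once per commit.

```shell
#!/bin/sh
# find_replicas is a hypothetical helper for illustration.
# Usage: find_replicas TARGET BRANCH...
# Prints "BRANCH COMMIT" for every commit on BRANCH (and not reachable
# from TARGET) whose code change has the same patch ID as TARGET.
find_replicas() {
    target=$1
    shift
    target_id=$(git show -p "$target" | git patch-id --stable | cut -d ' ' -f 1)
    # Give up if the target has no diff to compare (or does not exist).
    [ -n "$target_id" ] || return 2

    for branch in "$@"; do
        # One pass over the branch: patch-id emits a line per commit.
        git log -p --no-merges "$target..$branch" |
        git patch-id --stable |
        while read -r id rev; do
            if [ "$id" = "$target_id" ]; then
                echo "$branch $rev"
            fi
        done
    done
}

# Example (run inside a clone):
#   find_replicas 6a9a8389d609 origin/branches/2020Q4
```

An empty result means no commit in the searched ranges exactly
replicated the target's code change; squashed or hand-adjusted picks
will not be found this way.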