From owner-freebsd-git@freebsd.org Thu Jun 18 08:44:39 2020 Return-Path: Delivered-To: freebsd-git@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 66E973490F0 for ; Thu, 18 Jun 2020 08:44:39 +0000 (UTC) (envelope-from uspoerlein@gmail.com) Received: from mail-oi1-f195.google.com (mail-oi1-f195.google.com [209.85.167.195]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49nb8k4hNYz4KYW; Thu, 18 Jun 2020 08:44:38 +0000 (UTC) (envelope-from uspoerlein@gmail.com) Received: by mail-oi1-f195.google.com with SMTP id a21so4370386oic.8; Thu, 18 Jun 2020 01:44:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2QmdhkWZ/UqF8An+SlDZy+ctg5+o9gUduYQ+TPZQxow=; b=AO07tb4XjgBxRuRwq5RnOhxE4riR+1FsD6dJLjddmAr3U4v9CJspEPf8aOjqtPSXCG aanw3fm2sW+MIO4npLHl0SMqNdqdz/sk1Cj0FwheFX6KR9F41pi4AEDtOHZsPoLsRdyH 5pqZIeF0y61NZzXsCUozU1Hcssg/yj1WRuaFB8y/GkDygGabNSTDzfAdyg0NHkMazMbF CK3DUSlZNfk0Gu/EqnCH/lcp0mbm3uK4VEIn+T4KU3BSRSSMiNNeEMoEidYH+wLVc5s9 oKTVBHjv0mTWwIR7ODCVsZskeF4huqPmVXhzF5Zsr9wQZmtXmXb5pEPlUq2Q5T16FyX5 kVIA== X-Gm-Message-State: AOAM532XQo3Fd531yIxsxNFKChXyPuBgTRJ6SjoLi1O7TluKeAVXfcE0 4qWk5gqkSX9hBkwAlTr5IhHlg1rqqHz+gf7l+c0AuL0l X-Google-Smtp-Source: ABdhPJyCZUx7Y1F236r2Ml7E9F/lET6OHK4KDxkImE/l2joo5ihKI9A3ZGTZlY0x+bRj+uJTAX66D1kn4q7rjrPSlkM= X-Received: by 2002:aca:ebca:: with SMTP id j193mr2014186oih.42.1592469876850; Thu, 18 Jun 2020 01:44:36 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Ulrich_Sp=C3=B6rlein?= Date: Thu, 18 Jun 2020 10:44:25 +0200 Message-ID: Subject: Re: Next odd commit affecting `git subtree split` experiments with contrib/elftoolchain To: Ed Maste Cc: freebsd-git@freebsd.org X-Rspamd-Queue-Id: 49nb8k4hNYz4KYW X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US] Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.33 X-BeenThere: freebsd-git@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: Discussion of git use in the FreeBSD project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2020 08:44:39 -0000 On Wed, Jun 17, 2020 at 6:47 PM Ed Maste wrote: > On Wed, 17 Jun 2020 at 12:00, Ulrich Sp=C3=B6rlein wrot= e: > > > > Running the subtree split, I get a history with about 437 commits. I se= e > in your https://github.com/emaste/elftoolchain/tree/split-from-cgit-beta > that you only end up with 277 commits (if that display is to be trusted). > > Are you using unmodified subtree split, from git port/pkg? The patch > set from Tom Clarkson improves the detection of mainline vs subtree > significantly. In the existing cgit-beta (without the MFH changes you > discussed here) it produces a subtree with tens/hundreds of thousands > of commits, because a mainline commit "leaks" into the subtree via a > merge. The patched git subtree is what I used for the split > elftoolchain that I shared. > Yes, this is using plain git subtree w/o patches, but it's on a repo that has no MFH head =E2=86=92 project merges. As I wrote, it comes out with ~400 commits, = while a patched subtree split will only produce about 280, so going with the patched subtree seems more sensible. > > I'm not sure whether it would be straightforward to squash the right > commits and keep > > the ones with the proper commit message. Your repo still has a view MFH > commits that > > one might want to remove. Using git `filter-repo` might do the trick ..= . > > Indeed, although I'm not particularly concerned if there are a few > stray MFH commits - it's a little bit of clutter but accurately > represents what happened in that subtree in the svn world. > I tried filter-repo on your elftoolchain history, and it does nothing :( There are still many tree objects that are identical between commits, but they are usually merges, so it's not a simple "empty" commit that we can toss out. However, I wonder if we cannot flatten the history to be a single line instead of the merges. Do merge commits make any sense for the resulting repo? If we could make it all linear, then the "empty" commits would stand out better and filter-repo could toss them away. Here's what I mean: * | e7db0cf7 0a4a2d68 - Add META_MODE support. |\ \ | * \ 3cc0c3e8 935facd3 - Merge sync of head | |\ \ | * | | a16e339e d637a411 - Merge from head | |\| | * | | | eb3e8834 0a4a2d68 - elfcopy: Handle objects without a ".shstrtab" section string table ... | |/ |/| * | b5199d77 d637a411 - Fix the conversion macro for .note sections, broken in the cas There's a long "branch" that was created from b5199d77, then sees 2 commits (with different trees, mind you) a16e339e and 3cc0c3e8. But once it is merged back into main with e7db0cf7, the resulting tree is 0a4a2d68, which is just the same pre-merge (in commit eb3e8834). So the whole exercise of merging in the off-shoot branch doesn't alter the tree at all. The algorithm would be simple: - for all merge commits, check their tree vs. the tree of all parents. IFF one of the parents has the same tree as the merge commit =E2=86=92 remove the merge commit The last step is actually tricky, as you need to update all the children of the merge commit with the new parent. You need to check all revisions for that (in your repo, there are actually 2!) Doing it manually with a graft: % git replace --graft e7db0cf7 eb3e8834 results in * | e7db0cf7 0a4a2d68 - (replaced) Add META_MODE support. * | eb3e8834 0a4a2d68 - elfcopy: Handle objects without a ".shstrtab" section string table * | 666a6d09 7d6c3ca1 - Update to ELF Tool Chain r3223 This looks useful, I guess. There is a different case of history also, where a branch is merged in multiple times, but the branch never had a commit so far! * | ab49ac72 2fcd31e1 - Copy elftoolchain readelf from vendor branch |\| * | b5f0dac9 00d9d606 - Correct elftoolchain strip(1) memory size calculation * | daa082dd 57cf3e02 - libelf: Fix cross-endian ELF note file / memory conversion * | c7ce7bba 41742fa1 - Track libarchive API change * | 1481386a 62292f3c - Temporarily disable non-FreeBSD NT_ note types * | 2b9d3d05 d080f9c9 - Fix elftoolchain tools in-tree build * | f4b35757 b87618f0 - Copy elftoolchain binutils replacements from vendor branch |\| * | b5199d77 d637a411 - Fix the conversion macro for .note sections, broken in ... * | eda82609 cffe5c27 - GCC for PowerPC does not align .note secti... * | 0501d6e9 b7163de0 - Reapply r221569, r233401, r233524 and r255105: Add support ... * | 28d52963 195f7d2b - Remove trailing whitespace. * | 3342d176 e298487d - * Allow API dwarf_loclist_n() and dwarf_loclist() to be called with ... * | 297fc330 e3796a34 - Add a sanity check: The provided offset for the desired location ... * | 8bd68bad 4613fb47 - API dwarf_attrval_flag() should properly handle an attribute ... * | 039bcdce 651ee9b1 - Fix typo: the public API dwarf_child() should return ... * | 605443d0 c431e1b8 - Fix a warning in libdwarf found by -Wmissing-variable * | 6458e5be f938195c - Apply r241720 by ed: * | fdaacf72 322e4f0d - Use FreeBSD's ELF headers instead of the elfdefinitions.h ... * | afb9ed40 bcc80b4a - Copy libelf, libdwarf and common files from vendor/ to contrib/. |/ * 5265ace0 13068447 - Initial import of elftoolchain r2974. The branch on the right is merged in f4b35757 and ab49ac72, but it never actually had a commit itself till then. I don't know how to detect that, but let's graft it away anyway. % git replace --graft f4b35757 b5199d77 % git replace --graft ab49ac72 b5f0dac9 % git filter-repo --debug --prune-empty=3Dalways --prune-degenerate=3Dalway= s --force This results in at most (!) 3 branches active at the same time, and that only for 1 commit anyway. All MFHs are gone. The graph output has the branches swapped, so a diff(1) is useless, you need to manually look at them side-by-side (I hope this will be readable, the hashes are tree-hashes, left is old, right is after 3 grafts and filter-repo) * | f5beebdd Update to ELF Tool Chain r3|* | f5beebdd Update to ELF Tool Chain r34 |\| ||\| | * 5d850df4 Import ELF Tool Chain snaps|| * 5d850df4 Import ELF Tool Chain snapsh | * 4faefe71 Import ELF Tool Chain snaps|| * 4faefe71 Import ELF Tool Chain snapsh * | ce92a093 elfcopy: map all !alnum cha|* | ce92a093 elfcopy: map all !alnum char * | e1c6f66f MFH |* | e1c6f66f elfcopy: fix symbol table ha |\ \ |* | fba12860 elfcopy: overhaul of LMA han | * | e1c6f66f elfcopy: fix symbol table|* | b6574d32 libelf: correct byte count i * | | fba12860 MFH |* | ecdda97c libdwarf: fix SHT_REL reloca |\| | |* | f3d41e3c elfcopy: fail if debug link | * | fba12860 elfcopy: overhaul of LMA |* | 6c9c0b57 Allow elfcopy to convert bet * | | b6574d32 MFH |* | a6ca6b74 Update ELF Tool Chain to ups |\| | ||\| | * | b6574d32 libelf: correct byte coun|| * 3de9500b Import ELF Tool Chain snapsh | * | ecdda97c libdwarf: fix SHT_REL rel|* | 848f4fb5 readelf: decode AArch64 TLS * | | f3d41e3c MFH |* | bc4cfa9f readelf: report value of unk |\| | |* | 75cc50b8 readelf: avoid accidental fa | * | f3d41e3c elfcopy: fail if debug li|* | 641e5321 Add config for RISC-V ISA. * | | 6c9c0b57 MFH |* | f4bd9e9f Fixed uninitialized variable |\| | |* | 47b58c30 Update to ELF Tool Chain r32 | * | 6c9c0b57 Allow elfcopy to convert ||\| * | | a6ca6b74 MFH || * fa4048de Import ELF Tool Chain snapsh |\| | |* | eff30dab elfcopy: include extension b | * | a6ca6b74 Update ELF Tool Chain to |* | 49fea608 elfcopy: exclude extension w | |\| |* | e2028f7d readelf: add Xen ELF notes | | * 3de9500b Import ELF Tool Chain sna|* | e1809d24 Add missing commas * | | 848f4fb5 MFH |* | 3d703d04 Add definitions for MIPS TLS |\| | |* | 3f2daddd addr2line: initialize die to | * | 848f4fb5 readelf: decode AArch64 T|* | 1314d2c4 Update to ELF Tool Chain r32 | * | bc4cfa9f readelf: report value of ||\| | * | 75cc50b8 readelf: avoid accidental|| * 3204d66e Import ELF Tool Chain snapsh * | | 641e5321 MFH |* | f56b32a7 Rename ELFOSABI_SYSV to ELFO |\| | |* | f1fa8947 readelf: Correct typo HPUS - | * | 641e5321 Add config for RISC-V ISA|* | 5cdc13da addr2line: skip CUs lacking > > Would be good if you could run a script against all contrib prefixes an= d > later > > count the number of commits that a contrib-tree produces to see if > something > > weird happens. > > You mean try running `git subtree split` on each contrib prefix, and > checking that the number of commits in each generated tree is > sensible? For example, inspect any subtree with over say 500 commits? > > As a first pass for identifying contrib prefixes I tried: > > ls -1d contrib/* sys/contrib/* crypto/* sys/crypto/* cddl/* sys/cddl/* > sys/gnu/* > > sys/crypto/ and the cddl ones aren't quite right, and I still need to > check for additional hierarchy (e.g., if we have cases like > contrib/netbsd/blocklist instead of contrib/blocklist) > Yes, we need to check all of them and it might need a manual mapping (or at least a handful exceptions). > > > You can test both parents whether they are reachable from > vendor/elftoolchain/dist, > > I'm hoping to find an algorithm that could be made general and > submitted upstream, so that we could have something like git subtree > split --initial --prefix=3Dcontrib/elftoolchain, and have the --initial > calculate the --onto revision automatically. If we produce some > bespoke tooling for FreeBSD though this branch name approach should > work, but I think we'd have to have a map of contrib directory to > vendor branch. I believe that some are not the same in contrib and > vendor. > It might really be too bespoke for FreeBSD and we only need to do all of this once, yes? > I think we have three ways we can address this: > > 1. Change the svn2git process so that we don't trip over unpatched > git-subtree's issue with mainline history leaking into the subtree. > 2. Get Tom Clarkson's git-subtree patches into upstream git, or > require that contrib maintainers use our own patched git until that > happens. > 3. Develop and use an alternate subtree splitter. > > I suppose there is also > 4. Reconsider git subtree altogether (e.g. submodules). > but I think there's little appetite for this. > > At this point I think that option 2 is the most straightforward, and > I'm now reasonably confident that it will work as we want. With this > being the case I'd say we should focus on tuning svn2git to produce > "sensible" output without regard to how unpatched git-subtree handles > the output. That is, I'd say I'm broadly happy with the state of > conversion in cgit-beta today. > 2 would be my preference as well. So someone will need to run subtree split for all contrib software and let me know if there are blockers that the svn2git conversion could help out with (or we need to help out all subtree splits with a dozen carefully placed grafts and a rewrite. Doesn't sound too bad?) Cheers Uli