From: Ed Maste
Date: Wed, 17 Jun 2020 12:46:57 -0400
Subject: Re: Next odd commit affecting `git subtree split` experiments with contrib/elftoolchain
To: Ulrich Spörlein
Cc: freebsd-git@freebsd.org
List-Id: Discussion of git use in the FreeBSD project

On Wed, 17 Jun 2020 at 12:00, Ulrich Spörlein wrote:
>
> Running the subtree split, I get a history with about 437 commits. I see
> in your https://github.com/emaste/elftoolchain/tree/split-from-cgit-beta
> that you only end up with 277 commits (if that display is to be trusted).

Are you using the unmodified subtree split, from the git port/pkg? The
patch set from Tom Clarkson improves the detection of mainline vs subtree
significantly. In the existing cgit-beta (without the MFH changes you
discussed here) unpatched subtree split produces a subtree with
tens/hundreds of thousands of commits, because a mainline commit "leaks"
into the subtree via a merge. The patched git subtree is what I used for
the split elftoolchain that I shared.

> I'm not sure whether it would be straightforward to squash the right
> commits and keep the ones with the proper commit message. Your repo still
> has a few MFH commits that one might want to remove. Using git
> `filter-repo` might do the trick ...

Indeed, although I'm not particularly concerned if there are a few stray
MFH commits - it's a little bit of clutter but accurately represents what
happened in that subtree in the svn world.

> Just to make sure, you know that you can get this like so:
> % git log --reverse --format=%h master -- contrib/elftoolchain/ | head -1
> 37429c2aa7e7

For the email I sent I just reviewed all of the contrib/elftoolchain
history anyway, and looked at the last commit. Thanks for this though; I
suspect that if we try automating this we could add --merges.

> (not sure why using -n1 instead of head(1) will result in the latest, not
> the oldest. Seems that it ignores --reverse)

Indeed, this looks like a git bug.

> Would be good if you could run a script against all contrib prefixes and
> later count the number of commits that a contrib-tree produces to see if
> something weird happens.

You mean try running `git subtree split` on each contrib prefix, and
checking that the number of commits in each generated tree is sensible?
For example, inspect any subtree with over, say, 500 commits?

As a first pass for identifying contrib prefixes I tried:

    ls -1d contrib/* sys/contrib/* crypto/* sys/crypto/* cddl/* sys/cddl/* sys/gnu/* sys/crypto/

The cddl ones aren't quite right, and I still need to check for additional
hierarchy (e.g., if we have cases like contrib/netbsd/blocklist instead of
contrib/blocklist).

> You can test both parents whether they are reachable from
> vendor/elftoolchain/dist,

I'm hoping to find an algorithm that could be made general and submitted
upstream, so that we could have something like
git subtree split --initial --prefix=contrib/elftoolchain, and have
--initial calculate the --onto revision automatically.

If we produce some bespoke tooling for FreeBSD, though, this branch name
approach should work, but I think we'd have to have a map of contrib
directory to vendor branch. I believe that some are not the same in
contrib and vendor.
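
The per-prefix sanity check discussed above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the helper
names split_count and flag_large are made up here, the 500-commit
threshold is just the example figure from the text, and the script is
assumed to run from the top level of a clone of the converted repository.
With unpatched git-subtree a single split may take a long time.

```shell
#!/bin/sh
# Sketch only: helper names and the threshold are illustrative.
# Run from the top level of a clone of the converted repository.

# Print "<commit count> <prefix>" for the history that
# `git subtree split` synthesizes for one prefix.
split_count() {
    prefix=$1
    # subtree split prints the tip of the synthesized history on stdout.
    tip=$(git subtree split --prefix="$prefix" 2>/dev/null) || return 1
    printf '%s %s\n' "$(git rev-list --count "$tip")" "$prefix"
}

# Read "<count> <prefix>" lines on stdin and print only those whose
# count exceeds the limit given as $1.
flag_large() {
    awk -v limit="$1" '$1 + 0 > limit + 0'
}

# Example (not run here):
#   for p in contrib/* sys/contrib/*; do
#       [ -d "$p" ] && split_count "$p"
#   done | flag_large 500
```

Keeping the filter separate from the split step means the (slow) counts
can be collected once into a file and re-examined with different limits.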

> or look at their notes:
>
> % git log -n1 --format=%P 37429c2aa7e7 | xargs -n1 -I@ git log -n1 --format="%h %N" @
> 8a7f75c8fcc5 svn path=/head/; revision=260666
>
> 5265ace0e440 svn path=/vendor/elftoolchain/dist/; revision=260684
> svn path=/vendor/elftoolchain/elftoolchain-r2974/; revision=260685; tag=vendor/elftoolchain/elftoolchain-r2974

This seems like a simpler, workable approach for our tree - anything with
a note containing "svn path=/vendor" is a subtree commit.

> For my own understanding, all the issues around subtree splitting are
> actually not blocking the conversion in any way, right? All they do is
> make the lives miserable for contrib-software maintainers and they might
> delay new code drops under contrib/ yes?

It depends on your definition of "blocking" I think, but your statement is
generally true - we could use the existing cgit-beta conversion, build
releases from it, etc. In the current form, with unpatched git-subtree,
the bootstrap process will be quite awkward for contrib software
maintainers though.

I think we have three ways we can address this:

1. Change the svn2git process so that we don't trip over unpatched
   git-subtree's issue with mainline history leaking into the subtree.

2. Get Tom Clarkson's git-subtree patches into upstream git, or require
   that contrib maintainers use our own patched git until that happens.

3. Develop and use an alternate subtree splitter.

I suppose there is also

4. Reconsider git subtree altogether (e.g. submodules).

but I think there's little appetite for this.

At this point I think that option 2 is the most straightforward, and I'm
now reasonably confident that it will work as we want. With this being the
case I'd say we should focus on tuning svn2git to produce "sensible"
output without regard to how unpatched git-subtree handles the output.
That is, I'd say I'm broadly happy with the state of conversion in
cgit-beta today.
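
For reference, the note-based check discussed earlier in this message
(anything with a note containing "svn path=/vendor" is a subtree commit)
might be automated along these lines. This is a rough sketch, not tested
against cgit-beta: the function names are hypothetical, it assumes the svn
metadata is in the default notes ref so that %N shows it, and it assumes
that adding --merges does find the initial subtree merge. head -1 is kept
as in the quoted command, since -n1 was observed above to return the
newest commit instead.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not existing tooling.

# Succeed if the note text on stdin refers to an svn vendor path.
is_vendor_note() {
    grep -q 'svn path=/vendor/'
}

# Oldest merge commit on master touching the given prefix
# (assumption: this is the initial subtree merge).
first_merge() {
    git log --reverse --merges --format=%H master -- "$1" | head -1
}

# Print each parent of commit $1 whose note marks it as a vendor
# (subtree) commit.
vendor_parents() {
    for p in $(git log -n1 --format=%P "$1"); do
        if git log -n1 --format=%N "$p" | is_vendor_note; then
            echo "$p"
        fi
    done
}

# Example (not run here):
#   vendor_parents "$(first_merge contrib/elftoolchain/)"
```

Going by the notes quoted above, for 37429c2aa7e7 this would be expected
to select the parent shown as 5265ace0e440 and skip 8a7f75c8fcc5, whose
note points at /head/.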