Date:      Fri, 08 Aug 2025 11:35:19 +1000
From:      "Rob Norris" <robn@despairlabs.com>
To:        "Warner Losh" <imp@bsdimp.com>
Cc:        "Ed Maste" <emaste@freebsdfoundation.org>, "FreeBSD Hackers" <freebsd-hackers@freebsd.org>
Subject:   Re: RFC: Adopting SPDX for SBOM generation
Message-ID:  <65d7020b-b397-46de-9034-b35afd20d031@app.fastmail.com>
In-Reply-To:  <CANCZdfo9bx9rMB5oOy=dJ8fhK1k=iJmcDHDh7EBsg8d_1nWURQ@mail.gmail.com>
References:   <CAAeFWmnnX2=qj53je_nCgRBFG2g_%2BEx2CMxfUhLAahnB3obNQw@mail.gmail.com> <aJMyg1hAGQmSwrlJ@freefall.freebsd.org> <CANCZdfrD0BmKsfpnOiSqYfWWJL2UYjCaxyyQLhpG082LwBuKwQ@mail.gmail.com> <a9b3da3d-3f17-486f-b40a-472ddbb8baba@app.fastmail.com> <CANCZdfo9bx9rMB5oOy=dJ8fhK1k=iJmcDHDh7EBsg8d_1nWURQ@mail.gmail.com>

On Fri, 8 Aug 2025, at 10:40 AM, Warner Losh wrote:
> On Thu, Aug 7, 2025, 5:42 PM Rob Norris <robn@despairlabs.com> wrote:
>> Incidentally, this year we (OpenZFS) started some work on getting license tags onto everything and enforcing them in CI, see https://github.com/openzfs/zfs/pull/17001
> 
> What tool did you use to change those 3200-odd files? We have 150k files last I checked...

Honestly, Perl one-liners. Most of the CDDL files have the same header text, so I matched on that and shoved a tag in at the top, which covered a couple of thousand files. Then I started writing the checker and had it spit out the files that were still missing a tag. Then I did a one-liner for the GPL files (most of the Linux SPL). Rinse and repeat until I had enough done that I could start making sense of the one-offs and the exceptions. It was, like, a week of evenings, and mostly mechanical search-and-append.
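For anyone wanting to try the same first pass, here is a rough Python equivalent, offered only as a sketch: the actual work used Perl one-liners, and the marker phrase, the helper names `tag_text` and `tag_tree`, the `//` comment prefix and the `CDDL-1.0` identifier are illustrative assumptions, not what OpenZFS actually ran. It prepends an SPDX line to any file that contains a known header phrase and has no tag yet, keeps a `#!` line first, and is idempotent, so it can be re-run after each round.

```python
from pathlib import Path

SPDX_TAG = "SPDX-License-Identifier:"
# Assumption: a phrase that appears in every stock CDDL header in the tree.
CDDL_MARKER = "Common Development and Distribution License"


def tag_text(text: str, marker: str, license_id: str, comment: str = "//") -> str:
    """Prepend an SPDX line if `text` contains `marker` and has no tag yet."""
    if SPDX_TAG in text or marker not in text:
        return text  # already tagged, or not a header this pass recognises
    tag = f"{comment} {SPDX_TAG} {license_id}\n"
    if text.startswith("#!"):
        # Keep the interpreter line first so scripts still execute.
        first, _, rest = text.partition("\n")
        return first + "\n" + tag + rest
    return tag + text


def tag_tree(root: Path, marker: str, license_id: str,
             suffixes: tuple[str, ...] = (".c", ".h"),
             comment: str = "//") -> list[Path]:
    """Tag every matching file under `root`; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Work on raw bytes so line endings and odd encodings round-trip.
        old = path.read_bytes().decode("utf-8", "surrogateescape")
        new = tag_text(old, marker, license_id, comment)
        if new != old:
            path.write_bytes(new.encode("utf-8", "surrogateescape"))
            changed.append(path)
    return changed
```

Running it once per license family (CDDL, then GPL, and so on) with a different marker and identifier mirrors the rinse-and-repeat loop; whatever is still untagged afterwards is where the one-offs live.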

Maybe not the best method for 150K files, I dunno, but I probably would have started the same way, and it might work fine if there's not much variation among those files. It also helped that almost everything already had a license header of some sort, so it was mostly a text-matching problem. There were a few that didn't, and for those I had to go back through the git history to try to establish provenance, and there are some we haven't got around to yet (see the exception list). If you have a lot of mixed and incomplete history, then another approach might be needed.
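The CI side amounts to a checker that lists untagged files, minus an explicit exception list for files whose provenance is still unresolved. A minimal Python sketch, assuming the tag must appear within the first few lines of a file and that exceptions are stored as repository-relative POSIX paths; `missing_tags`, the suffix list and the five-line window are hypothetical choices, not the actual OpenZFS checker.

```python
from pathlib import Path

SPDX_TAG = "SPDX-License-Identifier:"


def missing_tags(root: Path, exceptions: set[str],
                 suffixes: tuple[str, ...] = (".c", ".h", ".sh", ".py"),
                 head_lines: int = 5) -> list[str]:
    """Return repo-relative paths of source files with no SPDX tag near the top.

    `exceptions` holds POSIX-style paths relative to `root` that are known to be
    untagged (e.g. provenance still unresolved) and should not fail the check.
    """
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        if rel in exceptions:
            continue
        # Only the first few lines count: a tag buried mid-file is not a header.
        with path.open(encoding="utf-8", errors="replace") as f:
            head = [next(f, "") for _ in range(head_lines)]
        if not any(SPDX_TAG in line for line in head):
            missing.append(rel)
    return missing
```

A CI job would call it on the repository root and fail when the returned list is non-empty; keeping the exceptions in a reviewed file makes the remaining gaps visible rather than silent.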

Rob.
