Date: Fri, 08 Aug 2025 11:35:19 +1000 From: "Rob Norris" <robn@despairlabs.com> To: "Warner Losh" <imp@bsdimp.com> Cc: "Ed Maste" <emaste@freebsdfoundation.org>, "FreeBSD Hackers" <freebsd-hackers@freebsd.org> Subject: Re: RFC: Adopting SPDX for SBOM generation Message-ID: <65d7020b-b397-46de-9034-b35afd20d031@app.fastmail.com> In-Reply-To: <CANCZdfo9bx9rMB5oOy=dJ8fhK1k=iJmcDHDh7EBsg8d_1nWURQ@mail.gmail.com> References: <CAAeFWmnnX2=qj53je_nCgRBFG2g_%2BEx2CMxfUhLAahnB3obNQw@mail.gmail.com> <aJMyg1hAGQmSwrlJ@freefall.freebsd.org> <CANCZdfrD0BmKsfpnOiSqYfWWJL2UYjCaxyyQLhpG082LwBuKwQ@mail.gmail.com> <a9b3da3d-3f17-486f-b40a-472ddbb8baba@app.fastmail.com> <CANCZdfo9bx9rMB5oOy=dJ8fhK1k=iJmcDHDh7EBsg8d_1nWURQ@mail.gmail.com>
[-- Attachment #1 --]
On Fri, 8 Aug 2025, at 10:40 AM, Warner Losh wrote:
> On Thu, Aug 7, 2025, 5:42 PM Rob Norris <robn@despairlabs.com> wrote:
>> Incidentally, this year we (OpenZFS) started some work on getting license tags onto everything and enforcing them in CI, see https://github.com/openzfs/zfs/pull/17001
>
> What tool did you use to change those 3200-odd files? We have 150k files last I checked...

Honestly, Perl one-liners. Most of the CDDL files have the same text, so I matched those and shoved a tag in the top, which covered a couple thousand. Then I started writing the checker, and having it spit out things that were missing. Then I did a one-liner for the GPL files (most of the Linux SPL). Rinse-repeat until I got enough that I could start understanding the one-offs and the exceptions. It was like, a week of evenings, and mostly mechanical search-and-append.

Maybe not the best method for 150K files, I dunno, but I probably would have started the same way, and it might work fine if there's not much variation among those files. Also it was helped by almost everything already having a license header of some sort, so it's mostly a text matching problem. There are a few that don't, where I had to go back through the git history to try to establish provenance, and there are some we haven't got around to yet (see the exception list). If you had a lot of mixed and incomplete history, then another approach might be needed.

Rob.
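For readers who want to picture the match-then-tag-then-check loop described in the message, here is a minimal sketch in Python. It is not the Perl one-liners or the OpenZFS checker themselves. The marker phrases, the `//` comment prefix, the GPL identifier, and the function names `detect_license`, `add_tag`, and `check` are all assumptions made for illustration. It works on in-memory strings rather than files so that the logic stays easy to follow.

```python
import re

# Assumed marker phrases; a real run would match the project's actual header text.
LICENSE_MARKERS = {
    "CDDL-1.0": "Common Development and Distribution License",
    "GPL-2.0-or-later": "GNU General Public License",  # assumed identifier
}

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def detect_license(text):
    """Return the SPDX id whose marker phrase appears in text, or None."""
    for spdx_id, marker in LICENSE_MARKERS.items():
        if marker in text:
            return spdx_id
    return None


def add_tag(text, comment_prefix="//"):
    """Prepend an SPDX tag when the header matches a known license.

    The text is returned unchanged if it is already tagged or nothing matches.
    """
    if SPDX_RE.search(text):
        return text
    spdx_id = detect_license(text)
    if spdx_id is None:
        return text
    return f"{comment_prefix} SPDX-License-Identifier: {spdx_id}\n{text}"


def check(files):
    """Checker pass: return the names of files that still lack a tag."""
    return sorted(name for name, text in files.items() if not SPDX_RE.search(text))
```

Running `add_tag` over every file and then feeding the results to `check` leaves a list of the one-offs and exceptions, which is roughly the rinse-repeat cycle the message describes. Files with no recognisable header stay untouched and need manual provenance work.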
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?65d7020b-b397-46de-9034-b35afd20d031>
