From owner-freebsd-arch@freebsd.org Sun Nov 29 05:06:01 2015
Date: Sun, 29 Nov 2015 14:06:00 +0900
Subject: Found aarch64 image, How do I run this image?
From: Dongseob Park
To: freebsd-arch@freebsd.org

Hello,

I found aarch64 VM images on the FreeBSD homepage
(ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/VM-IMAGES/11.0-CURRENT/aarch64/Latest/).

I know VirtualBox doesn't support arm64/aarch64, but vhd/vmdk images are
available there. I guess they can be run in QEMU, but has anyone tried
those images on an x86/x86_64 host?

Thank you

From owner-freebsd-arch@freebsd.org Sun Nov 29 15:53:53 2015
From: Ed Maste
Date: Sun, 29 Nov 2015 10:53:33 -0500
Subject: Re: Found aarch64 image, How do I run this image?
To: Dongseob Park
Cc: "freebsd-arch@freebsd.org"

On 29 November 2015 at 00:06, Dongseob Park wrote:
> Hello,
>
> I found aarch64 VM images on FreeBSD homepage.
...
> I guess it can be run in qemu but does someone tried that images on
> x86/x86_64 host?

Yes, the images work well with QEMU AArch64 system mode. Detailed
instructions for running them are included in the snapshot
announcements, e.g.
https://lists.freebsd.org/pipermail/freebsd-snapshots/2015-November/000184.html

From owner-freebsd-arch@freebsd.org Sun Nov 29 18:04:27 2015
Date: Sun, 29 Nov 2015 11:04:26 -0700
Subject: mtree "language" enhancements
From: Warner Losh
To: "freebsd-arch@freebsd.org"

Greetings,

As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.

mtree started out as a language to express hierarchies of files. It does a
decent job at that, even if some of the tools that we have in the tree
aren't so great about manipulating them. One could easily wish for better
tools, but that's not the topic of this thread.

So, I've started to move the language into one that can also journal
changes to a tree, and have been moving NanoBSD to using wrappers that
make the changes to the tree and record the journal events at the end of
the metalog produced from buildworld. I have a second tool that reads the
metalog, applies the actions to the earlier entries, and then produces a
final metalog that's used for makefs. These tools are still evolving, but
before I get too close to the point of committing, I thought I'd post a
proposed extension to mtree for comments so I don't have to change too much.

I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.

Each action entry would have an 'action' keyword. The values I've defined
for it so far are as follows:
1. "unlink", which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
2. "move", which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy", which duplicates a previous entry. It too takes targetpath.
4. "meta", which changes the metadata of the previous entry. All keywords
on this are merged with the previous entry.

The one other thing that my merging tool does is to remove all size
keywords. In the NanoBSD environment, size is irrelevant. Files are
replaced and appended to all the time in the build process, and it doesn't
make sense to track the size. makefs fails if the size is different, so
post-processing of the tree, say to add a new default to
/etc/defaults/rc.conf or to tweak /etc/ttys to turn on/off a tty (or append
a new entry), will cause it to fail. It would be nice if mtree could do
this, but it simply can't (but see above for whining about better tools
being beyond the scope of this).

If things go well, we could eventually move these extensions into mtree so
that the post-processing stage is no longer necessary. I'm content to
maintain the hundred or two lines of awk I've written to implement it. I
chose awk because it does the job well enough, though python might do it
better. But I don't want to talk about that choice since right now it is
purely internal to NanoBSD (though I hope that other build orchestration
systems like src/release and crochet look to adopt it).

Comments?
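
The merge step described above (replay the action records over earlier
entries, then drop size) might look roughly like the following. This is
only a sketch under stated assumptions: the record layout
"path key=value ...", the function names, and the choice of Python are all
illustrative and are not taken from the actual awk tool.

```python
# Hypothetical sketch: replay "type=action" journal records over earlier
# metalog entries. The record layout ("path key=value ...") and the helper
# names are assumptions for illustration, not the NanoBSD implementation.


def parse_record(line):
    """Split 'path key=value ...' into (path, {key: value})."""
    path, *words = line.split()
    return path, dict(word.split("=", 1) for word in words)


def apply_journal(lines):
    """Return the final {path: keywords} mapping after replaying actions."""
    entries = {}  # insertion-ordered: path -> keyword dict
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, kw = parse_record(line)
        if kw.get("type") != "action":
            entries[path] = kw  # ordinary entry: create or replace
            continue
        if path not in entries:
            raise ValueError(f"action on unknown entry: {path}")
        action = kw["action"]
        if action == "unlink":
            # Removing a directory that still has entries is an error.
            if any(p.startswith(path + "/") for p in entries):
                raise ValueError(f"directory not empty: {path}")
            del entries[path]
        elif action == "move":
            entries[kw["targetpath"]] = entries.pop(path)
        elif action == "copy":
            entries[kw["targetpath"]] = dict(entries[path])
        elif action == "meta":
            # Merge every keyword except the two that mark the record itself.
            extra = {k: v for k, v in kw.items() if k not in ("type", "action")}
            entries[path].update(extra)
        else:
            raise ValueError(f"unknown action: {action}")
    # Sizes go stale as files are edited during the build, so drop them.
    for kw in entries.values():
        kw.pop("size", None)
    return entries


def format_metalog(entries):
    """Render entries back into 'path key=value ...' lines."""
    return [
        " ".join([path] + [f"{k}={v}" for k, v in kw.items()])
        for path, kw in entries.items()
    ]
```

Note that "move" and "copy" in this sketch touch only the named entry;
relocating a directory together with its children is left out.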

Warner

From owner-freebsd-arch@freebsd.org Sun Nov 29 18:16:11 2015
To: Warner Losh
cc: "freebsd-arch@freebsd.org"
Subject: Re: mtree "language" enhancements
From: "Poul-Henning Kamp"
Date: Sun, 29 Nov 2015 18:16:07 +0000
Message-ID: <44936.1448820967@critter.freebsd.dk>

--------
In message <CANCZdfrDtfkwKxMV3o9tcQNzBQDKZdTx1JErkTKtC7UZORT5aA@mail.gmail.com>, Warner Losh writes:

>As part of making NanoBSD buildable by non-root, I've found a need to have
>a richer mtree language than we currently have.

>I'd like a new type called 'action' (so type=action in the records). This
>type is defined loosely to manipulate an earlier entry (or maybe entries,
>still unsure) in the file.

I suggest you define this so that all records have an action, and that
the default action is "create".

>2. "move" which relocates a previous entry. An additional targetpath
>keyword specifies the ultimate destination for this entry.
>3. "copy" which duplicates a previous entry. It too takes targetpath.

Is targetpath absolute or relative?

Can it reach out of the mtree root?

>4. "meta" which changes the meta data of the previous entry. All keywords
>on this are merged with the previous entry.

System-III called this "chmog" if I recall correctly :-)

>The one other thing that my merging tool does is to remove all size
>keywords.

That sounds wrong to me. Shouldn't you just emit "meta" records updating
the size as appropriate?

What about digest fields?

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@freebsd.org Sun Nov 29 18:58:50 2015
In-Reply-To: <44936.1448820967@critter.freebsd.dk>
References: <44936.1448820967@critter.freebsd.dk>
Date: Sun, 29 Nov 2015 11:58:48 -0700
Subject: Re: mtree "language" enhancements
From: Warner Losh
To: Poul-Henning Kamp
Cc: "freebsd-arch@freebsd.org"

On Sun, Nov 29, 2015 at 11:16 AM, Poul-Henning Kamp wrote:

> --------
> In message <CANCZdfrDtfkwKxMV3o9tcQNzBQDKZdTx1JErkTKtC7UZORT5aA@mail.gmail.com>, Warner Losh writes:
>
> >As part of making NanoBSD buildable by non-root, I've found a need to have
> >a richer mtree language than we currently have.
>
> >I'd like a new type called 'action' (so type=action in the records). This
> >type is defined loosely to manipulate an earlier entry (or maybe entries,
> >still unsure) in the file.
>
> I suggest you define this so that all records have an action, and that
> the default action is "create"

Practically speaking, I hadn't considered this, but it would be a logical
consequence of these extensions.

> >2. "move" which relocates a previous entry. An additional targetpath
> >keyword specifies the ultimate destination for this entry.
> >3. "copy" which duplicates a previous entry. It too takes targetpath.
>
> Is targetpath absolute or relative?

Relative to the top of the tree.

> Can it reach out of the mtree root?

Nope. Those cases need entirely new entries.

> >4. "meta" which changes the meta data of the previous entry. All keywords
> >on this are merged with the previous entry.
>
> System-III called this "chmog" if I recall correctly :-)

I love that term. I'll steal it :)

> >The one other thing that my merging tool does is to remove all size
> >keywords.
>
> That sounds wrong to me. Shouldn't you just emit "meta" records updating
> the size as appropriate?

Emitting records that change the size is possible, but would add an extra
step. It's easy to catch mv, rm, etc., but hard to catch >> (shell append
redirection). I took the easy way out of just ignoring size changes, though
one could add a nano_resize command that you need to call after changing
the size of a file in the post-processing phase.

> What about digest fields?

In my use case, they are irrelevant.
They aren't generated by buildworld's metalog, and aren't generally useful.
They might add some protection against tampering between when the tree is
created and when it is put into a partition, but that's racy. If an
attacker can replace the file after it is created but before the checksum
is run, they win. So there's little value here for me.

However, having said that, digest fields either should be discarded (for
the same reason as size), or they should be correct before the dedup tool /
enhanced mtree gets to them. This gets into the nuts and bolts of NanoBSD:
we copy files around all the time, but have no spec for them. The usual
answer is to have a bunch of chmod / chown calls that 'fix' them up and
generate an mtree for the image so you can protect against corruption in
the field (or at least know what changed). In a nopriv-build, you need to
somehow record these changes. Do I continue the traditional behavior, or do
I require a new mtree spec for all the files you wish to copy and use that
to modify the metalog, or hack the permissions directly for the priv-build
case?

The decision between discard and check is likely an input to the dedup
tool. For NanoBSD the decision is likely to default to discard. But other
tools might want to check, and some NanoBSD users may wish to climb the
hill to being correct by adding calls to correct the size everywhere.

My first goal is to create a tool that produces correct images with the
right permissions. A secondary goal would be to safeguard the process from
unintended changes that would be caught by size and/or digest changes. It
isn't a current feature of NanoBSD, but that doesn't make it undesirable,
especially if your NanoBSD build process puts precious files onto the media
and you want to make sure the rest of the build process doesn't
accidentally tamper with them, as a guard against bugs...
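
The discard-versus-check choice described above could be modeled as a
policy switch in the post-processing tool. The following is a minimal
sketch, assuming a hypothetical reconcile() helper and mtree-style keyword
names (size, sha256digest, and so on); none of this is taken from the
actual NanoBSD tooling, and only SHA-256 is verified here.

```python
import hashlib
import os

# Keywords that go stale when a file's contents change. The digest names
# follow mtree conventions; treat the exact list as an assumption.
STALE_KEYWORDS = ("size", "md5digest", "sha1digest", "sha256digest")


def reconcile(keywords, file_path, policy="discard"):
    """Return a copy of keywords with size/digest discarded or verified."""
    if policy == "discard":
        # NanoBSD-style default: forget anything tied to file contents.
        return {k: v for k, v in keywords.items() if k not in STALE_KEYWORDS}
    if policy != "check":
        raise ValueError(f"unknown policy: {policy}")
    # "check": the recorded values must still match the file on disk.
    if "size" in keywords and int(keywords["size"]) != os.path.getsize(file_path):
        raise ValueError(f"size mismatch: {file_path}")
    if "sha256digest" in keywords:
        with open(file_path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != keywords["sha256digest"]:
            raise ValueError(f"digest mismatch: {file_path}")
    return dict(keywords)
```

With "check", a build step that silently appends to a file would surface as
an error unless the size and digest are updated afterwards; with "discard",
it passes unnoticed, which is the trade-off described above.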

Warner

From owner-freebsd-arch@freebsd.org Sun Nov 29 18:59:45 2015
Subject: Re: mtree "language" enhancements
From: Tim Kientzle
Date: Sun, 29 Nov 2015 10:59:36 -0800
Cc: "freebsd-arch@freebsd.org", Michal Ratajsky, Brooks Davis
To: Warner Losh

Sounds interesting. Have you talked with Michal (CCed) who is working on a
libmtree library?

The capabilities you're describing here really need to be bundled into a
library, I think. In particular, the ability to "unlink", "copy", etc, is
much more useful if you can directly query the mtree file contents to
perform conditional changes.
(For example, it may be important to remove an empty directory, which
requires you to be able to query whether a directory has files in it.)

I would also be interested in a description of the processing model. It
sounds like you're assuming the same model used by the current mtree
program -- mtree files are processed sequentially line-by-line as they
are read.

By contrast, libarchive's mtree processor works differently; it reads
the entire input, merging redundant lines for the same file, and then
processes the list. This is more explicitly declarative, and simplifies
things like modifying the ownership or permissions of already-listed
files.

> Each action entry would have an 'action' keyword.

In terms of the language per se, this seems unnecessary. I've
proposed alternate language below that omits the unnecessary
"type=action" by just adding new keywords.

> The keywords I've defined
> so far are as follows:
> 1. "unlink" which throws away the previous entry. That entry has been
> removed. It may apply to files or directories, but it is an error not to
> remove all entries in a directory when removing the directory.

# When set on an entry, a matching file on disk will be removed.
# This would also be useful for things like ObsoleteFiles
unlink=true

> 2. "move" which relocates a previous entry. An additional targetpath
> keyword specifies the ultimate destination for this entry.

# When set on an entry, moves the existing file to the new name
rename=
# Example
foo/bar type=file owner=root mode=0755 rename=foo/baz

> 3. "copy" which duplicates a previous entry. It too takes target path.

# As with rename, except it copies the contents.
copy_from=
# properties that are not specified will be copied as well
# Create foo/bar by copying foo/baz, preserving all attributes
foo/bar type=file copy_from=foo/baz
# Create foo/bar as above, but modify the owner
foo/bar owner=dialer type=file copy_from=foo/baz

> 4. "meta" which changes the meta data of the previous entry. All keywords
> on this are merged with the previous entry.

As above, libarchive's mtree processor already does this by default; no
language change is needed.

> The one other thing that my merging tool does is to remove all size
> keywords. ... [comments about modifying existing files]

One common case here is appending new contents to an existing file.
That could similarly be handled with the same pattern:

# Append from source
foo/bar append_from=

In particular, that removes the need to find the source file to modify
it in-place. I've run into various headaches with Crochet when the
/usr/obj layout changes between releases and Crochet cannot find the new
location of a file. This would remove the need to modify the file
in-place in many cases (but not all).

Cheers,

Tim

From owner-freebsd-arch@freebsd.org Sun Nov 29 19:10:43 2015
To: Warner Losh
CC: "freebsd-arch@freebsd.org"
Subject: Re: mtree "language" enhancements
From: "Simon J. Gerraty"
Date: Sun, 29 Nov 2015 11:10:24 -0800
Message-ID: <25335.1448824224@chaos>

Warner Losh wrote:
> As part of making NanoBSD buildable by non-root, I've found a need to have
> a richer mtree language than we currently have.

No fundamental objection there.
Indeed I'd really like the ability to provide a default uid/gid for the
case where a uname/gname cannot be looked up, or even just a flag to say
"if lookup fails, use 0:0".
This would avoid the need to post-process BSD.var.dist to replace all
uname/gname with uid=0/gid=0 during various bootstrap situations.

> I'd like a new type called 'action' (so type=action in the records).
> This type is defined loosely to manipulate an earlier entry (or maybe entries,
> still unsure) in the file.
>
> Each action entry would have an 'action' keyword. The keywords I've defined

would or could?

> so far are as follows:
> 1. "unlink" which throws away the previous entry. That entry has been
> removed. It may apply to files or directories, but it is an error not to
> remove all entries in a directory when removing the directory.
> 2. "move" which relocates a previous entry. An additional targetpath
> keyword specifies the ultimate destination for this entry.
> 3. "copy" which duplicates a previous entry. It too takes targetpath.
> 4. "meta" which changes the meta data of the previous entry. All keywords
> on this are merged with the previous entry.

Probably need to know a bit more about how NanoBSD is built/packaged
to comment more usefully. Any useful references?

> The one other thing that my merging tool does is to remove all size
> keywords. In the NanoBSD environment, size is irrelevant. Files are ..
> replaced and appended to all the time in the build process, and it doesn't
> make sense to track the size. makefs fails if the size is different, so

Agreed. Where do these size keywords come from?
We (Juniper) do not have them in any of our mtree-based manifests,
which we use directly with makefs.

On the off chance it is of interest...
I wonder if this style of manifest would simplify your problem?
I believe all the code needed (other than makefiles) is in head at least.

There are two styles supported, classic mtree:

#mtree
#
# Group IDs used:
#   0 wheel
#
# User IDs used:
#   0 root
#
/set uid=0 gid=0 mode=555 type=file
bin type=dir
    cat contents="${STAGE_OBJTOP}/bin/cat"
    cp contents="${STAGE_OBJTOP}/bin/cp"
    ..
which is good for manually maintained manifests,
and, for autogenerated ones (e.g. via find), a full-path format:

usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"

The two can be combined: an mtree-style header with autogenerated
info appended.

> If things go well, we could eventually move these extensions into mtree so
> that the post-processing stage is no longer necessary. I'm content to
> maintain the hundred or two lines of awk I've written to implement it. I
> chose awk because it does the job well enough, though python might do it
> better. But I don't want to talk about that choice since right now it is
> purely internal to NanoBSD (though I hope that other build orchestration
> systems like src/release and crochet look to adopt).

FWIW we use python when awk/sed etc. prove insufficient or cumbersome,
but awk/sed are usually adequate.
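
For readers who want to see the combined style concretely, here is a minimal sketch (not the code in head, and not libarchive's parser) of flattening a manifest that mixes a classic header with full-path entries. The function name normalize_manifest is hypothetical, and the rule "a name containing '/' is a full path" is an assumption made for this illustration; /unset and the rest of the mtree syntax are ignored.

```python
import shlex


def normalize_manifest(text):
    """Return [(path, keywords)] with /set defaults applied to every entry."""
    defaults = {}   # keywords accumulated from /set lines
    cwd = []        # directory stack used by classic (relative) entries
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        words = shlex.split(line)         # also strips quotes around values
        name = words[0]
        keywords = {}
        for word in words[1:]:
            key, _, value = word.partition("=")
            keywords[key] = value
        if name == "/set":
            defaults.update(keywords)
            continue
        if name == "..":
            if cwd:
                cwd.pop()                 # leave the current directory
            continue
        merged = {**defaults, **keywords}  # per-entry keywords override /set
        if "/" in name:
            path = name                   # assumed: full-path entry
        else:
            path = "/".join(cwd + [name])
            if merged.get("type") == "dir":
                cwd.append(name)          # descend into the new directory
        entries.append((path, merged))
    return entries
```

Each returned pair carries the complete keyword set, so a later step (makefs input generation, for example) does not need to track /set state or the directory stack.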
From owner-freebsd-arch@freebsd.org Sun Nov 29 19:22:15 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F2C13A3A5F6 for ; Sun, 29 Nov 2015 19:22:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qg0-x22b.google.com (mail-qg0-x22b.google.com [IPv6:2607:f8b0:400d:c04::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9C4AA1873 for ; Sun, 29 Nov 2015 19:22:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by qgcc31 with SMTP id c31so103675028qgc.3 for ; Sun, 29 Nov 2015 11:22:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=S19dNrPy0MVYJzpMwq79XMl8sMZoEd6/wdRjqf5epbA=; b=hhh2SZlpj8Cpc5X1KeUs6MUAjMVvxsW/W0lgT+JQ6DQuNensqfQMfzpa4qfzx43F8T fbQ9FkGNB/SyxPdR/YyTaiIjoVf0eBXbX+Hf6171YPZuk2cTE0hdNYsccFoM76REK14K EQJxxn4zDt3+c4cwXLpicpvveQ2xKPUesFM8cSLKpVQY4Lvq2wBthCzhCr7UkXmZfHXx FvyYyA+JbjLTF7atJCMTtAhyjHQGUYf2TMnA1hoLCI1DNFJiIqt4+ws4aF6VXKrq6Byz cqiUZ/Ufwkgjt9uR6LlO7pby0YWE7pkFckJHjOzIk7jztO6LF0Q/QaqybnfRrmSEYGrg 1htw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=S19dNrPy0MVYJzpMwq79XMl8sMZoEd6/wdRjqf5epbA=; b=F2mYECS10ue6U25/NOHskBc8L0klcFG+5R7H97FT06TU8C0gk4hFWSbp0iuuacpgl0 P6T7ppx8QXIBFl8a94yD/V9Pr+VhUpj+CtJdQic/KUU4b7PJ03QLtRF/sNu/QDj/uHtV bQ+Ot/tX+uzA83ekFIAikapAjAiqJpAij+kIwQjniZTGbm9CQgekOHWPJxZYb5Rrz/67 bVfyvxFvjljFTrVYT1yTa2sIzWd/L7jCbopXygjnOe5NqBfksJCZcUllI9TgOXq7VWhn 
1H7kMq7HAoQbpXotY0hVB6VNPLi9rBFh9Cd0UNhhSf3BYHIAqhYcKUYzudA+RlyyWauI SuNQ== X-Gm-Message-State: ALoCoQkCdLMghFYaMYKW9Z60wFihbGRHruamWE0fdmN5WlvrOp3RcMcnO+sYTXMlKFHdxbD1Khpe MIME-Version: 1.0 X-Received: by 10.140.99.86 with SMTP id p80mr49010003qge.97.1448824933653; Sun, 29 Nov 2015 11:22:13 -0800 (PST) Sender: wlosh@bsdimp.com Received: by 10.140.27.181 with HTTP; Sun, 29 Nov 2015 11:22:13 -0800 (PST) X-Originating-IP: [50.253.99.174] In-Reply-To: References: Date: Sun, 29 Nov 2015 12:22:13 -0700 X-Google-Sender-Auth: Job1OmA60ftleSj_qqDjJHpoj_0 Message-ID: Subject: Re: mtree "language" enhancements From: Warner Losh To: Tim Kientzle Cc: "freebsd-arch@freebsd.org" , Michal Ratajsky , Brooks Davis Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Nov 2015 19:22:15 -0000 On Sun, Nov 29, 2015 at 11:59 AM, Tim Kientzle wrote: > Sounds interesting. > > Have you talked with Michal (CCed) who is working on a libmtree library? > No. I haven't. I've been thinking mostly what's the fastest way I can get NanoBSD working in a nopriv (-DNO_ROOT) environment that wouldn't be hard to push into a library later. > The capabilities you're describing here really need to be bundled into a > library, I think. In particular, the ability to "unlink", "copy", etc, is > much more useful if you can directly query the mtree file contents to > perform conditional changes. (For example, it may be important to remove > an empty directory which requires you to be able to query whether a > directory has files in it.) > In the NanoBSD context, these entries would be automatically generated, so the tree is at hand. There'd be no need for this conditional stuff, though having it as an additional extension wouldn't be bad. 
> I would also be interested in a description of the processing model. It
> sounds like you're assuming the same model used by the current mtree
> program -- mtree files are processed sequentially line-by-line as they are
> read.
>

The processing model is that the resulting mtree file is read sequentially.
Each new entry either creates a new node in an internal representation, or
modifies a previous node. Once everything has been processed, the internal
representation would be used to do something. In my case, I'd output an
mtree file free of these extensions.

> For instance, libarchive's mtree processor works differently; it reads the
> entire input, merging redundant lines for the same file, and then processes
> the list. This is more explicitly declarative, and simplifies things like
> modifying the ownership or permissions of already-listed files.

Yes. My awk script that is the first manifestation of these extensions is
implemented this way. That's why I described it as a journal, but I didn't
explain that, in my nomenclature, a journal is processed first to last to
get the current state.

> > > Each action entry would have an 'action' keyword.
>
> In terms of the language per se, this seems unnecessary. I've proposed
> alternate language below that omits the unnecessary "type=action" by just
> adding new keywords.

That would work too. I came up with the type=action thing as a way to avoid
a lot of new keywords, and to segregate the new actions from the old, but
what you propose would also work and might be more general.

> The keywords I've defined
> > so far are as follows:
> > 1. "unlink" which throws away the previous entry. That entry has been
> > removed. It may apply to files or directories, but it is an error not to
> > remove all entries in a directory when removing the directory.
>
> # When set on an entry, a matching file on disk will be removed.
> # This would also be useful for things like ObsoleteFiles
> unlink=true

OK.
That's a little different than what I had in mind. My notion was that
the tree would be modified in place to remove the file, and this entry
would announce that action so the mtree internal representation could
be modified to reflect that. Though I do see value in your approach.

> > 2. "move" which relocates a previous entry. An additional targetpath
> > keyword specifies the ultimate destination for this entry.
>
> # When set on an entry, moves the existing file to the new name
> rename=
>
> # Example
> foo/bar type=file owner=root mode=0755 rename=foo/baz

That would work.

> > 3. "copy" which duplicates a previous entry. It too takes target path.
>
> # As with rename, except it copies the contents.
> copy_from=
>

Yes.

> # properties that are not specified will be copied as well
> # Create foo/bar by copying foo/baz, preserving all attributes
> foo/bar type=file copy_from=foo/baz
> # Create foo/bar as above, but modify the owner
> foo/bar owner=dialer type=file copy_from=foo/baz

s/owner/uname/, but I like that.

> > 4. "meta" which changes the meta data of the previous entry. All keywords
> > on this are merged with the previous entry.
>
> As above, libarchive's mtree processor already does this by default; no
> language change is needed.

OK. If it matches existing practice, I'm cool with the change.

> > The one other thing that my merging tool does is to remove all size
> > keywords. ... [comments about modifying existing files]
>
> One common case here is appending new contents to an existing file. That
> could similarly be handled with the same pattern:
>
> # Append from source
> foo/bar append_from=
>

That's a novel idea. My post-processor might have a little trouble with it
if we were trying not to modify the actual target tree. But with modify in
place, we could make it work.

> In particular, that removes the need to find the source file to modify it
> in-place.
> I've run into various headaches with Crochet when the /usr/obj
> layout changes between releases and Crochet cannot find the new location of
> a file. This would remove the need to always modify the file in-place.
> (But not all.)
>

It is a useful pattern. Most of the nanobsd scripts I've seen use '>>' to
append to individual files, one line at a time.

Warner

> Cheers,
>
> Tim
>
>
> > On Nov 29, 2015, at 10:04 AM, Warner Losh wrote:
> >
> > Greetings,
> >
> > As part of making NanoBSD buildable by non-root, I've found a need to have
> > a richer mtree language than we currently have.
> >
> > mtree started out as a language to express hierarchies of files. It does a
> > decent job at that, even if some of the tools that we have in the tree
> > aren't so great about manipulating them. One could easily wish for better
> > tools, but that's not the topic of this thread.
> >
> > So, I've started to move the language into one that can also journal
> > changes to a tree, and have been moving NanoBSD to using wrappers that do
> > the changes to the tree and record the journal events at the end of the
> > metalog produced from buildworld. I have a second tool that reads the
> > metalog, applies the actions to the earlier entries, and then produces a
> > final metalog that's used for makefs. These tools are still evolving, but
> > before I got too close to the point of committing, I thought I'd post a
> > proposed extension to mtree for comments so I don't have to change too much.
> >
> > I'd like a new type called 'action' (so type=action in the records). This
> > type is defined loosely to manipulate an earlier entry (or maybe entries,
> > still unsure) in the file.
> >
> > Each action entry would have an 'action' keyword. The keywords I've defined
> > so far are as follows:
> > 1. "unlink" which throws away the previous entry. That entry has been
> > removed. It may apply to files or directories, but it is an error not to
> > remove all entries in a directory when removing the directory.
> > 2. "move" which relocates a previous entry. An additional targetpath
> > keyword specifies the ultimate destination for this entry.
> > 3. "copy" which duplicates a previous entry. It too takes targetpath.
> > 4. "meta" which changes the meta data of the previous entry. All keywords
> > on this are merged with the previous entry.
> >
> > The one other thing that my merging tool does is to remove all size
> > keywords. In the NanoBSD environment, size is irrelevant. Files are
> > replaced and appended to all the time in the build process, and it doesn't
> > make sense to track the size. makefs fails if the size is different, so
> > post-processing of the tree, say to add a new default to
> > /etc/defaults/rc.conf or to tweak /etc/ttys to turn on/off a tty (or append
> > a new entry) will cause it to fail. It would be nice if mtree could do this,
> > but it simply can't (but see above for whining about better tools being
> > beyond the scope of this).
> >
> > If things go well, we could eventually move these extensions into mtree so
> > that the post-processing stage is no longer necessary. I'm content to
> > maintain the hundred or two lines of awk I've written to implement it. I
> > chose awk because it does the job well enough, though python might do it
> > better. But I don't want to talk about that choice since right now it is
> > purely internal to NanoBSD (though I hope that other build orchestration
> > systems like src/release and crochet look to adopt).
> >
> > Comments?
> > > > Warner > > _______________________________________________ > > freebsd-arch@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-arch > > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > > > From owner-freebsd-arch@freebsd.org Sun Nov 29 22:49:07 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CC676A3C786 for ; Sun, 29 Nov 2015 22:49:07 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from monday.kientzle.com (kientzle.com [142.254.26.11]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9C7751D0C; Sun, 29 Nov 2015 22:49:07 +0000 (UTC) (envelope-from tim@kientzle.com) Received: (from root@localhost) by monday.kientzle.com (8.14.4/8.14.4) id tATMo7sI079532; Sun, 29 Nov 2015 22:50:07 GMT (envelope-from tim@kientzle.com) Received: from [192.168.2.108] (192.168.1.101 [192.168.1.101]) by kientzle.com with SMTP id mup93iqhwzimyvqxn4c8dbxkd2; Sun, 29 Nov 2015 22:50:06 +0000 (UTC) (envelope-from tim@kientzle.com) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\)) Subject: Re: mtree "language" enhancements From: Tim Kientzle In-Reply-To: Date: Sun, 29 Nov 2015 14:49:03 -0800 Cc: "freebsd-arch@freebsd.org" , Michal Ratajsky , Brooks Davis Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Warner Losh , "Simon J. 
Gerraty" X-Mailer: Apple Mail (2.3096.5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Nov 2015 22:49:08 -0000 > On Nov 29, 2015, at 11:22 AM, Warner Losh wrote: >=20 > I would also be interested in a description of the processing model. = It sounds like you're assuming the same model used by the current mtree = program -- mtree files are processed sequentially line-by-line as they = are read. >=20 > The processing model is that the resulting mtree file is read = sequentially. Each > new entry either creates a new node in an internal representation, or = modifies > a previous node. Once everything has been processed, the internal = representation > would be used to do something. In my case, I'd output an mtree file = free of these > extensions. Good. I like that model. > > 1. "unlink" which throws away the previous entry. >=20 > # When set on an entry, a matching file on disk will be removed. > # This would also be useful for things like ObsoleteFiles > unlink=3Dtrue >=20 > OK. That's a little different than what I had in mind. My notion was = that > the tree would be modified in place to remove the file, and this entry > would announce that action so the mtree internal representation could > be modified to reflect that. Though I do see value in your approach. I was thinking that the 'mtree' command-line tool could be useful for = bulk-remove operations (or more generally for updating an existing tree = including removal of obsolete files). But bulk-remove is probably = easier to do with 'xargs rm', so that might be overkill. Simon J. 
Gerraty suggested:
> which is good for manually maintained manifests,
> and, for autogenerated ones (e.g. via find), a full-path format:
>
> usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
> usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"
>
> the two can be combined - an mtree style header with autogenerated
> info appended.

libarchive also supports this mixture. It's a little tricky to parse accurately, though. I think libarchive considers any line a "full path" line if the name has a '/' in it. So you occasionally need to use things like './foo' to force the right interpretation. And of course, there are tricky details like merging properties accurately when some are specified in the old format and some in the new.

Simon also asked:
> Indeed I'd really like the ability to provide default uid/gid
> for the case that a uname/gname cannot be looked up.

I think 'tar' got this right: If uname and uid are both specified, then look up uname and if that fails, use the specified uid. Ditto for gname/gid. In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file. An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
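
The lookup order described above can be sketched in a few lines. This is only an illustration of the policy, not tar's or libarchive's implementation: resolve_owner, its name_to_id mapping and the prefer_id switch are all hypothetical names, and the same logic would apply unchanged to gname/gid.

```python
def resolve_owner(keywords, name_to_id, prefer_id=False):
    """Choose a numeric uid from an entry's uname/uid keywords.

    keywords   -- dict of mtree keywords for one entry (values are strings)
    name_to_id -- mapping of user name to numeric uid (stand-in for a
                  password database lookup)
    prefer_id  -- if True, a recorded uid wins over the name lookup
    """
    uname = keywords.get("uname")
    uid = keywords.get("uid")
    if prefer_id and uid is not None:
        return int(uid)
    if uname is not None and uname in name_to_id:
        return name_to_id[uname]      # the name lookup succeeded
    if uid is not None:
        return int(uid)               # fall back to the recorded number
    raise LookupError(f"cannot resolve owner from {keywords!r}")
```

With both keywords recorded, the same entry resolves sensibly whether or not the name is known on the system doing the lookup.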
Tim From owner-freebsd-arch@freebsd.org Mon Nov 30 04:28:47 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 55502A3C278 for ; Mon, 30 Nov 2015 04:28:47 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from monday.kientzle.com (kientzle.com [142.254.26.11]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2C94A14A4; Mon, 30 Nov 2015 04:28:46 +0000 (UTC) (envelope-from tim@kientzle.com) Received: (from root@localhost) by monday.kientzle.com (8.14.4/8.14.4) id tAU4TklX080901; Mon, 30 Nov 2015 04:29:46 GMT (envelope-from tim@kientzle.com) Received: from [192.168.2.108] (192.168.1.101 [192.168.1.101]) by kientzle.com with SMTP id kwd69xp26pctbddtif9wb6tnte; Mon, 30 Nov 2015 04:29:46 +0000 (UTC) (envelope-from tim@kientzle.com) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\)) Subject: Re: mtree "language" enhancements From: Tim Kientzle In-Reply-To: Date: Sun, 29 Nov 2015 20:28:42 -0800 Cc: Michal Ratajsky , Brooks Davis , "freebsd-arch@freebsd.org" Content-Transfer-Encoding: quoted-printable Message-Id: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> References: To: Warner Losh , "Simon J. Gerraty" X-Mailer: Apple Mail (2.3096.5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2015 04:28:47 -0000 > On Nov 29, 2015, at 2:49 PM, Tim Kientzle wrote: >=20 > Simon also asked: >> Indeed I'd really like the ability to provide default uid/gid >> for the case that a uname/gname cannot be looked up. 
>=20 > I think 'tar' got this right: If uname and uid are both specified, = then look up uname and if that fails, use the specified uid. Ditto for = gname/gid. In particular, this lets a single specification be used to = rebuild a tree on another system with different UIDs or on a system that = does not (yet) have a full password file. An option could be provided = for the (rare) case that someone really wants to prefer UIDs to unames. On further reflection, preferring UIDs to unames would actually be = pretty common here. In particular, NanoBSD (and Crochet and other similar tools) should = prefer the UID when building images instead of looking up unames against = the build host's password file. Tim From owner-freebsd-arch@freebsd.org Mon Nov 30 05:49:12 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0158FA3123E for ; Mon, 30 Nov 2015 05:49:11 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qk0-x233.google.com (mail-qk0-x233.google.com [IPv6:2607:f8b0:400d:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A312C17B3 for ; Mon, 30 Nov 2015 05:49:11 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by qkfo3 with SMTP id o3so55035405qkf.1 for ; Sun, 29 Nov 2015 21:49:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=gqsigX2gmjuSDeOLlvZlBixwGucPHt7trIKLKPliU38=; b=wDOP3xccnv8qSeSxSgF/hsfSmTEW8hiekDTYZqVeKu74RqKdLz0Ry+JK96H2sNzAbr 6yzI3P/uCtunSjKQS4iBIr7r22q5t6h3dPTAw3Ex+bpY9Jv0QQZf8uSW2PVHAzcuAlrk b6dVGZOm6rQ4YKQXVVAcc6NO7u7i/BKuFl0ET363EiT64yNvyaBRDcNpg3+Sujl6jYek 
c7ZOrDxdD8aw4oCJ2+2Ag7oarLbMr6lfoLd8hfITLVoLjm0mJ28vypxI8Bpa1A7s6G6x Pd8fXvzDKvpNb7O3GTsfFFRp6UNXh60Ff2isXFOrG4Lj0KImi0Er549v27LLsQU2uFPu y2/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=gqsigX2gmjuSDeOLlvZlBixwGucPHt7trIKLKPliU38=; b=kTHzAdDNBQqr8LvJ1Tkr69NsNRni9rmDmtU+UXbfVuWQy/ekH6z+zZeJvOkWG7Xg4v CdT/cTS1t/qdcfb8Z0g8Ssy1eO35hG0IdfccZ+tEDrmBqwnQG3r+SRiN7gUCRo46k+FH bjGb/XbSieUYz89WlkCBkHjHtJzK3ooNg334gQn7P6LqEI8WOvhtIi+bQQpHhdqvS809 pdPdhH/XlHQ6afvEQbGTyIbJQthwjIaRch+TsHS8bu9uSbyPzSjRxawCvC0Nj8lwWgXC W4tyXkXBhciXOf0XQ0julJBYvJlqfvDtdPMtrC46So2kdU8SXoft4urvV3XW0IhS1FPH gulQ== X-Gm-Message-State: ALoCoQn8zF1xcjNKwSpB96jWiFk6PUDOy6UetKw/KJdfSorODkzARzCU7eBBjMj1tHku6BM+R5xo MIME-Version: 1.0 X-Received: by 10.55.21.65 with SMTP id f62mr71287218qkh.46.1448862550352; Sun, 29 Nov 2015 21:49:10 -0800 (PST) Sender: wlosh@bsdimp.com Received: by 10.140.27.181 with HTTP; Sun, 29 Nov 2015 21:49:10 -0800 (PST) X-Originating-IP: [50.253.99.174] In-Reply-To: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> References: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> Date: Sun, 29 Nov 2015 22:49:10 -0700 X-Google-Sender-Auth: z1V4o4IzxRtXMOMRukNS-TlZ6UU Message-ID: Subject: Re: mtree "language" enhancements From: Warner Losh To: Tim Kientzle Cc: "Simon J. 
Gerraty" , Michal Ratajsky , Brooks Davis , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2015 05:49:12 -0000

On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle wrote:

>
> > On Nov 29, 2015, at 2:49 PM, Tim Kientzle wrote:
> >
> > Simon also asked:
> >> Indeed I'd really like the ability to provide default uid/gid
> >> for the case that a uname/gname cannot be looked up.
> >
> > I think 'tar' got this right: If uname and uid are both specified, then
> > look up uname and if that fails, use the specified uid. Ditto for
> > gname/gid. In particular, this lets a single specification be used to
> > rebuild a tree on another system with different UIDs or on a system that
> > does not (yet) have a full password file. An option could be provided for
> > the (rare) case that someone really wants to prefer UIDs to unames.
>
> On further reflection, preferring UIDs to unames would actually be pretty
> common here.
>
> In particular, NanoBSD (and Crochet and other similar tools) should prefer
> the UID when building images instead of looking up unames against the build
> host's password file.

I've implemented what we've talked about, except this. When doing the
makefs, we should use the /etc/master.passwd that's inside the image in
preference to either of these alternatives. That's the most correct thing
to do: use as much of the data as you can, as late as you can.

The thing I'm struggling with now is why would both be present? Would that
indicate an error? Or someone changing the defaults? And if they are
changing the defaults, why use a uid in preference to a uname? Is this to
avoid contamination? To set something not in the password file, or just
comfort level of the user?
FreeBSD will write unames for install*.

So I'm left thinking that maybe the rule should be 'last one wins', at least
for the use case where we use the target's /etc/master.passwd. That's
what I've actually implemented.

Preliminary testing of http://people.freebsd.org/~imp/mtree-dedup.awk
appears to be working. I haven't tried all the cases yet, but it is looking
promising. I don't need append_from, so that's just a stub in this file.

Since this is in awk, I don't use the host's /etc/passwd at all. That's
one of the failures of mtree that I've seen when I tried to use it, and
perhaps the source of your concern. I'd love to see any libmtree be able to
manipulate mtree files absent the tree they describe, and even absent any
uname -> uid processing at all, to avoid these issues. The silly awk thing I
wrote is purely a tool for manipulating a mapping from paths to sets of
key-value pairs.

Once I'm more confident about this after some testing and integration into
NanoBSD, I'll post something to phabricator. But I'd welcome any comments
on what I've implemented in the meantime.
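
As a rough illustration of the "path to set of key-value pairs" idea, here is a minimal Python sketch of replaying a journal. It is not a port of mtree-dedup.awk, whose contents are not shown in this thread; apply_journal is a hypothetical name, the record format (path, dict) is assumed, and the keywords follow the unlink=true / rename= / copy_from= spellings proposed earlier. append_from is left out, and 'last one wins' is applied to every keyword rather than specially to uid/uname.

```python
def apply_journal(records):
    """Replay (path, keywords) records in file order; return {path: keywords}."""
    tree = {}
    for path, keywords in records:
        # size is dropped everywhere, as the merging tool in the thread does
        kw = {k: v for k, v in keywords.items() if k != "size"}

        if kw.pop("unlink", None) == "true":
            # removing a directory that still has entries is an error
            if any(p.startswith(path + "/") for p in tree):
                raise ValueError(f"{path}: directory not empty")
            tree.pop(path, None)
            continue

        target = kw.pop("rename", None)
        source = kw.pop("copy_from", None)
        if source is not None:
            node = dict(tree[source])      # start from a copy of the source
        else:
            node = tree.pop(path, {})      # existing node, or a new one
        node.update(kw)                    # later keywords win
        tree[target if target is not None else path] = node
    return tree
```

The result contains none of the extension keywords, so it could be written back out as an ordinary mtree file for makefs.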
Warner From owner-freebsd-arch@freebsd.org Tue Dec 1 02:30:07 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5876A3DE9A for ; Tue, 1 Dec 2015 02:30:07 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from mail-pa0-x233.google.com (mail-pa0-x233.google.com [IPv6:2607:f8b0:400e:c03::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7C5861FEE for ; Tue, 1 Dec 2015 02:30:07 +0000 (UTC) (envelope-from tim@kientzle.com) Received: by pacej9 with SMTP id ej9so204669889pac.2 for ; Mon, 30 Nov 2015 18:30:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kientzle-com.20150623.gappssmtp.com; s=20150623; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=IzS1S5dFvPT1tBl+p31CcXO+Kxa2hlBdlZZowkhj77w=; b=w81m4xUuE5+5WyIQKI1g4OylOtqXE2mSwWmIQHjx0PqGJG16LRmHZmGj/a7kpMWrWH fVyRt9Rmr6ForrRGANyfPrxbWU2sQkUMYGn8+mm2w/FqhGMYmX3QYgRD5C0Q44WtDtrV ikM5M/DsS3rJVOkjRpKJEyDoDJmXa7AWtnsvKo44jSiC1q54CDr1ginUXL9S0Cv2AsZR wttL+gYzv7G4d68aONwdMTEdSyO8+IKef/asS6JhiVLGlGF1Cc3VV7aNuEd7Pl/WeGiU DcnSk1rHJLYnDffuNXA8FpnD/sHg4UUEqhbV/SerxX5813ha3xnHDUJIz1bwz0P0VlKn kN+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=IzS1S5dFvPT1tBl+p31CcXO+Kxa2hlBdlZZowkhj77w=; b=YGCJ3ExAITh5zZVe1YUG/5kkVhQU/edoREqvWKfhx8XhUuV7UELYyst7r09lqmjY5H i7YdPxK3CJjjtPM+286kqBzR0yf2E0xsyNGFA9l/NAOqjpBGnc9e5e6HYB2fGbjNcr/A o6Q2iWTkF5m+rM9LNQHFu3KEF1PZ5NU/j0HTovsvRdbHWc9bsSQFMgBxXitWYKrhnEeg G5v8lKYGYavFINukjdHz7Eg9V68HtMrhaUfk8RQgT0prhv2O2yuqBa2hymCrgwxA8eEi 
agS82NaiH+EgJOMMsQlTe89lJ9msFteBhCT4K+xZpOD74lWhYHvTPH+bxMNYweGhLf4b FGeQ== X-Gm-Message-State: ALoCoQmKGJCm34mE9DDwGiO1nNhbUtpj2/LFpNwM/+IvVZUf/vP9RN74mShzuPzRkgRBA70699rN X-Received: by 10.98.75.83 with SMTP id y80mr75216399pfa.77.1448937006372; Mon, 30 Nov 2015 18:30:06 -0800 (PST) Received: from [192.168.1.102] (c-24-6-102-176.hsd1.ca.comcast.net. [24.6.102.176]) by smtp.gmail.com with ESMTPSA id qn5sm54640256pac.41.2015.11.30.18.30.04 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 30 Nov 2015 18:30:05 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\)) Subject: Re: mtree "language" enhancements From: Tim Kientzle In-Reply-To: Date: Mon, 30 Nov 2015 18:31:07 -0800 Cc: "freebsd-arch@freebsd.org" , Michal Ratajsky , Brooks Davis , "Simon J. Gerraty" Content-Transfer-Encoding: quoted-printable Message-Id: <71D3DCA2-B336-4849-88E3-8412F8A93324@kientzle.com> References: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> To: Warner Losh X-Mailer: Apple Mail (2.3096.5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Dec 2015 02:30:07 -0000

> On Nov 29, 2015, at 9:49 PM, Warner Losh wrote:
>
> On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle wrote:
>
>>
>>> On Nov 29, 2015, at 2:49 PM, Tim Kientzle wrote:
>>>
>>> Simon also asked:
>>>> Indeed I'd really like the ability to provide default uid/gid
>>>> for the case that a uname/gname cannot be looked up.
>>>
>>> I think 'tar' got this right: If uname and uid are both specified, then
>> look up uname and if that fails, use the specified uid. Ditto for
>> gname/gid. In particular, this lets a single specification be used to
>> rebuild a tree on another system with different UIDs or on a system that
>> does not (yet) have a full password file.
>> An option could be provided for
>> the (rare) case that someone really wants to prefer UIDs to unames.
>>
>> On further reflection, preferring UIDs to unames would actually be pretty
>> common here.
>>
>> In particular, NanoBSD (and Crochet and other similar tools) should prefer
>> the UID when building images instead of looking up unames against the build
>> host's password file.
>
>
> I've implemented what we've talked about, except this. When doing the
> makefs, we should use the /etc/master.passwd that's inside the image in
> preference to either of these alternatives. That's the most correct thing
> to do: use as much of the data as you can, as late as you can.
>
> The thing I'm struggling with now is why would both be present? Would that
> indicate an error? Or someone changing the defaults? And if they are
> changing the defaults, why use a uid in preference to a uname? Is this to
> avoid contamination? To set something not in the password file, or just
> comfort level of the user? FreeBSD will write unames for install*.
>
> So I'm left thinking that maybe the rule should be 'last one wins' at least
> for the use case where we use the target's /etc/master.passwd. That's
> what I've actually implemented.

There are two key cases that drove this design for tar:

1. Handling user info that is not (yet) in the target password file. In practice, images get built up in different orders: I might add a bunch of new files owned by a new user before some other process gets a chance to add the user.

2. Restoring info when the target has different user numbering than the host. (Or when the user isn’t in the host password file at all.)

For #1, you need the UID since the uname can’t be looked up anywhere. For #2, you must have the uname since the UID would be wrong. An image that can work in either scenario needs to have both.
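The two cases above can be sketched as a small lookup rule. This is a hedged illustration in Python, not the actual code of tar or mtree: `load_passwd`, `resolve_uid`, and the `prefer_uid` flag are hypothetical names, and the password data is assumed to use the colon-separated master.passwd layout (name, password, uid, and so on).

```python
def load_passwd(text):
    """Map user names to uids from master.passwd-style text."""
    users = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        users[fields[0]] = int(fields[2])  # name is field 0, uid is field 2
    return users


def resolve_uid(uname, uid, passwd, prefer_uid=False):
    """Look up uname first; fall back to the recorded uid if that fails."""
    if prefer_uid and uid is not None:
        return uid                # opt-in: trust the recorded number
    if uname is not None and uname in passwd:
        return passwd[uname]      # case 2: target numbering differs
    if uid is not None:
        return uid                # case 1: user not (yet) in the target
    raise LookupError(f"cannot resolve owner {uname!r} and no uid was given")


if __name__ == "__main__":
    target = load_passwd(
        "root:*:0:0::0:0:Charlie &:/root:/bin/csh\n"
        "www:*:80:80::0:0:World Wide Web Owner:/nonexistent:/usr/sbin/nologin\n"
    )
    print(resolve_uid("newuser", 1001, target))  # user missing from target
    print(resolve_uid("www", 33, target))        # name known, number differs
```

Passing the target image's password data in explicitly, rather than calling the build host's lookup functions, is what keeps the host's accounts out of the result.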
For NanoBSD, you may be able to enforce that users are always present in the target password file before any data owned by those users is added to the image. So it may be reasonable to just rely on uname everywhere for now.

Tim

From owner-freebsd-arch@freebsd.org Tue Dec 1 18:25:52 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1FC19A3E3CD for ; Tue, 1 Dec 2015 18:25:52 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0119.outbound.protection.outlook.com [207.46.100.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9F8D51023; Tue, 1 Dec 2015 18:25:51 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from BY2PR05CA052.namprd05.prod.outlook.com (10.141.250.42) by BL2PR05MB049.namprd05.prod.outlook.com (10.255.228.144) with Microsoft SMTP Server (TLS) id 15.1.331.20; Tue, 1 Dec 2015 18:25:49 +0000 Received: from BL2FFO11OLC009.protection.gbl (2a01:111:f400:7c09::153) by BY2PR05CA052.outlook.office365.com (2a01:111:e400:2c5f::42) with Microsoft SMTP Server (TLS) id 15.1.331.20 via Frontend Transport; Tue, 1 Dec 2015 18:25:49 +0000 Authentication-Results: spf=softfail (sender IP is 66.129.239.18) smtp.mailfrom=juniper.net; freebsd.org; dkim=none (message not signed) header.d=none;freebsd.org; dmarc=none action=none header.from=juniper.net; Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.18 as permitted sender) Received: from p-emfe01b-sac.jnpr.net (66.129.239.18) by BL2FFO11OLC009.mail.protection.outlook.com (10.173.160.145) with Microsoft SMTP Server (TLS) id 15.1.331.11 via Frontend Transport; Tue, 1 Dec 2015 18:25:48 +0000
Received: from magenta.juniper.net (172.17.27.123) by p-emfe01b-sac.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.123.3; Tue, 1 Dec 2015 10:25:28 -0800 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id tB1IPND88504; Tue, 1 Dec 2015 10:25:24 -0800 (PST) (envelope-from sjg@juniper.net) Received: from chaos (localhost [IPv6:::1]) by chaos.jnpr.net (Postfix) with ESMTP id 4E60F580A9; Tue, 1 Dec 2015 10:25:23 -0800 (PST) To: Tim Kientzle CC: Warner Losh , Michal Ratajsky , Brooks Davis , "freebsd-arch@freebsd.org" , Subject: Re: mtree "language" enhancements In-Reply-To: <71D3DCA2-B336-4849-88E3-8412F8A93324@kientzle.com> References: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> <71D3DCA2-B336-4849-88E3-8412F8A93324@kientzle.com> Comments: In-reply-to: Tim Kientzle message dated "Mon, 30 Nov 2015 18:31:07 -0800." From: "Simon J. Gerraty" X-Mailer: MH-E 8.6; nmh 1.6; GNU Emacs 24.5.1 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <27384.1448994323.1@chaos> Content-Transfer-Encoding: quoted-printable Date: Tue, 1 Dec 2015 10:25:23 -0800 Message-ID: <535.1448994323@chaos> X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Dec 2015 18:25:52 -0000

Tim Kientzle wrote:
> > So I'm left thinking that maybe the rule should be 'last one wins' at least
> > for the use case where we use the target's /etc/master.passwd. That's
> > what I've actually implemented.
>
> There are two key cases that drove this design for tar:
>
> 1. Handling user info that is not (yet) in the target password file.
> In practice, images get built up in different orders: I might add a
> bunch of new files owned by a new user before some other process gets
> a chance to add the user.

This is the issue we face. We don't like magic numbers so prefer to use names (uid=0 gid=0 is fine).
We use mtree with BSD.var.dist at various times, and in at least some of those cases we cannot assume that the passwd or group databases will be complete (or even valid - e.g. during recovery from corrupted storage). In such cases we could easily tolerate mtree simply using 0:0 (or current uid:gid) for any uname:gname it could not resolve, since we aren't likely to care about those dirs until we are up and running properly - by which time the ownership would have been fixed.

What we don't want is for mtree to toss its cookies or flood the console with pointless noise (which it is wont to do).

What we currently have to do to avoid problems is run BSD.var.dist through sed to replace all \([gu]\)name=[^ ]* with \1id=0, and it would be nice to be able to skip that.

From owner-freebsd-arch@freebsd.org Wed Dec 2 17:37:12 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3334EA3F09C for ; Wed, 2 Dec 2015 17:37:12 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: from mail-ig0-x233.google.com (mail-ig0-x233.google.com [IPv6:2607:f8b0:4001:c05::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0284C1136 for ; Wed, 2 Dec 2015 17:37:12 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: by igcto18 with SMTP id to18so36866543igc.0 for ; Wed, 02 Dec 2015 09:37:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:from:date:message-id:subject:to:content-type; bh=qzgjzrhbQk/fDtNvxZ9kP78cT8MfaMhK6Vrxy4AZGOE=; b=xK3sRUkHjRQJxuvwmBzww08GxT+QRI7H16e6YfB/5dwgA3hIHdQ9jrVqLXeN1UfpT4 AJMYungQ/m4G45sM16pknG9pqW88I18QPVuZCJPSKl58iNzNRvyoSzf/U7Yto/Ll29SU
v7WBFUFP8/T9N4pLiQ0AtaugwEck6qkiuNDDiBYUnbsr8IPfoMVxZiwVO9JQaD3n+17d 1oJ5ASBIX4EVUGgm+W0O8IIehUwd87ZJZvh9RB/rjvpmVocAA84vRcXhtqz4UqeB0GJc SIibA01QJDikO8CMnpITMsdpZOa5AIec/dsJe+YOnd+J58qvCbS7DAb6XQjSRJ9MAsD1 bZnA== X-Received: by 10.50.43.234 with SMTP id z10mr5095696igl.58.1449077831438; Wed, 02 Dec 2015 09:37:11 -0800 (PST) MIME-Version: 1.0 Sender: carpeddiem@gmail.com Received: by 10.107.169.85 with HTTP; Wed, 2 Dec 2015 09:36:52 -0800 (PST) From: Ed Maste Date: Wed, 2 Dec 2015 17:36:52 +0000 X-Google-Sender-Auth: ViLv3FjhbfYQMcsqrRs47peIeMI Message-ID: Subject: Removing build metadata, for reproducible kernel builds To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 17:37:12 -0000 The main issue currently preventing kernel builds from being reproducible[1] is the build metadata itself that's included (time, user, host, build path). In order to make the kernel build reproducible I plan to remove these by default, and add a src.conf knob to enable them for developers who want them in their own builds. The user-facing effect of this is that the kern.version sysctl no longer conveys this information, and uname -a changes from something like: FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r288681: Mon Oct 5 01:40:11 UTC 2015 peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 to something like: FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 r288174+7644546(stable-10) amd64 The current version of the change is available for review at https://reviews.freebsd.org/D4347. [1] See https://reproducible-builds.org/ for more information on the reproducible builds project. 
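To illustrate the effect on the version string, here is a small Python sketch. It is not the build's actual version-generation logic: the function `kernel_version` and its parameters are hypothetical, and the layout simply mirrors the two example strings above.

```python
def kernel_version(ostype, release, build, revision, metadata=None):
    """Compose a version string, appending build metadata only on request."""
    base = f"{ostype} {release} #{build} {revision}"
    if metadata is None:
        return base  # reproducible form: nothing host- or time-dependent
    return (
        f"{base}: {metadata['date']} "
        f"{metadata['user']}@{metadata['host']}:{metadata['path']}"
    )


if __name__ == "__main__":
    meta = {
        "date": "Mon Oct 5 01:40:11 UTC 2015",
        "user": "peter",
        "host": "build-11.freebsd.org",
        "path": "/usr/obj/usr/src/sys/CLUSTER11",
    }
    print(kernel_version("FreeBSD", "11.0-CURRENT", 0, "r288681", meta))
    print(kernel_version("FreeBSD", "10.2-STABLE", 44, "r288174+7644546(stable-10)"))
```

With `metadata` left out, two builds of the same revision on different hosts, at different times and in different directories, yield the same string.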
From owner-freebsd-arch@freebsd.org Wed Dec 2 17:44:05 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1484DA3F25C for ; Wed, 2 Dec 2015 17:44:05 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 06D2B151F for ; Wed, 2 Dec 2015 17:44:04 +0000 (UTC) (envelope-from bright@mu.org) Received: from AlfredMacbookAir.local (unknown [IPv6:2601:645:8004:7515:1be:bcf0:a62d:9358]) by elvis.mu.org (Postfix) with ESMTPSA id 7CFB1345A920 for ; Wed, 2 Dec 2015 09:44:03 -0800 (PST) Subject: Re: Removing build metadata, for reproducible kernel builds To: freebsd-arch@freebsd.org References: From: Alfred Perlstein Message-ID: <565F2DEE.9070204@mu.org> Date: Wed, 2 Dec 2015 09:44:14 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 17:44:05 -0000 On 12/2/15 9:36 AM, Ed Maste wrote: > The main issue currently preventing kernel builds from being > reproducible[1] is the build metadata itself that's included (time, > user, host, build path). In order to make the kernel build > reproducible I plan to remove these by default, and add a src.conf > knob to enable them for developers who want them in their own builds. 
> > The user-facing effect of this is that the kern.version sysctl no > longer conveys this information, and uname -a changes from something > like: > > FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 > r288681: Mon Oct 5 01:40:11 UTC 2015 > peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 > > to something like: > > FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 > r288174+7644546(stable-10) amd64 > > The current version of the change is available for review at > https://reviews.freebsd.org/D4347. > > [1] See https://reproducible-builds.org/ for more information on the > reproducible builds project. Can it not be done as a kernel module (containing the strings/numbers) or injected after the fact by editing the binaries? This info is very useful. -Alfred From owner-freebsd-arch@freebsd.org Wed Dec 2 20:04:00 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 506D4A3F0B8 for ; Wed, 2 Dec 2015 20:04:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2E882187E; Wed, 2 Dec 2015 20:04:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C4197B979; Wed, 2 Dec 2015 15:03:58 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Cc: Ed Maste Subject: Re: Removing build metadata, for reproducible kernel builds Date: Wed, 02 Dec 2015 12:03:07 -0800 Message-ID: <1920964.NJpSim6qZF@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 
7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 02 Dec 2015 15:03:58 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 20:04:00 -0000 On Wednesday, December 02, 2015 05:36:52 PM Ed Maste wrote: > The main issue currently preventing kernel builds from being > reproducible[1] is the build metadata itself that's included (time, > user, host, build path). In order to make the kernel build > reproducible I plan to remove these by default, and add a src.conf > knob to enable them for developers who want them in their own builds. > > The user-facing effect of this is that the kern.version sysctl no > longer conveys this information, and uname -a changes from something > like: > > FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 > r288681: Mon Oct 5 01:40:11 UTC 2015 > peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 > > to something like: > > FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 > r288174+7644546(stable-10) amd64 > > The current version of the change is available for review at > https://reviews.freebsd.org/D4347. > > [1] See https://reproducible-builds.org/ for more information on the > reproducible builds project. As I noted in the review, this will break kgdb -n (and possibly crashinfo, less certain about that). Keeping the path (which should not vary if you build out of the same tree) will be sufficient to let kgdb -n still work (though it may need some changes to recognize both formats). Keeping the path also means that 'uname -a' still tells you which kernel config you are running (I assume you aren't changing 'uname -i', but 'uname -a' doesn't include 'uname -i'). 
-- John Baldwin From owner-freebsd-arch@freebsd.org Wed Dec 2 20:16:07 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BBFA2A3F2F0 for ; Wed, 2 Dec 2015 20:16:07 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from erouter6.ore.mailhop.org (erouter6.ore.mailhop.org [54.187.213.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9364F1D24 for ; Wed, 2 Dec 2015 20:16:07 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from ilsoft.org (unknown [73.34.117.227]) by outbound3.ore.mailhop.org (Halon Mail Gateway) with ESMTPSA; Wed, 2 Dec 2015 20:13:55 +0000 (UTC) Received: from rev (rev [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id tB2KEvYd017492; Wed, 2 Dec 2015 13:14:57 -0700 (MST) (envelope-from ian@freebsd.org) Message-ID: <1449087297.1262.82.camel@freebsd.org> Subject: Re: Removing build metadata, for reproducible kernel builds From: Ian Lepore To: John Baldwin , freebsd-arch@freebsd.org Cc: Ed Maste Date: Wed, 02 Dec 2015 13:14:57 -0700 In-Reply-To: <1920964.NJpSim6qZF@ralph.baldwin.cx> References: <1920964.NJpSim6qZF@ralph.baldwin.cx> Content-Type: text/plain; charset="us-ascii" X-Mailer: Evolution 3.16.5 FreeBSD GNOME Team Port Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 20:16:07 -0000 On Wed, 2015-12-02 at 12:03 -0800, John Baldwin wrote: > On Wednesday, December 02, 2015 05:36:52 PM Ed Maste wrote: > > The main issue currently preventing kernel builds from being > > reproducible[1] is the build metadata itself that's included (time, 
> > user, host, build path). In order to make the kernel build > > reproducible I plan to remove these by default, and add a src.conf > > knob to enable them for developers who want them in their own > > builds. > > > > The user-facing effect of this is that the kern.version sysctl no > > longer conveys this information, and uname -a changes from > > something > > like: > > > > FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT > > #0 > > r288681: Mon Oct 5 01:40:11 UTC 2015 > > peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 > > > > to something like: > > > > FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 > > r288174+7644546(stable-10) amd64 > > > > The current version of the change is available for review at > > https://reviews.freebsd.org/D4347. > > > > [1] See https://reproducible-builds.org/ for more information on > > the > > reproducible builds project. > > As I noted in the review, this will break kgdb -n (and possibly > crashinfo, > less certain about that). Keeping the path (which should not vary if > you > build out of the same tree) will be sufficient to let kgdb -n still > work > (though it may need some changes to recognize both formats). > > Keeping the path also means that 'uname -a' still tells you which > kernel > config you are running (I assume you aren't changing 'uname -i', but > 'uname -a' doesn't include 'uname -i'). > But in the kinds of venues where reproducible builds are most important, such as creating images that are part of commercial products, the build path is one of the things most likely to change between builds and least likely to be significant in terms of any differences to the contents of the build. Likewise the hostname of the build machine, which it appears is still in the uname output.
-- Ian From owner-freebsd-arch@freebsd.org Wed Dec 2 21:53:46 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4158EA3F8F9 for ; Wed, 2 Dec 2015 21:53:46 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 53F4F195C; Wed, 2 Dec 2015 21:53:44 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA01812; Wed, 02 Dec 2015 23:53:35 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1a4FLD-0001Hy-Ic; Wed, 02 Dec 2015 23:53:35 +0200 Subject: Re: Removing build metadata, for reproducible kernel builds To: Ed Maste , "freebsd-arch@freebsd.org" References: From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <565F6827.6000203@FreeBSD.org> Date: Wed, 2 Dec 2015 23:52:39 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 21:53:46 -0000 On 02/12/2015 19:36, Ed Maste wrote: > The main issue currently preventing kernel builds from being > reproducible[1] is the build metadata itself that's included (time, > user, host, build path). 
In order to make the kernel build > reproducible I plan to remove these by default, and add a src.conf > knob to enable them for developers who want them in their own builds. > > The user-facing effect of this is that the kern.version sysctl no > longer conveys this information, and uname -a changes from something > like: > > FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 > r288681: Mon Oct 5 01:40:11 UTC 2015 > peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 > > to something like: > > FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 > r288174+7644546(stable-10) amd64 > > The current version of the change is available for review at > https://reviews.freebsd.org/D4347. > > [1] See https://reproducible-builds.org/ for more information on the > reproducible builds project. Personally, I would prefer that, at least initially, KERNEL_METADATA is "yes" by default. My thinking is that people who really need reproducible builds would have no trouble toggling the knob and the rest would have the traditional behavior. 
-- Andriy Gapon From owner-freebsd-arch@freebsd.org Thu Dec 3 05:29:14 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7D19A3F48F for ; Thu, 3 Dec 2015 05:29:14 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from mail-pa0-x231.google.com (mail-pa0-x231.google.com [IPv6:2607:f8b0:400e:c03::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 993861E03 for ; Thu, 3 Dec 2015 05:29:14 +0000 (UTC) (envelope-from tim@kientzle.com) Received: by pacdm15 with SMTP id dm15so60947838pac.3 for ; Wed, 02 Dec 2015 21:29:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kientzle-com.20150623.gappssmtp.com; s=20150623; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=6PO+lvsmRRE43j3v4+g8ZvIY/RXAWQ99HixzBqduB1E=; b=YLfVMutiXNz0DwWTll3KXP0jpErJZL8qXuGZGVJeaOpks+Y0oSYXkiX0exw/gOapG1 BGOVnXjtZMytH+0Vfb6yAbFTRJX/V6ULc/2LGdXUFKryxu+t8xBpW9iDl40LbmLW+gOl nCgdDhLH1wxof7MXHhvLLc2RcJFXod+9hxL56S8vfWn50BHbvQ2aCsmsEGwPn8j9/bzx IlSMtAniwrrG3Nr0vBeq/mSyC7qvYViK3LAXGj7v4iwXvC2ISAS11xo2NWkDlvVMuep8 3IvfWqiTvJFHqiz8bG9dLw3nQRD8/C5Ek9ZW3nW10UHRrvA6x2g0fyUFFnT85834zcc/ tkhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=6PO+lvsmRRE43j3v4+g8ZvIY/RXAWQ99HixzBqduB1E=; b=b7itin4Nu14hG7y4o4ZkoVb+r3LE6xo3jQnYHP59w/MK0xISkL8M5/n7l/bUAyrVVK odlNb7Moj1CjdHpGndosl1bPTxitNrboUiTTvwuKJEcPB4RxgLVLNQkgGl4sYOAruYK8 IILLSaUVJDoiTQ7ZfwDE1i6fwwX/bbcY7klMmax5bJDfZQFrn06XUjj6AZaofK3IiFaQ 
3hYoZ2SP9Aaal/di1KqoHpYwjq2rIlEudv8dSw5whZFJaxGO5BfH5O4DBH8HbRbgHCbJ qHKTmLYp8vLfi8O/XOCmz2XOpEqHWVIrT+GRQEQFGhprj3YMcw2JQcKIWX7lcasoH/U4 dVew== X-Gm-Message-State: ALoCoQn0VUbg4aqipufRtopLo0mCinrbJyqaYZ22/F6iPKy1vjeL7nF07Zl+IcOjWewGcsWNHx6b X-Received: by 10.66.219.228 with SMTP id pr4mr10142815pac.99.1449120554017; Wed, 02 Dec 2015 21:29:14 -0800 (PST) Received: from [192.168.1.102] (c-24-6-102-176.hsd1.ca.comcast.net. [24.6.102.176]) by smtp.gmail.com with ESMTPSA id 79sm7673475pfb.67.2015.12.02.21.29.12 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 02 Dec 2015 21:29:13 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\)) Subject: Re: Removing build metadata, for reproducible kernel builds From: Tim Kientzle In-Reply-To: Date: Wed, 2 Dec 2015 21:29:12 -0800 Cc: "freebsd-arch@freebsd.org" Content-Transfer-Encoding: 7bit Message-Id: References: To: Ed Maste X-Mailer: Apple Mail (2.3096.5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 05:29:15 -0000 > On Dec 2, 2015, at 9:36 AM, Ed Maste wrote: > > The main issue currently preventing kernel builds from being > reproducible[1] is the build metadata itself that's included (time, > user, host, build path). In order to make the kernel build > reproducible I plan to remove these by default, and add a src.conf > knob to enable them for developers who want them in their own builds. 
> > The user-facing effect of this is that the kern.version sysctl no > longer conveys this information, and uname -a changes from something > like: > > FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 > r288681: Mon Oct 5 01:40:11 UTC 2015 > peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64 > > to something like: > > FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44 > r288174+7644546(stable-10) amd64 > > The current version of the change is available for review at > https://reviews.freebsd.org/D4347. > > [1] See https://reproducible-builds.org/ for more information on the > reproducible builds project. How feasible would it be for the various metadata here to be overridable by src.conf? That is, by default, the time, user, host, etc, are taken from the local environment, but src.conf variables can override them to produce more predictable results. Tim From owner-freebsd-arch@freebsd.org Thu Dec 3 05:51:38 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6583EA3F954 for ; Thu, 3 Dec 2015 05:51:38 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-pa0-x229.google.com (mail-pa0-x229.google.com [IPv6:2607:f8b0:400e:c03::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 27471161D for ; Thu, 3 Dec 2015 05:51:38 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: by pabfh17 with SMTP id fh17so63311683pab.0 for ; Wed, 02 Dec 2015 21:51:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :message-id:references:to; bh=PkVXwFQmS7cPgVJ1oZrQcy1gNZmi6VZnd2x4gS9WfBc=; 
b=LItry3g68NmTY3/25zqM7RAhb6RJ7kOf18sjHSE/r9429d334VIe2Y2PF3YlQJfVY1 DCESTeopmB++JgtseaOPbcmldbtLYagSwCaec/rq41k9jyoDAcM2MD7O1Z1DZAOj1F0C cphIkD0ebqUuk361JPEYTXKafuvzI7iirwjYImtdFENy60DOv1U2pOY2Q0ZEpORX9MsQ sXNu1zDvFDtjDM5F0gr6MbllozH1vt4gGWGq388VX/pRf8I+sv9CBdoWef3JvMrQECJy VIRqMJrisz19hoUAHWKQjifBjWaP1qZj2QdUYwgI1zkLOni2ptp/J26HXwo02VBLRpBV +rbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:mime-version:content-type:from :in-reply-to:date:cc:message-id:references:to; bh=PkVXwFQmS7cPgVJ1oZrQcy1gNZmi6VZnd2x4gS9WfBc=; b=AX+QwGP+Flv1QVCkJDFrxVMpty6UwXrBesEc2WjYJXNmRAKJPaMgpR8hMPqDEuTbnt 8FhCIrruuSyqYY/o6STwajVxzU8pkXsLNRjvS3YHf2r2S7k/vH8l9PC9Tkm0wVgAjqMS CO34tZl6/qGIdAOZV/phFr31ubigQavPVZ+X8T9/+z2fwt9dgntFtPLMv6rGLfw8gJ9q veOpsw46D2978oeqyu2aUG0aUGIlr+4i+2Gy9N865mI/8Bza+z68sYCeJwmvsB7xYbnC LfQTg2YML/jgFAGlJRELbYLsLd6BkPENfmFQ59AhsiIjqfOMpCZWrbA3GDzZ/GDfO6Q/ eStA== X-Gm-Message-State: ALoCoQlQQ/vdAMEkpxRnsomJDXirio9pSjWs3BBSP8E3C+0Hegw4/51kfhwcTHX6f1BH8csofxN6 X-Received: by 10.67.4.202 with SMTP id cg10mr10573517pad.81.1449121897616; Wed, 02 Dec 2015 21:51:37 -0800 (PST) Received: from [172.20.8.36] ([12.236.253.2]) by smtp.gmail.com with ESMTPSA id r5sm7828235pfi.73.2015.12.02.21.51.34 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 02 Dec 2015 21:51:36 -0800 (PST) Sender: Warner Losh Subject: Re: Removing build metadata, for reproducible kernel builds Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Content-Type: multipart/signed; boundary="Apple-Mail=_CBC91B2A-21A3-4072-B2FF-E00305341564"; protocol="application/pgp-signature"; micalg=pgp-sha512 X-Pgp-Agent: GPGMail 2.5.2 From: Warner Losh In-Reply-To: Date: Wed, 2 Dec 2015 22:51:29 -0700 Cc: "freebsd-arch@freebsd.org" Message-Id: References: To: Ed Maste X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Dec 2015 05:51:38 -0000

--Apple-Mail=_CBC91B2A-21A3-4072-B2FF-E00305341564
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

> On Dec 2, 2015, at 10:36 AM, Ed Maste wrote:
>
> The main issue currently preventing kernel builds from being
> reproducible[1] is the build metadata itself that's included (time,
> user, host, build path). In order to make the kernel build
> reproducible I plan to remove these by default, and add a src.conf
> knob to enable them for developers who want them in their own builds.
>
> The user-facing effect of this is that the kern.version sysctl no
> longer conveys this information, and uname -a changes from something
> like:
>
> FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
> r288681: Mon Oct 5 01:40:11 UTC 2015
> peter@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64
>
> to something like:
>
> FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
> r288174+7644546(stable-10) amd64
>
> The current version of the change is available for review at
> https://reviews.freebsd.org/D4347.
>
> [1] See https://reproducible-builds.org/ for more information on the
> reproducible builds project.

I noted in the review that I don’t like the default being no.

I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.

I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
And I’d rather see the default be to the historical behavior.
The build number too is kinda lame here, since that’s just a history
of the number of tries. If you are building from svn, it should be
zero. But if you’re rebuilding, you can easily get that number over
100 as you update from rev to rev and reboot. It’s better to have
the date / time of the build so if you are seeing a problem on
a test machine, you’ll know more firmly if the build has that thing
you fixed yesterday afternoon or not by the date / time it was
built, and by whom (since my kernels after 9:15am have the fix,
but nobody else does before 2:00pm since that’s when I checked
it in).

So I see the need for the feature, in general. But this doesn’t
implement a reproducible build due to the build number, the
user, the host and the path still being encoded into it. That
makes the change to remove date / time completely arbitrary
which is annoying because they are useful in many environments
where it would be difficult to force everybody to ‘opt in’ to
having them included. It’s easier to opt-out the release process.
Warner --Apple-Mail=_CBC91B2A-21A3-4072-B2FF-E00305341564 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJWX9hhAAoJEGwc0Sh9sBEAhHQP/28t8+/3E449+eFJJbHN/i11 TfnfoGz2if+e7U8hAYgf/BOgXI6VSeksqhUnAU/Udp6MF0MvEPchCq4o2bMqVE8y czrVUpCvX3rF69s9r3cz3pVOAF8TUpaNTah7hdlSi6RPSCvyB2jt9wC7exFPW0tU wLmxJ9R4mvGYbcH+8GuwqRHLwJ6SWEJkgkeSOiiqLsEBfBrBeJqZmJ5azx9luAom Uafq2OOP4R2A9BzsyX4IlvoEoEcjsZUne1Wo/dG7HqzAuRV6HsOATCtKvs5nRn+r GqHZy7+O8Fg2UEGUElhAU+Y9tVQkPoPXAoM0zD0VvQ0kQ57MFOpYKwumsgFHrhOJ RuluS8uq0i2Dfghxh9a29zy5QqfKrxi+GAiHnKb1rXwsVWvzYSxu+sK4p2UgohZW +cXSNlwjI5D0ieWeY/NAv3cekLJfsHM/9Gt8x+skvOsnuDuOKfkA5Dw64XXhSJ+m yYlmbvNqj7Z+3QaSaQ0j+3LJgOhEKhTtnudJaxIQ0HvVpzAWNiQH5ykQ90uXMsaj FrjXSIboSgf/brJ68eB5BhPmFa9fBpfocbCT93M9rKA+E5InVCdc/co1ymhHF55o CImCCQ3jIlsM6yh+WXLz25LC6GWh57lYbL7fGPIT7zlzO5ebi9t6can40BO/E16M J64HyAK/hxQx7ayAiWdG =n6OE -----END PGP SIGNATURE----- --Apple-Mail=_CBC91B2A-21A3-4072-B2FF-E00305341564-- From owner-freebsd-arch@freebsd.org Thu Dec 3 07:55:25 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26193A3F2D0 for ; Thu, 3 Dec 2015 07:55:25 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: from mail-io0-x229.google.com (mail-io0-x229.google.com [IPv6:2607:f8b0:4001:c06::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E66D51C21 for ; Thu, 3 Dec 2015 07:55:24 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: by ioir85 with SMTP id r85so72884391ioi.1 for ; Wed, 02 Dec 2015 23:55:24 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=85/ACHSYAOkcYzt/Kh4kprumTncuc8Pwon83qKPVa9c=; b=ZkgtyTXjDm/rwRK6Ks0EYwrYiKY43t8wlghKI3JPNR7bcnlsqQ0u6X1gkD6+Pw4yQh ZzTngcCcZm2LVHLzMAdK3L3r+wbnE0aSks0iD1x0+fd0yoKRxtevHj5QBia41QWad3Ya 8qHpn/yus3tt0Mcw2Q1CFhhMyW7prjAuV47kLNLdaxmSXAec04yRmBxNUdrQWv0u+9Jx ihB/Tuvskme41JJiXgtD1dcHRS7C0dLHUxdI1FjZEwJhTsnS/Ve71TvfAV7uV1W/Y8yN KlqoudbG11jBTM/ml9TMPVdgfUmlA214QHPfVinT0ROCyGVRbY44wrC6FMo+XG3dL83f JCLw== X-Received: by 10.107.30.80 with SMTP id e77mr7676765ioe.180.1449129324170; Wed, 02 Dec 2015 23:55:24 -0800 (PST) MIME-Version: 1.0 Sender: carpeddiem@gmail.com Received: by 10.107.169.85 with HTTP; Wed, 2 Dec 2015 23:55:04 -0800 (PST) In-Reply-To: References: From: Ed Maste Date: Thu, 3 Dec 2015 07:55:04 +0000 X-Google-Sender-Auth: qEfMSEAxMWPlUWNbB8LtOIzTY54 Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds To: Warner Losh Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 07:55:25 -0000 On 3 December 2015 at 05:51, Warner Losh wrote: > > I noted in the review that I don=E2=80=99t like the default being no. > > I also don=E2=80=99t like that we=E2=80=99re growing lots of different kn= obs that need > to be set to get a repeatable build. Let=E2=80=99s have one, or barring t= hat, > let=E2=80=99s have one that sets all the sub-knobs. My hope is that we'll have a reproducible build by default, and that *no* knobs need to be set. That's what I intend with my patch. 
I can rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's generally desired. If there's a consensus to default to including the metadata I'm fine with setting it in make release. > I think that host and path are more worthless than date and time > in many environments. Who builds it likewise. Those are all things > that are likely to change between builds, yet change the kernel > image. I=E2=80=99d rather see it all gone when this option is in effect. I don't follow -- other than the build iteration number (which I indeed missed), it is all gone. From owner-freebsd-arch@freebsd.org Thu Dec 3 08:07:17 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 685C2A3F77F for ; Thu, 3 Dec 2015 08:07:17 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: from mail-pa0-x235.google.com (mail-pa0-x235.google.com [IPv6:2607:f8b0:400e:c03::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3A1331360; Thu, 3 Dec 2015 08:07:17 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: by pacdm15 with SMTP id dm15so64142209pac.3; Thu, 03 Dec 2015 00:07:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=SecHbEKAc3UE9XikYPiYzaP91dDG+QF2MvRjnimUDwQ=; b=WTP62j25nGQAD9DvsAIN3WYz+Pzw8Y7vzMNyW5mTozJueVia9ZFMIvLbELjIxuYJYV hkhMezZV9FY0zde3jmCCBrf7yeDudKEDrEHomXHr69959s3VHPb7FcfSzNB/3lH4ySLG AFujO68XO+NxUB/HyKTfmuIKu/w0rN9cFla4MKYVqJ7Naj0YvwmeXx7CQPrTSdhhbHcO 6Os4itBi6Lwe+CjhLleETx1HvwrPKppUlDw9eQsXMzDdCJLYpSWzRIuKCKevIhDgY+NW XXPq3JwfoSMgJoZJn5AYgVZ10WGOSextRsE2+pmPIwRFaRWs807zzvad+E8DOdDqglPu syGA== X-Received: 
by 10.98.64.136 with SMTP id f8mr11227671pfd.95.1449130036852;
 Thu, 03 Dec 2015 00:07:16 -0800 (PST)
Received: from [192.168.20.7] (c-24-16-212-205.hsd1.wa.comcast.net. [24.16.212.205])
 by smtp.gmail.com with ESMTPSA id r79sm8801399pfa.61.2015.12.03.00.07.15
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 03 Dec 2015 00:07:15 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
Subject: Re: Removing build metadata, for reproducible kernel builds
From: NGie Cooper
In-Reply-To:
Date: Thu, 3 Dec 2015 00:07:14 -0800
Cc: Warner Losh , "freebsd-arch@freebsd.org"
Content-Transfer-Encoding: quoted-printable
Message-Id: <4D787F21-4607-44F0-9CA8-CB2323DD72AA@gmail.com>
References:
To: Ed Maste
X-Mailer: Apple Mail (2.2104)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Dec 2015 08:07:17 -0000

> On Dec 2, 2015, at 23:55, Ed Maste wrote:
>
> On 3 December 2015 at 05:51, Warner Losh wrote:
>>
>> I noted in the review that I don’t like the default being no.
>>
>> I also don’t like that we’re growing lots of different knobs that need
>> to be set to get a repeatable build. Let’s have one, or barring that,
>> let’s have one that sets all the sub-knobs.
>
> My hope is that we'll have a reproducible build by default, and that
> *no* knobs need to be set. That's what I intend with my patch. I can
> rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
> generally desired. If there's a consensus to default to including the
> metadata I'm fine with setting it in make release.
>
>> I think that host and path are more worthless than date and time
>> in many environments. Who builds it likewise. Those are all things
>> that are likely to change between builds, yet change the kernel
>> image.
>> I’d rather see it all gone when this option is in effect.
>
> I don't follow -- other than the build iteration number (which I
> indeed missed), it is all gone.

I personally like being able to debug when user A builds on machine X vs
user B on machine Y, because it's helped me find issues with people's
build environments in the past where I could have ended up pulling teeth.

I think the single src.conf knob approach is wrong though. Why not
document how to do it in build(7) and tweak newvers.sh (which drives
this to begin with) to do this? That would generalize the solution,
accomplish this goal, and help $work as well, because right now we
($work) hack newvers.sh in order to change the version information to
brand the product appropriately, instead of building upon existing
infrastructure, as the existing infrastructure is neither flexible nor
documented and is very static.

Thanks,
-NGie

From owner-freebsd-arch@freebsd.org Thu Dec 3 09:29:22 2015
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39C84A3CCDD
 for ; Thu, 3 Dec 2015 09:29:22 +0000 (UTC)
 (envelope-from erik+lists@cederstrand.dk)
Received: from mailrelay5.public.one.com (mailrelay5.public.one.com [91.198.169.199])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 718A814F3
 for ; Thu, 3 Dec 2015 09:29:21 +0000 (UTC)
 (envelope-from erik+lists@cederstrand.dk)
X-HalOne-Cookie: 6c000cc0dac24553dbd60456d335038214bcaeb8
X-HalOne-ID: 2a548f93-99a0-11e5-be90-b82a72d03b9b
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=cederstrand.dk; s=20140924;
 h=from:subject:date:message-id:to:cc:mime-version:content-type:
 content-transfer-encoding:in-reply-to:references;
bh=NPgf0CYPfvuXpBFThBClrTAqx/+BMzwZDTLQt7sNSQI=;
 b=dPcnfimEwJi/2kISzzIfvRCNtnSx7BQeYs4/bOqZPATTGtX64UGVGKrQJb+1uBqH28Gow84iw4SjJ
 o+fRs+hHJ00odEREcp2UoB1CrvmxX3OugJVCJY/s3ei1dH2ZvgABLC/7UksC7Ufiaynlqq2Hz76G1C
 1mfoi+eTssbI1ye8=
Received: from [192.168.1.76] (unknown [217.157.7.221])
 by smtpfilter2.public.one.com (Halon Mail Gateway) with ESMTPSA;
 Thu, 3 Dec 2015 09:28:08 +0000 (GMT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\))
Subject: Re: Removing build metadata, for reproducible kernel builds
From: Erik Cederstrand
In-Reply-To: <1920964.NJpSim6qZF@ralph.baldwin.cx>
Date: Thu, 3 Dec 2015 10:28:10 +0100
Cc: freebsd-arch@freebsd.org, Ed Maste
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <1920964.NJpSim6qZF@ralph.baldwin.cx>
To: John Baldwin
X-Mailer: Apple Mail (2.3096.5)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Dec 2015 09:29:22 -0000

> On 2 Dec 2015, at 21:03, John Baldwin wrote:
>
> As I noted in the review, this will break kgdb -n (and possibly crashinfo,
> less certain about that). Keeping the path (which should not vary if you
> build out of the same tree) will be sufficient to let kgdb -n still work
> (though it may need some changes to recognize both formats).

Would it be feasible to include the relative build path instead of the
absolute path? I seem to remember patches floating around for the
__FILE__ macro, but I don't know if (k)gdb can work with relative paths.
Erik= From owner-freebsd-arch@freebsd.org Thu Dec 3 09:58:34 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7775DA3E2D1 for ; Thu, 3 Dec 2015 09:58:34 +0000 (UTC) (envelope-from uebayasi@gmail.com) Received: from mail-io0-x22d.google.com (mail-io0-x22d.google.com [IPv6:2607:f8b0:4001:c06::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 468321318; Thu, 3 Dec 2015 09:58:34 +0000 (UTC) (envelope-from uebayasi@gmail.com) Received: by ioir85 with SMTP id r85so75546245ioi.1; Thu, 03 Dec 2015 01:58:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=APiD3O6uTQKr5fN7F5fK4BKYx84f5e5Cip3+HcVNy8A=; b=qNxvsfy3+ip8zKGVEJY+ZgE/ZlvKS1DAeLLXRnVwtyKPMNBPhQis57SV1NLVlwOaRq pc1ko+bODSPYGOcoEa/idrooTeDqPW5Kxpviw9TquSaAtyAjPeDsEs1IDuZKLGSyj701 1OP/vJLfreZ5mYpVC+eZCriohH7OYRBTx7fpbQDFGSAYPHNjt+5mO1EDYnL+xrhK6SVI o1P5EXRBTHPixtk+3mH/asHrFOkC2hryn11TN4UYK3VJ/QPebg1TbWiPxYzI42OyrKZ4 5n/u7u5a3L1aa0vRtGiyGlgtC+FZzroJIsBMBHadlkBXs1Zxe09pRuTiTIWemoq+wN7i a5xw== MIME-Version: 1.0 X-Received: by 10.107.132.11 with SMTP id g11mr9502752iod.56.1449136713017; Thu, 03 Dec 2015 01:58:33 -0800 (PST) Received: by 10.64.18.80 with HTTP; Thu, 3 Dec 2015 01:58:32 -0800 (PST) In-Reply-To: <71D3DCA2-B336-4849-88E3-8412F8A93324@kientzle.com> References: <0A51B6D4-9EDD-4EFF-876F-C6B515DBB4F3@kientzle.com> <71D3DCA2-B336-4849-88E3-8412F8A93324@kientzle.com> Date: Thu, 3 Dec 2015 18:58:32 +0900 Message-ID: Subject: Re: mtree "language" enhancements From: Masao Uebayashi To: Tim Kientzle Cc: Warner Losh , "Simon J. 
Gerraty" , Michal Ratajsky , Brooks Davis ,
 "freebsd-arch@freebsd.org"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Dec 2015 09:58:34 -0000

On Tue, Dec 1, 2015 at 11:31 AM, Tim Kientzle wrote:
>
>> On Nov 29, 2015, at 9:49 PM, Warner Losh wrote:
>>
>> On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle wrote:
>>
>>>
>>>> On Nov 29, 2015, at 2:49 PM, Tim Kientzle wrote:
>>>>
>>>> Simon also asked:
>>>>> Indeed I'd really like the ability to provide default uid/gid
>>>>> for the case that a uname/gname cannot be looked up.
>>>>
>>>> I think 'tar' got this right: If uname and uid are both specified, then
>>> look up uname and if that fails, use the specified uid. Ditto for
>>> gname/gid. In particular, this lets a single specification be used to
>>> rebuild a tree on another system with different UIDs or on a system that
>>> does not (yet) have a full password file. An option could be provided for
>>> the (rare) case that someone really wants to prefer UIDs to unames.
>>>
>>> On further reflection, preferring UIDs to unames would actually be pretty
>>> common here.
>>>
>>> In particular, NanoBSD (and Crochet and other similar tools) should prefer
>>> the UID when building images instead of looking up unames against the build
>>> host's password file.
>>
>>
>> I've implemented what we've talked about, except this. When doing the
>> makefs, we should use the /etc/master_password that's inside the image in
>> preference to either of these alternatives. That's the most correct thing
>> to do: use as much of the data as you can, as late as you can.
>>
>> The thing I'm struggling with now is why would both be present? Would that
>> indicate an error? Or someone changing the defaults?
>> And if they are
>> changing the defaults, why use a uid in preference to a uname? Is this to
>> avoid contamination? To set something not in the password file, or just
>> comfort level of the user? FreeBSD will write unames for install*.
>>
>> So I'm left thinking that maybe the rule should be 'last one wins' at least
>> for the use case where we use the target's /etc/master_password. That's
>> what I've actually implemented.
>
> There are two key cases that drove this design for tar:
>
> 1. Handling user info that is not (yet) in the target password file. In
> practice, images get built up in different orders: I might add a bunch of
> new files owned by a new user before some other process gets a chance to
> add the user.

When you say "image", you surely mean "file-system image". A file-system
image contains on-disk data (inodes), which hold numeric UIDs/GIDs
rather than symbolic names (uname/gname).

When you decide to create an image, you have a whole tree
(directories/files) that ends up in the generated file-system image.
That means that when you create an image, you must know all the
files and UIDs/GIDs put there. If not, what you are creating should
not be an image.

If you don't know the UIDs/GIDs, can't you just create a tar archive, and
extract it when you really create an image later?

I don't really want mtree(1) to be unnecessarily smart and make
unnecessary decisions. I want it to be simple and deterministic.

> 2. Restoring info when the target has different user numbering than the
> host. (Or when the user isn’t in the host password file at all.)
>
> For #1, you need the UID since the uname can’t be looked up anywhere.
> For #2, you must have the uname since the UID would be wrong. An image
> that can work in either scenario needs to have both.
>
> For NanoBSD, you may be able to enforce that users are always present in
> the target password file before any data owned by those users is added to
> the image.
> So it may be reasonable to just rely on uname everywhere for now.
>
> Tim
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

From owner-freebsd-arch@freebsd.org Thu Dec 3 16:57:48 2015
Return-Path:
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 83197A3F5E0
 for ; Thu, 3 Dec 2015 16:57:48 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5F0DF12C6;
 Thu, 3 Dec 2015 16:57:48 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 790C5B986;
 Thu, 3 Dec 2015 11:57:47 -0500 (EST)
From: John Baldwin
To: Erik Cederstrand
Cc: freebsd-arch@freebsd.org, Ed Maste
Subject: Re: Removing build metadata, for reproducible kernel builds
Date: Thu, 03 Dec 2015 08:51:29 -0800
Message-ID: <1758086.qmdp4H277Z@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; )
In-Reply-To:
References: <1920964.NJpSim6qZF@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 03 Dec 2015 11:57:47 -0500 (EST)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Dec 2015 16:57:48 -0000

On
Thursday, December 03, 2015 10:28:10 AM Erik Cederstrand wrote:
>
> > On 2 Dec 2015, at 21:03, John Baldwin wrote:
> >
> > As I noted in the review, this will break kgdb -n (and possibly crashinfo,
> > less certain about that). Keeping the path (which should not vary if you
> > build out of the same tree) will be sufficient to let kgdb -n still work
> > (though it may need some changes to recognize both formats).
>
> Would it be feasible to include the relative build path instead of the
> absolute path? I seem to remember patches floating around for the
> __FILE__ macro, but I don't know if (k)gdb can work with relative paths.

This is what kgdb -n does:

	/*
	 * No kernel image here. Parse the dump header. The kernel object
	 * directory can be found there and we probably have the kernel
	 * image still in it. The object directory may also have a kernel
	 * with debugging info (called kernel.debug). If we have a debug
	 * kernel, use it.
	 */
	snprintf(path, sizeof(path), "%s/info.%d", crashdir, nr);
	info = fopen(path, "r");
	if (info == NULL) {
		warn("%s", path);
		return;
	}
	while (fgets(path, sizeof(path), info) != NULL) {
		l = strlen(path);
		if (l > 0 && path[l - 1] == '\n')
			path[--l] = '\0';
		if (strncmp(path, "    ", 4) == 0) {
			s = strchr(path, ':');
			s = (s == NULL) ? path + 4 : s + 1;
			l = snprintf(path, sizeof(path), "%s/kernel.debug", s);
			if (stat(path, &st) == -1 || !S_ISREG(st.st_mode)) {
				path[l - 6] = '\0';
				if (stat(path, &st) == -1 ||
				    !S_ISREG(st.st_mode))
					break;
			}
			kernel = strdup(path);
			break;
		}
	}
	fclose(info);

It basically pulls the path from the 'version' string in the
/var/crash/info.X line, appends 'kernel.debug' to it and sees if there
is a file with that pathname. If so, it uses it. This means it doesn't
find a kernel in some /boot/foo, it looks in the build directory.

crashinfo instead finds all the 'kernel' files under /boot, extracts the
version string using gdb from each kernel, and does a string compare with
the version string in info.X.
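The two lookups described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the kgdb or crashinfo source: build_dir_from_version_line() and count_version_matches() are hypothetical helper names, and any sample strings fed to them are made up. The first helper mirrors the "take what follows the colon and append kernel.debug" step; the second shows why a plain string compare cannot pick a kernel once two candidates carry the same version string.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one version line from info.X (for example
 * "    user@host:/usr/obj/usr/src/sys/GENERIC"), write the path of the
 * debug kernel expected in that build directory into 'out'.
 * Returns 0 on success, -1 if the line has no ':' or 'out' is too small.
 */
static int
build_dir_from_version_line(const char *line, char *out, size_t outlen)
{
	const char *colon = strchr(line, ':');
	int n;

	if (colon == NULL)
		return (-1);
	/* Source and destination are separate buffers, so no overlap. */
	n = snprintf(out, outlen, "%s/kernel.debug", colon + 1);
	if (n < 0 || (size_t)n >= outlen)
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: count how many candidate kernels carry exactly
 * the version string recorded in the dump. A result greater than 1
 * means a string compare alone cannot identify the right kernel.
 */
static size_t
count_version_matches(const char *dump_version,
    const char *const *kernel_versions, size_t nkernels)
{
	size_t i, matches = 0;

	for (i = 0; i < nkernels; i++) {
		if (strcmp(dump_version, kernel_versions[i]) == 0)
			matches++;
	}
	return (matches);
}
```

With version strings that no longer contain the build path, two kernels built from the same revision with different configs would both match, and count_version_matches() would return 2.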
For this reason, crashinfo will still work if each string is unique. However, with the proposal, kernels built with different kernel configs from the same tree would have the same version string, thus being indistinguishable. A more robust solution than the string compares would be build-id, but that requires a newer linker which we don't have. -- John Baldwin From owner-freebsd-arch@freebsd.org Thu Dec 3 19:53:14 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 37E89A4076E for ; Thu, 3 Dec 2015 19:53:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qg0-x231.google.com (mail-qg0-x231.google.com [IPv6:2607:f8b0:400d:c04::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DCAC91302 for ; Thu, 3 Dec 2015 19:53:13 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by qgeb1 with SMTP id b1so70741315qge.1 for ; Thu, 03 Dec 2015 11:53:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=2yGgsK0xacoBajJU0rwl6zKhrhZfywqOM9uEa4as70c=; b=VhY157tpUTs3vlSqawyBD9S5x0IZ5iwNYlVwaKAbe6P+MNCSjCpcjoEBxRLiJIldyw 6t/zgIybyjwSe1wZ0U7o1IcoJZ+KSg9Sfmp6tz4+MvqphBFecID+QPF8irOznJgaHJPh vHSPN6Ttclll/q4hJzpo/nJRpu4URTy/RbzgUgi+ix6FGChY7r5J6e2xGBIqDAVCfqqJ LLs1mirP270Z1flAyCk5RooD6gaIa/rj1HoRIp6elO2xfX40fonzpf5xTRQVOrgOkLMT d+sUMl2xJZsZFNEb7SRY+u9ksUC4nFar33rpC5ECOLPS1sovzWpwmdD3qBQngSBE21Xl yIWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; 
bh=2yGgsK0xacoBajJU0rwl6zKhrhZfywqOM9uEa4as70c=; b=MPue92uE2hftGb8T7vuHQ1aP5vzOX+IZu041HJYR1ToLxLlCf81GEuqpNeFat7mEWf 1jVDeLEIVeK9OtjdErmCoKMAN6FclKI2O5TyUy0c7+3Pbbkw7R/b84OlA+Uc6HijGKcA VlNQtY1tqlmKiulSrqAgOcMGUD62brQefEl0XrVxN0V4ZWUXPSi1zj/d0IXkMWOyiYFs V44Lw9v43CBdXsUZcSz7dZcgX8uwIehKllcCMG+g9r0K4UIpj0J0sKIK/kyHM3i+9KQb xYnlHLLufQqD6oXhitkE8hce4j5Azbspq9ytDRTxHziRKKCKvYvVn66DdGiW1FVRw2A1 BDFw== X-Gm-Message-State: ALoCoQlpPEDdIsp7u9HW0Ucmz13zg+f1U+nL7OtZtRmlo+j41D/xWmfRCpeuy9GOxcytd6rNLho7 MIME-Version: 1.0 X-Received: by 10.140.99.86 with SMTP id p80mr13259615qge.97.1449172393008; Thu, 03 Dec 2015 11:53:13 -0800 (PST) Sender: wlosh@bsdimp.com Received: by 10.140.27.181 with HTTP; Thu, 3 Dec 2015 11:53:12 -0800 (PST) X-Originating-IP: [192.55.54.58] In-Reply-To: References: Date: Thu, 3 Dec 2015 12:53:12 -0700 X-Google-Sender-Auth: aNhThc4og4RZsS3FxZNxED3AQtc Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds From: Warner Losh To: Ed Maste Cc: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 19:53:14 -0000 On Thu, Dec 3, 2015 at 12:55 AM, Ed Maste wrote: > On 3 December 2015 at 05:51, Warner Losh wrote: > > > > I noted in the review that I don=E2=80=99t like the default being no. > > > > I also don=E2=80=99t like that we=E2=80=99re growing lots of different = knobs that need > > to be set to get a repeatable build. Let=E2=80=99s have one, or barring= that, > > let=E2=80=99s have one that sets all the sub-knobs. > > My hope is that we'll have a reproducible build by default, and that > *no* knobs need to be set. That's what I intend with my patch. 
I can > rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's > generally desired. If there's a consensus to default to including the > metadata I'm fine with setting it in make release. I think this is an unwise decision in the current form suggested. The kernel metadata has saved my butt enough times I really don't want to see it go by default. But see below for a reasonable (imho) middle ground that would be a good default. > > I think that host and path are more worthless than date and time > > in many environments. Who builds it likewise. Those are all things > > that are likely to change between builds, yet change the kernel > > image. I’d rather see it all gone when this option is in effect. > > I don't follow -- other than the build iteration number (which I > indeed missed), it is all gone. > Yea I was reading things backwards. In the review, I suggested that if you've modified the tree (which the SCM will tell you), then do the old format to preserve useful metadata that's really really needed and if not to use the shorter version. When you've modified the tree, reproducible builds aren't a concern at all.
Warner From owner-freebsd-arch@freebsd.org Thu Dec 3 21:15:28 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CFFE8A4095D for ; Thu, 3 Dec 2015 21:15:28 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from pmta2.delivery6.ore.mailhop.org (pmta2.delivery6.ore.mailhop.org [54.200.129.228]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AC7B51241 for ; Thu, 3 Dec 2015 21:15:28 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from ilsoft.org (unknown [73.34.117.227]) by outbound2.ore.mailhop.org (Halon Mail Gateway) with ESMTPSA; Thu, 3 Dec 2015 21:16:02 +0000 (UTC) Received: from rev (rev [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id tB3LFPQW019897; Thu, 3 Dec 2015 14:15:25 -0700 (MST) (envelope-from ian@freebsd.org) Message-ID: <1449177325.6214.14.camel@freebsd.org> Subject: Re: Removing build metadata, for reproducible kernel builds From: Ian Lepore To: Warner Losh , Ed Maste Cc: "freebsd-arch@freebsd.org" Date: Thu, 03 Dec 2015 14:15:25 -0700 In-Reply-To: References: Content-Type: text/plain; charset="iso-8859-7" X-Mailer: Evolution 3.16.5 FreeBSD GNOME Team Port Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:15:28 -0000 On Thu, 2015-12-03 at 12:53 -0700, Warner Losh wrote: > On Thu, Dec 3, 2015 at 12:55 AM, Ed Maste wrote: > > > On 3 December 2015 at 05:51, Warner Losh wrote: > > > > > > I noted in the review that I don’t like the default being no.
> > > > > > I also don’t like that we’re growing lots of different knobs that need > > > to be set to get a repeatable build. Let’s have one, or barring that, > > > let’s have one that sets all the sub-knobs. > > > > My hope is that we'll have a reproducible build by default, and that > > *no* knobs need to be set. That's what I intend with my patch. I can > > rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's > > generally desired. If there's a consensus to default to including the > > metadata I'm fine with setting it in make release. > > > I think this is an unwise decision in the current form suggested. The kernel > metadata has saved my butt enough times I really don't want to see it > go by default. But see below for a reasonable (imho) middle ground that > would be a good default. > I'm curious why anyone wants this enabled by default, like... are we missing something? Does it improve freebsd-update behavior maybe? If it's just for some general "reproducibility is good" philosophy then I would counter with "information is even better, so don't throw it away without a good reason." Reproducibility is good for some people, and completely useless for others, and the people who need it aren't going to mind turning on a knob or two to get what they want. > > > > I think that host and path are more worthless than date and time > > > in many environments. Who builds it likewise. Those are all things > > > that are likely to change between builds, yet change the kernel > > > image. I’d rather see it all gone when this option is in effect. > > > > I don't follow -- other than the build iteration number (which I > > indeed missed), it is all gone. > > > > Yea I was reading things backwards. > > In the review, I suggested that if you've modified the tree (which the SCM > will tell you), then do the old format to preserve useful metadata that's > really really needed and if not to use the shorter version.
When you've > modified the tree, reproducible builds aren't a concern at all. > How are you going to determine what constitutes a modified tree? What you think of as modifications may be what I call my baseline version. -- Ian From owner-freebsd-arch@freebsd.org Thu Dec 3 21:35:13 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 435B8A40D8E for ; Thu, 3 Dec 2015 21:35:13 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: from mail-oi0-x236.google.com (mail-oi0-x236.google.com [IPv6:2607:f8b0:4003:c06::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EB68D14D6; Thu, 3 Dec 2015 21:35:12 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: by oige206 with SMTP id e206so57918097oig.2; Thu, 03 Dec 2015 13:35:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=nj8CNBTElQ28vL0hdPgrWzSJMhtF413RFTLGl5YyIxs=; b=G+cVPnn8xHbnkJsw6dBAmQj7C8HJM/eBBM9s21zGtyeyPpGVUgK9svloLQ9xM2+ys/ 3AYVAB9Vt3Hqi0R8ReW/9Xw+8kVhBS5eQ1S2qXSmUUReS9HuvVztl420GqCTBu8ygZIP XdxbLJpJgko7zeVirlW0arn0NgYJfqLNCWPW/C8yvcgKlwxSL7uwOMNplzmL4VpAYkfj VPwXCBSCN9OVronxcwkeScSUiqeCB3DS33Sszr/RtWGpuEBuiLzB+6yoVDXh9/H7vuUc JnERcwK1Au6bdiNIaaxR63E7wN8Xb8XjfhYuQKOrHDn8518opIt4ymFV3xxVz850ILec B2yQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alumni-cwru-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=nj8CNBTElQ28vL0hdPgrWzSJMhtF413RFTLGl5YyIxs=; b=YywU7wYhfP7N2E14HzJHJrE1X9JZTlAqYKwEFj9E1efDlhdTfAX3GTp265Ti647pyF
vvsHxMJ0IUsnWHeEeVKr3dvKCanj6nEA581sJMvvb/T2PHfPHjKuhe/JpoK0O7E2+NTp kZ57QCu+ffJDyAVxuNmx09U/4wpCKrSjU4Nap8p7LblLhdveLe/UEJycl7zBXUKJ1cw+ e9H7U1XTS3XUXBMybz68veJ59/jYfKeO9p51EBO6GruQ+1wwRgV/wuXLddKeewpiKyDH qoTkIxsGV5L6fdQBjz+MvFPxJJgMUsdFTlpELFcEkxljKesVHn19HHSjEHmuggiaF/bF o4Hw== MIME-Version: 1.0 X-Received: by 10.202.201.67 with SMTP id z64mr9709071oif.24.1449178512222; Thu, 03 Dec 2015 13:35:12 -0800 (PST) Sender: chmeeedalf@gmail.com Received: by 10.182.210.195 with HTTP; Thu, 3 Dec 2015 13:35:12 -0800 (PST) In-Reply-To: <1449177325.6214.14.camel@freebsd.org> References: <1449177325.6214.14.camel@freebsd.org> Date: Thu, 3 Dec 2015 15:35:12 -0600 X-Google-Sender-Auth: 79E3E62BokBUMmA0Poxq2m0_LU8 Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds From: Justin Hibbits To: Ian Lepore Cc: Warner Losh , Ed Maste , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:35:13 -0000 On Thu, Dec 3, 2015 at 3:15 PM, Ian Lepore wrote: > On Thu, 2015-12-03 at 12:53 -0700, Warner Losh wrote: >> On Thu, Dec 3, 2015 at 12:55 AM, Ed Maste wrote: >> >> > On 3 December 2015 at 05:51, Warner Losh wrote: >> > > >> > > I noted in the review that I don’t like the default being no. >> > > >> > > I also don’t like that we’re growing lots of different knobs that need >> > > to be set to get a repeatable build. Let’s have one, or barring that, >> > > let’s have one that sets all the sub-knobs. >> > >> > My hope is that we'll have a reproducible build by default, and that >> > *no* knobs need to be set. That's what I intend with my patch.
I can >> > rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's >> > generally desired. If there's a consensus to default to including the >> > metadata I'm fine with setting it in make release. >> >> >> I think this an unwise decision in the current form suggested. The kerne= l >> metadata has saved my butt enough times I really don't want to see it >> go by default. But see below for a reasonable (imho) middle ground that >> would be a good default. >> > > I'm curious why anyone wants this enabled by default, like... are we > missing something? Does it improve freebsd-update behavior maybe? > > If it's just for some general "reproducibility is good" philosophy then > I would counter with "information is even better, so don't throw it > away without a good reason." > > Reproducibility is good for some people, and completely useless for > others, and the people who need it aren't going to mind turning on a > knob or two to get what they want. > >> >> > > I think that host and path are more worthless than date and time >> > > in many environments. Who builds it likewise. Those are all things >> > > that are likely to change between builds, yet change the kernel >> > > image. I=E2=80=99d rather see it all gone when this option is in eff= ect. >> > >> > I don't follow -- other than the build iteration number (which I >> > indeed missed), it is all gone. >> > >> >> Yea I was reading things backwards. >> >> In the review, I suggested that if you've modified the tree (which the S= CM >> will tell you), then do the old format to preserve useful metadata that'= s >> really really needed and if not to use the shorter version. When you've >> modified the tree, reproducible builds aren't a concern at all. >> > > How are you going to determine what consitutes a modified tree? What > you think of as modifications may be what I call my baseline version. > > -- Ian svnversion resulting in a 'nnnnnnM'? 
- Justin From owner-freebsd-arch@freebsd.org Thu Dec 3 21:42:13 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D65F3A40EAA for ; Thu, 3 Dec 2015 21:42:13 +0000 (UTC) (envelope-from jonathan@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id C1DC2196D; Thu, 3 Dec 2015 21:42:13 +0000 (UTC) (envelope-from jonathan@FreeBSD.org) Received: from [192.168.42.104] (localhost [IPv6:::1]) by freefall.freebsd.org (Postfix) with ESMTP id 24E9D11EA; Thu, 3 Dec 2015 21:42:13 +0000 (UTC) (envelope-from jonathan@FreeBSD.org) From: "Jonathan Anderson" To: "Ian Lepore" Cc: "Warner Losh" , "Ed Maste" , "freebsd-arch@freebsd.org" Subject: Re: Removing build metadata, for reproducible kernel builds Date: Thu, 03 Dec 2015 18:11:27 -0330 Message-ID: In-Reply-To: <1449177325.6214.14.camel@freebsd.org> References: <1449177325.6214.14.camel@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Mailer: MailMate (1.9.3r5187) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:42:14 -0000 On 3 Dec 2015, at 17:45, Ian Lepore wrote: > I'm curious why anyone wants this enabled by default, like... are we > missing something? Does it improve freebsd-update behavior maybe? There is value in being able to reproduce the things you run, especially if you download them from somebody else (like releases or binary packages). It's not a panacea (see "Reflections on Trusting Trust"), but it’s helpful, even if you don't always do the reproduction work. 
The very fact that someone *can* check a binary release for naughtiness is a strong incentive for many adversaries not to try their hand. > If it's just for some general "reproducibility is good" philosophy > then > I would counter with "information is even better, so don't throw it > away without a good reason." When you're building your own stuff, sure, it might help to know that this is the kernel you built on "this machine" at "that time". When running 10.2-RELEASE-p7, however, it’s not very useful to know that it was built on amd64-builder.daemonology.net, or that the source tree was located at /usr/src. It *might* be useful to know that {set of people} all got kernels that hash to {some bit pattern} when they reproduced the build (like Certificate Transparency). Or, more interestingly, that {people using some configuration} got a different result. Again, like Certificate Transparency. :) > Reproducibility is good for some people, and completely useless for > others, and the people who need it aren't going to mind turning on a > knob or two to get what they want. Possibly. I don't have any strong opinions on whether the default is "reproducible" or "full of information that helps me identify busted kernels", just so long as "reproducible" is available and easy to turn on. And my personal opinion is that it should be turned on for public releases: I think that being able to validate the kernel is more important than knowing what machine it was built on. >> Yea I was reading things backwards. >> >> In the review, I suggested that if you've modified the tree (which >> the SCM >> will tell you), then do the old format to preserve useful metadata >> that's >> really really needed and if not to use the shorter version. When >> you've >> modified the tree, reproducible builds aren't a concern at all. >> > > How are you going to determine what constitutes a modified tree? What > you think of as modifications may be what I call my baseline version.
Since we host our code in Subversion and have an official Git mirror, how about svn status || git status? If you're basing your code off of anything other than an official mirror, you get to deal with the reproducibility problem yourself, but it sounds like many people in this camp would prefer the more verbose version string anyway. Jon -- Jonathan Anderson jonathan@FreeBSD.org From owner-freebsd-arch@freebsd.org Thu Dec 3 21:45:18 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68D8AA40FFA for ; Thu, 3 Dec 2015 21:45:18 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from outbound1b.ore.mailhop.org (outbound1b.ore.mailhop.org [54.200.247.200]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 387B01BAD for ; Thu, 3 Dec 2015 21:45:17 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from ilsoft.org (unknown [73.34.117.227]) by outbound1.ore.mailhop.org (Halon Mail Gateway) with ESMTPSA; Thu, 3 Dec 2015 21:45:41 +0000 (UTC) Received: from rev (rev [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id tB3Lj9sS019950; Thu, 3 Dec 2015 14:45:09 -0700 (MST) (envelope-from ian@freebsd.org) Message-ID: <1449179109.6214.19.camel@freebsd.org> Subject: Re: Removing build metadata, for reproducible kernel builds From: Ian Lepore To: Justin Hibbits Cc: Warner Losh , Ed Maste , "freebsd-arch@freebsd.org" Date: Thu, 03 Dec 2015 14:45:09 -0700 In-Reply-To: References: <1449177325.6214.14.camel@freebsd.org> Content-Type: text/plain; charset="iso-8859-7" X-Mailer: Evolution 3.16.5 FreeBSD GNOME Team Port Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:45:18 -0000 On Thu, 2015-12-03 at 15:35 -0600, Justin Hibbits wrote: > On Thu, Dec 3, 2015 at 3:15 PM, Ian Lepore wrote: > > On Thu, 2015-12-03 at 12:53 -0700, Warner Losh wrote: > > > On Thu, Dec 3, 2015 at 12:55 AM, Ed Maste > > > wrote: > > > > > > > On 3 December 2015 at 05:51, Warner Losh > > > > wrote: > > > > > > > > > > I noted in the review that I don’t like the default being no. > > > > > > > > > > I also don’t like that we’re growing lots of different knobs > > > > > that need > > > > > to be set to get a repeatable build. Let’s have one, or > > > > > barring that, > > > > > let’s have one that sets all the sub-knobs. > > > > > > > > My hope is that we'll have a reproducible build by default, and > > > > that > > > > *no* knobs need to be set. That's what I intend with my patch. > > > > I can > > > > rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if > > > > that's > > > > generally desired. If there's a consensus to default to > > > > including the > > > > metadata I'm fine with setting it in make release. > > > > > > > > > I think this is an unwise decision in the current form suggested. > > > The kernel > > > metadata has saved my butt enough times I really don't want to > > > see it > > > go by default. But see below for a reasonable (imho) middle > > > ground that > > > would be a good default. > > > > > > > I'm curious why anyone wants this enabled by default, like... are > > we > > missing something? Does it improve freebsd-update behavior maybe? > > > > If it's just for some general "reproducibility is good" philosophy > > then > > I would counter with "information is even better, so don't throw it > > away without a good reason." > > > > Reproducibility is good for some people, and completely useless for > > others, and the people who need it aren't going to mind turning on > > a > > knob or two to get what they want.
> > > > > > > > > > I think that host and path are more worthless than date and > > > > > time > > > > > in many environments. Who builds it likewise. Those are all > > > > > things > > > > > that are likely to change between builds, yet change the > > > > > kernel > > > > > image. I’d rather see it all gone when this option is in > > > > > effect. > > > > > > > > I don't follow -- other than the build iteration number (which > > > > I > > > > indeed missed), it is all gone. > > > > > > > > > > Yea I was reading things backwards. > > > > > > In the review, I suggested that if you've modified the tree > > > (which the SCM > > > will tell you), then do the old format to preserve useful > > > metadata that's > > > really really needed and if not to use the shorter version. When > > > you've > > > modified the tree, reproducible builds aren't a concern at all. > > > > > > > How are you going to determine what constitutes a modified tree? > > What > > you think of as modifications may be what I call my baseline > > version. > > > > -- Ian > > svnversion resulting in a 'nnnnnnM'? > > - Justin > svnversion isn't going to be able to return anything useful inside one of my build sandboxes in which there is no hint of svn anything.
-- Ian From owner-freebsd-arch@freebsd.org Thu Dec 3 21:49:54 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5C25A4006B for ; Thu, 3 Dec 2015 21:49:54 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com [IPv6:2607:f8b0:4001:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 864E41CE6; Thu, 3 Dec 2015 21:49:54 +0000 (UTC) (envelope-from carpeddiem@gmail.com) Received: by igvg19 with SMTP id g19so23041046igv.1; Thu, 03 Dec 2015 13:49:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=yWX4mgvJp7E9d7lUMkHrKOSLZ6wLcxU2oTQ2E7LCxuE=; b=VWlHpSRsXuz3ho9NBvGoScZzDrLFoeZvCstSycincsUNpRijf8alhCQaNpSypBNTv0 5Kf++iGJ9zWhlc8jrYPAX1LYa5P+aDJC6lCMYs/MYba5EnAMKcG1wc8h7lAgYlkyEDcI JSWzVy6WBZwvY1FoDBtRtW2bMMbjmqhbv47jK0gKwmwRb2fN3KD3FF5sxvp1WNeVgEg3 lwGoISawOVwUVkfLthwkFZ2hwy25DfFIG7iMFUb1uIAIwIabwvUtEHcoYrIvLYMjDJ3n xkMWAQp4lkz3wRU8KIfJijS4wEZaaRRVt5nq62YURoekkWsBhFMLA6BZ1QQKXjFWRGIN leRA== X-Received: by 10.50.43.234 with SMTP id z10mr955553igl.58.1449179393866; Thu, 03 Dec 2015 13:49:53 -0800 (PST) MIME-Version: 1.0 Sender: carpeddiem@gmail.com Received: by 10.107.169.85 with HTTP; Thu, 3 Dec 2015 13:49:34 -0800 (PST) In-Reply-To: References: <1449177325.6214.14.camel@freebsd.org> From: Ed Maste Date: Thu, 3 Dec 2015 21:49:34 +0000 X-Google-Sender-Auth: PFKC6SPood_iew-pf4vhPudvSRA Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds To: Justin Hibbits Cc: Ian Lepore , Warner Losh , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:49:54 -0000 On 3 December 2015 at 21:35, Justin Hibbits wrote: > > svnversion resulting in a 'nnnnnnM'? Warner suggested this in the review also, and it might be a good way to choose a default. In any case it's clear that there's strong (and reasonable) objection to enabling this by default for all builds, so I'll not commit the change as-is. I believe there are three separate issues here: 1) It should be possible to build the kernel reproducibly. I hope this isn't contentious. 2) Control over enabling reproducible builds -- build knob or no, default to on/off, based on svnversion including 'M', forced on for release builds, etc. 3) Some tools rely on the current format / data, and will need to be fixed. I expect to make a change so that a reproducible build is possible, but not introduce a new knob or change anything by default. After that I'll work on the issues in #3 and once that's done we can start the bikeshed about whether there should be a knob, what the default should be etc. Thanks all for the feedback. 
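[Editorial sketch, not part of the original message.] The control options listed under issue 2 above (an explicit knob, a default, the svnversion 'M' check, and forcing reproducible output for release builds) can be combined in more than one way. The Python below is only a minimal illustration of one possible precedence order. The function names, the parameters, and the ordering itself are assumptions made for this example; none of it is taken from Ed's patch or from the FreeBSD build scripts. It relies on svnversion appending 'M' to the revision for a locally modified working copy, which is the 'nnnnnnM' form mentioned earlier in the thread.

```python
import re


def tree_is_modified(svnversion_output):
    """Interpret svnversion output (hypothetical helper).

    Returns True or False when the output looks like a revision,
    e.g. "291732", "291732M" or "291700:291732MS", and None when
    it does not, e.g. in a sandbox with no svn metadata at all.
    """
    text = svnversion_output.strip()
    match = re.fullmatch(r"\d+(?::\d+)?([MSP]*)", text)
    if match is None:
        return None
    # A trailing 'M' marks a working copy with local modifications.
    return "M" in match.group(1)


def use_reproducible_format(svnversion_output, knob=None,
                            release_build=False, default=True):
    """Decide whether to emit the short, reproducible version string.

    Assumed precedence, for illustration only:
    explicit knob, then release builds, then the svnversion check,
    then the configured default when svnversion cannot tell.
    """
    if knob is not None:
        return knob
    if release_build:
        return True
    modified = tree_is_modified(svnversion_output)
    if modified is None:
        return default
    # A modified tree keeps the old, more informative format.
    return not modified
```

Putting the explicit setting first means a builder whose tree is not managed by svn can still force either format, whatever the heuristic would have chosen.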
From owner-freebsd-arch@freebsd.org Thu Dec 3 21:59:29 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A5867A401C4 for ; Thu, 3 Dec 2015 21:59:29 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from outbound1b.ore.mailhop.org (outbound1b.ore.mailhop.org [54.200.247.200]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7BC4011CB for ; Thu, 3 Dec 2015 21:59:29 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from ilsoft.org (unknown [73.34.117.227]) by outbound1.ore.mailhop.org (Halon Mail Gateway) with ESMTPSA; Thu, 3 Dec 2015 21:59:58 +0000 (UTC) Received: from rev (rev [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id tB3LxQRc019983; Thu, 3 Dec 2015 14:59:26 -0700 (MST) (envelope-from ian@freebsd.org) Message-ID: <1449179966.6214.29.camel@freebsd.org> Subject: Re: Removing build metadata, for reproducible kernel builds From: Ian Lepore To: Jonathan Anderson Cc: Warner Losh , Ed Maste , "freebsd-arch@freebsd.org" Date: Thu, 03 Dec 2015 14:59:26 -0700 In-Reply-To: References: <1449177325.6214.14.camel@freebsd.org> Content-Type: text/plain; charset="iso-8859-13" X-Mailer: Evolution 3.16.5 FreeBSD GNOME Team Port Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 21:59:29 -0000 On Thu, 2015-12-03 at 18:11 -0330, Jonathan Anderson wrote: > On 3 Dec 2015, at 17:45, Ian Lepore wrote: > > I'm curious why anyone wants this enabled by default, like... are > > we > > missing something? Does it improve freebsd-update behavior maybe? 
> > There is value in being able to reproduce the things you run, > especially > if you download them from somebody else (like releases or binary > packages). It's not a panacea (see "Reflections on Trusting Trust"), > but > it’s helpful, even if you don't always do the reproduction work. The > very fact that someone *can* check a binary release for naughtiness > is a > strong incentive for many adversaries not to try their hand. > > > > If it's just for some general "reproducibility is good" philosophy > > then > > I would counter with "information is even better, so don't throw it > > away without a good reason." > > When you're building your own stuff, sure, it might help to know that > this is the kernel you built on "this machine" at "that time". When > running 10.2-RELEASE-p7, however, it’s not very useful to know that > it > was built on amd64-builder.daemonology.net, or that the source tree > was > located at /usr/src. It *might* be useful to know that {set of > people} > all got kernels that hash to {some bit pattern} when they reproduced > the > build (like Certificate Transparency). Or, more interestingly, that > {people using some configuration} got a different result. Again, like > Certificate Transparency. :) > > > > Reproducibility is good for some people, and completely useless for > > others, and the people who need it aren't going to mind turning on > > a > > knob or two to get what they want. > > Possibly. I don't have any strong opinions on whether the default is > "reproducible" or "full of information that helps me identify busted > kernels", just so long as "reproducible" is available and easy to > turn > on. And my personal opinion is that it should be turned on for public > releases: I think that being able to validate the kernel is more > important than knowing what machine it was built on. > > > > > Yea I was reading things backwards.
> > > > > > In the review, I suggested that if you've modified the tree > > > (which > > > the SCM > > > will tell you), then do the old format to preserve useful > > > metadata > > > that's > > > really really needed and if not to use the shorter version. When > > > you've > > > modified the tree, reproducible builds aren't a concern at all. > > > > > > > How are you going to determine what constitutes a modified tree? > > What > > you think of as modifications may be what I call my baseline > > version. > > Since we host our code in Subversion and have an official Git mirror, > how about svn status || git status? If you're basing your code off of > anything other than an official mirror, you get to deal with the > reproducibility problem yourself, but it sounds like many people in > this > camp would prefer the more verbose version string anyway. > By "we" you must mean "The FreeBSD Project" but surely you also realize that the universe of freebsd users is much larger than just the project, and not all of them use subversion or git to check out freebsd and/or manage their local copies of it. For a company building products based on freebsd, reproducibility is important, but they're quite likely to be using something other than subversion or git to manage the source. They're also quite likely to have local modifications that they consider to be part of their baseline even if they appear to be modifications from the project's repo at the same svn revision number. Either way, these folks are going to want to set some control that enforces reproducibility regardless of any build system heuristics about what to default to. For other companies or end users the important factor might be the ability to reproduce an official release, which one presumes would start with checking out the official sources using one of the official SCMs and then a whole other set of "what constitutes a modification" would apply.
As someone who works for one of those "not-svn, not-git" companies I just want to make sure there's a "do what I say" knob that overrides any attempts to be smart about detecting modifications. -- Ian From owner-freebsd-arch@freebsd.org Thu Dec 3 22:04:14 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 74A26A402FF for ; Thu, 3 Dec 2015 22:04:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qg0-x230.google.com (mail-qg0-x230.google.com [IPv6:2607:f8b0:400d:c04::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 19ADF155A for ; Thu, 3 Dec 2015 22:04:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by qgcc31 with SMTP id c31so73029464qgc.3 for ; Thu, 03 Dec 2015 14:04:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=ro7MzqBDqjoZNZ5SqyOw0Yg+vJ5lPck7HdnrVbyKgmQ=; b=pEIjUhAlQdvI8HeABTt/1SWGqQQm+1aAWgSP2mWvc5rCzzaDBRyH06nG7EAVUgVRwP YmqGO2vXnsy3+v3l+mciyJ0LTX+fV02tGaoeTPYwaWKgKGT0ODUFfYssh1NYcEEl/+U/ Nk9F7NLKA9MWr0NMmeKnYy6rlt/jiWGQRUXiDHEV/1+9dSk+5mJQP5WpYUbJqNSAwv9k RPD4+SlRBdd1Q5ARTPFlp64x5TLbxo2GpIDEpR0VT4tw01PtqBDzBfublq59H+kaNW1u Qmy8hmrwjUtSo4q/FWFUWgS8BofvKPdo4XMJ7DHOARRwHFob9xggw7Yzj9axszREj4qI mNgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=ro7MzqBDqjoZNZ5SqyOw0Yg+vJ5lPck7HdnrVbyKgmQ=; b=BCHWC4oozvuQLmHNkXMNihNVAEigrwm6hI176hNFlT4BTZZRUuDKL6Q65d/bag/GDd 
7UM0OfqkMJKL2ZeSOX9Q3zGKRGm23n3OdGiPgseZBk54QgcexknfdCbt7jicLZytWXT9 JTda3QwYHI6ggaOVaMR7bAsQZ/OH2DNmkXQWet0BfVU/7wn0CzrlBAX+7F4vWBUCMxxn CzDkH+8brvIkcWmWG1AuWk2k17QU4gEMybcRU1imvkMWLd3N1clVSMgUJDTANMxe1ywQ sq+fhURvAsebQGqmckTKMYMxL87QPlu2UeZZh1JYFt1DzbSeT6kMYIysPrWP3AAfnL1V b8Bg== X-Gm-Message-State: ALoCoQnnYtw5LvOZxqVGRv97ngWnV9Zkxy8u6yx1gQda30t0vVj74cN1gaK2eNC3rzG+DB0pvq4C MIME-Version: 1.0 X-Received: by 10.140.141.138 with SMTP id 132mr15018273qhn.74.1449180253172; Thu, 03 Dec 2015 14:04:13 -0800 (PST) Sender: wlosh@bsdimp.com Received: by 10.140.27.181 with HTTP; Thu, 3 Dec 2015 14:04:13 -0800 (PST) X-Originating-IP: [192.55.54.58] In-Reply-To: <1449179109.6214.19.camel@freebsd.org> References: <1449177325.6214.14.camel@freebsd.org> <1449179109.6214.19.camel@freebsd.org> Date: Thu, 3 Dec 2015 15:04:13 -0700 X-Google-Sender-Auth: owxJ0heKNfijTW2hTmgp1ez7WGU Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds From: Warner Losh To: Ian Lepore Cc: Justin Hibbits , Ed Maste , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 22:04:14 -0000 On Thu, Dec 3, 2015 at 2:45 PM, Ian Lepore wrote: > On Thu, 2015-12-03 at 15:35 -0600, Justin Hibbits wrote: > > On Thu, Dec 3, 2015 at 3:15 PM, Ian Lepore wrote: > > > On Thu, 2015-12-03 at 12:53 -0700, Warner Losh wrote: > > > > On Thu, Dec 3, 2015 at 12:55 AM, Ed Maste > > > > wrote: > > > > > > > > > On 3 December 2015 at 05:51, Warner Losh > > > > > wrote: > > > > > > > > > > > > I noted in the review that I don’t like the default being no.
> > > > > > > > > > > > I also don=E2=80=99t like that we=E2=80=99re growing lots of di= fferent knobs > > > > > > that need > > > > > > to be set to get a repeatable build. Let=E2=80=99s have one, or > > > > > > barring that, > > > > > > let=E2=80=99s have one that sets all the sub-knobs. > > > > > > > > > > My hope is that we'll have a reproducible build by default, and > > > > > that > > > > > *no* knobs need to be set. That's what I intend with my patch. > > > > > I can > > > > > rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if > > > > > that's > > > > > generally desired. If there's a consensus to default to > > > > > including the > > > > > metadata I'm fine with setting it in make release. > > > > > > > > > > > > I think this an unwise decision in the current form suggested. > > > > The kernel > > > > metadata has saved my butt enough times I really don't want to > > > > see it > > > > go by default. But see below for a reasonable (imho) middle > > > > ground that > > > > would be a good default. > > > > > > > > > > I'm curious why anyone wants this enabled by default, like... are > > > we > > > missing something? Does it improve freebsd-update behavior maybe? > > > > > > If it's just for some general "reproducibility is good" philosophy > > > then > > > I would counter with "information is even better, so don't throw it > > > away without a good reason." > > > > > > Reproducibility is good for some people, and completely useless for > > > others, and the people who need it aren't going to mind turning on > > > a > > > knob or two to get what they want. > > > > > > > > > > > > > I think that host and path are more worthless than date and > > > > > > time > > > > > > in many environments. Who builds it likewise. Those are all > > > > > > things > > > > > > that are likely to change between builds, yet change the > > > > > > kernel > > > > > > image. I=E2=80=99d rather see it all gone when this option is i= n > > > > > > effect. 
> > > > > > > > > > I don't follow -- other than the build iteration number (which > > > > > I > > > > > indeed missed), it is all gone. > > > > > > > > > > > > > Yea I was reading things backwards. > > > > > > > > In the review, I suggested that if you've modified the tree > > > > (which the SCM > > > > will tell you), then do the old format to preserve useful > > > > metadata that's > > > > really really needed and if not to use the shorter version. When > > > > you've > > > > modified the tree, reproducible builds aren't a concern at all. > > > > > > > > > > How are you going to determine what consitutes a modified tree? > > > What > > > you think of as modifications may be what I call my baseline > > > version. > > > > > > -- Ian > > > > svnversion resulting in a 'nnnnnnM'? > > > > - Justin > > > > svnversion isn't going to be able to return anything useful inside one > of my build sandboxes in which there is no hint of svn anything. > Then, in my proposal, you'd get the 'reproducible' format. We already don't include the SVN info in this case. Perhaps this isn't desirable for you, but it's my proposal and my suggestion and I'd welcome comments on it. 
Warner

From owner-freebsd-arch@freebsd.org Fri Dec 4 01:15:23 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67787A3FEC1 for ; Fri, 4 Dec 2015 01:15:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 449501D39; Fri, 4 Dec 2015 01:15:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 5602AB93C; Thu, 3 Dec 2015 20:15:21 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Cc: Jonathan Anderson , Ian Lepore , Ed Maste Subject: Re: Removing build metadata, for reproducible kernel builds Date: Thu, 03 Dec 2015 17:14:54 -0800 Message-ID: <5836833.XOCYrAR3QT@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: References: <1449177325.6214.14.camel@freebsd.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 03 Dec 2015 20:15:21 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Dec 2015 01:15:23 -0000

On Thursday, December 03, 2015 06:11:27 PM Jonathan Anderson wrote:
> > Reproducibility is good for some people, and completely useless for
> > others, and the people who need it aren't going to mind turning on a
> > knob or two to get what they want.
>
> Possibly. I don't have any strong opinions on whether the default is
> "reproducible" or "full of information that helps me identify busted
> kernels", just so long as "reproducible" is available and easy to turn
> on. And my personal opinion is that it should be turned on for public
> releases: I think that being able to validate the kernel is more
> important than knowing what machine it was built on.

FYI, I think most folks agree that releases should be reproducible (and in particular the release bits that are shipped). I think the primary question people have raised is what the default behavior is if someone is building a kernel themselves vs a kernel from an ISO or freebsd-update.

Secondly, the whole kgdb/crashinfo thing does sort of matter if we want users to have usable crash summaries when reporting bugs on release installs. (crashinfo matters more here than kgdb -n's hackish thing, and crashinfo just needs 'version' to be unique)

--
John Baldwin

From owner-freebsd-arch@freebsd.org Fri Dec 4 02:00:19 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 17C87A369AE for ; Fri, 4 Dec 2015 02:00:19 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: from mail-lf0-x22d.google.com (mail-lf0-x22d.google.com [IPv6:2a00:1450:4010:c07::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B00E312D8; Fri, 4 Dec 2015 02:00:18 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: by lfs39 with SMTP id 39so99383287lfs.3; Thu, 03 Dec 2015 18:00:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; 
bh=zh31cq5kR+gZldZU9VHURNsP9Mmd3OSyYaJj2nJwX/g=; b=Ajb1vqiQqO8CfoK/SJBPtXvFHBLXNminMRPA10+DECO23MwK2L9DSqGtk3w8x2xxA7 PqHUNJ6AGLvwS51AgkuCQCJhJEtMwHnJc4oy/h3pUKtmVE9mKWM/ipZOgXx8uYDwKbu/ 4Werp18U0fbIojFQa2hwtjgxFUC/aBA4hsdDxdTYdQa54V7f9sEnRh2sv2rHw7r04cCi AlPdYbIG7x9LGBF6QbxHyTnMofp3K59UuYpxG8AmYNCOJf+gtWzSXF/41txocVSHV/5O iGwyO1FWsTyNueWCnaicRrGmsHjDQRyqoRW7Ncw2gm1eJ/lY4TCscZgHLIPRooqL3DJI 3rsA== MIME-Version: 1.0 X-Received: by 10.25.126.5 with SMTP id z5mr7157957lfc.112.1449194416608; Thu, 03 Dec 2015 18:00:16 -0800 (PST) Received: by 10.112.219.9 with HTTP; Thu, 3 Dec 2015 18:00:16 -0800 (PST) In-Reply-To: References: Date: Thu, 3 Dec 2015 18:00:16 -0800 Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds From: NGie Cooper To: Ed Maste Cc: Warner Losh , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Dec 2015 02:00:19 -0000

On Dec 2, 2015, at 23:55, Ed Maste wrote:

On 3 December 2015 at 05:51, Warner Losh wrote:

I noted in the review that I don’t like the default being no.

I also don’t like that we’re growing lots of different knobs that need to be set to get a repeatable build. Let’s have one, or barring that, let’s have one that sets all the sub-knobs.

My hope is that we'll have a reproducible build by default, and that *no* knobs need to be set. That's what I intend with my patch. I can rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's generally desired. If there's a consensus to default to including the metadata I'm fine with setting it in make release.

I think that host and path are more worthless than date and time in many environments. Who builds it likewise. Those are all things that are likely to change between builds, yet change the kernel image. I’d rather see it all gone when this option is in effect.

I don't follow -- other than the build iteration number (which I indeed missed), it is all gone.

I personally like being able to debug when user A builds on machine X vs user B on machine Y -- because it's helped me find issues with people’s build environments in the past where I could have ended up pulling teeth.

I think the single-knob src.conf knob approach is wrong though. Why not document how to do it with build(7) and tweak newvers.sh to do this (which drives this to begin with)? That would generalize the solution, accomplish this goal, and help $work accomplish this goal, because right now we ($work) hack newvers.sh in order to change the version information to brand the product appropriately, instead of building upon existing infrastructure, as the existing infrastructure is not flexible/documented/static.
Thanks, -NGie From owner-freebsd-arch@freebsd.org Fri Dec 4 02:01:42 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1B4B3A36ACB for ; Fri, 4 Dec 2015 02:01:42 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: from mail-lf0-x22f.google.com (mail-lf0-x22f.google.com [IPv6:2a00:1450:4010:c07::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 99E5F1472 for ; Fri, 4 Dec 2015 02:01:41 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: by lfdl133 with SMTP id l133so102324373lfd.2 for ; Thu, 03 Dec 2015 18:01:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=USmldg8LqVwjQfdvuSs625eDg6mD87Y7EtzbJzsHpY4=; b=a1T0HkoFC+rRq0c+3Q7Ho750s5ivZblqzGsM52zfZ0BS77yRncRE8NrNYlPVHLTcb+ Jvm2IyBZDXOR375s9EAJVeajFsQPzEP1XWSbzZRkw/ShkUil4YVPxp8fwIu/dWo+FFWh OjViA3vVVoFCtHYor+EnS43/FT37T2DvfVhAmoO5kbhZoRiWJx2Ka4ZWzRkeNwIJNvEh QnuUbK9aZRJ+DdLiIVP0b7Zz1RdRn2WXy0EmoRl6rz/I1XDhkJRoFcMJ0JvSqTbwHUMX jXUDauxiiLCfn/AFd8hLDeo3gp2QQe8e/dsrmZ6T++KmPBkpeGS8UyoFRMsfh93FPElM jAsQ== MIME-Version: 1.0 X-Received: by 10.25.218.9 with SMTP id r9mr7109533lfg.138.1449194499418; Thu, 03 Dec 2015 18:01:39 -0800 (PST) Received: by 10.112.219.9 with HTTP; Thu, 3 Dec 2015 18:01:39 -0800 (PST) In-Reply-To: References: Date: Thu, 3 Dec 2015 18:01:39 -0800 Message-ID: Subject: Re: Removing build metadata, for reproducible kernel builds From: NGie Cooper To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Dec 2015 02:01:42 -0000

On Thu, Dec 3, 2015 at 6:00 PM, NGie Cooper wrote: ...

Sorry. Sent the same email twice by accident >_>.

From owner-freebsd-arch@freebsd.org Fri Dec 4 14:59:02 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8B626A40822 for ; Fri, 4 Dec 2015 14:59:02 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay04.ispgateway.de (smtprelay04.ispgateway.de [80.67.31.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4D6A91271 for ; Fri, 4 Dec 2015 14:59:02 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from [78.35.147.1] (helo=fabiankeil.de) by smtprelay04.ispgateway.de with esmtpsa (TLSv1.2:AES128-GCM-SHA256:128) (Exim 4.84) (envelope-from ) id 1a4raT-0005qy-4d for freebsd-arch@freebsd.org; Fri, 04 Dec 2015 15:43:53 +0100 Date: Fri, 4 Dec 2015 15:43:08 +0100 From: Fabian Keil To: Subject: Re: Removing build metadata, for reproducible kernel builds Message-ID: <20151204154308.296841c8@fabiankeil.de> In-Reply-To: References: Reply-To: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/u.WM4vM9v0wpCohIlO2Zh9C"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Dec 2015 14:59:02 -0000 --Sig_/u.WM4vM9v0wpCohIlO2Zh9C Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Ed Maste wrote:

> The main issue currently preventing kernel builds from being
> reproducible[1] is the build metadata itself that's included (time,
> user, host, build path). In order to make the kernel build
> reproducible I plan to remove these by default, and add a src.conf
> knob to enable them for developers who want them in their own builds.

To make the ElectroBSD build (kernel, world and release) reproducible the time, user and host can be overwritten.

To make this more convenient the user can do this through a shell script (/usr/src/reproduce.sh) which reads the values from a small config file (/usr/src/reproduce.conf) which is included in the src.txz.

Example content:

| BUILD=ElectroBSD-r291706-29246dc
| EPOCH=1449163375

Currently the build path can't be changed between builds, mainly because I expect most users to reproduce the build using a jail in which case this limitation doesn't seem to matter.

The relevant patches (minus the ones I overlooked) are now available at:
https://www.fabiankeil.de/sourcecode/electrobsd/reproducible-build-goo-r291706-29246dc.diff

Due to the auto-untainting (also done by reproduce.sh) this is not expected to build with vanilla FreeBSD, but if that code is disabled it might work.

If anyone with a freebsd.org address and an OpenPGP key is interested in the whole ElectroBSD patchset (which contains security fixes that were (mostly) sent to freebsd-so@ months ago but have not been addressed yet) I'll provide it upon request.

> The user-facing effect of this is that the kern.version sysctl no
> longer conveys this information, and uname -a changes from something
> like:

Allowing the values to be overwritten avoids this problem.

Fabian --Sig_/u.WM4vM9v0wpCohIlO2Zh9C Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlZhpnwACgkQBYqIVf93VJ2i0QCgkIEHsXVgFpYINxMm4rCVheAc zNcAoMjtd1GB8U2o5RozG6ojdSIJwirQ =YqNY -----END PGP SIGNATURE----- --Sig_/u.WM4vM9v0wpCohIlO2Zh9C-- From owner-freebsd-arch@freebsd.org Sat Dec 5 05:29:46 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C8F1A41423 for ; Sat, 5 Dec 2015 05:29:46 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 812841249 for ; Sat, 5 Dec 2015 05:29:46 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 7FC8AA41421; Sat, 5 Dec 2015 05:29:46 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 654B8A41420 for ; Sat, 5 Dec 2015 05:29:46 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 9823E1246; Sat, 5 Dec 2015 05:29:44 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.15.2/8.15.2) with ESMTPS id tB55TeY3052801 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 5 Dec 2015 08:29:41 +0300 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.15.2/8.15.2/Submit) id tB55Teh4052800; 
Sat, 5 Dec 2015 08:29:40 +0300 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Sat, 5 Dec 2015 08:29:40 +0300 From: Gleb Smirnoff To: jeff@FreeBSD.org, alc@FreeBSD.org, kib@FreeBSD.org Cc: scottl@FreeBSD.org, pho@FreeBSD.org, arch@FreeBSD.org Subject: new vm_pager_get_pages() KPI, round 3 Message-ID: <20151205052940.GJ42565@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="OwLcNYc0lM97+oe1" Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Dec 2015 05:29:46 -0000 --OwLcNYc0lM97+oe1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline

Hi,

[first paragraph for arch subscribers, To: recipients may skip]

This patch is kinda a prerequisite for the non-blocking sendfile(2), that was jointly developed by NGINX and Netflix in 2014 and has been running in Netflix production for a year, serving 35% of the whole North America (US, Canada, Mexico) Internet traffic.

Technically, the new sendfile(2) doesn't require the new vm_pager_get_pages() KPI. We currently run it on the old KPI. However, kib@ suggested that we are abusing the KPI, carefully using its edge cases. To address this criticism, back in spring, I suggested a KPI, where vm_pager_get_pages() offers all-or-none approach to the array of pages. Again, kib@ wasn't satisfied, as for "the main user" of vm_pager_get_pages, the vm_fault(), all-or-none approach isn't optimal. The problem was slowly debated through the summer. And then in October jeff@ suggested yet another extension of the KPI, which I have implemented and it is described below.

[for those interested in new sendfile(2), skip to the last paragraph, for those willing to review new pager KPI, read on]

The new KPI offers this prototype for vm_pager_get_pages():

    int vm_pager_get_pages(vm_object_t object, vm_page_t pages[], int count, int *rbehind, int *rahead);

Where "count" stands for number of pages in the array. The rbehind and rahead if not NULL specify how many pages the caller is willing to allow the pager to pre-cache, if the pager can.

Pager doesn't promise to do any read behind or read ahead. If it does, then only the pager is responsible for grabbing, busying, unbusying and queueing these pages. It also writes the actual values of completed read ahead and read behind back to the pointers.

Pager promises to page in "count" pages or fail. Pager expects the pages to be busied, and returns them busied.

For multi-page requests, the pager demands that the region is a valid region, that exists in the pager, which can be checked by a preceding call to vm_pager_has_page(). For single page requests, there is no such demand.

The net result is a win for both vm_fault() and for new sendfile(). The vm_fault() no longer needs to do a preparatory vm_pager_has_page(), which removes one I/O operation. The logic for read ahead/behind, which is strongly UFS/EXT-centric, moves into vnode_pager.c. So we no longer do useless operations when having a fault on ZFS. The vm_fault() now knows precisely the read ahead that happened, when it updates the fs.entry->next_read index. This reduces number of hardfaults by a tiny fraction (measured building world tree). The new sendfile() has a stronger KPI, that doesn't unbusy pages, that sendfile() needs to be kept busied.

Also, the new KPI removes some ugly edges. E.g., since the old KPI used to unbusy and free pages in the array in case of an error, the pages could not be wired. However, there are places in kernel where we want to page in into a wired page.
These places simply violated the assumption, relying on lack of errors in the pager. Moreover, the swap pager has a special function to skip wired pages, while doing the freeing sweep, to avoid hitting assertion. That means passing wired pages to swapper is kinda OK, while to any other pager it is not. So, we end up with vm_pager_get_pages() being not pager agnostic, while it is designed to be so. Now this is fixed.

Peter, if you can, please try the patch in your tests. I already did that, but you are always better at this :)

[the new sendfile]

As already mentioned, Netflix runs new sendfile(2) in production, and it is one of key components, that allows us to serve over 80 Gbit/s from a single box. We strongly want to contribute this code and see it in FreeBSD 11.0-RELEASE. I believe, many FreeBSD users, who run it as a content server, also want that.

Although the code was production ready back in 2014, it is still not in head. The reason is the drama with vm_pager_get_pages() KPI. I was very patient during the whole 2015. Sometimes I was waiting for a feedback from guys in "To:" for several weeks. I was very gentle to not commit anything to sys/vm without a review.

Now we've got only 2 months left before the 11.0-RELEASE cycle. And since I want the new sendfile be there in 11.0, I'm going to push that strongly, putting off all my patience and gentleness. I won't buy any dislikes on the KPI again, since this is a third round of compromises from my side. I will wait only one week for pre-commit reviews, and then all reviews and adjustments are post-commit.

--
Totus tuus, Glebius.

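As a rough illustration of the calling convention described in the message above, here is a self-contained userspace mock in C. Nothing in it is kernel code: `struct mock_page`, `mock_pager_get_pages()`, `mock_fill_one()`, the `MOCK_PAGER_*` codes and the `avail_behind`/`avail_ahead` parameters are all invented for this sketch. Only the shape of the argument list (page array, count, optional rbehind/rahead in-out pointers) and the "page in all `count` pages or fail, pages stay busied" promise are taken from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; stand-ins for the kernel's VM_PAGER_* values. */
#define MOCK_PAGER_OK   0
#define MOCK_PAGER_FAIL 1

/* Hypothetical stand-in for a VM page: only the two bits the sketch needs. */
struct mock_page {
    int busy;   /* caller must busy the page before the call */
    int valid;  /* set by the pager once the page is filled */
};

/*
 * Userspace mock of the described contract.  avail_behind/avail_ahead are
 * invented inputs modelling how many extra pages the backing store could
 * cheaply read around the request.
 */
static int
mock_pager_get_pages(struct mock_page *pages[], int count,
    int *rbehind, int *rahead, int avail_behind, int avail_ahead)
{
    int i;

    /* The pager expects every requested page to arrive busied. */
    for (i = 0; i < count; i++)
        if (pages[i] == NULL || !pages[i]->busy)
            return (MOCK_PAGER_FAIL);   /* fail before touching any page */

    /* Page in all "count" pages; they are returned still busied. */
    for (i = 0; i < count; i++)
        pages[i]->valid = 1;

    /* Read-behind/ahead is optional; report what was actually done. */
    if (rbehind != NULL && *rbehind > avail_behind)
        *rbehind = avail_behind;
    if (rahead != NULL && *rahead > avail_ahead)
        *rahead = avail_ahead;
    return (MOCK_PAGER_OK);
}

/*
 * Single-page caller that wants no read-ahead, in the style of the
 * converted callers in the attached diff that pass NULL, NULL.
 */
static int
mock_fill_one(struct mock_page *p)
{
    struct mock_page *one[1] = { p };

    return (mock_pager_get_pages(one, 1, NULL, NULL, 0, 0));
}
```

In this mock, a caller that is willing to allow some read-behind and read-ahead passes pointers holding its limits and reads the actual counts back after the call, while a caller that wants neither passes NULL for both.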
--OwLcNYc0lM97+oe1 Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="vm_pager_get_pages.diff" Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c =================================================================== --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 291639) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (working copy) @@ -5762,12 +5762,13 @@ ioflags(int ioflags) } static int -zfs_getpages(struct vnode *vp, vm_page_t *m, int count, int reqpage) +zfs_getpages(struct vnode *vp, vm_page_t *m, int count, int *rbehind, + int *rahead) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; objset_t *os = zp->z_zfsvfs->z_os; - vm_page_t mfirst, mlast, mreq; + vm_page_t mlast; vm_object_t object; caddr_t va; struct sf_buf *sf; @@ -5776,82 +5777,46 @@ static int vm_pindex_t reqstart, reqend; int pcount, lsize, reqsize, size; + if (rbehind) + *rbehind = 0; + if (rahead) + *rahead = 0; + ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); pcount = OFF_TO_IDX(round_page(count)); - mreq = m[reqpage]; - object = mreq->object; - error = 0; - if (pcount > 1 && zp->z_blksz > PAGESIZE) { - startoff = rounddown(IDX_TO_OFF(mreq->pindex), zp->z_blksz); - reqstart = OFF_TO_IDX(round_page(startoff)); - if (reqstart < m[0]->pindex) - reqstart = 0; - else - reqstart = reqstart - m[0]->pindex; - endoff = roundup(IDX_TO_OFF(mreq->pindex) + PAGE_SIZE, - zp->z_blksz); - reqend = OFF_TO_IDX(trunc_page(endoff)) - 1; - if (reqend > m[pcount - 1]->pindex) - reqend = m[pcount - 1]->pindex; - reqsize = reqend - m[reqstart]->pindex + 1; - KASSERT(reqstart <= reqpage && reqpage < reqstart + reqsize, - ("reqpage beyond [reqstart, reqstart + reqsize[ bounds")); - } else { - reqstart = reqpage; - reqsize = 1; - } - mfirst = m[reqstart]; - mlast = m[reqstart + reqsize - 1]; - zfs_vmobject_wlock(object); - - for (i = 0; i < reqstart; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - 
for (i = reqstart + reqsize; i < pcount; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - - if (mreq->valid && reqsize == 1) { - if (mreq->valid != VM_PAGE_BITS_ALL) - vm_page_zero_invalid(mreq, TRUE); + if (m[pcount - 1]->valid != 0 && --pcount == 0) { zfs_vmobject_wunlock(object); ZFS_EXIT(zfsvfs); return (zfs_vm_pagerret_ok); } - PCPU_INC(cnt.v_vnodein); - PCPU_ADD(cnt.v_vnodepgsin, reqsize); + object = m[0]->object; + mlast = m[pcount - 1]; - if (IDX_TO_OFF(mreq->pindex) >= object->un_pager.vnp.vnp_size) { - for (i = reqstart; i < reqstart + reqsize; i++) { - if (i != reqpage) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - } + if (IDX_TO_OFF(mlast->pindex) >= + object->un_pager.vnp.vnp_size) { zfs_vmobject_wunlock(object); ZFS_EXIT(zfsvfs); return (zfs_vm_pagerret_bad); } + PCPU_INC(cnt.v_vnodein); + PCPU_ADD(cnt.v_vnodepgsin, reqsize); + lsize = PAGE_SIZE; if (IDX_TO_OFF(mlast->pindex) + lsize > object->un_pager.vnp.vnp_size) - lsize = object->un_pager.vnp.vnp_size - IDX_TO_OFF(mlast->pindex); - + lsize = object->un_pager.vnp.vnp_size - + IDX_TO_OFF(mlast->pindex); zfs_vmobject_wunlock(object); - for (i = reqstart; i < reqstart + reqsize; i++) { + error = 0; + for (i = 0; i < pcount; i++) { size = PAGE_SIZE; - if (i == (reqstart + reqsize - 1)) + if (i == pcount - 1) size = lsize; va = zfs_map_page(m[i], &sf); error = dmu_read(os, zp->z_id, IDX_TO_OFF(m[i]->pindex), @@ -5860,21 +5825,15 @@ static int bzero(va + size, PAGE_SIZE - size); zfs_unmap_page(sf); if (error != 0) - break; + goto out; } zfs_vmobject_wlock(object); - - for (i = reqstart; i < reqstart + reqsize; i++) { - if (!error) - m[i]->valid = VM_PAGE_BITS_ALL; - KASSERT(m[i]->dirty == 0, ("zfs_getpages: page %p is dirty", m[i])); - if (i != reqpage) - vm_page_readahead_finish(m[i]); - } - + for (i = 0; i < pcount; i++) + m[i]->valid = VM_PAGE_BITS_ALL; zfs_vmobject_wunlock(object); +out: ZFS_ACCESSTIME_STAMP(zfsvfs, zp); ZFS_EXIT(zfsvfs); 
return (error ? zfs_vm_pagerret_error : zfs_vm_pagerret_ok); @@ -5886,11 +5845,13 @@ zfs_freebsd_getpages(ap) struct vnode *a_vp; vm_page_t *a_m; int a_count; - int a_reqpage; + int *a_rbehind; + int *a_rahead; } */ *ap; { - return (zfs_getpages(ap->a_vp, ap->a_m, ap->a_count, ap->a_reqpage)); + return (zfs_getpages(ap->a_vp, ap->a_m, ap->a_count, ap->a_rbehind, + ap->a_rahead)); } static int Index: sys/dev/drm2/i915/i915_gem.c =================================================================== --- sys/dev/drm2/i915/i915_gem.c (revision 291639) +++ sys/dev/drm2/i915/i915_gem.c (working copy) @@ -4338,7 +4338,7 @@ i915_gem_wire_page(vm_object_t object, vm_pindex_t page = vm_page_grab(object, pindex, VM_ALLOC_NORMAL); if (page->valid != VM_PAGE_BITS_ALL) { if (vm_pager_has_page(object, pindex, NULL, NULL)) { - rv = vm_pager_get_pages(object, &page, 1, 0); + rv = vm_pager_get_pages(object, &page, 1, NULL, NULL); if (rv != VM_PAGER_OK) { vm_page_lock(page); vm_page_free(page); Index: sys/dev/drm2/ttm/ttm_tt.c =================================================================== --- sys/dev/drm2/ttm/ttm_tt.c (revision 291639) +++ sys/dev/drm2/ttm/ttm_tt.c (working copy) @@ -291,7 +291,8 @@ int ttm_tt_swapin(struct ttm_tt *ttm) from_page = vm_page_grab(obj, i, VM_ALLOC_NORMAL); if (from_page->valid != VM_PAGE_BITS_ALL) { if (vm_pager_has_page(obj, i, NULL, NULL)) { - rv = vm_pager_get_pages(obj, &from_page, 1, 0); + rv = vm_pager_get_pages(obj, &from_page, 1, + NULL, NULL); if (rv != VM_PAGER_OK) { vm_page_lock(from_page); vm_page_free(from_page); Index: sys/dev/md/md.c =================================================================== --- sys/dev/md/md.c (revision 291639) +++ sys/dev/md/md.c (working copy) @@ -847,7 +847,8 @@ mdstart_swap(struct md_s *sc, struct bio *bp) if (m->valid == VM_PAGE_BITS_ALL) rv = VM_PAGER_OK; else - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1, + NULL, NULL); if (rv == VM_PAGER_ERROR) { 
vm_page_xunbusy(m); break; @@ -870,7 +871,8 @@ mdstart_swap(struct md_s *sc, struct bio *bp) } } else if (bp->bio_cmd == BIO_WRITE) { if (len != PAGE_SIZE && m->valid != VM_PAGE_BITS_ALL) - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1, + NULL, NULL); else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { @@ -886,7 +888,8 @@ mdstart_swap(struct md_s *sc, struct bio *bp) m->valid = VM_PAGE_BITS_ALL; } else if (bp->bio_cmd == BIO_DELETE) { if (len != PAGE_SIZE && m->valid != VM_PAGE_BITS_ALL) - rv = vm_pager_get_pages(sc->object, &m, 1, 0); + rv = vm_pager_get_pages(sc->object, &m, 1, + NULL, NULL); else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { Index: sys/fs/fuse/fuse_vnops.c =================================================================== --- sys/fs/fuse/fuse_vnops.c (revision 291639) +++ sys/fs/fuse/fuse_vnops.c (working copy) @@ -1753,6 +1753,10 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) cred = curthread->td_ucred; /* XXX */ pages = ap->a_m; count = ap->a_count; + if (ap->a_rbehind) + *ap->a_rbehind = 0; + if (ap->a_rahead) + *ap->a_rahead = 0; if (!fsess_opt_mmap(vnode_mount(vp))) { FS_DEBUG("called on non-cacheable vnode??\n"); @@ -1761,26 +1765,21 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) npages = btoc(count); /* - * If the requested page is partially valid, just return it and - * allow the pager to zero-out the blanks. Partially valid pages - * can only occur at the file EOF. + * If the last page is partially valid, just return it and allow + * the pager to zero-out the blanks. Partially valid pages can + * only occur at the file EOF. + * + * XXXGL: is that true for FUSE, which is a local filesystem, + * but still somewhat disconnected from the kernel? 
*/ - VM_OBJECT_WLOCK(vp->v_object); - fuse_vm_page_lock_queues(); - if (pages[ap->a_reqpage]->valid != 0) { - for (i = 0; i < npages; ++i) { - if (i != ap->a_reqpage) { - fuse_vm_page_lock(pages[i]); - vm_page_free(pages[i]); - fuse_vm_page_unlock(pages[i]); - } + if (pages[npages - 1]->valid != 0) { + if (--npages == 0) { + VM_OBJECT_WUNLOCK(vp->v_object); + return (VM_PAGER_OK); } - fuse_vm_page_unlock_queues(); - VM_OBJECT_WUNLOCK(vp->v_object); - return 0; - } - fuse_vm_page_unlock_queues(); + count = npages << PAGE_SHIFT; + } VM_OBJECT_WUNLOCK(vp->v_object); /* @@ -1811,17 +1810,6 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) if (error && (uio.uio_resid == count)) { FS_DEBUG("error %d\n", error); - VM_OBJECT_WLOCK(vp->v_object); - fuse_vm_page_lock_queues(); - for (i = 0; i < npages; ++i) { - if (i != ap->a_reqpage) { - fuse_vm_page_lock(pages[i]); - vm_page_free(pages[i]); - fuse_vm_page_unlock(pages[i]); - } - } - fuse_vm_page_unlock_queues(); - VM_OBJECT_WUNLOCK(vp->v_object); return VM_PAGER_ERROR; } /* @@ -1862,8 +1850,6 @@ fuse_vnop_getpages(struct vop_getpages_args *ap) */ ; } - if (i != ap->a_reqpage) - vm_page_readahead_finish(m); } fuse_vm_page_unlock_queues(); VM_OBJECT_WUNLOCK(vp->v_object); Index: sys/fs/nfsclient/nfs_clbio.c =================================================================== --- sys/fs/nfsclient/nfs_clbio.c (revision 291639) +++ sys/fs/nfsclient/nfs_clbio.c (working copy) @@ -101,6 +101,10 @@ ncl_getpages(struct vop_getpages_args *ap) nmp = VFSTONFS(vp->v_mount); pages = ap->a_m; count = ap->a_count; + if (ap->a_rbehind) + *ap->a_rbehind = 0; + if (ap->a_rahead) + *ap->a_rahead = 0; if ((object = vp->v_object) == NULL) { ncl_printf("nfs_getpages: called with non-merged cache vnode??\n"); @@ -132,12 +136,18 @@ ncl_getpages(struct vop_getpages_args *ap) * If the requested page is partially valid, just return it and * allow the pager to zero-out the blanks. Partially valid pages * can only occur at the file EOF. 
+	 *
+	 * XXXGL: is that true for NFS, where short read can occur???
 	 */
-	if (pages[ap->a_reqpage]->valid != 0) {
-		vm_pager_free_nonreq(object, pages, ap->a_reqpage, npages,
-		    FALSE);
-		return (VM_PAGER_OK);
+	VM_OBJECT_WLOCK(object);
+	if (pages[npages - 1]->valid != 0) {
+		if (--npages == 0) {
+			VM_OBJECT_WUNLOCK(object);
+			return (VM_PAGER_OK);
+		}
+		count = npages << PAGE_SHIFT;
 	}
+	VM_OBJECT_WUNLOCK(object);
 
 	/*
 	 * We use only the kva address for the buffer, but this is extremely
@@ -167,8 +177,6 @@ ncl_getpages(struct vop_getpages_args *ap)
 
 	if (error && (uio.uio_resid == count)) {
 		ncl_printf("nfs_getpages: error %d\n", error);
-		vm_pager_free_nonreq(object, pages, ap->a_reqpage, npages,
-		    FALSE);
 		return (VM_PAGER_ERROR);
 	}
 
@@ -212,8 +220,6 @@ ncl_getpages(struct vop_getpages_args *ap)
 			 */
 			;
 		}
-		if (i != ap->a_reqpage)
-			vm_page_readahead_finish(m);
 	}
 	VM_OBJECT_WUNLOCK(object);
 	return (0);
Index: sys/fs/smbfs/smbfs_io.c
===================================================================
--- sys/fs/smbfs/smbfs_io.c (revision 291639)
+++ sys/fs/smbfs/smbfs_io.c (working copy)
@@ -424,7 +424,7 @@ smbfs_getpages(ap)
 #ifdef SMBFS_RWGENERIC
 	return vop_stdgetpages(ap);
 #else
-	int i, error, nextoff, size, toff, npages, count, reqpage;
+	int i, error, nextoff, size, toff, npages, count;
 	struct uio uio;
 	struct iovec iov;
 	vm_offset_t kva;
@@ -436,7 +436,7 @@ smbfs_getpages(ap)
 	struct smbnode *np;
 	struct smb_cred *scred;
 	vm_object_t object;
-	vm_page_t *pages, m;
+	vm_page_t *pages;
 
 	vp = ap->a_vp;
 	if ((object = vp->v_object) == NULL) {
@@ -451,26 +451,25 @@ smbfs_getpages(ap)
 	pages = ap->a_m;
 	count = ap->a_count;
 	npages = btoc(count);
-	reqpage = ap->a_reqpage;
+	if (ap->a_rbehind)
+		*ap->a_rbehind = 0;
+	if (ap->a_rahead)
+		*ap->a_rahead = 0;
 
 	/*
 	 * If the requested page is partially valid, just return it and
 	 * allow the pager to zero-out the blanks. Partially valid pages
 	 * can only occur at the file EOF.
+	 *
+	 * XXXGL: is that true for SMB filesystem?
 	 */
-	m = pages[reqpage];
-	VM_OBJECT_WLOCK(object);
-	if (m->valid != 0) {
-		for (i = 0; i < npages; ++i) {
-			if (i != reqpage) {
-				vm_page_lock(pages[i]);
-				vm_page_free(pages[i]);
-				vm_page_unlock(pages[i]);
-			}
+	if (pages[npages - 1]->valid != 0) {
+		if (--npages == 0) {
+			VM_OBJECT_WUNLOCK(object);
+			return (VM_PAGER_OK);
 		}
-		VM_OBJECT_WUNLOCK(object);
-		return 0;
+		count = npages << PAGE_SHIFT;
 	}
 	VM_OBJECT_WUNLOCK(object);
 
@@ -500,22 +499,14 @@ smbfs_getpages(ap)
 
 	relpbuf(bp, &smbfs_pbuf_freecnt);
 
-	VM_OBJECT_WLOCK(object);
 	if (error && (uio.uio_resid == count)) {
 		printf("smbfs_getpages: error %d\n",error);
-		for (i = 0; i < npages; i++) {
-			if (reqpage != i) {
-				vm_page_lock(pages[i]);
-				vm_page_free(pages[i]);
-				vm_page_unlock(pages[i]);
-			}
-		}
-		VM_OBJECT_WUNLOCK(object);
 		return VM_PAGER_ERROR;
 	}
 
 	size = count - uio.uio_resid;
 
+	VM_OBJECT_WLOCK(object);
 	for (i = 0, toff = 0; i < npages; i++, toff = nextoff) {
 		vm_page_t m;
 		nextoff = toff + PAGE_SIZE;
@@ -544,9 +535,6 @@ smbfs_getpages(ap)
 			 */
 			;
 		}
-
-		if (i != reqpage)
-			vm_page_readahead_finish(m);
 	}
 	VM_OBJECT_WUNLOCK(object);
 	return 0;
Index: sys/fs/tmpfs/tmpfs_subr.c
===================================================================
--- sys/fs/tmpfs/tmpfs_subr.c (revision 291639)
+++ sys/fs/tmpfs/tmpfs_subr.c (working copy)
@@ -1370,7 +1370,8 @@ retry:
 					VM_OBJECT_WLOCK(uobj);
 					goto retry;
 				} else if (m->valid != VM_PAGE_BITS_ALL)
-					rv = vm_pager_get_pages(uobj, &m, 1, 0);
+					rv = vm_pager_get_pages(uobj, &m, 1,
+					    NULL, NULL);
 				else
 					/* A cached page was reactivated. */
 					rv = VM_PAGER_OK;
Index: sys/kern/kern_exec.c
===================================================================
--- sys/kern/kern_exec.c (revision 291639)
+++ sys/kern/kern_exec.c (working copy)
@@ -950,8 +950,7 @@ int
 exec_map_first_page(imgp)
 	struct image_params *imgp;
 {
-	int rv, i;
-	int initial_pagein;
+	int rv, i, after, initial_pagein;
 	vm_page_t ma[VM_INITIAL_PAGEIN];
 	vm_object_t object;
 
@@ -967,9 +966,18 @@ exec_map_first_page(imgp)
 #endif
 	ma[0] = vm_page_grab(object, 0, VM_ALLOC_NORMAL);
 	if (ma[0]->valid != VM_PAGE_BITS_ALL) {
-		initial_pagein = VM_INITIAL_PAGEIN;
-		if (initial_pagein > object->size)
-			initial_pagein = object->size;
+		if (!vm_pager_has_page(object, 0, NULL, &after)) {
+			vm_page_lock(ma[0]);
+			vm_page_free(ma[0]);
+			vm_page_unlock(ma[0]);
+			vm_page_xunbusy(ma[0]);
+			VM_OBJECT_WUNLOCK(object);
+			return (EIO);
+		}
+		initial_pagein = min(after, VM_INITIAL_PAGEIN);
+		KASSERT(initial_pagein <= object->size,
+		    ("%s: initial_pagein %d object->size %ju",
+		    __func__, initial_pagein, (uintmax_t )object->size));
 		for (i = 1; i < initial_pagein; i++) {
 			if ((ma[i] = vm_page_next(ma[i - 1])) != NULL) {
 				if (ma[i]->valid)
@@ -984,14 +992,19 @@ exec_map_first_page(imgp)
 			}
 		}
 		initial_pagein = i;
-		rv = vm_pager_get_pages(object, ma, initial_pagein, 0);
+		rv = vm_pager_get_pages(object, ma, initial_pagein, NULL, NULL);
 		if (rv != VM_PAGER_OK) {
-			vm_page_lock(ma[0]);
-			vm_page_free(ma[0]);
-			vm_page_unlock(ma[0]);
+			for (i = 0; i < initial_pagein; i++) {
+				vm_page_lock(ma[i]);
+				vm_page_free(ma[i]);
+				vm_page_unlock(ma[i]);
+				vm_page_xunbusy(ma[i]);
+			}
 			VM_OBJECT_WUNLOCK(object);
 			return (EIO);
 		}
+		for (i = 1; i < initial_pagein; i++)
+			vm_page_readahead_finish(ma[i]);
 	}
 	vm_page_xunbusy(ma[0]);
 	vm_page_lock(ma[0]);
Index: sys/kern/uipc_shm.c
===================================================================
--- sys/kern/uipc_shm.c (revision 291639)
+++ sys/kern/uipc_shm.c (working copy)
@@ -189,7 +189,7 @@ uiomove_object_page(vm_object_t obj, size_t len, s
 	m = vm_page_grab(obj, idx, VM_ALLOC_NORMAL);
 	if (m->valid != VM_PAGE_BITS_ALL) {
 		if (vm_pager_has_page(obj, idx, NULL, NULL)) {
-			rv = vm_pager_get_pages(obj, &m, 1, 0);
+			rv = vm_pager_get_pages(obj, &m, 1, NULL, NULL);
 			if (rv != VM_PAGER_OK) {
 				printf(
 	    "uiomove_object: vm_obj %p idx %jd valid %x pager error %d\n",
@@ -460,7 +460,7 @@ retry:
 						goto retry;
 					} else if (m->valid != VM_PAGE_BITS_ALL)
 						rv = vm_pager_get_pages(object, &m, 1,
-						    0);
+						    NULL, NULL);
 					else
 						/* A cached page was reactivated. */
 						rv = VM_PAGER_OK;
Index: sys/kern/uipc_syscalls.c
===================================================================
--- sys/kern/uipc_syscalls.c (revision 291639)
+++ sys/kern/uipc_syscalls.c (working copy)
@@ -2033,7 +2033,7 @@ sendfile_readpage(vm_object_t obj, struct vnode *v
 		VM_OBJECT_WLOCK(obj);
 	} else {
 		if (vm_pager_has_page(obj, pindex, NULL, NULL)) {
-			rv = vm_pager_get_pages(obj, &m, 1, 0);
+			rv = vm_pager_get_pages(obj, &m, 1, NULL, NULL);
 			SFSTAT_INC(sf_iocnt);
 			if (rv != VM_PAGER_OK) {
 				vm_page_lock(m);
Index: sys/kern/vfs_default.c
===================================================================
--- sys/kern/vfs_default.c (revision 291639)
+++ sys/kern/vfs_default.c (working copy)
@@ -731,12 +731,13 @@ vop_stdgetpages(ap)
 		struct vnode *a_vp;
 		vm_page_t *a_m;
 		int a_count;
-		int a_reqpage;
+		int *a_rbehind;
+		int *a_rahead;
 	} */ *ap;
 {
 
 	return vnode_pager_generic_getpages(ap->a_vp, ap->a_m,
-	    ap->a_count, ap->a_reqpage, NULL, NULL);
+	    ap->a_count, ap->a_rbehind, ap->a_rahead, NULL, NULL);
 }
 
 static int
@@ -744,8 +745,9 @@ vop_stdgetpages_async(struct vop_getpages_async_ar
 {
 	int error;
 
-	error = VOP_GETPAGES(ap->a_vp, ap->a_m, ap->a_count, ap->a_reqpage);
-	ap->a_iodone(ap->a_arg, ap->a_m, ap->a_reqpage, error);
+	error = VOP_GETPAGES(ap->a_vp, ap->a_m, ap->a_count, ap->a_rbehind,
+	    ap->a_rahead);
+	ap->a_iodone(ap->a_arg, ap->a_m, ap->a_count, error);
 	return (error);
 }
 
Index: sys/kern/vnode_if.src
===================================================================
--- sys/kern/vnode_if.src (revision 291639)
+++ sys/kern/vnode_if.src (working copy)
@@ -473,7 +473,8 @@ vop_getpages {
 	IN struct vnode *vp;
 	IN vm_page_t *m;
 	IN int count;
-	IN int reqpage;
+	IN int *rbehind;
+	IN int *rahead;
 };
 
 
@@ -483,7 +484,8 @@ vop_getpages_async {
 	IN struct vnode *vp;
 	IN vm_page_t *m;
 	IN int count;
-	IN int reqpage;
+	IN int *rbehind;
+	IN int *rahead;
 	IN vop_getpages_iodone_t *iodone;
 	IN void *arg;
 };
Index: sys/sys/buf.h
===================================================================
--- sys/sys/buf.h (revision 291639)
+++ sys/sys/buf.h (working copy)
@@ -122,14 +122,13 @@ struct buf {
 	struct ucred *b_rcred; /* Read credentials reference. */
 	struct ucred *b_wcred; /* Write credentials reference. */
 	union {
-		TAILQ_ENTRY(buf) bu_freelist; /* (Q) */
+		TAILQ_ENTRY(buf) b_freelist; /* (Q) */
 		struct {
-			void (*pg_iodone)(void *, vm_page_t *, int, int);
-			int pg_reqpage;
-		} bu_pager;
-	} b_union;
-#define b_freelist b_union.bu_freelist
-#define b_pager b_union.bu_pager
+			void (*b_pgiodone)(void *, vm_page_t *, int, int);
+			int b_pgbefore;
+			int b_pgafter;
+		};
+	};
 	union cluster_info {
 		TAILQ_HEAD(cluster_list_head, buf) cluster_head;
 		TAILQ_ENTRY(buf) cluster_entry;
Index: sys/vm/default_pager.c
===================================================================
--- sys/vm/default_pager.c (revision 291639)
+++ sys/vm/default_pager.c (working copy)
@@ -56,7 +56,7 @@ __FBSDID("$FreeBSD$");
 static vm_object_t default_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
     vm_ooffset_t, struct ucred *);
 static void default_pager_dealloc(vm_object_t);
-static int default_pager_getpages(vm_object_t, vm_page_t *, int, int);
+static int default_pager_getpages(vm_object_t, vm_page_t *, int, int *, int *);
 static void default_pager_putpages(vm_object_t, vm_page_t *, int,
     boolean_t, int *);
 static boolean_t default_pager_haspage(vm_object_t, vm_pindex_t, int *,
@@ -122,13 +122,16 @@ default_pager_dealloc(object)
  * see a vm_page with assigned swap here.
  */
 static int
-default_pager_getpages(object, m, count, reqpage)
-	vm_object_t object;
-	vm_page_t *m;
-	int count;
-	int reqpage;
+default_pager_getpages(vm_object_t object, vm_page_t *m, int count,
+    int *rbehind, int *rahead)
 {
-	return VM_PAGER_FAIL;
+
+	if (rbehind)
+		*rbehind = 0;
+	if (rahead)
+		*rahead = 0;
+
+	return (VM_PAGER_FAIL);
 }
 
 /*
Index: sys/vm/device_pager.c
===================================================================
--- sys/vm/device_pager.c (revision 291639)
+++ sys/vm/device_pager.c (working copy)
@@ -59,7 +59,7 @@ static void dev_pager_init(void);
 static vm_object_t dev_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
     vm_ooffset_t, struct ucred *);
 static void dev_pager_dealloc(vm_object_t);
-static int dev_pager_getpages(vm_object_t, vm_page_t *, int, int);
+static int dev_pager_getpages(vm_object_t, vm_page_t *, int, int *, int *);
 static void dev_pager_putpages(vm_object_t, vm_page_t *, int, int, int *);
 static boolean_t dev_pager_haspage(vm_object_t, vm_pindex_t, int *, int *);
 static void dev_pager_free_page(vm_object_t object, vm_page_t m);
@@ -257,28 +257,33 @@ dev_pager_dealloc(vm_object_t object)
 }
 
 static int
-dev_pager_getpages(vm_object_t object, vm_page_t *ma, int count, int reqpage)
+dev_pager_getpages(vm_object_t object, vm_page_t *ma, int count, int *rbehind,
+    int *rahead)
 {
 	int error;
 
+	/* Since our haspage reports zero after/before, the count is 1. */
+	KASSERT(count == 1, ("%s: count %d", __func__, count));
 	VM_OBJECT_ASSERT_WLOCKED(object);
 	error = object->un_pager.devp.ops->cdev_pg_fault(object,
-	    IDX_TO_OFF(ma[reqpage]->pindex), PROT_READ, &ma[reqpage]);
+	    IDX_TO_OFF(ma[0]->pindex), PROT_READ, &ma[0]);
 
 	VM_OBJECT_ASSERT_WLOCKED(object);
 
-	vm_pager_free_nonreq(object, ma, reqpage, count, TRUE);
-
 	if (error == VM_PAGER_OK) {
 		KASSERT((object->type == OBJT_DEVICE &&
-		    (ma[reqpage]->oflags & VPO_UNMANAGED) != 0) ||
+		    (ma[0]->oflags & VPO_UNMANAGED) != 0) ||
 		    (object->type == OBJT_MGTDEVICE &&
-		    (ma[reqpage]->oflags & VPO_UNMANAGED) == 0),
-		    ("Wrong page type %p %p", ma[reqpage], object));
+		    (ma[0]->oflags & VPO_UNMANAGED) == 0),
+		    ("Wrong page type %p %p", ma[0], object));
 		if (object->type == OBJT_DEVICE) {
 			TAILQ_INSERT_TAIL(&object->un_pager.devp.devp_pglist,
-			    ma[reqpage], plinks.q);
+			    ma[0], plinks.q);
 		}
+		if (rbehind)
+			*rbehind = 0;
+		if (rahead)
+			*rahead = 0;
 	}
 
 	return (error);
Index: sys/vm/phys_pager.c
===================================================================
--- sys/vm/phys_pager.c (revision 291639)
+++ sys/vm/phys_pager.c (working copy)
@@ -139,7 +139,8 @@ phys_pager_dealloc(vm_object_t object)
  * Fill as many pages as vm_fault has allocated for us.
  */
 static int
-phys_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage)
+phys_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
+    int *rahead)
 {
 	int i;
 
@@ -154,14 +155,11 @@ static int
 		    ("phys_pager_getpages: partially valid page %p", m[i]));
 		KASSERT(m[i]->dirty == 0,
 		    ("phys_pager_getpages: dirty page %p", m[i]));
-		/* The requested page must remain busy, the others not. */
-		if (i == reqpage) {
-			vm_page_lock(m[i]);
-			vm_page_flash(m[i]);
-			vm_page_unlock(m[i]);
-		} else
-			vm_page_xunbusy(m[i]);
 	}
+	if (rbehind)
+		*rbehind = 0;
+	if (rahead)
+		*rahead = 0;
 	return (VM_PAGER_OK);
 }
 
Index: sys/vm/sg_pager.c
===================================================================
--- sys/vm/sg_pager.c (revision 291639)
+++ sys/vm/sg_pager.c (working copy)
@@ -49,7 +49,7 @@ __FBSDID("$FreeBSD$");
 static vm_object_t sg_pager_alloc(void *, vm_ooffset_t, vm_prot_t,
     vm_ooffset_t, struct ucred *);
 static void sg_pager_dealloc(vm_object_t);
-static int sg_pager_getpages(vm_object_t, vm_page_t *, int, int);
+static int sg_pager_getpages(vm_object_t, vm_page_t *, int, int *, int *);
 static void sg_pager_putpages(vm_object_t, vm_page_t *, int,
     boolean_t, int *);
 static boolean_t sg_pager_haspage(vm_object_t, vm_pindex_t, int *,
@@ -135,7 +135,8 @@ sg_pager_dealloc(vm_object_t object)
 }
 
 static int
-sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage)
+sg_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
+    int *rahead)
 {
 	struct sglist *sg;
 	vm_page_t m_paddr, page;
@@ -145,11 +146,13 @@ static int
 	size_t space;
 	int i;
 
+	/* Since our haspage reports zero after/before, the count is 1. */
+	KASSERT(count == 1, ("%s: count %d", __func__, count));
 	VM_OBJECT_ASSERT_WLOCKED(object);
 	sg = object->handle;
 	memattr = object->memattr;
 	VM_OBJECT_WUNLOCK(object);
-	offset = m[reqpage]->pindex;
+	offset = m[0]->pindex;
 
 	/*
 	 * Lookup the physical address of the requested page. An initial
@@ -178,7 +181,7 @@ static int
 	}
 
 	/* Return a fake page for the requested page. */
-	KASSERT(!(m[reqpage]->flags & PG_FICTITIOUS),
+	KASSERT(!(m[0]->flags & PG_FICTITIOUS),
 	    ("backing page for SG is fake"));
 
 	/* Construct a new fake page. */
@@ -185,19 +188,16 @@ static int
 	page = vm_page_getfake(paddr, memattr);
 	VM_OBJECT_WLOCK(object);
 	TAILQ_INSERT_TAIL(&object->un_pager.sgp.sgp_pglist, page, plinks.q);
-
-	/* Free the original pages and insert this fake page into the object. */
-	for (i = 0; i < count; i++) {
-		if (i == reqpage &&
-		    vm_page_replace(page, object, offset) != m[i])
-			panic("sg_pager_getpages: invalid place replacement");
-		vm_page_lock(m[i]);
-		vm_page_free(m[i]);
-		vm_page_unlock(m[i]);
-	}
-	m[reqpage] = page;
+	if (vm_page_replace(page, object, offset) != m[0])
+		panic("sg_pager_getpages: invalid place replacement");
+	m[0] = page;
 	page->valid = VM_PAGE_BITS_ALL;
 
+	if (rbehind)
+		*rbehind = 0;
+	if (rahead)
+		*rahead = 0;
+
 	return (VM_PAGER_OK);
 }
 
Index: sys/vm/swap_pager.c
===================================================================
--- sys/vm/swap_pager.c (revision 291639)
+++ sys/vm/swap_pager.c (working copy)
@@ -357,9 +357,10 @@ static vm_object_t
 		swap_pager_alloc(void *handle, vm_ooffset_t size,
 		    vm_prot_t prot, vm_ooffset_t offset, struct ucred *);
 static void swap_pager_dealloc(vm_object_t object);
-static int swap_pager_getpages(vm_object_t, vm_page_t *, int, int);
-static int swap_pager_getpages_async(vm_object_t, vm_page_t *, int, int,
-    pgo_getpages_iodone_t, void *);
+static int swap_pager_getpages(vm_object_t, vm_page_t *, int, int *,
+    int *);
+static int swap_pager_getpages_async(vm_object_t, vm_page_t *, int, int *,
+    int *, pgo_getpages_iodone_t, void *);
 static void swap_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *);
 static boolean_t
 		swap_pager_haspage(vm_object_t object, vm_pindex_t pindex, int *before, int *after);
@@ -413,16 +414,6 @@ static void swp_pager_meta_free(vm_object_t, vm_pi
 static void swp_pager_meta_free_all(vm_object_t);
 static daddr_t swp_pager_meta_ctl(vm_object_t, vm_pindex_t, int);
 
-static void
-swp_pager_free_nrpage(vm_page_t m)
-{
-
-	vm_page_lock(m);
-	if (m->wire_count == 0)
-		vm_page_free(m);
-	vm_page_unlock(m);
-}
-
 /*
  * SWP_SIZECHECK() - update swap_pager_full indication
  *
@@ -1103,16 +1094,12 @@ swap_pager_unswapped(vm_page_t m)
  * left busy, but the others adjusted.
  */
 static int
-swap_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage)
+swap_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind,
+    int *rahead)
 {
 	struct buf *bp;
-	vm_page_t mreq;
-	int i;
-	int j;
 	daddr_t blk;
 
-	mreq = m[reqpage];
-
 	/*
 	 * Calculate range to retrieve. The pages have already been assigned
 	 * their swapblks. We require a *contiguous* range but we know it to
@@ -1122,45 +1109,18 @@ static int
 	 *
 	 * The swp_*() calls must be made with the object locked.
 	 */
-	blk = swp_pager_meta_ctl(mreq->object, mreq->pindex, 0);
+	blk = swp_pager_meta_ctl(m[0]->object, m[0]->pindex, 0);
 
-	for (i = reqpage - 1; i >= 0; --i) {
-		daddr_t iblk;
-
-		iblk = swp_pager_meta_ctl(m[i]->object, m[i]->pindex, 0);
-		if (blk != iblk + (reqpage - i))
-			break;
-	}
-	++i;
-
-	for (j = reqpage + 1; j < count; ++j) {
-		daddr_t jblk;
-
-		jblk = swp_pager_meta_ctl(m[j]->object, m[j]->pindex, 0);
-		if (blk != jblk - (j - reqpage))
-			break;
-	}
-
-	/*
-	 * free pages outside our collection range. Note: we never free
-	 * mreq, it must remain busy throughout.
-	 */
-	if (0 < i || j < count) {
-		int k;
-
-		for (k = 0; k < i; ++k)
-			swp_pager_free_nrpage(m[k]);
-		for (k = j; k < count; ++k)
-			swp_pager_free_nrpage(m[k]);
-	}
-
-	/*
-	 * Return VM_PAGER_FAIL if we have nothing to do. Return mreq
-	 * still busy, but the others unbusied.
-	 */
 	if (blk == SWAPBLK_NONE)
 		return (VM_PAGER_FAIL);
 
+#ifdef INVARIANTS
+	for (int i = 0; i < count; i++)
+		KASSERT(blk + i ==
+		    swp_pager_meta_ctl(m[i]->object, m[i]->pindex, 0),
+		    ("%s: range is not contiguous", __func__));
+#endif
+
 	/*
 	 * Getpbuf() can sleep.
 	 */
@@ -1175,21 +1135,16 @@ static int
 	bp->b_iodone = swp_pager_async_iodone;
 	bp->b_rcred = crhold(thread0.td_ucred);
 	bp->b_wcred = crhold(thread0.td_ucred);
-	bp->b_blkno = blk - (reqpage - i);
-	bp->b_bcount = PAGE_SIZE * (j - i);
-	bp->b_bufsize = PAGE_SIZE * (j - i);
-	bp->b_pager.pg_reqpage = reqpage - i;
+	bp->b_blkno = blk;
+	bp->b_bcount = PAGE_SIZE * count;
+	bp->b_bufsize = PAGE_SIZE * count;
+	bp->b_npages = count;
 
 	VM_OBJECT_WLOCK(object);
-	{
-		int k;
-
-		for (k = i; k < j; ++k) {
-			bp->b_pages[k - i] = m[k];
-			m[k]->oflags |= VPO_SWAPINPROG;
-		}
+	for (int i = 0; i < count; i++) {
+		bp->b_pages[i] = m[i];
+		m[i]->oflags |= VPO_SWAPINPROG;
 	}
-	bp->b_npages = j - i;
 
 	PCPU_INC(cnt.v_swapin);
 	PCPU_ADD(cnt.v_swappgsin, bp->b_npages);
@@ -1221,8 +1176,8 @@ static int
 	 * is set in the meta-data.
 	 */
 	VM_OBJECT_WLOCK(object);
-	while ((mreq->oflags & VPO_SWAPINPROG) != 0) {
-		mreq->oflags |= VPO_SWAPSLEEP;
+	while ((m[0]->oflags & VPO_SWAPINPROG) != 0) {
+		m[0]->oflags |= VPO_SWAPSLEEP;
 		PCPU_INC(cnt.v_intrans);
 		if (VM_OBJECT_SLEEP(object, &object->paging_in_progress, PSWP,
 		    "swread", hz * 20)) {
@@ -1233,16 +1188,19 @@ static int
 	}
 
 	/*
-	 * mreq is left busied after completion, but all the other pages
-	 * are freed. If we had an unrecoverable read error the page will
-	 * not be valid.
+	 * If we had an unrecoverable read error pages will not be valid.
 	 */
-	if (mreq->valid != VM_PAGE_BITS_ALL) {
-		return (VM_PAGER_ERROR);
-	} else {
-		return (VM_PAGER_OK);
-	}
+	for (int i = 0; i < count; i++)
+		if (m[i]->valid != VM_PAGE_BITS_ALL)
+			return (VM_PAGER_ERROR);
 
+	if (rbehind)
+		*rbehind = 0;
+	if (rahead)
+		*rahead = 0;
+
+	return (VM_PAGER_OK);
+
 	/*
 	 * A final note: in a low swap situation, we cannot deallocate swap
 	 * and mark a page dirty here because the caller is likely to mark
@@ -1259,11 +1217,11 @@ static int
  */
 static int
 swap_pager_getpages_async(vm_object_t object, vm_page_t *m, int count,
-    int reqpage, pgo_getpages_iodone_t iodone, void *arg)
+    int *rbehind, int *rahead, pgo_getpages_iodone_t iodone, void *arg)
 {
 	int r, error;
 
-	r = swap_pager_getpages(object, m, count, reqpage);
+	r = swap_pager_getpages(object, m, count, rbehind, rahead);
 	VM_OBJECT_WUNLOCK(object);
 	switch (r) {
 	case VM_PAGER_OK:
@@ -1527,33 +1485,11 @@ swp_pager_async_iodone(struct buf *bp)
 			 */
 			if (bp->b_iocmd == BIO_READ) {
 				/*
-				 * When reading, reqpage needs to stay
-				 * locked for the parent, but all other
-				 * pages can be freed. We still want to
-				 * wakeup the parent waiting on the page,
-				 * though. ( also: pg_reqpage can be -1 and
-				 * not match anything ).
-				 *
-				 * We have to wake specifically requested pages
-				 * up too because we cleared VPO_SWAPINPROG and
-				 * someone may be waiting for that.
-				 *
 				 * NOTE: for reads, m->dirty will probably
 				 * be overridden by the original caller of
 				 * getpages so don't play cute tricks here.
 				 */
 				m->valid = 0;
-				if (i != bp->b_pager.pg_reqpage)
-					swp_pager_free_nrpage(m);
-				else {
-					vm_page_lock(m);
-					vm_page_flash(m);
-					vm_page_unlock(m);
-				}
-				/*
-				 * If i == bp->b_pager.pg_reqpage, do not wake
-				 * the page up. The caller needs to.
-				 */
 			} else {
 				/*
 				 * If a write error occurs, reactivate page
@@ -1575,38 +1511,12 @@ swp_pager_async_iodone(struct buf *bp)
 			 * want to do that anyway, but it was an optimization
 			 * that existed in the old swapper for a time before
 			 * it got ripped out due to precisely this problem.
-			 *
-			 * If not the requested page then deactivate it.
-			 *
-			 * Note that the requested page, reqpage, is left
-			 * busied, but we still have to wake it up. The
-			 * other pages are released (unbusied) by
-			 * vm_page_xunbusy().
 			 */
 			KASSERT(!pmap_page_is_mapped(m),
 			    ("swp_pager_async_iodone: page %p is mapped", m));
-			m->valid = VM_PAGE_BITS_ALL;
 			KASSERT(m->dirty == 0,
 			    ("swp_pager_async_iodone: page %p is dirty", m));
-
-			/*
-			 * We have to wake specifically requested pages
-			 * up too because we cleared VPO_SWAPINPROG and
-			 * could be waiting for it in getpages. However,
-			 * be sure to not unbusy getpages specifically
-			 * requested page - getpages expects it to be
-			 * left busy.
-			 */
-			if (i != bp->b_pager.pg_reqpage) {
-				vm_page_lock(m);
-				vm_page_deactivate(m);
-				vm_page_unlock(m);
-				vm_page_xunbusy(m);
-			} else {
-				vm_page_lock(m);
-				vm_page_flash(m);
-				vm_page_unlock(m);
-			}
+			m->valid = VM_PAGE_BITS_ALL;
 		} else {
 			/*
 			 * For write success, clear the dirty
@@ -1727,7 +1637,7 @@ swp_pager_force_pagein(vm_object_t object, vm_pind
 		return;
 	}
 
-	if (swap_pager_getpages(object, &m, 1, 0) != VM_PAGER_OK)
+	if (swap_pager_getpages(object, &m, 1, NULL, NULL) != VM_PAGER_OK)
 		panic("swap_pager_force_pagein: read from swap failed");/*XXX*/
 	vm_object_pip_wakeup(object);
 	vm_page_dirty(m);
Index: sys/vm/vm_fault.c
===================================================================
--- sys/vm/vm_fault.c (revision 291639)
+++ sys/vm/vm_fault.c (working copy)
@@ -107,13 +107,8 @@ __FBSDID("$FreeBSD$");
 #define PFBAK 4
 #define PFFOR 4
 
-static int vm_fault_additional_pages(vm_page_t, int, int, vm_page_t *, int *);
-
-#define VM_FAULT_READ_BEHIND 8
 #define VM_FAULT_READ_DEFAULT (1 + VM_FAULT_READ_AHEAD_INIT)
 #define VM_FAULT_READ_MAX (1 + VM_FAULT_READ_AHEAD_MAX)
-#define VM_FAULT_NINCR (VM_FAULT_READ_MAX / VM_FAULT_READ_BEHIND)
-#define VM_FAULT_SUM (VM_FAULT_NINCR * (VM_FAULT_NINCR + 1) / 2)
 
 #define VM_FAULT_DONTNEED_MIN 1048576
 
@@ -133,7 +128,7 @@ struct faultstate {
 static void vm_fault_dontneed(const struct faultstate *fs, vm_offset_t vaddr,
     int ahead);
 static void vm_fault_prefault(const struct faultstate *fs, vm_offset_t addra,
-    int faultcount, int reqpage);
+    int backward, int forward);
 
 static inline void
 release_page(struct faultstate *fs)
@@ -288,11 +283,10 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_
     int fault_flags, vm_page_t *m_hold)
 {
 	vm_prot_t prot;
-	int alloc_req, era, faultcount, nera, reqpage, result;
+	int alloc_req, era, faultcount, nera, result;
 	boolean_t growstack, is_first_object_locked, wired;
 	int map_generation;
 	vm_object_t next_object;
-	vm_page_t marray[VM_FAULT_READ_MAX];
 	int hardfault;
 	struct faultstate fs;
 	struct vnode *vp;
@@ -303,7 +297,7 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_
 	growstack = TRUE;
 	PCPU_INC(cnt.v_vm_faults);
 	fs.vp = NULL;
-	faultcount = reqpage = 0;
+	faultcount = 0;
 
 RetryFault:;
 
@@ -389,7 +383,7 @@ RetryFault:;
 		    FALSE);
 		VM_OBJECT_RUNLOCK(fs.first_object);
 		if (!wired)
-			vm_fault_prefault(&fs, vaddr, 0, 0);
+			vm_fault_prefault(&fs, vaddr, PFBAK, PFFOR);
 		vm_map_lookup_done(fs.map, fs.entry);
 		curthread->td_ru.ru_minflt++;
 		return (KERN_SUCCESS);
@@ -652,36 +646,13 @@ vnode_locked:
 			    ("vm_fault: vnode-backed object mapped by system map"));
 
 			/*
-			 * now we find out if any other pages should be paged
-			 * in at this time this routine checks to see if the
-			 * pages surrounding this fault reside in the same
-			 * object as the page for this fault. If they do,
-			 * then they are faulted in also into the object. The
-			 * array "marray" returned contains an array of
-			 * vm_page_t structs where one of them is the
-			 * vm_page_t passed to the routine. The reqpage
-			 * return value is the index into the marray for the
-			 * vm_page_t passed to the routine.
-			 *
-			 * fs.m plus the additional pages are exclusive busied.
+			 * Page in the requested page and hint the pager,
+			 * that it may bring up surrounding pages.
 			 */
-			faultcount = vm_fault_additional_pages(
-			    fs.m, behind, ahead, marray, &reqpage);
-
-			rv = faultcount ?
-			    vm_pager_get_pages(fs.object, marray, faultcount,
-				reqpage) : VM_PAGER_FAIL;
-
+			rv = vm_pager_get_pages(fs.object, &fs.m, 1,
+			    &behind, &ahead);
 			if (rv == VM_PAGER_OK) {
-				/*
-				 * Found the page. Leave it busy while we play
-				 * with it.
-				 *
-				 * Pager could have changed the page. Pager
-				 * is responsible for disposition of old page
-				 * if moved.
-				 */
-				fs.m = marray[reqpage];
+				faultcount = behind + 1 + ahead;
 				hardfault++;
 				break; /* break to PAGE HAS BEEN FOUND */
 			}
@@ -961,16 +932,13 @@ vnode_locked:
 	}
 	/*
 	 * If the page was filled by a pager, update the map entry's
-	 * last read offset. Since the pager does not return the
-	 * actual set of pages that it read, this update is based on
-	 * the requested set. Typically, the requested and actual
-	 * sets are the same.
+	 * last read offset.
 	 *
 	 * XXX The following assignment modifies the map
 	 * without holding a write lock on it.
 	 */
 	if (hardfault)
-		fs.entry->next_read = fs.pindex + faultcount - reqpage;
+		fs.entry->next_read = fs.pindex + ahead + 1;
 
 	vm_fault_dirty(fs.entry, fs.m, prot, fault_type, fault_flags, TRUE);
 	vm_page_assert_xbusied(fs.m);
@@ -993,7 +961,9 @@ vnode_locked:
 	    fault_type | (wired ? PMAP_ENTER_WIRED : 0), 0);
 	if (faultcount != 1 && (fault_flags & VM_FAULT_WIRE) == 0 &&
 	    wired == 0)
-		vm_fault_prefault(&fs, vaddr, faultcount, reqpage);
+		vm_fault_prefault(&fs, vaddr,
+		    faultcount > 0 ? behind : PFBAK,
+		    faultcount > 0 ? ahead : PFFOR);
 	VM_OBJECT_WLOCK(fs.object);
 	vm_page_lock(fs.m);
 
@@ -1110,7 +1080,7 @@ vm_fault_dontneed(const struct faultstate *fs, vm_
  */
 static void
 vm_fault_prefault(const struct faultstate *fs, vm_offset_t addra,
-    int faultcount, int reqpage)
+    int backward, int forward)
 {
 	pmap_t pmap;
 	vm_map_entry_t entry;
@@ -1118,19 +1088,12 @@ vm_fault_prefault(const struct faultstate *fs, vm_
 	vm_offset_t addr, starta;
 	vm_pindex_t pindex;
 	vm_page_t m;
-	int backward, forward, i;
+	int i;
 
 	pmap = fs->map->pmap;
 	if (pmap != vmspace_pmap(curthread->td_proc->p_vmspace))
 		return;
 
-	if (faultcount > 0) {
-		backward = reqpage;
-		forward = faultcount - reqpage - 1;
-	} else {
-		backward = PFBAK;
-		forward = PFFOR;
-	}
 	entry = fs->entry;
 
 	starta = addra - backward * PAGE_SIZE;
@@ -1461,134 +1424,7 @@ again:
 	}
 }
 
-
 /*
- * This routine checks around the requested page for other pages that
- * might be able to be faulted in. This routine brackets the viable
- * pages for the pages to be paged in.
- *
- * Inputs:
- *	m, rbehind, rahead
- *
- * Outputs:
- *	marray (array of vm_page_t), reqpage (index of requested page)
- *
- * Return value:
- *	number of pages in marray
- */
-static int
-vm_fault_additional_pages(m, rbehind, rahead, marray, reqpage)
-	vm_page_t m;
-	int rbehind;
-	int rahead;
-	vm_page_t *marray;
-	int *reqpage;
-{
-	int i,j;
-	vm_object_t object;
-	vm_pindex_t pindex, startpindex, endpindex, tpindex;
-	vm_page_t rtm;
-	int cbehind, cahead;
-
-	VM_OBJECT_ASSERT_WLOCKED(m->object);
-
-	object = m->object;
-	pindex = m->pindex;
-	cbehind = cahead = 0;
-
-	/*
-	 * if the requested page is not available, then give up now
-	 */
-	if (!vm_pager_has_page(object, pindex, &cbehind, &cahead)) {
-		return 0;
-	}
-
-	if ((cbehind == 0) && (cahead == 0)) {
-		*reqpage = 0;
-		marray[0] = m;
-		return 1;
-	}
-
-	if (rahead > cahead) {
-		rahead = cahead;
-	}
-
-	if (rbehind > cbehind) {
-		rbehind = cbehind;
-	}
-
-	/*
-	 * scan backward for the read behind pages -- in memory
-	 */
-	if (pindex > 0) {
-		if (rbehind > pindex) {
-			rbehind = pindex;
-			startpindex = 0;
-		} else {
-			startpindex = pindex - rbehind;
-		}
-
-		if ((rtm = TAILQ_PREV(m, pglist, listq)) != NULL &&
-		    rtm->pindex >= startpindex)
-			startpindex = rtm->pindex + 1;
-
-		/* tpindex is unsigned; beware of numeric underflow. */
-		for (i = 0, tpindex = pindex - 1; tpindex >= startpindex &&
-		    tpindex < pindex; i++, tpindex--) {
-
-			rtm = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL |
-			    VM_ALLOC_IFNOTCACHED);
-			if (rtm == NULL) {
-				/*
-				 * Shift the allocated pages to the
-				 * beginning of the array.
-				 */
-				for (j = 0; j < i; j++) {
-					marray[j] = marray[j + tpindex + 1 -
-					    startpindex];
-				}
-				break;
-			}
-
-			marray[tpindex - startpindex] = rtm;
-		}
-	} else {
-		startpindex = 0;
-		i = 0;
-	}
-
-	marray[i] = m;
-	/* page offset of the required page */
-	*reqpage = i;
-
-	tpindex = pindex + 1;
-	i++;
-
-	/*
-	 * scan forward for the read ahead pages
-	 */
-	endpindex = tpindex + rahead;
-	if ((rtm = TAILQ_NEXT(m, listq)) != NULL && rtm->pindex < endpindex)
-		endpindex = rtm->pindex;
-	if (endpindex > object->size)
-		endpindex = object->size;
-
-	for (; tpindex < endpindex; i++, tpindex++) {
-
-		rtm = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL |
-		    VM_ALLOC_IFNOTCACHED);
-		if (rtm == NULL) {
-			break;
-		}
-
-		marray[i] = rtm;
-	}
-
-	/* return number of pages */
-	return i;
-}
-
-/*
  * Block entry into the machine-independent layer's page fault handler by
  * the calling thread. Subsequent calls to vm_fault() by that thread will
  * return KERN_PROTECTION_FAILURE. Enable machine-dependent handling of
Index: sys/vm/vm_glue.c
===================================================================
--- sys/vm/vm_glue.c (revision 291639)
+++ sys/vm/vm_glue.c (working copy)
@@ -238,7 +238,7 @@ vm_imgact_hold_page(vm_object_t object, vm_ooffset
 	pindex = OFF_TO_IDX(offset);
 	m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL);
 	if (m->valid != VM_PAGE_BITS_ALL) {
-		rv = vm_pager_get_pages(object, &m, 1, 0);
+		rv = vm_pager_get_pages(object, &m, 1, NULL, NULL);
 		if (rv != VM_PAGER_OK) {
 			vm_page_lock(m);
 			vm_page_free(m);
@@ -567,37 +567,37 @@ vm_thread_swapin(struct thread *td)
 {
 	vm_object_t ksobj;
 	vm_page_t ma[KSTACK_MAX_PAGES];
-	int i, j, pages, rv;
+	int pages;
 
 	pages = td->td_kstack_pages;
 	ksobj = td->td_kstack_obj;
 	VM_OBJECT_WLOCK(ksobj);
-	for (i = 0; i < pages; i++)
+	for (int i = 0; i < pages; i++)
 		ma[i] = vm_page_grab(ksobj, i, VM_ALLOC_NORMAL |
 		    VM_ALLOC_WIRED);
-	for (i = 0; i < pages; i++) {
-		if (ma[i]->valid != VM_PAGE_BITS_ALL) {
-			vm_page_assert_xbusied(ma[i]);
-			vm_object_pip_add(ksobj, 1);
-			for (j = i + 1; j < pages; j++) {
-				if (ma[j]->valid != VM_PAGE_BITS_ALL)
-					vm_page_assert_xbusied(ma[j]);
-				if (ma[j]->valid == VM_PAGE_BITS_ALL)
-					break;
-			}
-			rv = vm_pager_get_pages(ksobj, ma + i, j - i, 0);
-			if (rv != VM_PAGER_OK)
-				panic("vm_thread_swapin: cannot get kstack for proc: %d",
-				    td->td_proc->p_pid);
-			/*
-			 * All pages in the array are in place, due to the
-			 * pager is always the swap pager, which doesn't
-			 * free or remove wired non-req pages from object.
- */ - vm_object_pip_wakeup(ksobj); + for (int i = 0; i < pages;) { + int j, a, count, rv; + + vm_page_assert_xbusied(ma[i]); + if (ma[i]->valid == VM_PAGE_BITS_ALL) { vm_page_xunbusy(ma[i]); - } else if (vm_page_xbusied(ma[i])) - vm_page_xunbusy(ma[i]); + i++; + continue; + } + vm_object_pip_add(ksobj, 1); + for (j = i + 1; j < pages; j++) + if (ma[j]->valid == VM_PAGE_BITS_ALL) + break; + rv = vm_pager_has_page(ksobj, ma[i]->pindex, NULL, &a); + KASSERT(rv == 1, ("%s: missing page %p", __func__, ma[i])); + count = min(a + 1, j - i); + rv = vm_pager_get_pages(ksobj, ma + i, count, NULL, NULL); + KASSERT(rv == VM_PAGER_OK, ("%s: cannot get kstack for proc %d", + __func__, td->td_proc->p_pid)); + vm_object_pip_wakeup(ksobj); + for (j = i; j < i + count; j++) + vm_page_xunbusy(ma[j]); + i += count; } VM_OBJECT_WUNLOCK(ksobj); pmap_qenter(td->td_kstack, ma, pages); Index: sys/vm/vm_object.c =================================================================== --- sys/vm/vm_object.c (revision 291639) +++ sys/vm/vm_object.c (working copy) @@ -2023,7 +2023,7 @@ vm_object_populate(vm_object_t object, vm_pindex_t for (pindex = start; pindex < end; pindex++) { m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL); if (m->valid != VM_PAGE_BITS_ALL) { - rv = vm_pager_get_pages(object, &m, 1, 0); + rv = vm_pager_get_pages(object, &m, 1, NULL, NULL); if (rv != VM_PAGER_OK) { vm_page_lock(m); vm_page_free(m); Index: sys/vm/vm_object.h =================================================================== --- sys/vm/vm_object.h (revision 291639) +++ sys/vm/vm_object.h (working copy) @@ -243,6 +243,8 @@ extern struct vm_object kmem_object_store; rw_try_upgrade(&(object)->lock) #define VM_OBJECT_WLOCK(object) \ rw_wlock(&(object)->lock) +#define VM_OBJECT_WOWNED(object) \ + rw_wowned(&(object)->lock) #define VM_OBJECT_WUNLOCK(object) \ rw_wunlock(&(object)->lock) Index: sys/vm/vm_page.c =================================================================== --- sys/vm/vm_page.c (revision 
291639) +++ sys/vm/vm_page.c (working copy) @@ -979,38 +979,28 @@ vm_page_free_zero(vm_page_t m) /* * Unbusy and handle the page queueing for a page from the VOP_GETPAGES() - * array which is not the request page. + * array which was read ahead. */ void vm_page_readahead_finish(vm_page_t m) { - if (m->valid != 0) { - /* - * Since the page is not the requested page, whether - * it should be activated or deactivated is not - * obvious. Empirical results have shown that - * deactivating the page is usually the best choice, - * unless the page is wanted by another thread. - */ - vm_page_lock(m); - if ((m->busy_lock & VPB_BIT_WAITERS) != 0) - vm_page_activate(m); - else - vm_page_deactivate(m); - vm_page_unlock(m); - vm_page_xunbusy(m); - } else { - /* - * Free the completely invalid page. Such page state - * occurs due to the short read operation which did - * not covered our page at all, or in case when a read - * error happens. - */ - vm_page_lock(m); - vm_page_free(m); - vm_page_unlock(m); - } + /* We shouldn't put invalid pages on queues. */ + KASSERT(m->valid != 0, ("%s: %p is invalid", __func__, m)); + + /* + * Since the page is not the actually needed one, whether it should + * be activated or deactivated is not obvious. Empirical results + * have shown that deactivating the page is usually the best choice, + * unless the page is wanted by another thread. 
+ */ + vm_page_lock(m); + if ((m->busy_lock & VPB_BIT_WAITERS) != 0) + vm_page_activate(m); + else + vm_page_deactivate(m); + vm_page_unlock(m); + vm_page_xunbusy(m); } /* Index: sys/vm/vm_pager.c =================================================================== --- sys/vm/vm_pager.c (revision 291639) +++ sys/vm/vm_pager.c (working copy) @@ -88,7 +88,7 @@ int cluster_pbuf_freecnt = -1; /* unlimited to beg struct buf *swbuf; -static int dead_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int dead_pager_getpages(vm_object_t, vm_page_t *, int, int *, int *); static vm_object_t dead_pager_alloc(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); static void dead_pager_putpages(vm_object_t, vm_page_t *, int, int, int *); @@ -96,13 +96,11 @@ static boolean_t dead_pager_haspage(vm_object_t, v static void dead_pager_dealloc(vm_object_t); static int -dead_pager_getpages(obj, ma, count, req) - vm_object_t obj; - vm_page_t *ma; - int count; - int req; +dead_pager_getpages(vm_object_t obj, vm_page_t *ma, int count, int *rbehind, + int *rahead) { - return VM_PAGER_FAIL; + + return (VM_PAGER_FAIL); } static vm_object_t @@ -282,45 +280,47 @@ vm_pager_assert_in(vm_object_t object, vm_page_t * * The requested page must be fully valid on successful return. */ int -vm_pager_get_pages(vm_object_t object, vm_page_t *m, int count, int reqpage) +vm_pager_get_pages(vm_object_t object, vm_page_t *m, int count, int *rbehind, + int *rahead) { +#ifdef INVARIANTS + vm_pindex_t pindex = m[0]->pindex; +#endif int r; vm_pager_assert_in(object, m, count); - r = (*pagertab[object->type]->pgo_getpages)(object, m, count, reqpage); + r = (*pagertab[object->type]->pgo_getpages)(object, m, count, rbehind, + rahead); if (r != VM_PAGER_OK) return (r); - /* - * If pager has replaced the page, assert that it had - * updated the array. Also assert that page is still - * busied. 
- */ - KASSERT(m[reqpage] == vm_page_lookup(object, m[reqpage]->pindex), - ("%s: mismatch page %p pindex %ju", __func__, - m[reqpage], (uintmax_t )m[reqpage]->pindex)); - vm_page_assert_xbusied(m[reqpage]); - - /* - * Pager didn't fill up entire page. Zero out - * partially filled data. - */ - if (m[reqpage]->valid != VM_PAGE_BITS_ALL) - vm_page_zero_invalid(m[reqpage], TRUE); - + for (int i = 0; i < count; i++) { + /* + * If pager has replaced a page, assert that it had + * updated the array. + */ + KASSERT(m[i] == vm_page_lookup(object, pindex++), + ("%s: mismatch page %p pindex %ju", __func__, + m[i], (uintmax_t )pindex - 1)); + /* + * Zero out partially filled data. + */ + if (m[i]->valid != VM_PAGE_BITS_ALL) + vm_page_zero_invalid(m[i], TRUE); + } return (VM_PAGER_OK); } int vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, - int reqpage, pgo_getpages_iodone_t iodone, void *arg) + int *rbehind, int *rahead, pgo_getpages_iodone_t iodone, void *arg) { vm_pager_assert_in(object, m, count); return ((*pagertab[object->type]->pgo_getpages_async)(object, m, - count, reqpage, iodone, arg)); + count, rbehind, rahead, iodone, arg)); } /* @@ -355,39 +355,6 @@ vm_pager_object_lookup(struct pagerlst *pg_list, v } /* - * Free the non-requested pages from the given array. To remove all pages, - * caller should provide out of range reqpage number. 
- */ -void -vm_pager_free_nonreq(vm_object_t object, vm_page_t ma[], int reqpage, - int npages, boolean_t object_locked) -{ - enum { UNLOCKED, CALLER_LOCKED, INTERNALLY_LOCKED } locked; - int i; - - if (object_locked) { - VM_OBJECT_ASSERT_WLOCKED(object); - locked = CALLER_LOCKED; - } else { - VM_OBJECT_ASSERT_UNLOCKED(object); - locked = UNLOCKED; - } - for (i = 0; i < npages; ++i) { - if (i != reqpage) { - if (locked == UNLOCKED) { - VM_OBJECT_WLOCK(object); - locked = INTERNALLY_LOCKED; - } - vm_page_lock(ma[i]); - vm_page_free(ma[i]); - vm_page_unlock(ma[i]); - } - } - if (locked == INTERNALLY_LOCKED) - VM_OBJECT_WUNLOCK(object); -} - -/* * initialize a physical buffer */ Index: sys/vm/vm_pager.h =================================================================== --- sys/vm/vm_pager.h (revision 291639) +++ sys/vm/vm_pager.h (working copy) @@ -50,9 +50,9 @@ typedef void pgo_init_t(void); typedef vm_object_t pgo_alloc_t(void *, vm_ooffset_t, vm_prot_t, vm_ooffset_t, struct ucred *); typedef void pgo_dealloc_t(vm_object_t); -typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); +typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int *, int *); typedef void pgo_getpages_iodone_t(void *, vm_page_t *, int, int); -typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int *, int *, pgo_getpages_iodone_t, void *); typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); @@ -106,14 +106,12 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v vm_ooffset_t, struct ucred *); void vm_pager_bufferinit(void); void vm_pager_deallocate(vm_object_t); -int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); -int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, int, +int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int *, int *); +int vm_pager_get_pages_async(vm_object_t, vm_page_t 
*, int, int *, int *, pgo_getpages_iodone_t, void *); static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); void vm_pager_init(void); vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); -void vm_pager_free_nonreq(vm_object_t object, vm_page_t ma[], int reqpage, - int npages, boolean_t object_locked); static __inline void vm_pager_put_pages( Index: sys/vm/vnode_pager.c =================================================================== --- sys/vm/vnode_pager.c (revision 291639) +++ sys/vm/vnode_pager.c (working copy) @@ -84,11 +84,9 @@ static int vnode_pager_addr(struct vnode *vp, vm_o static int vnode_pager_input_smlfs(vm_object_t object, vm_page_t m); static int vnode_pager_input_old(vm_object_t object, vm_page_t m); static void vnode_pager_dealloc(vm_object_t); -static int vnode_pager_local_getpages0(struct vnode *, vm_page_t *, int, int, - vop_getpages_iodone_t, void *); -static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); -static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, - vop_getpages_iodone_t, void *); +static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int *, int *); +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int *, + int *, vop_getpages_iodone_t, void *); static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, int, int *); static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *); static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, @@ -673,15 +671,15 @@ vnode_pager_input_old(vm_object_t object, vm_page_ * backing vp's VOP_GETPAGES. 
*/ static int -vnode_pager_getpages(vm_object_t object, vm_page_t *m, int count, int reqpage) +vnode_pager_getpages(vm_object_t object, vm_page_t *m, int count, int *rbehind, + int *rahead) { + struct vnode *vp; int rtval; - struct vnode *vp; - int bytes = count * PAGE_SIZE; vp = object->handle; VM_OBJECT_WUNLOCK(object); - rtval = VOP_GETPAGES(vp, m, bytes, reqpage); + rtval = VOP_GETPAGES(vp, m, count, rbehind, rahead); KASSERT(rtval != EOPNOTSUPP, ("vnode_pager: FS getpages not implemented\n")); VM_OBJECT_WLOCK(object); @@ -690,7 +688,7 @@ static int static int vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, - int reqpage, vop_getpages_iodone_t iodone, void *arg) + int *rbehind, int *rahead, vop_getpages_iodone_t iodone, void *arg) { struct vnode *vp; int rtval; @@ -697,8 +695,7 @@ vnode_pager_getpages_async(vm_object_t object, vm_ vp = object->handle; VM_OBJECT_WUNLOCK(object); - rtval = VOP_GETPAGES_ASYNC(vp, m, count * PAGE_SIZE, reqpage, - iodone, arg); + rtval = VOP_GETPAGES_ASYNC(vp, m, count, rbehind, rahead, iodone, arg); KASSERT(rtval != EOPNOTSUPP, ("vnode_pager: FS getpages_async not implemented\n")); VM_OBJECT_WLOCK(object); @@ -714,8 +711,8 @@ int vnode_pager_local_getpages(struct vop_getpages_args *ap) { - return (vnode_pager_local_getpages0(ap->a_vp, ap->a_m, ap->a_count, - ap->a_reqpage, NULL, NULL)); + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_rbehind, ap->a_rahead, NULL, NULL)); } int @@ -722,68 +719,54 @@ int vnode_pager_local_getpages_async(struct vop_getpages_async_args *ap) { - return (vnode_pager_local_getpages0(ap->a_vp, ap->a_m, ap->a_count, - ap->a_reqpage, ap->a_iodone, ap->a_arg)); + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_rbehind, ap->a_rahead, ap->a_iodone, ap->a_arg)); } -static int -vnode_pager_local_getpages0(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage, vop_getpages_iodone_t iodone, void *arg) -{ - vm_page_t mreq; - - 
mreq = m[reqpage]; - - /* - * Since the caller has busied the requested page, that page's valid - * field will not be changed by other threads. - */ - vm_page_assert_xbusied(mreq); - - /* - * The requested page has valid blocks. Invalid part can only - * exist at the end of file, and the page is made fully valid - * by zeroing in vm_pager_get_pages(). Free non-requested - * pages, since no i/o is done to read its content. - */ - if (mreq->valid != 0) { - vm_pager_free_nonreq(mreq->object, m, reqpage, - round_page(bytecount) / PAGE_SIZE, FALSE); - if (iodone != NULL) - iodone(arg, m, reqpage, 0); - return (VM_PAGER_OK); - } - - return (vnode_pager_generic_getpages(vp, m, bytecount, reqpage, - iodone, arg)); -} - /* * This is now called from local media FS's to operate against their * own vnodes if they fail to implement VOP_GETPAGES. */ int -vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage, vop_getpages_iodone_t iodone, void *arg) +vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int count, + int *a_rbehind, int *a_rahead, vop_getpages_iodone_t iodone, void *arg) { vm_object_t object; struct bufobj *bo; struct buf *bp; - daddr_t firstaddr, reqblock; - off_t foff, pib; - int pbefore, pafter, i, size, bsize, first, last, *freecnt; - int count, error, before, after, secmask; + off_t foff; + int bsize, pagesperblock, *freecnt; + int error, before, after, rbehind, rahead, poff, i; + int bytecount, secmask; KASSERT(vp->v_type != VCHR && vp->v_type != VBLK, - ("vnode_pager_generic_getpages does not support devices")); + ("%s does not support devices", __func__)); + if (vp->v_iflag & VI_DOOMED) return (VM_PAGER_BAD); object = vp->v_object; - count = bytecount / PAGE_SIZE; + foff = IDX_TO_OFF(m[0]->pindex); bsize = vp->v_mount->mnt_stat.f_iosize; + pagesperblock = bsize / PAGE_SIZE; + KASSERT(foff < object->un_pager.vnp.vnp_size, + ("%s: page %p offset beyond vp %p size", __func__, m[0], vp)); + KASSERT(count <= 
sizeof(bp->b_pages), + ("%s: requested %d pages", __func__, count)); + /* + * The last page has valid blocks. Invalid part can only + * exist at the end of file, and the page is made fully valid + * by zeroing in vm_pager_get_pages(). + */ + if (m[count - 1]->valid != 0 && --count == 0) { + if (iodone != NULL) + iodone(arg, m, 1, 0); + return (VM_PAGER_OK); + } + + /* * Synchronous and asynchronous paging operations use different * free pbuf counters. This is done to avoid asynchronous requests * to consume all pbufs. @@ -800,130 +783,182 @@ int * If the file system doesn't support VOP_BMAP, use old way of * getting pages via VOP_READ. */ - error = VOP_BMAP(vp, IDX_TO_OFF(m[reqpage]->pindex) / bsize, &bo, - &reqblock, &after, &before); + error = VOP_BMAP(vp, foff / bsize, &bo, &bp->b_blkno, &after, &before); if (error == EOPNOTSUPP) { relpbuf(bp, freecnt); VM_OBJECT_WLOCK(object); - for (i = 0; i < count; i++) - if (i != reqpage) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); - } - PCPU_INC(cnt.v_vnodein); - PCPU_INC(cnt.v_vnodepgsin); - error = vnode_pager_input_old(object, m[reqpage]); + for (i = 0; i < count; i++) { + PCPU_INC(cnt.v_vnodein); + PCPU_INC(cnt.v_vnodepgsin); + error = vnode_pager_input_old(object, m[i]); + if (error) + break; + } VM_OBJECT_WUNLOCK(object); return (error); } else if (error != 0) { relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); return (VM_PAGER_ERROR); - - /* - * If the blocksize is smaller than a page size, then use - * special small filesystem code. - */ - } else if ((PAGE_SIZE / bsize) > 1) { - relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); - PCPU_INC(cnt.v_vnodein); - PCPU_INC(cnt.v_vnodepgsin); - return (vnode_pager_input_smlfs(object, m[reqpage])); } /* - * Since the caller has busied the requested page, that page's valid - * field will not be changed by other threads. 
+ * If the file system supports BMAP, but blocksize is smaller + * than a page size, then use special small filesystem code. */ - vm_page_assert_xbusied(m[reqpage]); + if (pagesperblock == 0) { + for (i = 0; i < count; i++) { + PCPU_INC(cnt.v_vnodein); + PCPU_INC(cnt.v_vnodepgsin); + error = vnode_pager_input_smlfs(object, m[i]); + if (error) + break; + } + return (error); + } /* - * If we have a completely valid page available to us, we can - * clean up and return. Otherwise we have to re-read the - * media. + * A sparse file can be encountered only for a single page request, + * which may not be preceeded by call to vm_pager_haspage(). */ - if (m[reqpage]->valid == VM_PAGE_BITS_ALL) { + if (bp->b_blkno == -1) { + KASSERT(count == 1, + ("%s: array[%d] request to a sparse file %p", __func__, + count, vp)); relpbuf(bp, freecnt); - vm_pager_free_nonreq(object, m, reqpage, count, FALSE); - return (VM_PAGER_OK); - } else if (reqblock == -1) { - relpbuf(bp, freecnt); - pmap_zero_page(m[reqpage]); - KASSERT(m[reqpage]->dirty == 0, - ("vnode_pager_generic_getpages: page %p is dirty", m)); + pmap_zero_page(m[0]); + KASSERT(m[0]->dirty == 0, ("%s: page %p is dirty", + __func__, m[0])); VM_OBJECT_WLOCK(object); - m[reqpage]->valid = VM_PAGE_BITS_ALL; - vm_pager_free_nonreq(object, m, reqpage, count, TRUE); + m[0]->valid = VM_PAGE_BITS_ALL; VM_OBJECT_WUNLOCK(object); return (VM_PAGER_OK); - } else if (m[reqpage]->valid != 0) { - VM_OBJECT_WLOCK(object); - m[reqpage]->valid = 0; - VM_OBJECT_WUNLOCK(object); } - pib = IDX_TO_OFF(m[reqpage]->pindex) % bsize; - pbefore = ((daddr_t)before * bsize + pib) / PAGE_SIZE; - pafter = ((daddr_t)(after + 1) * bsize - pib) / PAGE_SIZE - 1; - first = reqpage < pbefore ? 0 : reqpage - pbefore; - last = reqpage + pafter >= count ? count - 1 : reqpage + pafter; - if (first > 0 || last + 1 < count) { + bp->b_blkno += (foff % bsize) / DEV_BSIZE; + + /* Recalculate blocks available after/before to pages. 
*/ + poff = (foff % bsize) / PAGE_SIZE; + before *= pagesperblock; + before += poff; + after *= pagesperblock; + after += pagesperblock - (poff + 1); + if (m[0]->pindex + after >= object->size) + after = object->size - 1 - m[0]->pindex; + KASSERT(count <= after + 1, ("%s: %d pages asked, can do only %d", + __func__, count, after + 1)); + after -= count - 1; + + /* Trim requested rbehind/rahead to possible values. */ + rbehind = a_rbehind ? *a_rbehind : 0; + rahead = a_rahead ? *a_rahead : 0; + rbehind = min(rbehind, before); + rbehind = min(rbehind, m[0]->pindex); + rahead = min(rahead, after); + rahead = min(rahead, object->size - m[count - 1]->pindex); + KASSERT(rbehind + rahead + count <= sizeof(bp->b_pages), + ("%s: behind %d ahead %d count %d", __func__, + rbehind, rahead, count)); + + /* + * Fill in the bp->b_pages[] array with requested and optional + * read behind or read ahead pages. Read behind pages are looked + * up in a backward direction, down to a first cached page. Same + * for read ahead pages, but there is no need to shift the array + * in case of encountering a cached page. + */ + i = bp->b_npages = 0; + if (rbehind) { + vm_pindex_t startpindex, tpindex; + vm_page_t p; + VM_OBJECT_WLOCK(object); - for (i = 0; i < first; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); + startpindex = m[0]->pindex - rbehind; + if ((p = TAILQ_PREV(m[0], pglist, listq)) != NULL && + p->pindex >= startpindex) + startpindex = p->pindex + 1; + + /* tpindex is unsigned; beware of numeric underflow. */ + for (tpindex = m[0]->pindex - 1; + tpindex >= startpindex && tpindex < m[0]->pindex; + tpindex--, i++) { + p = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL | + VM_ALLOC_IFNOTCACHED); + if (p == NULL) { + /* Shift the array. 
*/ + for (int j = 0; j < i; j++) + bp->b_pages[j] = bp->b_pages[j + + tpindex + 1 - startpindex]; + break; + } + bp->b_pages[tpindex - startpindex] = p; } - for (i = last + 1; i < count; i++) { - vm_page_lock(m[i]); - vm_page_free(m[i]); - vm_page_unlock(m[i]); + + bp->b_pgbefore = i; + bp->b_npages += i; + bp->b_blkno -= IDX_TO_OFF(i) / DEV_BSIZE; + } else + bp->b_pgbefore = 0; + + /* Requested pages. */ + for (int j = 0; j < count; j++, i++) + bp->b_pages[i] = m[j]; + bp->b_npages += count; + + if (rahead) { + vm_pindex_t endpindex, tpindex; + vm_page_t p; + + if (!VM_OBJECT_WOWNED(object)) + VM_OBJECT_WLOCK(object); + endpindex = m[count - 1]->pindex + rahead + 1; + if ((p = TAILQ_NEXT(m[count - 1], listq)) != NULL && + p->pindex < endpindex) + endpindex = p->pindex; + if (endpindex > object->size) + endpindex = object->size; + + for (tpindex = m[count - 1]->pindex + 1; + tpindex < endpindex; i++, tpindex++) { + p = vm_page_alloc(object, tpindex, VM_ALLOC_NORMAL | + VM_ALLOC_IFNOTCACHED); + if (p == NULL) + break; + bp->b_pages[i] = p; } - VM_OBJECT_WUNLOCK(object); - } - /* - * here on direct device I/O - */ - firstaddr = reqblock; - firstaddr += pib / DEV_BSIZE; - firstaddr -= IDX_TO_OFF(reqpage - first) / DEV_BSIZE; + bp->b_pgafter = i - bp->b_npages; + bp->b_npages = i; + } else + bp->b_pgafter = 0; - /* - * The first and last page have been calculated now, move - * input pages to be zero based, and adjust the count. - */ - m += first; - reqpage -= first; - count = last - first + 1; + if (VM_OBJECT_WOWNED(object)) + VM_OBJECT_WUNLOCK(object); - /* - * calculate the file virtual address for the transfer - */ - foff = IDX_TO_OFF(m[0]->pindex); + /* Report back actual behind/ahead read. 
*/ + if (a_rbehind) + *a_rbehind = bp->b_pgbefore; + if (a_rahead) + *a_rahead = bp->b_pgafter; - /* - * calculate the size of the transfer - */ - size = count * PAGE_SIZE; - KASSERT(count > 0, ("zero count")); - if ((foff + size) > object->un_pager.vnp.vnp_size) - size = object->un_pager.vnp.vnp_size - foff; - KASSERT(size > 0, ("zero size")); + KASSERT(bp->b_npages <= sizeof(bp->b_pages), + ("%s: buf %p overflowed", __func__, bp)); /* - * round up physical size for real devices. + * Recalculate first offset and bytecount with regards to read behind. + * Truncate bytecount to vnode real size and round up physical size + * for real devices. */ + foff = IDX_TO_OFF(bp->b_pages[0]->pindex); + bytecount = bp->b_npages << PAGE_SHIFT; + if ((foff + bytecount) > object->un_pager.vnp.vnp_size) + bytecount = object->un_pager.vnp.vnp_size - foff; secmask = bo->bo_bsize - 1; KASSERT(secmask < PAGE_SIZE && secmask > 0, - ("vnode_pager_generic_getpages: sector size %d too large", - secmask + 1)); - size = (size + secmask) & ~secmask; + ("%s: sector size %d too large", __func__, secmask + 1)); + bytecount = (bytecount + secmask) & ~secmask; /* - * and map the pages to be read into the kva, if the filesystem + * And map the pages to be read into the kva, if the filesystem * requires mapped buffers. */ if ((vp->v_mount->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && @@ -932,41 +967,32 @@ int bp->b_offset = 0; } else { bp->b_data = bp->b_kvabase; - pmap_qenter((vm_offset_t)bp->b_data, m, count); + pmap_qenter((vm_offset_t)bp->b_data, bp->b_pages, bp->b_npages); } - /* build a minimal buffer header */ + /* Build a minimal buffer header. 
*/ bp->b_iocmd = BIO_READ; KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); bp->b_rcred = crhold(curthread->td_ucred); bp->b_wcred = crhold(curthread->td_ucred); - bp->b_blkno = firstaddr; pbgetbo(bo, bp); bp->b_vp = vp; - bp->b_bcount = size; - bp->b_bufsize = size; - bp->b_runningbufspace = bp->b_bufsize; - for (i = 0; i < count; i++) - bp->b_pages[i] = m[i]; - bp->b_npages = count; - bp->b_pager.pg_reqpage = reqpage; + bp->b_bcount = bp->b_bufsize = bp->b_runningbufspace = bytecount; + bp->b_iooffset = dbtob(bp->b_blkno); + atomic_add_long(&runningbufspace, bp->b_runningbufspace); - PCPU_INC(cnt.v_vnodein); - PCPU_ADD(cnt.v_vnodepgsin, count); + PCPU_ADD(cnt.v_vnodepgsin, bp->b_npages); - /* do the input */ - bp->b_iooffset = dbtob(bp->b_blkno); - if (iodone != NULL) { /* async */ - bp->b_pager.pg_iodone = iodone; + bp->b_pgiodone = iodone; bp->b_caller1 = arg; bp->b_iodone = vnode_pager_generic_getpages_done_async; bp->b_flags |= B_ASYNC; BUF_KERNPROC(bp); bstrategy(bp); - /* Good bye! */ + return (VM_PAGER_OK); } else { bp->b_iodone = bdone; bstrategy(bp); @@ -977,9 +1003,8 @@ int bp->b_vp = NULL; pbrelbo(bp); relpbuf(bp, &vnode_pbuf_freecnt); + return (error != 0 ? VM_PAGER_ERROR : VM_PAGER_OK); } - - return (error != 0 ? 
VM_PAGER_ERROR : VM_PAGER_OK); } static void @@ -988,8 +1013,7 @@ vnode_pager_generic_getpages_done_async(struct buf int error; error = vnode_pager_generic_getpages_done(bp); - bp->b_pager.pg_iodone(bp->b_caller1, bp->b_pages, - bp->b_pager.pg_reqpage, error); + bp->b_pgiodone(bp->b_caller1, bp->b_pages, bp->b_npages, error); for (int i = 0; i < bp->b_npages; i++) bp->b_pages[i] = NULL; bp->b_vp = NULL; @@ -1052,8 +1076,8 @@ vnode_pager_generic_getpages_done(struct buf *bp) object->un_pager.vnp.vnp_size - tfoff)) == 0, ("%s: page %p is dirty", __func__, mt)); } - - if (i != bp->b_pager.pg_reqpage) + + if (i < bp->b_pgbefore || i >= bp->b_npages - bp->b_pgafter) vm_page_readahead_finish(mt); } VM_OBJECT_WUNLOCK(object); Index: sys/vm/vnode_pager.h =================================================================== --- sys/vm/vnode_pager.h (revision 291639) +++ sys/vm/vnode_pager.h (working copy) @@ -41,7 +41,8 @@ #ifdef _KERNEL int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, - int count, int reqpage, vop_getpages_iodone_t iodone, void *arg); + int count, int *rbehind, int *rahead, vop_getpages_iodone_t iodone, + void *arg); int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, int count, boolean_t sync, int *rtvals); --OwLcNYc0lM97+oe1--
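
For anyone following the rbehind/rahead trimming that the patch adds to vnode_pager_generic_getpages(), the arithmetic can be tried outside the kernel. What follows is only a rough userland sketch under stated assumptions, not the patch's code: struct ra_window, lmin() and trim_readaround() are hypothetical names made up for illustration, the block size is assumed to be a whole multiple of the page size, and the page offset inside the block is derived as pindex % pagesperblock rather than from foff and bsize. The patch's extra clamp of rahead against object->size is left out because, in this model, the earlier clamp of "after" already implies it.

```c
#include <assert.h>

/* Hypothetical userland model of the trimming arithmetic; not kernel code. */
struct ra_window {
	long rbehind;	/* pages that may be read before the first requested page */
	long rahead;	/* pages that may be read after the last requested page */
};

static long
lmin(long a, long b)
{

	return (a < b ? a : b);
}

/*
 * pindex:        index of the first requested page
 * count:         number of contiguous requested pages
 * before, after: contiguous blocks before/after the block holding pindex,
 *                as a VOP_BMAP-like lookup would report them
 * pagesperblock: file system block size in pages (assumed >= 1)
 * objsize:       object size in pages
 * want_behind, want_ahead: caller's requested read behind/ahead, in pages
 */
static struct ra_window
trim_readaround(long pindex, long count, long before, long after,
    long pagesperblock, long objsize, long want_behind, long want_ahead)
{
	struct ra_window w;
	long poff;

	/* Page offset of the first requested page inside its block. */
	poff = pindex % pagesperblock;

	/* Convert block counts into page counts around pindex. */
	before = before * pagesperblock + poff;
	after = after * pagesperblock + pagesperblock - (poff + 1);

	/* Never run past the end of the object. */
	if (pindex + after >= objsize)
		after = objsize - 1 - pindex;

	/* The requested pages themselves must fit into the contiguous run. */
	assert(count >= 1 && count <= after + 1);
	after -= count - 1;

	/* Trim the caller's wishes to what is actually possible. */
	w.rbehind = lmin(lmin(want_behind, before), pindex);
	w.rahead = lmin(want_ahead, after);
	return (w);
}
```

With 4 pages per block, a single requested page at index 5, one mapped block before and two after, and an object of 100 pages, a request for 8 pages behind and 8 ahead is trimmed to 5 behind and 8 ahead. Shrinking the object to 8 pages and asking for pages 5 and 6 with no mapped block before leaves 1 page behind and 1 ahead.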