Date: Tue, 09 Jul 2013 07:37:48 +0000
From: "Chad J. Milios" <freebsd-list@nuos.org>
To: Devin Teske <dteske@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: login.conf.db, /sbin/init, separate /etc, and configs around "thin provisioning" WAS: Re: nuOS
Message-ID: <51DBBDCC.7060800@nuos.org>
In-Reply-To: <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21>
References: <51D9E499.103@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB6177@ltcfiswmsgmb21> <51DB085A.9040701@nuos.org> <13CA24D6AB415D428143D44749F57D7201FB7302@ltcfiswmsgmb21>
On 07/08/13 22:12, Teske, Devin wrote:
> We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff.
>
> I've been bitten by that.
>
> Getting access to that file that's "beneath" once you've booted the system can be ... less than easy.

yeah i prefer resorting to trickery or "magic" as little as possible, only as a last resort, and i try to clutter up the standard tree of files as little as possible. in this case we only needed the one file, just a symlink actually. the "under" has only the following:

lrwxr-xr-x 1 root wheel 25 Jun 25 17:59 /login.conf.db@ -> ../boot/etc/login.conf.db

and in the "over" /etc we still place an identical symlink so that the real file is in /boot/etc/. cap_mkdb doesn't clobber the symlink, it writes through to /boot/etc/login.conf.db for you. so in the usual case, a user edits login.conf and runs cap_mkdb like they're supposed to and everything is fine. it's only if they roll back or restore a backup to /etc that things can end up out of sync.

i don't want anyone to get confused by me talking about jails in the same email. The above snag we are working around involves /sbin/init ONLY WHEN booting the host FreeBSD. Our jailed customers don't have to worry about this because /etc is already in the right spot by the time jail runs /etc/rc. /sbin/init isn't even involved in a jail, is it? Not even in some "hooked-in" way? At any rate we don't have to do anything special for a separate /etc dataset for jails. We could just forgo the /etc dataset on the host but i am glad that we can manage our bare metal customers using the same methods and tools. Handling this symlink hack is less differentiation than giving up separate /etc on the host, i think.

> I'm interested in your cost/benefit points of having /etc a separate filesystem.
>
> On the face of it, I want to say that "/etc" is (or at least contains) the "core identity" of the machine (and to a lesser extent -- because this is BSD after-all -- /usr/local/etc). In my mind, /etc and /usr/local/etc *are* the machine (metaphorically speaking), so the merits of having it as a separate filesystem are weighed against your desired topology.

i agree. myself i like having such a lightweight "identity" and keeping /, /usr and /usr/local (which are all just sitting on / in my case) mounted read-only. the "prototype" for a host is handled by a completely different department than the people/customers who sysadmin their deployments and instances. Early in the building/installing, before any ports/packages, /usr/local/etc is made a symlink to /etc/local, so the symlink is in the read-only / and every time you write or cd to /usr/local/etc you end up in /etc/local. An /etc dataset ends up under a MB zfs compressed and /var on a fresh instance is basically also nothing. all-in-all a new jail costs you under a MB of zpool. we jail stop/start and zfs send/receive instances in the blink of an eye and it's "almost" as good as having live migration.

We could get the same storage efficiency by simply cloning / and having no sub-datasets. some customers feel like they want to be able to write anywhere and we give them those options, but then they are on their own and we don't manage the software updates for those guys, and some like it that way. we then bill each for all the storage they reference because a year down the road they may be the only one still holding a reference to the outdated prototype they're on even though they overwrote every file twice with make world or freebsd-update.
their memory usage is also way higher than most because when executables are launched on the jails with the read-only nullfs mounted /, those all access the same memory pages, but zfs isn't smart enough yet to let the virtual memory system maintain those pointers through the indirection of zfs clones and snapshots. so zfs separate /etc and /var give us great storage efficiency while nullfs gives us great memory performance and efficiency.

> If you want a bunch of machines to look and/or act differently, then a shared /etc is precisely what you want. However, without allowing minor changes (ala ZFS clone/snapshot or by way of UnionFS), you'll quickly find that the only way to cope is with role-based scripting in /etc/rc.conf (it is after-all a shell script) or complicated abstraction layers (for example, using netgraph eiface devices with the jail-name inside them so that rc.conf can have jail-specific ifconfig_* lines). But I digress.
>
> I think the better solution to your loading of files "beneath" the eventual /etc filesystem is to throw away the ZFS snapshot/clone method and instead move to a UnionFS approach for /etc.
>
> If you use UnionFS for your /etc, then what you do is for each of the machines that you want *that* /etc to appear, you do something like:
>
> (as root) mount_unionfs -o below /etc /other/etc
>
> Now /other/etc (assuming it was empty before) looks exactly like /etc.

In theory, i love the concept of unionfs and it gives far more flexibility than zfs, especially if the two can be combined effectively. For us, its semantics were just never well established enough and there are too many corner cases and combinations of possibilities that, while exciting, were never conceived of and can't be nailed down in a simple VFS or POSIX filesystem mindset for obvious reasons.
When i have the time to really dig in again i'd love to see where unionfs is at today and if i can be using it to do some very cool things again (but now with fewer headaches, legwork and sleepless nights). For the reasons stated though, i have to admit i'm simply just _afraid_ of unionfs. Your suggestion is simple enough though, i'm sure i wouldn't need a month of research and testing. :) It's probably overkill for our needs in this case.

> Pros: With "rm -f <file>; rm -W <file>" (in /other/etc) you can reclaim a file from the underlying /etc. ZFS does not allow you to revert a single file (you can revert the entire volume or filesystem, but not a single file).

I really liked the idea of removing a whiteout and having a lower file appear, but that's just me. :) You're right that ZFS doesn't let you do anything nearly as selective, but it does allow you to cherry-pick files out of .zfs/snapshot. Like you said, that's not rolling a file back, you're just copying an old version to a new version atop the top "layer".

> if anyone with more intimate knowledge of how and exactly when login.conf.db gets accessed has any thoughts... It could be a disaster for an admin to think their /etc is in a certain state and have that one file be out of sync. If better minds could chip in, I'm wondering if we're better off editing /sbin/init to run init_script _before_ loading the daemon class from login.conf.db (or explain why that's a bad idea) or if i should just add some sort of hook to run cap_mkdb right when needed using a DTrace script or auditd?
>
> That's an interesting aspect of the boot process I hadn't noticed before (having not used init_script before). I would think that this should be filed as a PR. Seems to me that the init_script should fire first -- but (and this is a guess) it may need to bootstrap the user that the init_script runs as (presumably needing to load the daemon class for said user).
> While there may be good reason, it certainly violates a principle (that one might be astonished to learn that init_script is not run in a fashion that only the dependencies thereof are required).

I thought so too initially; init_script is documented as being for [init]ialization BEFORE /etc/rc itself. It's obviously run as root and early enough that the machine ought to obey init_script as if it were commandments handed down by God. Why init needs to know anything about the daemon class beforehand is beyond me. Quite literally "beyond me". I don't have a strong enough opinion either way though to be filing a PR yet. I thought it was worth bringing up so brighter minds might take a look if they find it peculiar. I have it back-burnered on one of a full screen-border of post-it notes and i'll learn more about what's going on in /sbin/init soon if no one else steps forward.

>> Does anyone think this issue is moot? (Can't we just document this particular specific "gotcha" instance? I don't think so, I abhor any "gotcha" that deviates from behavior people expect from "upstream" fbsd.) Does anyone agree it's important we come as close to perfect a solution as we can?
> Thanks for bringing up the issue with init_script. We should look to fix it to make its use capable of handling the use-case you identified (using it to bootstrap a separate /etc).

Good, see, this is why FreeBSD is awesome. People care about parameters and configurations and having a stable system even in the face of overwhelming combinatorics. Not to speak ill of Linux or sling mud with vague accusations and no specific instances (but i'm going to haha) but you have no idea how many times i've been using Linux in a project, usually to do something a little cutting edge or off-the-reservation, and i say "Hey i think i should be able to combine X with Y, can someone help me?" and all too often i get the attitude like "man, we're all doing Z now, haven't ya heard?
Z is here to end all our sorrows" and i'll be like "but Z doesn't do X+Y" and for that i'm shamed and ridiculed like "dude, if Z doesn't do everything you want and you don't worship Z with us, you're stupid" hahaha does anyone else feel similarly about any experience they've had on the LKML? I can name almost 10 values for X, Y and flavor-of-the-week Z.

>> Is a separate /etc even worth it to people?
> Depends. Everybody? certainly not. Some? Sure. See above example-cases.
>
>> Should i scrap that feature because of this issue?
> It sounds like you contorted yourself working around a deficiency in it (a POLA violation in that it has unforeseen dependencies). At the very least, I would think that init could have a fall-back if the file can't be loaded.
>
> Are you putting anything beside the default daemon-class definition in your login.conf "beneath" your true /etc?

Init does have a compiled in default class == the initial system default "default" class. login.conf remains the source of truth on the true "upper" /etc but things read login.conf.db to get their answers. At the very outset of a system build, i move the plain old default login.conf.db to /boot/etc and it contains all the classes. 99.9% of our users keep the default login.conf and maybe actually 100% are using it just that way on any given day. I'm just that anal-retentive that I think if i ignore this someone will suffer for their astonishment (or unknowing lack thereof) when their db ends up out of sync because they didn't know we introduced another event where cap_mkdb should get run (post rollback/restore of /etc). I would simply run cap_mkdb every time we mount /etc but i don't think that's good enough because i don't know when and what else accesses it. I'm assuming more than just /sbin/init at boot, right? Am I overthinking this because nothing else reads login.conf.db ever? /usr/bin/login accesses it every user login, no? Do i misunderstand totally?
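
The rollback case above can be sketched as a small check. Comparing modification times would not help here, because a rolled-back login.conf is older than the db that was built from the later, less restrictive file. The sketch below instead records a content fingerprint of login.conf whenever the db is rebuilt and compares it later. It is only an illustration under assumptions: the stamp file, its location, and the function names are hypothetical and are not part of nuOS or FreeBSD.

```shell
#!/bin/sh
# Hypothetical helpers: the stamp-file scheme and every name below are
# illustrative assumptions, not an existing nuOS or FreeBSD mechanism.

# Print a content fingerprint (CRC and byte count) for the file in $1.
conf_fingerprint() {
    cksum < "$1"
}

# Record the fingerprint of the login.conf ($1) that the db was built
# from into the stamp file ($2). Meant to run right after cap_mkdb.
record_stamp() {
    conf_fingerprint "$1" > "$2"
}

# Return 0 if the stamp file ($2) exists and matches the current
# login.conf ($1); return non-zero otherwise.
db_in_sync() {
    [ -f "$2" ] && [ "$(conf_fingerprint "$1")" = "$(cat "$2")" ]
}
```

A hook run after /etc is mounted (or after a rollback/restore) could then do something like `db_in_sync /etc/login.conf /boot/etc/login.conf.stamp || { cap_mkdb /etc/login.conf && record_stamp /etc/login.conf /boot/etc/login.conf.stamp; }`, where the stamp path is again an assumption. This does not answer the question of what else reads login.conf.db and when; it only narrows the window in which the two files can disagree.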

>> I think we can tighten this up so there's no twisted ankles and no one falling into this rare but certainly potential manhole. (the manhole i'm talking about is login.conf and login.conf.db being out of sync because the latter is a symlink to /boot/etc and someone might roll back to a more restrictive login.conf and think they're covered without running cap_mkdb again, but their login.conf.db is actually out of sync and less restrictive in a way that burns them)
>>
> Sorry you had to work around that -- you should have filed a PR.

I will file a PR if i look at the problem more in depth, if someone doesn't chime in and save me with already-expert knowledge that i don't have to dig for. (one can hope, right?)

>> Devin, thank you IMMENSELY for bsdinstall and especially bsdconfig. I use them both at work and they make life so much better. And thank you for the simplification using kenv. I was unaware of it
> On a side-note, I didn't write bsdinstall -- I'm going to maintain it, but I wrote bsdconfig ^_^ (smiles)
>
> Thank you very much for your appreciation. Certainly a labor of love and I'm happy that others have kicked the wheels at least.

Yeah i've more than kicked the tires. It's excellent work, keep it up.