Date: Wed, 17 Apr 2013 17:35:44 +0200
From: Polytropon <freebsd@edvax.de>
To: nightrecon@hotmail.com
Cc: freebsd-questions@freebsd.org
Subject: Re: pwd.db/spwd.db file corupption when having unsafe system poweroff
Message-ID: <20130417173544.25266cd6.freebsd@edvax.de>
In-Reply-To: <kkkda5$vm9$1@ger.gmane.org>
References: <CAHHq%2BVwcazbVXDDsZqH1AXxVOu0mfGjT_5Tcj3OoHJroe8Kgdg@mail.gmail.com> <kkkda5$vm9$1@ger.gmane.org>
Allow me a few additions:

On Tue, 16 Apr 2013 16:45:59 -0400, Michael Powell wrote:
> Pressing the power button for 4 seconds as described is invoking the ACPI
> layer to stimulate call(s) down to the system BIOS.

No. In most (but of course not all) default settings the "long press"
will forcibly (and with _no_ message to the OS) turn off the system's
power. The "short press" will emit the ACPI signal to the OS to deal
with the power-off sequence itself.

Still it's possible to have a different programming for the button.
For example, it seems to be common to have this button perform an
"ACPI sleep", "ACPI hibernate" or "ACPI powersave" mode on "short
press", and (as you mentioned) the "ACPI power down" on long press.
But as I said: _What_ the button actually does is defined in the
CMOS setup.

Have a look at this page to find out more about the various possible
signals (power states):

	http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Power_states

> Whatever is set in the
> BIOS wrt to power control and various power-savings modes are passed through
> the ACPI layer. The problem with this is the acpi module in FreeBSD may, or
> may not, be a perfect implementation for every possible piece of hardware in
> existance.

This applies especially to laptops, where closing the lid can also
trigger a specific signal, and opening the device again sends another
signal. Vendors don't agree on how to "properly" do this, so there
are many different ACPI implementations.

	% ls /boot/kernel/acpi*
	/boot/kernel/acpi.ko*           /boot/kernel/acpi_ibm.ko*
	/boot/kernel/acpi_aiboost.ko*   /boot/kernel/acpi_panasonic.ko*
	/boot/kernel/acpi_asus.ko*      /boot/kernel/acpi_sony.ko*
	/boot/kernel/acpi_dock.ko*      /boot/kernel/acpi_toshiba.ko*
	/boot/kernel/acpi_fujitsu.ko*   /boot/kernel/acpi_video.ko*
	/boot/kernel/acpi_hp.ko*        /boot/kernel/acpi_wmi.ko*

You can see from this example that FreeBSD only supports a subset
of what can be considered possible.
Of course there are many "fields of compatibility", but it may still
result in specific hardware not working properly -- mostly in the
area of laptops and their accessories (like docking stations).

> The piece of that which really concerns me are individual
> manufactuer BIOS quirks can be just enough 'off' so as to misbehave even when
> the FreeBSD acpi implentation is basically sound.

Even though I did not experience that myself, it can be considered
possible. A sloppy ACPI implementation can be the source of many
kinds of trouble, even involving such "simple" devices as a power
button.

> The jist of this is (IMHO
> here - YMMV) is I consider it a bad procedure to turn off a server as you've
> described.

Definitely. :-)

> Use the shutdown command properly instead. I would never do what
> your coworker did to any of my servers.

A mechanical protection could prevent that.

> Caveat being sometimes you have no
> other choice but to do a hard power-down. A hard power-down is done by using
> the switch on the power supply, and not using the ACPI/BIOS from pressing
> the power switch on the front.

This is also possible. Both this _and_ the default "forced power off"
(the "long press" in many defaults) equal the action of pulling the
power cord.

> When you do have an 'uh-oh' like this, FreeBSD normally boots back into an
> unclean file system with corresponding whinings and complaints about how the
> file system(s) were not properly dismounted.

This is intended behaviour. To prevent further damage and to make
data recovery possible (worst case), the system does not try to
"boot by all means", just to make the (clueless) user happy. :-)

> Normally a background fsck
> ensues after 60 seconds of idle.

This _can_ be dangerous, because at this time, the system has already
been booted into a "somehow working" state. You should ask yourself
the question: Can I invest the time to have _no_ background fsck
(i.e., a foreground fsck which maybe will ask prior to doing anything
"heavy") to make sure my data is consistent, because it is important
data which _needs_ to be okay? In this case, put

	background_fsck="NO"

in /etc/rc.conf -- and wait.

When using UFS, there _may_ be file system damage so severe that
fsck will _not_ correct it automatically (automatic correction often
leads to the loss of important data that could have been saved if the
proper _user decision_ had been made). Such manual correction will
only happen in the "interactive mode" at system startup.

> In your case whatever files were left open
> and not properly closed this background fsck, had it been allowed to run and
> complete, would have cleaned this up.

Maybe, maybe not. It highly depends on what actually happened, and
it's nearly impossible to find that out, especially when there is
no control over what the background fsck does (while the system is
already happily running and humming).

> The problem starts when someone
> presses the power off button again, and again, before this process completes.
> Using the power button ACPI/BIOS only compounds this situation.

Correct. That's why the time to have fsck perform its task in the
foreground should be invested, at least after such an abrupt action.

> I would recommend you do NOT use the power button as you described above.
> Period.

In case of _servers_, this button is commonly considered an "emergency
button" anyway, and therefore hardly used. :-)

> In any event pay particular attention to that very first boot after
> an 'uh-oh' power off event. Look at top and watch for the background fsck to
> kick off and complete, returning the machine to quiescent state BEFORE you do
> ANYTHING else to it. This includes pressing the button on the front.

The "doing anything else" can be the problem with a background fsck.
Let's say the server starts its services which start accessing the
partitions currently checked by fsck. Yes, I know, snapshots and all
this stuff.
Sometimes it works. Sometimes it doesn't.

My additional advice would be: Do not use a background fsck. If you
had a power failure (for whatever reason), take the time to make
sure your system boots into a verified state (NOT: boots into a
questionable state, tries to verify it during normal operations,
and pretends "everything is fine").

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
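
A footnote to the background_fsck advice above: before relying on it,
it may be worth checking which value is actually in effect. What
follows is only a minimal sketch, not an official tool. The function
name effective_background_fsck is made up for illustration, the
fallback of YES assumes the stock default from /etc/defaults/rc.conf,
and it only understands plain one-line assignments (no includes, no
rc.conf.local).

```sh
#!/bin/sh
# Hypothetical helper: print the background_fsck value that an
# rc.conf-style file would yield. The last assignment wins, as it
# does when the file is sourced by sh.
effective_background_fsck() {
    conf=${1:-/etc/rc.conf}
    # Assumed default, taken from the stock /etc/defaults/rc.conf.
    value=YES
    if [ -r "$conf" ]; then
        # Last uncommented assignment, with any trailing comment,
        # quotes and whitespace stripped.
        last=$(grep -E '^[[:space:]]*background_fsck=' "$conf" \
            | tail -n 1 \
            | sed 's/#.*//' \
            | cut -d= -f2 \
            | tr -d "\"' \t")
        if [ -n "$last" ]; then
            value=$last
        fi
    fi
    printf '%s\n' "$value"
}
```

Called as "effective_background_fsck /etc/rc.conf", it prints NO once
the line suggested above is in place. If it prints YES, the check
will still be run in the background after the next unclean boot.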