Date:      Wed, 27 Dec 2006 21:55:57 -0600
From:      "Rick C. Petty" <rick-freebsd@kiwi-computer.com>
To:        "R. B. Riddick" <arne_woerner@yahoo.com>
Cc:        freebsd-geom@freebsd.org
Subject:   Re: gmirror issues (fdisk?, disklabel?, newfs?)
Message-ID:  <20061228035557.GA97647@keira.kiwi-computer.com>
In-Reply-To: <631303.64130.qm@web30304.mail.mud.yahoo.com>
References:  <20061227232712.GA90336@keira.kiwi-computer.com> <631303.64130.qm@web30304.mail.mud.yahoo.com>

On Wed, Dec 27, 2006 at 03:43:35PM -0800, R. B. Riddick wrote:
> 
> Since I do not want to use my DSL flat-rate too much, I did not test 6.2-RC1...
> But I will try R6.2, as soon as possible...

Understandable.  My testing has used csup/cvsup and daily
buildworlds / installworlds.  Much less bandwidth than ISOs  =)

> > # boot0cfg -B ad1
> > boot0cfg: /dev/ad1: Geom not found
> > boot0cfg: write_mbr: /dev/ad1: No such file or directory
> >
> Hmm... The error message is misleading...

Very.

> But it is OK, that /dev/ad1 cannot be opened for writing as soon as gmirror
> uses it as a consumer, because: gmirror has no chance to notice changes that go
> directly to /dev/ad1, so that /dev/ad0 would stay unchanged (and so possibly
> the mirror is not sync'ed and gmirror does not mention that immediately).

Exactly, and I agree with this behavior.
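
A toy model of that reasoning, in Python and not taken from the GEOM
sources: a mirror that forwards every write to both components stays
consistent, while a write that bypasses it leaves the components diverged
with nothing telling the mirror.  The names ToyMirror and in_sync are made
up for illustration.

```python
class ToyMirror:
    """Toy two-way mirror; an illustration, not how gmirror is implemented."""

    def __init__(self, nblocks):
        # Each component is modelled as a list of block contents.
        self.components = [[b""] * nblocks, [b""] * nblocks]

    def write(self, block, data):
        # A write through the mirror reaches every component.
        for comp in self.components:
            comp[block] = data

    def in_sync(self):
        # Consistent only if all components hold identical blocks.
        return all(c == self.components[0] for c in self.components)


mirror = ToyMirror(nblocks=4)
mirror.write(0, b"boot code")
synced_after_mirror_write = mirror.in_sync()

# Bypass the mirror and scribble on one component directly, as a
# writable /dev/ad1 would permit.
mirror.components[1][0] = b"other boot code"
synced_after_direct_write = mirror.in_sync()

print(synced_after_mirror_write, synced_after_direct_write)
```

Refusing the direct open is the cheap way to rule out the second case; the
alternative would be for the mirror to detect out-of-band writes somehow.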

> > # boot0cfg -B /dev/mirror/gm0
> > boot0cfg: /dev/mirror/gm0: Geom not found
> > boot0cfg: /dev/mirror/gm0: ioctl DIOCSMBR: Operation not permitted
> >
> This is strange, because: gmirror should certainly allow write access to its
> devices (providers)... Sounds really strange...

It is, and it's been a bug since 5.0.

> Luckily someone mentioned that before R6.2... :-)

It has been discussed/ignored for years.  Search the lists for my name as
well as others who've noticed this behavior.  It's not a matter of release
dates or deadlines but a matter of willing manpower.  A good argument-- yet
it seems manpower is quite willing to nerf other perfectly good software
(*cough* vinum *cough*).  In this case, I think it's more a matter of
developers not knowing the depths of GEOM well.  It certainly was a
barrier for me, or I would have submitted patches myself.  I spent enough
of my time trying to patch arla (AFS) to work with the numerous (perhaps
unnecessary??) VFS changes, but those APIs keep shifting and I've given up
until the APIs settle, which is never.  Hopefully other people have more
motivation/time than I.

> Hmm... But why does it work here on my box with R6.1?

No idea.  (Bad) Luck?

> I just used boot0cfg on my /dev/ad0, which has geom_bsd (/dev/ad0s1) and

You shouldn't be able to "fdisk -B" or "boot0cfg -B" directly onto ad0 !!!
Not if a GEOM provider has ad0 as a consumer!  Why is the device entry even
visible?  (Granted, I'd rather no device entries were "hidden".)

> gmirror (e.g.: /dev/ad0s1a and /dev/ad1s1a build one gmirror) on top (I could
> successfully change the ticks and the default choice)...

Ah, you've introduced an extra layer between the device and the mirror.
You're not using gmirror on the whole device, as I am.  Probably bsdlabel
isn't GEOM-ready and thus it allows you to modify ad0 directly.  Yet
another bug.

> Yup - some special cases r not handled as gracefully as they could be
> handled...

One of the understatements of the century.

> I had another one:
> gmirror out of ad1s1a and ad0s1a, where ad0s1a was rebuilding... in order to
> stop rebuild I decided to remove ad1s1a which left a unusable gmirror device
> and a panic (certainly after reboot, too). Unfortunately I do not remember the
> kind of panic... If it was a bad memory access or just an ASSERT... But it
> crashed repeatedly until I disconnected the disk that had that bad
> gmirror-meta-info... :-)

I certainly wouldn't expect any good behavior from doing that, but the fbsd
kernel sure likes to panic a lot when there's another perfectly good option.
IMO, panics should only happen whenever there's no possible way the system
will ever become usable again until a reboot.  For example, if you discover
an ECC memory error in kernel space, or the device housing your root
partition disappears (as in your example).  But certainly not if your swap
partition fails on pagein or you try to reload an already-loaded device.
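
The policy argued for here can be written down as a small decision table.
This is only a sketch of the opinion above, in Python; the Fault names and
should_panic are hypothetical and do not correspond to anything in the
FreeBSD kernel.

```python
from enum import Enum, auto


class Fault(Enum):
    # Hypothetical fault categories, taken from the examples in the text.
    KERNEL_ECC_ERROR = auto()
    ROOT_DEVICE_GONE = auto()
    SWAP_PAGEIN_FAILED = auto()
    DEVICE_ALREADY_LOADED = auto()


# Faults after which the system cannot become usable without a reboot.
UNRECOVERABLE = {Fault.KERNEL_ECC_ERROR, Fault.ROOT_DEVICE_GONE}


def should_panic(fault):
    """Panic only when no recovery short of a reboot is possible."""
    return fault in UNRECOVERABLE


for fault in Fault:
    action = "panic" if should_panic(fault) else "report error, keep running"
    print(f"{fault.name}: {action}")
```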

In my case, the system was working perfectly well and all of a sudden
gmirror decided ad0 had the wrong quantum phase to be operating in December
under a new moon, so it clobbered the metadata and immediately panicked.
Successive reboots left many weird messages on the screen.  I've never seen
so many sequential "c"s on the screen at a time.

> > It would be nice if the tools had a "load but don't taste" command, an
> > "untaste" command, and a "taste" command.  Until then, it all just feels
> > so incomplete, like it was hacked together.
> > 
> *sob* :-)
> I would say this "untaste" command wouldnt be necessary, if the geom classes
> would handle every single special case as gracefully as possible...

Unfortunately with shifting APIs and error-prone programmers writing device
drivers, I think all three of these commands are an absolute necessity.
Otherwise, why would we need kernel debuggers at all?

-- Rick C. Petty


