Date:      Wed, 21 Jun 2006 08:50:58 +0200
From:      "Frank B. Scholl" <frank.b.scholl@web.de>
To:        freebsd-geom@freebsd.org
Subject:   GEOM_MIRROR after crash not identical
Message-ID:  <200606210850.58525.frank.b.scholl@web.de>

hello list,

i just wanted to describe what happened to me last night.

i have an ultra 10 with a highpoint ide controller and one udma100 drive on
each channel. booting is done via compact flash on the onboard controller.
the two udma100 drives form a mirror, which is encrypted with geli. after
creating the device with gmirror and inserting the other disk, everything
ran fine.

then i needed to power down the machine to add more drives. after the machine 
came up, the mirror was degraded. it always failed to insert the second disk 
as a valid provider.

wouldn't be that bad, i thought, just remount the degraded mirror readonly
and back up the data, after that let's see what can be done. well, after
mounting i saw a lot of data missing, to be exact 180gb of 300gb total. so i
thought it might be a filesystem problem and tried to fsck it. that didn't
work either: there are a lot of unreadable blocks on the device, and
geom_geli through geom_mirror didn't stop flooding my logs. the problem: dma
timeouts and interrupt storms en masse - with a controller that had worked
flawlessly for _weeks_ without any reboot. so i forced hw.ata.ata_dma and
hw.ata.atapi_dma to zero, which circumvented the timeout problem. changing
cables, which seems to be common practice in such cases, didn't help either,
btw.
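
(for reference: a minimal sketch of forcing both settings to zero at boot,
assuming they are set as loader tunables in /boot/loader.conf, as ata(4)
describes. the file location is that assumption, not something taken from
the logs.)

```
# /boot/loader.conf (assumed location for these tunables)
# 0 = fall back to PIO instead of DMA for ATA disks
hw.ata.ata_dma="0"
# 0 = fall back to PIO instead of DMA for ATAPI devices
hw.ata.atapi_dma="0"
```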

up to that point i had only worked on the first disk, and i decided to go to
sleep after having lost 180gb of data to a mirror device. next morning, i
woke up and had quite a good idea: let's try the same thing - mounting the
degraded mirror - again, but with the second provider only. so i unplugged
the first disk, booted, and see, it worked.

so now.. how can it be that after a single reboot the two providers are not
exactly the same? the thing is.. the last data i have on the first provider
is from 14 june, while on the second provider i have everything until 20
june. the machine was powered down yesterday and had been running at least
since the start of this month, if not longer. i went through the logfiles
and there was not a single hint of geom_mirror complaining about an
inconsistency.

any ideas? my data is back, so i don't cry anymore. is this a problem due to
my platform choice, namely sparc64?

thanks for an answer, cheers,

frank scholl