Date:      Fri, 10 Feb 2006 09:03:50 +0100
From:      Wilko Bulte <wb@freebie.xs4all.nl>
To:        Søren Schmidt <sos@deepcore.dk>
Cc:        stable@freebsd.org
Subject:   Re: Showstopper ATA bug in 6.1-PRE?
Message-ID:  <20060210080350.GA5978@freebie.xs4all.nl>
In-Reply-To: <20060209220824.GA1499@freebie.xs4all.nl>
References:  <43EA5C50.5020804@deepcore.dk> <20060208213704.GA703@freebie.xs4all.nl> <43EA6625.2070106@deepcore.dk> <20060208221056.GA1299@freebie.xs4all.nl> <43EB5393.5090502@deepcore.dk> <20060209144250.GB4874@freebie.xs4all.nl> <43EB55A1.9040405@deepcore.dk> <20060209201912.GA680@freebie.xs4all.nl> <43EBA4F7.7040407@deepcore.dk> <20060209220824.GA1499@freebie.xs4all.nl>

On Thu, Feb 09, 2006 at 11:08:24PM +0100, Wilko Bulte wrote..
> On Thu, Feb 09, 2006 at 09:24:23PM +0100, Søren Schmidt wrote..
> > Wilko Bulte wrote:
> > >On Thu, Feb 09, 2006 at 03:45:53PM +0100, Søren Schmidt wrote..
> > >>Wilko Bulte wrote:
> > >>>On Thu, Feb 09, 2006 at 03:37:07PM +0100, Søren Schmidt wrote..
> > >>>>Wilko Bulte wrote:
> > >>>>>On Wed, Feb 08, 2006 at 10:44:05PM +0100, Søren Schmidt wrote..
> > >>>>>>Wilko Bulte wrote:
> > >>>>>>>On Wed, Feb 08, 2006 at 10:02:08PM +0100, Søren Schmidt wrote..
> > >>>>>>>>Wilko Bulte wrote:
> > >>>>>>>>>Hi Soren,
> > >>>>>>>>>
> > >>>>>>>>>I just went to 6.1-PRE on my main machine, coming from 6.0-STABLE
> > >>>>>>>>>of roughly the end of December.
> > >>>>>>>>>
> > >>>>>>>>>And I hit some stuff that really worries me:
> > >>>>>>>>>
> > >>>>>>>>>- the freshly built kernel keels over with (hand transcribed):
> > >>>>>>>>>
> > >>>>>>>>>ata3: reiniting channel SATA connect ... 
> > >>>>>>>>>SATA connected
> > >>>>>>>>>sata_connect_devices 0x1 <ATA_MASTER>
> > >>>>>>>>>
> > >>>>>>>>>ad6: req=0xC35ba0c8 SETFEATURES SETTRANSFERMODE semaphore timeout 
> > >>>>>>>>>!! DANGER Will Robinson !!
> > >>>>>>>>>
> > >>>>>>>>>(... is where I cannot read my own handwriting, it scrolled quite
> > >>>>>>>>>fast on the screen..)
> > >>>>>>>>>
> > >>>>>>>>>Boot device is a SATA RAID1 on a Promise 2300.
> > >>>>>>>>Hmm, that should not happen. Could you try to backstep just ATA to 
> > >>>>>>>>before the MFC, that is 24/1/06 and let me know if that helps 
> > >>>>>>>>please ?
> > >>>>>>>First impression is that the problem is gone.  None of the
> > >>>>>>>previously reported errors are seen.  I am running a level 0 dump
> > >>>>>>>from disk to disk to see if the box remains stable.  Given that
> > >>>>>>>this is my primary machine I sure hope it will be :-)
> > >>>>>>>
> > >>>>>>>>>Another snag is that my ad10 disk on 6.0-STABLE suddenly became
> > >>>>>>>>>ad12 on 6.1-PRE
> > >>>>>>>>Hmm, that is because there are only 2 ports on your Promise, which is
> > >>>>>>>>now correctly identified; before, it was erroneously found as 3 ports.
> > >>>>>>>Ah, OK.  I would suggest a note to the Release Note writers would be
> > >>>>>>>a good thing, devices changing location after an upgrade in the
> > >>>>>>>-stable branch is unnerving ;-)
> > >>>>>>Well, the good thing is that I can reproduce the error here, the bad 
> > >>>>>>thing is that it slipped through testing on -current...
> > >>>>>>Oh, well, I'll look into it ASAP...
> > >>>>>Thank you Soren!
> > >>>>OK, had a few this afternoon, could you try this patch and let me know 
> > >>>>if it helps, at least it makes the problem go away on my testbed..
> > >>>Is this relative to HEAD or RELENG_6?  I cannot / will not go to HEAD
> > >>>with this machine (my main production box.. :-)
> > >>Doesn't matter, ATA is the same on both...
> > >
> > >OK, I was not sure if they were 100% identical.
> > >
> > >The patch at first impression seems to have eliminated the problem.
> > 
> > Good, seems I'm on the right track at least.
> > 
> > >Interestingly enough ad10 remained ad10 with the patch applied?
> > 
> > Yeah, that's intentional, I thought we better not break POLA here..
> 
> I agree :-)
> 
> > >I'll put some load on to see what happens.
> > 
> > Let me know how that turns out, I'll clean things up a bit and get it 
> > committed to -current, then get permission to MFC when we are sure it 
> > fixes the problem...
> 
> I ran a 44GB disk-to-disk dump without incidents (source on the RAID1,
> target on the JBOD).  No problems whatsoever.
> 
> Looks like things behave much better now.  Tonight the machine will
> run a daily full dump to DLT tape, I'll know how that turns out tomorrow.

Backup ran without problems.

-- 
Wilko Bulte				wilko@FreeBSD.org


