Date:      Thu, 16 Feb 2012 09:03:16 -0800 (PST)
From:      john fleming <jflemingeds@yahoo.com>
To:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: 6.2-Release ..ish.. CF + ata == freeze?
Message-ID:  <1329411796.23457.YahooMailNeo@web111719.mail.gq1.yahoo.com>
In-Reply-To: <1056033736-1329271037-cardhu_decombobulator_blackberry.rim.net-225645010-@b15.c31.bise6.blackberry>
References:  <1329194588.14324.YahooMailNeo@web111720.mail.gq1.yahoo.com> <20120214051828.GA89777@icarus.home.lan> <1056033736-1329271037-cardhu_decombobulator_blackberry.rim.net-225645010-@b15.c31.bise6.blackberry>

The plot is starting to thicken. I've noticed that all the systems that have
done this (so far) have this flash card in them:

STEC M2+ CF 9.0.2 K1186-2

From talking to Checkpoint, this is a newer flash card they have started
using. I just had a 4th machine do the same thing yesterday: basic install,
about 70% disk space free, very new install (1-2 months old), and the uptime
on the machine in question was only 16 days. After rebooting I ran
dd if=/dev/zero of=~/file bs=1m count=350 a few times and didn't get any
errors.

The latest machine has a 1 gig version of the flash card listed above, so
this ate almost all the free disk space. Checkpoint is asking that we RMA
one of the flash cards so they can play with it.

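The dd runs described above push 350 MiB of zeros through the filesystem but
only report a failure once the whole run ends. A rough Python equivalent,
offered as a sketch and not as a tested diagnostic, writes the same pattern
block by block and calls fsync after each block, so a stalled or failed write
shows up at a known position along with the slowest block time. The function
name write_test, its defaults, and the idea that Python is available on the
box at all are assumptions made for illustration.

```python
import os
import time


def write_test(path, block_size=1024 * 1024, count=350):
    """Write `count` zero-filled blocks to `path`, fsyncing after each one.

    Returns (blocks_written, slowest_seconds, error). `error` is None on
    success, or the OSError raised by the failing write or fsync.
    """
    block = bytes(block_size)  # block_size zero bytes, like /dev/zero
    written = 0
    slowest = 0.0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.monotonic()
            try:
                remaining = memoryview(block)
                while remaining:  # os.write may write fewer bytes than asked
                    n = os.write(fd, remaining)
                    remaining = remaining[n:]
                os.fsync(fd)  # force the block out to the device
            except OSError as exc:
                return written, slowest, exc
            slowest = max(slowest, time.monotonic() - start)
            written += 1
    finally:
        os.close(fd)
    return written, slowest, None
```

Calling write_test with a path on the suspect slice and the default arguments
mirrors bs=1m count=350. A returned error whose errno is 5 would correspond to
the EIO seen in the kernel messages; a clean run only shows that those
particular writes succeeded at that moment.
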
________________________________
From: "jflemingeds@yahoo.com" <jflemingeds@yahoo.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Sent: Tuesday, February 14, 2012 7:57 PM
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

2 of the 3 CF cards are very new, less than 6 months old.

I think around 65-70 percent is in use. This number doesn't change unless
the user dumps data in a home dir, which isn't the case so far.

You are correct that only writes are failing. Msgbuf has more than what I
pasted, but I'm pretty sure it's just more of the same errors. I'll
double-check.

The other slices are very small. One is 35 meg, the other is 100-some-odd
meg. H is 1.2 gig.

I don't know if I'll be able to try the dd test, for a few reasons, but I'll
check it out. Let me ask you this: say zeroing out the drive works without
error. Does that tell me anything?

I also don't have access to SMART tools, as this is basically a closed
system and the vendor would never give us access to a compiler. Granted, I
haven't tried just throwing on gcc from 6.2. I could play with that, or
maybe, since said vendor's dev team is keeping track of this thread, they
could provide said binary :).

I really don't like the idea of replacing hardware as I'm looking at around
200 boxes. I really hope it doesn't come to that.

Thanks for the reply!

Sent via BlackBerry from T-Mobile

-----Original Message-----
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
Date: Mon, 13 Feb 2012 21:18:28
To: john fleming <jflemingeds@yahoo.com>
Cc: freebsd-stable@freebsd.org <freebsd-stable@freebsd.org>
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought I would post over here as I'm not getting a warm fuzzy from
> Checkpoint about being able to find the root cause of an issue. I have a
> large install base of IPSO Checkpoint firewalls, which are based on
> FreeBSD 6.2. I've had 3 firewalls hang basically the same way, with
> something that looks like a filesystem issue or an issue with a CF card.

FreeBSD 6.2 was EOL'd in early-to-mid-2008.  The ATA driver has changed
significantly since then (present-day uses CAM).

> Does anyone happen to know of any bugs (I've been looking around) that
> could cause something like that? Granted, it could be a batch of bad CF
> cards, but it's odd that I'm seeing the same thing on 3 different boxes
> and once rebooted they seem OK.
>
> Also, is it possible to get useful info from the ATA controller when
> things go south like this from the ddb prompt?

Not particularly.  What's shown below indicates that the driver had
issued some form of ATA write command (there are multiple kinds per ATA
specification), and either the underlying media (CF/disk) or controller
stalled/locked up/took too long.  I forget what the timeout value is in
6.2; I can't be bothered to remember such from 6 years ago.  :-)

> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5

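As an aside, the g_vfs_done() lines quoted above can be read mechanically. The
short Python sketch below is an illustration and not part of the original
exchange: it parses those lines, maps the numeric error to its symbolic errno
name, and checks whether the failed writes form one contiguous run. The helper
name summarize and the regular expression are assumptions that match only the
log format quoted here.

```python
import errno
import re

# The five failed writes as quoted in the thread.
LOG = """\
g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5
"""

PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<err>\d+)"
)


def summarize(log):
    """Return count, errno names, contiguity and total bytes of failed I/O."""
    records = [
        (m["dev"], m["op"], int(m["offset"]), int(m["length"]), int(m["err"]))
        for m in PATTERN.finditer(log)
    ]
    # Contiguous means each request starts where the previous one ended.
    contiguous = all(
        prev[2] + prev[3] == cur[2] for prev, cur in zip(records, records[1:])
    )
    names = sorted({errno.errorcode.get(r[4], "unknown") for r in records})
    return {
        "count": len(records),
        "errors": names,
        "contiguous": contiguous,
        "bytes": sum(r[3] for r in records),
    }
```

For the five lines quoted, summarize(LOG) reports five EIO failures, each
128 KiB long, covering one contiguous 640 KiB span of ad0s4h that starts at
byte offset 33849344.
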
error 5 = EIO = Input/output error.  But this isn't too big of a
surprise given the timeouts you see prior.

Are these CF cards brand new -- meaning, are they completely unused
(having never had any writes done to them), or have they been in use a
while?  I'm betting they've been in use a while, and have probably been
doing many writes over the years.

Two things to note here:

1) The errors you've shown are only happening on writes, not reads.  Of
course if you omitted information then this isn't an accurate statement.
2) Timeouts are seen when issuing writes to some LBA regions.

How full is the CF card, disk-space-wise?  Not just ad0s4h, I'm talking
about the entire card.  How much space is roughly available?  They're
very small CF cards (1.8GByte roughly), and the less space available,
the less effectiveness of wear levelling (and in some cases the slower
the writes are).

Reason I ask: given that these are CF cards, this smells of cards which
are simply "worn down".  CF cards have limited numbers of writes, and
the card may be "freaking out" internally when attempting to write to
some LBAs which map to CF sectors that are, in effect, "bad".  The CF
cards' ECC implementation may be buggy, or may simply be "spinning hard"
for too long.  You can read about this sort of behaviour on Wikipedia's
CompactFlash article.

You wouldn't be able to verify this with dd if=/dev/ad0, because those
are read operations.  You could zero the media (dd if=/dev/zero
of=/dev/ad0) as a form of verification if you wanted.

Do you happen to know if these CF cards support SMART?  If so,
installing smartmontools (version 5.42 or newer please) and providing
output from "smartctl -a /dev/ad0" may be helpful to me, but I make no
guarantees anything of use will be shown there.

Overall my advice would be to replace the CF cards, especially if they
have been in use for a long while.  It really doesn't matter to me that
it's happening on 3 machines (honest), especially if these are 6.2
machines with CF cards that have been in use for years.  We're lucky to
get 2 years out of our CF cards on our Juniper M120/320s before they
start spitting I/O errors.  Pick larger CF cards as well; more space =
more room for effective wear levelling.

>
> ad0: 1882MB <STEC M2+ CF 9.0.2 K1186-2> at ata0-master PIO4
> atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
> ata0: <ATA channel 0> on atapci0
> ata1: <ATA channel 1> on atapci0
> atapci1: <Intel 6300ESB SATA150 controller> port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
> ata2: <ATA channel 0> on atapci1
> ata3: <ATA channel 1> on atapci1
>
> ad0s4h is basically a r/w ufs partition on the box where almost anything
> that needs to be written goes.
>
> trace
> Tracing pid 1101 tid 100043 td 0x656d8460
> kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> siointr1(64ba1400) at siointr1+0xf0
> siointr(64ba1400) at siointr+0x38
> intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at intr_execute_handler+0x61
> intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at intr_execute_handlers+0x40
> atpic_handle_intr(4) at atpic_handle_intr+0x96
> Xatpic_intr4() at Xatpic_intr4+0x20
> --- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
> lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
> getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
> flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
> flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
> softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
> ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
> ffs_fsync(f0a4ac74) at ffs_fsync+0x12
> VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
> fsync(656d8460,f0a4acb4) at fsync+0x170
> syscall(805003b,806003b,5fbf003b,8050000,288be450,...) at syscall+0x2ee
> Xint0x80_syscall() at Xint0x80_syscall+0x1f

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


