From: john fleming
To: "freebsd-stable@freebsd.org"
Date: Thu, 16 Feb 2012 09:03:16 -0800 (PST)
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?
Message-ID: <1329411796.23457.YahooMailNeo@web111719.mail.gq1.yahoo.com>
In-Reply-To: <1056033736-1329271037-cardhu_decombobulator_blackberry.rim.net-225645010-@b15.c31.bise6.blackberry>
List-Id: Production branch of FreeBSD source code

The plot is starting to thicken. I've noticed that all the systems that
have done this (so far) have this flash card in them:

STEC M2+ CF 9.0.2 K1186-2

From talking to Checkpoint, this is a newer flash card they have started
using. I just had a 4th machine do the same thing yesterday. Basic
install, about 70% disk space free, very new install (1-2 months old),
and the uptime on the machine in question was only 16 days.
After rebooting I ran dd if=/dev/zero of=~/file bs=1m count=350 a few
times and didn't get any errors.

The latest machine is a 1 gig version of the flash listed above, so this
ate almost all the free disk space. Checkpoint is asking that we RMA one
of the flash cards so they can play with it.

________________________________
From: "jflemingeds@yahoo.com"
To: Jeremy Chadwick
Cc: "freebsd-stable@freebsd.org"
Sent: Tuesday, February 14, 2012 7:57 PM
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

2 of the 3 CF cards are very new, less than 6 months old.

I think around 65-70 percent is in use. This number doesn't change
unless the user dumps data in a home dir, which isn't the case so far.

You are correct that only writes are failing. Msgbuf has more than what
I pasted but I'm pretty sure it's just more of the same errors. I'll
double-check.

The other slices are very small. One is 35 meg, the other is 100-some-odd
meg. H is 1.2 gig.

I don't know if I'll be able to try the dd test, for a few reasons, but
I'll check it out. Let me ask you this: say zeroing out the drive works
without error. Does that tell me anything?

I also don't have access to SMART tools, as this is basically a closed
system and the vendor would never give us access to a compiler. Granted,
I haven't tried just throwing on gcc from 6.2. I could play with that,
or maybe, since said vendor's dev team is keeping track of this thread,
they could provide said binary :).

I really don't like the idea of replacing hardware as I'm looking at
around 200 boxes. I really hope it doesn't come to that.

Thanks for the reply!

Sent via BlackBerry from T-Mobile

-----Original Message-----
From: Jeremy Chadwick
Date: Mon, 13 Feb 2012 21:18:28
To: john fleming
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2-Release ..ish.. CF + ata == freeze?

On Mon, Feb 13, 2012 at 08:43:08PM -0800, john fleming wrote:
> Just thought I would post over here as I'm not getting a warm fuzzy
> from Checkpoint about being able to find the root cause of an issue.
> I have a large install base of IPSO Checkpoint firewalls, which are
> based on FreeBSD 6.2. I've had 3 firewalls hang basically the same
> way, with something that looks like a filesystem issue or an issue
> with a CF card.

FreeBSD 6.2 was EOL'd in early-to-mid-2008. The ATA driver has changed
significantly since then (present-day uses CAM).

> Does anyone happen to know of any bugs (I've been looking around) that
> could cause something like that? Granted, it could be a batch of bad
> CF cards, but it's odd that I'm seeing the same thing on 3 different
> boxes and once rebooted they seem OK.
>
> Also, is it possible to get useful info from the ATA controller when
> things go south like this from the ddb prompt?

Not particularly. What's shown below indicates that the driver had
issued some form of ATA write command (there are multiple kinds per ATA
specification), and either the underlying media (CF/disk) or controller
stalled/locked up/took too long. I forget what the timeout value is in
6.2; I can't be bothered to remember such from 6 years ago. :-)

> This is what shows in show msgbuf
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> ad0: timeout waiting to issue command
> ad0: error issuing WRITE command
> g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
> g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5

error 5 = EIO = Input/output error. But this isn't too big of a
surprise given the timeouts you see prior.

Are these CF cards brand new -- meaning, are they completely unused
(having never had any writes done to them), or have they been in use a
while? I'm betting they've been in use a while, and have probably been
doing many writes over the years.

Two things to note here:

1) The errors you've shown are only happening on writes, not reads. Of
course if you omitted information then this isn't an accurate statement.
2) Timeouts are seen when issuing writes to some LBA regions.

How full is the CF card, disk-space-wise? Not just ad0s4h, I'm talking
about the entire card. How much space is roughly available? They're
very small CF cards (1.8GByte roughly), and the less space available,
the less effective wear levelling is (and in some cases the slower
the writes are).

Reason I ask: given that these are CF cards, this smells of cards which
are simply "worn down". CF cards have limited numbers of writes, and
the card may be "freaking out" internally when attempting to write to
some LBAs which map to CF sectors that are, in effect, "bad". The CF
cards' ECC implementation may be buggy, or may simply be "spinning hard"
for too long. You can read about this sort of behaviour in Wikipedia's
CompactFlash article.

You wouldn't be able to verify this with dd if=/dev/ad0, because those
are read operations. You could zero the media (dd if=/dev/zero
of=/dev/ad0) as a form of verification if you wanted.

Do you happen to know if these CF cards support SMART? If so,
installing smartmontools (version 5.42 or newer please) and providing
output from "smartctl -a /dev/ad0"
may be helpful to me, but I make no
guarantees anything of use will be shown there.

Overall my advice would be to replace the CF cards, especially if they
have been in use for a long while. It really doesn't matter to me that
it's happening on 3 machines (honest), especially if these are 6.2
machines with CF cards that have been in use for years. We're lucky to
get 2 years out of our CF cards on our Juniper M120/320s before they
start spitting I/O errors. Pick larger CF cards as well; more space =
more room for effective wear levelling.

> ad0: 1882MB at ata0-master PIO4
> atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x5070-0x507f mem 0x80301000-0x803013ff at device 31.1 on pci0
> ata0: on atapci0
> ata1: on atapci0
> atapci1: port 0x5088-0x508f,0x50a4-0x50a7,0x5080-0x5087,0x50a0-0x50a3,0x5060-0x506f irq 15 at device 31.2 on pci0
> ata2: on atapci1
> ata3: on atapci1
>
> ad0s4h is basically a r/w ufs partition on the box where almost
> anything that needs to be written goes.
>
> trace
> Tracing pid 1101 tid 100043 td 0x656d8460
> kdb_enter(608cc388,6246,656d8460,64ba1400,6095d580,...) at kdb_enter+0x2b
> siointr1(64ba1400) at siointr1+0xf0
> siointr(64ba1400) at siointr+0x38
> intr_execute_handler(6095d580,f0a4ab04,6,6095d580,f0a4aafc,...) at intr_execute_handler+0x61
> intr_execute_handlers(6095d580,f0a4ab04,6,0,656d8460,...) at intr_execute_handlers+0x40
> atpic_handle_intr(4) at atpic_handle_intr+0x96
> Xatpic_intr4() at Xatpic_intr4+0x20
> --- interrupt, eip = 0x606044af, esp = 0xf0a4ab48, ebp = 0xf0a4ab5c ---
> lockmgr(e1456a04,6,0,656d8460) at lockmgr+0x58f
> getdirtybuf(e14569a4,60a405e4,1) at getdirtybuf+0x2e2
> flush_deplist(68b30850,1,f0a4abb8) at flush_deplist+0x30
> flush_inodedep_deps(656fa28c,1f235) at flush_inodedep_deps+0xcf
> softdep_sync_metadata(65964618) at softdep_sync_metadata+0x61
> ffs_syncvnode(65964618,1) at ffs_syncvnode+0x3a2
> ffs_fsync(f0a4ac74) at ffs_fsync+0x12
> VOP_FSYNC_APV(60949260,f0a4ac74) at VOP_FSYNC_APV+0x38
> fsync(656d8460,f0a4acb4) at fsync+0x170
> syscall(805003b,806003b,5fbf003b,8050000,288be450,...) at syscall+0x2ee
> Xint0x80_syscall() at Xint0x80_syscall+0x1f

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
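
A side note on the g_vfs_done() lines quoted in this thread: they can be
checked mechanically. The sketch below is only an illustration, not
something anyone in the thread ran. It assumes the five lines exactly as
pasted, and the helper names (parse_failures, is_contiguous) are made up
for this example. It maps the error number to its errno name and checks
whether each failed write starts where the previous one ended.

```python
import errno
import re

# Sample lines copied from the "show msgbuf" output quoted in this thread.
MSGBUF = """\
g_vfs_done():ad0s4h[WRITE(offset=33849344, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=33980416, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34111488, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34242560, length=131072)]error = 5
g_vfs_done():ad0s4h[WRITE(offset=34373632, length=131072)]error = 5
"""

# Hypothetical pattern written for the message format shown above.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<err>\d+)"
)


def parse_failures(text):
    """Return (device, op, offset, length, errno name) per matching line."""
    failures = []
    for m in PATTERN.finditer(text):
        err = int(m["err"])
        name = errno.errorcode.get(err, str(err))
        failures.append(
            (m["dev"], m["op"], int(m["offset"]), int(m["length"]), name)
        )
    return failures


def is_contiguous(failures):
    """True if every failed request starts where the previous one ended."""
    return all(
        prev[2] + prev[3] == cur[2]
        for prev, cur in zip(failures, failures[1:])
    )


failures = parse_failures(MSGBUF)
first, last = failures[0], failures[-1]
# Total bytes covered from the first failed offset to the end of the last.
span_bytes = last[2] + last[3] - first[2]

if __name__ == "__main__":
    for dev, op, offset, length, name in failures:
        print(f"{dev} {op} offset={offset} length={length} {name}")
    print("contiguous:", is_contiguous(failures))
    print("span (KiB):", span_bytes // 1024)
```

For the pasted lines, the five requests are 128 KiB each and back to
back, so they cover one 640 KiB run inside ad0s4h. That only shows these
particular failures are adjacent; it says nothing by itself about
whether the card, the controller, or the driver is at fault.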