Date: Sat, 21 Feb 2004 18:58:00 +1100
From: Tony Frank <tfrank@optushome.com.au>
To: freebsd-scsi@freebsd.org
Cc: FreeBSD-questions@freebsd.org
Subject: Re: ahc + vinum raid5 deadlocks?
Message-ID: <20040221075800.GC98919@marvin.home.local>
In-Reply-To: <20040221034749.GA98919@marvin.home.local>
References: <20040221034749.GA98919@marvin.home.local>
Hi,

Cross-posting to -scsi as it seems this may be related to my SCSI setup.
At least, the problems currently only appear when the SCSI parts are in use.

On Sat, Feb 21, 2004 at 02:47:49PM +1100, Tony Frank wrote:
> Refer earlier thread on -stable for more background.
>
> 4.9-STABLE (cvsup 20th Feb)
> Kernel was compiled with DDB, INVARIANTS, DIAGNOSTICS.
> All options removed from /etc/make.conf except 'NOPROFILE=TRUE'
>
> I get the following on the console before the system freezes:
> ahc0: WARNING no command for scb 14 (cmdcmplt)
> QOUTPOS = 44

This occurred while the majority of I/O was going to the vinum raid5
volume. Since that time I have removed the vinum raid5 configuration and
am using the disks directly - ufs mounted on /dev/da0s1h.

When performing the same load benchmark I have received two separate
panics.

The first panic occurs due to a KASSERT in ffs_read, refer:
http://fxr.watson.org/fxr/source//ufs/ufs/ufs_readwrite.c?v=RELENG4#L316

%%%%
(da0:ahc0:0:0:0): tagged openings now 40
(da1:ahc0:0:1:0): tagged openings now 40
(da2:ahc0:0:2:0): tagged openings now 40
(da3:ahc0:0:3:0): tagged openings now 40
Feb 21 15:48:16 raider su: tony to root on /dev/ttypc
panic: bp->b_resid != 0
syncing disks...
Stopped at      siointr1+0x102: movl    $0,brk_state2.757
db> trace
siointr1(c0f98000,ccdc6c64,c027d546,c0f98000,10) at siointr1+0x102
siointr(c0f98000,10,0,0,0) at siointr+0xb
Xfastintr4(c1090800,1000040,600,20002,ccd54300) at Xfastintr4+0x16
lockmgr(c1090800,1030002,ccd5436c,c032c7a0,ccdc6cac) at lockmgr+0x1fc
vop_stdlock(ccdc6cc4,ccdc6cd4,c01bb269,ccdc6cc4,0) at vop_stdlock+0x20
ufs_vnoperatespec(ccdc6cc4) at ufs_vnoperatespec+0x15
vn_lock(ccd54300,20002,c032c7a0,c1129000,0) at vn_lock+0x71
ffs_sync(c1129000,2,c0b2d600,c032c7a0,c1129000) at ffs_sync+0x17f
sync(c032c7a0,0,c02b9660,c02d6048,100) at sync+0x63
boot(100,0,0,ccdc6de0,c02482a2) at boot+0x8a
panic(c02d6048,ccf5cbc0,5b,ccd8f000,400) at panic+0x79
ffs_read(ccdc6df4,0,ccdc6ea8,c02ec1e0,ccf5cbc0) at ffs_read+0x37a
ufs_readlink(ccdc6e38,ccdc6e68,c01b1c8a,ccdc6e38,c7f7bee0) at ufs_readlink+0x6b
ufs_vnoperate(ccdc6e38,c7f7bee0,c7f7bee0,ccdc6f80,c7f7bee0) at ufs_vnoperate+0x15
namei(ccdc6e80,c7f7bee0,2,ccdc6f80,8137400) at namei+0x302
stat(c7f7bee0,ccdc6f80,bfbe7810,bfbe8298,bfbea49c) at stat+0x41
syscall2(c027002f,2f,2f,bfbea49c,bfbe8298) at syscall2+0x209
Xint0x80_syscall() at Xint0x80_syscall+0x25
db>
%%%%

I have a core from this panic saved; 'trace' is about the extent of my
ddb skills at the moment, though.

Checking the archives, a similar problem was seen about six months ago
with a particular SCSI disk and a tags value set too high. Refer:
http://docs.freebsd.org/cgi/mid.cgi?FE045D4D9F7AED4CBFF1B3B813C8533702741FEA

From what I can see, the ahc driver is forcing a maximum of 40 tags.
The SCSI hardware is fairly old, but it was working without problems in
the old (Windows-based) system.
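For what it's worth, this is roughly how I've been clamping the tag
depth per device while testing - just a sketch of my procedure, and the
value 24 is an arbitrary guess on my part, not a recommendation:

%%%%
# show the current tagged openings for the first disk
camcontrol tags da0 -v

# force a lower tag depth on all four disks (24 is an arbitrary test value)
for d in da0 da1 da2 da3; do
        camcontrol tags $d -N 24
done
%%%%

I can try other values if someone thinks the tag depth is actually the
trigger here.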
The second panic occurs in ffs_softdep, refer:
http://fxr.watson.org/fxr/source//ufs/ffs/ffs_softdep.c?v=RELENG4#L3590

%%%%
Feb 21 17:16:30 raider su: tony to root on /dev/ttyp8
(da1:ahc0:0:1:0): tagged openings now 40
Feb 21 17:21:29 raider su: tony to root on /dev/ttyp3
(da3:ahc0:0:3:0): tagged openings now 40
(da2:ahc0:0:2:0): tagged openings now 40
panic: handle_written_inodeblock: live inodedep
syncing disks...
Stopped at      siointr1+0x102: movl    $0,brk_state2.757
db> trace
siointr1(c0f98000,ccf6bba0,c027d546,c0f98000,10) at siointr1+0x102
siointr(c0f98000,10,0,8,68c040) at siointr+0xb
Xfastintr4(c389c20c,10,c02be0a4,0) at Xfastintr4+0x16
biowait(c389c20c,c106b800,c11d1900,2,c02ec760) at biowait+0x37
bread(ccaaba00,18c040,2000,0,ccf6bc14) at bread+0xb2
ffs_update(ccd51c00,0,68c040,ccd51c00,ccf33580) at ffs_update+0xba
ffs_fsync(ccf6bc78) at ffs_fsync+0x358
ffs_sync(c1069c00,2,c0b2d600,c032c7a0,c1069c00) at ffs_sync+0xdb
sync(c032c7a0,0,c02b9660,c02d56e0,100) at sync+0x63
boot(100,0,c127bb00,ccf6bd18,c024416a) at boot+0x8a
panic(c02d56e0,1,c11a5c80,c38d6178,0) at panic+0x79
handle_written_inodeblock(c11a5c80,c38d6150) at handle_written_inodeblock+0x30e
softdep_disk_write_complete(c38d6150) at softdep_disk_write_complete+0x6a
biodone(c38d6150,1,68c040,c10efa48,c38d6150) at biodone+0x121
complete_rqe(c10efa20,0,c1028c00,f76,c1028d3c) at complete_rqe+0x651
biodone(c10efa20,c1028cb8,c10efa20,c0146d58,c16caac0) at biodone+0xf5
ad_interrupt(c16caac0,c032b7d4,ccf6be38,c0182482,c0fa7900) at ad_interrupt+0x3e7
ata_intr(c0fa7900,c16ccdc0,ccf6be8c,c027e4c2,c032b7d4) at ata_intr+0xd8
add_interrupt_randomness(c032b7d4,0,10,c3870010,c01a0010) at add_interrupt_randomness+0xe
Xresume15() at Xresume15+0x2b
--- interrupt, eip = 0xc01ab6e5, esp = 0xccf6be80, ebp = 0xccf6be8c ---
bwillwrite(c16ccdc0,ccf6bf80,cce7b2a0,0,0) at bwillwrite+0x75
dofilewrite(cce7b2a0,c16ccdc0,a,824a00c,f76) at dofilewrite+0xa2
write(cce7b2a0,ccf6bf80,824a00c,bfbff7e0,bfbff800) at write+0x36
syscall2(bfbf002f,bfbf002f,822002f,bfbff800,bfbff7e0) at syscall2+0x209
Xint0x80_syscall() at Xint0x80_syscall+0x25
db>
%%%%

My searches show a few hits for -current and some old items from
-stable, but nothing jumps out at me as a possible solution. This trace
seems to suggest the problem may have been on the IDE disk? Again the
core has been saved, so given suitable directions I can do something
with it.

> Hardware is currently Adaptec AHA-2940W S71 (F/w 1.19S8)
> Same issues occurred with Adaptec AHA-2940UW/B (F/w 1.32S8)
> I have 4 x 4G IBM SCSI disks on the single internal 68-pin connector.
> Connections are good & termination is correct.
> The 4 disks are combined into a single vinum raid5 plex.
> System is well ventilated and cool to the touch.
> Pentium 2 200MHz, 128MB SDRAM, Asus P2V m/b.
>
> Currently takes ~2 hours of solid activity to trigger the issue.

These problems are occurring under load after about 30-50 minutes of
solid activity. A 'plain' "make -j4 buildworld buildkernel" completes OK
if it runs by itself, without the extra load of my test bench.

> My 'test bench':
> Copying files over NFSv3/udp to vinum raid5 volume via cp & tar
> + make -j4 buildworld
> + Copying large dir trees (/usr/ports, /usr/obj, /usr/src)
> + cvsuping a second copy of /usr/src
> + extracting a tar archive of /usr/obj

Ideas are welcome,

Tony
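P.S. Unless someone has a better suggestion, my rough plan for the saved
cores is something like the following. This assumes I still have an
unstripped kernel.debug from the build that panicked (i.e. that I built
with makeoptions DEBUG=-g); the path below is just where it lives on my
box, and the frame number will vary:

%%%%
# savecore(8) wrote the dump and matching kernel to /var/crash
cd /var/crash
gdb -k /usr/obj/usr/src/sys/RAIDER/kernel.debug vmcore.0

(kgdb) where                # full stack, same as the ddb 'trace' above
(kgdb) frame 11             # select the ffs_read frame (number varies)
(kgdb) print bp->b_resid    # inspect the buf that tripped the KASSERT
%%%%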