From owner-freebsd-stable@FreeBSD.ORG Thu Feb 19 05:04:14 2004 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BCEF616A4FC for ; Thu, 19 Feb 2004 05:04:14 -0800 (PST) Received: from mail010.syd.optusnet.com.au (mail010.syd.optusnet.com.au [211.29.132.56]) by mx1.FreeBSD.org (Postfix) with ESMTP id AAABE43D2D for ; Thu, 19 Feb 2004 05:04:13 -0800 (PST) (envelope-from tfrank@optushome.com.au) Received: from marvin.home.local (c211-28-241-189.eburwd5.vic.optusnet.com.au [211.28.241.189])i1JD4Bl02715; Fri, 20 Feb 2004 00:04:11 +1100 Received: by marvin.home.local (Postfix, from userid 1001) id 59685290; Fri, 20 Feb 2004 00:04:10 +1100 (EST) Date: Fri, 20 Feb 2004 00:04:09 +1100 From: Tony Frank To: Tony Frank Message-ID: <20040219130409.GA51615@marvin.home.local> References: <20040219052336.GA28162@marvin.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040219052336.GA28162@marvin.home.local> User-Agent: Mutt/1.4.2i cc: freebsd-stable@freebsd.org Subject: Re: Kernel panics in ahc during load with stable built 18th Feb X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Feb 2004 13:04:14 -0000 Hi all again, Some more updates on my problems. On Thu, Feb 19, 2004 at 04:23:36PM +1100, Tony Frank wrote: > As per the subject I seem to be getting kernel panics in ahc > driver since upgrading my kernel & world to -stable. > This occurs specifically when writing high volume of files to > vinum raid5 volume spanning 4 scsi drives connected to Adaptec 2940 PCI > controller. > > The fault appears to be reproducable - every time I try to extract > a tar file containing a copy of /usr/obj from another system. > > vinum init and a lot of benchmarking (rawio and bonnie) work fine on the > volume. > > dmesg lines: > ahc0: port 0xb400-0xb4ff mem 0xe0800000-0xe080 > 0fff irq 10 at device 11.0 on pci0 > > Custom kernel is configured with: > options AHC_ALLOW_MEMIO > > I setup a serial console and rebuilt kernel to include debugging bits. > > Fatal trap 12: page fault while in kernel mode > fault virtual address = 0x5c > fault code = supervisor read, page not present > instruction pointer = 0x8:0xc015cab2 > stack pointer = 0x10:0xc02e2b58 > frame pointer = 0x10:0xc02e2b68 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = Idle > interrupt mask = cam > kernel: type 12 trap, code=0 > Stopped at ahc_done+0xc2: pushl 0x5c(%ebx) > > db> trace > ahc_done(c0f9a200,c0fb53c0) at ahc_done+0xc2 > ahc_run_qoutfifo(c0f9a200) at ahc_run_qoutfifo+0xf1 > ahc_platform_intr(c0f9a200,0,c02e2bf8,c027ab82,c0322458) at ahc_platform_intr+0x > 174 > add_interrupt_randomness(c0322458,0,400010,c0300010,c0300010) at add_interrupt_r > andomness+0xe > Xresume10() at Xresume10+0x2b > --- interrupt, eip = 0xc027fa46, esp = 0xc02e2bf0, ebp = 0xc02e2bf8 --- > cpu_idle(e,633,2,80f9ff,0) at cpu_idle+0xe > idle_loop() at idle_loop+0x1d This problem easily occured after ~20 mins of extracting the mentioned tar file. I rebuilt my kernel without options AHC_ALLOW_MEMIO. With the new kernel the tar file extracted without any panics. When I then tried to 'stress' the system a bit more I did get a bunch of messages from ahc0 driver plus a panic. My "stress test" consisted of: FTP download (~500meg file over 100Mbps LAN using fxp0 from server) tar xf objtest.tar (~500meg tar file containing /usr/obj /usr/src copy) second tar xf objtest.tar (second time in different filesystem on ata disks) cvsup (stable from local cvsup mirror server) This was busy making noise & heat for about 1hr 40 mins before it died as mentioned. As such without the AHC_ALLOW_MEMIO option it worked a bit longer but still failed. Details of the new failure are included below: tar: Skipping to next header tar: Skipping to next header tar: Skipping to next header tar: Archive contains obsolescent base-64 headers tar: Skipping to next header tar: Skipping to next header tar: Error exit delayed from previous errors > ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET SAVED_SCSIID == 0x17, SAVED_LUN == 0x0, ARG_1 == 0x27 ACCUM = 0x27 SEQ_FLAGS == 0xc0, SCBPTR == 0xa, BTT == 0xff, SINDEX == 0x31 SCSIID == 0x27, SCB_SCSIID == 0x27, SCB_LUN == 0x0, SCB_TAG == 0x27, SCB_CONTROL == 0x64 SCSIBUSL == 0x27, SCSISIGI == 0xe6 SXFRCTL0 == 0x88 SEQCTL == 0x10 >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< ahc0: Dumping Card State in Message-in phase, at SEQADDR 0x1b8 Card was paused ACCUM = 0x27, SINDEX = 0x31, DINDEX = 0x52, ARG_2 = 0xff HCNT = 0x0 SCBPTR = 0xa SCSISIGI[0xe6]:(REQI|BSYI|MSGI|IOI|CDI) ERROR[0x0] SCSIBUSL[0x27] LASTPHASE[0xe0]:(MSGI|IOI|CDI) SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI) SBLKCTL[0x2]:(SELWIDE) SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE) SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED) SSTAT0[0x7]:(DMADONE|SPIORDY|SDONE) SSTAT1[0x3]:(REQINIT|PHASECHG) SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x0] SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO) SXFRCTL0[0x88]:(SPIOEN|DFON) DFCNTRL[0x0] DFSTATUS[0x2d]:(FIFOEMP|DFTHRESH|HDONE |FIFOQWDEMP) STACK: 0x12c 0x0 0x151 0x192 SCB count = 50 Kernel NEXTQSCB = 38 Card NEXTQSCB = 38 QINFIFO entries: Waiting Queue entries: Disconnected Queue entries: 10:39 11:46 14:39 QOUTFIFO entries: Sequencer Free SCB List: 12 9 6 2 3 7 8 15 1 4 0 5 13 Sequencer SCB Info: 0 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] SCB_LUN[0x0] SCB_TAG[0xff] 1 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 2 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] SCB_LUN[0x0] SCB_TAG[0xff] 3 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] SCB_LUN[0x0] SCB_TAG[0xff] 4 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] SCB_LUN[0x0] SCB_TAG[0xff] 5 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] SCB_LUN[0x0] SCB_TAG[0xff] 6 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] SCB_LUN[0x0] SCB_TAG[0xff] 7 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 8 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] SCB_LUN[0x0] SCB_TAG[0xff] 9 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] SCB_LUN[0x0] SCB_TAG[0xff] 10 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x27] SCB_LUN[0x0] SCB_TAG[0x27] 11 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] SCB_TAG[0x2e] 12 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] SCB_LUN[0x0] SCB_TAG[0xff] 13 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 14 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] SCB_TAG[0x27] 15 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x17] SCB_LUN[0x0] SCB_TAG[0xff] Pending list: 39 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x27] SCB_LUN[0x0] 46 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] 46 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] [ ... this line repeats ~256 times ... ] Kernel Free SCB list: 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3 9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3 9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3 9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> Fatal trap 12: page fault while in kernel mode fault virtual address = 0x2c fault code = supervisor read, page not present instruction pointer = 0x8:0xc015755f stack pointer = 0x10:0xccd72c40 frame pointer = 0x10:0xccd72c50 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 165 (top) interrupt mask = cam kernel: type 12 trap, code=0 Stopped at ahc_match_scb+0xa3: movl 0x2c(%eax),%eax db> trace ahc_match_scb(c0f9a200,c0fb54c0,1,41,ffffffff,ff,1) at ahc_match_scb+0xa3 ahc_abort_scbs(c0f9a200,1,41,ffffffff,ff) at ahc_abort_scbs+0x2d9 ahc_handle_devreset(c0f9a200,ccd72d44,17,c02acb84,0) at ahc_handle_devreset+0x2c ahc_handle_scsiint(c0f9a200,64,c0322458,2,ccd72dc0) at ahc_handle_scsiint+0x95f ahc_platform_intr(c0f9a200,ccd72e48,ccd72e1c,c027ab82,c0322458) at ahc_platform_ intr+0x1f7 add_interrupt_randomness(c0322458,0,1030010,c7f80010,ccd70010) at add_interrupt_ randomness+0xe Xresume10() at Xresume10+0x2b --- interrupt, eip = 0xc0188a2f, esp = 0xccd72e08, ebp = 0xccd72e1c --- sysctl_find_oid(ccd72ef8,2,ccd72e44,ccd72e48,ccd72e70) at sysctl_find_oid+0x1b sysctl_root(0,ccd72ef8,2,ccd72e70,0) at sysctl_root+0x22 userland_sysctl(c7f825a0,ccd72ef8,2,bfbff9cc,bfbff9d4) at userland_sysctl+0x111 __sysctl(c7f825a0,ccd72f80,2815578c,bfbff9d8,2) at __sysctl+0x5c syscall2(2f,c107002f,bfbf002f,2,bfbff9d8) at syscall2+0x1f5 Xint0x80_syscall() at Xint0x80_syscall+0x25 db> ps pid proc addr uid ppid pgrp flag stat wmesg wchan cmd 354 c7f801e0 cd00f000 1001 249 354 004006 2 cvsup 288 c7f80860 ccff5000 1001 240 288 004086 3 ttyin c1115830 ftp 249 c7f80380 cd00c000 1001 248 249 2004086 3 pause cd00c260 tcsh 248 c7f80520 cd007000 1001 246 118 000184 3 select c0324168 sshd 246 c7f806c0 cd002000 0 118 118 000184 3 sbwait cc5bfec8 sshd 240 c7f80a00 ccfed000 1001 239 240 2004086 3 pause ccfed260 tcsh 239 c7f80d40 ccf5e000 1001 237 118 000184 2 sshd 237 c7f80ee0 ccf7a000 0 118 118 000184 3 sbwait cc5be308 sshd 223 c7f80ba0 ccf6e000 1001 222 223 004086 3 ttyin c0f9aa30 tcsh 222 c7f81080 ccf57000 1001 220 118 000184 3 select c0324168 sshd 220 c7f813c0 cce0b000 0 118 118 000184 3 sbwait cc5be548 sshd 171 c7f81560 ccded000 1001 153 171 004186 2 systat 165 c7f825a0 ccd70000 1001 152 165 004106 2 top 159 c7f818a0 ccda1000 1001 158 159 004086 3 ttyin c105b810 tcsh 158 c7f81a40 ccd9c000 0 1 158 004186 3 wait c7f81a40 login 153 c7f82dc0 ccd2f000 0 1 153 004186 3 wait c7f82dc0 login 152 c7f82f60 ccd27000 0 1 152 004186 3 wait c7f82f60 login 118 c7f81be0 ccd91000 0 1 118 000184 3 select c0324168 sshd 116 c7f81d80 ccd84000 0 1 116 000484 2 cron 109 c7f81f20 ccd80000 0 1 104 000084 3 nfsidl c032a6ac nfsiod --More-- 108 c7f820c0 ccd7c000 0 1 104 000084 3 nfsidl c032a6a8 nfsiod 107 c7f82260 ccd78000 0 1 104 000084 3 nfsidl c032a6a4 nfsiod 106 c7f82400 ccd74000 0 1 104 000084 3 nfsidl c032a6a0 nfsiod 101 c7f82c20 ccd37000 0 1 101 000084 2 ntpd 96 c7f82a80 ccd3b000 0 1 96 000004 2 syslogd 69 c7f82740 ccd44000 0 1 69 000084 3 select c0324168 dhclient 29 c7f828e0 ccd3f000 0 1 29 2000084 3 pause ccd3f260 adjkerntz 9 c7f83100 ccabf000 0 0 0 000204 3 vlruwt c7f83100 vnlru 8 c7f832a0 ccabc000 0 0 0 000204 2 syncer 7 c7f83440 ccab9000 0 0 0 000204 3 vrlock c104c000 bufdaemon 6 c7f835e0 ccab6000 0 0 0 000204 3 psleep c031b260 vmdaemon 5 c7f83780 ccab3000 0 0 0 000204 3 psleep c02ffef8 pagedaemon 4 c7f83920 cc5b8000 0 0 0 000204 3 idle c0f9a200 aic_recovery 0 3 c7f83ac0 cc5b5000 0 0 0 000204 3 idle c0f9a200 aic_recovery 0 2 c7f83c60 c856b000 0 0 0 000204 3 tqthr c0324164 taskqueue 1 c7f83e00 c7f88000 0 0 1 004284 3 wait c7f83e00 init 0 c0323460 c0473000 0 0 0 000204 3 sched c0323460 swapper Any assistance is appreciated, Tony