From: Jens Röder
Date: Thu, 10 Apr 2003 00:33:03 +0200 (METDST)
To: Wilko Bulte
cc: freebsd-alpha@freebsd.org
In-Reply-To: <20030409180753.GB14966@freebie.xs4all.nl>
Subject: Re: alpha/50659: reboot causes SRM console to loop endless error and needs to be restetted hard

Hello Wilko,

thanks for the quick response and good support.

On Wed, 9 Apr 2003, Wilko Bulte wrote:

> > The machine has about 1 GB RAM. Honestly I am not sure what "processor
>
> 1GB... that is overkill for a gateway, but hey, it should not hurt ;)

:-) Yes, I am happy that I got that machine, which stood unused for years in a cluster because of a loud hard drive, which I simply removed. I hoped for stable hardware in a real Unix box, as everyone depends on that gateway. Of course it can host a few users later, once I get it stable with FreeBSD. I think FreeBSD should be secure enough to handle users on a gateway as well.

> That is a kernel panic, not a memory problem ;)
>
> Most Alphas, and your AS500 too, have ECC (error correction) memory. That allows
> single bit memory errors to be corrected. The kernel will tell you if a
> correction was applied, these are the processor correctable errors I
> mentioned.

Hm, sounds interesting. So does that mean that in the case of a hardware memory problem I would get a kernel message and would not need to run any memory checks?

> Unaligned accesses in kernel mode are Bad(TM). Check the handbook on
> creating more debug info on the crash please.

I am not sure if I did the right thing, but there is a core file now available at:

http://octopus.homeunix.net/jens@piero.ptch.nat.tu-bs.de.gz

> > At the moment I consider also defect memory and will check that as soon as
> > I have a temporarily replacement for that Institute gateway and a night
>
> Very unlikely, this looks like a problem in the kernel to me.
>
> > Meanwhile I have compiled a kernel with suffiencet debug mode with the
> > hope to offer proper error messages.
>
> Can you catch a crash dump maybe?

At least the kernel did not reboot with the debug function, so I could write down the following for the 5.0-p7 version:

fatal kerneltrap:

trapentry     = 0x4 (unaligned access fault)
cpuid         = 0
faulting va   = 0xfffffc0031d12d0c
opcode        = 0x2d
register      = 0x9
pc            = 0xfffffe0004138bc0
ra            = 0xfffffe0004138bb4
sp            = 0xfffffe001da7db70
usp           = 0x11fff628
curthread     = 0xfffffc003e2c87c0
pid 593, comm ipfw

Stopped at ipfw_ctl+0x1c0; or zero, s0,t2

Unfortunately I am too new to this area and have never worked at the db> prompt, so I have a lot of reading to do before I can handle this. Are there one or two commands I could simply run to give you a proper error message? (By the way, it is a GENERIC kernel in this case.)
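
As a side note on reading the trap output above: a minimal sketch, in Python, of how the "faulting va" and "opcode" fields relate to each other. It assumes the standard Alpha memory-format opcode numbers (0x28 ldl, 0x29 ldq, 0x2c stl, 0x2d stq); the table and the helper name describe_fault are only illustrative and are not taken from the FreeBSD sources.

```python
# Alpha memory-format opcodes and the natural size (in bytes) of the access.
# Assumption: opcode numbers as given in the Alpha architecture opcode table;
# only the four plain load/store opcodes are listed here.
ACCESS_SIZE = {
    0x28: ("ldl", 4),
    0x29: ("ldq", 8),
    0x2C: ("stl", 4),
    0x2D: ("stq", 8),
}


def describe_fault(va, opcode):
    """Return (mnemonic, access size, offset of va past the last aligned boundary)."""
    name, size = ACCESS_SIZE[opcode]
    return name, size, va % size


# Values copied from the kernel trap message above.
name, size, offset = describe_fault(0xFFFFFC0031D12D0C, 0x2D)
print(f"{name}: {size}-byte access, address is {offset} bytes past a {size}-byte boundary")
```

Under these assumptions, opcode 0x2d is a quadword store (stq), and the faulting address ends in 0x0c, so it is only 4-byte aligned. The same check applies to any "unaligned access" message that reports a va together with op=ldq.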

Again, when you use "ipfw show" on 5.0 on alpha, you get messages like this:

ptchgate# ipfw show
00100 94 10410 allow ip from any to any via lo0
00200 0 0 deny ip from any to 127.0.0.0/8
pid 585 (ipfw): unaligned access: va=0x1200a80b4 pc=0x120001780 ra=0x120001764 op=ldq
pid 585 (ipfw): unaligned access: va=0x1200a80bc pc=0x120001784 ra=0x120001764 op=ldq
00300 0 0 deny ip from 127.0.0.0/8 to any
65000 921 89561 allow ip from any to any
65535 0 0 deny ip from any to any

A crash becomes more likely once my own rule set is loaded and I then list the rules. This does not occur on 5.0 for i386, which seems to run stably so far.

> > I think the "unalighed access error" when listing the firewall rules
> > showed only up in the 5.0 version. I will probably downgrade to 4.7 or 4.8
> > (what is better to use?) again and recompile with ipfw2 then, and let you
> > know then. Before I will try to produce proper errror messages with the
> > debug kernel of 5.0.
>
> I'd go for 4.8. Do you need any ipfw2 functionality?
>
> > Maybe you can try out the SRM console problem without upgrading to 5.0 as
> > I remember I first noticed it, when I booted from floppy or CD and called
> > the machine to abort. I thought first of the errors reason to be my fault
> > because of the abortion. Again 4.7 did not have that problem.
>
> I have a fresh 4.8 on my AS500 and that does not show me the problem.

Ok, I will downgrade to 4.8 as soon as I have a proper crash dump from 5.0.

> What kind of PCI cards are in the machine? Can you post a SHOW CONF
> from the SRM ?

Of course, with pleasure:

.....................................................................
Firmware
SRM Console:    V7.2-2
ARC Console:    4.58
PALcode:        OpenVMS PALcode V1.20-0 Tru64 UNIX PALcode V1.22-0

Processor
DECchip (tm)21164A-2 Pass 2 500MHz 96KByte SCache
8MB BCache
Cia ASIC Pass 3

Memory Size 1024 Mb

Bank    Size/Sets    Base Addr    Speed
-----   ---------    ---------    ------
00      512 Mb/1     000000000    Fast
01      512 Mb/1     020000000    Fast

BCache Size 8Mb
Tested Memory 33 Mbyte

PCI Bus
Bus 00  Slot 06: DECchip 21040 Network Controller
                 ewa0.0.0.6.0
Bus 00  Slot 08: Digital TGA2 Graphics Controller
Bus 00  Slot 09: ISP1020 SCSI Controller
                 pka0.7.9.0        SCSI Bus ID 7
                 dka400.4.0.9.0    RRD46
                 dka500.5.0.9.0    IBM DGHS18u
Bus 00  Slot 10: Intel 8275EB PCI to Eisa Bridge
Bus 00  Slot 12: Vendor: 10ec   Device: 8139 Sub_id 813910ec
...............................................................

I hope my eye-to-keyboard copy doesn't contain errors. :-)

Well, I hope there was something useful for debugging in my mail. I am sorry if I appear very inexperienced with FreeBSD, but I am just getting started.

With best regards from Germany,

Jens

-----------------------------------------------------------------------------
Physikalische und Theoretische Chemie der TU-Braunschweig
Jens Röder, Hans-Sommer Str.10, 38106 Braunschweig
-----------------------------------------------------------------------------