Date: Sat, 7 May 2005 01:45:50 +0200
From: Danny Pansters <danny@ricin.com>
To: freebsd-questions@freebsd.org
Subject: Re: Spontaneous reboots
Message-ID: <200505070145.50210.danny@ricin.com>
In-Reply-To: <Pine.LNX.4.40.0505050921350.22295-100000@shannon.math.ku.dk>
References: <Pine.LNX.4.40.0505050921350.22295-100000@shannon.math.ku.dk>
I certainly don't expect to be able to solve your problems (the kind that
are everyone's nightmare because you just can't pin down where they start),
but I'm willing to give it a try.

On Friday 06 May 2005 11:08, Erik Norgaard wrote:
> Hi,
>
> I am having tremendous problems keeping my FBSD 5 up and
> happy: I keep experiencing spontaneous reboots and crashes.
>
> This is a looong story, I have been trying to figure out what's
> causing the problem for two weeks now. I really appreciate
> your patience and response if you make it all the way to the end :-)
>
> The setup:
>
>   FBSD---DSL---Internet
>
> The DSL is a Thomson 510 ADSL router doing 1-1 NAT, no firewall.
> The FBSD is configured with the IPFilter firewall and runs named,
> postfix, cyrus-imap22 with virtual domains and apache with
> virtual hosts. To serve the local net (behind the DSL) it also
> runs dhcpd, ntpd and mysql.

Is the DSL router also running a DHCP server with your FBSD as a client? If
the same IP address ever gets allocated to more than one box on a network,
that's a ticket to a panic/reboot, I'd think. Also, dhclient likes to run
through every interface, which can yield unexpected results if you're also
running dhcpd for LAN clients on an interface that just got dhcp'ed...

Can you capture traffic? Please scroll down...

> Postfix, Cyrus-Imap and Apache are all configured with TLS
> support and I have generated certificates using OpenSSL. This
> system was installed in November and upgraded at the beginning of
> January. I have had no problems for months.
>
> Then - from the beginning:
>
> On April 15, FreeBSD 5.3-p5, I had two roughly simultaneous events:
>
> 1) A huge number of incoming mail delivery attempts to addresses
>    of the type randomchars@mydomain.com
> 2) Kernel panic, fatal trap 12
>
> I had done no prior system tuning or changes.
>
> Since then, uptime has been anywhere between 0 and >3 days - the
> latter obtained by stopping all services and disconnecting the
> machine from the network.
>
> 1) By huge, I mean enough to saturate a 512kbps DSL connection,
> but this should be far from enough to make FBSD cough or even
> panic. Also, system load is always close to 0.00.
>
> I have postfix handling mail and use cyrus-imap with virtual
> domains as backend. Since postfix didn't know the hosted addresses,
> cyrus rejected the mail. I created a list of existing addresses so
> mail could be rejected faster.
>
> The illicit mail delivery attempts persist.

Can you capture traffic? I for one would be interested in what this mail is,
and its function. Does it try to use you as a relay, or is it merely sent to
you, perhaps as a mail bomb? Stuff like that. Do you have any of those
mails? What are their contents and headers?

> 2) I followed the handbook to investigate the panic:
> Following the kernel panic faq:
>
> Fatal trap 12: Page fault while in kernel mode
> Fault virtual address = 0xc
> Fault code            = supervisor read, page not present
> instruction pointer   = 0x8:0xc053d638
> stack pointer         = 0x10:0xcb4ddaec
> frame pointer         = 0x10:0xcb4ddaf8
> code segment          = base 0x0, limit 0xffff, type 0x1b
>                       = DPL 0, pres 1, def32 1, gran 1
> processor eflags      = interrupt enabled, resume, IOPL=0
> current process       = 28 (swi1:net)
> trap number           = 12
> panic: page fault
>
> # nm -n /boot/kernel/kernel | grep c053d6
> c053d610 T m_copydata
> c053d670 T m_dup
>
> I no longer get this panic, however my system does not deserve the
> predicate -STABLE. Somehow, I preferred the panic; at least it gave
> some info for debugging. But now it reboots without a blip.
>
> Disk errors:
>
> The crashes _always_ cause disk errors that cannot be recovered
> by the background fsck, particularly on /var where mail resides.
> This may result in new reboots.
>
> To solve this I have tried mounting drives read-only, unless
> write permission was necessary. It turns out that postfix requires
> write access to /, /usr and /var - the first two appear to be
> related to tls(?).
>
> Also, I have set fsck_y_enable="yes" in rc.conf, so the disk is
> thoroughly checked on boot after a crash.
>
> I had dumpon set in my rc.conf but this just filled the partition,
> making things even worse. I have removed all kernel dumps and
> also unnecessary data, as I understood disk performance may drop
> when free disk space is below 15%.
>
> The kernel:
>
> The first kernel was a 5.3-p5 custom kernel. To make it easier to
> debug I updated to -p8, GENERIC. No change. Following
> suggestions by Kris K. I upgraded to 5.4-RC2.
>
> This solved the panic - but the system still crashes, also after
> updating to RC3 and RC4.
>
> The system:
>
> Upgrading to 5.4-RC2, I built world also. I then realized that
> some ports may have been built against the old base, causing new
> problems.
>
> I have now deinstalled all ports. The system has been completely
> updated, kernel and base, to 5.4-RC4. I have reinstalled the
> minimal set of ports needed to serve my needs, at the versions
> current as of May 3.
>
> I still experience crashes.
>
> Postfix:

I don't expect you have a problem here.

> I tried to limit the number of simultaneous deliveries handled.
> No change.
>
> When a connection is made postfix sends a lot of dns queries to
> verify that the sender hostname resolves to the ip, that the sender
> domain exists, and that it is not in a block list.
>
> IPFilter:

I don't expect you have a problem here.

> I have restricted access to port 25; now only a handful of
> servers are permitted by the firewall. This has helped, uptime is
> now hours rather than minutes, but I still have crashes.
>
> I have reduced all timeouts to prevent the state table from
> saturating, but no change.
>
> If I open up for incoming mail, for a (any) /8 segment, the number
> of connections explodes. Due to the limit on simultaneous
> postfix threads, many time out. No change.
>
> I am working on a black list based on the maillog, but this is
> another project.
>
> DNS:

Is that daemonsecurity.com, your domain? If you host security-related stuff
and perhaps usually blog on it or something (it's "maintenance" now), you
might also have a single persistent enemy who is causing you headaches. You
never know.

> Since mail to mydomain.com is currently useless I have decided to
> set the MX record to 127.0.0.1. This has stopped the illicit mail,
> but also all legitimate mail to that domain - mostly this
> gives me peace and bandwidth.
>
> Hardware: (dmesg below)
>
> I have tried changing the disk cable; I have a 2.5" disk with a
> converter cable to standard IDE.
>
> Also, I have tried the disk in my laptop and it appears stable,
> but the testing period was limited.
>
> I have tried both IDE connectors on the MB and both NICs. No
> change.
>
> Summary:
>
> Despite all my attempts to solve the problem, my system is far
> from STABLE. I still experience spontaneous crashes, although
> less often.
>
> It is my personal belief that there may be a hardware problem,
> or persistent disk errors.

Could be, but this may be a complication caused by having several nasty
crashes rather than being the original cause.

> The reason is that although the traffic load saturates the
> connection, it should not be enough to crash even limited hardware.
> I have no more ideas on how to debug this.
>
> Questions:
>
> * Is there a disk tool for analysing the disk, marking sectors bad
>   etc?

smartmontools in ports, badsect in base, ... probably more, but never
exactly what you want :)

> * How do I find the file if I know the inode number (as reported
>   by fsck)?

With find? Should be possible, but I don't know off the top of my head.

> * Can malformed packets cause FBSD crash?
> Could Thomson510 be accountable for such packets?

I think that a sufficiently bad packet could crash the stack, the module,
the kernel, yes. My first impulse after reading your mail was to google for
security problems with these routers, and there's at least one which looks
promising but also hard to actually abuse:

http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0641

You never know, though. Who knows how advanced the virus makers/spammeisters
are nowadays. Your actual symptoms might be side effects of an attack that
ran into a system it did not expect.

> * Did I miss the obvious?

Paranoia maybe. The social factor. Any enemies who'd be capable? What's in
the spam they try to deliver? Do they also try to send/relay? What services
does your DSL modem run? What's your link to a security site registered in
Spain? I reckon you're its tech admin... well, the mail is related to that
as well, no?

> * Any ideas where to go now?
>
> All help is highly appreciated.

Whether it will help, I dunno.

Good luck,

Dan

> Thanks, Erik
>
> Disk space: df
> Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
> /dev/ad0s1a    507630    76966   390054    16%    /
> devfs               1        1        0   100%    /dev
> /dev/ad0s1g  30859916 14228272 14162852    50%    /home
> /dev/ad0s1f    507630       42   466978     0%    /tmp
> /dev/ad0s1d  12186190  2134420  9076876    19%    /usr
> /dev/ad0s1e  12186190  7689462  3521834    69%    /var
> devfs               1        1        0   100%    /var/named/dev
>
> last (24h):
> norgaard  ttyp0  x.x.x.x  Fri 6 May 10:09   still logged in
> norgaard  ttyp0  x.x.x.x  Fri 6 May 09:22 - 09:25  (00:03)
> norgaard  ttyp0  charm    Fri 6 May 08:28 - 08:42  (00:13)
> norgaard  ttyp0  charm    Fri 6 May 07:48 - 08:00  (00:11)
> reboot    ~               Fri 6 May 04:16
> norgaard  ttyp1  charm    Thu 5 May 22:45 - 23:18  (00:32)
> norgaard  ttyp0  charm    Thu 5 May 22:09 - crash  (06:07)
> reboot    ~               Thu 5 May 22:05
> norgaard  ttyp0  charm    Thu 5 May 21:45 - crash  (00:20)
> reboot    ~               Thu 5 May 21:20
> norgaard  ttyp1  charm    Thu 5 May 21:11 - crash  (00:09)
> norgaard  ttyp0  charm    Thu 5 May 20:45 - crash  (00:35)
> reboot    ~               Thu 5 May 18:57
> norgaard  ttyp0  x.x.x.x  Thu 5 May 18:23 - 18:23  (00:00)
> reboot    ~               Thu 5 May 18:22
> norgaard  ttyp0  x.x.x.x  Thu 5 May 16:44 - crash  (01:37)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 15:44 - 16:13  (00:28)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 13:57 - 13:58  (00:00)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 13:38 - 13:51  (00:12)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 13:06 - 13:27  (00:21)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 10:53 - 11:00  (00:06)
> reboot    ~               Thu 5 May 10:43
> norgaard  ttyp0  x.x.x.x  Thu 5 May 10:37 - crash  (00:06)
> norgaard  ttyp0  x.x.x.x  Thu 5 May 10:14 - 10:22  (00:08)
> reboot    ~               Thu 5 May 10:06
> norgaard  ttyp0  charm    Thu 5 May 08:38 - crash  (01:27)
> reboot    ~               Thu 5 May 08:38
> norgaard  ttyp0  charm    Thu 5 May 07:53 - 07:54  (00:00)
> norgaard  ttyp0  charm    Thu 5 May 07:52 - 07:52  (00:00)
> reboot    ~               Thu 5 May 07:17
> reboot    ~               Thu 5 May 04:59
> norgaard  ttyp0  charm    Thu 5 May 04:17 - crash  (00:41)
> reboot    ~               Thu 5 May 04:16
> shutdown  ~               Thu 5 May 04:14
> norgaard  ttyp0  charm    Thu 5 May 03:45 - shutdown  (00:28)
> reboot    ~               Thu 5 May 03:42
> reboot    ~               Thu 5 May 03:40
> norgaard  ttyp0  charm    Thu 5 May 03:40 - crash  (00:00)
> reboot    ~               Thu 5 May 03:31
> reboot    ~               Thu 5 May 03:27
> reboot    ~               Thu 5 May 03:13
> reboot    ~               Thu 5 May 03:03
> reboot    ~               Thu 5 May 02:58
> reboot    ~               Thu 5 May 02:51
> reboot    ~               Thu 5 May 02:47
> reboot    ~               Thu 5 May 02:41
> reboot    ~               Thu 5 May 02:35
> reboot    ~               Thu 5 May 02:29
> reboot    ~               Thu 5 May 02:25
> reboot    ~               Thu 5 May 02:20
> reboot    ~               Thu 5 May 02:09
> reboot    ~               Thu 5 May 01:58
> reboot    ~               Thu 5 May 01:53
> reboot    ~               Thu 5 May 01:50
> reboot    ~               Thu 5 May 01:46
> reboot    ~               Thu 5 May 01:42
> reboot    ~               Thu 5 May 01:33
> reboot    ~               Thu 5 May 01:30
> reboot    ~               Thu 5 May 01:27
> reboot    ~               Thu 5 May 01:13
> reboot    ~               Thu 5 May 01:08
> reboot    ~               Thu 5 May 01:05
> reboot    ~               Thu 5 May 00:58
> reboot    ~               Thu 5 May 00:53
> reboot    ~               Thu 5 May 00:44
> reboot    ~               Thu 5 May 00:34
> reboot    ~               Thu 5 May 00:24
> reboot    ~               Thu 5 May 00:20
> reboot    ~               Thu 5 May 00:13
> reboot    ~               Wed 4 May 23:58
> reboot    ~               Wed 4 May 23:43
> reboot    ~               Wed 4 May 23:40
> reboot    ~               Wed 4 May 23:36
> norgaard  ttyp0  charm    Wed 4 May 20:57 - 23:29  (02:31)
>
> Note the reboots from Wed 4, 23:36 to Thu 5, 07:52 appeared to be
> caused by postfix throttling due to a read-only mounted /usr.
>
> dmesg.today:
>
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-RC4 #0: Tue May 3 14:07:30 CEST 2005
>     root@top.daemonsecurity.com:/usr/obj/usr/src/sys/GENERIC
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: VIA C3 Nehemiah+RNG (1002.28-MHz 686-class CPU)
>   Origin = "CentaurHauls"  Id = 0x694  Stepping = 4
>   Features=0x380b03d<FPU,DE,PSE,TSC,MSR,MTRR,PGE,CMOV,MMX,FXSR,SSE>
> real memory  = 251592704 (239 MB)
> avail memory = 236548096 (225 MB)
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> acpi0: <VT9174 AWRDACPI> on motherboard
> acpi0: Power Button (fixed)
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> cpu0: <ACPI CPU (3 Cx states)> on acpi0
> acpi_throttle0: <ACPI CPU Throttling> on cpu0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> agp0: <VIA 862x (CLE266) host to PCI bridge> mem 0xd0000000-0xd7ffffff at device 0.0 on pci0
> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> pci1: <display, VGA> at device 0.0 (no driver attached)
> vr0: <VIA VT6105 Rhine III 10/100BaseTX> port 0xd000-0xd0ff mem 0xde000000-0xde0000ff irq 12 at device 15.0 on pci0
> miibus0: <MII bus> on vr0
> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> vr0: Ethernet address: 00:40:63:d4:89:72
> uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 11 at device 16.0 on pci0
> usb0: <VIA 83C572 USB controller> on uhci0
> usb0: USB revision 1.0
> uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1: <VIA 83C572 USB controller> port 0xd800-0xd81f irq 11 at device 16.1 on pci0
> usb1: <VIA 83C572 USB controller> on uhci1
> usb1: USB revision 1.0
> uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhci2: <VIA 83C572 USB controller> port 0xdc00-0xdc1f irq 9 at device 16.2 on pci0
> usb2: <VIA 83C572 USB controller> on uhci2
> usb2: USB revision 1.0
> uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub2: 2 ports with 2 removable, self powered
> pci0: <serial bus, USB> at device 16.3 (no driver attached)
> isab0: <PCI-ISA bridge> at device 17.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <VIA 8235 UDMA133 controller> port 0xe000-0xe00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
> ata0: channel #0 on atapci0
> ata1: channel #1 on atapci0
> pci0: <multimedia, audio> at device 17.5 (no driver attached)
> vr1: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe800-0xe8ff mem 0xde002000-0xde0020ff irq 11 at device 18.0 on pci0
> miibus1: <MII bus> on vr1
> ukphy1: <Generic IEEE 802.3u media interface> on miibus1
> ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> vr1: Ethernet address: 00:40:63:d4:89:71
> fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
> ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
> ppbus0: <Parallel port bus> on ppc0
> plip0: <PLIP network interface> on ppbus0
> lpt0: <Printer> on ppbus0
> lpt0: Interrupt-driven port
> ppi0: <Parallel I/O> on ppbus0
> sio2: <16550A-compatible COM port> port 0x3e8-0x3ef irq 5 on acpi0
> sio2: type 16550A
> sio3: <16550A-compatible COM port> port 0x2e8-0x2ef irq 10 on acpi0
> sio3: type 16550A
> orm0: <ISA Option ROM> at iomem 0xc0000-0xcdfff on isa0
> pmtimer0 on isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounter "TSC" frequency 1002278507 Hz quality 800
> Timecounters tick every 10.000 msec
> ad0: 57231MB <IC25N060ATMR04-0/MO3OAD4A> [116280/16/63] at ata0-master UDMA100
> Mounting root from ufs:/dev/ad0s1a
> WARNING: /home was not properly dismounted
> WARNING: /tmp was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /var was not properly dismounted
> IP Filter: v3.4.35 initialized. Default = pass all, Logging = enabled
> Accounting enabled
>
> GnuPG: http://www.locolomo.org/home/norgaard/norgaard.gpg.asc
> pub 1024D/11D11F9E 2003-08-15 Erik Norgaard <norgaard@locolomo.org>
> Key fingerprint = C394 81C4 D137 EEE5 39BE 82D5 3E6B FB3E 11D1 1F9E
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"
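
PS, on the question above about finding a file from the inode number that
fsck reports: find(1) has an -inum primary for exactly this, and -x keeps it
on one filesystem. The same lookup can be sketched in a few lines of Python.
This is only a minimal sketch under assumptions, not a recovery tool: the
function name find_by_inode is made up for illustration, and it assumes the
filesystem is mounted and readable. It stays on the filesystem of the
starting directory because inode numbers are only unique per filesystem.

```python
import os
import sys


def find_by_inode(root, inum):
    """Return every path below root whose inode number equals inum.

    Hypothetical helper: it compares st_ino for each directory entry and
    never descends into a directory that lives on another filesystem.
    The starting directory itself is not checked.
    """
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable, skip it
            if st.st_dev != root_dev:
                continue  # mount point of another filesystem, do not descend
            keep.append(name)
            if st.st_ino == inum:
                matches.append(path)
        dirnames[:] = keep  # prune the walk in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_dev == root_dev and st.st_ino == inum:
                matches.append(path)
    return matches


if __name__ == "__main__":
    # Usage: script.py <mount point> <inode number>
    if len(sys.argv) == 3:
        for found in find_by_inode(sys.argv[1], int(sys.argv[2])):
            print(found)
```

Called with a mount point such as /var and the inode number from the fsck
output, it prints each matching path. More than one path can come back when
the inode has several hard links.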