Date: Tue, 27 Jun 2006 22:33:03 +0800
From: "Ren Zhen" <fblist@gmail.com>
To: freebsd-stable@freebsd.org
Subject: Re: wi0 down when print a lot of data to screen over ssh
Message-ID: <910c4cb0606270733j2dd5f545q83819a0e8e200faf@mail.gmail.com>
Here is some extra information: this is what the kernel logged today. I had just turned powersave on and then off again.

kernel: wi0: timeout in wi_seek to 152/0
last message repeated 7 times
kernel: wi0: device timeout
kernel: wi0: timeout in wi_seek to 152/0
kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
kernel: wi0: xmit failed
kernel: wi0: timeout in wi_seek to 152/0
last message repeated 6 times
kernel: wi0: bad alloc 152 != 128, cur 0 nxt 0
kernel: wi0: record read mismatch, rid=fd42, got=fd41
kernel: wi0: record read mismatch, rid=fdc1, got=fd42
kernel: wi0: record read mismatch, rid=fd41, got=fdc1

On 6/27/06, freebsd-stable-request@freebsd.org <freebsd-stable-request@freebsd.org> wrote:
>
> Send freebsd-stable mailing list submissions to
>         freebsd-stable@freebsd.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> or, via email, send a message with subject or body 'help' to
>         freebsd-stable-request@freebsd.org
>
> You can reach the person managing the list at
>         freebsd-stable-owner@freebsd.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of freebsd-stable digest..."
>
>
> Today's Topics:
>
>    1. Re: force panic of remote server ... possible? (Ed Maste)
>    2. Re: force panic of remote server ... possible? (Ed Maste)
>    3. Re: vinum to gvinum help (Mark Linimon)
>    4. Re: Setting up GEOM mirror (Mike Jakubik)
>    5. Re: What denotes a 'blocked' process? (Marc G. Fournier)
>    6. RE: vinum to gvinum help (Wilde, Donald)
>    7. Re: What denotes a 'blocked' process? (Kostik Belousov)
>    8. Re: vmstat 'b' (disk busy?) field keeps climbing ...
>       (Marc G. Fournier)
>    9. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   10. Re: vmstat 'b' (disk busy?) field keeps climbing ... (Max Laier)
>   11. Re: kernel can't find root filesystem (Michael Proto)
>   12. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   13.
>       Re: Gigabit ethernet very slow. (Matthew D. Fuller)
>   14. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   15. Re: wi0 down when print a lot of data to screen over ssh
>       (Michael Proto)
>   16. Re: kernel can't find root filesystem (M.Hirsch)
>   17. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   18. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   19. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   20. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   21. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   22. Re: vmstat 'b' (disk busy?) field keeps climbing ...
>       (Marc G. Fournier)
>   23. Re: What denotes a 'blocked' process? (Marc G. Fournier)
>   24. RE: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Michael Butler)
>   25. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   26. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Wilko Bulte)
>   27. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   28. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   29. Re: FreeBSD 6.x CVSUP today crashes with zero load ... (M.Hirsch)
>   30. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Dmitry Pryanishnikov)
>   31. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Steven Hartland)
>   32. Re: FreeBSD 6.x CVSUP today crashes with zero load ...
>       (Thomas Nyström)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Jun 2006 10:52:38 -0400
> From: Ed Maste <emaste@phaedrus.sandvine.ca>
> Subject: Re: force panic of remote server ... possible?
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626145238.GA22081@sandvine.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 01:06:14PM +0100, Gavin Atkinson wrote:
> > On Mon, 2006-06-26 at 08:55 -0300, Marc G.
> > Fournier wrote:
> > > For the server that I'm fighting with right now, where Dmitry pointed out
> > > that it looks like a deadlock issue ... I have dumpdev/savecore enabled,
> > > is there some way of forcing it to panic when I know I actually have the
> > > deadlock, so that it will dump a core?
> >
> > You can enter the debugger by setting the (badly named) debug.kdb.enter
> > sysctl to 1, although I can't guarantee that'll trigger a dump and
> > reboot. Do you have a serial console?
>
> From some of your other messages, I believe this is a remote machine?
> Unless you can access an attached keyboard, or have a serial console,
> debug.kdb.enter will leave the machine sitting in ddb with no way to
> get out. Also, if you have a PS/2 keyboard (that is, one handled by
> the atkbd(4) driver) ddb will not accept any input on 6.1 or HEAD.
> (There is some discussion of this issue on the freebsd-current list.)
> Before using ddb on a remote machine I would suggest testing it out
> with the same release locally.
>
> For your original question -- I'm not sure which release it first
> appeared in (and it may be only in -CURRENT), but if it exists you
> can use:
>
> $ sysctl -d debug.kdb.panic
> debug.kdb.panic: set to panic the kernel
>
> -ed
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 26 Jun 2006 13:32:37 -0400
> From: Ed Maste <emaste@phaedrus.sandvine.ca>
> Subject: Re: force panic of remote server ... possible?
> To: "Marc G. Fournier" <scrappy@hub.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626173237.GA53085@sandvine.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 01:06:14PM +0100, Gavin Atkinson wrote:
> > On Mon, 2006-06-26 at 08:55 -0300, Marc G. Fournier wrote:
> > > For the server that I'm fighting with right now, where Dmitry pointed out
> > > that it looks like a deadlock issue ...
> > > I have dumpdev/savecore enabled,
> > > is there some way of forcing it to panic when I know I actually have the
> > > deadlock, so that it will dump a core?
> >
> > You can enter the debugger by setting the (badly named) debug.kdb.enter
> > sysctl to 1, although I can't guarantee that'll trigger a dump and
> > reboot. Do you have a serial console?
>
> From some of your other messages, I believe this is a remote machine?
> Unless you can access an attached keyboard, or have a serial console,
> debug.kdb.enter will leave the machine sitting in ddb with no way to
> get out. Also, if you have a PS/2 keyboard (that is, one handled by
> the atkbd(4) driver) ddb will not accept any input on 6.1 or HEAD.
> (There is some discussion of this issue on the freebsd-current list.)
> Before using ddb on a remote machine I would suggest testing it out
> with the same release locally.
>
> For your original question -- I'm not sure which release it first
> appeared in (and it may be only in -CURRENT), but if it exists you
> can use:
>
> $ sysctl -d debug.kdb.panic
> debug.kdb.panic: set to panic the kernel
>
> -ed
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 26 Jun 2006 14:33:19 -0500
> From: linimon@lonesome.com (Mark Linimon)
> Subject: Re: vinum to gvinum help
> To: Sven Willenberger <sven@dmv.com>
> Cc: Roland Smith <rsmith@xs4all.nl>, freebsd-stable
>         <freebsd-stable@freebsd.org>
> Message-ID: <20060626193319.GC909@soaustin.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 02:15:24PM -0400, Sven Willenberger wrote:
> > this is a production server that can at best stand an hour or so of
> > downtime.
>
> IMHO there are no 5.2.1 upgrade options that can be accomplished in even
> a small number of hours. The kernel libraries were all updated for 5.3;
> and hundreds, if not more, ports were updated.
> Since the 5.3 release,
> there have been thousands, if not tens of thousands, of commits to the
> ports tree, many of which make major infrastructural changes.
>
> Either going to 5.5 or 6.1 at this point should (also IMHO) be a complete
> reinstall on a staging system, with some tough testing there to show that
> the upgrade will work for your applications.
>
> Otherwise I think you're asking for some serious grief here.
>
> mcl
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 26 Jun 2006 15:03:54 -0400
> From: Mike Jakubik <mikej@rogers.com>
> Subject: Re: Setting up GEOM mirror
> To: Vivek Khera <vivek@khera.org>
> Cc: freebsd-stable <freebsd-stable@freebsd.org>
> Message-ID: <44A02F9A.4080606@rogers.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Vivek Khera wrote:
> >
> > On Jun 25, 2006, at 2:14 PM, Mike Jakubik wrote:
> >>
> >> The problem with these instructions is that they don't take into
> >> account the last sector. You may very well end up writing the
> >> metadata on the file system.
> >>
> >
> > When was the last time you fdisk'd a disk and it used the last sector
> > on the drive? I always end up with a bunch of extra space that didn't
> > fit into the round numbers of the file system.
> >
>
> Hopefully never :) Just mentioning this as a precaution.
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 26 Jun 2006 12:44:17 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: What denotes a 'blocked' process?
> To: freebsd-stable@freebsd.org
> Message-ID: <20060626124226.Y1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Mon, 26 Jun 2006, Marc G. Fournier wrote:
>
> >
> > Just upgraded to June 15th sources, started up all the processes, and am
> > already at 29 blocked processes ...
> >
> > I've checked for states D, E and L ... nothing ...
> >
> > Actually, let's go one better ...
> > attached is a complete list of my process
> > table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> >
> > 1 33 0 6381952 177944 1695 0 0 0 1601 0 1 0 416 50012 1657 14 14 72
> > 1 33 2 6376440 181744 2013 0 0 0 2172 0 3 0 448 68528 1629 17 15 68
> > 4 33 0 6385484 178364 1944 0 3 0 1758 0 8 0 420 57698 1221 17 14 69
> > 23 46 0 6463664 149528 5294 29 4 2 4659 0 37 0 505 44758 3040 27 28 45
> > 4 34 1 6424904 169660 4216 16 7 0 4047 0 211 0 1002 47502 5769 42 30 28
> > 1 35 0 6453992 167388 2414 0 9 0 2265 0 44 0 535 62932 3160 18 18 64
> > 7 33 0 6443672 168100 1642 0 0 0 1652 0 5 0 448 51974 2163 15 15 70
> >
> > So, according to this, there should be 33 processes blocked somewhere ...
> > STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > processes, and that's it ...
> >
> > This kernel is actually worse than the last, in that the last, on a reboot,
> > I'd see 4-5 blocked, and then it would slowly rise over the course of 24
> > hours, not start at 33 and rise from there ...
>
> Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
>
> 0 60 0 7016076 187424 2527 0 0 0 1722 0 5 0 320 7921 2140 24 19 57
> 0 60 0 7027436 185124 581 0 1 0 428 0 9 0 303 3214 2425 5 9 86
> 0 60 0 7053368 183060 217 4 1 0 130 0 71 0 453 1748 1157 6 4 90
> 1 60 1 7050848 183556 4 0 0 7 27 0 21 0 307 965 857 1 4 94
> 0 60 2 7050860 183652 2 0 0 0 6 0 0 0 256 829 1030 2 3 95
> 0 60 0 7051028 183348 28 1 2 0 11 0 3 0 307 944 855 3 3 95
> 0 60 1 7056876 182248 136 0 0 0 66 0 8 0 285 1190 945 1 4 95
>
> And nadda in ps:
>
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
>   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
>     2     0 204 -      ??  DL     0:00.45 [g_event]
>     3     0 204 -      ??  DL     0:04.87 [g_up]
>     4     0 204 -      ??  DL     0:06.19 [g_down]
>     5     0 204 -      ??  DL     0:00.00 [thread taskq]
>     6     0 204 -      ??  DL     0:00.00 [kqueue taskq]
>     7     0 204 -      ??  DL     0:00.00 [acpi_task0]
>     8     0 204 -      ??  DL     0:00.00 [acpi_task1]
>     9     0 204 -      ??  DL     0:00.00 [acpi_task2]
>    10     0 204 ktrace ??  DL     0:00.00 [ktrace]
>    15     0 204 -      ??  DL     0:00.68 [yarrow]
>    25     0 204 psleep ??  DL     0:00.70 [pagedaemon]
>    26     0 204 psleep ??  DL     0:00.00 [vmdaemon]
>    27     0 20c pgzero ??  DL     0:14.43 [pagezero]
>    28     0 204 psleep ??  DL     0:00.14 [bufdaemon]
>    29     0 204 vlruwt ??  DL     0:00.15 [vnlru]
>    30     0 204 syncer ??  DL     0:10.29 [syncer]
>    31     0 204 sdflus ??  DL     0:00.68 [softdepflush]
>    32     0 204 -      ??  DL     0:03.28 [schedcpu]
> 1170
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
>   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
> 1174
> pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
>   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
>    12     0 20c Giant  ??  LL     0:08.16 [swi4: clock]
> 1170
> pluto#
>
> Something *has* to be leaking here somewhere ... :(
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org                              MSN . scrappy@hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 26 Jun 2006 13:12:58 -0600
> From: "Wilde, Donald" <dwilde@sandia.gov>
> Subject: RE: vinum to gvinum help
> To: "freebsd-stable" <freebsd-stable@freebsd.org>
> Message-ID:
>         <040DF00BF960A24897B5B3EFBE63FE8A026B10B8@ES20SNLNT.srn.sandia.gov>
> Content-Type: text/plain; charset=us-ascii
>
>
>
> -----Original Message-----
> From: owner-freebsd-stable@freebsd.org
> [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Sven Willenberger
> Sent: Monday, June 26, 2006 12:15 PM
> To: Roland Smith
> Cc: freebsd-stable
> Subject: Re: vinum to gvinum help
>
> On Mon, 2006-06-26 at 19:15 +0200, Roland Smith wrote:
> > On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
> > > I have an i386 system currently running 5.2.1-RELEASE with a vinum
> > > mirror array (2 drives comprising /usr ).
> > > I want to upgrade this to
> > > 5.5-RELEASE which, if I understand correctly, no longer supports
> > > vinum arrays. Would simply changing /boot/loader.conf to read
> > > gvinum_load instead of vinum_load work or would the geom layer
> > > prevent this from working properly? If not, is there a recommended
> > > way of upgrading a vinum array to a gvinum or gmirror array?
> >
> > Lots of things have changed between 5.2.1 and 5.5. I think it would be
> > best to make a backup and do a clean reinstall.
> >
> > Roland
>
> Sadly this may not be an option; this is a production server that can at
> best stand an hour or so of downtime. Between all the custom symlinked
> directories, applications, etc, plus the sheer volume of data that would
> need to be backed up, an in-place upgrade would be infinitely more
> desirable. If it comes to the point of having to back up and do a fresh
> install I suspect I would be using the 6.x series anyway. I was really
> hoping that some way of upgrading in-place were available for vinum.
>
> Sven
>
> DSW> Sven, your best bet will be to build a set of disks off-line and
> then swap them in. That's the only way you can be sure to do it right.
> Ask yourself if the cost of finding and building a mule is worth more
> than the pain of screwing up.
>
> It _is_ well worth doing, there were many things that were still unglued
> in 5.2.1.
> --
> Don Wilde Org 01737 505-844-1126
> Earth Halted: Please reboot to continue
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 26 Jun 2006 23:05:15 +0300
> From: Kostik Belousov <kostikbel@gmail.com>
> Subject: Re: What denotes a 'blocked' process?
> To: "Marc G. Fournier" <scrappy@hub.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20060626200515.GL79678@deviant.kiev.zoral.com.ua>
> Content-Type: text/plain; charset="us-ascii"
>
> On Mon, Jun 26, 2006 at 12:44:17PM -0300, Marc G. Fournier wrote:
> > On Mon, 26 Jun 2006, Marc G.
> > Fournier wrote:
> >
> > >
> > > Just upgraded to June 15th sources, started up all the processes, and am
> > > already at 29 blocked processes ...
> > >
> > > I've checked for states D, E and L ... nothing ...
> > >
> > > Actually, let's go one better ... attached is a complete list of my
> > > process table (MWCHAN, STATE, COMMAND) ... right now, vmstat is showing:
> > >
> > > 1 33 0 6381952 177944 1695 0 0 0 1601 0 1 0 416 50012 1657 14 14 72
> > > 1 33 2 6376440 181744 2013 0 0 0 2172 0 3 0 448 68528 1629 17 15 68
> > > 4 33 0 6385484 178364 1944 0 3 0 1758 0 8 0 420 57698 1221 17 14 69
> > > 23 46 0 6463664 149528 5294 29 4 2 4659 0 37 0 505 44758 3040 27 28 45
> > > 4 34 1 6424904 169660 4216 16 7 0 4047 0 211 0 1002 47502 5769 42 30 28
> > > 1 35 0 6453992 167388 2414 0 9 0 2265 0 44 0 535 62932 3160 18 18 64
> > > 7 33 0 6443672 168100 1642 0 0 0 1652 0 5 0 448 51974 2163 15 15 70
> > >
> > > So, according to this, there should be 33 processes blocked somewhere ...
> > > STATEs D/E/L all show nothing ... even state R (long shot) is showing 3-4
> > > processes, and that's it ...
> > >
> > > This kernel is actually worse than the last, in that the last, on a
> > > reboot, I'd see 4-5 blocked, and then it would slowly rise over the course
> > > of 24 hours, not start at 33 and rise from there ...
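The vmstat rows quoted in this thread start with the three procs columns. According to FreeBSD's vmstat(8), r counts processes in the run queue, b counts processes blocked for resources (I/O, paging, etc.), and w counts runnable processes or short sleepers that have been swapped out. Below is a small Python sketch for tracking the b column over time. It assumes whitespace-separated output with those three fields first. It ignores everything after them, because the number of disk columns in the middle varies from machine to machine. The function names are illustrative, and the sample rows are copied from the tables above.

```python
"""Pull the procs counts (r, b, w) out of FreeBSD vmstat(8) data lines."""

from typing import List, NamedTuple


class Procs(NamedTuple):
    running: int  # r: in run queue
    blocked: int  # b: blocked for resources (I/O, paging, etc.)
    swapped: int  # w: runnable or short sleeper, but swapped out


def parse_procs(line: str) -> Procs:
    """Return the first three fields of one vmstat data line."""
    r, b, w = (int(field) for field in line.split()[:3])
    return Procs(r, b, w)


def blocked_series(text: str) -> List[int]:
    """Return the b value of every data line; header lines are skipped."""
    series = []
    for line in text.splitlines():
        fields = line.split()
        # Only lines whose first three fields are all digits count as data.
        if len(fields) >= 3 and all(f.isdigit() for f in fields[:3]):
            series.append(parse_procs(line).blocked)
    return series


# Three rows copied from the vmstat tables quoted in this thread.
SAMPLE = """\
1 33 0 6381952 177944 1695 0 0 0 1601 0 1 0 416 50012 1657 14 14 72
23 46 0 6463664 149528 5294 29 4 2 4659 0 37 0 505 44758 3040 27 28 45
0 60 0 7016076 187424 2527 0 0 0 1722 0 5 0 320 7921 2140 24 19 57
"""

if __name__ == "__main__":
    print(blocked_series(SAMPLE))
```

Piping `vmstat 5` output through something like this makes a steady climb in b easy to spot and graph.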
> >
> > Wow, in less than 1 hour, I'm up to 60 blocked, barely 1 runnable:
> >
> > 0 60 0 7016076 187424 2527 0 0 0 1722 0 5 0 320 7921 2140 24 19 57
> > 0 60 0 7027436 185124 581 0 1 0 428 0 9 0 303 3214 2425 5 9 86
> > 0 60 0 7053368 183060 217 4 1 0 130 0 71 0 453 1748 1157 6 4 90
> > 1 60 1 7050848 183556 4 0 0 7 27 0 21 0 307 965 857 1 4 94
> > 0 60 2 7050860 183652 2 0 0 0 6 0 0 0 256 829 1030 2 3 95
> > 0 60 0 7051028 183348 28 1 2 0 11 0 3 0 307 944 855 3 3 95
> > 0 60 1 7056876 182248 136 0 0 0 66 0 8 0 285 1190 945 1 4 95
> >
> > And nadda in ps:
> >
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
> >     2     0 204 -      ??  DL     0:00.45 [g_event]
> >     3     0 204 -      ??  DL     0:04.87 [g_up]
> >     4     0 204 -      ??  DL     0:06.19 [g_down]
> >     5     0 204 -      ??  DL     0:00.00 [thread taskq]
> >     6     0 204 -      ??  DL     0:00.00 [kqueue taskq]
> >     7     0 204 -      ??  DL     0:00.00 [acpi_task0]
> >     8     0 204 -      ??  DL     0:00.00 [acpi_task1]
> >     9     0 204 -      ??  DL     0:00.00 [acpi_task2]
> >    10     0 204 ktrace ??  DL     0:00.00 [ktrace]
> >    15     0 204 -      ??  DL     0:00.68 [yarrow]
> >    25     0 204 psleep ??  DL     0:00.70 [pagedaemon]
> >    26     0 204 psleep ??  DL     0:00.00 [vmdaemon]
> >    27     0 20c pgzero ??  DL     0:14.43 [pagezero]
> >    28     0 204 psleep ??  DL     0:00.14 [bufdaemon]
> >    29     0 204 vlruwt ??  DL     0:00.15 [vnlru]
> >    30     0 204 syncer ??  DL     0:10.29 [syncer]
> >    31     0 204 sdflus ??  DL     0:00.68 [softdepflush]
> >    32     0 204 -      ??  DL     0:03.28 [schedcpu]
> > 1170
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^E/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
> > 1174
> > pluto# ps ax -O ppid,flags,mwchan | awk '$6 ~ /^L/ || $6 == "STAT"' ; ps aux | wc -l
> >   PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
> >    12     0 20c Giant  ??  LL     0:08.16 [swi4: clock]
> > 1170
> > pluto#
> >
> > Something *has* to be leaking here somewhere ... :(
>
> Dumb unmotivated question: do you have nfs exports on this machine?
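The awk filter in these commands keeps the header plus every row whose sixth field starts with the requested letter. With the `-O ppid,flags,mwchan` column layout, that sixth field is STAT. Below is a rough Python equivalent, assuming the same column order. `filter_by_state` and `count_states` are illustrative names, and the sample is a subset of the listing above. Note that every D-state row in that listing is a kernel thread (its command name is in brackets).

```python
"""Rough Python equivalent of: awk '$6 ~ /^D/ || $6 == "STAT"'"""

from collections import Counter

# Subset of the `ps ax -O ppid,flags,mwchan` output quoted above.
SAMPLE = """\
  PID  PPID   F MWCHAN TT  STAT      TIME COMMAND
    2     0 204 -      ??  DL     0:00.45 [g_event]
   12     0 20c Giant  ??  LL     0:08.16 [swi4: clock]
   25     0 204 psleep ??  DL     0:00.70 [pagedaemon]
"""

STAT_FIELD = 5  # awk's $6, counted from zero


def filter_by_state(ps_output: str, prefix: str) -> list:
    """Keep the header and every row whose STAT field starts with prefix."""
    kept = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) <= STAT_FIELD:
            continue  # too short to have a STAT column
        stat = fields[STAT_FIELD]
        if stat == "STAT" or stat.startswith(prefix):
            kept.append(line)
    return kept


def count_states(ps_output: str) -> Counter:
    """Count rows by the first letter of STAT (D, L, R, S, ...)."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > STAT_FIELD:
            counts[fields[STAT_FIELD][0]] += 1
    return counts


if __name__ == "__main__":
    print("\n".join(filter_by_state(SAMPLE, "D")))
    print(count_states(SAMPLE))
```

Counting every state in one pass avoids running `ps` once per letter. As Max Laier points out later in the digest, vmstat's b column also counts processes that are paging, so these counts need not match it.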
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: application/pgp-signature
> Size: 187 bytes
> Desc: not available
> Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20060626/2fd3cb15/attachment-0001.pgp
>
> ------------------------------
>
> Message: 8
> Date: Mon, 26 Jun 2006 15:25:49 -0300 (ADT)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: Kostik Belousov <kostikbel@gmail.com>
> Cc: freebsd-stable@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
> Message-ID: <20060626152345.M1114@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> I think I might have found *at least* one of the problems, and that being
> the excessively high blocked states while ps isn't finding anything ...
>
> MySQL
>
> We just recently started allowing clients to run a MySQL server *within*
> their vServer ... in a drastic move, I just shut them all down on pluto,
> and blocked drop'd from ~86 down to 5 in a matter of moments ...
> restarting them all has it climbing once more, being up around 22 already
> ...
>
> I'm going to go with that theory for now, and keep an eye on things ...
>
> Just curious as to why, even with -H, it's not showing any blocked states
> within ps though ... ?
>
> Thx
>
>
> On Mon, 26 Jun 2006, Kostik Belousov wrote:
>
> > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G. Fournier wrote:
> >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> >>
> >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> >>
> >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> >> tree ...
> >>
> >>> BTW, do you use snapshots?
> >>
> >> Not that I've explicitly enabled ...
> >>
> >>> I think that without ddb access, diagnosing and debugging the problem
> >>> would be quite hard.
> >>
> >> Would it be a simple matter of:
> >>
> >> CTL-ALT-ESC
> >> panic
> >>
> >> to get it to dump core? Or would more be involved? Would a core dump
> >> even work?
> > Core dumps are somewhat inconvenient in this situation. Better,
> > sending report to me, follow my advice in
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy@hub.org                              MSN . scrappy@hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 27 Jun 2006 00:01:08 +0300 (EEST)
> From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Robert Watson <rwatson@freebsd.org>
> Cc: freebsd-acpi@freebsd.org, freebsd-stable@freebsd.org, Pete French
>         <petefrench@ticketswitch.com>
> Message-ID: <20060626235355.Q95667@atlantis.atlantis.dp.ua>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
>
> Hello!
>
> On Mon, 26 Jun 2006, Robert Watson wrote:
> > I think this is a useful activity, especially if you've already run extensive
> > memory testing on the box. If you haven't yet done that, I encourage you to
> > take a break from buildworlds and make sure the memory tests pass. I spent
> > several months on and off trying to track down a bug a few years ago, which
> > turned out to be a one bit error in memory on the box. It would appear and
>
> This is precisely the task which hardware ECC solves: to correct any
> single-bit memory error and to detect 2-bit and most of several-bit errors.
> I prefer ECC-capable hardware even for home PC; for server it's a must IMHO.
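Dmitry is describing a SECDED code (single-error-correcting, double-error-detecting). ECC DIMMs typically use a (72,64) code of this kind. The Python sketch below is a toy illustration only, not any memory controller's actual implementation. It uses an extended Hamming(8,4) code: three Hamming parity bits locate a single flipped bit, and one extra overall parity bit distinguishes one flip from two. `encode` and `decode` are hypothetical helper names. The sketch also shows the case M.Hirsch raises further down: three flips can be "corrected" to the wrong data.

```python
"""Toy SECDED code: extended Hamming(8,4). Index 0 is the overall parity bit."""

from typing import List, Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2 and 4 hold Hamming parity


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword."""
    assert len(data) == 4 and all(b in (0, 1) for b in data)
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity: whole word has even weight
    return word


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for i in range(1, 8):
        if w[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid codeword
    overall = sum(w) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 means the overall parity bit itself flipped.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number (two or more) of flips: detected, not correctable.
        status = "uncorrectable"
    return status, [w[i] for i in DATA_POSITIONS]
```

In this sketch, every single flip of the codeword for 1011 is repaired, and every double flip is reported as uncorrectable. Flipping bits 1, 2 and 3, however, returns `('corrected', [0, 0, 1, 1])`: the decoder reports success but hands back the wrong data.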
>
> Sincerely, Dmitry
> --
> Atlantis ISP, System Administrator
> e-mail: dmitry@atlantis.dp.ua
> nic-hdl: LYNX-RIPE
>
>
> ------------------------------
>
> Message: 10
> Date: Mon, 26 Jun 2006 22:44:18 +0200
> From: Max Laier <max@love2party.net>
> Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
> To: freebsd-stable@freebsd.org
> Cc: Kostik Belousov <kostikbel@gmail.com>, Dmitry Morozovsky
>         <marck@rinet.ru>
> Message-ID: <200606262244.25505.max@love2party.net>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Monday 26 June 2006 20:25, Marc G. Fournier wrote:
> > I think I might have found *at least* one of the problems, and that being
> > the excessively high blocked states while ps isn't finding anything ...
> >
> > MySQL
> >
> > We just recently started allowing clients to run a MySQL server *within*
> > their vServer ... in a drastic move, I just shut them all down on pluto,
> > and blocked drop'd from ~86 down to 5 in a matter of moments ...
> > restarting them all has it climbing once more, being up around 22 already
> > ...
> >
> > I'm going to go with that theory for now, and keep an eye on things ...
> >
> > Just curious as to why, even with -H, it's not showing any blocked states
> > within ps though ... ?
>
> The "blocked" column also shows processes that have objects "paging". Most
> likely you are *short* on memory. In order to relieve the pressure, program
> .text pages are freed and need to be refetched from disc whenever the
> respective code is being executed.
>
> If you allow every vServer to run its own MySQL with all the libraries etc
> it's clear what is killing you! Add more memory or make sure that .text pages
> can be reused by several processes. As far as I understand, each vServer will
> see a different source and thus not share buffers or the like.
>
> > Thx
> >
> > On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > > On Mon, Jun 26, 2006 at 02:20:12AM -0300, Marc G.
> > > Fournier wrote:
> > >> On Mon, 26 Jun 2006, Kostik Belousov wrote:
> > >>> Yes, this looks like a deadlock. As I understand, that's on 6.1-STABLE?
> > >>
> > >> Yes, kernel sources, it seems, from May 25th, according to my /usr/src
> > >> tree ...
> > >>
> > >>> BTW, do you use snapshots?
> > >>
> > >> Not that I've explicitly enabled ...
> > >>
> > >>> I think that without ddb access, diagnosing and debugging the problem
> > >>> would be quite hard.
> > >>
> > >> Would it be a simple matter of:
> > >>
> > >> CTL-ALT-ESC
> > >> panic
> > >>
> > >> to get it to dump core? Or would more be involved? Would a core dump
> > >> even work?
> > >
> > > Core dumps are somewhat inconvenient in this situation. Better,
> > > sending report to me, follow my advice in
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
> > ----
> > Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> > Email . scrappy@hub.org                              MSN . scrappy@hub.org
> > Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
> --
> /"\  Best regards,                      | mlaier@freebsd.org
> \ /  Max Laier                          | ICQ #67774661
>  X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
> / \  ASCII Ribbon Campaign              | Against HTML Mail and News
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: application/pgp-signature
> Size: 189 bytes
> Desc: not available
> Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20060626/a0fec014/attachment-0001.pgp
>
> ------------------------------
>
> Message: 11
> Date: Mon, 26 Jun 2006 17:18:59 -0400
> From: Michael Proto <mike@jellydonut.org>
> Subject: Re: kernel can't find root filesystem
> To: freebsd-stable@freebsd.org
> Message-ID: <44A04F43.2090400@jellydonut.org>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Robert Ames wrote:
> >> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> >>
> >> I had the same problem with 6.1. But only on some occasions, not
> >> always (iirc).
> >> The installations I made over the last weeks had all very different
> >> environments and deployment methods.
> >> I can't tell anymore when it happens and when not because I simply
> >> added the below loader.conf setting to my postinstall-script.
> >>
> >> Add "vfs.root.mountfrom=ufs:da0s1" to /boot/loader.conf to fix it.
> >
> > Thank you. That solves my problem even though it seems more like
> > a workaround than an actual solution. But I'll take it. :-)
> >
> > Also, someone responded asking if I had a valid entry in /etc/fstab
> > for the root filesystem.
> >
> > foo# cat /etc/fstab
> > # Device      Mountpoint  FStype  Options    Dump  Pass#
> > /dev/da0s1a   /           ufs     rw         1     1
> > /dev/da0s1b   none        swap    sw         0     0
> > /dev/da1s1d   /local      ufs     rw         2     2
> > /dev/cd0      /cdrom      cd9660  ro,noauto  0     0
> >
>
> If I'm not mistaken, you could also try to (re)install the boot0 loader:
>
> boot0cfg /dev/da0
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 26 Jun 2006 23:21:22 +0200
> From: "M.Hirsch" <M.Hirsch@hirsch.it>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A04FD2.1030001@hirsch.it>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> ECC is a way to mask broken hardware. I'd rather have my hardware fail
> directly when it first does, so I can replace it _immediately_.
> What's your hardware good for if it passes a "test", but fails in
> production?
>
> ECC is totally overrated.
>
> (sorry, couldn't resist...)
>
> M.
>
>
> ------------------------------
>
> Message: 13
> Date: Mon, 26 Jun 2006 14:32:26 -0500
> From: "Matthew D. Fuller" <fullermd@over-yonder.net>
> Subject: Re: Gigabit ethernet very slow.
> To: Michael Vince <mv@thebeastie.org>
> Cc: freebsd-stable@freebsd.org, performance@freebsd.org, Nikolas
>         Britton <nikolas.britton@gmail.com>, Sean Bryant <bryants@gmail.com>
> Message-ID: <20060626193226.GF74292@over-yonder.net>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 05:05:26PM +1000 I heard the voice of
> Michael Vince, and lo! it spake thus:
> >
> > According to pftop (with modulate state rules) I am able to get
> > about 85megs/sec when I don't have dd running. dd does indeed eat a
> > fair amount of cpu (40%) on the AMD64 6-stable machine.
>
> dd does ridiculously small (512 byte?) read/writes, so it's gotta do a
> LOT of system calls and a lot of context switching when you don't give
> it a bigger blocksize.
>
>
> --
> Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
> Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
>            On the Internet, nobody can hear you scream.
>
>
> ------------------------------
>
> Message: 14
> Date: Mon, 26 Jun 2006 23:26:54 +0200
> From: Wilko Bulte <wb@freebie.xs4all.nl>
> Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
> To: "M.Hirsch" <M.Hirsch@hirsch.it>
> Cc: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>,
>         freebsd-stable@FreeBSD.ORG
> Message-ID: <20060626212654.GB93703@freebie.xs4all.nl>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jun 26, 2006 at 11:21:22PM +0200, M.Hirsch wrote..
> > ECC is a way to mask broken hardware. I'd rather have my hardware fail
> > directly when it first does, so I can replace it _immediately_.
> > What's your hardware good for if it passes a "test", but fails in
> > production?
> >
> > ECC is totally overrated.
>
> Balderdash.
>
> Following your rationale, you want your bank account data to be
> silently corrupted by hardware with bit errors? Be my guest, give
> me ECC any day.
>
> Proper hardware will log the ECC errors, a proper OS tailored to that
> hardware will log and notify the sysadmins.
>
> That is how it should be done.
>
> Wilko
>
> --
> Wilko Bulte                             wilko@FreeBSD.org
>
>
> ------------------------------
>
> Message: 15
> Date: Mon, 26 Jun 2006 17:28:54 -0400
> From: Michael Proto <mike@jellydonut.org>
> Subject: Re: wi0 down when print a lot of data to screen over ssh
> To: freebsd-stable@freebsd.org
> Message-ID: <44A05196.1070708@jellydonut.org>
> Content-Type: text/plain; charset=UTF-8
>
> Ren Zhen wrote:
> > wi0 goes down when I run a program that prints a lot of data to
> > stdout, or when I use zmrx-zmtx it also goes down.
> >
> > kernel says:
> > kernel: wi0: timeout in wi_seek to 152/0
> > last message repeated 7 times
> > kernel: wi0: device timeout
> > kernel: wi0: timeout in wi_seek to 152/0
> > kernel: wi0: link state changed to DOWN
> >
> > another time kernel says:
> > kernel: wi0: timeout in wi_cmd 0x010b; event status 0x8000
> > kernel: wi0: xmit failed
> > kernel: wi0: timeout in wi_seek to 128/0
> > last message repeated 3 times
> >
>
> I used to see similar behavior with wi0 on my ThinkPad A30p (IBM High
> Rate Wireless, PRISM 2.5) when powersave was enabled via ifconfig (I
> believe it may be on by default, not sure about that). If you disable
> powersave via 'ifconfig wi0 -powersave' do you still see the problem?
>
>
> -Proto
>
>
> ------------------------------
>
> Message: 16
> Date: Mon, 26 Jun 2006 23:31:58 +0200
> From: "M.Hirsch" <M.Hirsch@gmx.de>
> Subject: Re: kernel can't find root filesystem
> To: Michael Proto <mike@jellydonut.org>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <44A0524E.900@gmx.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sorry, doesn't help.
>
> There is some kind of bug hiding somewhere in 6.1 where it does not
> auto-detect the root partition under certain circumstances. Can't tell
> when it worked last, as the last distro I consider "stable" was 4.X...
> (sorry for the rant...)
>
> I am not using (and don't want to use...) boot0 at all.
> Well, I tried, but it didn't help the situation anyways...
>
> It should work with the standard MBR and boot code ("/boot/mbr" and
> "/boot/boot"), right?
> i.e. fdisk -B and bsdlabel -B without further params should do the job
> to get the system bootstrapped.
> But it does not.
>
> M.
> > >If I'm not mistaken, you could also try to (re)install the boot0 loader: > > > >boot0cfg /dev/da0 > > > > > >-Proto > >_______________________________________________ > >freebsd-stable@freebsd.org mailing list > >http://lists.freebsd.org/mailman/listinfo/freebsd-stable > >To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > > > > > > > > > > ------------------------------ > > Message: 17 > Date: Mon, 26 Jun 2006 23:37:18 +0200 > From: "M.Hirsch" <M.Hirsch@gmx.de> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: Wilko Bulte <wb@freebie.xs4all.nl> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A0538E.6090906@gmx.de> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Nope, > > I'd like my bank data to be stored on a system that does ECC, no question. > But please, on hard disk level (RAID; that is _permanent_), not in the > RAM of a single node. > > If memory gets corrupted, please, raise a kernel panic... Even if > there's ECC in place. > > Counter question: > Would you like your bank account data to be stored on a medium where one > failure can be corrected, two can be detected, but three go unnoticed? > How unlikely is that, if you've got some hardware that is really /broken/? > > I know this is a rather random thing to happen. > Still, I think ECC memory is overrated. Better have it fail immediately. > _With a kernel panic, please_ > > M. > > Wilko Bulte schrieb: > > >Balderdash. > > > >Following your rationale you want your bank account data > >silently be corrupted by hardware with bit errors? Be my guest, give > >me ECC any day. > > > >Proper hardware will log the ECC errors, a proper OS tailored to that > >hardware will log and notify the sysadmins. > > > >That is how it should be done. 
> > > >Wilko > > > > > > > > > > ------------------------------ > > Message: 18 > Date: Mon, 26 Jun 2006 23:45:35 +0200 > From: Wilko Bulte <wb@freebie.xs4all.nl> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@gmx.de> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060626214535.GA94015@freebie.xs4all.nl> > Content-Type: text/plain; charset=us-ascii > > On Mon, Jun 26, 2006 at 11:37:18PM +0200, M.Hirsch wrote.. > > Nope, > > > > I'd like my bank data to be stored on a system that does ECC, no > question. > > But please, on hard disk level (RAID; that is _permanent_), not in the > > RAM of a single node. > > > > If memory gets corrupted, please, raise a kernel panic... Even if > > You *can't* panic if it is just a single bit error in a user page. You > will never know there was a corruption.. If that was a page holding your > account data your are toast. > > > there's ECC in place. > > Of course not. You only panic once you have no other options left. > Proper hardware with ECC give you these options. I am not talking > consumer grade crap here of course. > > > Counter question: > > Would you like your bank account data to be stored on a medium where one > > failure can be corrected, two can be detected, but three go unnoticed? > > How unlikely is that, if you've got some hardware that is really > /broken/? > > Very unlikely. There is enough hardware design done after all these > years that this kind of problem can be prevented. > > > I know this is a rather random thing to happen. > > Still, I think ECC memory is overrated. Better have it fail immediately. > > _With a kernel panic, please_ > > As said, you can't > > > > > M. > > > > Wilko Bulte schrieb: > > > > >Balderdash. > > > > > >Following your rationale you want your bank account data > > >silently be corrupted by hardware with bit errors? Be my guest, give > > >me ECC any day. 
> > > > > >Proper hardware will log the ECC errors, a proper OS tailored to that > > >hardware will log and notify the sysadmins. > > > > > >That is how it should be done. > > > > > >Wilko > > > > > > > > > > --- end of quoted text --- > > -- > Wilko Bulte wilko@FreeBSD.org > > > ------------------------------ > > Message: 19 > Date: Tue, 27 Jun 2006 00:11:03 +0200 > From: "M.Hirsch" <M.Hirsch@gmx.de> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: Michael Butler <imb@protected-networks.net> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A05B77.1030200@gmx.de> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > <snip> > > > .. So the logs are there, all that's required is a utility to read them > >and, optionally, alert the administrator to the event, > > > > > > > No, I think a panic _should_ occur, even if there was a correctable > error. Not "when there's no other option left". > Maybe make it optional via a kernel option. > There are much less-significant problems that can cause a panic. > > Sure, you may be one of the few people out there who knows how to > correctly run a _BSD_ system... > There's few of yous out there, ;) > > M. > > > ------------------------------ > > Message: 20 > Date: Tue, 27 Jun 2006 00:18:04 +0200 > From: Wilko Bulte <wb@freebie.xs4all.nl> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@gmx.de> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060626221804.GA94278@freebie.xs4all.nl> > Content-Type: text/plain; charset=us-ascii > > On Tue, Jun 27, 2006 at 12:11:03AM +0200, M.Hirsch wrote.. > > <snip> > > > > >.. So the logs are there, all that's required is a utility to read them > > >and, optionally, alert the administrator to the event, > > > > > > > > > > > No, I think a panic _should_ occur, even if there was a correctable > > error. Not "when there's no other option left". 
> > You really have never seen a machine used for serious business apparantly. > > > Maybe make it optional via a kernel option. > > There are much less-significant problems that can cause a panic. > > panics like that should be eradicated, adding more nonsensical panics > is not what we need. > > > Sure, you may be one of the few people out there who knows how to > > correctly run a _BSD_ system... > > There's few of yous out there, ;) > > > > M. > > _______________________________________________ > > freebsd-stable@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org > " > --- end of quoted text --- > > -- > Wilko Bulte wilko@FreeBSD.org > > > ------------------------------ > > Message: 21 > Date: Tue, 27 Jun 2006 01:22:47 +0300 (EEST) > From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060627011512.N95667@atlantis.atlantis.dp.ua> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > > Hello! > > On Mon, 26 Jun 2006, M.Hirsch wrote: > > ECC is a way to mask broken hardware. I rather have my hardware fail > directly > > when it does first, so I can replace it _immediately_ > > You got it backwards. If your data has any value to you, then you don't > want > to miss any single-error bit in it, do you? If you're running hardware w/o > ECC, your single-bit error in your data will go to the disk unnoticed, and > you'll lose your data. With ECC, hardware will correct it. In (rare) case > of > multiple-bit error ECC logic will generate NMI for you, so you'll notice > and > "replace it _immediately_" instead of two weeks ago when your archive wont > extract. > > > What's your hardware good for if it passes a "test", but fails in > production? 
> > It's the way in what RAM will manifest single-bit errors: you run memory > test > - it won't catch them, later in production you'll miss this error because > nothing will provide extra sanity check of your data. > > > ECC is totally overrated. > > Only by the people who don't understand it's point! > > > Sincerely, Dmitry > -- > Atlantis ISP, System Administrator > e-mail: dmitry@atlantis.dp.ua > nic-hdl: LYNX-RIPE > > > ------------------------------ > > Message: 22 > Date: Mon, 26 Jun 2006 18:55:17 -0300 (ADT) > From: "Marc G. Fournier" <scrappy@hub.org> > Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ... > To: Max Laier <max@love2party.net> > Cc: Kostik Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org, > Dmitry Morozovsky <marck@rinet.ru> > Message-ID: <20060626185437.I1114@ganymede.hub.org> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > On Mon, 26 Jun 2006, Max Laier wrote: > > > On Monday 26 June 2006 20:25, Marc G. Fournier wrote: > >> I think I might have found *at least* one of the problems, and that > being > >> the excessively high blocked states while ps isn't finding anything ... > >> > >> MySQL > >> > >> We just recently started allowing clients to run a MySQL server > *within* > >> their vServer ... in a drastic move, I just shut them all down on > pluto, > >> and blocked drop'd from ~86 down to 5 in a matter of moments ... > >> restarting them all has it climbing once more, being up around 22 > already > >> ... > >> > >> I'm going to go with that theory for now, and keep an eye on things ... > >> > >> Just curious as to why, even with -H, its not showing any blocked > states > >> within ps though ... ? > > > > The "blocked" column shows also processes that have objects "paging". > > Most likely you are *short* on memory. In order to relieve the pressure > > program .text pages are free'ed and need to be refetched from disc > > whenever the respective code is being executed. 
> > 'k, but shouldn't the OS be doing any swapping, if this was the case? I'm > getting <1M of swappage when the blocked pages are really high ... > > ---- > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org > ) > Email . scrappy@hub.org MSN . scrappy@hub.org > Yahoo . yscrappy Skype: hub.org ICQ . 7615664 > > > ------------------------------ > > Message: 23 > Date: Mon, 26 Jun 2006 18:54:08 -0300 (ADT) > From: "Marc G. Fournier" <scrappy@hub.org> > Subject: Re: What denotes a 'blocked' process? > To: Kostik Belousov <kostikbel@gmail.com> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060626185338.D1114@ganymede.hub.org> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > On Mon, 26 Jun 2006, Kostik Belousov wrote: > > > Dumb unmotivated question: do you have nfs exports on this machine ? > > neither nfs nor mountd are currently running ... > > ---- > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org > ) > Email . scrappy@hub.org MSN . scrappy@hub.org > Yahoo . yscrappy Skype: hub.org ICQ . 7615664 > > > ------------------------------ > > Message: 24 > Date: Mon, 26 Jun 2006 18:02:38 -0400 > From: "Michael Butler" <imb@protected-networks.net> > Subject: RE: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "'Wilko Bulte'" <wb@freebie.xs4all.nl>, "'M.Hirsch'" > <M.Hirsch@gmx.de> > Cc: freebsd-stable@freebsd.org > Message-ID: <000001c6996c$3eab9df0$ad0d510a@toshi> > Content-Type: text/plain; charset="us-ascii" > > > Of course not. You only panic once you have no other options left. > > Proper hardware with ECC give you these options. I am not talking > > consumer grade crap here of course. > > I agree that no panic should occur if the error was correctable and it > should when it isn't. > > However, *real* equipment will log a corrected error .. from an aging Dell > 1-U server .. 
> > Handle 0x0024, DMI type 15, 33 bytes > System Event Log > Area Length: 4096 bytes > Header Start Offset: 0x0000 > Header Length: 16 bytes > Data Start Offset: 0x0010 > Access Method: Memory-mapped physical 32-bit address > Access Address: 0xFFF33000 > Status: Valid, Not Full > Change Token: 0x00000000 > Header Format: Type 1 > Supported Log Type Descriptors: 5 > Descriptor 1: POST error > Data Format 1: POST results bitmap > Descriptor 2: Parity memory error > Data Format 2: Multiple-event > Descriptor 3: I/O channel block > Data Format 3: Multiple-event > Descriptor 4: Single-bit ECC memory error > Data Format 4: Multiple-event > Descriptor 5: Multi-bit ECC memory error > Data Format 5: Multiple-event > > .. So the logs are there, all that's required is a utility to read them > and, optionally, alert the administrator to the event, > > Michael Butler, CISSP > Security Architect > Protected Networks > http://www.protected-networks.net > > > > ------------------------------ > > Message: 25 > Date: Mon, 26 Jun 2006 23:54:53 +0200 > From: "M.Hirsch" <M.Hirsch@hirsch.it> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: Wilko Bulte <wb@freebie.xs4all.nl> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A057AD.7050700@hirsch.it> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Ok, sorry. Misunderstanding here. > My point was, along what has been posted here in this thread: > "An ECC error should raise a kernel panic immediately, not only a > message in the log files." > Any hardware showing ECC errors should be replaced asap.. > Make them lazy admins do what they're getting paid for... > > Correct, you can't (quickly) detect this without ECC hardware, of course. > But I keep reading about "ECC" being the solution to broken RAM sticks... > > Since FreeBSD panics on creating simple malloc() vnodes, it should do so > on ECC errors first. > Different mission, I guess ;) > (And different problems with the recent fricking code...) 
> > M. > > > ------------------------------ > > Message: 26 > Date: Tue, 27 Jun 2006 00:02:06 +0200 > From: Wilko Bulte <wb@freebie.xs4all.nl> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060626220206.GA94183@freebie.xs4all.nl> > Content-Type: text/plain; charset=us-ascii > > On Mon, Jun 26, 2006 at 11:54:53PM +0200, M.Hirsch wrote.. > > Ok, sorry. Misunderstanding here. > > My point was, along what has been posted here in this thread: > > "An ECC error should raise a kernel panic immediately, not only a > > message in the log files." > > Any hardware showing ECC errors should be replaced asap.. > > Yes, but keep in mind that ASAP often means "during a scheduled > maintenance window". Which can be months away in some cases. > > > Make them lazy admins do what they're getting paid for... > > > > Correct, you can't (quickly) detect this without ECC hardware, of > course. > > Skip the 'quickly', you need ECC, full stop. Otherwise you will not > detect > it until it is way too late. I can tell you from personal experience > that customers hate nothing more than undetected data corruption. ECC > RAM is only part of the fix of course. ECC better be end to end, but it > hardly is.. > > > But I keep reading about "ECC" being the solution to broken RAM > sticks... > > Not really of course. But there are OS-es that simply map pages with > known problems into a "do not use" list. > > > Since FreeBSD panics on creating simple malloc() vnodes, it should do so > > on ECC errors first. > > Different mission, I guess ;) > > (And different problems with the recent fricking code...) > > > > M. > --- end of quoted text --- > > -- > Wilko Bulte wilko@FreeBSD.org > > > ------------------------------ > > Message: 27 > Date: Tue, 27 Jun 2006 00:33:39 +0200 > From: "M.Hirsch" <M.Hirsch@hirsch.it> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... 
> To: Wilko Bulte <wb@freebie.xs4all.nl> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A060C3.8090008@hirsch.it> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Wilko Bulte schrieb: > > >You really have never seen a machine used for serious business > apparantly. > > > > > > > Depends on what you define "serious business"... > Yes, I am rather new to FreeBSD (2y+) > I am just trying to setup a /stable/ cluster of six machines right now. > For over a week straight. > 4.11 works perfectly. But support is going to be dropped very soon, so > that's a bad option for me right now. > > Over all, the system is /only/ supposed to handle a few hundred hits per > second. (but including dynamic stuff like php...) > > Dunno if that (or what else) is "serious business" for you. > Which version would you suggest for "serious business"? > > Anyways, my point stands: I rather have any of my nodes panic than > carrying the risk of creating invalid data... > One in a billion can be high probability, soon... (just planning for the > future...) > > >panics like that should be eradicated, adding more nonsensical panics > >is not what we need. > > > > > uh, I would not call hardware failure "nonsensical panics". I guess I > must have misunderstood you... > > M. > > > ------------------------------ > > Message: 28 > Date: Tue, 27 Jun 2006 00:39:47 +0200 > From: "M.Hirsch" <M.Hirsch@hirsch.it> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A06233.1090704@hirsch.it> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Dmitry Pryanishnikov schrieb: > > > > > Hello! > > > > On Mon, 26 Jun 2006, M.Hirsch wrote: > > > >> ECC is a way to mask broken hardware. I rather have my hardware fail > >> directly when it does first, so I can replace it _immediately_ > > > > > > You got it backwards. 
If your data has any value to you, then you > > don't want > > to miss any single-error bit in it, do you? If you're running hardware > > w/o > > ECC, your single-bit error in your data will go to the disk unnoticed, > > and you'll lose your data. With ECC, hardware will correct it. In > > (rare) case of multiple-bit error ECC logic will generate NMI for you, > > so you'll notice and "replace it _immediately_" instead of two weeks > > ago when your archive wont extract. > > > Nope, I am right on track. > I do not want to lose any data. So I'd prefer a ECC error to raise a > panic so I can replace the hardware ASAP. > Don't get me wrong, but tracking bugs in FreeBSD is quite more of an > effort than "just" akquiring a new box... > > >> What's your hardware good for if it passes a "test", but fails in > >> production? > > > > > > It's the way in what RAM will manifest single-bit errors: you run > > memory test - it won't catch them, later in production you'll miss > > this error because > > nothing will provide extra sanity check of your data. > > Ok... > Does the standard fs, UFS2, do "extra sanity checks", then? > > M. > > > ------------------------------ > > Message: 29 > Date: Tue, 27 Jun 2006 00:51:56 +0200 > From: "M.Hirsch" <M.Hirsch@gmx.de> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A0650C.7020806@gmx.de> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > Ok... > > Does the standard fs, UFS2, do "extra sanity checks", then? > > > Sorry, replying to myself... > No, this does not matter. > If the OS thinks the data is ok, UFS will write OK data... > > So, let me rephrase this: > How can I make sure there is no broken hardware in my cluster? > I am not looking for workarounds, like ECC. I want the box to break > immediately once any single component goes wrong... 
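The argument above turns on what a single-error-correct, double-error-detect (SEC-DED) code can and cannot catch. What follows is a minimal, illustrative sketch of such a code; it is not taken from the thread. It uses an extended Hamming(8,4) code on a single 4-bit nibble. ECC DIMMs typically apply a wider SEC-DED code in the memory controller (8 check bits per 64 data bits). The toy word size and the names `encode`, `decode`, `flip` and `Status` are assumptions made here for illustration.

```python
"""Toy SEC-DED code (extended Hamming(8,4)): corrects any 1 flipped bit,
detects any 2, and silently miscorrects 3. Illustrative only."""
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "no error"
    CORRECTED = "single-bit error corrected"
    UNCORRECTABLE = "double-bit error detected"


def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit codeword, indexed by bit position 0..7.

    Positions 1, 2 and 4 are Hamming check bits; position 0 is an overall
    parity bit that adds double-error detection (DED) to single-error
    correction (SEC).
    """
    d1, d2, d3, d4 = ((nibble >> i) & 1 for i in range(4))
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    word[2] = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    word[4] = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2  # makes the total number of 1 bits even
    return word


def decode(word: list) -> "tuple[Optional[int], Status]":
    """Return (data nibble, status); the nibble is None if uncorrectable."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    odd_parity = sum(w) % 2 == 1

    if syndrome == 0 and not odd_parity:
        status = Status.OK
    elif odd_parity:
        # Odd number of flips: assume exactly one and repair it.
        # A zero syndrome means the overall parity bit itself flipped.
        w[syndrome] ^= 1
        status = Status.CORRECTED
    else:
        # Even number of flips but nonzero syndrome: report, don't guess.
        return None, Status.UNCORRECTABLE

    return w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3), status


def flip(word: list, *positions: int) -> list:
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    codeword = encode(0b1011)
    for flips in [(), (5,), (2, 6), (1, 2, 3)]:
        print(flips, decode(flip(codeword, *flips)))
```

Take the codeword for 0b1011 as an example. Flipping one bit (position 5) decodes back to 11 with CORRECTED. Flipping two bits (2 and 6) returns UNCORRECTABLE; per Dmitry, this is the case where the chipset raises an NMI. Flipping three bits (1, 2, 3) returns 10 with CORRECTED. The triple error is not flagged at all; it is "repaired" into wrong data, which is the failure mode M.Hirsch asks about.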
> > > > ------------------------------ > > Message: 30 > Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST) > From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it> > Cc: freebsd-stable@freebsd.org > Message-ID: <20060627014335.E87535@atlantis.atlantis.dp.ua> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > On Tue, 27 Jun 2006, M.Hirsch wrote: > >> On Mon, 26 Jun 2006, M.Hirsch wrote: > >>> ECC is a way to mask broken hardware. I rather have my hardware fail > >>> directly when it does first, so I can replace it _immediately_ > >> > >> > >> You got it backwards. If your data has any value to you, then you > don't > >> > > Nope, I am right on track. > > I do not want to lose any data. So I'd prefer a ECC error to raise a > panic so > > I can replace the hardware ASAP. > > When you wrote "ECC is a way to mask broken hardware", you were plain > wrong. > If you're using hardware w/o ECC, it just can't tell whether error present > or absent. So ECC _is_ the way to detect (not mask) broken hardware. > > If you want ECC corrector to raise NMI on corrected error (as well as > uncorrectable), just set approproate bit in control register - every > Intel's ECC-capable chipset allows it. But if we're speaking about > production environment, such behaviour (abnormal termination on > _corrected_ > error) is unacceptable. > > > Don't get me wrong, but tracking bugs in FreeBSD is quite more of an > effort > > than "just" akquiring a new box... > > I don't see connection between this sentence and ECC (which is hardware > option). > > > Does the standard fs, UFS2, do "extra sanity checks", then? > > Ditto. And don't forget that _every_ data sector on HDD _is_ checked > with CRC. As well as ATA data transfers in UDMA modes. As well as data > in CPU cache. Extra check gives extra reliability. 
> > Sincerely, Dmitry > -- > Atlantis ISP, System Administrator > e-mail: dmitry@atlantis.dp.ua > nic-hdl: LYNX-RIPE > > > ------------------------------ > > Message: 31 > Date: Mon, 26 Jun 2006 23:59:02 +0100 > From: "Steven Hartland" <killing@multiplay.co.uk> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it>, "Dmitry Pryanishnikov" > <dmitry@atlantis.dp.ua> > Cc: freebsd-stable@freebsd.org > Message-ID: <005401c69974$217f8860$b3db87d4@multiplay.co.uk> > Content-Type: text/plain; format=flowed; charset="iso-8859-1"; > reply-type=response > > M.Hirsch wrote: > > Ok... > > Does the standard fs, UFS2, do "extra sanity checks", then? > > My advice would be dont feed the troll. > > Steve > > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of misdirection, > the recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > > > ------------------------------ > > Message: 32 > Date: Tue, 27 Jun 2006 01:09:03 +0200 > From: Thomas Nyström <thn@saeab.se> > Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ... > To: "M.Hirsch" <M.Hirsch@hirsch.it> > Cc: freebsd-stable@freebsd.org > Message-ID: <44A0690F.8040005@saeab.se> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > M.Hirsch wrote: > > Any hardware showing ECC errors should be replaced asap.. > > No. ALL memory will sooner or later show single bit error. > > Several years ago I was checking this during my work at Ericsson. > There was a discussion if ECC should be present in the GSM-base-stations > or not. I had a special test-software running in several units looking > for soft-errors.
Soft errors are bits that are flipped spontaneously in > the memory. When the bit are rewritten it will work OK again, no > permanent damage to the memory and no need to replace the memory. > > During my test period (I think it was 6-8 monthes) I saw four occasions > when this occured (total amount of memory 96 MB). > > ECC is intended to fix this: It will correct a single bit fault and > allow the system to contiune uninterrupted. > > Of course this event should be logged and if it occurs several times > at the same place then it is time to replace the memory. > > Of course memory should be better these days but.... knock on wood.... > > /thn [20 years as HW-designer, FreeBSD since 3.0] > > -- > --------------------------------------------------------------- > Svensk Aktuell Elektronik AB Thomas Nyström > Box 10 Phone: +46 8 35 92 85 > S-191 21 Sollentuna Fax: +46 8 35 92 86 > Sweden Email: thn@saeab.se > --------------------------------------------------------------- > > > ------------------------------ > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > > End of freebsd-stable Digest, Vol 164, Issue 4 > ********************************************** >
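Thomas Nyström reported four spontaneous single-bit flips in about 6-8 months across 96 MB. The sketch below works through the arithmetic behind those figures and expresses them as FIT per megabit (failures per 10^9 Mbit-hours), the unit usually used for soft-error rates. The following are assumptions added here, not from the thread: the 1 MB = 8 Mbit convention, the average month length, the function names, and the hypothetical 4 GB machine used for scaling.

```python
"""Back-of-the-envelope soft-error rate from the figures quoted above:
four single-bit upsets in roughly 6-8 months across 96 MB of RAM."""

HOURS_PER_MONTH = 365.25 * 24 / 12  # average month, about 730.5 h
FIT_HOURS = 1e9                      # 1 FIT = 1 failure per 10^9 device-hours


def fit_per_mbit(events: int, megabytes: float, months: float) -> float:
    """Observed upsets as FIT per megabit (failures per 1e9 Mbit-hours).

    Assumes "MB" means 2**20 bytes, so 1 MB = 8 (binary) Mbit.
    """
    mbits = megabytes * 8
    hours = months * HOURS_PER_MONTH
    return events / (mbits * hours) * FIT_HOURS


def expected_upsets(fit_mbit: float, gigabytes: float, hours: float) -> float:
    """Scale a FIT/Mbit rate to another memory size and time span,
    assuming the per-bit upset rate stays the same."""
    mbits = gigabytes * 1024 * 8
    return fit_mbit * mbits * hours / FIT_HOURS


if __name__ == "__main__":
    for months in (6, 8):
        rate = fit_per_mbit(events=4, megabytes=96, months=months)
        # Hypothetical 4 GB machine, one month, same per-bit rate.
        per_month = expected_upsets(rate, gigabytes=4, hours=HOURS_PER_MONTH)
        print(f"{months} months: {rate:.0f} FIT/Mbit, "
              f"~{per_month:.0f} upsets/month in 4 GB")
```

With these inputs, the rate comes out at roughly 890 to 1,190 FIT/Mbit. At that rate, a 4 GB machine would see about 21 to 28 corrected single-bit errors a month. That projection assumes the per-bit rate has not improved, which Thomas himself doubts for newer memory. It is meant only to show why treating every corrected error as a reason to panic scales poorly as memory grows.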
