Date: Fri, 23 Dec 2011 16:20:44 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Charlie Martin <crmartin@sgi.com>
Cc: Eric Richards <erichards@sgi.com>, Larry Fenske <LFenske@sgi.com>,
    freebsd-stable@FreeBSD.org, "Peter W. Morreale" <morreale@sgi.com>
Subject: Re: PRINTF_BUFR_SIZE=4096?
Message-ID: <20111224002044.GA30339@icarus.home.lan>
In-Reply-To: <4EF50882.9080609@sgi.com>
References: <4EF3B790.5050509@sgi.com>
    <20111223000705.GA6242@icarus.home.lan>
    <4EF4FED2.6020909@sgi.com>
    <20111223225445.GA29093@icarus.home.lan>
    <4EF50882.9080609@sgi.com>

On Fri, Dec 23, 2011 at 04:02:26PM -0700, Charlie Martin wrote:
> Thanks, Jeremy, I really was trying to keep you from needing to dig
> this out. This is inherited code with some very peculiar
> intermittent panics, so you can imagine that I would be interested
> in specifics of the odd behavior. Sadly, I don't think we're seeing
> any stack overflows.

I say this politely, not condescendingly: your last statement indicates
you don't quite understand the nature of what having a large-ish
stack-based buffer could do to the kernel. This is not userland.

I'm pretty sure the issues you're seeing (the devfs stuff) are fixed or
improved in RELENG_8, but I say that without being able to point you to
a specific commit. My reasoning is that there have been a *ton* of
improvements in devfs in RELENG_8 onward, and these will almost
certainly not be backported to RELENG_7.

You are very, *very* adamant about stating "we cannot upgrade", and it
is my opinion that as long as you don't upgrade, you're going to be
"stuck" with these kinds of bugs. Therefore, it would be worth your
time to put effort into testing RELENG_8 (not 8.2-RELEASE please;
seriously, go with 8.2-STABLE (RELENG_8), just trust me on this) in a
test environment and see how things go for you. I think you will be
pleased with the results.

You'll also get much more attentive/better support from the
community/developers since RELENG_8 is supported, while RELENG_7
(especially -PRERELEASE) is losing more and more attention. Its
Security EOL is sometime early next year, and I hope you're aware of
that fact as well. What I'm getting at here, without getting
political: you need to start considering developing resources to help
with upgrading.

But for the sake of example, we have a FreeBSD RELENG_6 box
(6.4-STABLE) in our cluster that has actively been up for 385 days
(went down a year ago because of co-lo maintenance I was doing on power
conduits).
If this machine suddenly panic'd, would I report the bug to -stable
and so on? No. I would suck it up.

> On 12/23/2011 03:54 PM, Jeremy Chadwick wrote:
> >On Fri, Dec 23, 2011 at 03:21:06PM -0700, Charlie Martin wrote:
> >
> >When I was doing FreeBSD "stuff" as part of the Project, I added this to
> >my Commonly Reported Issues wiki page since it comes up quite often.
> >Search for "BUFR".
> >
> >http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
> >
>
> I will note that all the "Commonly reported" page says is "set the
> value to 256" and point to three examples of people seeing garbled
> output.

There's some history behind why that is... kind of. I'll try to
explain:

For many years, PRINTF_BUFR_SIZE was not defined in any of the default
kernel configs. It was mentioned in /sys/conf/NOTES, but did not ship
in GENERIC, etc. Then, as more and more people (since the FreeBSD 6.x
days) began reporting interspersed kernel output, and more and more
developers started finding it annoying too (both the reports and the
problem itself; let me tell you, it makes using ddb to debug a kernel
crash in real-time difficult), the option was added to the default
kernels since it *does* improve things a little bit (better than
nothing).

The value 256 is something *I personally* chose, because 128 was simply
not improving things "enough" on our systems. 256 made a bigger
difference. The reason it still remains at 128 in the stock kernel
configs is the issue I mentioned in my previous post, re: developers
having justified concerns over the implications of increasing this
value too high.

I want readers of this thread to understand something: my previous
paragraph should not be read as "the higher the value, the better off
you are". I have not actually *looked* at the code to see how it works.
I tend to trust folks who know more about the implications (especially
in kernel space) of large static buffers, but even in userland I
understand the difference and implications of doing
char buf[65536]; rather than char *buf = calloc(1, 65536);.

TL;DR -- Don't just go increasing this value to something gigantic in
the hope that a larger value will solve the problem. It won't solve
the problem entirely. For now, *knowing* about interspersed output is
enough.

I'll also point out that Solaris 10 (not sure about OpenIndiana) also
has this problem (we see it at work on occasion), so FreeBSD isn't
alone.

P.S. -- No one on this list should *ever* feel obliged to "cut me some
slack" because of holidays. For example, for the past 10 years I have
worked on every single US holiday including Christmas. I consider them
just like any other day. Maybe it's because I'm not married, don't
have kids, don't have a tree, etc., instead preferring to rely on
nostalgia/old memories of childhood Christmases and stuff like that.
That's just how I am. :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
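
For readers less familiar with the distinction drawn in the message
between char buf[65536]; and char *buf = calloc(1, 65536);, here is a
minimal userland sketch. It is illustrative only and is not the kernel's
printf code; the function names format_on_stack and format_on_heap and
the BUF_SIZE macro are hypothetical. The first version reserves the
whole buffer in the function's stack frame, while the second keeps only
a pointer on the stack and takes the memory from the heap.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 65536  /* size used in the message; illustrative only */

/*
 * Stack version: all BUF_SIZE bytes live in this function's stack
 * frame for the duration of the call.  In userland the stack is
 * usually large enough for this; kernel thread stacks are far smaller.
 */
size_t format_on_stack(const char *msg)
{
    char buf[BUF_SIZE];
    int n = snprintf(buf, sizeof(buf), "%s", msg);

    return (n < 0) ? 0 : strlen(buf);
}

/*
 * Heap version: only the pointer lives on the stack.  The allocation
 * can fail, so it must be checked, and it must be freed afterwards.
 */
size_t format_on_heap(const char *msg)
{
    char *buf = calloc(1, BUF_SIZE);
    size_t len;
    int n;

    if (buf == NULL)
        return 0;
    n = snprintf(buf, BUF_SIZE, "%s", msg);
    len = (n < 0) ? 0 : strlen(buf);
    free(buf);
    return len;
}
```

Both functions return the same length for the same input; what differs
is where the 64 KiB comes from, which is why a size that is harmless in
userland can be a real concern for a stack-based buffer in the kernel.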