Date: Fri, 18 Oct 2002 17:21:44 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Ben Stuyts <ben@stuyts.nl>
Cc: current@freebsd.org, Jeff Roberson <jroberson@chesapeake.net>, Robert Watson <rwatson@freebsd.org>, jeff@freebsd.org, Alfred Perlstein <alfred@FreeBSD.ORG>
Subject: Re: [Ugly PATCH] Again: panic kmem_malloc()
Message-ID: <3DB0A598.C53FD37D@mindspring.com>
References: <4.3.2.7.2.20021018125313.00bb8990@terminus> <4.3.2.7.2.20021019001010.00b89f28@terminus>

Ben Stuyts wrote:
> >Almost 5.3M of unswappable physical memory dedicated to semaphores
> >seems like a bit much.
>
> Yes, and it increases continuously, for example when I fetch new mail (over
> pop) from my windows pc. The pc stores this again on a network drive, so
> both qpopper and smbd are involved. For example, vmstat -m says:
>
> vmstat -m | grep sem
> sem155886 2443K 2443K 155886 16,1024,4096
>
> Now when I do a fetch-mail with Eudora on my pc, the same command says.
>
> vmstat -m | grep sem
> sem156178 2448K 2448K 156178 16,1024,4096
>
> I can repeat this at will, and each time I lose 4-5 KB. qpopper is started
> from inetd, and smbd runs as a daemon. I tried stopping smbd:

None of us have been able to repeat your problem, up to now.

I suppose now that we know you are running qpopper on -current, we
could repeat the problem, but, frankly, you already have a test
environment set up, and it would be a lot of work for us to duplicate
it, and even so, we won't know for sure if we could repeat the problem.

Have you checked out your source tree with a date tag, so that it's
possible for everyone else to check out and get the same source files?
Line number references in tracebacks are pretty useless, if the lines
don't match.

Those numbers are no good unless you can identify the exact number of
bytes being consumed, and then identify a kernel structure used in the
semaphore code whose size equals that number, or divides it evenly,
with a matching number of events.

This is why everyone keeps asking you to run the kernel debugger, so
that you can tell us exactly which code is failing, and why; and it is
why a stack backtrace more detailed than "it contained a call to sem"
is important.

This problem is evidently a memory leak in the semaphore code; but that
does not mean that the crash that results will be in any way related to
where the leak occurs. In other words, the crash is a secondary effect.
Only by fully understanding the crash will anyone be able to help you
with the root cause.

I understand that it's frustrating to go step by step, when you think
you have isolated the problem to a smaller area, but the information
you gather from outside that area will tell you about the inside much
more clearly than staring at the outside of a black box where we know
the problem lives.

The only alternative to rewriting the black box from scratch, or
grovelling through it with a line-by-line code review (I'm not
interested in doing that; perhaps you could interest the author of the
changes that resulted in the problem) is to find a smoking gun, and
work from that, instead.

If this problem is in the way of you getting work done (one wonders why
you are using -current, if you need to get work done), then my best
suggestion to you is to back out the changes Alfred made, one by one;
when the problem stops occurring, you will have identified a very small
patch that causes it.

> >But without knowing what software you are running, it's hard to say
> >if the number is unreasonable, or not.
>
> Well, it is really a lightly loaded server, just serving one windows pc
> here at home. Here is a ps, and the only thing that's missing from it is
> the occasional pop session. Also note that this system is not connected to
> the internet, so the http that's running is mostly for my own pleasure (and
> proxy/cache). I do run ppp and uucp every now and then.

Perhaps I wasn't clear. Without knowing which calls your software makes
that cause the problem to occur, it is not possible for us to create a
cut-down test case in less than 30 lines of C source code, so that we
can repeat the problem at will, without secondary effects.

As it is, you only *suppose* that the qpopper usage alone is sufficient
to cause the problem; even if you are correct, that's insufficient to
identify where the problem is... it may not even really be in the
semaphore source code at all...
maybe it's in kevent code, for unfreed events, etc..

I think you need to go back one email:

| > Just had another panic, same kmem_malloc(). I did a trace but forgot to
| > write the traceback down.
|
| Wait until the next one, and remember to write it down; preferably,
| obtain a system dump image, so you can examine it with the debugger,
| and make sure that the kernel you are running has a debuggable
| counterpart already there (i.e. you used "config -g" to create the
| kernel you are running).

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message