Date: Fri, 28 May 2010 11:28:15 -0700 From: Carl Johnson <carlj@peak.org> To: freebsd-questions@freebsd.org Subject: Re: Bash lockups Message-ID: <87mxvjridc.fsf@cjlinux.localnet> In-Reply-To: <8739xlnhzz.fsf@kobe.laptop> (Giorgos Keramidas's message of "Fri\, 21 May 2010 22\:59\:28 %2B0300") References: <877hmzbi1f.fsf@cjlinux.localnet> <878w7d7qbb.fsf@kobe.laptop> <87pr0przea.fsf@cjlinux.localnet> <8739xlnhzz.fsf@kobe.laptop>
next in thread | previous in thread | raw e-mail | index | archive | help
Giorgos Keramidas <keramida@ceid.upatras.gr> writes: > On Fri, 21 May 2010 09:30:05 -0700, Carl Johnson <carlj@peak.org> wrote: >> Giorgos Keramidas <keramida@ceid.upatras.gr> writes: >>> Does this lock-up happen if you leave the shell 'idle' for too long >>> over an ssh session? There may be problems with stateful connection >>> tracking between your terminal and the remote shell :-/ >> >> No, I don't think that could be the problem. I am just using ssh >> between local machines and there is no firewall between them. It also >> often seems to happen to a shell as I switch away from it to another >> one. One suspicion is that something is sending a signal to the shell >> as it switches, and bash sometimes doesn't handle that signal >> properly. >> >> I also should have mentioned that I have been running bash as my >> default shell for years under Linux and have never seen this problem >> there. >> >> Thanks for the suggestion. > > That's ok. If you can attach to the bash process with ktrace please try > to grab a ktrace file from a deadlocked shell. We may be able to see > why it gets deadlocked by running kdump(8) on the shell trace file. > > You can run a second shell under ktrace (and hope that the parent > doesn't deadlock before the traced child shell), by running: > > bash$ ktrace -f bash.trace bash --login > > When you exit from the child shell you can dump ktrace(8) events from > the bash.trace file with: > > bash$ kdump -f bash.trace > logfile 2>&1 > > Looking near the last records dumped in 'logfile' should be quite > informative if the process is dead-locked or spinning around the same > code over and over again. I finally got one after starting ktrace a few days ago. It is informative, but it raises as many questions as it answers. It basically just wrote out the prompt, *started* to setup for reading the input and just stopped. I ran gdb on it and it is stuck looping somewhere in getenv. I don't have the system compiled with debugging, so I have limited information on what it is doing there. I checked multiple times, and I also saw getenv running routines such as memset, strlen, mbrtowc, and wcsnrtombs. The following is the tail end of the 'kdump -Ef' output: 67263 bash 61412.013860 GIO fd 2 wrote 28 bytes 0x0000 0d0f 1b5b 316d 5b63 6172 6c6a 4063 6a62 7364 3874 207e 5d24 1b5b |...[1m[carlj@cjbsd8t ~]$.[| 0x001a 6d20 |m | 67263 bash 61412.013867 RET write 28/0x1c 67263 bash 61412.013874 CALL sigprocmask(SIG_SETMASK,0x80e133c,0) 67263 bash 61412.013880 RET sigprocmask 0 and the following is the similar section of a normal prompt: 67263 bash 61403.461469 GIO fd 2 wrote 27 bytes 0x0000 0f1b 5b31 6d5b 6361 726c 6a40 636a 6273 6438 7420 7e5d 241b 5b6d |..[1m[carlj@cjbsd8t ~]$.[m| 0x001a 20 | | 67263 bash 61403.461476 RET write 27/0x1b 67263 bash 61403.461483 CALL sigprocmask(SIG_SETMASK,0x80e133c,0) 67263 bash 61403.461489 RET sigprocmask 0 67263 bash 61403.461497 CALL sigprocmask(SIG_BLOCK,0,0x80e1e3c) 67263 bash 61403.461504 RET sigprocmask 0 67263 bash 61403.461513 CALL read(0,0xbfbfd95f,0x1) I just realized there is an extra CR at the beginning of that prompt (28 bytes instead of 27) that I don't see elsewhere, but nothing else before that looks different. This one is an i368 8.0 release, but I also have another hung shell in a amd64 7.3 release system in VirtualBox. I just checked my other ktrace logs and I found one other place where that extra CR occurs, but there is no lockup there and that was my other system. The following is a section of a backtrace from gdb: #0 0x28308540 in mbrtowc () from /lib/libc.so.7 #1 0x080c7ce6 in getenv () #2 0x080c1335 in getenv () #3 0x080ae1d4 in getenv () #4 0x080ac4b0 in getenv () #5 0x080ac815 in getenv () #6 0x080c3955 in getenv () #7 0x080c3ac9 in getenv () #8 0x080ac4b0 in getenv () #9 0x080ac815 in getenv () #10 0x080acb6c in getenv () #11 0x080acf55 in getenv () #12 0x08054611 in ?? () #13 0x284a9a80 in ?? () ... #67 0x2832cbfd in time () from /lib/libc.so.7 The first few entries change when I let it run for a while, but the last 8-9 getenv addresses and everything before them remain the same. There are a total of about 65 backtrace entries this time, some of which are 0x00000000 addresses which seem suspicious. The backtrace from the other hung shell is also in getenv, but I didn't have ktrace running on that one. I am at the limit of my experience, so does anybody else have any ideas about what could cause this, or how I could trace it further? I am keeping the processes attached to gdb, so I can do further checking on them if anyone has any other ideas. Thanks in advance for any help, and thanks for the help that allowed me to get this far. -- Carl Johnson carlj@peak.org
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?87mxvjridc.fsf>