From owner-freebsd-mozilla Tue Sep 14 16:45: 8 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from sparkle.Generation.NET (sparkle.Generation.NET [205.205.119.4])
    by hub.freebsd.org (Postfix) with ESMTP id 6BD9114D51
    for ; Tue, 14 Sep 1999 16:45:00 -0700 (PDT)
    (envelope-from gsstark@mit.edu)
Received: from localhost (brnstndkramden.acf.nyu.edu@x2-513.mtl.Generation.NET [209.205.11.168])
    by sparkle.Generation.NET (8.9.3/8.9.3) with SMTP id TAA22587;
    Tue, 14 Sep 1999 19:45:07 -0400 (EDT)
To: Terry Lambert
References: 199905030116.saa1670-@usr05.primenet.com
Subject: Re: Communicator 4.5: "Xlib: Unexpected async reply" msg flood!
Cc: denis@acacia.cts.ucla.edu (Denis DeLaRoca)
Cc: mozilla@freebsd.org
From: Greg Stark
Date: 14 Sep 1999 19:44:53 -0400
Message-ID: <87vh9df47e.fsf@mit.edu>
Lines: 50
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I don't think this explanation really makes sense. Netscape doesn't actually
use the native threads, it uses some sort of user-space threads on all
platforms.

It is the right basic idea though, "Unexpected async reply" and the "sequence
lost" both have to do with either a multi-threaded program calling Xlib
functions from two threads or a signal handler calling Xlib functions.

Do you have further evidence about this involving freebsd native threads?

greg

> > I've seen this problem occur with both the FreeBSD version of Communicator
> > 4.5 in the packages collection as well as with Communicator 4.51 for Linux
> > under FreeBSD 3.1.
> > ...
> > Communicator starts looping and outputting messages to stderr that read
> >
> >    Xlib: Unexpected async reply (sequence 0x####)
>
> I believe the dialog box you refer to runs as JavaScript.
>
> There are a number of problems with the FreeBSD implementation of
> Netscape.
>
> The number one problem is that it is assuming that threads serially
> run to completion based on scheduling.
>
> Basically, this means that if you get an involuntary context switch,
> the threads pick up where they left off, in the order that they
> left off.
>
> The FreeBSD threads don't guarantee this (neither does POSIX), so
> the serialization assumption is flawed.
>
> For what it's worth, Linux and Macintosh threading have the same
> issues with the assumptions made by the NetScape JavaScript engine
> programmers.
>
> The major problems most commonly surface in the builtins, like the
> bookmark change script, the mail client, and the GIF decoder. The
> workaround for GIF's is to serialize the GIF loading (easy; just change
> your JavaScript) or to replace the GIF loader with a JNI to some
> reentrant code (not very easy, and not very portable; it'll only
> fix your personal browser).
>
> Since the problem exists on other platforms (e.g. Macintosh), it's
> best to just let it do its thing and finish doing it, before you
> move the mouse pointer.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-mozilla" in the body of the message

From owner-freebsd-mozilla Tue Sep 14 20:57:22 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
    by hub.freebsd.org (Postfix) with ESMTP id C82D814C42
    for ; Tue, 14 Sep 1999 20:57:19 -0700 (PDT)
    (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost)
    by smtp01.primenet.com (8.8.8/8.8.8) id UAA18603;
    Tue, 14 Sep 1999 20:57:17 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206)
    via SMTP by smtp01.primenet.com, id smtpd018576; Tue Sep 14 20:57:10 1999
Received: (from tlambert@localhost)
    by usr06.primenet.com (8.8.5/8.8.5) id UAA16678;
    Tue, 14 Sep 1999 20:57:06 -0700 (MST)
From: Terry Lambert
Message-Id: <199909150357.UAA16678@usr06.primenet.com>
Subject: Re: Communicator 4.5: "Xlib: Unexpected async reply" msg flood!
To: gsstark@mit.edu (Greg Stark)
Date: Wed, 15 Sep 1999 03:57:01 +0000 (GMT)
Cc: tlambert@primenet.com, denis@acacia.cts.ucla.edu, mozilla@freebsd.org
In-Reply-To: <87vh9df47e.fsf@mit.edu> from "Greg Stark" at Sep 14, 99 07:44:53 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I don't think this explanation really makes sense. Netscape doesn't actually
> use the native threads, it uses some sort of user-space threads on all
> platforms.
>
> It is the right basic idea though, "Unexpected async reply" and the "sequence
> lost" both have to do with either a multi-threaded program calling Xlib
> functions from two threads or a signal handler calling Xlib functions.
>
> Do you have further evidence about this involving freebsd native threads?

Yes. The new InterJet II user interface (reviewed in PC Week
Magazine, so I can talk about it) had to specifically be modified
so that it would run the GIF downloads consecutively instead of
concurrently to avoid crashing NetScape on both FreeBSD and
Macintosh.

The Microsoft Internet Explorer on Macintosh did not have the
same problems, and it uses a binary GIF decoder wedged in via
JNI (it is also significantly faster, as a result).

There are still issues with the FreeBSD version of NetScape
crashing if one moves the mouse over an image that will be used
as an image map during download. Since it doesn't affect the
Macintosh, it was felt that serializing the I/O further, which
would result in an actual user perceptible slow down, at this
point, was not worth it to obtain FreeBSD interoperability with
the user interface.

The serialization solution was arrived at by me, after observing
the problem and non-problem platforms, and taking into account my
detailed knowledge of threads implementations on Solaris, Linux,
Windows 98, Windows NT, Macintosh, and FreeBSD.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-mozilla Tue Sep 14 22:46:13 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from sparkle.Generation.NET (sparkle.Generation.NET [205.205.119.4])
    by hub.freebsd.org (Postfix) with ESMTP id BA52C1548A
    for ; Tue, 14 Sep 1999 22:46:09 -0700 (PDT)
    (envelope-from gsstark@mit.edu)
Received: from x2-513.mtl.Generation.NET (brnstndkramden.acf.nyu.edu@x2-513.mtl.Generation.NET [209.205.11.168])
    by sparkle.Generation.NET (8.9.3/8.9.3) with SMTP id BAA22529;
    Wed, 15 Sep 1999 01:46:14 -0400 (EDT)
To: Terry Lambert
Cc: gsstark@mit.edu (Greg Stark), denis@acacia.cts.ucla.edu, mozilla@freebsd.org
Subject: Re: Communicator 4.5: "Xlib: Unexpected async reply" msg flood!
References: <199909150357.UAA16678@usr06.primenet.com>
In-Reply-To: Terry Lambert's message of "Wed, 15 Sep 1999 03:57:01 +0000 (GMT)"
From: Greg Stark
Organization: People's Front Against MWM
Date: 15 Sep 1999 01:45:58 -0400
Message-ID: <877lls682x.fsf@x2-513.mtl.Generation.NET>
Lines: 22
User-Agent: Gnus/5.070095 (Pterodactyl Gnus v0.95) Emacs/20.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The serialization solution was arrived at by me, after observing
> the problem and non-problem platforms, and taking into account my
> detailed knowledge of threads implementations on Solaris, Linux,
> Windows 98, Windows NT, Macintosh, and FreeBSD.

So in your case you're fairly certain it was the GIF decoder that was buggy?
Did you have a particular test page that could reliably crash communicator?
This would be especially good if it reliably triggered the sequence errors.

I'm certain the Linux version and fairly certain that the other versions of
communicator do _not_ use the native OS thread implementation. They use the
built in user-space NSPR thread implementation. Which I think is supposed to
use a simple FIFO scheduler like you describe.

My hunch on the bug is that Java's run-time sets up the signal handlers one
way and the rest of Netscape expects them to be set up a different way. And
the net result is that some call that doesn't expect to be interrupted gets a
SIGALRM and some X library call is preempted when it shouldn't be.

--
greg

From owner-freebsd-mozilla Wed Sep 15 11:22:46 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
    by hub.freebsd.org (Postfix) with ESMTP id 459D715311
    for ; Wed, 15 Sep 1999 11:19:41 -0700 (PDT)
    (envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost)
    by smtp04.primenet.com (8.9.3/8.9.3) id LAA15024;
    Wed, 15 Sep 1999 11:18:16 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209)
    via SMTP by smtp04.primenet.com, id smtpdAAAUBaWXC; Wed Sep 15 11:17:50 1999
Received: (from tlambert@localhost)
    by usr09.primenet.com (8.8.5/8.8.5) id LAA23542;
    Wed, 15 Sep 1999 11:18:19 -0700 (MST)
From: Terry Lambert
Message-Id: <199909151818.LAA23542@usr09.primenet.com>
Subject: Re: Communicator 4.5: "Xlib: Unexpected async reply" msg flood!
To: gsstark@mit.edu (Greg Stark)
Date: Wed, 15 Sep 1999 18:18:18 +0000 (GMT)
Cc: tlambert@primenet.com, gsstark@mit.edu, denis@acacia.cts.ucla.edu, mozilla@FreeBSD.ORG
In-Reply-To: <877lls682x.fsf@x2-513.mtl.Generation.NET> from "Greg Stark" at Sep 15, 99 01:45:58 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > The serialization solution was arrived at by me, after observing
> > the problem and non-problem platforms, and taking into account my
> > detailed knowledge of threads implementations on Solaris, Linux,
> > Windows 98, Windows NT, Macintosh, and FreeBSD.
>
> So in your case you're fairly certain it was the GIF decoder that was buggy?

The GIF decoder _at least_ is buggy, in that it assumes that the
context switch will be back to the thread that was context switched
out after an involuntary preemption.

I suspect that it is not the _only_ code which is buggy, just the
most visible to me, in my application.

> Did you have a particular test page that could reliably crash communicator?

Yes, if I move the mouse over the image map while it is being
loaded, and the image map handling code gets time slices.

> This would be especially good if it reliably triggered the sequence errors.

It crashes the program. The crash may not be in the same place
each time. I don't have a copy of Communicator with all symbols
intact to be able to tell.

> I'm certain the Linux version and fairly certain that the other versions of
> communicator do _not_ use the native OS thread implementation. They use the
> built in user-space NSPR thread implementation. Which I think is supposed to
> use a simple FIFO scheduler like you describe.

What about Macintosh and FreeBSD? Don't they use the same scheduler?
Or are they trying to use native pthreads on these platforms?

> My hunch on the bug is that Java's run-time sets up the signal handlers one
> way and the rest of Netscape expects them to be set up a different way. And
> the net result is that some call that doesn't expect to be interrupted gets a
> SIGALRM and some X library call is preempted when it shouldn't be.

Interesting hypothesis. This somewhat conflicts with the observed
behaviour, however, in that the FreeBSD X library is not
multithreaded, and the preemption should not be an issue for the
code, since it (Netscape) runs without problems in a Windows
environment.

Hmmm. Have there been any crashes using the "-remote OpenURL("xxx")"
reported on Windows? It seems to me that another access to the
same image with a different thread may result in a crash without
an explicit call to CreateFreeThreadedMarshaller(), since Windows
threads instance per thread data onto thread local storage which
is not accessible in the address space of a different kernel
thread. The purpose of the Marshaller is to reinstantiate objects
between these address spaces.

Back to FreeBSD: If it is using a "sigsched" type threading
mechanism, there are indeed differences in the signal handling
mechanism between the OS's.

It seems to me that in one case, the alarms are being delivered
async (assuming it's the alarms), and in the other case, they are
causing a preemption.

This appears to either be missing mutex protection, or missing
signal masking, either of which are really a coding error resulting
from assuming too much about the underlying threads behaviour.

I am unfamiliar with NSPR internals; is it perhaps the case that
what gets scheduled by the alarm is a scheduler activation rather
than a context switch? This would allow the thread to run to
completion, if the signal masking specified system call restart
for the signal being delivered.

It's possible that system call restart is failing on FreeBSD, or
on the Macintosh, especially if POSIX and non-POSIX signal access
functions are being utilized simultaneously.

A fast test in the BSD case would be to call siginterrupt(3),
which was introduced in BSD 4.2b (via DEC Ultrix) to obtain
traditional BSD signal behaviour (which was to restart all system
calls).

The POSIX behaviour of aborting the system call instead of restarting
seems to me to be a SVR4 kludge to do things in signal handlers
which ought not to be done there (e.g. other than setting a volatile
flag to be examined in the main loop of the event driven application).

I know that the FreeBSD user space threads code does not very
robustly encapsulate signal interruption of "system calls" (really
pthreads wrappers in libc_r), and that scheduler activations with
a restart on system calls for all signals, with a scheduler
activation on exit (similar to a trampoline) would probably be a
better approach.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-mozilla Thu Sep 16 1:12:18 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from mail2.netcologne.de (mail2.netcologne.de [194.8.194.103])
    by hub.freebsd.org (Postfix) with ESMTP id ED7CD1529F
    for ; Thu, 16 Sep 1999 01:12:14 -0700 (PDT)
    (envelope-from van.woerkom@netcologne.de)
Received: from oranje.my.domain (dial5-70.netcologne.de [194.8.195.70])
    by mail2.netcologne.de (8.9.3/8.9.3) with ESMTP id KAA10447;
    Thu, 16 Sep 1999 10:12:11 +0200 (MET DST)
Received: (from marc@localhost)
    by oranje.my.domain (8.9.3/8.9.3) id KAA02161;
    Thu, 16 Sep 1999 10:11:34 +0200 (CEST)
    (envelope-from van.woerkom@netcologne.de)
Date: Thu, 16 Sep 1999 10:11:34 +0200 (CEST)
Message-Id: <199909160811.KAA02161@oranje.my.domain>
X-Authentication-Warning: oranje.my.domain: marc set sender to van.woerkom@netcologne.de using -f
From: Marc van Woerkom
To: freebsd-mozilla@freebsd.org
Subject: Netscape communicator 4.61 and Slashdot
Reply-To: van.woerkom@netcologne.de
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Suddenly netscape crashes really often when I read Slashdot.
Is this because I run XFree86 3.3.5 lately or some new /.-effect? :-)

Fun aside, I remember Rob changed something about how ads are displayed,
so if anyone else noticed increased instability I would put the blame
on those changes; if not, I must consider a rebuild of my machine or
running the linux version (shudder).

Regards,
Marc

From owner-freebsd-mozilla Fri Sep 17 2:55:35 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from news.IAEhv.nl (news.IAE.nl [194.151.64.4])
    by hub.freebsd.org (Postfix) with ESMTP id 09C0214F2F
    for ; Fri, 17 Sep 1999 02:55:17 -0700 (PDT)
    (envelope-from marc@bowtie.nl)
Received: (from uucp@localhost)
    by news.IAEhv.nl (8.9.1/8.9.1) with IAEhv.nl id LAA28151;
    Fri, 17 Sep 1999 11:55:09 +0200 (MET DST)
Received: from localhost (localhost [127.0.0.1])
    by bowtie.nl (8.8.8/8.8.8) with ESMTP id LAA07604;
    Fri, 17 Sep 1999 11:50:30 +0200 (CEST)
    (envelope-from marc@bowtie.nl)
Message-Id: <199909170950.LAA07604@bowtie.nl>
X-Mailer: exmh version 2.0.2 2/24/98
To: van.woerkom@netcologne.de
Cc: freebsd-mozilla@FreeBSD.ORG
Subject: Re: Netscape communicator 4.61 and Slashdot
In-reply-to: van.woerkom's message of Thu, 16 Sep 1999 10:11:34 +0200. <199909160811.KAA02161@oranje.my.domain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 17 Sep 1999 11:50:30 +0200
From: Marc van Kempen
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Suddenly netscape crashes really often when I read Slashdot.
> Is this because I run XFree86 3.3.5 lately or some new /.-effect? :-)
>
> Fun aside, I remember Rob changed something about how ads are displayed,
> so if anyone else noticed increased instability I would put the blame
> on those changes; if not, I must consider a rebuild of my machine or
> running the linux version (shudder).
>

Now that you mention it, I have seen this too on a 3.2 machine, with
regular XFree86 (3.3.3 or something). I have not really seen it on my
other 2.2.6 machine, but I haven't really paid attention to it either.

Marc.

--
----------------------------------------------------
Marc van Kempen              BowTie Technology
Email: marc@bowtie.nl        WWW & Databases
tel. +31 40 2 43 20 65
fax. +31 40 2 44 21 86       http://www.bowtie.nl
----------------------------------------------------

From owner-freebsd-mozilla Fri Sep 17 3: 3:35 1999
Delivered-To: freebsd-mozilla@freebsd.org
Received: from complx.LF.net (complx.LF.net [212.118.160.200])
    by hub.freebsd.org (Postfix) with ESMTP id 27228150FF
    for ; Fri, 17 Sep 1999 03:03:29 -0700 (PDT)
    (envelope-from pi@complx.LF.net)
Received: by complx.LF.net (Smail3.2.0.106/complx.LF.net)
    via LF.net GmbH Internet Services
    from pi for freebsd-mozilla@FreeBSD.ORG for host hub.FreeBSD.ORG
    id m11Ruql-000zyLC; Fri, 17 Sep 1999 12:02:39 +0200 (CEST)
Message-Id:
Subject: Re: Netscape communicator 4.61 and Slashdot
To: marc@bowtie.nl (Marc van Kempen)
Date: Fri, 17 Sep 1999 12:02:39 +0200 (CEST)
From: "Kurt Jaeger"
Cc: van.woerkom@netcologne.de, freebsd-mozilla@FreeBSD.ORG, malda@SLASHDOT.ORG
In-Reply-To: <199909170950.LAA07604@bowtie.nl> from "Marc van Kempen" at Sep 17, 1999 11:50:30 AM
X-NCC-RegID: de.lfnet
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-mozilla@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hi!

> > Suddenly netscape crashes really often when I read Slashdot.
> > Is this because I run XFree86 3.3.5 lately or some new /.-effect? :-)

Every time, to be exact. Communicator 4.6 on FreeBSD 2.2.7 RELEASE.
Accelerated-X Xserver.

I'm now back to lynx for /.

--
MfG/Best regards, Kurt Jaeger                        21 years to go !
LF.net GmbH        pi@LF.net        Oberon.net GmbH  pi@oberon.net
Vor dem Lauch 23   fon +49 711 90074-23   Friedrich-Ebert-Str.1
D-70567 Stuttgart  fax +49 711 7289041    40210 Duesseldorf  fon +49 211 179253-11
For Redmond: "nuke the site from orbit -- it's the only way to be sure."