Date:      Sun, 12 May 2013 23:52:06 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Marek Salwerowicz <marek_sal@wp.pl>
Cc:        Adrian Chadd <adrian@freebsd.org>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, "illoai@gmail.com" <illoai@gmail.com>
Subject:   Re: Build GENERIC with IPX support
Message-ID:  <20130513065206.GA78810@icarus.home.lan>
In-Reply-To: <5190832E.5040102@wp.pl>
References:  <518ED0CA.4030007@wp.pl> <CAHHBGkoXm4XZbdOtswK2Ek1OV_5NZYAjWmOPzFNM0yXqG=tC%2Bg@mail.gmail.com> <51900FA5.20204@wp.pl> <CAJ-Vmo=t_2%2Bcb22n%2B-ew%2B9DswStYJVZg1i8pu57vTJWyGJyqNA@mail.gmail.com> <5190832E.5040102@wp.pl>

On Mon, May 13, 2013 at 08:07:42AM +0200, Marek Salwerowicz wrote:
> On 2013-05-13 00:45, Adrian Chadd wrote:
> >It's supported as long as someone wants to use it and can help in at
> >least diagnosing issues.
> >
> >So, if you have a segfault, run it inside gdb and report where it's dying.
> >
> >Chances are things have just bitrotted a bit but not so much that it's
> >worth killing.
> 
> # gdb ncplogin
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging
> symbols found)...
> (gdb) run
> Starting program: /usr/bin/ncplogin
> (no debugging symbols found)...(no debugging symbols found)...(no
> debugging symbols found)...(no debugging symbols found)...
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000800d285f7 in strlen () from /lib/libc.so.7
> (gdb) bt
> #0  0x0000000800d285f7 in strlen () from /lib/libc.so.7
> #1  0x0000000800d205b0 in gettimeofday () from /lib/libc.so.7
> #2  0x0000000800d2163e in gettimeofday () from /lib/libc.so.7
> #3  0x0000000800d21798 in vfprintf_l () from /lib/libc.so.7
> #4  0x0000000800d0e701 in fprintf () from /lib/libc.so.7
> #5  0x0000000800822a85 in ncp_error () from /usr/lib/libncp.so.4
> #6  0x000000080081fa7c in ncp_li_readrc () from /usr/lib/libncp.so.4
> #7  0x0000000000400ea7 in ?? ()
> #8  0x0000000000400d2e in ?? ()
> #9  0x000000080061c000 in ?? ()
> #10 0x0000000000000000 in ?? ()
> #11 0x0000000000000001 in ?? ()
> #12 0x00007fffffffddf8 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x00007fffffffde0a in ?? ()
> #15 0x00007fffffffde1e in ?? ()
> #16 0x00007fffffffde35 in ?? ()
> #17 0x00007fffffffde3d in ?? ()
> #18 0x00007fffffffde49 in ?? ()
> #19 0x00007fffffffde52 in ?? ()
> #20 0x00007fffffffde67 in ?? ()
> #21 0x00007fffffffde74 in ?? ()
> #22 0x00007fffffffde88 in ?? ()
> #23 0x00007fffffffdee5 in ?? ()
> #24 0x00007fffffffdef3 in ?? ()
> #25 0x00007fffffffdf07 in ?? ()
> #26 0x00007fffffffdf12 in ?? ()
> #27 0x00007fffffffdf1d in ?? ()
> #28 0x00007fffffffdf27 in ?? ()
> #29 0x00007fffffffdf40 in ?? ()
> #30 0x00007fffffffdf50 in ?? ()
> #31 0x00007fffffffdf5e in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000003 in ?? ()
> #34 0x0000000000400040 in ?? ()
> #35 0x0000000000000004 in ?? ()
> #36 0x0000000000000038 in ?? ()
> #37 0x0000000000000005 in ?? ()
> #38 0x0000000000000008 in ?? ()
> #39 0x0000000000000006 in ?? ()
> #40 0x0000000000001000 in ?? ()
> #41 0x0000000000000008 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000009 in ?? ()
> #44 0x0000000000400ca0 in ?? ()
> #45 0x0000000000000007 in ?? ()
> #46 0x0000000800601000 in ?? ()
> #47 0x000000000000000f in ?? ()
> #48 <signal handler called>
> #49 0x0000000000000000 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (gdb)
> 
> #
> 
> my /etc/rc.conf file contains these lines:
> 
> ifconfig_em0f1_ipx="ipx 0x01230000.1"
> ipxrouted_enable="YES"
> 
> and in /boot/loader.conf:
> if_ef_load="YES"
> 
> What's more, the 'ncplist s' command is unable to find any NetWare servers:
> # ncplist s
> Can't find any file server
> #
> 
> But Frame type (802.3) and network number  (0x0123) are correct.

Without debugging symbols this will be annoying to debug.  From a brief
skim of the code, it looks like the author did very little error
checking and made a lot of assumptions about the user's environment
(dot files, etc.).

IPX has been neglected for what should be obvious reasons.  I got my
CNE back in 1994 (circa NetWare 3.11), and you're the first person I
have encountered since roughly 1997 who is actively using IPX.
NetWare does support TCP/IP, you know...

Anyway, in your case, you're in luck:

> #0  0x0000000800d285f7 in strlen () from /lib/libc.so.7
> #1  0x0000000800d205b0 in gettimeofday () from /lib/libc.so.7
> #2  0x0000000800d2163e in gettimeofday () from /lib/libc.so.7
> #3  0x0000000800d21798 in vfprintf_l () from /lib/libc.so.7
> #4  0x0000000800d0e701 in fprintf () from /lib/libc.so.7
> #5  0x0000000800822a85 in ncp_error () from /usr/lib/libncp.so.4
> #6  0x000000080081fa7c in ncp_li_readrc () from /usr/lib/libncp.so.4

ncp_li_readrc(), which is part of libncp, only has one call to
ncp_error() in it:

src/lib/libncp/ncpl_conn.c --

/*
 * read rc file as follows:
 * 1. read [server] section
 * 2. override with [server:user] section
 * Since abcence of rcfile is not a bug, silently ignore that fact.
 * rcfile never closed to reduce number of open/close operations.
 */
int
ncp_li_readrc(struct ncp_conn_loginfo *li) {
        int i, val, error;
        char uname[NCP_BINDERY_NAME_LEN*2+1];
        char *sect = NULL, *p;

        /*
         * if info from cmd line incomplete, try to find existing
         * connection and fill server/user from it.
         */
        if (li->server[0] == 0 || li->user == NULL) {
                int connHandle;
                struct ncp_conn_stat cs;

                if ((error = ncp_conn_scan(li, &connHandle)) != 0) {
                        ncp_error("no default connection found", errno);
                        return error;
                }

To me, this may indicate you have some kind of "ncp rc file" (I believe
this is ~/.nwfsrc according to the ncplist(1) man page) that may contain
something invalid, or maybe you lack such a file altogether (creating one
might work around the problem).
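
A quick way to tell those two cases apart is to check whether the file
is there and readable at all.  Below is a minimal sketch; the helper
name nwfsrc_readable() is made up for illustration, and the file name
".nwfsrc" is only my recollection of the man page, not something I
verified against libncp:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Build "<home>/.nwfsrc" into buf and report whether it is readable.
 * Returns 1 if readable, 0 if missing or unreadable, and -1 if home is
 * NULL or the path does not fit in buf.
 * ASSUMPTION: ".nwfsrc" is the rc file name; this helper is hypothetical
 * and not part of libncp.
 */
int
nwfsrc_readable(const char *home, char *buf, size_t len)
{
	int n;

	if (home == NULL)
		return -1;
	n = snprintf(buf, len, "%s/.nwfsrc", home);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return access(buf, R_OK) == 0 ? 1 : 0;
}
```

Calling it as nwfsrc_readable(getenv("HOME"), path, sizeof(path)) from a
small main() tells you which case you are in before you start editing
anything.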

Back to the actual segfault itself: ncp_error() is pretty simple:

src/lib/libncp/ncpl_subr.c --

/*
 * Print a (descriptive) error message
 * error values:
 *         0 - no specific error code available;
 *  -999..-1 - NDS error
 *  1..32767 - system error
 *  the rest - requester error;
 */
void
ncp_error(const char *fmt, int error, ...) {
        va_list ap;

        fprintf(stderr, "%s: ", _getprogname());
        va_start(ap, error);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        if (error == -1)
                error = errno;
        if (error > -1000 && error < 0) {
                fprintf(stderr, ": dserr = %d\n", error);
        } else if (error & 0x8000) {
                fprintf(stderr, ": nwerr = %04x\n", error);
        } else if (error) {
                fprintf(stderr, ": syserr = %s\n", strerror(error));
        } else
                fprintf(stderr, "\n");
}
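
To make the branch order in ncp_error() easier to follow, here is the
same classification pulled out into a standalone function.  This is only
a sketch: the enum and the name classify_ncp_error() are mine, not
libncp's, and it mirrors nothing beyond the checks quoted above.

```c
#include <errno.h>

/* Hypothetical labels for the four branches in ncp_error(). */
enum ncp_err_class {
	NCP_ERR_NONE,		/* error == 0: just a newline is printed */
	NCP_ERR_NDS,		/* -999..-1: printed as "dserr" */
	NCP_ERR_REQUESTER,	/* bit 0x8000 set: printed as "nwerr" */
	NCP_ERR_SYSTEM		/* anything else non-zero: strerror() */
};

/* Same order of checks as the quoted ncp_error(). */
enum ncp_err_class
classify_ncp_error(int error)
{
	if (error == -1)	/* -1 means "use errno instead" */
		error = errno;
	if (error > -1000 && error < 0)
		return NCP_ERR_NDS;
	if (error & 0x8000)
		return NCP_ERR_REQUESTER;
	if (error != 0)
		return NCP_ERR_SYSTEM;
	return NCP_ERR_NONE;
}
```

Since ncp_li_readrc() passes errno as the error argument, the call in
question would normally land in the "syserr" branch, i.e. the
fprintf() that prints strerror(error), unless errno happened to be 0.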

What I don't understand from the call stack is how gettimeofday() is
involved.  I have looked at the libc code, including the functions
underneath fprintf() (vfprintf_l() and deeper), and I don't see how or
where gettimeofday() would be called.  The only place I can think of is
the locale-related code, but given what I've looked at I doubt that,
though I could still be wrong.
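
One hedged reading of frames #0 through #5: strlen() is what printf-style
functions use for "%s" conversions, and in the ncp_error() quoted above
only two fprintf() calls use "%s", one with _getprogname() and one with
strerror(error).  If either hands back a bad pointer, strlen() would
fault much like this.  I have not confirmed that this is what happens
here.  The sketch below shows a guarded variant; format_syserr() and
str_or() are hypothetical helpers, not libncp code, and a NULL check
obviously cannot catch a non-NULL garbage pointer.

```c
#include <stdio.h>
#include <string.h>	/* declares strerror(); without a declaration an
			 * old-style compiler assumes it returns int, which
			 * would truncate a 64-bit pointer */

/* Return s, or fallback when s is NULL, so "%s" never sees NULL. */
static const char *
str_or(const char *s, const char *fallback)
{
	return s != NULL ? s : fallback;
}

/*
 * Format "<prog>: <msg>: syserr = <text>" into buf, guarding every
 * string argument.  Returns what snprintf() returns.
 */
int
format_syserr(char *buf, size_t len, const char *prog, const char *msg,
    int error)
{
	return snprintf(buf, len, "%s: %s: syserr = %s",
	    str_or(prog, "(unknown)"), str_or(msg, "(no message)"),
	    str_or(strerror(error), "(no description)"));
}
```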

Have world/kernel on this system ever been rebuilt?  If they have,
were both kernel and world rebuilt together from the same source code
and not at different times?

If you're setting LANG, LC_CTYPE, LC_COLLATE, or other locale-oriented
settings in your environment (and my gut feeling is that you are), you
could try removing them and see if you get an actual useful error
message on stderr, but I'm not holding my breath.
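
If you are not sure which of those variables are set in the shell you
run ncplogin from, a small sketch like the following lists them.  The
function name report_locale_env() and the exact list of variables are my
own choices for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Locale-related variables worth checking; the list is illustrative. */
static const char *const locale_vars[] = {
	"LC_ALL", "LC_CTYPE", "LC_COLLATE", "LC_MESSAGES", "LANG"
};

/* Print each variable that is set and non-empty; return the count. */
int
report_locale_env(FILE *out)
{
	size_t i;
	int count = 0;

	for (i = 0; i < sizeof(locale_vars) / sizeof(locale_vars[0]); i++) {
		const char *v = getenv(locale_vars[i]);

		if (v != NULL && v[0] != '\0') {
			fprintf(out, "%s=%s\n", locale_vars[i], v);
			count++;
		}
	}
	return count;
}
```

A return value of 0 means none of them are set, in which case the locale
theory is even less likely.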

I cannot help you with the remaining IPX-specific "stuff"; it's fairly
obvious though, as I said, that this code has been neglected.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |