Date: Thu, 4 Oct 2012 07:52:59 +0200
From: Gomes do Vale Victor <ulysse31@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: nfsv4 kerberized and gssname=root and allgsname
Message-ID: <836B0731-DC60-40DF-8D9E-ADB9D3FD5AB5@gmail.com>
In-Reply-To: <1483416316.1685354.1349303741302.JavaMail.root@erie.cs.uoguelph.ca>
References: <1483416316.1685354.1349303741302.JavaMail.root@erie.cs.uoguelph.ca>

On 4 Oct 2012 at 00:35, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Ulysse 31 wrote:
>> 2012/9/29 Rick Macklem <rmacklem@uoguelph.ca>:
>>> Ulysse 31 wrote:
>>>> Hi all,
>>>>
>>>> I am actually working on a freebsd 9 backup server.
>>>> this server would backup the production server via kerberized nfs4
>>>> (since the old backup server, a linux one, was doing so).
>>>> we used on the old backup server a root/<fqdn> kerberos identity,
>>>> which allows the backup server to access all the data.
>>>> I have followed the documentation found at :
>>>>
>>>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
>>>>
>>>> done :
>>>> - added to kernel :
>>>>
>>>> options KGSSAPI
>>>> device crypto
>>>>
>>>> - added to rc.conf :
>>>>
>>>> nfs_client_enable="YES"
>>>> rpc_lockd_enable="YES"
>>>> rpc_statd_enable="YES"
>>>> rpcbind_enable="YES"
>>>> devfs_enable="YES"
>>>> gssd_enable="YES"
>>>>
>>>> - have done sysctl vfs.rpcsec.keytab_enctype=1 and added it to
>>>> /etc/sysctl.conf
>>>>
>>>> We used MIT kerberos implementation, since it is the one used on all
>>>> our servers (mostly linux), and we have created and /etc/krb5.keytab
>>>> containing the following keys :
>>>> host/<fqdn>
>>>> nfs/<fqdn>
>>>> root/<fqdn>
>>>>
>>>> and, of course, i have used the available patch at :
>>>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
>>>>
>>>> When i try to mount with the (B) method (the one of the google wiki),
>>>> it works as expected, i mean, with a correct user credential, i can
>>>> access to the user data.
>>>> But, when i try to access via the (C) method (the one that i need in
>>>> order to do a full backup of the production storage server) i get a
>>>> systematic kernel panic when launch the mount command.
>>>> The mount command looks to something like : mount -t nfs -o
>>>> nfsv4,sec=krb5i,gssname=root,allgssname <production server
>>>> fqdn>:<export_path> <local_path_where_to_mount>
> Just to confirm it, you are saying that exactly the same mount command,
> except without the "allgssname" option, doesn't crash?

No, in fact it's the same command with gssname=nfs instead of gssname=root
that does not crash. When I specify gssname=root, it panics.
The same command with gssname=nfs and allgssname together "works", or rather
it mounts and doesn't crash, but it does not allow access as root to the NFS
share, since the NetApp expects a root/fqdn key to be used for that.
I don't know if this gives you a hint. I'm going to test this patch; tell me
if you have other ideas.
For now we have decided to disable kerberized NFS on the new FreeBSD backup
server, so that we can go into production with it without falling behind
schedule.
Thanks for the help.

>
> That is weird, since when I look at the code, there shouldn't be any
> difference between the two mounts, up to the point where it crashes.
>
> The crash seems to indicate that nr_auth is bogus, but I can't see
> how/why that would happen.
>
> I have attached a patch which changes the way nr_auth is set and "might"
> help, although I doubt it. (It is untested, but if you want to try it,
> good luck with it.)
>
> I'll email again if I get something more solid figured out, rick
>
>>>> I have activated the kernel debugging stuff to get some infos, here is
>>>> the message :
>>>>
>>>>
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address = 0x368
>>>> fault code = supervisor read data, page not present
>>>> instruction pointer = 0x20:0xffffffff80866ab7
>>>> stack pointer = 0x28:0xffffff804aa39ce0
>>>> frame pointer = 0x28:0xffffff804aa39d30
>>>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags = interrupt enabled, resume, IOPL = 0
>>>> current process = 701 (mount_nfs)
>>>> trap number = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> KDB: stack backtrace:
>>>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
>>>> #1 0xffffffff8087885e at panic+0x1ce
>>>> #2 0xffffffff80b82380 at trap_fatal+0x290
>>>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
>>>> #4 0xffffffff80b82cbe at trap+0x3be
>>>> #5 0xffffffff80b6c57f at calltrap+0x8
>>>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
>>>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
>>>> #8 0xffffffff807a5a53 at newnfs_request+0x163
>>>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
>>>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
>>>> #11 0xffffffff807db60a at nfs_mount+0x13ba
>>>> #12 0xffffffff809068fb at vfs_donmount+0x100b
>>>> #13 0xffffffff80907086 at sys_nmount+0x66
>>>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
>>>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
>>>> Uptime: 2m31s
>>>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Does anyone as experience something similar ? is their a way to
>>>> correct that ?
>>>> Thanks for the help.
>>>>
>>> Well, you're probably the first person to try doing this in years. I did
>>> have it working about 4-5years ago. Welcome to the bleeding edge;-)
>>>
>>> Could you do the following w.r.t. above kernel:
>>> # cd /boot/nkernel (or wherever the kernel lives)
>>> # nm kernel | grep rpc_gss_init
>>> - add the offset 0x72a to the address for rpc_gss_init
>>> # addr2line -e kernel.symbols
>>> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
>>> - email me what it prints out, so I know where the crash is occurring
>>>
>>> You could also run the following command on the Linux server to capture
>>> packets during the mount attempt, then email me the xxx.pcap file so I
>>> can look at it in wireshark, to see what is happening before the crash.
>>> (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-)
>>> # tcpdump -s 0 -w xxx.pcap host <freebsd-client>
>>
>> Hi,
>>
>> Sorry for the delay i was on travel and no working network connection.
>> Back online for the rest of the week ^^.
>> Thanks for your help, here is what it prints out :
>>
>> root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init
>> ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
>> ffffffff80a787b0 t rpc_gss_init
>> ffffffff80a7a580 t svc_rpc_gss_init
>> ffffffff81127530 d svc_rpc_gss_init_sys_init
>> ffffffff80a7a3b0 T xdr_rpc_gss_init_res
>> root@bsdenc:/boot/kernel # addr2line -e kernel.symbols
>> 0xffffffff80a78eda
>> /usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772
>>
>>
>> for the tcpdump from the linux server, i think you may are doing
>> reference to the production nfs server ?
>> if yes, unfortunately it is not linux, it is a netapp filer, so no
>> "real" root access on it (so no tcpdump available :s ).
>> if you were mentioning the old backup server (which is linux but nfs
>> client), i cannot do unmount/mount on it since its production
>> (mountpoint always busy), but i can made a quick VM/testmachine that
>> acts like the linux backup server and do a tcpdump from it.
>> Just let me know. Thanks again.
>>
>> --
>> Ulysse31
>>
>>>
>>> rick
>>>
>>>> --
>>>> Ulysse31
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to
>>>> "freebsd-fs-unsubscribe@freebsd.org"
> <rpcsec-crash.patch>
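
For reference, the address lookup requested earlier in the thread (take the nm address of rpc_gss_init, add the 0x72a offset from backtrace frame #6, and hand the result to addr2line) can be sketched as a few lines of Python. This is only an illustrative sketch and not part of the original exchange: the function name symbol_plus_offset and the NM_OUTPUT constant are hypothetical, and the nm lines are copied from the output pasted above.

```python
def symbol_plus_offset(nm_output: str, symbol: str, offset: int) -> str:
    """Return the absolute address of symbol+offset as a hex string.

    nm_output is the text printed by `nm kernel | grep <pattern>`;
    each line has the form "<hex address> <type letter> <name>".
    """
    for line in nm_output.splitlines():
        fields = line.split()
        # Match the symbol name exactly, so that svc_rpc_gss_init and
        # the other grep hits are skipped.
        if len(fields) == 3 and fields[2] == symbol:
            return hex(int(fields[0], 16) + offset)
    raise KeyError(symbol)


# nm output as pasted in the thread (hypothetical constant name).
NM_OUTPUT = """\
ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
ffffffff80a787b0 t rpc_gss_init
ffffffff80a7a580 t svc_rpc_gss_init
ffffffff81127530 d svc_rpc_gss_init_sys_init
ffffffff80a7a3b0 T xdr_rpc_gss_init_res
"""

# Offset 0x72a comes from "rpc_gss_init+0x72a" in the backtrace.
print(symbol_plus_offset(NM_OUTPUT, "rpc_gss_init", 0x72A))
```

The printed value is the address to type at the addr2line -e kernel.symbols prompt; it should agree with the address shown for frame #6 in the backtrace.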