Date: Tue, 29 Jun 2010 15:41:06 +1200
From: Martin Minkus <martin.minkus@punz.co.nz>
To: freebsd-questions <freebsd-questions@freebsd.org>, Martin Minkus <martin.minkus@punz.co.nz>
Subject: RE: sshd / tcp packet corruption ? ZFS & Samba?
Message-ID: <H00000ac003077fe.1277782865.silver.pulse.local@MHS>
In-Reply-To: <H00000ac00302e37.1277673717.silver.pulse.local@MHS>

Okay guys,

Just thought I’d post that a resolution has been found.

People suggested it could be hardware and to try memtest, which never found anything.

It seems, though, that in the end the issue is the motherboard; possibly the southbridge, or something to do with the PCI bus.

The SATA drives, which hang off a Marvell controller in a PCIe slot, were unaffected. No amount of ZFS scrubs and rsync with checksumming found anything wrong.

It was only network traffic on the Intel PRO (PCI card) or the onboard NVIDIA nfe card that had issues. It was worst when using Samba off ZFS, though god knows why that exposed the issue more.

I never had any kernel panics, just silent data corruption on the PCI bus.

Moved the HDDs and cards to a different motherboard, and everything is 100% fine.

So a couple of weeks looking at this on and off (and slowly losing my mind) and it was nothing more than flaky hardware.

Thanks for your help to those who took the time to reply.

Martin.

From: Martin Minkus
Sent: Monday, 28 June 2010 09:22
To: freebsd-questions@freebsd.org
Subject: RE: sshd / tcp packet corruption ? ZFS & Samba?

Hey all,

It was suggested I do a memtest, but that checked out fine. (I wish it was as simple as just the RAM!)

I’ve realised the issue manifests itself almost immediately when accessing an underlying ZFS filesystem using Samba. But if it is UFS, it is fine.

Does this mean anything to anyone?

I.e. md5’ing the same file over SMB, one copy on UFS (/tmp), one on ZFS:

cd5d0011c28fb335d57a83b3751831e7 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bb433ae7e4c3c70c49b3c8c1590e8aa5 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8eeaf672f6742ae4f900b16ec3cb190a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bc327dc715516b5ba2e8478036112bd2 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
0cde0cf7ec036cedc8f3294153209b4c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
71e705470a4af5533eb019e00df3a946 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ba7041e4cad852d00c8da1a461e3b5f9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7ce9ea8b9a4d8858899da23472a24c76 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8f0eff7cb6069ff39aa46e2affc27a4b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c23fceb0302fd59b49e22bce61eabe8d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
46c9d538c99be3947b92f9ec47bb900a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
2a2a94c94a167a8e525e368aceb07875 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
d303861d09b0584f6c6621e9881e3f63 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ad8f8cef1829de206460b947687909f0 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9a866d9602a9df92b6acb6f1182b05ab *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
5552491a9e295890ad48064440d8d05b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ceee04c26b03132db48d67c076526c82 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7aa666918d73e40a25ccdb1c104f8476 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
561aa772884c0b7ef139f556355adffb *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
30540ecb4bfb8533969f4a4137a77e79 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c0f315f00be76a4e15dec68de2bba49b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9de4864a97ed4ad9c495c221fe1b932f *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
47c8ad183dbe0d4637229af08cc2cd89 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c9bfe8c7073940acbcdb31430eb4a061 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
327605a6ddb89f7a3e2bd056c5f28b2a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
6008526a44790297110f4361fe1a5292 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
3f6444cf9b7482df5b6aee577906821c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
23a3fea1c1c79df4cdc30544f2af1b2d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
1591ac3f2e730a1a47792241bb708a1c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
fa7c62b330717a66b5442c7df2bdce3e *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
996cbec57e67a14f69bb288e43eb81b2 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
074fe31d93ed0ccf42867bfe34502c1a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
4d69eb69423fd8e373978c068003021c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
82cc83d5af8f0217f8d196882ddf5d90 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
b610773b74ec85511548dfe6d3d12b74 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
a4a694f353175ef774a25a92bc35badc *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c7a23899df5987bd65a8c7e0cf0dfcd3 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
5a4fdf3f3d74562eec83491236a168a4 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
65fbd57a0ebaa3e94ab78ea3b3ec8497 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
476208d260b18c724e77e43fe79c6960 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
3880b71d78a22422b8299c66f7192cb0 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
f2a1540d9833f1faab312026164d271d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
cde5e5dcf53e0eb93e6af64b70e7961f *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
d584260380b2800e85dc2c877534378a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
77d36239bd1728219196461f57d2b859 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
4fbad72dde8d79e6103dac67fad852be *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
91f59e575e6cca8f402e228d8a72ad1a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9ddac79b29819dbe88dd7583ee6df4b9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
1264d7634c329125bf87d9d9ab40a128 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
00e004a4a491377b965c8bc5515a9e6f *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
774ec65d5a04f4482bb99a8c05aebff7 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ea36f8719932894229911a5a958c778b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
113d3573f119dedf2a09c27e52957a5d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
0c35c60ab988e140d0a6ff8e52027576 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
536a6b09753c661f7029a0bb983a6e93 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
53a16a6860544c3dba85ca46f97c1865 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
4c5a453abc1c72da148bd1a5e4addafa *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
61a1920af7e0250eecc91afeb79e3ce9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
774b7dd4b9cc894dda000a572e7dfed5 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
11745efe136291bed3f3db64e12449b5 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
d4915a8b84fd35650d0aaf537119977a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
62f386bce4f1193a1aca73283164d6f9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
4664c7b894ffd928c55cc9089b64bdf3 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
3973b22753d411d3bf736537fc20ae20 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
a4b96ffd965667fdd2d73d44afcfbdcb *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
eb2bb2e51a7439e1ed3d17b1280cf760 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
21fc6cff11e9d8e22595b6af19d69e67 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
b88eab9b32e58c072258a38b037a9a25 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
2730294929de8e517c83041cb7233291 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
cddbe019b780dcf056525c48703d7b1e *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
2652bb1a766d002e633a73d10321edfe *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
52136867ac3d477e9643aa15a2c0e957 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

The //kinetic/temp/ share is on UFS (it is /tmp) while //kinetic/pulse/temp is on the /pulse ZFS pool.

For the record, locally on kinetic:

kinetic:~# md5 /tmp/Desktop.exe
MD5 (/tmp/Desktop.exe) = 2447bdb56c5fa8efa761ffa100908022
kinetic:~# md5 /pulse/shares/cti/bin/Desktop.exe
MD5 (/pulse/shares/cti/bin/Desktop.exe) = 2447bdb56c5fa8efa761ffa100908022
kinetic:~#

So accessing the filesystem locally is okay? Is it only a Samba-off-ZFS issue?

Following that, eventually (after a few days) network traffic in general starts to be corrupted (hence the ssh connections dropping out, the netcat sessions below, etc).

I’ve tried testing a generic kernel without ZFS, and everything is fine. It is only once ZFS is loaded into the kernel and we try to access it using Samba that this happens.

Smb.conf:

#======================= Global Settings =====================================
[global]
    workgroup = PULSE
    server string = Kinetic ZFS Fileserver
    netbios name = KINETIC
    security = user
    load printers = no
    log file = /var/log/samba/log.%m
    #log level = 10
    max log size = 50
    encrypt passwords = yes

    #smb ports = 139
    socket options = TCP_NODELAY SO_SNDBUF=65536 SO_RCVBUF=65536
    #socket options = TCP_NODELAY SO_SNDBUF=8192 SO_RCVBUF=8192
    read raw = yes
    use sendfile = yes
    directory name cache size = 0

    preserve case = yes
    short preserve case = yes
    case sensitive = no

    guest account = nobody

    wins support = yes
    #passdb backend = ldapsam:"ldap://gold.pulse.local"
    passdb backend = ldapsam:"ldap://kinetic.pulse.local ldap://gold.pulse.local"
    ldap ssl = no
    ldap admin dn = cn=Manager,dc=pulse,dc=local
    ldap suffix = dc=pulse,dc=local
    ldap group suffix = ou=Groups
    ldap user suffix = ou=Users
    ldap machine suffix = ou=Computers

    #nt acl support = yes
    #acl compatibility = auto
    #acl group control = yes
    #acl map full control = true

#============================ Share Definitions ==============================

[Temp]
    comment = Temp Space
    guest ok = yes
    browseable = Yes
    path = /tmp

etc...

dmesg:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-RC1 #4: Thu Jun 24 16:09:27 NZST 2010
    martinm@kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (2712.36-MHz K8-class CPU)
  Origin = "AuthenticAMD" Id = 0x60fb2 Family = f Model = 6b Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
  TSC: P-state invariant
real memory = 4294967296 (4096 MB)
avail memory = 4044939264 (3857 MB)
ACPI APIC Table: <GBT NVDAACPI>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 2
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <GBT NVDAACPI> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, cbdf0000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfeff0000-0xfeff03ff on acpi0
Timecounter "HPET" frequency 25000000 Hz quality 900
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <memory, RAM> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
pci0: <memory, RAM> at device 1.2 (no driver attached)
ohci0: <nVidia nForce MCP61 USB Controller> mem 0xfe02f000-0xfe02ffff irq 21 at device 2.0 on pci0
ohci0: [ITHREAD]
usbus0: <nVidia nForce MCP61 USB Controller> on ohci0
ehci0: <NVIDIA nForce MCP61 USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff irq 22 at device 2.1 on pci0
ehci0: [ITHREAD]
usbus1: EHCI version 1.0
usbus1: <NVIDIA nForce MCP61 USB 2.0 controller> on ehci0
pcib1: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3
pci0: <multimedia, HDA> at device 5.0 (no driver attached)
atapci0: <nVidia nForce MCP61 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
nfe0: <NVIDIA nForce MCP61 Networking Adapter> port 0xec00-0xec07 mem 0xfe02d000-0xfe02dfff irq 20 at device 7.0 on pci0
miibus0: <MII bus> on nfe0
rlphy0: <RTL8201L 10/100 media interface> PHY 1 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nfe0: Ethernet address: 00:24:1d:15:11:48
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
atapci1: <nVidia nForce MCP61 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 0xfe02c000-0xfe02cfff irq 21 at device 8.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
pcib2: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci2: <ACPI PCI bus> on pcib2
mvs0: <Marvell 88SX7042 SATA controller> port 0xbc00-0xbcff mem 0xfde00000-0xfdefffff irq 16 at device 0.0 on pci2
mvs0: Gen-IIe, 4 3Gbps ports, Port Multiplier supported with FBS
mvs0: [ITHREAD]
mvsch0: <Marvell SATA channel> at channel 0 on mvs0
mvsch0: [ITHREAD]
mvsch1: <Marvell SATA channel> at channel 1 on mvs0
mvsch1: [ITHREAD]
mvsch2: <Marvell SATA channel> at channel 2 on mvs0
mvsch2: [ITHREAD]
mvsch3: <Marvell SATA channel> at channel 3 on mvs0
mvsch3: [ITHREAD]
vgapci0: <VGA-compatible display> mem 0xfb000000-0xfbffffff,0xd0000000-0xdfffffff,0xfc000000-0xfcffffff irq 22 at device 13.0 on pci0
atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppc0: [ITHREAD]
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
plip0: [ITHREAD]
lpt0: <Printer> on ppbus0
lpt0: [ITHREAD]
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem 0xd0000-0xd3fff,0xdb000-0xdbfff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
acpi_throttle0: <ACPI CPU Throttling> on cpu0
powernow0: <PowerNow! K8> on cpu0
device_attach: powernow0 attach returned 6
acpi_throttle1: <ACPI CPU Throttling> on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
powernow1: <PowerNow! K8> on cpu1
device_attach: powernow1 attach returned 6
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
acd0: DVDR <HL-DT-STDVD-RAM GH22NP20/1.02> at ata0-slave UDMA66
ad4: 76319MB <WDC WD800JD-60LSA0 07.01D07> at ata2-master UDMA100 SATA 3Gb/s
ugen0.1: <nVidia> at usbus0
uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <nVidia> at usbus1
uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub0: 10 ports with 10 removable, self powered
ada0 at mvsch0 bus 0 scbus0 target 0 lun 0
ada0: <GB0500C4413 HPG1> ATA-7 SATA 1.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada1 at mvsch1 bus 0 scbus1 target 0 lun 0
ada1: <GB0500C4413 HPG1> ATA-7 SATA 1.x device
ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada2 at mvsch2 bus 0 scbus2 target 0 lun 0
ada2: <GB0500C4413 HPG3> ATA-7 SATA 1.x device
ada2: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada3 at mvsch3 bus 0 scbus3 target 0 lun 0
ada3: <GB0500C4413 HPG1> ATA-7 SATA 1.x device
ada3: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
SMP: AP CPU #1 Launched!
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
uhub1: 10 ports with 10 removable, self powered
Trying to mount root from ufs:/dev/ad4s1a
ugen0.2: <CHICONY> at usbus0
ukbd0: <CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2> on usbus0
kbd2 at ukbd0
uhid0: <CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2> on usbus0
em0: link state changed to UP
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 3
ZFS storage pool version 14
kinetic:~#

I’ve since removed everything from /etc/sysctl.conf and /boot/loader.conf, so no tuning is used. I’ve also been fiddling and trying all sorts of different things in smb.conf.

It makes no difference.

I am at a complete loss as to what is going on here.

Should I just give up? Is there some obscure ZFS+Samba issue on FreeBSD?

Thanks,
Martin.

From: Martin Minkus
Sent: Wednesday, 23 June 2010 16:01
To: freebsd-questions@freebsd.org
Subject: sshd / tcp packet corruption ?

It seems the issue I reported below may actually be related to some kind of TCP packet corruption?

Still the same box. I’ve noticed my SSH connections into the box will die randomly, with errors.

sshd logs the following on the box itself:

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2
Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server
Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0
Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1
Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error
Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2
Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error

Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.
Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2
Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3
Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

My googlefu has failed me on this.

Any ideas what on earth this could be?

Ethernet card?

em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:0e:0c:6b:d6:d3
        inet 10.64.10.10 netmask 0xffffff00 broadcast 10.64.10.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

Thanks,
Martin.

From: Martin Minkus
Sent: Monday, 14 June 2010 11:21
To: freebsd-questions@freebsd.org
Subject: FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported - after a few days?

Samba 3.4 on the FreeBSD 8-STABLE branch. After a few days I start getting weird errors: Windows PCs can't access the Samba share or have trouble accessing files, and Samba becomes totally unusable. Restarting Samba doesn't fix it; only a reboot does.

Accessing files on the ZFS pool locally is fine. Other services on the box (like dhcpd and the OpenLDAP server) continue to work fine. Only Samba dies, and by dies I mean it can no longer service clients and Windows brings up bizarre errors. Windows can access our other Samba servers (on Linux, etc) just fine.

Kernel:

FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Wed May 26 18:09:14 NZST 2010 martinm@kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

Zpool status:

kinetic:~$ zpool status
  pool: pulse
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

errors: No known data errors
kinetic:~$

log.smb:

[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
  open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
  smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
  waiting for connections

log.ANYPC:

[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
  getpeername failed. Error was Socket is not connected
  read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.

The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
                uint16_t port,
                int dlevel,
                const struct sockaddr_storage *psock,
                bool rebind)
{
        struct sockaddr_storage sock;
        int res;
        socklen_t slen = sizeof(struct sockaddr_in);

        sock = *psock;

#if defined(HAVE_IPV6)
        if (sock.ss_family == AF_INET6) {
                ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
                slen = sizeof(struct sockaddr_in6);
        }
#endif
        if (sock.ss_family == AF_INET) {
                ((struct sockaddr_in *)&sock)->sin_port = htons(port);
        }

        res = socket(sock.ss_family, type, 0 );
        if( res == -1 ) {
                if( DEBUGLVL(0) ) {
                        dbgtext( "open_socket_in(): socket() call failed: " );
                        dbgtext( "%s\n", strerror( errno ) );
                }

In other words, it looks like something in the kernel is exhausted (what?). I don’t know if tuning is required, or if this is some kind of bug?

/boot/loader.conf:

mvs_load="YES"
zfs_load="YES"
vm.kmem_size="20G"
#vfs.zfs.arc_min="512M"
#vfs.zfs.arc_max="1536M"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="3072M"

I’ve played with a few sysctl settings (found these recommendations online, but they make no difference).

/etc/sysctl.conf:

kern.ipc.maxsockbuf=2097152
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

Any ideas on what could possibly be going wrong?

Any help would be greatly appreciated!

Thanks,
Martin
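
For anyone wanting to repeat the checksum test from this thread (hash the same file over the network many times and compare against a digest taken locally on the server), a minimal Python sketch might look like the following. It is not what was run in the thread, which used plain md5 from the command line. The function names `md5_of`, `repeated_digests` and `looks_corrupted` are made up for illustration, and any path passed in would be wherever the share happens to be mounted on the client.

```python
import hashlib
from collections import Counter


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=10):
    """Hash the same file `runs` times and count how often each digest appears."""
    return Counter(md5_of(path) for _ in range(runs))


def looks_corrupted(path, reference_digest, runs=10):
    """Return True if any read produced a digest other than the trusted reference.

    `reference_digest` is assumed to come from hashing the file locally on the
    server, as was done with md5 on kinetic in the thread.
    """
    counts = repeated_digests(path, runs)
    return set(counts) != {reference_digest}
```

On a healthy path every run yields the same digest, so `repeated_digests` returns a single entry. Many distinct digests for an unchanged file, as in the ZFS share listing above, point at corruption somewhere between the disk and the reader. Note that a client may serve repeated reads from its own cache, so a clean result from this sketch alone does not prove the network path is sound.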