Date: Tue, 3 May 2005 01:58:10 -0700 From: steve@Watt.COM (Steve Watt) To: hackers@freebsd.org Subject: EPERM from write() Message-ID: <200505030858.j438wA1q038864@wattres.watt.com>
next in thread | raw e-mail | index | archive | help
I'm down in the bowels of chasing a weird bug that is being presented by sendmail when doing TLS. For *some* clients it works, but for the ones that fail, they fail hard. There are various weird things in my analyses. The proximate symptom is that the call to write() to put out the ServerHello message returns -1, errno 1 (EPERM), but I can see from the tcpdump trace that most (all?) of it got out. Looking backwards through the ktrace reveals that the fd that is being written to (fd 4) is the fd that the "Ready to start TLS" message went out on, and that (previous) write completed successfully. Here's the ktrace from that successful write to the one that blew up, sans numerous calls to gettimeofday and getpid: 35388 sendmail GIO fd 4 wrote 30 bytes "220 2.0.0 Ready to start TLS\r " 35388 sendmail RET write 30/0x1e 35388 sendmail CALL fcntl(0x8,0x3,0) 35388 sendmail RET fcntl 2 (get flags on fd 8) 35388 sendmail CALL fcntl(0x8,0x4,0x6) 35388 sendmail RET fcntl 0 (set O_NONBLOCK on fd 8) 35388 sendmail CALL break(0x8191000) 35388 sendmail RET break 0 35388 sendmail CALL break(0x8196000) 35388 sendmail RET break 0 35388 sendmail CALL read(0x8,0x8188000,0xb) 35388 sendmail RET read -1 errno 35 Resource temporarily unavailable 35388 sendmail CALL select(0x9,0xbfbfbe00,0,0xbfbfb850,0xbfbfb648) 35388 sendmail RET select 1 35388 sendmail CALL read(0x8,0x8188000,0xb) (Here's the ClientHello) 35388 sendmail GIO fd 8 read 11 bytes 0x0000 807c 0103 0100 6300 0000 10 |.|....c....| 35388 sendmail RET read 11/0xb 35388 sendmail CALL read(0x8,0x818800b,0x73) (and the rest of the ClientHello, with all the cipher suites) 35388 sendmail GIO fd 8 read 115 bytes 0x0000 0000 3900 0038 0000 3500 0016 0000 1300 000a 0700 |..9..8..5...........| 0x0014 c000 0033 0000 3200 002f 0300 8000 0066 0000 0500 |...3..2../.....f....| 0x0028 0004 0100 8008 0080 0000 6300 0062 0000 6100 0015 |..........c..b..a...| 0x003c 0000 1200 0009 0600 4000 0065 0000 6400 0060 0000 |........@..e..d..`..| 0x0050 1400 0011 0000 0800 0006 0400 8000 0003 0200 804d |...................M| 0x0064 57e4 2a11 c967 1804 e745 7cfc a991 93 |W.*..g...E|....| 35388 sendmail RET read 115/0x73 35388 sendmail CALL break(0x8197000) 35388 sendmail RET break 0 35388 sendmail CALL break(0x8198000) 35388 sendmail RET break 0 (here's the write that fails, but I see all 0xa59 bytes on the wire!) 35388 sendmail CALL write(0x4,0x8174000,0xa59) 35388 sendmail RET write -1 errno 1 Operation not permitted (And just for fun the syslog that comes out, too.) 35388 sendmail CALL sendto(0x3,0xbfbfa4d0,0x6d,0,0,0) 35388 sendmail GIO fd 3 wrote 109 bytes "<20>May 2 23:16:25 sm-mta[35388]: STARTTLS=server, error: accept failed=-1, SSL\ _error=5, timedout=0, errno=1" 35388 sendmail RET sendto 109/0x6d I'm really kind of puzzled at this point. Why would I get -1/EPERM on an apparently completed write? The box is 5.4-STABLE, fresh cvsup as of 2 May 05 17:47 GMT. I forced a rebuild of the SASL stuff (cyrus-sasl-2.1.20_1 and cyrus-sasl-saslauthd-2.1.20_1) , and then rebuilt sendmail again. I've got the following in my make.conf re: sendmail building: SENDMAIL_CFLAGS+=-I/usr/local/include -DSASL=2 SENDMAIL_CFLAGS+=-D_FFR_SMTP_SSL SENDMAIL_LDFLAGS+=-L/usr/local/lib SENDMAIL_LDADD+=-lsasl2 The client machine appears to be running 8.13.1. What I see in the tcpdump disassembly (thanks to Ethereal!) is (paraphrased) {client} {mybox} TCP S {mybox} {client} TCP SA {client} {mybox} TCP A {mybox} {client} TCP A seq=1 ack=1 len=89 "220 {mybox} ESMTP Sendmail 8.13.3/8.13.3" (+date) {client} {mybox} TCP A seq=1 ack=90 len=0 {client} {mybox} TCP A seq=1 ack=90 len=24 "EHLO {client}\r\n" {mybox} {client} TCP A seq=90 ack=25 len=226 ESMTP options menu, includes STARTTLS {client} {mybox} TCP A seq=25 ack=316 len=0 {client} {mybox} TCP A seq=25 ack=316 len=10 "STARTTLS\r\n" {mybox} {client} TCP A seq=316 ack=35 len=30 "220 Ready to start TLS\r\n" {client} {mybox} TCP A seq=35 ack=346 len=0 {client} {mybox} TCP A seq=35 ack=346 len=126 SSLv2 Record len 124 Handshake: ClientHello, TLS 1.0, gaggle of cipher suites, no resume session, 16 byte challenge {mybox} {client} TCP A seq=346 ack=161 len=1448 TLS Record Handshake (22) TLS 1.0 (0x0301) len 74 Handshake: ServerHello (2), random, cipher suite TLS_DHE_RSA_WITH_AES256_CBC_SHA, no compression TLS Record Handshake (22) TLS 1.0 len 2184 Handshake: Certificate pile (first segment) {client} {mybox} TCP A seq=161 ack=1794 len=0 {mybox} {client} TCP FPA seq=1796 ack=161 len=1199 Continuation of certificate pile TLS Record Handshake (22) TLS 1.0 (0x0301) len 205 Handshake: ServerKeyExchange TLS Record Handshake (22) TLS 1.0 (0x0301) len 166 Handshake: CertificateRequest {client} {mybox} TCP A seq=161 ack=1794 len=0 {mybox} {client} TCP FPA seq=1794 ack=161 len=1201 Continuation of certificate pile TLS Record Handshake (22) TLS 1.0 (0x0301) len 205 Handshake: ServerKeyExchange TLS Record Handshake (22) TLS 1.0 (0x0301) len 166 Handshake: CertificateRequest {client} {mybox} TCP A seq=161 ack=2996 len=146 TLS Record Handshake (22) TLS 1.0 (0x0301) len 7 Handshake: Certificate (0 certificates) TLS Record Handshake (22) TLS 1.0 (0x0301) len 70 Handshake: ClientKeyExchange (len 66) TLS Record ChangeCipherSpec TLS 1.0 (0x0301) len 1 TLS Record Handshake (22) TLS 1.0 (0x0301) len 48 Handshake: Encrypted {mybox} {client} TCP RST seq=2996 ack={0} len=0 The offset sequence number in the first FIN segment from my FreeBSD box (sequence 1796, when 1794 is the next expected) is reproduceable. Another odd thing is that the client responded to the SSL record from my server, which I find mildly unexpected since there's no ServerHelloDone message. I suppose that could be because the data got pushed up the SSL stack because of the FIN, and it didn't notice. Extra intriguingly, the ISC server that's running the DHCP mailing list can successfully transfer mail, and it uses TLS. As can any number of other servers... This appears to have broken sometime between late February and mid April, but enough sites work that I didn't notice it until recently. I'm pretty stumped, and looking for ideas... Solving the TLS handshake problem is first, but that 2 byte skip in sequence numbers is _weird_. I suppose it could be related somehow. -- Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9" Internet: steve @ Watt.COM Whois: SW32 Free time? There's no such thing. It just comes in varying prices...
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200505030858.j438wA1q038864>