Date: Sat, 05 Sep 2020 17:49:33 +0100 From: Mike Clarke <jmc-freebsd2@milibyte.co.uk> To: freebsd-questions@freebsd.org Subject: Exim closes connection when negotiating SSL handshake Message-ID: <15996093.ofW7TPSIZ6@curlew>
next in thread | raw e-mail | index | archive | help
A couple of months ago I had a problem with exim after it was upgraded exim- sa-exim from 4.93 to 4.94. When attempting to send mail exim would close the connection while trying to establish an SSL connection. I've now managed get 4.94 working but I would appreciate any advice on whether what I've done is a valid modification to my configuration or if it's just a lucky workaround which works for now but might come back and bite me after some future upgrade. The original problem was that exim would immediately close the connection on starting to negotiate an SSL handshake. It would then periodically retry to send the email and eventually succeed but only after an unpredictable number of retries, for example one message succeeded on the second attempt after 30 minutes but a different one needed seven retries over three and a half hours. I posted the problem in an earlier thread here and received a number of useful comments which helped me to track down where things were going wrong but didn't enable me to determine why it was happening or to resolve the problem so as a temporary workaround I reverted to 4.93. The full thread is at https://lists.freebsd.org/pipermail/freebsd-questions/2020-June/ 290344.html Things have now moved on and a recent upgrade to perl has led to a dependency issue which I temporarily worked around by adding an entry for libperl.so. 5.30 in /etc/libmap.conf but this has prompted me to try to fix the problem properly before any future dependency issues create a more serious problem. The following extract from truss output for exim 4.94 while attempting to send an email covers the SSL negotiation 3216: read(9,"##\n## ca-root-nss.crt -- Bundl"...,131072) = 131072 (0x20000) After about 40 lines reading /etc/ssl/cert.pem it continues with: 3216: read(9,0x8011adec0,131072) = 0 (0x0) 3216: close(9) = 0 (0x0) 3216: issetugid() = 0 (0x0) 3216: setitimer(0,{ 0.000000, 300.000000 },{ 0.000000, 0.000000 }) = 0 (0x0) 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34385039360 (0x801821000) 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34385063936 (0x801827000) 3216: getpid() = 3216 (0xc90) 3216: getpid() = 3216 (0xc90) 3216: getpid() = 3216 (0xc90) 3216: getpid() = 3216 (0xc90) 3216: getpid() = 3216 (0xc90) 3216: getpid() = 3216 (0xc90) 3216: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^Ck"...,298) ERR#57 'Socket is not connected' A similar section for 4.93 follows the same pattern but ends with 88001: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^C\M-`"...,298) = 298 (0x12a) ... and goes on to successfully send the email. When I look in the debug output from exim 4.94 for details of the SSL negotiation I see the following 12:14:25 48240 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465 ... 185.53.59.181 in hosts_try_fastopen? yes (matched "*") 12:14:25 48240 TFO mode connection attempt to 185.53.59.181, 0 data 12:14:25 48240 connected 12:14:25 48240 ╭considering: $primary_hostname 12:14:25 48240 ├──expanding: $primary_hostname 12:14:25 48240 ╰─────result: curlew.milibyte.co.uk 12:14:25 48240 185.53.59.181 in hosts_avoid_esmtp? no (option unset) 12:14:25 48240 185.53.59.181 in hosts_require_ocsp? no (option unset) 12:14:25 48240 185.53.59.181 in hosts_request_ocsp? yes (matched "*") 12:14:25 48240 setting SSL CTX options: 0x42004000 12:14:25 48240 Diffie-Hellman initialized from default with 2048-bit prime 12:14:25 48240 Initialized TLS 12:14:25 48240 185.53.59.181 in tls_verify_hosts? no (option unset) 12:14:25 48240 185.53.59.181 in tls_try_verify_hosts? yes (matched "*") 12:14:25 48240 tls_verify_certificates: system 12:14:25 48240 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*") 12:14:25 48240 Cert hostname to check: "electron.ecohosting.co.uk" 12:14:25 48240 Calling SSL_connect 12:14:25 48240 SSL_connect: before SSL initialization 12:14:25 48240 SSL_connect: error in SSLv3/TLS write client hello 12:14:25 48240 TLS error '(SSL_connect): error: 00000000:lib(0):func(0):reason(0)' 12:14:25 48240 TLS session fail: (SSL_connect): error: 00000000:lib(0):func(0):reason(0) 12:14:25 48240 SMTP(close)>> 12:14:25 48240 set_process_info: 48240 delivering 1k7GcL-000CY0-IH: just tried electron.ecohosting.co.uk [185.53.59.181]:465 for #REDACTED#@gmail.com: result DEFER 12:14:25 48240 added retry item for T:electron.ecohosting.co.uk: 185.53.59.181:465: errno=-37 more_errno=0,A flags=2 It appears that exim is failing to make a TCP Fast Open connection to the remote server. In fact it looks like it doesn't even try. Wireshark shows that it closes the connection immediately after receiving the first response from the remote server. No. Time Source Destination Protocol Length Info 1 0.000000000 192.168.1.13 185.53.59.181 TCP 78 54794 → 465 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 SACK_PERM=1 TSval=4153193768 TSecr=0 TFO=R No. Time Source Destination Protocol Length Info 2 0.022693018 185.53.59.181 192.168.1.13 TCP 74 465 → 54794 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1452 SACK_PERM=1 TSval=2019435234 TSecr=4153193768 WS=128 No. Time Source Destination Protocol Length Info 3 0.022718023 192.168.1.13 185.53.59.181 TCP 66 54794 → 465 [FIN, ACK] Seq=1 Ack=1 Win=66752 Len=0 TSval=4153193791 TSecr=2019435234 This problem is not related to a specific remote server, I have tried routing through mail servers run by four different service providers and see exactly the same behaviour. But looking at a similar section from debug output for 4.93 it appears that 4.93 does not attempt to use TCP Fast Open even though hosts_try_fastopen is set to *. 12:00:36 44920 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465 ... 185.53.59.181 in hosts_try_fastopen? yes (matched "*") 12:00:36 44920 non-TFO mode connection attempt to 185.53.59.181, 0 data 12:00:36 44920 connected 12:00:36 44920 ╭considering: $primary_hostname 12:00:36 44920 ├──expanding: $primary_hostname 12:00:36 44920 ╰─────result: curlew.milibyte.co.uk 12:00:36 44920 185.53.59.181 in hosts_avoid_esmtp? no (option unset) 12:00:36 44920 185.53.59.181 in hosts_require_ocsp? no (option unset) 12:00:36 44920 185.53.59.181 in hosts_request_ocsp? yes (matched "*") 12:00:36 44920 setting SSL CTX options: 0x42004000 12:00:36 44920 Diffie-Hellman initialized from default with 2048-bit prime 12:00:36 44920 Initialized TLS 12:00:36 44920 185.53.59.181 in tls_verify_hosts? no (option unset) 12:00:36 44920 185.53.59.181 in tls_try_verify_hosts? yes (matched "*") 12:00:36 44920 tls_verify_certificates: system 12:00:36 44920 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*") 12:00:36 44920 Cert hostname to check: "electron.ecohosting.co.uk" 12:00:36 44920 Calling SSL_connect 12:00:36 44920 SSL_connect: before SSL initialization 12:00:36 44920 SSL_connect: SSLv3/TLS write client hello 12:00:36 44920 SSL_connect: SSLv3/TLS write client hello 12:00:36 44920 SSL_connect: SSLv3/TLS read server hello 12:00:36 44920 SSL verify ok: depth=2 SN=/O=Digital Signature Trust Co./ CN=DST Root CA X3 12:00:36 44920 SSL verify ok: depth=1 SN=/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3 12:00:36 44920 SSL authenticated verify ok: depth=0 SN=/ CN=electron.ecohosting.co.uk 12:00:36 44920 SSL_connect: SSLv3/TLS read server certificate 12:00:36 44920 SSL_connect: SSLv3/TLS read server key exchange 12:00:36 44920 Received TLS status callback (OCSP stapling): 12:00:36 44920 null 12:00:36 44920 SSL_connect: SSLv3/TLS read server done So as an experiment I removed the default value of * for hosts_try_fastopen for 4.94 by reseting it in the transport section of the configuration file f: remote_auth_smtp: driver = smtp protocol = smtps hosts_require_auth = $host_address hosts_require_tls = $host_address hosts_try_fastopen = This appears to have 'fixed' the problem with exim 4.94 but leaves various questions in my mind: 1) exim 4.93 appears to not use TCP Fast Open despite the default value of hosts_try_fastopen implying that it should use it. Was this a bug or a feature? 2) exim 4.94 attempts to respect the default value of hosts_try_fastopen and tries to use TFO but fails. Should it not fall back to not using TFO after a failure instead of aborting and is this a bug? 3) Is TFO supported by FreeBSD 12? The exim documentation states "On (at least some) current Linux distributions the facility must be enabled in the kernel by the sysadmin before the support is usable." - https://www.exim.org/ exim-html-current/doc/html/spec_html/ch-the_smtp_transport.html . Would this also apply for FreeBSD though I assume it is supported because sysctl shows net.inet.tcp.fastopen.client_enable: 1 3) Is there some error in my configuration which was not sufficient to cause a problem with 4.93 but causes problems with 4.94? Since web searches haven't produced details of anyone else suffering from this problem then it could be quite likely that I've done something wrong. -- Mike Clarke
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?15996093.ofW7TPSIZ6>
