Date: Sat, 05 Sep 2020 17:49:33 +0100 From: Mike Clarke <jmc-freebsd2@milibyte.co.uk> To: freebsd-questions@freebsd.org Subject: Exim closes connection when negotiating SSL handshake Message-ID: <15996093.ofW7TPSIZ6@curlew>
next in thread | raw e-mail | index | archive | help
A couple of months ago I had a problem with exim after it was upgraded exim- sa-exim from 4.93 to 4.94. When attempting to send mail exim would close th= e=20 connection while trying to establish an SSL connection. I've now managed get 4.94 working but I would appreciate any advice on=20 whether what I've done is a valid modification to my configuration or if it= 's=20 just a lucky workaround which works for now but might come back and bite me= =20 after some future upgrade. The original problem was that exim would immediately close the connection o= n=20 starting to negotiate an SSL handshake. It would then periodically retry to= =20 send the email and eventually succeed but only after an unpredictable numbe= r=20 of retries, for example one message succeeded on the second attempt after 3= 0=20 minutes but a different one needed seven retries over three and a half hour= s. I posted the problem in an earlier thread here and received a number of=20 useful comments which helped me to track down where things were going wrong= =20 but didn't enable me to determine why it was happening or to resolve the=20 problem so as a temporary workaround I reverted to 4.93. The full thread is= =20 at https://lists.freebsd.org/pipermail/freebsd-questions/2020-June/ 290344.html=20 Things have now moved on and a recent upgrade to perl has led to a dependen= cy=20 issue which I temporarily worked around by adding an entry for libperl.so. 5.30 in /etc/libmap.conf but this has prompted me to try to fix the problem= =20 properly before any future dependency issues create a more serious problem. The following extract from truss output for exim 4.94 while attempting to=20 send an email covers the SSL negotiation 3216: read(9,"##\n## ca-root-nss.crt -- Bundl"...,131072) =3D 131072 (0x20= 000) After about 40 lines reading /etc/ssl/cert.pem it continues with: 3216: read(9,0x8011adec0,131072) =3D 0 (0x0) 3216: close(9) =3D 0 (0x0) 3216: issetugid() =3D 0 (0x0) 3216: setitimer(0,{ 0.000000, 300.000000 },{ 0.000000, 0.000000 }) =3D 0 (= 0x0) 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =3D= =20 34385039360 (0x801821000) 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =3D= =20 34385063936 (0x801827000) 3216: getpid() =3D 3216=20 (0xc90) 3216: getpid() =3D 3216=20 (0xc90) 3216: getpid() =3D 3216=20 (0xc90) 3216: getpid() =3D 3216=20 (0xc90) 3216: getpid() =3D 3216=20 (0xc90) 3216: getpid() =3D 3216=20 (0xc90) 3216: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^Ck"...,298) ERR#57 'Socket is no= t=20 connected' A similar section for 4.93 follows the same pattern but ends with=20 88001: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^C\M-`"...,298) =3D 298 (0x12a) =2E.. and goes on to successfully send the email. When I look in the debug output from exim 4.94 for details of the SSL=20 negotiation I see the following=20 12:14:25 48240 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465= =20 =2E.. 185.53.59.181 in hosts_try_fastopen? yes (matched "*") 12:14:25 48240 TFO mode connection attempt to 185.53.59.181, 0 data 12:14:25 48240 connected 12:14:25 48240 =E2=95=ADconsidering: $primary_hostname 12:14:25 48240 =E2=94=9C=E2=94=80=E2=94=80expanding: $primary_hostname 12:14:25 48240 =E2=95=B0=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80resul= t: curlew.milibyte.co.uk 12:14:25 48240 185.53.59.181 in hosts_avoid_esmtp? no (option unset) 12:14:25 48240 185.53.59.181 in hosts_require_ocsp? no (option unset) 12:14:25 48240 185.53.59.181 in hosts_request_ocsp? yes (matched "*") 12:14:25 48240 setting SSL CTX options: 0x42004000 12:14:25 48240 Diffie-Hellman initialized from default with 2048-bit prime 12:14:25 48240 Initialized TLS 12:14:25 48240 185.53.59.181 in tls_verify_hosts? no (option unset) 12:14:25 48240 185.53.59.181 in tls_try_verify_hosts? yes (matched "*") 12:14:25 48240 tls_verify_certificates: system 12:14:25 48240 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*") 12:14:25 48240 Cert hostname to check: "electron.ecohosting.co.uk" 12:14:25 48240 Calling SSL_connect 12:14:25 48240 SSL_connect: before SSL initialization 12:14:25 48240 SSL_connect: error in SSLv3/TLS write client hello 12:14:25 48240 TLS error '(SSL_connect): error: 00000000:lib(0):func(0):reason(0)' 12:14:25 48240 TLS session fail: (SSL_connect): error: 00000000:lib(0):func(0):reason(0) 12:14:25 48240 SMTP(close)>> 12:14:25 48240 set_process_info: 48240 delivering 1k7GcL-000CY0-IH: just=20 tried electron.ecohosting.co.uk [185.53.59.181]:465 for #REDACTED#@gmail.co= m:=20 result DEFER 12:14:25 48240 added retry item for T:electron.ecohosting.co.uk: 185.53.59.181:465: errno=3D-37 more_errno=3D0,A flags=3D2 It appears that exim is failing to make a TCP Fast Open connection to the=20 remote server. In fact it looks like it doesn't even try. Wireshark shows=20 that it closes the connection immediately after receiving the first respons= e=20 from the remote server. No. Time Source Destination Protocol= =20 Length Info 1 0.000000000 192.168.1.13 185.53.59.181 TCP = =20 78 54794 =E2=86=92 465 [SYN] Seq=3D0 Win=3D65535 Len=3D0 MSS=3D1460 WS= =3D64 SACK_PERM=3D1=20 TSval=3D4153193768 TSecr=3D0 TFO=3DR No. Time Source Destination Protocol= =20 Length Info 2 0.022693018 185.53.59.181 192.168.1.13 TCP = =20 74 465 =E2=86=92 54794 [SYN, ACK] Seq=3D0 Ack=3D1 Win=3D28960 Len=3D0 M= SS=3D1452=20 SACK_PERM=3D1 TSval=3D2019435234 TSecr=3D4153193768 WS=3D128 No. Time Source Destination Protocol= =20 Length Info 3 0.022718023 192.168.1.13 185.53.59.181 TCP = =20 66 54794 =E2=86=92 465 [FIN, ACK] Seq=3D1 Ack=3D1 Win=3D66752 Len=3D0 T= Sval=3D4153193791=20 TSecr=3D2019435234 This problem is not related to a specific remote server, I have tried routi= ng=20 through mail servers run by four different service providers and see exactl= y=20 the same behaviour. But looking at a similar section from debug output for 4.93 it appears that= =20 4.93 does not attempt to use TCP Fast Open even though hosts_try_fastopen i= s=20 set to *. 12:00:36 44920 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465= =20 =2E.. 185.53.59.181 in hosts_try_fastopen? yes (matched "*") 12:00:36 44920 non-TFO mode connection attempt to 185.53.59.181, 0 data 12:00:36 44920 connected 12:00:36 44920 =E2=95=ADconsidering: $primary_hostname 12:00:36 44920 =E2=94=9C=E2=94=80=E2=94=80expanding: $primary_hostname 12:00:36 44920 =E2=95=B0=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80resul= t: curlew.milibyte.co.uk 12:00:36 44920 185.53.59.181 in hosts_avoid_esmtp? no (option unset) 12:00:36 44920 185.53.59.181 in hosts_require_ocsp? no (option unset) 12:00:36 44920 185.53.59.181 in hosts_request_ocsp? yes (matched "*") 12:00:36 44920 setting SSL CTX options: 0x42004000 12:00:36 44920 Diffie-Hellman initialized from default with 2048-bit prime 12:00:36 44920 Initialized TLS 12:00:36 44920 185.53.59.181 in tls_verify_hosts? no (option unset) 12:00:36 44920 185.53.59.181 in tls_try_verify_hosts? yes (matched "*") 12:00:36 44920 tls_verify_certificates: system 12:00:36 44920 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*") 12:00:36 44920 Cert hostname to check: "electron.ecohosting.co.uk" 12:00:36 44920 Calling SSL_connect 12:00:36 44920 SSL_connect: before SSL initialization 12:00:36 44920 SSL_connect: SSLv3/TLS write client hello 12:00:36 44920 SSL_connect: SSLv3/TLS write client hello 12:00:36 44920 SSL_connect: SSLv3/TLS read server hello 12:00:36 44920 SSL verify ok: depth=3D2 SN=3D/O=3DDigital Signature Trust C= o./ CN=3DDST Root CA X3 12:00:36 44920 SSL verify ok: depth=3D1 SN=3D/C=3DUS/O=3DLet's Encrypt/CN= =3DLet's=20 Encrypt Authority X3 12:00:36 44920 SSL authenticated verify ok: depth=3D0 SN=3D/ CN=3Delectron.ecohosting.co.uk 12:00:36 44920 SSL_connect: SSLv3/TLS read server certificate 12:00:36 44920 SSL_connect: SSLv3/TLS read server key exchange 12:00:36 44920 Received TLS status callback (OCSP stapling): 12:00:36 44920 null 12:00:36 44920 SSL_connect: SSLv3/TLS read server done So as an experiment I removed the default value of * for hosts_try_fastopen= =20 for 4.94 by reseting it in the transport section of the configuration file = f: remote_auth_smtp:=20 driver =3D smtp=20 protocol =3D smtps=20 hosts_require_auth =3D $host_address=20 hosts_require_tls =3D $host_address=20 hosts_try_fastopen =3D This appears to have 'fixed' the problem with exim 4.94 but leaves various= =20 questions in my mind: 1) exim 4.93 appears to not use TCP Fast Open despite the default value of= =20 hosts_try_fastopen implying that it should use it. Was this a bug or a=20 feature? 2) exim 4.94 attempts to respect the default value of hosts_try_fastopen an= d=20 tries to use TFO but fails. Should it not fall back to not using TFO after = a=20 failure instead of aborting and is this a bug? 3) Is TFO supported by FreeBSD 12? The exim documentation states "On (at=20 least some) current Linux distributions the facility must be enabled in the= =20 kernel by the sysadmin before the support is usable." - https://www.exim.or= g/ exim-html-current/doc/html/spec_html/ch-the_smtp_transport.html . Would this also apply for FreeBSD though I assume it is supported because=20 sysctl shows net.inet.tcp.fastopen.client_enable: 1 3) Is there some error in my configuration which was not sufficient to caus= e=20 a problem with 4.93 but causes problems with 4.94? Since web searches haven= 't=20 produced details of anyone else suffering from this problem then it could b= e=20 quite likely that I've done something wrong. =2D-=20 Mike Clarke
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?15996093.ofW7TPSIZ6>