Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 05 Sep 2020 17:49:33 +0100
From:      Mike Clarke <jmc-freebsd2@milibyte.co.uk>
To:        freebsd-questions@freebsd.org
Subject:   Exim closes connection when negotiating SSL handshake
Message-ID:  <15996093.ofW7TPSIZ6@curlew>

next in thread | raw e-mail | index | archive | help
A couple of months ago I had a problem with exim after it was upgraded exim-
sa-exim from 4.93 to 4.94. When attempting to send mail exim would close th=
e=20
connection while trying to establish an SSL connection.

I've now managed get 4.94 working but I would appreciate any advice on=20
whether what I've done is a valid modification to my configuration or if it=
's=20
just a lucky workaround which works for now but might come back and bite me=
=20
after some future upgrade.

The original problem was that exim would immediately close the connection o=
n=20
starting to negotiate an SSL handshake. It would then periodically retry to=
=20
send the email and eventually succeed but only after an unpredictable numbe=
r=20
of retries, for example one message succeeded on the second attempt after 3=
0=20
minutes but a different one needed seven retries over three and a half hour=
s.

I posted the problem in an earlier thread here and received a number of=20
useful comments which helped me to track down where things were going wrong=
=20
but didn't enable me to determine why it was happening or to resolve the=20
problem so as a temporary workaround I reverted to 4.93. The full thread is=
=20
at  https://lists.freebsd.org/pipermail/freebsd-questions/2020-June/
290344.html=20

Things have now moved on and a recent upgrade to perl has led to a dependen=
cy=20
issue which I temporarily worked around by adding an entry for libperl.so.
5.30 in /etc/libmap.conf but this has prompted me to try to fix the problem=
=20
properly before any future dependency issues create a more serious problem.

The following extract from truss output for exim 4.94 while attempting to=20
send an email covers the SSL negotiation

3216: read(9,"##\n##  ca-root-nss.crt -- Bundl"...,131072) =3D 131072 (0x20=
000)

After about 40 lines reading /etc/ssl/cert.pem it continues with:

 3216: read(9,0x8011adec0,131072)                =3D 0 (0x0)
 3216: close(9)                                  =3D 0 (0x0)
 3216: issetugid()                               =3D 0 (0x0)
 3216: setitimer(0,{ 0.000000, 300.000000 },{ 0.000000, 0.000000 }) =3D 0 (=
0x0)
 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =3D=
=20
34385039360 (0x801821000)
 3216: mmap(0x0,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =3D=
=20
34385063936 (0x801827000)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: getpid()                                  =3D 3216=20
(0xc90)
 3216: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^Ck"...,298) ERR#57 'Socket is no=
t=20
connected'

A similar section for 4.93 follows the same pattern but ends with=20
 88001: write(7,"\^V\^C\^A\^A%\^A\0\^A!\^C\^C\M-`"...,298) =3D 298 (0x12a)
=2E.. and goes on to successfully send the email.
When I look in the debug output from exim 4.94 for details of the SSL=20
negotiation I see the following=20

12:14:25 48240 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465=
=20
=2E.. 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:14:25 48240  TFO mode connection attempt to 185.53.59.181, 0 data
12:14:25 48240  connected
12:14:25 48240  =E2=95=ADconsidering: $primary_hostname
12:14:25 48240  =E2=94=9C=E2=94=80=E2=94=80expanding: $primary_hostname
12:14:25 48240  =E2=95=B0=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80resul=
t: curlew.milibyte.co.uk
12:14:25 48240 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:14:25 48240 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:14:25 48240 setting SSL CTX options: 0x42004000
12:14:25 48240 Diffie-Hellman initialized from default with 2048-bit prime
12:14:25 48240 Initialized TLS
12:14:25 48240 185.53.59.181 in tls_verify_hosts? no (option unset)
12:14:25 48240 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:14:25 48240 tls_verify_certificates: system
12:14:25 48240 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:14:25 48240 Cert hostname to check: "electron.ecohosting.co.uk"
12:14:25 48240 Calling SSL_connect
12:14:25 48240 SSL_connect: before SSL initialization
12:14:25 48240 SSL_connect: error in SSLv3/TLS write client hello
12:14:25 48240 TLS error '(SSL_connect): error:
00000000:lib(0):func(0):reason(0)'
12:14:25 48240 TLS session fail: (SSL_connect): error:
00000000:lib(0):func(0):reason(0)
12:14:25 48240   SMTP(close)>>
12:14:25 48240 set_process_info: 48240 delivering 1k7GcL-000CY0-IH: just=20
tried electron.ecohosting.co.uk [185.53.59.181]:465 for #REDACTED#@gmail.co=
m:=20
result DEFER
12:14:25 48240 added retry item for T:electron.ecohosting.co.uk:
185.53.59.181:465: errno=3D-37 more_errno=3D0,A flags=3D2

It appears that exim is failing to make a TCP Fast Open connection to the=20
remote server. In fact it looks like it doesn't even try. Wireshark shows=20
that it closes the connection immediately after receiving the first respons=
e=20
from the remote server.

No.     Time           Source                Destination           Protocol=
=20
Length Info
      1 0.000000000    192.168.1.13          185.53.59.181         TCP     =
=20
78     54794 =E2=86=92 465 [SYN] Seq=3D0 Win=3D65535 Len=3D0 MSS=3D1460 WS=
=3D64 SACK_PERM=3D1=20
TSval=3D4153193768 TSecr=3D0 TFO=3DR

No.     Time           Source                Destination           Protocol=
=20
Length Info
      2 0.022693018    185.53.59.181         192.168.1.13          TCP     =
=20
74     465 =E2=86=92 54794 [SYN, ACK] Seq=3D0 Ack=3D1 Win=3D28960 Len=3D0 M=
SS=3D1452=20
SACK_PERM=3D1 TSval=3D2019435234 TSecr=3D4153193768 WS=3D128

No.     Time           Source                Destination           Protocol=
=20
Length Info
      3 0.022718023    192.168.1.13          185.53.59.181         TCP     =
=20
66     54794 =E2=86=92 465 [FIN, ACK] Seq=3D1 Ack=3D1 Win=3D66752 Len=3D0 T=
Sval=3D4153193791=20
TSecr=3D2019435234

This problem is not related to a specific remote server, I have tried routi=
ng=20
through mail servers run by four different service providers and see exactl=
y=20
the same behaviour.

But looking at a similar section from debug output for 4.93 it appears that=
=20
4.93 does not attempt to use TCP Fast Open even though hosts_try_fastopen i=
s=20
set to *.

12:00:36 44920 Connecting to electron.ecohosting.co.uk [185.53.59.181]:465=
=20
=2E.. 185.53.59.181 in hosts_try_fastopen? yes (matched "*")
12:00:36 44920  non-TFO mode connection attempt to 185.53.59.181, 0 data
12:00:36 44920  connected
12:00:36 44920  =E2=95=ADconsidering: $primary_hostname
12:00:36 44920  =E2=94=9C=E2=94=80=E2=94=80expanding: $primary_hostname
12:00:36 44920  =E2=95=B0=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80resul=
t: curlew.milibyte.co.uk
12:00:36 44920 185.53.59.181 in hosts_avoid_esmtp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_require_ocsp? no (option unset)
12:00:36 44920 185.53.59.181 in hosts_request_ocsp? yes (matched "*")
12:00:36 44920 setting SSL CTX options: 0x42004000
12:00:36 44920 Diffie-Hellman initialized from default with 2048-bit prime
12:00:36 44920 Initialized TLS
12:00:36 44920 185.53.59.181 in tls_verify_hosts? no (option unset)
12:00:36 44920 185.53.59.181 in tls_try_verify_hosts? yes (matched "*")
12:00:36 44920 tls_verify_certificates: system
12:00:36 44920 185.53.59.181 in tls_verify_cert_hostnames? yes (matched "*")
12:00:36 44920 Cert hostname to check: "electron.ecohosting.co.uk"
12:00:36 44920 Calling SSL_connect
12:00:36 44920 SSL_connect: before SSL initialization
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS write client hello
12:00:36 44920 SSL_connect: SSLv3/TLS read server hello
12:00:36 44920 SSL verify ok: depth=3D2 SN=3D/O=3DDigital Signature Trust C=
o./
CN=3DDST Root CA X3
12:00:36 44920 SSL verify ok: depth=3D1 SN=3D/C=3DUS/O=3DLet's Encrypt/CN=
=3DLet's=20
Encrypt Authority X3
12:00:36 44920 SSL authenticated verify ok: depth=3D0 SN=3D/
CN=3Delectron.ecohosting.co.uk
12:00:36 44920 SSL_connect: SSLv3/TLS read server certificate
12:00:36 44920 SSL_connect: SSLv3/TLS read server key exchange
12:00:36 44920 Received TLS status callback (OCSP stapling):
12:00:36 44920  null
12:00:36 44920 SSL_connect: SSLv3/TLS read server done

So as an experiment I removed the default value of * for hosts_try_fastopen=
=20
for 4.94 by reseting it in the transport section of the configuration file =
f:

remote_auth_smtp:=20
 driver =3D smtp=20
 protocol =3D smtps=20
 hosts_require_auth =3D $host_address=20
 hosts_require_tls =3D $host_address=20
 hosts_try_fastopen =3D

This appears to have 'fixed' the problem with exim 4.94 but leaves various=
=20
questions in my mind:

1) exim 4.93 appears to not use TCP Fast Open despite the default value of=
=20
hosts_try_fastopen implying that it should use it. Was this a bug or a=20
feature?

2) exim 4.94 attempts to respect the default value of hosts_try_fastopen an=
d=20
tries to use TFO but fails. Should it not fall back to not using TFO after =
a=20
failure instead of aborting and is this a bug?

3) Is TFO supported by FreeBSD 12? The exim documentation states "On (at=20
least some) current Linux distributions the facility must be enabled in the=
=20
kernel by the sysadmin before the support is usable." - https://www.exim.or=
g/
exim-html-current/doc/html/spec_html/ch-the_smtp_transport.html .
Would this also apply for FreeBSD though I assume it is supported because=20
sysctl shows net.inet.tcp.fastopen.client_enable: 1

3) Is there some error in my configuration which was not sufficient to caus=
e=20
a problem with 4.93 but causes problems with 4.94? Since web searches haven=
't=20
produced details of anyone else suffering from this problem then it could b=
e=20
quite likely that I've done something wrong.

=2D-=20
Mike Clarke





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?15996093.ofW7TPSIZ6>