Date:      Sat, 6 Jul 2024 20:36:09 +0200
From:      John Hay <john@sanren.ac.za>
To:        freebsd-hackers@freebsd.org
Subject:   ns8250: UART FCR is broken, message might be misleading
Message-ID:  <CAGv8uarKqo8rm0rg6kHksYb0v=OHAWZ_bZcamn_aezrWXBmJrA@mail.gmail.com>

Hi,

I have 3 machines running FreeBSD 14.0 (I recently upgraded one to 14.1).
All 3 have a UART with a GPS behind it, but no program reading from the
UART. I see the "ns8250: UART FCR is broken" message on all of them, on
average a little less than once per day.

For example, one machine has an uptime of 48 days. In that time the
message was printed 44 times, but there were 12846 overruns according to
the sysctl below:

dev.uart.2.rx_overruns: 12846

If the FCR was really broken, I would have expected the message to be
printed for every overrun.
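
To put those two counters side by side, here is a minimal sketch of the
arithmetic. The only inputs are the figures quoted above; the function names
are mine and purely illustrative.

```c
#include <stdio.h>

/* Figures quoted above for the machine with 48 days of uptime. */
enum {
	UPTIME_DAYS  = 48,
	FCR_MESSAGES = 44,
	RX_OVERRUNS  = 12846
};

/* "UART FCR is broken" messages per day of uptime (roughly 0.92). */
static double
messages_per_day(void)
{
	return ((double)FCR_MESSAGES / UPTIME_DAYS);
}

/* Receive overruns counted for each printed message (roughly 292). */
static double
overruns_per_message(void)
{
	return ((double)RX_OVERRUNS / FCR_MESSAGES);
}

/* Print both ratios; call this from a small test program. */
static void
print_ratios(void)
{
	printf("messages/day:      %.2f\n", messages_per_day());
	printf("overruns/message:  %.0f\n", overruns_per_message());
}
```

So the message shows up for only about one overrun in 292, which is what
makes a permanently broken FCR look unlikely to me.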

The 16550D documentation that I could find on the internet has this about
Bit 1 of the FIFO Control Register (FCR):

Bit 1: Writing a 1 to FCR1 clears all bytes in the RCVR FIFO
and resets its counter logic to 0. The shift register is not
cleared. The 1 that is written to this bit position is self-clearing.

So what I think is happening is that occasionally, when the RCVR FIFO is
cleared, a character is almost completely received in the shift register.
Between the FIFO being cleared and the LSR_RXRDY bit in the LSR being
checked, that character completes and lands in the now-empty FIFO.
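
The suspected race can be illustrated with a toy model. This is only a
sketch under my own assumptions: struct model_uart and the model_*()
helpers are made up for illustration and are not driver code, and time is
counted in whole bit times, which is far coarser than the real gap between
the FCR write and the LSR read. The two bit values match the usual 16550
register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define LSR_RXRDY	0x01	/* LSR: receive data ready */
#define FCR_RCV_RST	0x02	/* FCR bit 1: reset the receiver FIFO */

/* Hypothetical model of the receive side of a 16550-style UART. */
struct model_uart {
	int fifo_count;		/* characters waiting in the RCVR FIFO */
	int shift_bits_left;	/* bit times until the character in the
				 * shift register completes; 0 = idle */
};

/* FCR write: bit 1 empties the FIFO but leaves the shift register alone. */
static void
model_write_fcr(struct model_uart *u, uint8_t fcr)
{
	if (fcr & FCR_RCV_RST)
		u->fifo_count = 0;
}

/* Let time pass; a character that completes moves into the FIFO. */
static void
model_advance(struct model_uart *u, int bit_times)
{
	if (u->shift_bits_left == 0)
		return;
	if (bit_times >= u->shift_bits_left) {
		u->shift_bits_left = 0;
		u->fifo_count++;
	} else {
		u->shift_bits_left -= bit_times;
	}
}

/* LSR read: RXRDY is set whenever the FIFO holds at least one character. */
static uint8_t
model_read_lsr(const struct model_uart *u)
{
	return (u->fifo_count > 0 ? LSR_RXRDY : 0);
}

/*
 * Flush the receiver, wait gap_bit_times, then check LSR_RXRDY.
 * Returns true when the check would (wrongly) conclude the flush failed.
 */
static bool
flush_then_check(struct model_uart *u, int gap_bit_times)
{
	model_write_fcr(u, FCR_RCV_RST);
	model_advance(u, gap_bit_times);
	return ((model_read_lsr(u) & LSR_RXRDY) != 0);
}
```

In this model a working FCR still trips the check whenever the character in
the shift register completes inside the gap, and exactly one character is
left in the FIFO afterwards.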

The piece of code in ns8250_flush() looks like this:
<snip>
        uart_setreg(bas, REG_FCR, fcr);
        uart_barrier(bas);

        /*
         * Detect and work around emulated UARTs which don't implement the
         * FCR register; on these systems we need to drain the FIFO since
         * the flush we request doesn't happen.  One such system is the
         * Firecracker VMM, aka. the rust-vmm/vm-superio emulation code:
         * https://github.com/rust-vmm/vm-superio/issues/83
         */
        lsr = uart_getreg(bas, REG_LSR);
        if (((lsr & LSR_TEMT) == 0) && (what & UART_FLUSH_TRANSMITTER))
                drain |= UART_DRAIN_TRANSMITTER;
        if ((lsr & LSR_RXRDY) && (what & UART_FLUSH_RECEIVER))
                drain |= UART_DRAIN_RECEIVER;
        if (drain != 0) {
                printf("ns8250: UART FCR is broken\n");
                ns8250_drain(bas, drain);
        }
</snip>

So how do we distinguish between a real FCR error and this case? Maybe
ns8250_drain() could return the number of bytes it drained instead; if it
returned one, then it isn't an FCR error. Currently ns8250_drain() returns
0 on success or EIO if there is a hardware problem. Maybe that could be
changed to return -EIO, handled properly wherever its return value is used?
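
A rough sketch of that idea, written against a made-up model rather than
the real driver: model_drain_rx(), classify_flush(), struct model_rx and
MODEL_DRAIN_LIMIT are all hypothetical names, and the retry limit is an
arbitrary number I picked for the example. It only shows the proposed
return convention (byte count on success, -EIO on failure) and how a caller
might interpret it.

```c
#include <errno.h>

#define MODEL_DRAIN_LIMIT	4096	/* hypothetical bound on reads */

/* Hypothetical stand-in for the receiver: bytes it still holds. */
struct model_rx {
	int pending;
};

enum flush_verdict {
	FLUSH_CLEAN,		/* nothing left after the flush */
	FLUSH_RACE,		/* one byte: probably arrived after the flush */
	FLUSH_FCR_BROKEN,	/* several bytes: the flush did not happen */
	FLUSH_HW_ERROR		/* receiver never went empty */
};

/*
 * Read and discard bytes until the receiver is empty.
 * Returns the number of bytes drained, or -EIO if the receiver is
 * still not empty after MODEL_DRAIN_LIMIT reads.
 */
static int
model_drain_rx(struct model_rx *rx)
{
	int drained = 0;

	while (rx->pending > 0) {
		if (drained == MODEL_DRAIN_LIMIT)
			return (-EIO);
		rx->pending--;
		drained++;
	}
	return (drained);
}

/* Interpret the drain result the way the paragraph above suggests. */
static enum flush_verdict
classify_flush(int drain_result)
{
	if (drain_result < 0)
		return (FLUSH_HW_ERROR);
	if (drain_result == 0)
		return (FLUSH_CLEAN);
	if (drain_result == 1)
		return (FLUSH_RACE);
	return (FLUSH_FCR_BROKEN);
}
```

With something like this the warning would only be printed for
FLUSH_FCR_BROKEN. The obvious weakness is that a UART whose FCR really is
broken, but which happens to hold a single byte, would be classified as a
race.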

Note that these UARTs are implemented on Xilinx/AMD FPGAs using the v2.0 IP
described at the link below, but I think this can probably happen on other
16x50 UARTs too.
https://docs.amd.com/v/u/en-US/pg143-axi-uart16550

Regards

John
