Date:      Mon, 22 Jan 2024 10:13:30 +0000
From:      David Chisnall <theraven@FreeBSD.org>
To:        Alan Somers <asomers@FreeBSD.org>
Cc:        George Mitchell <george+freebsd@m5p.com>, freebsd-hackers@freebsd.org
Subject:   Re: The Case for Rust (in the base system)
Message-ID:  <C5FC83ED-25BC-44AF-BD20-E0E5F5BC64FE@FreeBSD.org>
In-Reply-To: <CAOtMX2hppfdu5ypDdGpfw_QDcd1rwJEeyVfSk9ogFEm7CiV6Kw@mail.gmail.com>
References:  <CAOtMX2hAUiWdGPtpaCJLPZB%2Bj2yzNw5DSjUmkwTi%2B%2BmyemehCA@mail.gmail.com> <1673801705774097@mail.yandex.ru> <CANCZdfpqWgvV_RCvVO_pvTrmajQFspW%2BQ9TM_Ok3JrXZAfeAfA@mail.gmail.com> <ef4ad207-5899-42b6-8728-bc46f1417e9e@antonovs.family> <202401210751.40L7pWEF011188@critter.freebsd.dk> <40bc1694-ee00-431b-866e-396e9d5c07a2@m5p.com> <CAOtMX2hppfdu5ypDdGpfw_QDcd1rwJEeyVfSk9ogFEm7CiV6Kw@mail.gmail.com>

On 21 Jan 2024, at 16:04, Alan Somers <asomers@FreeBSD.org> wrote:
>
> Perhaps it will.  But Like David Chisnall, I'm afraid that if FreeBSD never
> modernizes, then it itself will go out of fashion by the 2040s.

Apparently I’m participating in this thread already.  I’m getting over a nasty cold and my head is full of cotton wool, so apologies in advance if this is more rambling than normal:

I hope it’s no surprise to anyone that I am in favour of languages that give stronger guarantees to programmers and let you think more about the problems.  I can’t imagine going back to writing anything non-trivial in a language without RAII or a rich set of generic collections.

To give a bit of personal background: In my previous role, I was one of the coauthors of the internal strategy document that argued for safe languages at Microsoft.  Our rough recommendation was:

 - No new C code.  There are *always* better options.
 - C++ code should follow the Core Guidelines and use static analysis.  New C++ code is acceptable in projects that are already C/C++ and need to incrementally improve.
 - Rust in new projects that need a systems programming language.
 - Managed languages anywhere where a systems language is not needed (i.e. most places).

Between modern C++ with static analysers and Rust, there was a small safety delta.  The recommendation was primarily based on a human-factors decision: it’s far easier to prevent people from committing code that doesn’t compile than it is to prevent them from committing code that raises static analysis warnings.  If a project isn’t doing pre-merge static analysis, it’s basically impossible.  Between using modern C++ (even just smart pointers and ranges) and C, there is an enormous safety delta.

The unstable Rust ecosystem was less of an issue for Microsoft because they had a large compiler team and were happy to maintain security back-ports of any critical crates.  The same software supply chain things applied for Rust as everything else: no random pulling from Cargo, dependencies need to be cloned internally and run through a load of compliance things.  That’s probably the only sensible way of interacting with the Rust ecosystem.

For userspace, I’d love to see FreeBSD more actively support the cap-std project in Rust, which makes it incredibly easy to write Rust programs that play nicely with Capsicum.

It’s unclear to me that now is the right time to support Rust in the base system, because there’s still a lot of churn.  Facebook has effectively forked Rust because their (huge) Rust codebase doesn’t build with newer compilers.  If you’re Microsoft or Facebook, maintaining an old Rust compiler for a few years and back-porting things to work with that language snapshot is a cost that may be worth paying.  I don’t think the FreeBSD project has the resources to do so.  A limited set of dependencies may work.


There are a few caveats about Rust:

First, it’s quite hard to find competent Rust developers.  Here are the OpenHub stats on new F/OSS code being written in Rust, C, and C++:

https://openhub.net/languages/compare?language_name%5B%5D=c&language_name%5B%5D=cpp&language_name%5B%5D=rust&language_name%5B%5D=-1&language_name%5B%5D=-1&measure=loc_changed

C++ has been slowly trending up, and C down, for the last decade.  Rust is trending up a lot, but it’s starting from zero and there’s still a lot more C or C++ code being written than Rust.  It’s now easier to hire systems programmers to write C++ than C, and easier to hire either than to hire good Rust programmers.  This tradeoff may be very different for an open source project because there are a lot of *very* enthusiastic Rust developers and attracting a dozen or two of them to contribute would be a huge win.  People tend to be less enthusiastic about C or C++.

Most of the new kernels written in the last 20 years have been C++; most of the new kernels written in the last four years have been Rust.  Make of that what you will.

Neither Rust nor C++ guarantees safety.  C++ can always escape to bare pointers (it’s a code smell, but it’s sometimes unavoidable).  Rust has unsafe and requires it for any data structure that isn’t a tree (either directly or via some existing code such as the Rc / Arc types).  One of our concerns was the degree to which the different uses of unsafe in various Rust crates compose.  There was a paper a couple of years ago that found a lot of vulnerabilities from this composition.  I don’t personally have a great deal of faith that unique ownership at an object level with a load of heuristics about when it’s safe to alias is the right long-term model.  Verona went a very different way and I hope Rust may be able to retrofit our ideas at some point.

One project that I worked with, for example, was bitten by the fact that unsafe in Rust means ‘I promise to follow all of the Rust rules, you just can’t mechanically check them’.  It read a value from an MMIO register into a variable typed as an enumeration.  Outside of the unsafe block, it then checked that the value was in range.  Rust enumerations are type safe and so the compiler helpfully elided this check.  Moving the check into the unsafe block fixed it, but ran counter to the generic ‘put as little in unsafe blocks as humanly possible’ advice that is given to Rust programmers.
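
A general way to avoid this class of bug is to keep the register value as a plain integer until it has been validated, and only then convert it to the enumeration.  What follows is a minimal sketch of that idea, written in C++ for illustration rather than in Rust; DeviceMode and decode_mode are hypothetical names and values, not code from the project described above.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical device modes; the values are illustrative only.
enum class DeviceMode : std::uint32_t { Idle = 0, Running = 1, Fault = 2 };

// Convert an untrusted raw register value into a DeviceMode.
// The range check operates on the integer, before any enum-typed value
// exists, so the compiler has no type-based reason to remove it.
constexpr std::optional<DeviceMode> decode_mode(std::uint32_t raw) {
    switch (raw) {
        case 0: return DeviceMode::Idle;
        case 1: return DeviceMode::Running;
        case 2: return DeviceMode::Fault;
        default: return std::nullopt;  // out of range: never becomes an enum
    }
}
```

The caller only ever sees a DeviceMode that passed the check, or an empty optional that it has to handle explicitly.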

When I looked at a couple of hobbyist kernels written in Rust, they had trivial security vulnerabilities due to not sanitising system call arguments.  This was depressing because both Rust and C++ make it trivial to wrap userspace pointers in a smart pointer type that does the checks automatically.
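
A minimal sketch of such a wrapper in C++ might look like the following.  It is an assumption-laden illustration, not code from any real kernel: UserPtr, fake_user_memory and sys_double are hypothetical names, and a byte array stands in for user memory so that the example runs as an ordinary program.  A real kernel would check against its own address-space limits and use a fault-tolerant copy routine instead of memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Stand-in for user memory (hypothetical, for illustration only).
inline unsigned char fake_user_memory[4096];

template <typename T>
class UserPtr {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only plain data may cross the user/kernel boundary");
    std::size_t offset_;  // untrusted value supplied by the caller

    // True if [offset_, offset_ + sizeof(T)) lies inside user memory.
    // Written as a subtraction so that a huge offset cannot overflow.
    bool in_bounds() const {
        return offset_ <= sizeof(fake_user_memory) &&
               sizeof(T) <= sizeof(fake_user_memory) - offset_;
    }

public:
    explicit UserPtr(std::size_t untrusted_offset) : offset_(untrusted_offset) {}

    // Deliberately no operator* or operator->: the only way to reach the
    // data is through the checked copies below.
    std::optional<T> load() const {
        if (!in_bounds()) return std::nullopt;
        T value;
        std::memcpy(&value, fake_user_memory + offset_, sizeof(T));
        return value;
    }

    bool store(const T& value) const {
        if (!in_bounds()) return false;
        std::memcpy(fake_user_memory + offset_, &value, sizeof(T));
        return true;
    }
};

// Usage: a hypothetical handler that doubles a user-supplied integer in place.
inline bool sys_double(std::size_t untrusted_offset) {
    UserPtr<std::uint32_t> arg(untrusted_offset);
    auto value = arg.load();
    return value && arg.store(*value * 2);
}
```

Because the wrapper has no dereference operators, forgetting the check is a compile error rather than a vulnerability: a handler such as sys_double cannot touch the data without going through load or store.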

In snmalloc, for example, we use C++ templates to express the lifecycle of memory throughout its allocation flow.  This would also be possible in Rust, but isn’t free in either language: you have to use the tools provided, but the outcome is that we can statically check a lot of properties at compile time.
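
To illustrate the general technique (this is not snmalloc’s actual API; Block, FromOS, Zeroed, adopt, zero and hand_out are all hypothetical names), a lifecycle state can be carried as a template parameter so that skipping a step fails to compile:

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical lifecycle states, used only as compile-time tags.
struct FromOS {};  // freshly obtained, contents unknown
struct Zeroed {};  // scrubbed, safe to hand to a caller

template <typename State> class Block;

// The only legal transitions between states.
inline Block<FromOS> adopt(void* base, std::size_t size);
inline Block<Zeroed> zero(Block<FromOS> block);
inline void* hand_out(Block<Zeroed> block);

template <typename State>
class Block {
    void* base_;
    std::size_t size_;
    // Private constructor: only the transition functions can create a
    // Block, so a state tag cannot be forged by other code.
    Block(void* base, std::size_t size) : base_(base), size_(size) {}
    friend Block<FromOS> adopt(void*, std::size_t);
    friend Block<Zeroed> zero(Block<FromOS>);
    friend void* hand_out(Block<Zeroed>);

public:
    std::size_t size() const { return size_; }
};

inline Block<FromOS> adopt(void* base, std::size_t size) {
    return Block<FromOS>(base, size);
}

inline Block<Zeroed> zero(Block<FromOS> block) {
    std::memset(block.base_, 0, block.size_);
    return Block<Zeroed>(block.base_, block.size_);
}

inline void* hand_out(Block<Zeroed> block) { return block.base_; }

// Handing out memory that has not been zeroed is rejected by the type checker.
static_assert(!std::is_invocable_v<decltype(&hand_out), Block<FromOS>>);
static_assert(std::is_invocable_v<decltype(&hand_out), Block<Zeroed>>);
```

One limitation of this sketch is that C++ does not stop the old Block<FromOS> handle from being copied and reused after the transition, so the pattern checks the order of steps but not uniqueness.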

With one of my other hats, I am the maintainer of an RTOS that is written in C++ and runs on a platform where the hardware enforces spatial and temporal memory safety.  To date, I don’t believe we’ve had any bugs that would have been prevented by Rust.  All of the memory-safety bugs (we have had some, and we catch them fairly easily because they lead to traps and so are easy to add tests for) have been in code that’s doing intrinsically unsafe things (memory allocators, for example).  We use C++20, with moderately heavy use of concepts.  We have a ring buffer implementation that uses a mixture of static_asserts and templates to verify the wrapping behaviour at compile time and that’s just one example of a place where we do a lot of compile-time checks that are impossible in C.
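
As a rough illustration of what such compile-time checking can look like (a sketch under assumptions, not the RTOS’s actual ring buffer; RingBuffer and survives_counter_overflow are hypothetical names), the following buffer uses free-running counters and has the compiler itself run a test that pushes them through integer overflow:

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <optional>

template <typename T, std::size_t Capacity>
class RingBuffer {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so that the free-running "
                  "counters stay consistent when they overflow");
    std::array<T, Capacity> slots_{};
    std::size_t head_;  // total pops, allowed to overflow
    std::size_t tail_;  // total pushes, allowed to overflow

    // Map a free-running counter onto a slot index.
    static constexpr std::size_t slot(std::size_t counter) {
        return counter & (Capacity - 1);
    }

public:
    // `start` lets a test begin with the counters close to overflow.
    constexpr explicit RingBuffer(std::size_t start = 0)
        : head_(start), tail_(start) {}

    constexpr std::size_t size() const { return tail_ - head_; }

    constexpr bool push(const T& value) {
        if (size() == Capacity) return false;  // full
        slots_[slot(tail_++)] = value;
        return true;
    }

    constexpr std::optional<T> pop() {
        if (size() == 0) return std::nullopt;  // empty
        return slots_[slot(head_++)];
    }
};

// Evaluated by the compiler: fill and drain the buffer while the counters
// wrap past the maximum std::size_t value.
constexpr bool survives_counter_overflow() {
    RingBuffer<int, 4> ring(std::numeric_limits<std::size_t>::max() - 1);
    for (int i = 0; i < 4; ++i)
        if (!ring.push(i)) return false;
    if (ring.push(99)) return false;  // must report full
    for (int i = 0; i < 4; ++i) {
        auto value = ring.pop();
        if (!value || *value != i) return false;  // FIFO order preserved
    }
    return !ring.pop().has_value();  // must report empty
}
static_assert(survives_counter_overflow(), "wrapping behaviour is broken");
```

If the wrapping logic were wrong, or if someone instantiated the template with a capacity that is not a power of two, the build would fail rather than the bug surfacing at run time.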

I’d also like to clear up a few misunderstandings about C++:

 - The Itanium C++ ABI has been stable for 20+ years.  Linking C++ shared libraries compiled with clang against those compiled with GCC (or vice versa), or with different versions of the same compiler, has been standard practice for a long time.  Both libstdc++ and libc++ use inner namespaces for the standard-library types and so allow something like symbol versioning but exposed at the language level.  You can see ABI breaks if one library uses a newer version of a type and the other an older one, but that’s why we only bump those forward on major releases: C++ DSOs compiled for FreeBSD 13 may not link with binaries compiled for FreeBSD 14.

 - Command-line argument parsing and JSON are not part of the C++ standard library, but there are de-facto standards.  Nlohmann JSON[1] and CLI11[2] are widely used (it’s been a long time since I’ve seen a project that used anything else) and have very easy-to-use interfaces.  I believe (I am a member of the C++ standards committee, but I only recently joined and have not participated in discussions around this) that a big part of the reason these aren’t in the core specification is that there are de-facto standards and there’s little urgency in adding them to the core.
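
For the inner-namespace point in the first item above, here is a minimal sketch of the mechanism using a hypothetical library (netlib, Packet and header_size are invented for illustration).  libc++ and libstdc++ apply the same language feature with names such as std::__1 and std::__cxx11.

```cpp
#include <type_traits>

namespace netlib {  // hypothetical library

// Old layout, kept so that binaries built against it still link.
namespace v1 {
struct Packet { int length; };
constexpr int header_size(const Packet&) { return 4; }
}  // namespace v1

// Current layout.  `inline` makes netlib::Packet mean netlib::v2::Packet
// in source code, while the mangled symbol names still contain "v2".
inline namespace v2 {
struct Packet { int length; int flags; };
constexpr int header_size(const Packet&) { return 8; }
}  // namespace v2

}  // namespace netlib

// Unversioned names resolve to the inline (current) namespace.
static_assert(std::is_same_v<netlib::Packet, netlib::v2::Packet>);
static_assert(!std::is_same_v<netlib::Packet, netlib::v1::Packet>);
static_assert(netlib::header_size(netlib::Packet{}) == 8);
```

Because the version is part of the symbol name, code compiled against the v1 layout and code compiled against the v2 layout refer to different symbols, so a mismatch shows up as a link error instead of silent memory corruption.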




Finally, one of the key things that we found was that a lot of projects used C/C++ out of inertia.  They don’t have peak memory or sub-millisecond-latency constraints and could easily be written in a managed language, often even in an interpreted one.  We have Lua in the base system.  I’d love to see a richer set of things exposed to Lua.  I played a bit with a kqueue wrapper using Sol2[3] that lets you write Lua coroutines and have them implicitly yield on blocking operations.

I’d love to see a generic process manager in the base system that subsumes devd and inetd written in Lua, with C++ wrappers around pdfork (ideally pdvfork, but it doesn’t exist yet) and friends, exposed via sol2.  The code in C++ is dealing directly with low-level system interfaces and would not be safer in Rust, but all of the parsing and control-plane logic can live in a safe GC’d language.  You can run a lot of Lua code in the time it takes one fork call to execute.

If we exposed type info from dynamic sysctls generically (I think there’s a project working on this?) then things like sysstat could be written in Lua.  I was experimenting with Dear ImGui for this, since it had back ends that rendered in X11, Wayland, in a terminal, or remotely over a websocket.  Unfortunately, the latter two were never merged and are probably unmaintained (the author is also the person behind llama.cpp and so probably isn’t going to work on it for a while).  Being able to run management tools in a terminal and click on a URL to open them in the web browser would be amazing, but doesn’t require a new systems programming language.

I’d love to see a default that anything intended to run with elevated privilege is written in Lua.

David

[1] https://github.com/nlohmann/json
[2] https://github.com/CLIUtils/CLI11
[3] https://sol2.readthedocs.io/


