Date: Mon, 22 Jan 2024 10:13:30 +0000
From: David Chisnall <theraven@FreeBSD.org>
To: Alan Somers <asomers@FreeBSD.org>
Cc: George Mitchell <george+freebsd@m5p.com>, freebsd-hackers@freebsd.org
Subject: Re: The Case for Rust (in the base system)
Message-ID: <C5FC83ED-25BC-44AF-BD20-E0E5F5BC64FE@FreeBSD.org>
In-Reply-To: <CAOtMX2hppfdu5ypDdGpfw_QDcd1rwJEeyVfSk9ogFEm7CiV6Kw@mail.gmail.com>
References: <CAOtMX2hAUiWdGPtpaCJLPZB%2Bj2yzNw5DSjUmkwTi%2B%2BmyemehCA@mail.gmail.com>
 <1673801705774097@mail.yandex.ru>
 <CANCZdfpqWgvV_RCvVO_pvTrmajQFspW%2BQ9TM_Ok3JrXZAfeAfA@mail.gmail.com>
 <ef4ad207-5899-42b6-8728-bc46f1417e9e@antonovs.family>
 <202401210751.40L7pWEF011188@critter.freebsd.dk>
 <40bc1694-ee00-431b-866e-396e9d5c07a2@m5p.com>
 <CAOtMX2hppfdu5ypDdGpfw_QDcd1rwJEeyVfSk9ogFEm7CiV6Kw@mail.gmail.com>

On 21 Jan 2024, at 16:04, Alan Somers <asomers@FreeBSD.org> wrote:
>
> Perhaps it will.  But Like David Chisnall, I'm afraid that if FreeBSD never
> modernizes, then it itself will go out of fashion by the 2040s.

Apparently I’m participating in this thread already. I’m getting over a nasty cold and my head is full of cotton wool, so apologies in advance if this is more rambling than normal:

I hope it’s no surprise to anyone that I am in favour of languages that give stronger guarantees to programmers and let you think more about the problems. I can’t imagine going back to writing anything non-trivial in a language without RAII or a rich set of generic collections.

To give a bit of personal background: In my previous role, I was one of the coauthors of the internal strategy document that argued for safe languages at Microsoft. Our rough recommendation was:

 - No new C code. There are *always* better options.
 - C++ code should follow the Core Guidelines and use static analysis. New C++ code is acceptable in projects that are already C/C++ and need to incrementally improve.
 - Rust in new projects that need a systems programming language.
 - Managed languages anywhere where a systems language is not needed (i.e. most places).

Between modern C++ with static analysers and Rust, there was a small safety delta. The recommendation was primarily based on a human-factors decision: it’s far easier to prevent people from committing code that doesn’t compile than it is to prevent them from committing code that raises static analysis warnings. If a project isn’t doing pre-merge static analysis, the latter is basically impossible. Between using modern C++ (even just smart pointers and ranges) and C, there is an enormous safety delta.

The unstable Rust ecosystem was less of an issue for Microsoft because they had a large compiler team and were happy to maintain security back-ports of any critical crates. The same software supply chain things applied for Rust as everything else: no random pulling from Cargo, dependencies need to be cloned internally and run through a load of compliance things. That’s probably the only sensible way of interacting with the Rust ecosystem.

For userspace, I’d love to see FreeBSD more actively support the cap-std project in Rust, which makes it incredibly easy to write Rust programs that play nicely with Capsicum.

It’s unclear to me that now is the right time to support Rust in the base system, because there’s still a lot of churn. Facebook has effectively forked Rust because their (huge) Rust codebase doesn’t build with newer compilers. If you’re Microsoft or Facebook, maintaining an old Rust compiler for a few years and back-porting things to work with that language snapshot is a cost that may be worth paying. I don’t think the FreeBSD project has the resources to do so. A limited set of dependencies may work.

There are a few caveats about Rust:

First, it’s quite hard to find competent Rust developers. Here are the OpenHub stats on new F/OSS code being written in Rust, C, and C++:

https://openhub.net/languages/compare?language_name%5B%5D=c&language_name%5B%5D=cpp&language_name%5B%5D=rust&language_name%5B%5D=-1&language_name%5B%5D=-1&measure=loc_changed

C++ has been slowly trending up, and C down, for the last decade. Rust is trending up a lot, but it’s starting from zero and there’s still a lot more C or C++ code being written than Rust. It’s now easier to hire systems programmers to write C++ than C, and easier to hire either than to hire good Rust programmers.
This tradeoff may be very different for an open source project because there are a lot of *very* enthusiastic Rust developers and attracting a dozen or two of them to contribute would be a huge win. People tend to be less enthusiastic about C or C++.

Most of the new kernels written in the last 20 years have been C++; most of the new kernels written in the last four years have been Rust. Make of that what you will.

Neither Rust nor C++ guarantees safety. C++ can always escape to bare pointers (it’s a code smell, but it’s sometimes unavoidable). Rust has unsafe and requires it for any data structure that isn’t a tree (either directly or via some existing code such as the Rc / Arc types). One of our concerns was the degree to which the different uses of unsafe in various Rust crates compose. There was a paper a couple of years ago that found a lot of vulnerabilities from this composition. I don’t personally have a great deal of faith that unique ownership at an object level with a load of heuristics about when it’s safe to alias is the right long-term model. Verona went a very different way and I hope Rust may be able to retrofit our ideas at some point.

One project that I worked with, for example, was bitten by the fact that unsafe in Rust means ‘I promise to follow all of the Rust rules, you just can’t mechanically check them’. It read a value from an MMIO register into a variable typed as an enumeration. Outside of the unsafe block, it then checked that the value was in range. Rust enumerations are type safe, so the compiler is entitled to assume that a value of enumeration type is always one of the declared variants, and it helpfully elided this check. Moving the check into the unsafe block fixed it, but ran counter to the generic ‘put as little in unsafe blocks as humanly possible’ advice that is given for Rust programmers.

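A minimal sketch of one common way to sidestep that trap, which is not necessarily what the project above did: read the register as a plain integer, validate it, and only then convert it to the enumeration, so an out-of-range value never exists at the enum type. The DeviceState enum, its values, and the read_state helper are hypothetical names invented for this illustration, and an ordinary variable stands in for the MMIO register.

```rust
use std::convert::TryFrom;
use std::ptr;

/// Hypothetical device states; names and values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum DeviceState {
    Idle = 0,
    Busy = 1,
    Fault = 2,
}

impl TryFrom<u32> for DeviceState {
    type Error = u32;

    // The range check operates on the raw integer, so the compiler
    // cannot assume the value is already a valid variant.
    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(DeviceState::Idle),
            1 => Ok(DeviceState::Busy),
            2 => Ok(DeviceState::Fault),
            other => Err(other),
        }
    }
}

/// Reads a (hypothetical) status register as a u32 and validates it.
///
/// # Safety
/// `reg` must be properly aligned and valid for reads.
unsafe fn read_state(reg: *const u32) -> Result<DeviceState, u32> {
    // Only the volatile load needs to be unsafe.
    let raw = unsafe { ptr::read_volatile(reg) };
    DeviceState::try_from(raw)
}

fn main() {
    // Stand-in for an MMIO register holding an out-of-range value.
    let fake_reg: u32 = 7;
    let state = unsafe { read_state(&fake_reg) };
    println!("{:?}", state);
}
```

With this shape the unsafe block stays a single line, and the validation lives in safe code because it inspects a u32 rather than an already-typed enumeration value.
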
When I looked at a couple of hobbyist kernels written in Rust, they had trivial security vulnerabilities due to not sanitising system call arguments. This was depressing because both Rust and C++ make it trivial to wrap userspace pointers in a smart pointer type that does the checks automatically.

In snmalloc, for example, we use C++ templates to express the lifecycle of memory throughout its allocation flow. This would also be possible in Rust, but isn’t free in either language: you have to use the tools provided, but the outcome is that we can statically check a lot of properties at compile time.

With one of my other hats, I am the maintainer of an RTOS that is written in C++ and runs on a platform where the hardware enforces spatial and temporal memory safety. To date, I don’t believe we’ve had any bugs that would have been prevented by Rust. All of the memory-safety bugs (we have had some, and we catch them fairly easily because they lead to traps and so are easy to add tests for) have been in code that’s doing intrinsically unsafe things (memory allocators, for example). We use C++20, with moderately heavy use of concepts. We have a ring buffer implementation that uses a mixture of static_asserts and templates to verify the wrapping behaviour at compile time and that’s just one example of a place where we do a lot of compile-time checks that are impossible in C.

I’d also like to clear up a few misunderstandings about C++:

 - The Itanium C++ ABI has been stable for 20+ years. Linking C++ shared libraries compiled with clang against those compiled with GCC (or vice versa), or with different versions of the same compiler, has been standard practice for a long time. Both libstdc++ and libc++ use inline namespaces for the standard-library types and so allow something like symbol versioning but exposed at the language level.
   You can see ABI breaks if one library uses a newer version of a type and the other an older one, but that’s why we only bump those forward on major releases: C++ DSOs compiled for FreeBSD 13 may not link with binaries compiled for FreeBSD 14.
 - Command-line argument parsing and JSON are not part of the C++ standard library, but there are de-facto standards. Nlohmann JSON[1] and CLI11[2] are widely used (it’s been a long time since I’ve seen a project that used anything else) and have very easy-to-use interfaces. I believe (I am a member of the C++ standards committee, but I only recently joined and have not participated in discussions around this) that a big part of the reason it isn’t in the core specification is that there is a de-facto standard and there’s little urgency in adding it to the core.

Finally, one of the key things that we found was that a lot of projects used C/C++ out of inertia. They don’t have peak memory or sub-millisecond-latency constraints and could easily be written in a managed language, often even in an interpreted one. We have Lua in the base system. I’d love to see a richer set of things exposed to Lua. I played a bit with a kqueue wrapper using Sol2[3] that lets you write Lua coroutines and have them implicitly yield on blocking operations.

I’d love to see a generic process manager in the base system that subsumes devd and inetd written in Lua, with C++ wrappers around pdfork (ideally pdvfork, but it doesn’t exist yet) and friends, exposed via sol2. The code in C++ is dealing directly with low-level system interfaces and would not be safer in Rust, but all of the parsing and control-plane logic can live in a safe GC’d language. You can run a lot of Lua code in the time it takes one fork call to execute.

If we exposed type info from dynamic sysctls generically (I think there’s a project working on this?) then things like sysstat could be written in Lua. I was experimenting with Dear ImGui for this, since it had back ends that rendered in X11, Wayland, in a terminal, or remotely over a websocket. Unfortunately, the latter two were never merged and are probably unmaintained (the author is also the person behind llama.cpp and so probably isn’t going to work on it for a while). Being able to run management tools in a terminal and click on a URL to open them in the web browser would be amazing, but doesn’t require a new systems programming language.

I’d love to see a default that anything intended to run with elevated privilege is written in Lua.

David

[1] https://github.com/nlohmann/json
[2] https://github.com/CLIUtils/CLI11
[3] https://sol2.readthedocs.io/