Date:      Thu, 13 Feb 2003 16:36:20 -0800 (PST)
From:      Scott Long <scottl@FreeBSD.org>
To:        current@freebsd.org
Subject:   5-STABLE Roadmap
Message-ID:  <200302140036.h1E0aK3q071051@freefall.freebsd.org>

All,

Thanks to the hard work of everyone, FreeBSD 5.0 became a reality and
is working better than most even hoped.  However, there is still a
lot of work to be done before we can create the RELENG_5/5-STABLE
branch and declare success.  Below is a document that I have drafted
with the input and review of the Release Engineering Team, the
Technical Review Board, and the Core Team that defines what needs to
be done in order to reach 5-STABLE.  I'm happy to take further input
into this, and I will also mark it up and make it available online.


The Roadmap for 5-STABLE

1.  Introduction and background
After nearly three years of work, FreeBSD 5.0 was released in January of
2003.  Features like the GEOM block layer, Mandatory Access Controls,
ACPI, sparc64 and ia64 platform support, UFS snapshots, background
filesystem checks, and 64-bit inode sizes make it an exciting operating
system for both desktop and production users.  However, some important
features are not complete.  The foundations for fine-grained locking and
preemption in the kernel exist, but much more work is left to be done.
Work on Kernel Schedulable Entities, also known as Scheduler Activations,
has been ongoing but needs a push to realize its benefit.  Performance
compared to FreeBSD 4.x has declined and must be restored and surpassed.

This is somewhat similar to the situation that FreeBSD faced in the 3.x
series.  Work on 3-CURRENT trudged along seemingly forever, and finally
a cry was made to 'just ship it' and clean up later.  This decision
resulted in the 3.0 and 3.1 releases being very unsatisfying for most,
and it wasn't until 3.2 that the series was considered 'stable'.  To
make matters worse, the RELENG_3 branch was created along with the 3.0
release, and the HEAD branch was allowed to advance immediately towards
4-CURRENT.  This resulted in a quick divergence between HEAD and
RELENG_3, making maintenance of the RELENG_3 branch very difficult.
FreeBSD 2.2.8 was left for quite a while as the last production-quality
version of FreeBSD.

Our intent is to avoid repeating that scenario with FreeBSD 5.x.
Delaying the RELENG_5 branch until it is stable and production quality
will ensure that it stays maintainable and provides a compelling
reason to upgrade from 4.x.  To do this, we must identify the current
areas of weakness and set clear goals for resolving them.  This document
contains what we as the release engineering team feel are the milestones
and issues that must be resolved for the RELENG_5 branch.  It does not
dictate every aspect of FreeBSD development, and we welcome further
input.  Nothing that follows is meant to be a slight against any person
or group, or to trivialize any work that has been done.  There are some
significant issues, though, that need decisive and unbiased action.

2.  Major issues
The state of SMPng and kernel lockdown is the biggest concern for 5.x.
To date, few major systems have come out from under the kernel-wide mutex
known as 'Giant'.  The SMP status page at http://www.FreeBSD.org/smp
provides a comprehensive breakdown of the overall SMPng status.  Status
specific to SMPng progress in device drivers can be found at
http://www.FreeBSD.org/projects/busdma.  In summary:

 - VM - the kmem_malloc(M_NOWAIT) path no longer needs Giant held.  The
   kmem_malloc(M_WAITOK) path is in progress and is expected to be
   finished in the coming weeks.  Other facets of the VM system, like
   the vfs interface, buffer/cache, etc, are largely untouched.
 - GEOM - The GEOM block layer was designed to run free of Giant, but at
   this time no block drivers can run without Giant.  Additionally, it has
   the potential to suffer performance loss due to its upcall/downcall
   data paths happening in kernel threads.  Lightweight context switches
   might help this.
 - Network - Work is in progress to lock the TCP and UDP portions of the
   stack.  This also includes locking the routing tree, ARP code, and ifaddr
   and inet data structures.  RawIP, IPv6, Appletalk, etc, have not been
   touched.  Locking the socket layer is in progress but is largely
   untested.  None of the hardware drivers have been locked.
 - VFS - Initial pre-cleanup started.
 - buffer/cache - Initial work complete.
 - Proc - Work on locking the proc structure was ongoing for a while but
   seems to have stalled.
 - CAM - No significant work has occurred on the CAM SCSI layer.
 - Newbus - some work has started on locking down the device_t structure.
 - Pipes - complete with the exception of VM-related optimizations.
 - File descriptors - complete.
 - Process accounting - jails, credentials, MAC labels, and scheduler are
   out from under Giant.
 - MAC Framework - complete
 - Timekeeping - complete
 - kernel encryption - crypto drivers and core crypto framework are
   Giant-free.  KAME IPsec and FAST IPSec have not been locked.
 - Sound subsystem - complete
 - kernel preemption - preemption for interrupt threads is enabled.
   However, contention due to Giant covering much of the kernel and
   most of the device driver interrupt routines causes excessive context
   switches and might actually be hurting performance.  Work is underway
   to explore ways to make preemption conditional.
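
To make the distinction in the list above concrete, here is a minimal
userland sketch of the difference between running under one global lock
and running under per-subsystem locks.  It uses POSIX mutexes rather than
the kernel's own primitives, and every name in it (subsystem, giant_model,
pipes_model, timekeeping_model, op_under_giant, op_fine_grained) is
hypothetical and chosen only for illustration.

```c
#include <pthread.h>

/* Hypothetical model of a kernel subsystem with its own state. */
struct subsystem {
    pthread_mutex_t lock;     /* fine-grained, per-subsystem lock */
    long            counter;  /* stands in for the subsystem's data */
};

/* Models the single kernel-wide mutex known as Giant. */
pthread_mutex_t giant_model = PTHREAD_MUTEX_INITIALIZER;

struct subsystem pipes_model       = { PTHREAD_MUTEX_INITIALIZER, 0 };
struct subsystem timekeeping_model = { PTHREAD_MUTEX_INITIALIZER, 0 };

/*
 * Under Giant: work in any subsystem serializes against work in every
 * other subsystem, because all callers take the same lock.
 */
void op_under_giant(struct subsystem *s)
{
    pthread_mutex_lock(&giant_model);
    s->counter++;
    pthread_mutex_unlock(&giant_model);
}

/*
 * Giant-free: only callers that touch the same subsystem contend with
 * each other; unrelated subsystems can make progress in parallel.
 */
void op_fine_grained(struct subsystem *s)
{
    pthread_mutex_lock(&s->lock);
    s->counter++;
    pthread_mutex_unlock(&s->lock);
}
```

In this model, two threads calling op_fine_grained() on pipes_model and
timekeeping_model never wait on each other, while the same two threads
calling op_under_giant() always do.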

Another issue with SMPng is interrupt latency.  The overhead of
doing a complete context switch to a kernel interrupt thread is high and
shows noticeable latency.  Work is ongoing to implement lazy context
switching on all platforms.  Fine grained locking of drivers will also
help this, as will converting drivers to be as efficient as possible in
their interrupt routines.

Next, the state of KSE must be resolved for RELENG_5.  Work on it has
slowed noticeably in the past 6 months but appears to be picking up
again.  There are a number of issues that must be addressed:
 - Signal delivery to threads is not defined.  Signals are delivered to
   the process, but which thread actually receives a given signal is random.
 - There is confusion over whether upcalls are generated on every system
   call or when a thread blocks.  The former is highly undesirable and
   needs to be investigated.
 - The userland threading library, currently called libkse, is incomplete
   and has not been used for any significant threaded application.
 - KSE has the potential to uncover latent race conditions and create
   new ones.  An audit needs to be performed to ensure that no obvious
   problems exist.

According to the release schedule below, KSE kernel and userland
components must be functionally complete by June 2003 in order to be
included in the RELENG_5 branch.  For security and stability reasons,
if KSE cannot be finished in time then, by default, all KSE-specific
syscalls should be modified to return ENOSYS and all other KSE-specific
interfaces disabled.  Deprecating KSE from RELENG_5 but keeping it in the
HEAD branch will pose problems in porting bugfixes and features between
the two branches, so every effort should be made to finish it on time.
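
As a rough illustration of the ENOSYS fallback described above, the
following userland sketch shows an entry point that refuses to do any work
unless the feature is switched on.  The names kse_enabled, kse_do_work, and
kse_syscall_entry are hypothetical and are not the actual kernel
interfaces; the sketch also assumes the kernel convention that a handler
returns the error number directly.

```c
#include <errno.h>

/*
 * Hypothetical switch.  In a real kernel this would more likely be a
 * build-time option than a variable.
 */
int kse_enabled = 0;

/* Hypothetical handler standing in for the real work of a KSE syscall. */
int kse_do_work(void *args)
{
    (void)args;
    return 0;               /* 0 means success */
}

/* Entry point: report "not implemented" unless KSE is enabled. */
int kse_syscall_entry(void *args)
{
    if (!kse_enabled)
        return ENOSYS;
    return kse_do_work(args);
}
```

With kse_enabled left at 0, every caller sees ENOSYS and none of the
KSE-specific code behind the entry point is reachable.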

3.  Goals for 5-STABLE
The goals for the RELENG_5 branch point are:
 - All subsystems and interfaces must be mature enough to be maintainable
   for improvements and bug fixes
 - stability equal to or better than FreeBSD 4.8.
 - no functional regressions from 4.8.  It is important to make sure that
   users do not avoid upgrading to 5.x because of lost functionality.
 - performance on par with FreeBSD 4.8 for most common operations.  Both UP
   and SMP configurations should be evaluated.  SMP has the potential to
   perform much better than 4.x, though for the purposes of creating the
   RELENG_5 branch, comparable performance between the two should be
   acceptable.

It is unrealistic to expect that the SMPng project will be fully complete by
RELENG_5, or that performance will be significantly better than 4.x. However,
focusing on a subset of the outstanding tasks will give enough benefit for
the branch to be viable and maintainable.  To break it down:

 - ABI/API/Infrastructure stability - Enough infrastructure must be in
   place and stable to allow fixes from HEAD to easily and safely be
   merged into RELENG_5.  Also, we must draw a line as to what subsystems
   are to be locked down when we go into 5-STABLE.
   - SMPng
     - VM - Most codepaths, other than the ones that interact with VFS,
       should be Giant-free for RELENG_5.
     - Network - Taking the network stack out from under Giant poses the
       risk of uncovering latent bugs and races.  Locking it down but not
       removing Giant imposes further performance penalties.  A decision
       on whether to continue with locking the network layers, and whether
       they should be free from Giant for RELENG_5, should be made no later
       than March 15.  If the decision is made to allow the locking to go
       forward, the IPv4, UDP, and TCP layers should be free of Giant.
       IPv6 and the socket layers would be nice to have also, though it
       should be investigated whether they can be safely locked down in 5.x
       after the RELENG_5 branch.  If the decision is to keep the network
       stack under Giant for the branch, then an investigation should be
       made to determine if the present locking work can be reverted and
       deferred to 6-CURRENT.
       Having a Giant-free path from the TCP/IP layers to the hardware
       should be investigated as it could allow significant performance
       gains in the network benchmarks.  If this can be achieved then the
       hardware interface layer needs to allow for drivers to incrementally
       become free of Giant.  Locking down at least two Ethernet drivers
       would be highly desirable.  If the semantics are too complex to have
       the stack free of Giant but not the hardware drivers, investigation
       should be done into making it configurable.
       Lesser-used network stacks like netatalk, netipx, etc, should not
       break while this work is going on.  However, locking them is not a
       high priority.
     - GEOM - At least 2 block drivers should be locked in order to
       demonstrate that others can also be locked without changing the
       interface to GEOM.  The ATA driver is a good candidate for this,
       though caution should be taken as it is also extremely high-profile
       and any problems with it will affect nearly all users of FreeBSD.
     - Lazy context switching - sparc64 is the only platform that performs
       lazy context switching when entering the kernel.  The performance
       gains promised by this are significant enough to require that it be
       implemented for all other Tier 1 platforms.
   - KSE - The kernel side of KSE must be functionally complete and have
     undergone a security audit.  libkse must be complete enough to
     demonstrate a real-world application running correctly on it using
     the standard POSIX Threads API.  Examples would be apache 2.0, squid,
     and/or mozilla.  A functional regression test suite is also a
     requirement for RELENG_5 and should test signal delivery, scheduling,
     performance, and process security/credentials for both KSE and non-
     KSE processes.  KSE kernel and userland components must also reach
     the same level of functionality for all Tier-1 platforms in both UP
     and SMP configurations.  The definition of 'Tier-1 platforms' can be
     found at http://www.freebsd.org/doc/en_US.ISO8859-1/articles/committers-guide/archs.html.
   - busdma interface and drivers - architectures like PAE/i386 and sparc64
     which don't have a direct mapping between host memory address space
      and expansion bus address space require the elimination of vtophys()
     and friends.  The busdma interface was created to handle exactly this
     problem, but many drivers do not use it yet.  The busdma project at
     http://www.FreeBSD.org/projects/busdma/index.html tracks the progress
     of this and should be used to determine which drivers must be
     converted for RELENG_5 and which can be left behind.  Also, there has
      been talk by several developers and the original author of giving the
     busdma interface a minor overhaul.  If this is to happen, it needs to
     happen before RELENG_5.  Otherwise, differences between the old and
     new API will make driver maintenance difficult.
   - PCI resource allocation - PC2003 compliance requires that x86 systems
     no longer configure PCI devices from the system BIOS, leaving this task
      solely to the OS.  FreeBSD must gain the ability to manage and allocate
     PCI memory resources on its own.  Implementing this should take into
     account cardbus, PCI-HotPlug, and laptop dockstation requirements.
     This feature will become increasingly critical through the lifetime of
     RELENG_5, and therefore is a requirement for the RELENG_5 branch.
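
Returning to the busdma item above, the following toy model may help show
why a vtophys()-style assumption breaks on PAE/i386.  It is not the busdma
API; it only models the idea that a mapping layer, rather than the driver,
decides which address the device is given.  DEVICE_ADDR_LIMIT and all of
the function names are hypothetical, and the device is assumed to be a
32-bit DMA engine.

```c
#include <stdint.h>

/* Hypothetical device that can only address the low 4 GB. */
#define DEVICE_ADDR_LIMIT 0xFFFFFFFFull

/*
 * What a vtophys()-style driver effectively assumes: the bus address is
 * the physical address, whatever that address happens to be.
 */
uint64_t naive_bus_addr(uint64_t paddr)
{
    return paddr;
}

/* Returns 1 when the device can address every byte of the buffer. */
int device_can_reach(uint64_t paddr, uint64_t len)
{
    return len > 0 &&
        paddr <= DEVICE_ADDR_LIMIT &&
        len - 1 <= DEVICE_ADDR_LIMIT - paddr;
}

/*
 * What a mapping layer does conceptually: hand back an address the
 * device can use, substituting a reachable bounce buffer when the
 * original buffer is out of range.
 */
uint64_t mapped_bus_addr(uint64_t paddr, uint64_t len, uint64_t bounce_paddr)
{
    return device_can_reach(paddr, len) ? paddr : bounce_paddr;
}
```

For a buffer above 4 GB, naive_bus_addr() returns an address the device
cannot use, while mapped_bus_addr() falls back to the bounce buffer.  A
real implementation would also have to copy data to and from that buffer.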

 - Performance - most performance gains hinge on the progress of SMPng.
   Areas that should be concentrated on are:
   - Storage I/O - I/O performance suffers from two problems: too many
     expensive context switches, and too much work being done in
     interrupt threads.  Specifically, it takes 3 context switches for
     most drivers to get from the hardware completion interrupt to
     unblocking the user process:  one for the interrupt thread, one for
     the GEOM g_up thread, and one to get back to the user thread.
     Drivers that attempt to be efficient and quick in their interrupt
     handlers (as all should be) usually also schedule a taskqueue, which
     adds a context switch in between the interrupt thread and the g_up
     thread and brings the total up to 4.  Two things need to be done to
     attack this:
     - make all drivers defer most of their processing out of their
       interrupt thread.  Significant performance gains have been shown 
       recently in the aac(4) driver by making its interrupt handler be
       'INTR_MPSAFE' and moving all processing to a taskqueue.
     - investigate eliminating the taskqueue context switch by adding a
       callback to the g_up thread that allows a driver to do its
       interrupt processing there instead of in the taskqueue.
   - Network - Network drivers suffer from the interrupt latency
     previously mentioned as well as from the network stack being
     partially locked down but not free from Giant.  Possible strategies
     for addressing this are described in the previous section.
   - Other locking - XXX ?
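
The storage I/O item above asks drivers to defer most of their processing
out of the interrupt thread.  A minimal single-threaded sketch of that
split follows: the interrupt side only records which request completed,
and a deferred task does the real work later.  The ring layout and the
names completion_queue, intr_record, and deferred_drain are hypothetical,
and the sketch deliberately omits the locking a real driver would need.

```c
#include <stddef.h>

#define CQ_SLOTS 16   /* hypothetical ring size; must be a power of two */

struct completion_queue {
    int      ids[CQ_SLOTS];
    unsigned head;    /* next slot to drain */
    unsigned tail;    /* next slot to fill */
};

/*
 * Models the interrupt handler: note which request finished and return
 * at once.  Returns 0 on success, -1 if the ring is full.
 */
int intr_record(struct completion_queue *q, int request_id)
{
    if (q->tail - q->head == CQ_SLOTS)
        return -1;
    q->ids[q->tail % CQ_SLOTS] = request_id;
    q->tail++;
    return 0;
}

/*
 * Models the deferred task: perform the expensive completion work for
 * every recorded request.  Returns the number of requests processed.
 */
int deferred_drain(struct completion_queue *q, void (*complete)(int))
{
    int done = 0;

    while (q->head != q->tail) {
        int id = q->ids[q->head % CQ_SLOTS];

        q->head++;
        if (complete != NULL)
            complete(id);
        done++;
    }
    return done;
}
```

Whether deferred_drain() runs from a taskqueue or from a callback in the
g_up thread is exactly the choice discussed above; the sketch is neutral
on that point.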

 - Benchmarks and performance testing - Having a source of reliable and
   useful benchmarks is essential to identifying performance problems
   and guarding against performance regressions.  A 'performance team'
   that is made up of people and resources for formulating, developing,
   and executing benchmark tests should be put into place soon.  
   Comparisons should be made against both FreeBSD 4.x and Linux 2.4.x.
   Tests to consider are:
    - the classic 'worldstone'
    - webstone - /usr/ports/www/webstone
    - Fstress - http://www.cs.duke.edu/ari/fstress
    - ApacheBench - /usr/ports/www/p5-ApacheBench
    - netperf - /usr/ports/benchmarks/netperf

 - Features:
   - ACPI - Intel's ACPI power management and device configuration
     subsystem has become an integral part of FreeBSD's x86 and ia64
     device configuration model.  However, many bugs exist in Intel's
     vendor code, our OS-specific code, and motherboard BIOSes, causing
     many ACPI-enabled systems to fail to boot, misdetect drivers, and/or
     have many other problems.  Fixing these problems seems to be an
      uphill battle and often causes a poor first impression of
     FreeBSD 5.0.  Most x86 systems can function with ACPI disabled, and
     logic should be added to the bootloader and sysinstall to allow users
     to easily and intuitively turn it off.  Turning off ACPI by default
     is prone to problems also as many newer systems rely on it to
     provide correct interrupt routing information.  Also, a centralized
     resource should be created to track ACPI problems and solutions.
     Linux uses the same Intel vendor sources as FreeBSD, so we should
     investigate how they have handled some of the known problems.
   - NEWCARD/OLDCARD - The NEWCARD subsystem was made the default for 5.0.
     Unfortunately, it contains no support for non-Cardbus bridges and
      falls victim to interrupt routing problems on some laptops.  The
     classic 16-bit bridge support, OLDCARD, still exists and can be
     compiled in, but this is highly inconvenient for users of older
     laptops.  If OLDCARD cannot be completely deprecated for RELENG_5,
     then provisions must be made to allow users to easily install an
     OLDCARD-enabled kernel.  Documentation should be written to help
      transition users from OLDCARD to NEWCARD and from 'pccardd' to 'devd'.
     The power management and 'dumpcis' functionality of pccardc(1) needs
     to be brought forward to work with NEWCARD, along with the ability to
     load CIS quirk entries.  Most of this functionality can be integrated
     into devd and devctl.
   - New scheduler framework - The new scheduler framework is in place, and
     users can select between the classic 44bsd scheduler and the new ULE
     scheduler.  A scheduler that demonstrates processor affinity,
     HyperThreading and KSE awareness, and no regressions in performance
     or interactivity characteristics must be available for RELENG_5.
   - sparc64 local console - neither syscons nor vt work on sparc64, leaving
     it with only serial and 'fake' OFW console support.  This is a major
     support hole for what is a Tier 1 platform.  Whether syscons can be
     shoe-horned in or wscons be adopted from NetBSD is up for debate.
     However, sparc64 must have local console support for RELENG_5.  Having
     this will also allow the XFree86 server to run, which is also a
     requirement for RELENG_5.
   - gcc/toolchain - gcc 3.3 might be available in time for RELENG_5 and
      might offer some attractive benefits, but is also likely to introduce ABI
     incompatibility with prior gcc versions.  ABI compatibility should be
     locked down for the RELENG_5 branch.
     There has also been a request to move /usr/include/g++ to
     /usr/include/g++-v3 to be more compliant with the stock behavior of
     gcc.  This should be investigated for RELENG_5 also.
   - gdb - gdb from the base system should work for sparc64.  It should
     also understand KSE thread semantics, assuming that KSE is included
     in the RELENG_5 branch.  gdb 5.3 is available and there are reports
     that it should address the sparc64 issue.
   - disklabel(8) regressions - The biggest casualty of the introduction of
     GEOM appears to be the disklabel utility.  The '-r' option gives
     unpredictable results in most cases now and should be removed or fixed.
     Work is planned for a new unified interface for modifying labels and
      slices; however, this should not preclude disklabel from being fixed.

 - Documentation:
   - The manual pages, Handbook, and FAQ should be free from content
     specific to FreeBSD 4.x, i.e. all text should be equally applicable to
     FreeBSD 5.x.  The installation section of the handbook needs the most
     work in this area.
   - The release documentation needs to be complete and accurate for all
     Tier 1 architectures.  The hardware notes and installation guides need
     specific attention.
   - If FreeBSD 5.1 is not the branch point for RELENG_5 then the Early
     Adopters Guide needs to be updated.  This document should then be
     removed just before the release closest to the RELENG_5 branch point.

4. Schedule
If branching RELENG_5 at the 5.1 release is paramount, 5.1 will probably
need to move out by at least 3 months.  The schedule would be:

 - Jun 30, 2003 - KSE and SMPng feature freeze
 - Aug  4, 2003 - 5.1-BETA, general code freeze
 - Aug 18, 2003 - 5.1-RC1, RELENG_5 and RELENG_5_1 branched
 - Aug 25, 2003 - 5.1-RC2
 - Sept 1, 2003 - 5.1-RELEASE

Taking an incremental approach might be more beneficial.  Releasing 5.1
in time for USENIX ATC 2003 will provide a wide audience for productive
feedback and will keep FreeBSD visible.  In this scenario, 5.1 should
offer a significant improvement over 5.0 in terms of bug fixes and
performance.  Lockdowns and improvements to the storage subsystem and
scheduler should be expected, the NEWCARD/OLDCARD issues should be
addressed, and all known bugs and regressions from the 5.0 errata list
should be fixed.  KSE and other SMPng tasks that cannot finish in time for
5.1 should also not reduce the stability of the release.  The schedule for
this would be:

 - May   5, 2003 - 5.1-BETA, general code freeze
 - May  19, 2003 - 5.1-RC1, RELENG_5_1 branched
 - May  27, 2003 - 5.1-RC2
 - Jun   2, 2003 - 5.1-RELEASE
 - Jun  30, 2003 - KSE and SMPng feature freeze
 - Sept  1, 2003 - 5.2-BETA, general code freeze
 - Sept 15, 2003 - 5.2-RC1, RELENG_5 and RELENG_5_2 branched
 - Sept 22, 2003 - 5.2-RC2
 - Sept 29, 2003 - 5.2-RELEASE

5. Post RELENG_5 direction
As with all -STABLE development streams, the focus should be bug fixes and
incremental improvements.  As usual, everything should be vetted
through the HEAD branch first and committed to RELENG_5 with caution.
As before, new device drivers, incremental features, etc, will be welcome
in the branch once they have been proven in HEAD.

Further SMPng lockdowns will be divided into two categories, driver and
subsystem.  The only subsystem that will be sufficiently locked down
for RELENG_5 will be GEOM, so incrementally locking down device drivers
under it is a worthy goal for the branch.  Full subsystem lockdowns will
have to be fully tested and proven in HEAD before consideration will be
given to merging them into RELENG_5.




