Date: Tue, 13 Aug 2002 21:30:08 +0300 From: Ruslan Ermilov <ru@FreeBSD.ORG> To: Bakul Shah <bakul@bitblocks.com> Cc: Luigi Rizzo <rizzo@icir.org>, net@FreeBSD.ORG Subject: Re: Consistency of cached routes Message-ID: <20020813183008.GA29747@sunbay.com> In-Reply-To: <200208131722.NAA21497@wellington.cnchost.com> References: <20020813021333.A4507@iguana.icir.org> <200208131722.NAA21497@wellington.cnchost.com>
next in thread | previous in thread | raw e-mail | index | archive | help
--J/dobhs11T7y2rNN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Aug 13, 2002 at 10:22:19AM -0700, Bakul Shah wrote: > > ip_forward, ipflow and TCP/UDP sockets cache a copy > > of the result of route lookups, but these entries might be > > out-of-date when a routing update is performed. Ruslan just MFC'ed > > a fix to invalidate the (one-entry) cache in ip_forward, but > > the other two can still be inconsistent. >=20 > If a host route to ADDR is cloned from a route to ADDR/n and > a route to ADDR/n+k is added, the cloned route to ADDR must > be deleted. This is already taken care of in rt_fixchange() > -- see the last comment in this function. The next time a > pkt to ADDR is delivered, a new ADDR route will be cloned > from ADDR/n+k. >=20 Correct. > TCP/UDP sockets cache a host route. If it is a cloned route > it will get the above treatment. So the issue is really only > for cached net routes that are less specific than a newly > added route. Does ipflow cache those? >=20 Everything that is allocated using rtalloc_ign(, RTF_PRCLONING) should be taken care of (or a raw form using rtalloc1()). > > ok, so i guess it is time to instrument route lookups and see > > how expensive they are, and sort out what is the best way to > > solve the tradeoff between caching and potential inconsistencies. >=20 > Any incosistency needs to be fixed. >=20 > > The timestamp idea is good because it has constant cost, though > > on a box with many routing updates it might completely defeat > > the cache. >=20 > You can use a long long counter as a virtual timestamp. Even > if you allow a million route changes per second, this number > won't roll over for almost 300000 years! But this will cause > all cached entries to be flushed any time the route table was > updated + every cache use just got a little more expensive. > On the other hand now you can afford to keep a bigger cache. > Caching just one forwarding entry doesn't make a lot of sense > for a router. >=20 > > Ideas anyone ? This might be a problem with a known efficient solution. >=20 > I just thought of a simple way. >=20 > When a route to ADDR/n is added, for all routes to ADD/n-k, > (k in 0,1..n) with refcnt > 0, *move* the route entry! That > is, allocate a new data structure, do a memcpy from old to > new and move its sole route table reference from the old > entry to the new entry. Mark the original rtentry data > structure as down. It still exists but is not accessible > from the route table; it is accessible only from its cached > references. Upon next use of such a ref it will be found to > be down, so its refcount gets decremented (and the structure > deleted when refcnt drops to 0) and a new rtalloc will be > done. >=20 Hmm, it has a non-obvious impact on rt_gwroute's, but the idea sounds promising. > Second, this entirely avoids the need to walk the whole tree > on addition/deletion/change of any route. >=20 It was never an intent. > Third, we can dispense with the silliness of cloned routes > alltogether and go back to keeping a separate host route > cache. >=20 Stevens says, in TCP/IP Ill. Vol 2 in section 18.12: : The Patricia tree data structure is well suited to routing tables. : Routing table lookups occur much more frequently than adding or : deleting routes, so from a performance standpoint using Patricia : trees for the routing table makes sense. Patricia trees provide : fast lookups at the expense of additional work in adding and : deleting. Measurements in [Sklower 1991] comparing the radix : tree approach to the Net/1 hash table show that the radix tree : method is about two times faster in building a test tree and : four times faster in searching. Cheers, --=20 Ruslan Ermilov Sysadmin and DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age --J/dobhs11T7y2rNN Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (FreeBSD) iD8DBQE9WVAwUkv4P6juNwoRAt9CAJ9FVrSbs+lD7yce5lyRYo0f5RzMagCfRaG7 K0v5v0ICyXrzknwSiPHTTpc= =7Gn+ -----END PGP SIGNATURE----- --J/dobhs11T7y2rNN-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020813183008.GA29747>