From: Bruce Simpson
Date: Wed, 24 Dec 2008 16:51:22 +0000
To: Ian FREISLICH
Cc: Gerald Pfeifer, Vladimir Grebenschikov, Kip Macy, Qing Li,
    freebsd-net@freebsd.org, freebsd-current@freebsd.org,
    Sergey Matveychuk
Subject: Re: HEADSUP: arp-v2 has been committed
Message-ID: <4952688A.3050707@incunabulum.net>
References: <49524131.7010700@incunabulum.net> <494FAFAC.90802@FreeBSD.org>
List-Id: Networking and TCP/IP with FreeBSD

Hi Ian,

Well, yuletide and new year is a good time to clean out the
cupboards, so... without further ado...

Ian FREISLICH wrote:
> ...
>> Do you have applications which do not explicitly specify the interface
>> address to use for multicast group joins?
>>
>> If they do not, that's a bug in the application -- IPv4 and IPv6
>> multicast *requires* that a link be specified somehow, either using the
>> new APIs which take an ifindex, or an IPv4 "primary address".
>>
>
> quagga does specify the ifindex passed in a struct ip_mreqn in the
> imr_ifindex member to setsockopt, which reading the documentation
> should be sufficient, yet it is not. I have checked that it does
> set the correct ifindex. Setting the IP address in the imr_address
> member of the same struct correctly chooses the interface.
>

I seem to remember there was some breakage in Quagga after the
introduction of in_mcast.c over 18 months ago.

I did make a patch available for using the new MCAST_JOIN APIs for the
Rhyolite.com routed, but for whatever reason, the necessary changes
didn't get incorporated upstream into Quagga. This is despite their
reference platform, Linux, having supported these APIs for many years.

The APIs in question are well covered in the literature (UNIX Network
Programming 3e) and in RFCs which were published over 3 years ago. It
seems regrettable that Quagga has not incorporated or tested these
changes, given that Microsoft Windows, Linux and MacOS X have all
supported the APIs for some time.

Of course, that said, I'm not a Quagga developer, and have no formal
relationship, fiscal or professional, with that project. My door has
always been open to guide and advise on how to fix their code to move
with the times, and I certainly don't bill for advice given with warmth.

The impression I got from their response to this was that they perhaps
saw me as someone throwing a rulebook at them, which is hardly the case;
I just want IP multicast to work properly across the board.

>
>> Unfortunately there has been historical breakage in the multicast APIs.
>> There are some apps which run before all interfaces have been ifconfig'd
>> up in the system, and they need to create multicast sockets.
>>
>> The kernel behaviour you describe is historical and I had to reintroduce
>> it to avoid breaking such applications. It is a kludge which we probably
>> can't retire until their developers fix their multicast apps to be aware
>> of multiple interfaces on the system.
>>
>
> Is this the BSD struct ip_mreq hack? This particular code isn't
> using that.
>

No. This is when applications issue IP_ADD_MEMBERSHIP on a socket whilst
specifying an IP address of 0.0.0.0 (i.e. INADDR_ANY). This is
officially unsupported behaviour.

What most implementations try to do is treat this as meaning "I want
this socket to join the group on the interface pointing towards the
default route". If there is no default route, then the first multicast
capable interface in the system is used. Please see the change history
of src/sys/netinet/in_mcast.c for full details.

I'm sure you can understand that this leads to a comedy of errors if
there are multiple default routes, and FreeBSD is fast hurtling towards
equal-cost multipathing support.

Unfortunately, apps which want to run before IPv4 addresses have been
configured still try to do this. This was accepted practice before IGMP
was further formalized as a protocol, but it stems from an apparent
misunderstanding of how IPv4 multicasting actually works.

This isn't a problem with IPv6, because the MLDv1 and MLDv2 specs both
require that the link-scoped address is used for multicast group
memberships.

Most implementations, including FreeBSD's, will choose the first IPv4
address ('primary address') configured on the link as the source address
for all IGMP control traffic. Removing the address selected for such
traffic will break IGMP and lead to inconsistent membership reports
being received by the upstream IGMP querier.

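For illustration only, a join that names the link explicitly through
the RFC 3678 MCAST_JOIN_GROUP option, rather than passing INADDR_ANY to
IP_ADD_MEMBERSHIP, might look roughly like the sketch below. It is not
taken from Quagga, routed or the FreeBSD tree; the helper names
fill_group_req() and join_group_on_link() are made up for this example,
and the sin_len handling assumes a BSD-style sockaddr.

```c
#define _DEFAULT_SOURCE /* glibc only: keep struct group_req visible under strict -std= modes */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>

/*
 * Hypothetical helper: fill in an RFC 3678 group_req for an IPv4 group
 * on the link identified by ifindex.  Returns 0 on success, -1 if no
 * link was given or the string is not an IPv4 multicast address.
 */
int
fill_group_req(struct group_req *gr, unsigned int ifindex, const char *group)
{
    struct sockaddr_in sin;

    if (ifindex == 0)
        return (-1);    /* refuse to let the kernel guess the link */

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
#ifdef __FreeBSD__
    sin.sin_len = sizeof(sin);  /* BSD sockaddrs carry a length field */
#endif
    if (inet_pton(AF_INET, group, &sin.sin_addr) != 1 ||
        !IN_MULTICAST(ntohl(sin.sin_addr.s_addr)))
        return (-1);

    memset(gr, 0, sizeof(*gr));
    gr->gr_interface = ifindex;
    memcpy(&gr->gr_group, &sin, sizeof(sin));
    return (0);
}

/*
 * Hypothetical helper: join `group` on socket `s` over the interface
 * named `ifname`, using the protocol-independent join option.
 */
int
join_group_on_link(int s, const char *ifname, const char *group)
{
    struct group_req gr;

    /* if_nametoindex() returns 0 for an unknown name, which is rejected. */
    if (fill_group_req(&gr, if_nametoindex(ifname), group) != 0)
        return (-1);
    return (setsockopt(s, IPPROTO_IP, MCAST_JOIN_GROUP, &gr, sizeof(gr)));
}
```

A caller would use something like join_group_on_link(s, "em0",
"224.0.0.9") on a SOCK_DGRAM socket, where "em0" is only a placeholder
interface name and 224.0.0.9 is the RIPv2 group. The point of the
ifindex check is that the application, not the kernel, picks the link.
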
There is currently no clean way to deal with the IGMP endpoint address
problem in the FreeBSD implementation. I believe from memory that Linux
has the same issues.

If you look at the Microsoft implementation, Dave Thaler and his team
did a lot of good work on bringing Windows up to speed with where their
stack needed to be. There are reasons why Windows is being used as a
multicast media delivery platform where Linux and FreeBSD aren't, and
that's just one of them.

Whilst I would love to fix it all, time, resources and motivation are
finite, and developer focus tends to go where the bread-and-butter is --
that's just how the world is.

thanks,
BMS