Date:      Thu, 26 Aug 2010 13:34:45 +0800
From:      "MAI JIN" <Jin.Mai@alcatel-sbell.com.cn>
To:        <freebsd-net@freebsd.org>
Subject:   RE: HELP. FreeBSD 8.1 polling issue
Message-ID:  <1DB91DF937A4544C81E636468B91C21C0728EA30@CNSHGSMBS03.ad4.ad.alcatel.com>


[-- Attachment #1 --]
Hi,

I have run into a polling issue with FreeBSD 8.1 on my PC. It is a dual-core
Intel Pentium x86 machine (2.8 GHz per core). The Ethernet interface is a
Broadcom NetXtreme 57xx Gigabit Ethernet adapter.
I set the following options (to enable polling and zero-copy sockets) and
rebuilt the kernel:

Code:
# To make an SMP kernel, the next two lines are needed
options         SMP                     # Symmetric MultiProcessor Kernel
device          apic                    # I/O APIC

options DEVICE_POLLING # Enable polling
options HZ=1000
options ZERO_COPY_SOCKETS

The following lines were appended to /etc/sysctl.conf:

Code:
kern.polling.enable=1
# increase BPF buffer to 10M
net.bpf.bufsize=10485760
net.bpf.maxbufsize=10485760
kern.polling.idle_poll=1
kern.polling.burst_max=1000
After installing the new kernel and rebooting, kern.polling.enable was not
found in the MIB, so I had to ignore this error. It looks like
kern.polling.enable has been removed in FreeBSD 8.1?
Everything else looked good, so I built my application to receive data from
an HP server. I wrote the application using libpcap-1.1.1 with BPF
zero-copy turned on (I found #define HAVE_ZEROCOPY_BPF 1 in
config.h). The source code of my application is attached.

Before running the application, I set the following parameters:

Code:
ifconfig bge0 polling     # This will turn on polling in the Broadcom driver.
Code:
sysctl -w net.bpf.bufsize=10485760 
sysctl -w net.bpf.maxbufsize=10485760
sysctl -w kern.polling.idle_poll=1
sysctl -w kern.polling.burst_max=1000
sysctl -w kern.polling.each_burst=128
sysctl -w net.inet.ip.intr_queue_maxlen=256
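
As a rough sanity check on these settings, the tick-driven polling budget can be estimated. The sketch below assumes the semantics described in polling(4), where kern.polling.burst_max caps the number of packets taken from one interface per timer tick, so HZ * burst_max bounds the tick-driven packet rate. It ignores the extra passes that kern.polling.idle_poll adds, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: upper bound on packets/second that tick-driven
 * polling can pull from one interface, ASSUMING at most burst_max
 * packets are taken per timer tick (idle polling is not modelled). */
unsigned long max_polled_pps(unsigned long hz, unsigned long burst_max)
{
    return hz * burst_max;
}

/* Print the bound for the values used in this message. */
void print_polling_budget(void)
{
    printf("HZ=1000, burst_max=1000: at most %lu packets/s per interface\n",
           max_polled_pps(1000UL, 1000UL));
}
```

With HZ=1000 and burst_max=1000 the bound is well above the few hundred thousand packets per second reported here, so under this assumption the burst limit itself should not be the bottleneck.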
Then I ran the application to receive data from the HP server. I ran
multiple iperf instances on the HP server to send around 133 Mbit/s of UDP
load to the PC under test. The UDP payload size was 47 bytes; the entire IP
packet is 76 bytes.
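
The offered packet rate can be estimated from these numbers. This is only a sketch: it assumes the 133 Mbit/s figure counts whole 76-byte IP packets (iperf may instead be counting payload bits only, which would give a different rate), and the function name is hypothetical.

```c
/* Hypothetical helper: packets per second needed to carry bits_per_sec
 * when every packet is packet_bytes long. */
double packets_per_second(double bits_per_sec, double packet_bytes)
{
    return bits_per_sec / (packet_bytes * 8.0);
}
```

For example, packets_per_second(133e6, 76.0) gives 218750, which is in the same range as the roughly 205K packets/second observed by the receiver.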

At first, the receiving application worked well and received around
205K packets/second without any packet loss (I checked the receive
statistics using pcap_stats). However, after 2 minutes, the application could
not receive any more data; the packet rate dropped to 0. I ran ping from
the PC under test and it reported timeouts and
destination unreachable (ping from the HP server to the PC also failed). It
looked like the link between the HP server and the PC was broken, so the
application could not receive data. No packets were dropped. I then restarted
the bge0 interface using: ifconfig bge0 down && ifconfig bge0 up
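
Because the counters reported by pcap_stats are cumulative, a stall like the one described above is easier to spot from per-interval differences. The following is a minimal sketch, not part of the attached program: struct capture_counters is a stand-in for the ps_recv and ps_drop fields of struct pcap_stat, and both function names are hypothetical.

```c
/* Stand-in for the two fields of struct pcap_stat used in this message. */
struct capture_counters {
    unsigned int recv;   /* plays the role of ps_recv */
    unsigned int drop;   /* plays the role of ps_drop */
};

/* Difference between two cumulative snapshots. Unsigned subtraction
 * still gives the right count if a counter wrapped around once. */
struct capture_counters counters_delta(struct capture_counters prev,
                                       struct capture_counters now)
{
    struct capture_counters d;
    d.recv = now.recv - prev.recv;
    d.drop = now.drop - prev.drop;
    return d;
}

/* An interval with no new packets and no new drops suggests that
 * traffic has stopped arriving, rather than that packets are being lost. */
int looks_stalled(struct capture_counters d)
{
    return d.recv == 0 && d.drop == 0;
}
```

In the situation described here (packet rate 0 and no drops), looks_stalled would return 1 for every interval after the link stops passing traffic.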

I then re-ran the application and it resumed receiving data, but
after 1 or 2 minutes the link broke again. I think it was my
application that brought the bge0 interface down. I also ran tcpdump,
and it worked well without breaking the link.

I tried increasing kern.polling.each_burst from 128 to 500, but then the
application brought bge0 down within 1 minute. No packets were
dropped before the link went down.

I checked the CPU usage of the PC. System time is around 90% (which might be
caused by kern.polling.idle_poll=1) and user time is 13%.
I don't understand why the application would break bge0.

I tried changing the parameters:
options HZ=2000

sysctl -w net.bpf.bufsize=20485760 
sysctl -w net.bpf.maxbufsize=20485760
sysctl -w kern.polling.idle_poll=1
sysctl -w kern.polling.burst_max=10000
sysctl -w kern.polling.each_burst=5000

The performance was better: I got 307K packets/second (the HP server
sent around 250 Mbit/s and my PC received 200 Mbit/s). But after 2 minutes,
bge0 went down again.

Could anybody have a look at this issue? How can I optimize
the performance of polling? (Attachment: cap.cpp)

Thanks,
Jin 

 

Best regards
===========================
Jin 
Alcatel Shanghai Bell (Nanjing) Co. Ltd.
Alcatel-Net: 2735-5011 
Tel: (+86)-25-8473 1240-5011
Addr: 11F, Yangtse River Tech Park. 
           Building No.40 of Nanchang Road, 
           Gulou District, Nanjing, China
Zip: 210037
jin.mai@Alcatel-sbell.com.cn
ASB/MoAD/RDR/BSR APL
 


[-- Attachment #2 --]
#include </root/pcap/include/pcap.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#ifndef WIN32
#include <sys/wait.h>
#include <sys/resource.h>
#include <errno.h>
#endif /* WIN32 */
//#include <net/if.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netinet/if_ether.h>
#include <netinet/ip.h>
#include <netinet/udp.h>

typedef int STATUS;
typedef unsigned char uchar_t;

#define ALARM_SLEEP             5

#define IP_RECV_MAX_VLAN        4096
#define IP_RECV_TIMEOUT 0
#define IP_REASSEMBLE_TIMEOUT 60
#define IP_RECV_BUF_SIZE    4096
#define MAX_FILTER_SIZE 512
#define IP_RECV_MAX_PACKET_SIZE         65536

#define ERROR -1
#define OK 0


#define ERROR_MAJOR_CLASS printf
#define TRACE_WARNING_CLASS printf

#define IP_RECV_TIME_THOUSAND   1000
#define IP_RECV_TIME_MILLION    (1000000)

#ifndef ETHERTYPE_VLAN
#define ETHERTYPE_VLAN          0x8100  /* IEEE 802.1Q VLAN tagging */
#endif

#define MAX_AAL2PATH_NUM      512
#define AAL2PATH_START_PORT   8192

pcap_t *m_pd = NULL;
int m_datalink = DLT_EN10MB;
int m_snaplen = IP_RECV_BUF_SIZE;

unsigned long m_pktCount = 0;
unsigned long m_pktLen = 0;

in_addr_t m_ip = 0;

void my_sigalarm(int sig) ;

typedef struct pcap_stat mystat; 
 
mystat actualStat; /* statically allocated at file scope (not on the stack) */
mystat *mystatp = &actualStat; /* pointer passed to pcap_stats() */
 

STATUS processIpPacket(const struct pcap_pkthdr *h, size_t len, const uchar_t *pkt)
{
    static uchar_t new_pkt[IP_RECV_MAX_PACKET_SIZE];
    unsigned new_len = sizeof(new_pkt);
    int is_fragment = 0;
    const struct ip *ip;
    unsigned x = 0;
    unsigned proto = 0;
    struct in_addr ip_dst;
    struct in_addr ip_src;
    const uchar_t *orig_pkt = pkt;
    size_t orig_len = len;
    unsigned frag_hdr_offset = 0;

    if (len < sizeof(struct ip))
        return ERROR;

    ip = (const struct ip *) pkt;
    if (ip->ip_v != IPVERSION)
        return ERROR;

    proto = ip->ip_p;

    memcpy(&ip_dst, &ip->ip_dst, sizeof(struct in_addr));
    memcpy(&ip_src, &ip->ip_src, sizeof(struct in_addr));
 
    m_pktCount++;
    m_pktLen += len;

    x = ip->ip_hl << 2;
    if (len <= x)
        return ERROR;

    pkt += x;
    len -= x;

    x = ntohs(ip->ip_off);
    is_fragment = (x & IP_OFFMASK) != 0 || (x & IP_MF) != 0;

    /* Fragment reassembly is not implemented: fragments are rejected. */
    if (is_fragment)
        return ERROR;

    return OK;
}


STATUS cleanup()
{
    /*
     * Ask pcap_loop() to return, then release the handle.
     * Closing the handle while pcap_loop() may still be running is
     * not safe in general; this is only a simple shutdown path.
     */
    if (m_pd) {
        pcap_breakloop(m_pd);
        pcap_close(m_pd);
        m_pd = NULL;
    }
    return OK;
}



STATUS receiveData(const struct pcap_pkthdr *h, const uchar_t *buf)
{
        size_t len = h->caplen;
    unsigned etype=0 , vlan;
    const uchar_t *pkt = buf;

    switch (m_datalink)
    {
        case DLT_EN10MB:
        {
                const struct ether_header *ether;

                if (len < ETHER_HDR_LEN)
                        return ERROR;
                ether = (const struct ether_header *) pkt;
                etype = ntohs(ether->ether_type);
                pkt += ETHER_HDR_LEN;
                len -= ETHER_HDR_LEN;
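                if (etype == ETHERTYPE_VLAN) {
                        /* 802.1Q tag: 2-byte TCI followed by the real EtherType */
                        if (len < 4)
                                return ERROR;
                        /* The VLAN ID is the low 12 bits of the TCI */
                        vlan = ntohs(*(const uint16_t *) pkt) & 0x0FFF;
                        pkt += 2;
                        len -= 2;
                        if (vlan < 1 || vlan > IP_RECV_MAX_VLAN)
                                return ERROR;
                        etype = ntohs(*(const uint16_t *) pkt);
                        pkt += 2;
                        len -= 2;
               }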
         
/*
    m_pktCount++;
    m_pktLen += len;
*/
        break;
        }

        case DLT_RAW:
        {
             etype = ETHERTYPE_IP;
             break;
        }

        case DLT_NULL: {
                unsigned x;

                if (len < sizeof(int32_t))
                        return ERROR;
                x = *(const uint32_t *)pkt;
                if (x == PF_INET)
                        etype = ETHERTYPE_IP;
/*
                else if (x == PF_INET6)
                        etype = ETHERTYPE_IPV6;
*/
                else
                        return ERROR;
                pkt += sizeof(int32_t);
                len -= sizeof(int32_t);
                break;
            }
        
        //Not ethernet frame
        default:
          return OK;
    }
     
    if (etype != ETHERTYPE_IP || len <= sizeof(struct ip)) //we receive IPv4 packets only
    {
        return ERROR;
    }
    return processIpPacket(h, len, pkt);
}

void packetHandler(uchar_t *user, const struct pcap_pkthdr *h, const uchar_t *sp)
{
   receiveData(h, sp);
}



void printMsgInHex(const uchar_t *buf, int len)
{
#define NTL_MAX_MSG_DUMP_LINE 50

        char    log[8192] = {0};
        char     *ptr = log;
        int     i;

        ptr += sprintf(ptr, "[%4d] ",0);
        for (i = 0; i < len; i++)
        {
                ptr += sprintf(ptr, "%02X ", buf[i]);
                if ( (i+1) % NTL_MAX_MSG_DUMP_LINE == 0 )  /* FeedLine */
                {
                        //ptr += sprintf(ptr, "\n");
                        ptr = log;
                        printf("%s\n", log);
                        ptr += sprintf(ptr, "[%4d] ",i+1);
                }
        }
        if( i % NTL_MAX_MSG_DUMP_LINE != 0 )   /* Print remaining bytes */
        {
                printf("%s\n", log);
        }
        return;
}


STATUS start()
{
        int cnt, i;
        uint32_t localnet, netmask;
        /* Capture filter; currently unused because the pcap_compile()/pcap_setfilter() calls below are commented out */
        const char *cmdbuf = "udp and dst host 192.168.6.111 and dst portrange 8192-8500";
        const char *device;
        int type;
        struct bpf_program fcode;
//        sighandler_t oldhandler;
        char ebuf[PCAP_ERRBUF_SIZE];
        int status;

        cnt = -1; //loop for ever
        device = "bge0";
    
        if (device[0] == '\0') {
                device = pcap_lookupdev(ebuf);
                if (device == NULL)
                {
                    ERROR_MAJOR_CLASS("ERROR: No network interface to receive IP packets. %s", ebuf);
                    return ERROR;
                }
        }

        if (m_pd)
           pcap_close(m_pd);
        

            *ebuf = '\0';

            printf("Opening capture on %s\n", device);

            m_pd = pcap_open_live(device, IP_RECV_BUF_SIZE, 0, IP_RECV_TIMEOUT, ebuf);
            if (m_pd == NULL){
                ERROR_MAJOR_CLASS("ERROR: cannot open %s to read IP packets. %s", device, ebuf);
                return ERROR;
            }
            else if (*ebuf){
                TRACE_WARNING_CLASS("%s", ebuf);
            }

            /*
             * Let user own process after socket has been opened.
             */

         m_datalink = pcap_datalink(m_pd);
         if (m_datalink != DLT_EN10MB && m_datalink != DLT_RAW){
                    ERROR_MAJOR_CLASS("Datalink %s is not one of the DLTs supported by this device %s. Only DLT_EN10MB and DLT_RAW supported currently.\n",
                          pcap_datalink_val_to_name(m_datalink), device);
                    return ERROR;
         }

        i = pcap_snapshot(m_pd);
        if (m_snaplen < i) {
                TRACE_WARNING_CLASS("snaplen raised from %d to %d", m_snaplen, i);
                m_snaplen = i;
        }
        if (pcap_lookupnet(device, &localnet, &netmask, ebuf) < 0) {
                localnet = 0;
                netmask = 0;
                TRACE_WARNING_CLASS("%s", ebuf);
        }
    
/*
        if (pcap_compile(m_pd, &fcode, cmdbuf, 0, netmask) < 0){
            ERROR_MAJOR_CLASS("%s", pcap_geterr(m_pd));
            return ERROR;
        }

 
        if (pcap_setfilter(m_pd, &fcode) < 0){
            ERROR_MAJOR_CLASS("%s", pcap_geterr(m_pd));
            return ERROR;
            }
*/

        type = pcap_datalink(m_pd);

    
        status = pcap_loop(m_pd, cnt, packetHandler, 0);
        if (status == -1) {
            /*
             * Error.  Report it.
             */
         ERROR_MAJOR_CLASS( "%s: pcap_loop exit: %s\n",
                device, pcap_geterr(m_pd));
        }
        pcap_close(m_pd);
        return (status == -1 ? ERROR : OK);
}


void my_sigalarm(int sig) {
  (void)sig;

  /* The counters are unsigned long, so they are printed with %lu */
  printf("Packets/S: %lu,  Bits/S: %lu (%f Mbits/S)\n",  m_pktCount/ALARM_SLEEP, m_pktLen*8/ALARM_SLEEP, ((float)(m_pktLen*8))/(ALARM_SLEEP*1024*1024));

  m_pktCount=0;
  m_pktLen=0;

  /* Fetch the capture statistics (ps_recv and ps_drop are unsigned) */
  if(pcap_stats(m_pd, mystatp) < 0)
  {
      fprintf(stderr,"\nError pcap_stats.\n");
  }
  else
  {
      printf("Num of recv: %u, Num of drop: %u\n", mystatp->ps_recv, mystatp->ps_drop);
  }
  /* Re-install the handler before re-arming the timer */
  signal(SIGALRM, my_sigalarm);
  alarm(ALARM_SLEEP);
}

int main()
{
    signal(SIGALRM, my_sigalarm);
    alarm(ALARM_SLEEP);
    return start();
}
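
The VLAN branch of receiveData reads the 16-bit field that follows the 0x8100 EtherType. In 802.1Q that field is the Tag Control Information, and only its low 12 bits are the VLAN ID; the upper bits carry priority information. The sketch below shows one way to parse the header from raw bytes without relying on pointer casts or alignment. It is an illustration only: parse_ether_header and read_be16 are hypothetical names, and it handles at most one tag.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_8021Q 0x8100u  /* same value as ETHERTYPE_VLAN */

/* Read a big-endian 16-bit value without assuming alignment. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: parse an Ethernet header with at most one
 * 802.1Q tag. Returns the header length (14 or 18 bytes), or 0 if the
 * frame is too short. *etype receives the payload EtherType and *vid
 * the VLAN ID (0 when the frame is untagged).
 */
size_t parse_ether_header(const unsigned char *frame, size_t len,
                          uint16_t *etype, uint16_t *vid)
{
    if (len < 14)
        return 0;
    *etype = read_be16(frame + 12);
    *vid = 0;
    if (*etype != ETHERTYPE_8021Q)
        return 14;
    if (len < 18)
        return 0;
    *vid = read_be16(frame + 14) & 0x0FFF;  /* drop the priority bits */
    *etype = read_be16(frame + 16);
    return 18;
}
```

A frame tagged with priority 5 and VLAN 100 carries the TCI bytes 0xA0 0x64; masking with 0x0FFF yields 100, whereas the unmasked value would be 41060.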
