Date: Thu, 7 Jan 2010 09:54:31 +0000 (UTC)
From: Luigi Rizzo <luigi@FreeBSD.org>
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r201728 - user/luigi/ipfw3-head/sys/netinet/ipfw
Message-ID: <201001070954.o079sVvP013482@svn.freebsd.org>
Author: luigi
Date: Thu Jan 7 09:54:31 2010
New Revision: 201728
URL: http://svn.freebsd.org/changeset/base/201728

Log:
  add some temporary documentation

Added:
  user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt (contents, props changed)

Added: user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt Thu Jan 7 09:54:31 2010 (r201728)
@@ -0,0 +1,756 @@
+Notes on the internal structure of dummynet (2010 version)
+by Riccardo Panicucci and Luigi Rizzo
+
+*********
+* INDEX *
+*********
+Implementation of new dummynet
+    Internal structure
+    Files
+Packet arrival
+    The reconfiguration routine
+dummynet_task()
+Configuration
+    Add a pipe
+    Add a scheduler
+    Add a flowset
+Listing object
+Delete of object
+    Delete a pipe
+    Delete a flowset
+    Delete a scheduler
+Compatibility with FreeBSD 7.2 and FreeBSD 8 ipfw binary
+    ip_dummynet_glue.c
+    ip_fw_glue.c
+How to configure dummynet
+How to implement a new scheduler
+
+
+
+
+Dummynet is a traffic shaper and network emulator. Packets are
+selected by an external filter such as ipfw, and passed to the emulator
+with a tag such as "pipe 10" or "queue 5" which tells what to
+do with the packet. As an example:
+
+    ipfw add queue 5 icmp from 10.0.0.2 to all
+
+All packets with the same tag belong to a "flowset", or a set
+of flows which can be further partitioned according to a mask.
+Flowsets are then passed to a scheduler for processing. The
+association of flowsets and schedulers is configurable, e.g.
+
+    ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
+    ipfw queue 8 config sched 10 weight 1 ...
+    ipfw queue 3 config sched 20 weight 1 ...
+
+"sched 10" represents one or more scheduler instances,
+selected through a mask on the 5-tuple itself.
+
+    ipfw sched 20 config sched_mask yyy ...
+
+There are in fact two masks applied to each packet:
++ the "sched_mask" sends packets arriving to a scheduler_id to
+  one of many instances.
++ the "flow_mask" together with the flowset_id is used to
+  collect packets into independent flows on each scheduler.
+
+As an example, the configuration
+    ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
+    ipfw sched 10 config sched_mask src-ip 0xffffff00
+
+means that sched 10 will have one instance per /24 source subnet,
+and within that, each individual source will be a flow.
+
+Internal structure
+------------------
+Dummynet-related data is split into five main structs:
+
+- struct new_pipe: contains data about the physical link such
+  as bandwidth, delay, fields to simulate a delay profile and so on.
+
+- struct new_fs: describes a flowset. It contains template values for the
+  specified flowset, and a pointer (alg_fs) to an opaque struct that
+  can contain per-flowset scheduler-specific parameters, such as
+  weight, priorities, slot sizes and the like.
+  It also contains a flow_mask, which allows creating several queues
+  depending on the flow id of the packet. All queues are stored in the
+  scheduler instance.
+
+- struct new_sch: it acts as a template for the scheduler used. It contains
+  the enqueue and dequeue packet functions, a configure function for
+  possible global parameters, and two functions to create and destroy
+  the scheduler instance.
+  A scheduler can have several scheduler instances: the sched_mask field
+  determines how many instances can exist for this scheduler.
+  This struct also contains a hash table of queue pointers.
+
+- struct new_sch_inst: it is the struct that represents an instance of the
+  scheduler. It has a pointer to the template, and some general parameters
+  and status variables relative to the single instance.
+  It also contains all queues associated with this instance and the delay
+  line, which is a list of packets that will be sent after a certain amount
+  of time.
+
+- struct new_queue: it contains all data belonging to a single queue, such
+  as total bytes, number of packets dropped, list of packets...
+  It can have some extra data about the scheduling algorithm.
+  XXX this is one instance of a flowset ?
+
+
+                              +----------------+
+    +--------+  ptr_sched     |                |         +----------+
+    | new_fs |--------------->|    new_sch     |-------->| new_pipe |
+    `--------'                |                |         +----------+
+        |                     +----------------+
+        |                        |            ^
+        |                        V             \___
+        |                 +------------+           |
+        |                 | hash table |           |
+        |                 |    (m1)    |           |
+        |                 +------------+           |
+        |                   |  .....  \_           |
+   -----------+       ...   v            \         |
+  | new_queue |<-----|   --------------   |        |
+  +-----------+      |  |              |  |        |
+      ....           |  | new_sch_inst |  |        |
+   -----------+      |  |              |  |        |
+  | new_queue |<-----|   --------------   |        |
+  +-----------+      |                    v        |
+        |   .--------------.       --------------  |
+        |   |  hash table  |      |              | |
+        |   |   (m2)   +   |<-----| new_sch_inst |_/
+        |   | new_sch_inst |      |              |
+        |   `--------------'       --------------
+        |                                 ^
+        |                                 |
+        `---------------------------------'
+
+Note that the hash table m2 is not mandatory: a scheduler instance
+can use its own struct to store its queues.
+
+Three global data structures (hash tables) contain all
+pipes, schedulers and flowsets.
+- pipehash[x]:      contains all pipes in the system
+- schedulerhash[x]: contains all scheduler templates in the system
+- flowsethash[x]:   contains all flowsets linked with a scheduler (or pipe).
+Additionally, there is a list that contains all unlinked flowsets:
+- unlinkedflowset:  contains flowsets that are not linked with any scheduler.
+Flowsets are put in this list when they refer to a non-existing
+scheduler or pipe.
+
+Scheduler instances and the delay lines associated with pipes
+need to be woken up at certain times. Because we have many
+such objects, we keep them in a priority heap (system_heap).
+
+Almost all objects in this implementation are preceded
+by a structure (struct dn_id) which makes it easier to
+identify them.
+
+
+Files
+-----
+New dummynet is split in several files.
+Two headers, a file that implements the userland part and some files for the
+kernel side are provided.
+- ip_dummynet_s.h is the minimal header that is used to implement
+  a scheduler.
+- ip_dummynet.h is the main header, which includes ip_dummynet_s.h and is
+  used by both kernel and user space.
+- dummynet.c is the file used to implement the user side of dummynet.
+  It contains the functions that parse the command line, and the functions
+  that show the output of dummynet objects.
+- ip_dummynet.c is the main file for the kernel side of dummynet. It contains
+  the main functions for processing packets, some functions exported to help
+  writing new schedulers, and the handler for the various dummynet socket
+  options.
+- ip_dummynet_config.c contains functions to create and delete objects.
+- ip_dummynet_get.c contains functions to prepare the buffer to pass to
+  userland to show object info.
+Moreover, there are two new files (ip_dummynet_glue.c and ip_fw_glue.c) that
+are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and
+FreeBSD 8.
+
+
+Packet arrival
+==============
+A packet enters dummynet processing through the function dummynet_io().
+When a packet arrives, its flowset number is checked first and the flowset
+is looked up in the flowset list; if it isn't found, the packet is dropped.
+Dummynet uses a generation number to check whether pointers are valid. The
+flowset has a pointer to a scheduler that could have been deleted, so the
+flowset struct stores a copy of the generation number. If this copy doesn't
+match the current generation number, a reconfigure routine is started to
+check whether the pointers are valid. The generation number is incremented
+when a change in the internal structures occurs (for example when an object
+is inserted or deleted).
+
+At this point, the pointer to the scheduler is valid, and the scheduler
+instance is looked up among all instances of the scheduler, depending
+on the sched_mask field.
If a scheduler instance isn't found, a new one is
+created now.
+
+Now the enqueue() function of the scheduler is called, and if the scheduler
+instance was idle the dequeue() function is called immediately; otherwise it
+will be called by the system later. After the dequeue(), the packet returned
+is moved to the delay line and sent after a delay that depends on the link
+parameters.
+See the dummynet_io() function in ip_dummynet.c for more details.
+
+The reconfiguration routine
+---------------------------
+The reconfigure routine is called by the system when the flowset's
+ptr_sched_val number differs from the generation number. This means that an
+object was inserted in the system or deleted, and no packet belonging to
+this flowset has arrived since then.
+
+The reconfigure routine first checks whether the scheduler pointed to by the
+flowset is the same as the scheduler located by the number. If so, and if
+the types are the same, the scheduler has not changed. Then the routine
+checks whether the pipe exists and, if necessary, updates the pointer in the
+scheduler. If both scheduler and pipe exist, the packet can be enqueued.
+
+If the scheduler type differs from the flowset type, the scheduler has
+changed type, so the flowset must be deleted and recreated. The pointers
+are updated and the packet can be enqueued.
+
+If the scheduler no longer exists, the flowset is removed from the flowset
+list and inserted in the unlinked flowset list, so that new packets are
+discarded early.
+
+If the scheduler or the pipe doesn't exist, the packet should be dropped and
+the function returns 1 so that dummynet_io() can drop the packet.
+
+dummynet_task()
+===============
+dummynet_task() is the main dummynet processing function and is
+called every tick. This function first calculates the new current time, then
+it checks whether it is time to wake up objects from the system_heap by
+comparing the current time and the key of the heap.
Two types of objects (really, the
+heap contains pointers to objects) are in the
+system_heap:
+
+- scheduler instance: if a scheduler instance is woken up, the dequeue()
+  function is called as long as the instance has credit. If the dequeue()
+  returns packets, the scheduler instance is reinserted in the heap with a
+  new key that depends on the data that will be sent out. If the scheduler
+  instance is left with some credit, it has no more packets to send, and so
+  the instance is not reinserted in the heap.
+
+  If the scheduler instance extracted from the heap has the DELETE flag set,
+  the dequeue() is not called and the instance is destroyed now.
+
+- delay line: when extracting a delay line, the function transmit_event() is
+  called to send out packets from the delay line.
+
+  If the scheduler instance associated with this delay line doesn't exist,
+  the delay line is deleted now.
+
+Configuration
+=============
+To create a pipe, queue or scheduler, the user should type commands like:
+"ipfw pipe x config"
+"ipfw queue y config pipe x"
+"ipfw pipe x config sched <type>"
+
+The userland side of dummynet prepares a buffer containing the data to pass
+to the kernel side.
+The buffer contains all the structs needed to configure an object. In more
+detail, to configure a pipe all three structs (new_pipe, new_sch, new_fs) are
+needed, plus the delay profile struct if the pipe has a delay profile.
+
+When configuring a scheduler only the struct new_sch is written in the
+buffer, while when configuring a flowset only the new_fs struct is written.
+
+The first struct in the buffer contains the type of command requested, that
+is, whether it is configuring a pipe, a queue, or a scheduler. Then come the
+structs needed to configure the object, and finally there is the struct that
+marks the end of the buffer.
+
+To support the insertion of pipes and queues using the old syntax, when
+adding a pipe it is necessary to create a FIFO flowset and a FIFO scheduler,
+which have a number x + DN_PIPEOFFSET.
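The buffer layout described above (command record first, then the object
structs, then an end marker) can be sketched in user-space C as follows.
This is only an illustration under stated assumptions: struct rec_hdr, the
REC_* constants, put_rec() and count_objects() are hypothetical names made up
for this sketch, loosely modelled on the idea of a common header such as
struct dn_id; they are not the actual dummynet definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header placed in front of every record. */
struct rec_hdr {
    uint16_t len;   /* total record length in bytes, header included */
    uint8_t  type;  /* one of the REC_* values below */
    uint8_t  pad;
};

/* Hypothetical record types: command, objects, end marker. */
enum { REC_CMD_CONFIG = 1, REC_PIPE, REC_SCHED, REC_FS, REC_END };

/* Append a record header at offset 'off'; return the offset after it. */
size_t
put_rec(unsigned char *buf, size_t off, uint8_t type, uint16_t len)
{
    struct rec_hdr h = { len, type, 0 };

    memcpy(buf + off, &h, sizeof(h));
    return off + len;
}

/*
 * Walk a configuration buffer. The first record must be the command,
 * the last one the end marker. Return the number of object records
 * in between, or -1 if the buffer is malformed or has no end marker.
 */
int
count_objects(const unsigned char *buf, size_t size)
{
    size_t off = 0;
    int n = 0, first = 1;

    while (size - off >= sizeof(struct rec_hdr)) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > size - off)
            return -1;          /* record does not fit in the buffer */
        if (first) {
            if (h.type != REC_CMD_CONFIG)
                return -1;      /* the command must come first */
            first = 0;
        } else if (h.type == REC_END) {
            return n;           /* end marker reached */
        } else {
            n++;                /* a pipe, scheduler or flowset record */
        }
        off += h.len;
    }
    return -1;                  /* ran out of data before the end marker */
}
```

Carrying the length in every record is what lets the receiver skip records
it does not understand; that design choice is an assumption of this sketch,
suggested by the 'len' field the text mentions for struct gen.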
+
+Add a pipe
+----------
+A pipe is only a template for a link.
+If the pipe already exists, its parameters are updated. If a delay profile
+exists it is deleted and a new one is created.
+If the pipe doesn't exist a new one is created. After the creation, the
+unlinked flowset list is scanned to see whether there are flowsets that
+should be linked with this pipe. If so, these flowsets will be of wf2q+ type
+(for compatibility) and a new wf2q+ scheduler is created now.
+
+Add a scheduler
+---------------
+If the scheduler already exists, and the type and the mask are the same, the
+scheduler is simply reconfigured by calling the config_scheduler() scheduler
+function with the RECONFIGURE flag active.
+If the type or the mask differ, it is necessary to delete the old scheduler
+and create a new one.
+If the scheduler doesn't exist, a new one is created. If the scheduler has
+a mask, the hash table is created to store pointers to scheduler instances.
+When a new scheduler is created, it is necessary to scan the unlinked
+flowset list to look for flowsets that should be linked with this
+scheduler number. If some are found, these flowsets take the type of this
+scheduler and they are configured properly.
+
+Add a flowset
+-------------
+Flowset pointers are stored in the system in two lists. The unlinked flowset
+list contains all flowsets that aren't linked with a scheduler; the flowset
+list contains flowsets linked to a scheduler, and so they have a type.
+When adding a new flowset, first it is checked whether the flowset exists
+(that is, it is in the flowset list). If it doesn't exist, a new flowset is
+created and added to the unlinked flowset list if the scheduler which the
+flowset would be linked to doesn't exist, or added to the flowset list and
+configured properly if the scheduler exists. If the flowset (before being
+created) was in the unlinked flowset list, it is removed and deleted, and
+then recreated.
+If the flowset exists, to allow reconfiguration of this flowset, the
+scheduler number and type must match the ones in memory. If this isn't
+so, the flowset is deleted and a new one will be created. Actually, the
+flowset isn't deleted immediately: it is removed from the flowset list and
+it will be deleted later, because there could be some queues that are using
+it.
+
+Listing of object
+=================
+The user can request a list of the objects present in dummynet through the
+command
+"ipfw [-v] pipe|queue [x] list|show"
+The kernel side of dummynet sends a buffer to the user side that contains
+all pipes, all schedulers, all flowsets, plus all scheduler instances and
+all queues.
+The dummynet userland will format the output and show only the relevant
+information.
+The buffer sent starts with all pipes in the system. The entire struct
+new_pipe is passed, except the delay_profile struct that is useless in user
+space.
+After the pipes, all flowsets are written in the buffer. The struct that
+contains scheduler-specific flowset data is linked to the flowset by writing
+the 'obj' id of the extension into the 'alg_fs' pointer.
+Then the schedulers are written. If a scheduler has one or more scheduler
+instances, these are linked to the parent scheduler by writing the id of the
+parent in the 'ptr_sched' pointer. If a scheduler instance has queues, they
+are written in the buffer and linked through the 'obj' and 'sched_inst'
+pointers.
+Finally, the flowsets in the unlinked flowset list are written in the buffer,
+and then a struct gen is saved in the buffer to mark the last struct in the
+buffer.
+
+
+Delete of object
+================
+An object is usually removed by the user through a command like
+"ipfw pipe|queue x delete". XXX sched?
+ipfw passes to the kernel a struct gen that contains the type and the number
+of the object to remove.
+
+Delete of pipe x
+----------------
+A pipe can be deleted by the user through the command 'ipfw pipe x delete'.
+To delete a pipe, the pipe is removed from the pipe list, and then deleted.
+The scheduler associated with this pipe should also be deleted.
+For compatibility with the old dummynet syntax, the associated FIFO scheduler
+and FIFO flowset must be deleted.
+
+Delete of flowset x
+-------------------
+To remove a flowset, we must be sure that it is no longer referenced by any
+object.
+If the flowset to remove is in the unlinked flowset list, there is no
+issue: the flowset can be safely removed by calling free() (the flowset
+extension is not yet created if the flowset is in this list).
+If the flowset is in the flowset list, first we remove it from the list so
+that new packets are discarded when they arrive. Next, the flowset is marked
+as deleted.
+Now we must check whether some queue is using this flowset.
+To do this, a counter (active_f) is provided. This counter indicates how many
+queues exist that use this flowset.
+The active_f counter is automatically incremented when a queue is created
+and decremented when a queue is deleted.
+If the counter is 0, the flowset can be safely deleted, and the
+delete_alg_fs() scheduler function is called before deallocating the memory.
+If the counter is not 0, the flowset remains in memory until the counter
+becomes zero. When a queue is deleted (by the dn_delete_queue() function) it
+is checked whether the linked flowset is being deleted, and if so the counter
+is decremented. If the counter reaches 0, the flowset is deleted.
+The deletion of a queue can be done only by the scheduler, or when the
+scheduler is destroyed.
+
+Delete of scheduler x
+---------------------
+To delete a scheduler we must be sure that no scheduler instance of this type
+is in the system_heap. To do so, a counter (inst_counter) is provided.
+This counter is managed by the system: it is incremented every time an
+instance is inserted in the system_heap, and decremented every time an
+instance is extracted from it.
+To delete the scheduler, first we remove it from the scheduler list, so new
+packets are discarded when they arrive, and mark the scheduler as deleting.
+
+If the counter is 0, we can remove the scheduler safely by calling the
+really_deletescheduler() function. This function scans all scheduler
+instances and calls the delete_scheduler_instance() function that deletes
+the instance. When all instances are deleted, the scheduler template is
+deleted by calling delete_scheduler_template(). If the delay line associated
+with the scheduler is empty, it is deleted now; otherwise it will be deleted
+when it becomes empty.
+If the counter is not 0, we wait for it. Every time the dummynet_task()
+function extracts a scheduler from the system_heap, the counter is
+decremented.
+If the scheduler has the delete flag enabled, the dequeue() is not called and
+delete_scheduler_instance() is called to delete the instance.
+Obviously this scheduler instance is no longer inserted in the system_heap.
+If the counter reaches 0, the delete_scheduler_template() function is called
+and all memory is released.
+NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
+      scheduler with the same number is inserted it will use these flowsets.
+      The best approach would be to insert these flowsets in the
+      unlinked flowset list, but doing this now would be very expensive.
+      So the flowsets remain in memory, linked with a scheduler that no
+      longer exists, until a packet belonging to one of them arrives. When
+      this packet arrives, the reconfigure() function is called because the
+      generation number does not match the one stored in the flowset, and so
+      the flowset is moved into the unlinked flowset list, or linked
+      with the new scheduler if a new one was created.
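The deferred deletion driven by inst_counter can be sketched as follows. This
is a simplified user-space illustration, not the kernel code: struct sched,
sched_delete(), sched_heap_insert() and sched_heap_extract() are hypothetical
names, and the heap itself is left out so that only the counting logic is
shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scheduler template. */
struct sched {
    int  inst_counter;  /* instances currently sitting in the heap */
    bool deleting;      /* set once the user has asked for deletion */
};

static int freed_count; /* demonstration only: how many were released */

static void
sched_free(struct sched *s)
{
    free(s);
    freed_count++;
}

/* An instance of 's' is inserted in the heap. */
void
sched_heap_insert(struct sched *s)
{
    s->inst_counter++;
}

/*
 * The user asks to delete 's'. The memory is released immediately only
 * if no instance is in the heap; otherwise the scheduler is just marked.
 * Return true if the scheduler was freed.
 */
bool
sched_delete(struct sched *s)
{
    s->deleting = true;
    if (s->inst_counter == 0) {
        sched_free(s);
        return true;
    }
    return false;
}

/*
 * An instance of 's' is extracted from the heap. If the scheduler is
 * being deleted and this was the last instance, release it now.
 * Return true if the scheduler was freed.
 */
bool
sched_heap_extract(struct sched *s)
{
    assert(s->inst_counter > 0);
    s->inst_counter--;
    if (s->deleting && s->inst_counter == 0) {
        sched_free(s);
        return true;
    }
    return false;
}
```

The same pattern applies to flowsets and their active_f counter: the delete
request only marks the object, and whoever drops the last reference performs
the actual release.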
+
+
+COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
+==========================================================
+Dummynet is not compatible with the old ipfw binary because the internal
+structs have changed. Moreover, the old ipfw binary is not compatible with
+new kernels because the struct that represents a firewall rule has changed.
+So, if a user installs a new kernel on FreeBSD 7.2, ipfw (and possibly many
+other commands) will not work.
+New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and
+get.
+The old option can be used to allow compatibility with the 'ipfw' binary of
+older versions (tested with 7.2 and 8.0) of FreeBSD.
+Two files are provided for this purpose:
+- ip_dummynet_glue.c translates old dummynet requests to the new ones,
+- ip_fw_glue.c converts the rule format between the 7.2 and 8 versions.
+Let us see these two files in detail.
+
+IP_DUMMYNET_GLUE.C
+------------------
+The internal structs of new dummynet are very different from the original.
+Because there are some differences between dummynet in FreeBSD 7.2 and
+dummynet in FreeBSD 8 (the FreeBSD 8 version includes support for the pipe
+delay profile and the burst option), I have to include both header files. I
+copied revision 191715 (for version 7.2) and revision 196045 (for version 8)
+and I appended a number to each struct to mark them.
+
+The main function of this file is ip_dummynet_compat(), which is called by
+ip_dn_ctl() when it receives a request for the old socket option.
+
+A global variable ('is7') stores the version of 'ipfw' that FreeBSD is using.
+This variable is set every time a configuration request is made, because
+with this request we receive a buffer whose size depends on the ipfw version.
+Because in general the first action is a configuration, this variable is
+usually set accordingly. If the first action is a request to list pipes
+or queues, the system cannot know the version of ipfw, and we suppose that
+version 7.2 is used.
If the version is wrong, the output can be senseless, but
+the application should not crash.
+
+There are four requests for old dummynet:
+- IP_DUMMYNET_FLUSH: the flush option has no parameters, so the
+  dummynet_flush() function is simply called;
+- IP_DUMMYNET_DEL: the delete option needs to be translated.
+  It is only necessary to extract the number and the type of the object
+  (pipe or queue) to delete from the buffer received and build a new struct
+  gen that contains the right parameters, then call the delete_object()
+  function;
+- IP_DUMMYNET_CONFIGURE: the configure command receives a buffer that depends
+  on the ipfw version. After all data has been properly extracted, in a way
+  that depends on the ipfw version used, the new structures are filled and
+  then the dummynet config_pipe() function is called. Note that the 7.2
+  version does not support some parameters such as burst or delay profile.
+- IP_DUMMYNET_GET: the get command should send to ipfw the correct buffer
+  depending on its version. There are two functions that build the
+  correct buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
+  functions reproduce the buffer exactly as 'ipfw' expects. The only
+  difference is that the weight parameter for a queue is no longer sent by
+  dummynet and so it is set to 0.
+  Moreover, because the internal structure has changed, the bucket size
+  of a queue may not be correct, because now all flowsets share the hash
+  table.
+  If the version of ipfw is wrong, the output could be senseless or truncated,
+  but the application should not crash.
+
+IP_FW_GLUE.C
+------------
+The ipfw binary is also used to add rules to the FreeBSD firewall. Because
+the struct ip_fw has changed from FreeBSD 7.2 to FreeBSD 8, it is necessary
+to write some glue code to allow using ipfw from FreeBSD 7.2 with the kernel
+provided with FreeBSD 8.
+This file contains two functions to convert a rule from FreeBSD 7.2 format to
+FreeBSD 8 format, and vice versa.
+The conversion should be done when a rule passes from userspace to kernel
+space and vice versa.
+I had to modify the ip_fw2.c file to manage these two cases, and added a
+variable (is7) to store the ipfw version used, using an approach like the
+one in the previous file:
+- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
+  size of the rule received corresponds to the FreeBSD 7.2 ipfw version. If
+  so, the rule is converted to version 8 by calling the function
+  convert_rule_to_8(). Moreover, after the insertion of the rule, the rule is
+  converted back to version 7 because the ipfw binary will print it.
+- when the user requests a list of rules (option IP_FW_GET) the is7 variable
+  should be set correctly because we suppose that a configure command was
+  done, else we suppose that the FreeBSD version is 8. The function
+  ipfw_getrules() in the ip_fw2.c file returns all rules to the ipfw binary,
+  converted to version 7 if is7 is set.
+The conversion of a rule is quite simple. The only difference between the
+two structures (struct ip_fw) is that in the new one there is a new field
+(uint32_t id). So, I copy the entire rule into a buffer and then copy the
+rule to the right position in the new (or old) struct. The size of the
+commands has not changed, and the copy is done in a loop.
+
+How to configure dummynet
+=========================
+It is possible to configure dummynet through two main commands:
+'ipfw pipe' and 'ipfw queue'.
+To allow compatibility with the old version, it is possible to configure
+dummynet using the old command syntax. Doing so, obviously, it is only
+possible to configure a FIFO scheduler or a wf2q+ scheduler.
+A new command, 'ipfw pipe x config sched <type>', is supported to add a new
+scheduler to the system.
+
+- ipfw pipe x config ...
+  create a new pipe with the link parameters
+  create a new scheduler fifo (x + offset)
+  create a new flowset fifo (x + offset)
+  the mask, if any, is stored in the FIFO scheduler
+
+- ipfw queue y config pipe x ...
+  create a new flowset y linked to sched x.
+  The type of flowset depends on the specified scheduler.
+  If the scheduler does not exist, this flowset is inserted in a special
+  list and will not be active.
+  If pipe x exists and sched does not exist, a new wf2q+ scheduler is
+  created and the flowset will be linked to this new scheduler (this is
+  done for compatibility with the old syntax).
+
+- ipfw pipe x config sched <type> ...
+  create a new scheduler x of type <type>.
+  Search the unlinked flowset list for flowsets that
+  should be linked with this new scheduler.
+
+- ipfw pipe x delete
+  delete the pipe x
+  delete the scheduler fifo (x + offset)
+  delete the scheduler x
+  delete the flowset fifo (x + offset)
+
+- ipfw queue x delete
+  delete the flowset x
+
+- ipfw sched x delete ///XXX
+  delete the scheduler x
+
+Here are some examples of how to configure dummynet:
+- Ex1:
+  ipfw pipe 10 config bw 1M delay 15    // Create a pipe with bandwidth and
+                                           delay. A FIFO flowset and
+                                           scheduler are also created.
+  ipfw queue 5 config pipe 10 weight 56 // Create a flowset. This flowset
+                                           will be of type wf2q+ because
+                                           pipe 10 exists. Moreover, the
+                                           wf2q+ scheduler is created now.
+- Ex2:
+  ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10
+                                           does not exist, so this flowset
+                                           is inserted in the unlinked
+                                           flowset list.
+  ipfw pipe 10 config bw...  // Create a pipe, a FIFO flowset and scheduler.
+                                Because a flowset with 'pipe 10' exists,
+                                a wf2q+ scheduler is created now and that
+                                flowset is linked with this scheduler.
+
+- Ex3:
+  ipfw pipe 10 config bw...  // Create a pipe, a FIFO flowset and scheduler.
+  ipfw pipe 10 config sched rr  // Create a scheduler of type RR, linked to
+                                   pipe 10
+  ipfw queue 5 config pipe 10 weight 56 // Create flowset 5. This flowset
+                                           will belong to scheduler 10 and
+                                           it is of type RR.
+
+- Ex4:
+  ipfw pipe 10 config sched rr  // Create a scheduler of type RR, linked to
+                                   pipe 10 (which does not exist yet)
+  ipfw pipe 10 config bw...  // Create a pipe, a FIFO flowset and scheduler.
+  ipfw queue 5 config pipe 10 weight 56 // Create flowset 5. This flowset
+                                           will belong to scheduler 10 and
+                                           it is of type RR.
+  ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It
+                                     becomes a wf2q+ scheduler.
+                                     When a new packet of flowset 5 arrives,
+                                     flowset 5 becomes of type wf2q+.
+
+How to implement a new scheduler
+================================
+In dummynet, a scheduler algorithm is represented by two main structs, some
+functions and other minor structs.
+- A struct new_sch_xyz (where xyz is the 'type' of scheduler algorithm
+  implemented) contains data relative to the scheduler, such as global
+  parameters that are common to all instances of the scheduler.
+- A struct new_sch_inst_xyz contains data relative to a single scheduler
+  instance, such as local status variables depending for example on the
+  flows that are linked with the scheduler, and so on.
+To add a scheduler to dummynet, the user should type a command like:
+'ipfw pipe x config sched <type> [mask ... ...]'
+This command creates a new struct new_sch_xyz of type <type>, and
+stores the optional parameters in that struct.
+
+The mask parameter determines how many scheduler instances of this
+scheduler may exist. For example, it is possible to divide traffic
+depending on the source port (or destination, or ip address...),
+so that every scheduler instance acts as an independent scheduler.
+If the mask is not set, all traffic goes to the same instance.
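The effect of the two masks can be illustrated with a small sketch. It
reuses the earlier example in this document (sched_mask src-ip 0xffffff00,
flow_mask src-ip 0x000000ff); struct flow_id, mask_flow() and same_key() are
hypothetical names for this illustration and do not reproduce the kernel's
struct ipfw_flow_id.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified 5-tuple. */
struct flow_id {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* AND every field of 'id' with the corresponding field of 'mask'. */
struct flow_id
mask_flow(struct flow_id id, const struct flow_id *mask)
{
    id.src_ip   &= mask->src_ip;
    id.dst_ip   &= mask->dst_ip;
    id.src_port &= mask->src_port;
    id.dst_port &= mask->dst_port;
    id.proto    &= mask->proto;
    return id;
}

/*
 * Two packets land in the same scheduler instance (or in the same
 * queue) when their masked ids are equal. Fields are compared one by
 * one so that struct padding never influences the result.
 */
bool
same_key(struct flow_id a, struct flow_id b)
{
    return a.src_ip == b.src_ip && a.dst_ip == b.dst_ip &&
        a.src_port == b.src_port && a.dst_port == b.dst_port &&
        a.proto == b.proto;
}
```

With a mask that has only src_ip = 0xffffff00 set, sources 10.0.1.5 and
10.0.1.9 produce the same key and share one scheduler instance, while
10.0.2.5 gets its own. Applying the flow mask src_ip = 0x000000ff within
that instance then separates .5 and .9 into different queues.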
+
+When a packet arrives at a scheduler, the system looks up the correct
+scheduler instance, and if it does not exist it is created now (the
+struct new_sch_inst_xyz is allocated by the system, and the scheduler
+fills the fields correctly). It is the task of the scheduler to create
+the struct that contains all queues for a scheduler instance.
+Dummynet provides some functions to create a hash table to store
+queues, but the scheduling algorithm can choose its own struct.
+
+To link a flow to a scheduler, the user should type a command like:
+'ipfw queue z config pipe x [mask... ...]'
+
+This command creates a new 'new_fs' struct that will be inserted
+in the system. If the scheduler x exists, this flowset will be
+linked to that scheduler and the flowset type becomes the same as
+the scheduler type. At this point, the function create_alg_fs_xyz()
+is called to store any parameters of the flowset that
+depend on the scheduler (for example the 'weight' parameter for a wf2q+
+scheduler, or some priority...). A mask parameter can be used for
+a flowset. If the mask parameter is set, the scheduler instance can
+separate packets according to their flow id (src and dst ip, ports...)
+and assign each of them to a separate queue. This is done by the scheduler,
+so it can ignore the mask if it wants.
+
+Let us now see the two main structs:
+struct new_sch_xyz {
+    struct gen g; /* important the name g */
+    /* global params */
+};
+struct new_sch_inst_xyz {
+    struct gen g; /* important the name g */
+    /* params of the instance */
+};
+It is important to embed the struct gen as the first member. The struct gen
+contains some values that the scheduler instance must fill (the 'type' of
+scheduler, the 'len' of the struct...)
+The function create_scheduler_xyz() should be implemented to initialize the
+global parameters in the first struct, and if memory allocation is done it is
+mandatory to implement the delete_scheduler_template() function to free that
+memory.
+The function create_scheduler_instance_xyz() must be implemented even if the
+scheduler instance does not use extra parameters. In this function the struct
+gen fields must be filled with the correct information. The
+delete_scheduler_instance_xyz() function must be implemented if the instance
+has allocated some memory in the previous function.
+
+To store data belonging to a flowset the following struct is used:
+struct alg_fs_xyz {
+    struct gen g;
+    /* fill correctly the gen struct
+     g.subtype = DN_XYZ;
+     g.len = sizeof(struct alg_fs_xyz)
+     ...
+    */
+    /* params for the flow */
+};
+The create_alg_fs_xyz() function is mandatory, because it must fill the struct
+gen, but the delete_alg_fs_xyz() is mandatory only if the previous function
+has allocated some memory.
+
+A struct new_queue contains the packets belonging to a queue and some
+statistical data. The scheduler may have to store data in this struct, so it
+must define a new_queue_xyz struct:
+struct new_queue_xyz {
+    struct new_queue q;
+    /* parameter for a queue */
+};
+
+All structures are allocated by the system. To do so, the scheduler must
+set the size of its structs in the scheduler descriptor:
+scheduler_size:   sizeof(new_sch_xyz)
+scheduler_i_size: sizeof(new_sch_inst_xyz)
+flowset_size:     sizeof(alg_fs_xyz)
+queue_size:       sizeof(new_queue_xyz);
+The scheduler_size could be 0, but the other structs must contain at least a
+struct gen.
+
+
+After the definition of the structs, it is necessary to implement the
+scheduler functions.
+
+- int (*config_scheduler)(char *command, void *sch, int reconfigure);
+    Configure a scheduler, or reconfigure it if 'reconfigure' == 1.
+    This function performs additional allocation and initialization of the
+    global parameters for this scheduler.
+    If memory is allocated here, the delete_scheduler_template() function
+    should be implemented to free this memory.
+- int (*delete_scheduler_template)(void* sch);
+    Delete a scheduler template.
+  This function is mandatory if the scheduler
+  uses extra data in addition to the struct new_sch.
+- int (*create_scheduler_instance)(void *s);
+  Create a new scheduler instance. The system allocates the necessary memory
+  and the scheduler can access it using the 's' pointer.
+  The scheduler instance stores all queues, and to do this it can use the
+  hash table provided by the system.
+- int (*delete_scheduler_instance)(void *s);
+  Delete a scheduler instance. It is important to free the memory allocated
+  by the create_scheduler_instance() function. The memory allocated by the
+  system is freed by the system itself. The struct that contains all queues
+  also has to be deleted.
+- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
+                 struct ipfw_flow_id *id);
+  Called when a packet arrives. The packet 'm' belongs to the scheduler
+  instance 's', has a flowset 'f', and the flowid 'id' has already been
+  masked. enqueue() must call the dn_queue_packet(q, m) function to actually
+  enqueue the packet in the queue q. The queue 'q' is chosen by the scheduler
+  and, if it does not exist, should be created by calling the
+  dn_create_queue() function. If the scheduler wants to drop the packet, it
+  must call the dn_drop_packet() function and then return 1.
+- struct mbuf * (*dequeue)(void *s);
+  Called when the timer expires (or when a packet arrives and the scheduler
+  instance is idle).
+  This function is called when at least one packet can be sent out. The
+  scheduler chooses the packet and returns it; if there are no packets in the
+  scheduler instance, the function must return NULL.
+  Before returning a packet, it is important to call the function
+  dn_return_packet() to update the statistics and counters of the queue.
+- int (*drain_queue)(void *s, int flag);
+  The system asks the scheduler to delete all queues that are not in use,
+  in order to free memory. The flag parameter indicates whether a queue must
+  be deleted even if it is active.
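The enqueue()/dequeue() contract described above can be sketched in userland C with a single FIFO queue. This is only an illustration under stated assumptions: struct pkt stands in for struct mbuf, the flowset and flow id arguments are omitted, the layout of struct new_queue is invented, and the bodies and signatures of dn_queue_packet(), dn_drop_packet() and dn_return_packet() are stand-ins (only their names come from the text). Returning 0 on success is also an assumption; the notes only specify returning 1 after a drop.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for struct mbuf (assumption). */
struct pkt { struct pkt *next; int len; };

/* Assumed layout: a packet list plus a few counters. */
struct new_queue {
    struct pkt *head, *tail;
    int len, limit;            /* packets queued / hypothetical max length */
    int drops, sent;
};

/* Hypothetical scheduler instance holding one queue. */
struct sch_inst_fifo { struct new_queue q; };

/* Stand-ins for the dummynet helpers named in the text. */
static void dn_queue_packet(struct new_queue *q, struct pkt *m)
{
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    q->len++;
}

static void dn_drop_packet(struct new_queue *q, struct pkt *m)
{
    (void)m;                   /* the kernel would free the mbuf here */
    q->drops++;
}

static void dn_return_packet(struct new_queue *q, struct pkt *m)
{
    (void)m;
    q->sent++;                 /* update statistics before handing it back */
}

/* enqueue: choose the queue, then store the packet or drop it. */
static int fifo_enqueue(void *s, struct pkt *m)
{
    struct sch_inst_fifo *si = s;

    if (si->q.len >= si->q.limit) {
        dn_drop_packet(&si->q, m);
        return 1;              /* 1 signals that the packet was dropped */
    }
    dn_queue_packet(&si->q, m);
    return 0;
}

/* dequeue: return the next packet, or NULL if the instance is empty. */
static struct pkt *fifo_dequeue(void *s)
{
    struct sch_inst_fifo *si = s;
    struct pkt *m = si->q.head;

    if (m == NULL)
        return NULL;
    si->q.head = m->next;
    if (si->q.head == NULL)
        si->q.tail = NULL;
    si->q.len--;
    dn_return_packet(&si->q, m);
    return m;
}
```

A real scheduler would pick among several queues in both functions; the single queue keeps the sketch focused on the call order the text requires.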
+
+- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
+  Called when a flowset is linked to a scheduler. This happens
+  when the scheduler is defined, so the type of the flowset is known.
+  The function initializes the flowset parameters by parsing the command
+  line. The parameters are stored in the g struct, which has the right
+  size and is allocated by the system. If the reconfigure flag is set,
+  the flowset is being reconfigured.
+- int (*delete_alg_fs)(struct gen *f);
+  Called when a flowset is being deleted. It must free the memory allocated
+  by the create_alg_fs() function.
+
+- int (*create_queue_alg)(struct new_queue *q, struct gen *f);
+  Called when a queue is created. The function should link the queue
+  to the struct used by the scheduler instance to store all queues.
+- int (*delete_queue_alg)(struct new_queue *q);
+  Called when a queue is being deleted. The function should remove extra data
+  and update the struct that contains all queues in the scheduler instance.
+
+The struct scheduler represents the scheduler descriptor that is passed to
+dummynet when a scheduler module is loaded.
+This struct contains the type of scheduler, the length of all structs and
+all function pointers.
+A function that is not implemented should be initialized to NULL. Some
+functions are always mandatory, others are mandatory only if some memory
+must be freed.
+Mandatory functions:
+- create_scheduler_instance()
+- enqueue()
+- dequeue()
+- create_alg_fs()
+- drain_queue()
+Optional functions:
+- config_scheduler()
+- create_queue_alg()
+Mandatory functions if the corresponding create...() has allocated memory:
+- delete_scheduler_template()
+- delete_scheduler_instance()
+- delete_alg_fs()
+- delete_queue_alg()
+
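To make the descriptor rules above concrete, the following sketch shows one possible shape for struct scheduler and a check that the always-mandatory callbacks are set. The callback signatures are copied from the list above; the field order, the 'type' field, scheduler_check() and the xyz_* stubs are assumptions made for the example, not the actual dummynet definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here; the real definitions live in dummynet. */
struct gen;
struct mbuf;
struct ipfw_flow_id;
struct new_queue;

/* Hypothetical descriptor layout (field order is an assumption). */
struct scheduler {
    int    type;
    size_t scheduler_size, scheduler_i_size, flowset_size, queue_size;
    int  (*config_scheduler)(char *command, void *sch, int reconfigure);
    int  (*delete_scheduler_template)(void *sch);
    int  (*create_scheduler_instance)(void *s);
    int  (*delete_scheduler_instance)(void *s);
    int  (*enqueue)(void *s, struct gen *f, struct mbuf *m,
                    struct ipfw_flow_id *id);
    struct mbuf *(*dequeue)(void *s);
    int  (*drain_queue)(void *s, int flag);
    int  (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
    int  (*delete_alg_fs)(struct gen *f);
    int  (*create_queue_alg)(struct new_queue *q, struct gen *f);
    int  (*delete_queue_alg)(struct new_queue *q);
};

/* Return 0 if every always-mandatory callback is set, -1 otherwise. */
static int scheduler_check(const struct scheduler *d)
{
    if (d == NULL)
        return -1;
    if (d->create_scheduler_instance == NULL || d->enqueue == NULL ||
        d->dequeue == NULL || d->create_alg_fs == NULL ||
        d->drain_queue == NULL)
        return -1;
    return 0;
}

/* Do-nothing stubs so the descriptor below is complete. */
static int xyz_create_inst(void *s) { (void)s; return 0; }
static int xyz_enqueue(void *s, struct gen *f, struct mbuf *m,
                       struct ipfw_flow_id *id)
{
    (void)s; (void)f; (void)m; (void)id;
    return 0;
}
static struct mbuf *xyz_dequeue(void *s) { (void)s; return NULL; }
static int xyz_drain(void *s, int flag) { (void)s; (void)flag; return 0; }
static int xyz_create_fs(char *c, struct gen *g, int r)
{
    (void)c; (void)g; (void)r;
    return 0;
}

/* Callbacks left out of the initializer are NULL, as the text requires. */
static const struct scheduler xyz_desc = {
    .type = 1,                 /* hypothetical type id */
    .create_scheduler_instance = xyz_create_inst,
    .enqueue = xyz_enqueue,
    .dequeue = xyz_dequeue,
    .drain_queue = xyz_drain,
    .create_alg_fs = xyz_create_fs,
};
```

Designated initializers leave every unnamed function pointer NULL, which matches the rule that unimplemented functions are initialized to NULL.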