In my earlier post, Introduction to Monkey server, I mentioned that the server runs only on Linux operating systems. The reason is that the server depends on a set of non-portable Linux system calls for its performance. One of the most important of these is the epoll facility.
What is epoll ?
Epoll is a feature provided by the Linux kernel to manage file descriptors. It monitors the file descriptors that have been added to an epoll instance to see whether an I/O event is possible on any of them. Each descriptor may be monitored in either edge-triggered or level-triggered mode.
Some of the functions used in this regard are :
- epoll_create1 – This creates an epoll instance. Each epoll instance has a file descriptor. epoll_create1 () returns the efd (epoll file descriptor) of the epoll instance created.
- epoll_ctl – This performs a control operation (add, modify or delete) on a file descriptor with respect to an epoll instance. For example, it may add a particular fd to the epoll instance.
- epoll_wait – This waits on the monitored file descriptors to see if an I/O event is possible. If no event is available on any of the file descriptors monitored by the epoll instance efd, it blocks the calling thread until one arrives or the timeout expires.
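To show how the three calls fit together, here is a minimal sketch of my own that does not come from the Monkey source. It watches the read end of a pipe, and the helper name demo_epoll_pipe is made up for this illustration.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from Monkey: watch the read end of a
 * pipe, make it readable, and return how many descriptors epoll_wait
 * reports as ready (-1 on any error).
 */
int demo_epoll_pipe(void)
{
    int pipefd[2];
    int efd, n = -1;
    struct epoll_event ev = { 0 }, ready[4];

    if (pipe(pipefd) == -1)
        return -1;

    /* epoll_create1: returns the epoll file descriptor (efd). */
    efd = epoll_create1(0);
    if (efd == -1)
        goto out_pipe;

    /* epoll_ctl: add the read end and ask for EPOLLIN events. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto out_efd;

    /* Make the read end readable so epoll_wait has something to report. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_efd;

    /* epoll_wait: block for up to 1000 ms until an event is available. */
    n = epoll_wait(efd, ready, 4, 1000);
    if (n == 1 && ready[0].data.fd != pipefd[0])
        n = -1;

out_efd:
    close(efd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

Since exactly one descriptor is registered and it has data waiting, the helper should return 1.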
Some of the data types used in this regard are :
- epoll_data_t (union epoll_data) – a union of void *ptr, int fd, uint32_t u32 and uint64_t u64. Only one of these members holds a value at a time.
- struct epoll_event – uint32_t events, epoll_data_t data.
Some of the event flags that an epoll instance can watch for, or report, on a file descriptor are as follows :
- EPOLLIN – the fd is ready for a read operation
- EPOLLOUT – the fd is ready for a write operation
- EPOLLERR – indicates that an error happened on the file descriptor
- EPOLLHUP – indicates that a hangup occurred on the fd
- EPOLLRDHUP – indicates that the peer of a stream socket closed the connection, or shut down its writing half
- EPOLLET – sets edge-triggered behaviour for the file descriptor
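The difference between level-triggered and edge-triggered monitoring is easiest to see by polling the same descriptor twice without reading from it. The following is a small experiment of my own, not Monkey code, and the helper name demo_count_reports is hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: register the read end of a pipe with
 * EPOLLIN | extra_flags, write one byte, then poll twice WITHOUT reading.
 * Returns how many of the two polls reported the descriptor, or -1 on error.
 */
int demo_count_reports(uint32_t extra_flags)
{
    int pipefd[2];
    int efd, i, n, reports = -1;
    struct epoll_event ev = { 0 }, out;

    if (pipe(pipefd) == -1)
        return -1;
    efd = epoll_create1(0);
    if (efd == -1)
        goto out_pipe;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1 ||
        write(pipefd[1], "x", 1) != 1)
        goto out_efd;

    reports = 0;
    for (i = 0; i < 2; i++) {
        n = epoll_wait(efd, &out, 1, 0);   /* timeout 0: do not block */
        if (n == -1) {
            reports = -1;
            break;
        }
        reports += n;
    }

out_efd:
    close(efd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return reports;
}
```

Calling demo_count_reports(0) uses the default level-triggered mode, where the unread byte is reported on both polls. Calling demo_count_reports(EPOLLET) reports it only on the first poll, because no new data arrived in between.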
Epoll component in Monkey server
The Monkey server also makes use of the epoll feature provided by the Linux kernel to handle its many file descriptors.
Implementation Details :
The epoll states of all the file descriptors that are being handled by a thread are stored in a red-black tree, in addition to being stored in an epoll queue. Besides this, each thread also has two lists, one for available slots and the other for busy slots. Further, for each file descriptor, a set of data has to be maintained about the behaviour, the events to be monitored, the current mode, etc. The structures that are used to represent the epoll states for a file descriptor and for a thread are as follows :
- struct epoll_state_index – for each thread.
  - size – number of requests that a thread can handle.
  - red black tree for all the epoll state nodes in the thread.
  - available list of those epoll state nodes that are free.
  - busy list of those epoll state nodes being currently used.
- struct epoll_state – for each file descriptor.
  - file descriptor – one whose epoll state this struct represents
  - mode – current mode of the file descriptor (this is an element of events list)
  - events – the list of events that should be monitored
  - behaviour – denoting whether the events should be monitored on the basis of edge triggering or level triggering.
  - A red black node so that it can be added to the red black tree that stores the epoll states for file descriptors for a particular thread.
  - A list node so that it can be added to either the busy queue or available queue that is maintained for each thread.
- struct mk_epoll_handlers – this will be explained in another post.
  - int (*read) (int);
  - int (*write) (int);
  - int (*close) (int);
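To make the two per-thread lists more concrete, here is a simplified sketch of how slots could move between the available and busy lists. It is an assumption-laden illustration and not Monkey's actual code: every demo_ name is hypothetical, the lists are plain singly linked lists, and a linear search stands in for the red-black tree lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One record per file descriptor (hypothetical, simplified layout). */
struct demo_epoll_state {
    int fd;
    uint8_t mode;
    uint32_t events;
    unsigned int behavior;
    struct demo_epoll_state *next;   /* list node: busy or available list */
};

/* One index per thread; the red-black tree is omitted in this sketch. */
struct demo_epoll_state_index {
    int size;
    struct demo_epoll_state *slots;      /* backing storage for all nodes */
    struct demo_epoll_state *available;
    struct demo_epoll_state *busy;
};

/* Allocate `size` slots and put all of them on the available list. */
int demo_state_init(struct demo_epoll_state_index *idx, int size)
{
    int i;

    assert(idx != NULL);
    if (size <= 0)
        return -1;
    idx->slots = calloc((size_t)size, sizeof(*idx->slots));
    if (idx->slots == NULL)
        return -1;
    idx->size = size;
    idx->busy = NULL;
    idx->available = NULL;
    for (i = 0; i < size; i++) {
        idx->slots[i].fd = -1;
        idx->slots[i].next = idx->available;
        idx->available = &idx->slots[i];
    }
    return 0;
}

/* Linear search over the busy list (a tree lookup in the real server). */
struct demo_epoll_state *demo_slot_find(struct demo_epoll_state_index *idx, int fd)
{
    struct demo_epoll_state *s;

    for (s = idx->busy; s != NULL; s = s->next)
        if (s->fd == fd)
            return s;
    return NULL;
}

/* Move a free slot to the busy list for fd; NULL when none is free. */
struct demo_epoll_state *demo_slot_take(struct demo_epoll_state_index *idx, int fd)
{
    struct demo_epoll_state *s = demo_slot_find(idx, fd);

    if (s != NULL)
        return s;              /* fd already has a busy slot */
    s = idx->available;
    if (s == NULL)
        return NULL;
    idx->available = s->next;
    s->fd = fd;
    s->next = idx->busy;
    idx->busy = s;
    return s;
}

/* Move the slot of fd back to the available list; -1 if fd is unknown. */
int demo_slot_release(struct demo_epoll_state_index *idx, int fd)
{
    struct demo_epoll_state **link, *s;

    for (link = &idx->busy; *link != NULL; link = &(*link)->next) {
        if ((*link)->fd == fd) {
            s = *link;
            *link = s->next;
            s->fd = -1;
            s->next = idx->available;
            idx->available = s;
            return 0;
        }
    }
    return -1;
}
```

Preallocating the slots means that no memory is allocated while connections come and go, and the size field naturally caps how many descriptors a thread tracks at once.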
Functions in src/mk_epoll.c :
The Monkey source code has dedicated functions for this important component. The following is a brief description of what each of them does :
- mk_epoll_create () – This creates an epoll instance for a thread, with the help of the system call epoll_create1, and returns the epoll file descriptor for the epoll instance of the thread. If for some reason the epoll instance could not be created (that is, epoll_create1 returned -1), it prints an error.
- mk_epoll_init (int efd, int max_events) – This is the function where the thread waits indefinitely for events to occur on the file descriptors. It does so with the help of the epoll_wait function. Depending on which event occurs on a file descriptor, the corresponding method is called. It does this for all the file descriptors that are present in the epoll instance.
- mk_epoll_state_get (int fd) – This function gets called from two places, the functions mk_epoll_add and mk_epoll_change_mode. Given the file descriptor, this function gets the epoll state of the file descriptor from the red-black tree that contains the epoll states of all file descriptors for the thread. If the epoll state of the file descriptor is not found, it prints a message saying so and returns NULL.
- mk_epoll_state_set (int fd, uint8_t mode, unsigned int behaviour, uint32_t events) – Given the parameters, it searches the red-black tree for the epoll state node whose file descriptor is fd. If the node is already present, it just sets the mode and, if the mode is not SLEEP, also sets the events and behaviour of the fd to the values that have been passed to this function. This case happens when it is called from inside the change_mode function (explained at the end). If the node is not present in the red-black tree (which means that it is a new one), a node is assigned the values that have been passed to this function, such as mode, behaviour and events. This node is then removed from the available queue and added to the busy queue. It is also inserted into the red-black tree for the thread.
- mk_epoll_state_init () – This function initializes values for members of struct epoll_state_index.
- mk_epoll_add (int efd, int fd, int init_mode, unsigned int behaviour) – With the given parameters, it adds the events for the initial mode to the events list of the file descriptor fd. It then calls epoll_ctl with the operation EPOLL_CTL_ADD on fd, to add it to the epoll instance for the thread. At the end it calls the state_set function to add the state to the red-black tree and to one of the lists of the thread.
- mk_epoll_del (int efd, int fd) – This removes fd from the epoll instance efd, with the help of the EPOLL_CTL_DEL operation of the epoll_ctl function. It then calls the mk_epoll_state_del function to remove the stored epoll state of fd.
- mk_epoll_state_del (int fd) – This function removes the epoll state node of fd from the red-black tree where the epoll states are stored, deletes it from the list it is currently on, and adds it to the available queue.
- mk_epoll_change_mode (int efd, int fd, int mode, unsigned int behaviour) – This function is used to change the mode of a particular file descriptor. Using a switch case, it checks the mode against the possible modes. On finding the matching case, it adds the events for that mode to the events list. If the mode is SLEEP, it disables all events in the events list for the file descriptor. If the mode is WAKEUP, it gets the node from the red-black tree and (if the file descriptor is present) restores the events and behaviour stored in the node, which had been disabled when its mode was changed to SLEEP. After these changes, it calls epoll_ctl (efd, EPOLL_CTL_MOD, fd, events_list) to modify the epoll event for fd in the epoll instance of the thread. It then calls the state_set function to record the new mode, events list and behaviour.
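The SLEEP and WAKEUP handling described for mk_epoll_change_mode can be sketched roughly as follows. This is my own simplified reconstruction from the description above, not the code in src/mk_epoll.c. The DEMO_ constants, struct demo_fd_state and both function names are hypothetical, and a single struct stands in for the node kept in the red-black tree.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical mode and behaviour constants; Monkey's own names may differ. */
enum { DEMO_READ, DEMO_WRITE, DEMO_RW, DEMO_SLEEP, DEMO_WAKEUP };
enum { DEMO_LEVEL_TRIGGERED, DEMO_EDGE_TRIGGERED };

/* Stand-in for the per-descriptor epoll state kept by each thread. */
struct demo_fd_state {
    int fd;
    int mode;
    unsigned int behavior;
    uint32_t events;
};

int demo_change_mode(int efd, struct demo_fd_state *st, int mode,
                     unsigned int behavior)
{
    struct epoll_event ev = { 0 };

    assert(st != NULL);
    ev.data.fd = st->fd;
    if (behavior == DEMO_EDGE_TRIGGERED)
        ev.events |= EPOLLET;

    switch (mode) {
    case DEMO_READ:   ev.events |= EPOLLIN; break;
    case DEMO_WRITE:  ev.events |= EPOLLOUT; break;
    case DEMO_RW:     ev.events |= EPOLLIN | EPOLLOUT; break;
    case DEMO_SLEEP:  ev.events = 0; break;        /* disable all events */
    case DEMO_WAKEUP:                              /* restore saved values */
        ev.events = st->events;
        behavior = st->behavior;
        break;
    default:
        return -1;
    }

    if (epoll_ctl(efd, EPOLL_CTL_MOD, st->fd, &ev) == -1)
        return -1;

    /* Record the new mode; keep saved events and behaviour while asleep. */
    st->mode = mode;
    if (mode != DEMO_SLEEP) {
        st->events = ev.events;
        st->behavior = behavior;
    }
    return 0;
}

/* Usage sketch on a pipe: returns 0 if the readable descriptor is reported
 * while awake, hidden while asleep, and reported again after wake-up. */
int demo_sleep_wakeup(void)
{
    int p[2], efd, ok = -1;
    struct demo_fd_state st = { 0 };
    struct epoll_event ev = { 0 }, out;

    if (pipe(p) == -1)
        return -1;
    efd = epoll_create1(0);
    if (efd == -1)
        goto out_pipe;

    st.fd = p[0];
    st.mode = DEMO_READ;
    st.events = EPOLLIN;
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) == -1 ||
        write(p[1], "x", 1) != 1)
        goto out_efd;

    if (epoll_wait(efd, &out, 1, 0) == 1 &&
        demo_change_mode(efd, &st, DEMO_SLEEP, DEMO_LEVEL_TRIGGERED) == 0 &&
        epoll_wait(efd, &out, 1, 0) == 0 &&
        demo_change_mode(efd, &st, DEMO_WAKEUP, DEMO_LEVEL_TRIGGERED) == 0 &&
        epoll_wait(efd, &out, 1, 0) == 1)
        ok = 0;

out_efd:
    close(efd);
out_pipe:
    close(p[0]);
    close(p[1]);
    return ok;
}
```

The point of keeping the old events and behaviour while the descriptor sleeps is that WAKEUP can put them back without the caller having to remember them.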
The workflow below describes where and when epoll is used in the Monkey server, leaving out the other components of the server as far as possible :
- When a thread that will listen for incoming requests is created, an epoll instance is created for it using epoll_create1.
- Now that an epoll instance has been created for the thread, this epoll instance is initialized with values for various fields of the instance.
- As soon as an incoming connection arrives, it is assigned to the least loaded thread, and soon after this the file descriptor for the connection is added to the epoll instance of that thread.
- The epoll instance monitors the events that occur in each file descriptor that has been added to it.
- Depending on whether the event on a file descriptor is read or write or any of the events mentioned in the previous section, the corresponding method is called.
- After all the requests on a particular socket have been completed, it is removed from the epoll instance.
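The monitoring and dispatch steps of this workflow could look roughly like the sketch below. It handles one turn of the loop, whereas the server keeps looping. The handler table has the same shape as struct mk_epoll_handlers, but every demo_ name here is hypothetical and the order in which events are checked is my assumption, not something taken from the Monkey source.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Same shape as the handler table described earlier; names are made up. */
struct demo_handlers {
    int (*read)(int);
    int (*write)(int);
    int (*close)(int);
};

/*
 * One turn of the event loop: wait up to timeout_ms, then call the handler
 * matching each reported event. A negative handler result, an error or a
 * hangup removes the descriptor from the epoll instance and closes it.
 * Returns the number of events handled, or -1 if epoll_wait failed.
 */
int demo_dispatch_once(int efd, const struct demo_handlers *h, int timeout_ms)
{
    struct epoll_event events[16];
    int i, n, fd, ret;

    assert(h != NULL);
    n = epoll_wait(efd, events, 16, timeout_ms);
    for (i = 0; i < n; i++) {
        fd = events[i].data.fd;
        if (events[i].events & EPOLLIN)
            ret = h->read(fd);
        else if (events[i].events & EPOLLOUT)
            ret = h->write(fd);
        else
            ret = -1;   /* EPOLLERR, EPOLLHUP or EPOLLRDHUP */

        if (ret < 0) {
            epoll_ctl(efd, EPOLL_CTL_DEL, fd, NULL);
            h->close(fd);
        }
    }
    return n;
}

/* Example handlers for the sketch above. */
int demo_bytes_read;   /* total bytes consumed by demo_on_read */

int demo_on_read(int fd)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf));

    if (n <= 0)
        return -1;     /* EOF or error: ask the loop to close fd */
    demo_bytes_read += (int)n;
    return 0;
}

int demo_on_write(int fd)
{
    (void)fd;          /* nothing to send in this sketch */
    return 0;
}

int demo_on_close(int fd)
{
    return close(fd);
}
```

A caller would fill a struct demo_handlers with these three functions, add its descriptors to efd with epoll_ctl, and then call demo_dispatch_once in a loop.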