- To include a configuration file for the cache plugin.
- To start integrating cache with the server.
GitHub repo: https://github.com/tssavita/cache-plugin-implementation
Recently, my proposal for Google Summer of Code 2014 with the Monkey HTTP Server was selected. I have posted my GSoC proposal here.
Organization: Monkey Project
Short description: This proposal briefly describes the implementation of a caching plugin for Monkey server. The plugin will benefit the server by reducing its load and improving its performance, enabling it to handle requests from more clients. More details are given in the proposal.
Under high traffic, when servers are loaded with a large number of requests, clients often ask a server for the same information repeatedly, sometimes within a very short span of time. Serving the same data again and again wastes server capacity that could be spent on the requests of other clients.
As a solution to this problem, the proposal introduces a caching plugin for Monkey server. A cache is a collection of locally stored data that is frequently requested by clients or users. Since a cache is closer to the requester, data can be retrieved from it more quickly than from the original source. With a cache, recently requested information is stored so that repeated requests for it can be answered directly from the cache, which improves the server's ability to satisfy the requests of many clients.
By the completion of this internship, the following will have been delivered to the Monkey project.
A cache plugin for Monkey server.
4.1. The two main tasks involved in this project are the following:
Solution 1: Implementation using a min heap.
The first solution is to use a min heap to store the files in the cache. Each node of the heap stores a file, and the nodes are ordered by access count, which indicates how many times the file has been accessed. Each time a file is looked up through the hash table, its access count is incremented by 1 and the heap is rebalanced accordingly. The node at the top of the min heap therefore has the minimum access count. When the cache becomes full, the node at the top (the root node) is deleted, the new resource is inserted, and the heap is rebalanced again.
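Time complexity of the min heap operations:
Insertion: O(log n)
Deletion: O(log n)
Lookup (incrementing the access count and restoring the heap order): O(log n)
Time complexity of hash table operations: O(1)
Total run time of the LFU cache: time complexity of the hash table lookup + time complexity of the min heap operations, i.e. O(1) + O(log n) = O(log n) per access.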
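To make Solution 1 more concrete, here is a minimal C sketch of an LFU cache that combines a hash table (O(1) lookup by file path) with a min heap keyed on access count. It is an illustration under assumptions, not the plugin's code: the names (`lfu_cache`, `lfu_lookup`, `lfu_insert`), the capacity of 4 files, the 64 buckets and the djb2 hash are all hypothetical, and only paths and counts are cached, not file contents.

```c
#include <stdlib.h>
#include <string.h>

#define LFU_CAPACITY 4    /* hypothetical: maximum number of cached files */
#define LFU_BUCKETS  64   /* hypothetical: number of hash table buckets */

struct lfu_entry {
    char *path;               /* key: the requested file */
    unsigned long count;      /* access count, used as the heap priority */
    size_t heap_idx;          /* current position in the heap array */
    struct lfu_entry *next;   /* next entry in the same hash bucket */
};

struct lfu_cache {
    struct lfu_entry *buckets[LFU_BUCKETS];
    struct lfu_entry *heap[LFU_CAPACITY];   /* heap[0] has the lowest count */
    size_t size;
};

static size_t hash_str(const char *s)
{
    unsigned long h = 5381;   /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % LFU_BUCKETS;
}

static void heap_swap(struct lfu_cache *c, size_t a, size_t b)
{
    struct lfu_entry *tmp = c->heap[a];
    c->heap[a] = c->heap[b];
    c->heap[b] = tmp;
    c->heap[a]->heap_idx = a;
    c->heap[b]->heap_idx = b;
}

static void sift_up(struct lfu_cache *c, size_t i)   /* towards the root */
{
    while (i > 0 && c->heap[(i - 1) / 2]->count > c->heap[i]->count) {
        heap_swap(c, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

static void sift_down(struct lfu_cache *c, size_t i) /* towards the leaves */
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, min = i;
        if (l < c->size && c->heap[l]->count < c->heap[min]->count) min = l;
        if (r < c->size && c->heap[r]->count < c->heap[min]->count) min = r;
        if (min == i) return;
        heap_swap(c, i, min);
        i = min;
    }
}

/* O(1) hash lookup; on a hit the count grows, so the node can only sink. */
struct lfu_entry *lfu_lookup(struct lfu_cache *c, const char *path)
{
    for (struct lfu_entry *e = c->buckets[hash_str(path)]; e != NULL; e = e->next) {
        if (strcmp(e->path, path) == 0) {
            e->count++;
            sift_down(c, e->heap_idx);
            return e;
        }
    }
    return NULL;
}

/* Delete the root (least frequently used file) from heap and hash table. */
static void evict_root(struct lfu_cache *c)
{
    struct lfu_entry *victim = c->heap[0];
    struct lfu_entry **pp = &c->buckets[hash_str(victim->path)];
    c->heap[0] = c->heap[--c->size];
    c->heap[0]->heap_idx = 0;
    sift_down(c, 0);
    while (*pp != victim) pp = &(*pp)->next;
    *pp = victim->next;                    /* unlink from its bucket */
    free(victim->path);
    free(victim);
}

/* Add a file that is not cached yet, with count 1; evict the root if full. */
int lfu_insert(struct lfu_cache *c, const char *path)
{
    size_t len = strlen(path) + 1, b = hash_str(path);
    struct lfu_entry *e = malloc(sizeof *e);
    if (e == NULL || (e->path = malloc(len)) == NULL) { free(e); return -1; }
    memcpy(e->path, path, len);
    if (c->size == LFU_CAPACITY)
        evict_root(c);
    e->count = 1;
    e->next = c->buckets[b];
    c->buckets[b] = e;
    e->heap_idx = c->size;
    c->heap[c->size++] = e;
    sift_up(c, e->heap_idx);
    return 0;
}
```

Each entry records its own position in the heap (`heap_idx`). This is what lets a hash table hit reach the heap node directly and restore the heap order in O(log n), instead of searching the heap in O(n). The caller is expected to call `lfu_lookup()` first and `lfu_insert()` only on a miss; ties between equal counts are broken arbitrarily.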
Solution 2: Implementation using two doubly linked lists.
7. Timeline:
Now – April 21st
April 21st – May 19th
May 19th – May 25th
May 26th – June 1st
June 2nd – June 8th
June 9th – June 15th
June 16th – June 22nd
June 23rd – June 29th
Deliverable at the end of this period: the cache plugin will be ready by the end of the midterm evaluation period.
June 30th – July 6th
July 7th – July 17th
July 18th – July 28th
July 29th – August 3rd
August 4th – August 10th
August 11th – August 17th
As I mentioned in my earlier post, Introduction to Monkey, the Monkey server runs only on Linux, and there is a reason for this: the server depends on a set of non-portable Linux system calls for its performance. One of the most important of these is epoll.
epoll is a facility provided by the Linux kernel for monitoring file descriptors. It watches the file descriptors that have been added to an epoll instance to see whether an I/O event is possible on any of them. It can be used in either edge-triggered or level-triggered mode.
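As a minimal, self-contained illustration of the three core calls (`epoll_create1()`, `epoll_ctl()` and `epoll_wait()`), the sketch below registers the read end of a pipe with a new epoll instance and checks when it becomes ready. The function name `epoll_pipe_demo` and the use of a pipe are illustrative choices only; this is not how Monkey itself registers its sockets.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns the number of ready descriptors reported after the pipe is made
 * readable (expected: 1), or -1 on any error.
 */
int epoll_pipe_demo(void)
{
    int fds[2], epfd, n = -1;
    struct epoll_event ev, events[8];

    if (pipe(fds) == -1)
        return -1;
    epfd = epoll_create1(0);                 /* new epoll instance */
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN;                     /* level-triggered; add EPOLLET for edge-triggered */
    ev.data.fd = fds[0];                     /* handed back to us in events[] */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1)
        goto close_all;

    if (epoll_wait(epfd, events, 8, 0) != 0) /* nothing written yet: 0 ready */
        goto close_all;
    if (write(fds[1], "x", 1) != 1)          /* make the read end readable */
        goto close_all;

    n = epoll_wait(epfd, events, 8, 1000);   /* wait at most 1000 ms */
    if (n == 1 && (events[0].data.fd != fds[0] || !(events[0].events & EPOLLIN)))
        n = -1;

close_all:
    close(epfd);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With `EPOLLIN` alone the descriptor is level-triggered: it keeps being reported as long as unread data remains. Adding `EPOLLET` to `ev.events` switches it to edge-triggered mode, where readiness is reported only when new data arrives.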
Some of the functions used for this are:
Some of the data types used for this are:
Some of the events that the epoll instance watches for on a file descriptor are as follows:
Monkey server also uses the epoll facility provided by the Linux kernel to handle its large number of file descriptors.
The epoll states of all the file descriptors handled by a thread are stored in a red-black tree, in addition to being stored in an epoll queue. Besides this, each thread has two lists: one for available slots and the other for busy slots. Further, for each file descriptor, a set of data has to be maintained about its behaviour, the events to be monitored, its current mode, etc. The structures used to represent the epoll states of a file descriptor and of a thread are as follows:
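The available and busy slot lists can be sketched in C as follows. Everything here is hypothetical (the `fd_state` fields, the capacity of 4 slots and the function names); it only shows how a slot moves between the two lists. For simplicity it finds a descriptor's state with a linear search of the busy list, whereas Monkey also keeps these states in a red-black tree.

```c
#include <stddef.h>

#define SLOTS_PER_THREAD 4            /* hypothetical capacity */

/* Hypothetical per-descriptor epoll state kept by a worker thread. */
struct fd_state {
    int fd;
    unsigned int events;              /* events to monitor, e.g. EPOLLIN */
    int mode;                         /* hypothetical: current mode flag */
    struct fd_state *next;            /* link in the available or busy list */
};

struct thread_states {
    struct fd_state slots[SLOTS_PER_THREAD];
    struct fd_state *available;       /* unused slots */
    struct fd_state *busy;            /* slots describing live descriptors */
};

void states_init(struct thread_states *t)
{
    t->available = NULL;
    t->busy = NULL;
    for (size_t i = 0; i < SLOTS_PER_THREAD; i++) {
        t->slots[i].next = t->available;
        t->available = &t->slots[i];
    }
}

/* Take a slot from the available list and move it to the busy list. */
struct fd_state *state_acquire(struct thread_states *t, int fd,
                               unsigned int events, int mode)
{
    struct fd_state *s = t->available;
    if (s == NULL)
        return NULL;                  /* no free slot left */
    t->available = s->next;
    s->fd = fd;
    s->events = events;
    s->mode = mode;
    s->next = t->busy;
    t->busy = s;
    return s;
}

/* Linear search for simplicity; Monkey also stores states in a red-black tree. */
struct fd_state *state_find(struct thread_states *t, int fd)
{
    for (struct fd_state *s = t->busy; s != NULL; s = s->next)
        if (s->fd == fd)
            return s;
    return NULL;
}

/* Move the slot for fd from the busy list back to the available list. */
int state_release(struct thread_states *t, int fd)
{
    for (struct fd_state **pp = &t->busy; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->fd == fd) {
            struct fd_state *s = *pp;
            *pp = s->next;
            s->next = t->available;
            t->available = s;
            return 0;
        }
    }
    return -1;                        /* fd was not registered */
}
```

Preallocating the slots means registering a descriptor never calls `malloc()`; it only moves a node from one list to the other.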
The Monkey source code has separate functions for this important component. The following is a brief description of each function and what it does:
The following workflow shows where and when epoll is used in the Monkey server, leaving out the other components of the server as far as possible:
A reverse proxy routes client requests to backend servers. It sits between the clients and the backend servers: it takes a request from a client, assigns it to one of the servers, fetches the requested resource from that server and returns it to the client. Some of the advantages of using a reverse proxy are:
The following diagram gives a picture of the entire system.
Monkey server provides reverse proxy support through the proxy_reverse plugin. You can read more about it here.
The following is a brief workflow showing where the reverse proxy plugin fits in:
Some of the load balancing algorithms used for this purpose are as follows:
Thanks to Sonny Karlsson, a developer of the Monkey HTTP server, for the idea: he proposed developing a cache for the reverse proxy plugin in Monkey server.
Exception case:
When Monkey server is itself one of the slave servers of the reverse proxy plugin, a recursion can occur. The following diagrams show a normal case and the exceptional case. For this reason, it has to be identified whether a request is coming from a client or from the reverse proxy plugin. Suppose the resource is not present in the cache and the slave server to which the request is going to be redirected is Monkey server itself. The redirected request then arrives like an ordinary incoming request: it is processed, the cache is looked up, the resource is again not found, and the request is redirected to Monkey once more. This would repeat indefinitely. Therefore, if a resource requested through the proxy reverse plugin is not present in the cache and the chosen slave server is Monkey itself, the next slave server in the list is chosen immediately and the request is sent to that server. The following diagram shows the workflow of the entire system.
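The rule for avoiding the recursion can be sketched in C. This is a hypothetical illustration, not code from the proxy_reverse plugin: it assumes that the slave servers are kept in an array, that they are picked round-robin, and that Monkey can recognise itself by comparing a slave's host and port with its own listening address. How the real plugin identifies itself, and which balancing algorithm it uses, may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one slave (backend) server. */
struct slave {
    const char *host;
    int port;
};

/* Nonzero if this slave is the Monkey instance doing the proxying. */
static int is_self(const struct slave *s, const char *self_host, int self_port)
{
    return s->port == self_port && strcmp(s->host, self_host) == 0;
}

/*
 * Pick the slave for a cache miss, round-robin. If the candidate is Monkey
 * itself, the next slave in the list is taken instead, so the request can
 * never loop back. *next keeps the round-robin position between calls.
 * Returns NULL if every slave is Monkey itself.
 */
const struct slave *choose_slave(const struct slave *slaves, size_t n,
                                 size_t *next,
                                 const char *self_host, int self_port)
{
    for (size_t tried = 0; tried < n; tried++) {
        const struct slave *s = &slaves[*next];
        *next = (*next + 1) % n;
        if (!is_self(s, self_host, self_port))
            return s;
    }
    return NULL;
}
```

Bounding the loop by `n` matters: if the list contained only Monkey itself, an unbounded "skip to the next slave" would spin forever, which is the same recursion in a different form.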
Virtual hosting allows one server to act as several different machines. It is essentially a way of supporting multiple host names on one server. For example, two websites, example.com and example.org, may share the same IP address. Even though both are hosted on the same server, their content is kept in different directories.
There are two types of virtual hosting at the moment:
1. Name-based virtual hosting
2. IP-based virtual hosting
The advantage of virtual hosting is that a single server shares its resources, such as processor cycles and memory, among multiple websites whose content is hosted on that server. From the server's point of view, virtual hosting is a method of sharing the server among different domains.
The following are the steps to be followed while configuring support for virtual hosts in Monkey server.
1. You need to change the /etc/hosts file to add multiple hosts to the same IP address. For example,
127.0.0.1 monkey1
127.0.0.1 monkey2
So now we have two virtual hosts assigned to the same IP address.
2. Each of these hosts has its own directory for its resources under the monkey/htdocs/ directory (monkey/htdocs/monkey1 and monkey/htdocs/monkey2).
3. Each host also needs a configuration file that states its document root (the folder where its resources are located) and its ServerName. An example configuration file for one of these hosts is as follows:
[HOST]
    DocumentRoot /home/savita/monkey/htdocs/monkey1
    ServerName monkey1
4. Now, typing the URL https://monkey1:2001/monkey1/amma1.jpg displays in the browser the image stored in the monkey1 folder. For the virtual host monkey2, the resource is located in the monkey2/ directory, so typing https://monkey2:2001/monkey2/amma2.jpg in the address bar fetches and displays the image placed at that location.
This is how serving different virtual hosts works: each HTTP request specifies the host name and the resource URI. The server checks whether that virtual host exists and, within it, whether the requested resource exists. If either does not, an error is returned.
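The two checks described above, first the virtual host and then the resource inside it, can be sketched in C. The `vhost` table, the function name `vhost_resolve` and the HTTP status codes it returns are assumptions for illustration, not Monkey's code. The sketch also does not sanitise the URI (for example against `..` components), which a real server must do.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical virtual host entry: ServerName -> DocumentRoot. */
struct vhost {
    const char *server_name;
    const char *document_root;
};

/*
 * Resolve (host, uri) to a file path. Returns 200 and fills `path` when the
 * host and the resource exist, 404 when either is missing, and 414 if the
 * resulting path does not fit into the buffer.
 */
int vhost_resolve(const struct vhost *hosts, size_t n, const char *host,
                  const char *uri, char *path, size_t path_len)
{
    struct stat st;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(hosts[i].server_name, host) != 0)
            continue;
        /* Found the virtual host: join its document root and the URI. */
        int w = snprintf(path, path_len, "%s%s", hosts[i].document_root, uri);
        if (w < 0 || (size_t)w >= path_len)
            return 414;
        if (stat(path, &st) == -1 || !S_ISREG(st.st_mode))
            return 404;          /* resource not present under this host */
        return 200;
    }
    return 404;                  /* unknown virtual host */
}
```

Because each host has its own document root, the same URI can resolve to different files for monkey1 and monkey2.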
Monkey uses a special method of sharing file descriptors among different requests to the same virtual host, so that it avoids opening numerous file descriptors for the same file at the same time. This is called the File Descriptor Table (FDT).
For each worker (thread), a hash table with 64 entries is maintained, and each entry holds a subarray of 8 chains. When a request arrives at one of the virtual hosts, the worker hashes the name of the requested resource (obtained from the HTTP request) and looks up whether a file descriptor for this resource is already present in its hash table. If it is, the already open file descriptor is reused instead of opening another one for the same file, and the number of users of that file descriptor is incremented. When a request stops using a file, the number of users is decremented and then checked. If it is still greater than zero, the function simply returns, indicating that the file descriptor is still in use. If it has dropped to zero, the file descriptor is closed using the close(2) system call.
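A simplified sketch of such a file descriptor table follows, assuming a per-worker table of 64 entries with 8 chain slots each, as described above. The structure and function names (`fdt_table`, `fdt_open`, `fdt_close`), the fixed path buffer and the djb2 hash are hypothetical; this is not Monkey's actual FDT code. When no slot is free in the hashed entry, the sketch simply opens the file without sharing it.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FDT_TABLE_SIZE   64   /* hash table entries per worker */
#define FDT_TABLE_CHAINS 8    /* chain slots per entry */
#define FDT_PATH_MAX     256  /* hypothetical limit for stored paths */

struct fdt_chain {
    char path[FDT_PATH_MAX];  /* empty string marks a free slot */
    int fd;
    unsigned int readers;     /* requests currently using this fd */
};

struct fdt_table {
    struct fdt_chain chain[FDT_TABLE_SIZE][FDT_TABLE_CHAINS];
};

static unsigned int fdt_hash(const char *path)
{
    unsigned int h = 5381;    /* djb2 string hash */
    while (*path)
        h = h * 33 + (unsigned char)*path++;
    return h % FDT_TABLE_SIZE;
}

/* Open `path` read-only, reusing an already open descriptor if possible. */
int fdt_open(struct fdt_table *t, const char *path)
{
    struct fdt_chain *row = t->chain[fdt_hash(path)], *free_slot = NULL;

    for (int i = 0; i < FDT_TABLE_CHAINS; i++) {
        if (row[i].path[0] == '\0') {
            if (free_slot == NULL)
                free_slot = &row[i];
        } else if (strcmp(row[i].path, path) == 0) {
            row[i].readers++;                 /* share the existing fd */
            return row[i].fd;
        }
    }

    int fd = open(path, O_RDONLY);
    if (fd == -1 || free_slot == NULL || strlen(path) >= FDT_PATH_MAX)
        return fd;                            /* not shared; fdt_close() just closes it */
    strcpy(free_slot->path, path);
    free_slot->fd = fd;
    free_slot->readers = 1;
    return fd;
}

/* Release one user of `fd`; close(2) it only when nobody uses it any more. */
int fdt_close(struct fdt_table *t, const char *path, int fd)
{
    struct fdt_chain *row = t->chain[fdt_hash(path)];

    for (int i = 0; i < FDT_TABLE_CHAINS; i++) {
        if (row[i].path[0] != '\0' && row[i].fd == fd) {
            if (--row[i].readers > 0)
                return 0;                     /* still used by another request */
            row[i].path[0] = '\0';            /* free the slot */
            return close(fd);
        }
    }
    return close(fd);                         /* fd was never shared */
}
```

Since each worker owns its own table, the sketch needs no locking: a thread only ever touches its own `fdt_table`.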
This improves the performance of Monkey server, since the same file is not opened again for every request that needs it.
Under normal conditions, different virtual hosts do not share data with each other. File descriptors are shared only between requests to the same virtual host, not across virtual hosts, mainly so that this option can be enabled or disabled for each virtual host individually.
Monkey server is a lightweight HTTP server for Linux, written entirely in C. It places great importance on performance and has a very small memory footprint, which also makes it well suited to embedded devices. It focuses strongly on Linux because it relies on features of the Linux kernel to improve its performance. It is HTTP/1.1 compliant. Some of the important features of the server are:
The major components of Monkey server are the scheduler, the threads and the plugins. Monkey has a variety of plugins that provide different functionality.
Follow the steps below to install and run Monkey from source code.
git clone https://github.com/monkey/monkey.git
cd monkey
./configure
make
bin/monkey
Open localhost:2001 in your browser and lo! You have successfully installed and run Monkey server.
You can read more about this server here: