The differences and usage of select, poll, and epoll

I/O multiplexing means that a single mechanism can monitor multiple descriptors and, as soon as one of them becomes ready (typically ready for reading or writing), notify the program so that it can perform the corresponding read or write operation.

When a native socket client establishes a connection with the server, the server blocks in its accept call; likewise, both server and client block while sending and receiving data (recv, send, sendall). A native socket server can therefore handle only one client request at a time: it cannot communicate with multiple clients concurrently, so server resources sit idle (the server is occupied only with I/O while the CPU is idle).

Suppose the requirement is that multiple clients connect to the server and the server must handle requests from all of them. A native socket clearly cannot meet this requirement, but the I/O multiplexing mechanism can: it monitors multiple file descriptors at once and, as soon as any descriptor becomes ready, notifies the program to perform the corresponding read or write.

I/O multiplexing in Linux

(1) select

select first appeared in 4.2BSD in 1983. Through a single select() system call it monitors an array of file descriptors; when select() returns, the kernel has set flags on the ready descriptors in that array, so the process can identify them and carry out the subsequent reads and writes.

select is supported on nearly every platform today, and this good cross-platform support is one of its advantages; in fact, looking at it now, it is one of the few advantages it has left.

One drawback of select is the hard limit on the number of file descriptors a single process can monitor, generally 1024 on Linux, although the limit can be raised by modifying the macro definition or even recompiling the kernel.

In addition, select() maintains a data structure holding a large number of file descriptors, and the cost of copying it grows linearly with their count. At the same time, network response latency leaves many TCP connections inactive, yet every call to select() performs a linear scan over all the sockets, wasting further overhead.
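To make these costs concrete, here is a minimal sketch of a select-based loop (a sketch only, not code from this article's demo; listenFd stands for an assumed listening socket):

    #include <sys/select.h>

    fd_set rset;
    int maxfd = listenFd;
    for (;;) {
        FD_ZERO(&rset);                    // the set must be rebuilt before every call
        FD_SET(listenFd, &rset);
        /* ... FD_SET() every connected client fd here, updating maxfd ... */
        int n = select(maxfd + 1, &rset, NULL, NULL, NULL);  // copies the set into the kernel and blocks
        if (n < 0) { perror("select"); break; }
        if (FD_ISSET(listenFd, &rset)) {
            /* accept() the new client and start watching its fd */
        }
        /* ... FD_ISSET() each client fd and recv()/send() on the ready ones ... */
    }

Both the copy of the fd_set and the FD_ISSET scan repeat on every iteration, which is exactly the linear overhead described above.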

(2) poll

poll was born in System V Release 3 in 1986. It is essentially no different from select, except that poll places no limit on the maximum number of file descriptors.

poll shares a drawback with select: the array containing the many file descriptors is copied wholesale between user space and the kernel's address space regardless of whether those descriptors are ready, so the overhead grows linearly with the number of descriptors.

Moreover, after select() or poll() reports ready file descriptors to the process, if the process performs no I/O on them, they are reported again on the next call, so ready notifications are generally not lost. This behavior is called level triggered (Level Triggered).
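For comparison, a minimal sketch of the equivalent poll loop (again illustrative, with the same assumed listenFd); there is no FD_SETSIZE cap, but the whole pollfd array is still copied into the kernel and scanned on every call:

    #include <poll.h>

    struct pollfd fds[4096];
    int nfds = 0;
    fds[nfds].fd = listenFd;               // watch the listening socket for input
    fds[nfds].events = POLLIN;
    nfds++;
    for (;;) {
        int n = poll(fds, nfds, -1);       // -1: block until at least one fd is ready
        if (n < 0) { perror("poll"); break; }
        for (int i = 0; i < nfds; i++) {   // linear scan of every watched entry
            if (fds[i].revents & POLLIN) {
                /* accept() on listenFd, or recv() on a client fd */
            }
        }
    }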

(3) epoll

Not until Linux 2.6 did an implementation supported directly by the kernel appear: epoll. It has almost every advantage mentioned so far and is widely regarded as the best-performing multiplexed I/O readiness notification method under Linux 2.6.

epoll supports both level triggering and edge triggering (Edge Triggered: the process is told only which file descriptors have just become ready, and it is told exactly once; if we take no action, it will not tell us again). In theory edge triggering performs better, but the code is considerably more complex.
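The extra complexity comes from the draining requirement. Below is a sketch of the edge-triggered pattern (assuming fd is a connected socket and epollFd an existing epoll instance; both names are placeholders): the descriptor must be non-blocking, and after each readiness notification it must be read until EAGAIN, or the remaining data is never reported again:

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);  // ET requires non-blocking fds
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    epoll_ctl(epollFd, EPOLL_CTL_ADD, fd, &ev);
    /* ... later, when epoll_wait() reports fd readable: */
    for (;;) {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* process n bytes */
        } else if (n == 0) {               // peer closed the connection
            close(fd);
            break;
        } else if (errno == EAGAIN) {      // drained: wait for the next edge
            break;
        } else {
            perror("read"); close(fd); break;
        }
    }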

epoll likewise reports only the ready file descriptors, and when we call epoll_wait() to obtain them, what is returned is not the actual descriptors but a value representing how many are ready; you then simply fetch that many file descriptors from an array designated by epoll. Memory mapping (mmap) is used here as well, completely eliminating the overhead of copying these file descriptors during the system call.

Another fundamental improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all the monitored file descriptors only after the process calls one of those functions; with epoll, a file descriptor is registered in advance via epoll_ctl(), and as soon as it becomes ready, the kernel uses a callback-like mechanism to activate it quickly, so the process is notified when it calls epoll_wait().

Summary:

select

The main drawbacks of select:

(1) Every call to select copies the fd set from user space into kernel space, which becomes expensive when there are many fds.

(2) Every call to select also makes the kernel traverse all the fds passed in, which is likewise expensive when there are many fds.

(3) select supports too few file descriptors; the default is 1024.

poll

poll's mechanism is similar to select's and essentially no different: it too polls a set of descriptors and handles them according to their state, but it has no limit on the maximum number of file descriptors. poll shares select's drawback that the array containing the many file descriptors is copied wholesale between user space and the kernel's address space regardless of whether the descriptors are ready, so the overhead grows linearly with their number.

epoll

epoll was introduced in the 2.6 kernel as an enhanced version of select and poll. Compared with them, epoll is more flexible and has no descriptor limit. epoll uses one file descriptor to manage many others, storing the events of the file descriptors the user cares about in a kernel event table, so the copy between user space and kernel space happens only once.

Finally, epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); is called to wait for events to arrive; its return value is the number of events that need handling, and events holds the set of events to process.

One-sentence summary

(1) select and poll must themselves keep polling the entire fd set until a device is ready, possibly alternating between sleeping and waking many times along the way. epoll also keeps polling a ready list via epoll_wait, and may likewise alternate between sleeping and waking, but when a device becomes ready it invokes a callback that puts the ready fd onto the ready list and wakes the process sleeping in epoll_wait. So although all of them sleep and wake, select and poll must traverse the whole fd set while "awake", whereas epoll only needs to check whether the ready list is empty, saving a great deal of CPU time. This is the performance gain the callback mechanism delivers.

(2) select and poll copy the fd set from user space to kernel space on every call, and enqueue current on the device wait queues every time, whereas epoll copies only once and enqueues current only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined internally by epoll). This too saves considerable overhead.

How to use epoll

The epoll interface is very simple: just three functions in total.

1. epoll_create

/*
size: in recent Linux kernel implementations this argument carries no meaning (though it must be greater than 0).
Return value: a file descriptor, used as an argument to the two functions below.
*/
int epoll_create(int size);

This function creates a kernel event table inside the kernel; the returned descriptor is then used to manage it.

2. epoll_ctl

/*
epfd: the file descriptor referring to the kernel event table, i.e. the return value of epoll_create
op: the operation to perform on the kernel event table:
	EPOLL_CTL_ADD (add a file descriptor to the kernel event table, i.e. register it);
	EPOLL_CTL_MOD (modify an event in the kernel event table);
	EPOLL_CTL_DEL (delete an event from the kernel event table);
fd: the file descriptor to operate on
event: pointer to a struct epoll_event
*/
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

This is epoll's event registration function: epoll_ctl adds, modifies, or removes events of interest on the epoll object. It returns 0 on success and -1 on failure, in which case the errno error code must be examined to determine the error type.

The event structure

struct epoll_event
{
    /*
    Stores the events the user is interested in and the ready events.
    events can be a combination of the following macros:
    EPOLLIN : the corresponding file descriptor is readable (including a normal close by the peer socket);
    EPOLLOUT: the corresponding file descriptor is writable;
    EPOLLPRI: the corresponding file descriptor has urgent data to read (i.e. out-of-band data has arrived);
    EPOLLERR: an error occurred on the corresponding file descriptor;
    EPOLLHUP: the corresponding file descriptor was hung up;
    EPOLLET : puts epoll into edge-triggered (Edge Triggered) mode, as opposed to level-triggered (Level Triggered) mode;
    EPOLLONESHOT: listen for the event only once; to keep monitoring the socket afterwards, it must be added to the epoll set again.
    */
    uint32_t events; 
    epoll_data_t data; // within the union, fd matters most: the file descriptor to operate on
};
 
typedef union epoll_data
{
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;
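Putting epoll_create, the event structure, and epoll_ctl together, a minimal registration might look like this (a sketch; listenFd is an assumed listening socket):

    int epollFd = epoll_create(1024);      // size is ignored on modern kernels but must be > 0
    if (epollFd == -1) {
        perror("epoll_create");
        return -1;
    }
    struct epoll_event ev;
    ev.events = EPOLLIN;                   // level-triggered readability (the default mode)
    ev.data.fd = listenFd;
    if (epoll_ctl(epollFd, EPOLL_CTL_ADD, listenFd, &ev) == -1)
        perror("epoll_ctl");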

3. epoll_wait

/*
epfd: same as in the functions above
events: array that receives the ready events returned by the kernel
maxevents: the maximum number of events the user can handle
timeout: how long to wait for I/O, in ms (set to -1 in the code below, meaning never time out)
Return value: the number of ready events
*/
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Waits for events to occur, similar to a select() call. The events parameter receives the set of events from the kernel, and maxevents tells the kernel how large events is; maxevents must not exceed the size passed to epoll_create(). The timeout parameter is the timeout in milliseconds (0 returns immediately, -1 blocks indefinitely). The function returns the number of events that need handling; 0 means the call timed out, and -1 indicates an error, in which case the errno error code must be checked to determine the error type.

The following echo client/server example demonstrates how epoll is used.

Server-side event loop

    int epollFd;
    struct epoll_event events[MAX_EVENTS];
    int ret;
    char buf[MAXSIZE];
    memset(buf,0,MAXSIZE);
    //create an epoll descriptor, through which multiple descriptors are managed
    epollFd = epoll_create(FDSIZE);
    //register the listening descriptor for read events
    add_event(epollFd,listenFd,EPOLLIN);
    while(1){
        //block until some descriptor events are ready
        ret = epoll_wait(epollFd, events, MAX_EVENTS,-1);
        //handle the events; ret is the number of events that occurred
        handle_events(epollFd,events,ret,listenFd,buf);
    }
    close(epollFd);
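The demo relies on two helpers, add_event and handle_events, whose full definitions are in the source linked at the end; what follows is only a hypothetical sketch of their shape, consistent with the echo behavior described above:

    // hypothetical sketches only; the real implementations are in the linked source
    static void add_event(int epollFd, int fd, uint32_t event) {
        struct epoll_event ev;
        ev.events = event;
        ev.data.fd = fd;
        epoll_ctl(epollFd, EPOLL_CTL_ADD, fd, &ev);   // register fd for the given event
    }

    static void handle_events(int epollFd, struct epoll_event *events, int num,
                              int listenFd, char *buf) {
        for (int i = 0; i < num; i++) {
            int fd = events[i].data.fd;
            if (fd == listenFd && (events[i].events & EPOLLIN)) {
                /* accept() the new client, then add_event() it with EPOLLIN */
            } else if (events[i].events & EPOLLIN) {
                /* read() into buf, then EPOLL_CTL_MOD the fd to EPOLLOUT to echo */
            } else if (events[i].events & EPOLLOUT) {
                /* write() buf back, then EPOLL_CTL_MOD the fd back to EPOLLIN */
            }
        }
    }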

Client side

    int                 sockfd;
    struct sockaddr_in  servaddr;
    sockfd = socket(AF_INET,SOCK_STREAM, IPPROTO_TCP);
    bzero(&servaddr,sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    servaddr.sin_addr.s_addr = inet_addr(IPADDRESS);
    printf("start\n");
    if(connect(sockfd,(struct sockaddr*)&servaddr, sizeof(servaddr)) < 0){
        perror("connect err: ");
        return 0;
    }
    else{
        printf("connect succ\n");
    }
    //handle the connection
    handle_connection(sockfd);
    close(sockfd);
    return 0;

Program output

Client

./cli
start
connect succ
cli hello
epollfd 4, rdfd 0, sockfd 3, read 10
epollfd 4, wrfd 3, sockfd 3, write 10
epollfd 4, rdfd 3, sockfd 3, read 10
cli hello
epollfd 4, wrfd 1, sockfd 3, write 10
cli over
epollfd 4, rdfd 0, sockfd 3, read 9
epollfd 4, wrfd 3, sockfd 3, write 9
epollfd 4, rdfd 3, sockfd 3, read 9
cli over
epollfd 4, wrfd 1, sockfd 3, write 9
^C

Server

./srv
accept a new client: 127.0.0.1:37098, fd = 5
read fd=5, num read=10
read message is : cli hello
write fd=5, num write=10
read fd=5, num read=9
read message is : cli over
write fd=5, num write=9
read fd=5, num read=0
client close.
^C

This article has briefly summarized how select, poll, and epoll are used and their respective strengths and weaknesses, along with an epoll demo for reference; for the detailed runtime mechanics, refer to the article below.

The full program source code is in the article《select,poll,epoll的区别以及使用方法》on the WeChat official account xutopia77.
