At the most basic level, whenever a browser needs a file that is hosted on a web server, the browser requests the file via HTTP. When the request reaches the correct (hardware) web server, the (software) HTTP server accepts the request, finds the requested document, and sends it back to the browser, also through HTTP.
HTTP servers usually use TCP for communication.
Socket : mechanism to give programs access to the network
- create a socket with socket()
- identify the socket with bind()
- wait for a connection with listen() and accept()
- send and receive messages with read() and write() (or send() and recv())
- close the socket with close()
There are some explanations about those functions in the following section.
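The steps listed above can be sketched as follows. This is a minimal, blocking illustration and not the server's actual code: the names make_listener and serve_one_echo, the backlog of 16 and the loopback address are assumptions chosen for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to
 * 127.0.0.1:port, or -1 on error. Port 0 lets the kernel pick a free port. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* create the socket */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || /* identify it */
        listen(fd, 16) < 0) {                                    /* start listening */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accepts one client, echoes back the first chunk it
 * sends, then closes the connection. Returns 0 on success, -1 on error. */
int serve_one_echo(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL);        /* new socket per client */
    if (client < 0)
        return -1;

    char buf[1024];
    ssize_t n = read(client, buf, sizeof(buf));        /* receive */
    if (n > 0)
        n = write(client, buf, (size_t)n);             /* send */
    close(client);                                     /* close */
    return n < 0 ? -1 : 0;
}
```

A caller would create the listener once with make_listener() and then call serve_one_echo() for each incoming connection; a real server would monitor many sockets at once instead of blocking on one.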
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
- htons() : converts a short (16-bit) integer, e.g. a port number, from host byte order to network byte order
- etc.
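As a small illustration of what the conversion does, the sketch below copies out the bytes of a port number after htons(). The helper name port_wire_bytes and the port 8080 used in the note are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the two bytes of `port` in the order they
 * would appear on the wire (network byte order, most significant first). */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);      /* host order -> network order */
    memcpy(out, &net, sizeof(net));  /* inspect the raw bytes */
}
```

For port 8080 (0x1F90) the bytes are 0x1F then 0x90 whatever the host's endianness, and ntohs() converts the value back.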
epoll is a Linux API used to monitor several sockets at once. It reports a monitored socket either when its state changes (edge-triggered) or for as long as it stays ready (level-triggered). epoll can handle a large number of socket descriptors. An epoll instance contains an internal structure made of two lists :
- an interest list, which corresponds to all the file descriptors monitored
- a ready list, which corresponds to the file descriptors ready for I/O
By default, epoll is level-triggered; the EPOLLET flag requests edge-triggered notification.
int epoll_create(int nb);
creates a new epoll instance and returns a descriptor.
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
changes the behavior of our epoll instance.
- epfd is the descriptor of the epoll instance created
- op is the operation wanted on the epoll structure (for example add a new fd to the interest list, modify it or delete it)
- fd is the concerned descriptor
- event should be filled with the concerned fd and the flags we want to apply to this fd
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
waits for an event on any descriptor in the interest list.
- epfd is the descriptor of the epoll instance created
- maxevents is the maximum number of events returned
- events is used to return information from the ready list
This function will block until :
- a file descriptor delivers an event
- the call is interrupted by a signal handler
- the timeout expires
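The three calls can be combined as in the sketch below, which waits for a single descriptor to become readable. It is Linux-only and deliberately simplified: the helper name wait_readable is an assumption, and a real server would keep one epoll instance alive and register many descriptors rather than creating one per call.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd becomes readable within timeout_ms,
 * 0 if the timeout expires first, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    int epfd = epoll_create(1);              /* new epoll instance */
    if (epfd < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;                     /* level-triggered "readable" */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {  /* add to interest list */
        close(epfd);
        return -1;
    }

    struct epoll_event ready[1];
    int n = epoll_wait(epfd, ready, 1, timeout_ms);     /* read the ready list */
    close(epfd);
    if (n < 0)
        return -1;
    return (n == 1 && ready[0].data.fd == fd) ? 1 : 0;
}
```

On an empty pipe, wait_readable(read_end, 0) returns 0 immediately; once a byte has been written to the other end it returns 1.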
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
- return value : socket descriptor (like a file descriptor)
- domain : specifies the communication domain (local AF_LOCAL, through an internet protocol AF_INET, etc.)
- type : specifies the semantics of communication over the socket (SOCK_STREAM, SOCK_DGRAM, ...)
- protocol : specifies a protocol to use; it should be consistent with the domain. Protocol values are listed in /etc/protocols
int accept(int sockfd, struct sockaddr *restrict addr, socklen_t *restrict addrlen);
accept grabs the first pending connection request and creates a new socket for communication (the listening socket should be used only for listening). addr and addrlen are filled in by the function; addrlen must be initialized to the size of the buffer pointed to by addr before the call.
int listen(int sockfd, int backlog);
marks the socket sockfd as a listening socket. The backlog argument defines the maximum length of the queue of pending connection requests.
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
The only difference between write() and send() is the presence of flags.
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
The only difference between read() and recv() is the presence of flags.
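To show what the flags argument adds, the sketch below uses MSG_PEEK to look at pending data without consuming it, then reads it normally with flags set to 0. The helper name peek_then_read is an assumption made for the example.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: peeks at the pending data, then consumes exactly
 * what was peeked. Returns the number of bytes read, or -1 on error. */
ssize_t peek_then_read(int fd, char *buf, size_t len)
{
    ssize_t peeked = recv(fd, buf, len, MSG_PEEK);  /* data stays queued */
    if (peeked < 0)
        return -1;
    return recv(fd, buf, (size_t)peeked, 0);        /* flags 0: like read() */
}
```

Because the first call only peeks, both calls see the same bytes; with read() there is no way to ask for this behavior.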
bind() is almost like assigning an address to a mailbox:
int bind(int sockfd, const struct sockaddr *address, socklen_t address_len);
struct sockaddr_in
{
    __uint8_t sin_len; /* present on BSD and macOS only */
    sa_family_t sin_family;
    in_port_t sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
with :
- sin_family = domain
- sin_port = a port number, in network byte order (see htons())
- sin_addr = address for the socket (for example inet_addr("127.0.0.1") or a constant like INADDR_ANY)
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
Connects the socket sockfd to the address specified by addr.
#include <arpa/inet.h>
in_addr_t inet_addr(const char *cp);
converts the string cp to an integer value suitable for use as an Internet address.
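A short sketch of how the return value of inet_addr() might be checked. The wrapper name parse_ipv4 is an assumption; the INADDR_NONE error value is also what inet_addr() returns for the valid address "255.255.255.255", which is a known limitation of this function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: parses a dotted-decimal IPv4 string into `out`
 * (network byte order). Returns 0 on success, -1 on invalid input. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    in_addr_t value = inet_addr(text);
    if (value == INADDR_NONE)   /* error; also returned for 255.255.255.255 */
        return -1;
    out->s_addr = value;
    return 0;
}
```

The resulting struct in_addr can be assigned directly to the sin_addr field of a struct sockaddr_in.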
int setsockopt(int sockfd, int level, int option_name, const void *option_value, socklen_t option_len);
sets the option specified by option_name, at the protocol level specified by level, to the value pointed to by option_value, for the socket sockfd.
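One common use, shown here as a sketch, is enabling SO_REUSEADDR so that a restarted server can bind to its port again without waiting. Choosing this option is an assumption made for illustration, and the helper names enable_reuseaddr and reuseaddr_is_set are hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: turns on SO_REUSEADDR. Returns 0 on success, -1 on error. */
int enable_reuseaddr(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}

/* Hypothetical helper: returns 1 if SO_REUSEADDR is set, 0 if not, -1 on error. */
int reuseaddr_is_set(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

The option has to be set after socket() and before bind() to have an effect on the bind.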
#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* arg */ );
performs the operation selected by cmd on the open file descriptor fd.
HTTP defines a set of request methods (or verbs) that indicate the action to be performed on a given resource.
The HTTP/1.0 specification defined the GET, HEAD, and POST methods, and the HTTP/1.1 specification added five new methods: PUT, DELETE, CONNECT, OPTIONS, and TRACE.
- GET = requests that the target resource transfer a representation of its state. Requests using GET should only retrieve data without making changes.
- HEAD = asks for a response identical to a GET request, but without the response body (only the header).
- POST = submits an entity to the specified resource, often causing a change in state or side effects on the server.
- PUT = requests that the target resource create or update its state with the state defined by the submitted request. A distinction from POST is that the client specifies the target location on the server.
- DELETE = deletes the specified resource.
- CONNECT = establishes a tunnel to the server identified by the target resource, typically through a proxy.
- OPTIONS = requests that the target resource transfer the HTTP methods that it supports. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.
- TRACE = requests that the target resource transfer the received request in the response body. That way a client can see what (if any) changes or additions have been made by intermediaries.
- PATCH = applies partial modifications to a resource.
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0
Accept: */*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
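The first line of such a request (method, target, version) could be split as in the sketch below. The struct and function names, and the buffer sizes, are assumptions made for the example; overlong fields are not rejected cleanly, so this is an illustration rather than a robust parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of a request line. */
struct request_line {
    char method[16];
    char target[1024];
    char version[16];
};

/* Hypothetical helper: splits "METHOD TARGET VERSION" into its parts.
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_request_line(const char *line, struct request_line *out)
{
    /* Field widths keep sscanf from writing past the buffers. */
    if (sscanf(line, "%15s %1023s %15s",
               out->method, out->target, out->version) != 3)
        return -1;
    if (strncmp(out->version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

For the line "GET / HTTP/1.1" this yields the method "GET", the target "/" and the version "HTTP/1.1"; the header lines that follow would be parsed separately, one "Name: value" pair per line.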
A response message is sent by a server to a client as a reply to its former request message. Header fields define how information is sent and received, how the session is verified and the client identified (cookies, IP, user-agent) or kept anonymous (VPN or proxy), how the server should handle data (Do-Not-Track), and so on.
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 155
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: BestServ (Unix) (Red-Hat/Linux)
ETag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Connection: close

<html>
<head>
<title>An Example Page</title>
</head>
<body>
<p>Example of a server response.</p>
</body>
</html>
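A response of this shape could be assembled as in the sketch below, which computes Content-Length from the body and separates headers from body with an empty line. The function name build_ok_response and the fixed set of headers are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a complete "200 OK" response for an HTML body
 * into `out`. Returns the total length, or -1 if the buffer is too small. */
int build_ok_response(char *out, size_t cap, const char *body)
{
    int n = snprintf(out, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html; charset=UTF-8\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"              /* empty line: end of headers */
                     "%s",
                     strlen(body), body);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The returned length is the number of bytes to pass to write() or send(). Using strlen() limits this sketch to text bodies; binary content would need an explicit length.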
Example of a complete configuration file (inspired by nginx.conf):
server {
    listen HOST:PORT
    server_name SERVER_NAME
    root ROOT
    index INDEX
    client_body MAX_CLIENT_BODY
    methods METHOD1 METHOD2 ...
    error_page NUM_ERROR ERROR_FILE
    location *.php {
        cgi_pass CGI
    }
}
server {
    listen HOST:PORT
    server_name SERVER_NAME
    root ROOT
    index INDEX
    client_body MAX_CLIENT_BODY
    methods METHOD1 METHOD2 ...
    location /DIRECTORY1 {
        root ROOT
        index INDEX
        client_body MAX_CLIENT_BODY
        methods METHOD1 METHOD2 ...
    }
    location /DIRECTORY2 {
        root ROOT
        index INDEX
        client_body MAX_CLIENT_BODY
        methods METHOD1 METHOD2 ...
        autoindex
    }
    location /TOREDIR {
        return REDIR_URL
    }
}
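As an illustration of how one directive from this file might be interpreted, the sketch below splits the HOST:PORT value of a listen line. The function name parse_listen and the accepted port range are assumptions; the actual parser of the project may work differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "HOST:PORT" into a host string and a port.
 * Returns 0 on success, -1 if the value is malformed. */
int parse_listen(const char *value, char *host, size_t host_cap, int *port)
{
    const char *colon = strrchr(value, ':');
    if (colon == NULL || colon == value)      /* no port, or empty host */
        return -1;

    size_t host_len = (size_t)(colon - value);
    if (host_len >= host_cap)
        return -1;

    char *end;
    long p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
        return -1;

    memcpy(host, value, host_len);
    host[host_len] = '\0';
    *port = (int)p;
    return 0;
}
```

With the value "127.0.0.1:8080" the host becomes "127.0.0.1" and the port 8080; values without a colon or with a non-numeric port are rejected.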
CGI (Common Gateway Interface) enables web servers to execute an external program, for example to process a user request.
Those programs require additional information (passed as environment variables) to be executed. In return they provide all the information needed by the server to respond to the client.
Our server should be able to specify which URLs should be handled by a specific CGI (see the location *.php { cgi_pass CGI_PATH } blocks).
As mentioned in the subject, we can fork to execute the CGI.
execve(CGI_PATH, args, env);
where env is filled with the variables listed below.
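The fork and execve steps could look like the sketch below, which captures the program's standard output through a pipe. The function name run_cgi and its signature are assumptions; passing the request body on the program's standard input and handling timeouts are left out.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` with the given argv and environment and
 * stores its standard output in `out` (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_cgi(const char *path, char *const argv[], char *const envp[],
                char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the pipe its standard output, then become the CGI. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[1]);
        execve(path, argv, envp);
        _exit(127);                 /* only reached if execve failed */
    }

    /* Parent: read until the child closes its end of the pipe. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

The envp array is where entries such as "REQUEST_METHOD=GET" go, each as a "NAME=value" string, with a NULL pointer terminating the array.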
Server specific variables :
- SERVER_SOFTWARE : name/version of HTTP server.
- SERVER_NAME : host name of the server, may be dot-decimal IP address.
- GATEWAY_INTERFACE : CGI/version.
Request specific variables :
- SERVER_PROTOCOL : HTTP/version.
- SERVER_PORT : TCP port (decimal).
- REQUEST_METHOD : name of HTTP method (see above)
- PATH_INFO : path suffix, if appended to URL after program name and a slash
- PATH_TRANSLATED : corresponding full path as supposed by server, if PATH_INFO is present.
- SCRIPT_NAME : relative path to the program, like /cgi-bin/script.cgi.
- QUERY_STRING : the part of URL after the ? character
- REMOTE_HOST : host name of the client, unset if server did not perform such lookup
- REMOTE_ADDR : IP address of the client (dot-decimal)
- AUTH_TYPE : identification type, if applicable
- REMOTE_USER : used for certain AUTH_TYPEs
- REMOTE_IDENT : see ident, only if server performed such lookup.
- CONTENT_TYPE : Internet media type of input data if PUT or POST method are used, as provided via HTTP header
- CONTENT_LENGTH : similarly, size of input data (decimal, in octets) if provided via HTTP header
- Variables passed by user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding HTTP headers and therefore have the same sense.
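The names of the user-agent variables follow a mechanical rule: the header name is upper-cased, each '-' becomes '_', and HTTP_ is prepended. A sketch of that conversion is given below; the function name header_to_cgi_name is an assumption. Content-Type and Content-Length are the exception, since they map to CONTENT_TYPE and CONTENT_LENGTH without the prefix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts e.g. "Accept-Language" to
 * "HTTP_ACCEPT_LANGUAGE". Returns 0 on success, -1 if `out` is too small. */
int header_to_cgi_name(const char *header, char *out, size_t cap)
{
    static const char prefix[] = "HTTP_";
    size_t plen = sizeof(prefix) - 1;
    size_t hlen = strlen(header);
    if (plen + hlen + 1 > cap)
        return -1;

    memcpy(out, prefix, plen);
    for (size_t i = 0; i < hlen; i++) {
        unsigned char c = (unsigned char)header[i];
        out[plen + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[plen + hlen] = '\0';
    return 0;
}
```

The result is only the variable name; the full environment entry is that name, an '=' sign, and the header's value.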
Convention : we should have a cgi-bin directory in our root
Uploading files could be handled by CGI (for example php)