
work together


man

  • htons, htonl, ntohs, ntohl : converts the unsigned short or integer argument between host byte order and network byte order
  • poll : waits for one of a set of file descriptors to become ready to perform I/O
    • alternatives : select, epoll (epoll_create, epoll_ctl, epoll_wait), kqueue (kqueue, kevent)
  • socket : creates an endpoint for communication and returns a file descriptor that refers to that endpoint
  • listen : marks a socket as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept()
  • accept : used with connection-based socket types. It extracts the first connection request on the queue of pending connections for the listening socket, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket is unaffected by this call
  • send : (~write) used to transmit a message to another socket. May be used only when the socket is in a connected state (so that the intended recipient is known). The only difference between send() and write() is the presence of flags. With a zero flags argument, send() is equivalent to write()
  • recv : (~read) used to receive messages from a socket. May be used to receive data on both connectionless and connection-oriented sockets. The only difference between recv() and read() is the presence of flags. With a zero flags argument, recv() is generally equivalent to read()
  • bind : associate a socket fd to a local address. When a socket is created with socket(), it exists in a name space (address family) but has no address assigned to it. It is normally necessary to assign a local address using bind() before a socket may receive connections (see accept())
  • connect : connects a socket fd to a remote address
  • inet_addr : converts the Internet host address cp from IPv4 numbers-and-dots notation into binary data in network byte order. Use of this function is problematic because in case of error it returns -1, which is a valid address (255.255.255.255). Avoid its use in favor of inet_aton(), inet_pton(), or getaddrinfo()
  • setsockopt : manipulate options for a socket fd. Options may exist at multiple protocol levels; they are always present at the uppermost socket level
  • getsockname : returns the current address to which a socket fd is bound
  • fcntl : manipulates an open fd, for example by duplicating it or changing its flags
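
A minimal sketch of how several of the calls above could fit together to open a non-blocking listening socket. The helper names `make_listener` and `bound_port` are hypothetical, and the sketch assumes IPv4 only and uses inet_pton() instead of inet_addr(), as recommended above.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: returns a non-blocking listening fd, or -1 on failure.
int make_listener(const char *ip, unsigned short port)
{
    assert(ip != 0);

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                 // host -> network byte order
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                               // not a valid IPv4 address

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0
        || bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0
        || listen(fd, SOMAXCONN) < 0
        || fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: reads back the port the socket is bound to
// (useful when port 0 was requested and the kernel picked one).
int bound_port(int fd)
{
    sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0)
        return -1;
    return ntohs(addr.sin_port);                 // network -> host byte order
}
```

The returned fd is then meant to be registered with poll(), and accept() called only once poll() reports it readable.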

todo

  • read the RFC and do some tests with telnet and NGINX

parsing config

  • Your program has to take a configuration file as argument, or use a default path.
  • Choose the port and host of each server.
  • Setup the server_names or not.
  • The first server for a host:port will be the default for this host:port (that means it will answer all the requests that don't belong to another server).
  • Setup default error pages.
  • Limit client body size.
  • Setup routes with one or multiple of the following rules/configuration (routes won't be using regexp):
    • Define a list of accepted HTTP methods for the route.
    • Define a HTTP redirection.
    • Define a directory or a file from where the file should be searched (for example, if url /kapouet is rooted to /tmp/www, url /kapouet/pouic/toto/pouet is /tmp/www/pouic/toto/pouet).
    • Turn on or off directory listing.
    • Set a default file to answer if the request is a directory.
    • Execute CGI based on certain file extension (for example .php).
    • Make the route able to accept uploaded files and configure where they should be saved.
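
The root rule above (/kapouet rooted to /tmp/www) can be sketched as a small string mapping. `resolve_path` is a hypothetical helper, not the project's actual code. It assumes the configured root has no trailing slash and does not handle ".." segments, which a real server would need to reject.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a request URL onto the filesystem for one route.
// Returns an empty string when the URL does not belong to the route.
std::string resolve_path(const std::string &route, const std::string &root,
                         const std::string &url)
{
    assert(!route.empty() && route[0] == '/');   // routes come from the config

    if (url.compare(0, route.size(), route) != 0)
        return "";
    // The match must stop on a path-segment boundary, so that the route
    // "/kapouet" does not capture the URL "/kapouetfoo".
    bool at_boundary = url.size() == route.size()
        || url[route.size()] == '/'
        || route[route.size() - 1] == '/';
    if (!at_boundary)
        return "";

    std::string rest = url.substr(route.size());
    if (!rest.empty() && rest[0] != '/')
        rest = "/" + rest;                       // route ended with '/'
    return root + rest;
}
```

With route "/kapouet" and root "/tmp/www", the url "/kapouet/pouic/toto/pouet" maps to "/tmp/www/pouic/toto/pouet".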

connection basic

  • You can't execve another web server.
  • Your server must never block and the client can be bounced properly if necessary.
  • It must be non-blocking and use only 1 poll() (or equivalent) for all the I/O operations between the client and the server (listen included).
  • poll() (or equivalent) must check read and write at the same time.
  • You must never do a read or a write operation without going through poll() (or equivalent).
  • Checking the value of errno is strictly forbidden after a read or a write operation.
  • You don't need to use poll() (or equivalent) before reading your configuration file. Because you have to use non-blocking file descriptors, it is possible to use read/recv or write/send functions with no poll() (or equivalent), and your server wouldn't be blocking. But it would consume more system resources. Thus, if you try to read/recv or write/send in any file descriptor without using poll() (or equivalent), your grade will be 0.
  • You can use every macro and define like FD_SET, FD_CLR, FD_ISSET, FD_ZERO (understanding what and how they do it is very useful).
  • A request to your server should never hang forever.
  • Your server must be compatible with the web browser of your choice.
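
A possible shape for the "one poll(), read and write checked together" rule, as a sketch only. `watch`, `poll_all` and `read_if_ready` are hypothetical names, and the real loop would also handle accept(), send() and timeouts.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: builds a pollfd entry for one descriptor.
pollfd watch(int fd)
{
    pollfd p;
    p.fd = fd;
    p.events = POLLIN | POLLOUT;   // ask for read AND write readiness
    p.revents = 0;
    return p;
}

// The single poll() call of the loop: every fd (listeners included)
// goes through it. Returns poll()'s own return value.
int poll_all(std::vector<pollfd> &fds, int timeout_ms)
{
    assert(!fds.empty());
    for (std::size_t i = 0; i < fds.size(); ++i)
        fds[i].revents = 0;
    return poll(&fds[0], fds.size(), timeout_ms);
}

// Reads only when poll() reported the fd readable; returns -1 otherwise,
// so no read ever happens without going through poll() first.
ssize_t read_if_ready(const pollfd &p, char *buf, std::size_t len)
{
    if (!(p.revents & POLLIN))
        return -1;
    return read(p.fd, buf, len);
}
```

One design point to keep in mind: an idle connected socket is almost always writable, so requesting POLLOUT on every fd makes poll() return immediately. A common refinement is to request POLLOUT only for fds that have a response queued.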

parsing request HTTP (fields, ...)

  • We will consider that NGINX is HTTP 1.1 compliant and may be used to compare headers and answer behaviors.

response HTTP (fields, ...)

  • Your HTTP response status codes must be accurate.
  • Your server must have default error pages if none are provided.
  • You can't use fork for something else than CGI (like PHP, or Python, and so forth).
  • You must be able to serve a fully static website.

upload files

  • Clients must be able to upload files.

CGI

  • You need at least GET, POST, and DELETE methods.
  • Do you wonder what a CGI is?
  • Because you won't call the CGI directly, use the full path as PATH_INFO.
  • Just remember that, for a chunked request, your server needs to unchunk it and the CGI will expect EOF as end of the body.
  • The same goes for the output of the CGI. If no content_length is returned from the CGI, EOF will mark the end of the returned data.
  • Your program should call the CGI with the file requested as first argument.
  • The CGI should be run in the correct directory for relative path file access.
  • Your server should work with one CGI (php-CGI, Python, and so forth).
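
The unchunking step mentioned above could look like the sketch below. `unchunk` is a hypothetical helper that assumes the whole chunked body is already in memory. It ignores chunk extensions and trailers, and strtoul() is more lenient than the HTTP grammar (it accepts a leading sign or "0x"), so a stricter parser may be preferable.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a complete chunked body into `out`.
// Returns false when the input is truncated or malformed.
bool unchunk(const std::string &in, std::string &out)
{
    std::string::size_type pos = 0;
    out.clear();
    for (;;)
    {
        assert(pos <= in.size());
        std::string::size_type eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;

        // The chunk size is hexadecimal; parsing stops at ';' so any
        // chunk extension after the size is skipped.
        std::string line = in.substr(pos, eol - pos);
        char *end = 0;
        unsigned long size = std::strtoul(line.c_str(), &end, 16);
        if (end == line.c_str())
            return false;                        // no hex digit at all
        pos = eol + 2;
        if (size == 0)
            return true;                         // last chunk, trailers ignored

        std::string::size_type left = in.size() - pos;
        if (size > left || left - size < 2)
            return false;                        // truncated chunk
        if (in.compare(pos + size, 2, "\r\n") != 0)
            return false;                        // chunk data must end with CRLF
        out.append(in, pos, size);
        pos += size + 2;
    }
}
```

The decoded body is what gets written to the CGI's standard input, followed by closing the write end so the script sees EOF.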

write tests

  • Stress test your server. It must stay available at all cost.
  • Do not test with only one program.
  • Write your tests with a more convenient language such as Python or Golang, and so forth. Even in C or C++ if you want to.

persistent connection

  • Your server must be able to listen to multiple ports (see Configuration file)
  • Your server should never die.

cgi rfc

rfc 3875

output cgi script :

summary :

  • the cgi-script will send back at least one header field followed by an empty line
  • this header field will be one of three :
    • "Content-Type"
    • "Location"
    • "Status"
  • the cgi-script may send back more header fields
  • the server must check and modify a few things :
    • there is no duplicate field (resolve conflicts)
    • there is no space between the field name and ":"
    • replace every '\n' with '\r\n'
    • if no Location field && no Status field -> status code = 200
    • handle Location field, either :
      • local : starts with '/' --> rerun the request with the new uri
      • client : starts with a scheme name followed by ':' --> send back status code 302
    • there is at least one header field followed by '\r\n\r\n' :
      • "Content-Type"
      • "Location"
      • "Status"
    • if status field, change server status for this one
  • to pass the message-body to the cgi-script, we write it into the temporary fd from which the script reads its standard input
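
The newline rewrite from the summary can be sketched as follows. `to_crlf` is a hypothetical helper meant to run on the header block only, never on the body, which must not be modified.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns every bare '\n' of a CGI header block into
// "\r\n" and leaves line breaks that are already "\r\n" untouched.
std::string to_crlf(const std::string &headers)
{
    std::string out;
    for (std::string::size_type i = 0; i < headers.size(); ++i)
    {
        bool bare_lf = headers[i] == '\n'
            && (i == 0 || headers[i - 1] != '\r');
        if (bare_lf)
            out += '\r';
        out += headers[i];
    }
    return out;
}
```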

3.1: server responsibilities

  • The server [...] receives the request from the client
  • selects a CGI script to handle the request
  • converts the client request to a CGI request
  • executes the script and converts the CGI response into a response for the client

3.3: script uri

  • the 'Script-URI' [...] MUST have the property that if the client had accessed this URI instead, then the script would have been executed

4: how the server prepares the cgi requests

  • the cgi receives 2 different sets of information :
    • the request meta-variables (in UNIX, by env variables)
    • and the message-body

4.1: request meta-variables

  • a header field that spans multiple lines MUST be merged onto a single line
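
A sketch of that merge step, assuming a continuation line is one that starts with a space or a tab. `unfold` is a hypothetical helper working on the raw header block before it is split into fields.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins folded header lines. A line break followed
// by SP or HTAB, together with that leading whitespace, becomes one space.
std::string unfold(const std::string &headers)
{
    std::string out;
    for (std::string::size_type i = 0; i < headers.size(); ++i)
    {
        bool crlf = headers.compare(i, 2, "\r\n") == 0;
        bool lf = headers[i] == '\n';
        std::string::size_type next = i + (crlf ? 2 : 1);
        bool folded = (crlf || lf) && next < headers.size()
            && (headers[next] == ' ' || headers[next] == '\t');
        if (!folded)
        {
            out += headers[i];
            continue;
        }
        // Skip the line break and the whitespace that opens the continuation.
        while (next < headers.size()
               && (headers[next] == ' ' || headers[next] == '\t'))
            ++next;
        out += ' ';
        i = next - 1;
    }
    return out;
}
```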

4.2: request message-body

  • unless defined otherwise, the script accesses request data by reading stdin

6: how the response from the script is returned to the server

  • The response comprises 2 parts, separated by a blank line :
    • a message-header
    • and a message-body
  • The message-header contains one or more header fields
  • The body may be NULL
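
Splitting the script output at the blank line could be sketched like this. `split_cgi_response` is a hypothetical helper that accepts both "\n\n" and "\r\n\r\n" as the separator, since scripts may use either newline style.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: splits raw CGI output into message-header and
// message-body at the first blank line. Returns false if there is none.
bool split_cgi_response(const std::string &raw, std::string &headers,
                        std::string &body)
{
    const std::string::size_type npos = std::string::npos;
    std::string::size_type lf = raw.find("\n\n");
    std::string::size_type crlf = raw.find("\r\n\r\n");
    if (lf == npos && crlf == npos)
        return false;

    // Use whichever separator comes first in the output.
    if (crlf != npos && (lf == npos || crlf < lf))
    {
        headers = raw.substr(0, crlf);
        body = raw.substr(crlf + 4);
    }
    else
    {
        headers = raw.substr(0, lf);
        body = raw.substr(lf + 2);
    }
    return true;
}
```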

6.2: response types

  • four types of responses :
    • document response
    • local redirect response
    • client redirect response
    • client redirect response with document
  • document response :
    • it must return a Content-Type header field
    • a Status header field is optional (200 is assumed if omitted)
    • the server must check the cgi-script output, and modify it to comply with the protocol version
  • local redirect response :
    • it must return only a Location field
    • it contains a local path URI and query string ('local-pathquery')
    • the local path URI must start with a "/"
    • the server must generate the response for this local-pathquery
  • client redirect response :
    • it must return only a Location field
    • it contains an absolute URI, telling the client to reprocess the request with this URI
    • the absolute URI always starts with the name of the scheme followed by ":"
    • the http server must generate a 302 'Found' message
  • client redirect response with document
    • it must return a Location field with an absolute URI path
    • it must return the Status header field, with a value of 302 'Found'
    • the server must check the cgi-script output, and modify it to comply with the protocol version
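
The four response types can be told apart from the parsed header fields. The sketch below follows the notes above rather than every detail of the RFC grammar. `classify` and the `Headers` typedef are hypothetical, and field names are assumed to be already normalized to the exact spellings "Content-Type", "Location" and "Status".

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Headers;

enum CgiResponseType
{
    DOCUMENT,
    LOCAL_REDIRECT,
    CLIENT_REDIRECT,
    CLIENT_REDIRECT_WITH_DOCUMENT,
    INVALID
};

// Hypothetical helper: picks the response type from the CGI header fields.
CgiResponseType classify(const Headers &h)
{
    Headers::const_iterator loc = h.find("Location");
    if (loc == h.end())                          // no redirect at all
        return h.count("Content-Type") ? DOCUMENT : INVALID;

    const std::string &target = loc->second;
    if (!target.empty() && target[0] == '/')     // local path: rerun request
        return h.size() == 1 ? LOCAL_REDIRECT : INVALID;

    // An absolute URI starts with a scheme name followed by ':'.
    std::string::size_type colon = target.find(':');
    if (colon == std::string::npos || colon == 0)
        return INVALID;
    return h.count("Status") ? CLIENT_REDIRECT_WITH_DOCUMENT : CLIENT_REDIRECT;
}
```

For CLIENT_REDIRECT the server generates the 302 itself. For CLIENT_REDIRECT_WITH_DOCUMENT the script supplies the Status and the body.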

6.3: cgi header fields

  • whitespace is permitted between the ":" and the field-value
  • but not between the field-name and the ":"
  • the CGI script can set three different fields :
    • Content-Type
    • Location
    • Status
  • Content-Type :
    • if there is a body in the response, a Content-Type field must be present
    • if there is no Content-Type, the server must not attempt to determine one
  • Location :
    • the local URI path must be an absolute path, not a relative path, nor NULL
    • the local URI path must, then, start with "/"
    • the absolute URI starts with a scheme name followed by ":"
  • Status :
    • a 3-digit integer code
    • 4 standard codes :
      • 200 'OK' indicates success, it's the default value
      • 302 'Found' with Location header and response message-body
      • 400 'Bad Request' an unknown request format, like a missing CONTENT_TYPE
      • 501 'Not Implemented' the script received an unsupported REQUEST_METHOD
    • format: Status:400 "explanation of the error"\n
  • the cgi-script can return other header fields, concerning the response message
    • the server must translate cgi-headers syntax into http-header syntax
    • for example, newlines can be encoded in different ways
  • the cgi-script must not return header fields concerning client-side communication

6.4: cgi message body

  • the server must read it until EOF
  • the server must not modify it, except to convert charset if needed

7 and 8: useful information about implementation and security


cgi env variables

see : cgi env variables (wikipedia), cgi environment variables, cgi server variables (adobe)

AUTH_TYPE			: if the script is protected, the authentication method used to validate the user
CONTENT_LENGTH		: length of the request body-message
CONTENT_TYPE		: (Content-Type field) if there is attached information, as with method POST or PUT, this is the content type of the data (e.g. "text/plain"; in an html <form> it is set by the "enctype" attribute, which takes one of three values : "application/x-www-form-urlencoded", "multipart/form-data", "text/plain")
GATEWAY_INTERFACE	: CGI version (e.g. CGI/1.1)
PATH_INFO			: if any, path of the request in addition to the cgi script path (e.g. for cgi script path = "/usr/web/cgi-bin/script.cgi", and the url = "http://server.org/cgi-bin/script.cgi/house", the PATH_INFO would be "/house")
PATH_TRANSLATED		: full path of the request, like path-to-cgi/PATH_INFO, null if PATH_INFO is null (e.g. for "http://server.org/cgi-bin/prog/the/path", PATH_INFO would be : "/the/path" and PATH_TRANSLATED would be : "/usr/web/cgi-bin/prog/the/path")
QUERY_STRING		: everything following the ? in the url sent by client (e.g. for url "http://server.org/query?var1=val2&var2=val2", it would be : "var1=val2&var2=val2")
REMOTE_ADDR			: ip address of the client
REMOTE_HOST			: host name of the client, empty if not known, or equal to REMOTE_ADDR
REMOTE_IDENT		: if known, username of the client, otherwise empty, use for logging only
REMOTE_USER			: username of the client, if the script is protected and the server supports user authentication
REQUEST_METHOD		: method used for the request (for http, usually POST or GET)
SCRIPT_NAME			: path to the cgi, relative to the root, used for self-referencing URLs (e.g. "/cgi-bin/script.cgi")
SERVER_NAME			: name of the server, as hostname, IP address, or DNS (e.g. dns : "www.server.org")
SERVER_PORT			: the port number your server is listening on (e.g. 80)
SERVER_PROTOCOL		: protocol used for the request (e.g. HTTP/1.1)
SERVER_SOFTWARE		: the server software you're using (e.g. Apache 1.3)
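
A sketch of how SCRIPT_NAME, PATH_INFO and QUERY_STRING could be derived from a request target and turned into "NAME=value" strings for execve(). `CgiTarget`, `split_target` and `cgi_env` are hypothetical, and so is identifying the script by a configured extension. The split follows the definitions listed above, not the subject's "use the full path as PATH_INFO" rule, so it may need adapting.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct CgiTarget
{
    std::string script_name;
    std::string path_info;
    std::string query_string;
};

// Hypothetical helper: splits "/cgi-bin/script.cgi/house?a=b" using the
// configured CGI extension. Returns false when no script is found.
bool split_target(const std::string &target, const std::string &ext,
                  CgiTarget &out)
{
    assert(!ext.empty());
    const std::string::size_type npos = std::string::npos;

    std::string::size_type q = target.find('?');
    std::string path = target.substr(0, q);
    out.query_string = (q == npos) ? std::string() : target.substr(q + 1);

    // The extension must end a path segment: ".cgi" followed by '/' or end.
    for (std::string::size_type pos = path.find(ext); pos != npos;
         pos = path.find(ext, pos + 1))
    {
        std::string::size_type end = pos + ext.size();
        if (end == path.size() || path[end] == '/')
        {
            out.script_name = path.substr(0, end);
            out.path_info = path.substr(end);
            return true;
        }
    }
    return false;
}

// Hypothetical helper: builds a few of the meta-variables listed above.
std::vector<std::string> cgi_env(const CgiTarget &t, const std::string &method)
{
    std::vector<std::string> env;
    env.push_back("GATEWAY_INTERFACE=CGI/1.1");
    env.push_back("REQUEST_METHOD=" + method);
    env.push_back("SCRIPT_NAME=" + t.script_name);
    env.push_back("PATH_INFO=" + t.path_info);
    env.push_back("QUERY_STRING=" + t.query_string);
    return env;
}
```

For execve() the strings still have to be exposed as a NULL-terminated array of char pointers.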

redirect status for php-cgi

REDIRECT_STATUS		: for example, 200



http status

rfc 2616

Informational

  • 100 Continue
  • 101 Switching Protocols

Successful

  • 200 OK
  • 201 Created
  • 202 Accepted
  • 203 Non-Authoritative Information
  • 204 No Content
  • 205 Reset Content
  • 206 Partial Content

Redirection

  • 300 Multiple Choices
  • 301 Moved Permanently
  • 302 Found
  • 303 See Other
  • 304 Not Modified
  • 305 Use Proxy
  • 306 (Unused)
  • 307 Temporary Redirect

Client Error

  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Timeout
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Request Entity Too Large
  • 414 Request-URI Too Long
  • 415 Unsupported Media Type
  • 416 Requested Range Not Satisfiable
  • 417 Expectation Failed

Server Error

  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • 505 HTTP Version Not Supported
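
A sketch of turning a code from the list above into a status line. `reason_phrase` and `status_line` are hypothetical helpers. Only a subset of the codes is included, and the "Unknown" fallback is a placeholder, not something defined by the RFC.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reason phrase for a subset of the codes above.
const char *reason_phrase(int code)
{
    switch (code)
    {
        case 200: return "OK";
        case 201: return "Created";
        case 204: return "No Content";
        case 301: return "Moved Permanently";
        case 302: return "Found";
        case 400: return "Bad Request";
        case 403: return "Forbidden";
        case 404: return "Not Found";
        case 405: return "Method Not Allowed";
        case 413: return "Request Entity Too Large";
        case 500: return "Internal Server Error";
        case 501: return "Not Implemented";
        case 505: return "HTTP Version Not Supported";
        default:  return "Unknown";              // placeholder for other codes
    }
}

// Hypothetical helper: builds the first line of an HTTP/1.1 response.
std::string status_line(int code)
{
    assert(code >= 100 && code <= 599);
    std::ostringstream line;
    line << "HTTP/1.1 " << code << ' ' << reason_phrase(code) << "\r\n";
    return line.str();
}
```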

resources
