42_INT_12_webserv/README.md
2022-08-17 18:21:55 +02:00

## work together
---
## man
- **htons, htonl, ntohs, ntohl :** converts the unsigned short or integer argument between host byte order and network byte order
- **poll :** waits for one of a set of file descriptors to become ready to perform I/O
  - alternatives : select, epoll (epoll_create, epoll_ctl, epoll_wait), kqueue (kqueue, kevent)
- **socket :** creates an endpoint for communication and returns a file descriptor that refers to that endpoint
- **listen :** marks a socket as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept()
- **accept :** used with connection-based socket types. It extracts the first connection request on the queue of pending connections for the listening socket, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket is unaffected by this call
- **send :** (~write) used to transmit a message to another socket. May be used only when the socket is in a connected state (so that the intended recipient is known). The only difference between send() and write() is the presence of flags. With a zero flags argument, send() is equivalent to write()
- **recv :** (~read) used to receive messages from a socket. May be used to receive data on both connectionless and connection-oriented sockets. The only difference between recv() and read() is the presence of flags. With a zero flags argument, recv() is generally equivalent to read()
- **bind :** associates a socket fd with a local address. When a socket is created with socket(), it exists in a name space (address family) but has no address assigned to it. It is normally necessary to assign a local address using bind() before a socket may receive connections (see accept())
- **connect :** connects a socket fd to a remote address
- **inet_addr :** converts the Internet host address cp from IPv4 numbers-and-dots notation into binary data in network byte order. Use of this function is problematic because on error it returns -1 (INADDR_NONE), which is also a valid address (255.255.255.255). Avoid its use in favor of inet_aton(), inet_pton(), or getaddrinfo()
- **setsockopt :** manipulates options for a socket fd. Options may exist at multiple protocol levels; they are always present at the uppermost socket level
- **getsockname :** returns the current address to which a socket fd is bound
- **fcntl :** manipulates an open fd, for example by duplicating it or changing its flags
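
The calls listed above map almost one-to-one onto Python's `socket` module, which is handy for experimenting before writing the C++ version. What follows is only a sketch: the loopback address, the OS-chosen port (port 0) and the `ping` payload are illustrative assumptions, not part of the project.

```python
import socket

# htons / ntohs: a round trip through network byte order gives the value back.
port = 8080
round_trip = socket.ntohs(socket.htons(port))

# inet_aton returns the 4 address bytes in network byte order and raises
# OSError on bad input, so 255.255.255.255 is not confused with an error.
packed = socket.inet_aton("127.0.0.1")
broadcast = socket.inet_aton("255.255.255.255")

# socket -> setsockopt -> bind -> listen: a passive (listening) socket.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(16)
host, bound_port = listener.getsockname()  # the address actually bound

# connect (client side), then accept (server side) yields a new connected fd.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, bound_port))
conn, peer = listener.accept()

# send / recv with no flags behave like write / read on the connected socket.
client.sendall(b"ping")
data = conn.recv(1024)

conn.close()
client.close()
listener.close()
```
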
---
## todo
- [ ] read the RFC and do some tests with telnet and NGINX
#### parsing config
- [ ] Your program has to take a configuration file as argument, or use a default path.
- [ ] Choose the port and host of each server.
- [ ] Setup the server_names or not.
- [ ] The first server for a host:port will be the default for this host:port (that means it will answer all the requests that don't belong to another server).
- [ ] Setup default error pages.
- [ ] Limit client body size.
- [ ] Setup routes with one or multiple of the following rules/configuration (routes won't be using regexp):
  - [ ] Define a list of accepted HTTP methods for the route.
  - [ ] Define an HTTP redirection.
  - [ ] Define a directory or a file from where the file should be searched (for example, if url /kapouet is rooted to /tmp/www, url /kapouet/pouic/toto/pouet is /tmp/www/pouic/toto/pouet).
  - [ ] Turn on or off directory listing.
  - [ ] Set a default file to answer if the request is a directory.
  - [ ] Execute CGI based on certain file extension (for example .php).
  - [ ] Make the route able to accept uploaded files and configure where they should be saved.
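
The `/kapouet` rooting rule above can be sketched as a small path-mapping function. This is an illustration only: `resolve_path` and its parameters are hypothetical names, and refusing paths that escape the root with `..` is an extra safety assumption that the list does not state.

```python
import posixpath

def resolve_path(url_path, route_prefix, root):
    """Map a URL path to a filesystem path for a route rooted at `root`.

    Returns None when the route does not match or the path escapes `root`.
    """
    root = posixpath.normpath(root)
    prefix = route_prefix.rstrip("/")
    # The route matches the prefix itself or anything below "prefix/".
    if url_path != route_prefix and not url_path.startswith(prefix + "/"):
        return None
    remainder = url_path[len(prefix):].lstrip("/")
    full = posixpath.normpath(posixpath.join(root, remainder))
    # After resolving ".." segments the result must still live under root.
    if full != root and not full.startswith(root + "/"):
        return None
    return full
```

With the values from the example, `resolve_path("/kapouet/pouic/toto/pouet", "/kapouet", "/tmp/www")` gives `/tmp/www/pouic/toto/pouet`.
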
#### connection basic
- [ ] You can't execve another web server.
- [ ] Your server must never block and the client can be bounced properly if necessary.
- [ ] It must be non-blocking and use only 1 poll() (or equivalent) for all the I/O operations between the client and the server (listen included).
- [ ] poll() (or equivalent) must check read and write at the same time.
- [ ] You must never do a read or a write operation without going through poll() (or equivalent).
- [ ] Checking the value of errno is strictly forbidden after a read or a write operation.
- [ ] You don't need to use poll() (or equivalent) before reading your configuration file. Because you have to use non-blocking file descriptors, it is possible to use read/recv or write/send functions with no poll() (or equivalent), and your server wouldn't be blocking. But it would consume more system resources. Thus, if you try to read/recv or write/send in any file descriptor without using poll() (or equivalent), your grade will be 0.
- [ ] You can use every macro and define like FD_SET, FD_CLR, FD_ISSET, FD_ZERO (understanding what and how they do it is very useful).
- [ ] A request to your server should never hang forever.
- [ ] Your server must be compatible with the web browser of your choice.
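
The constraints above (non-blocking fds, a single poll() for listen, read and write, one I/O operation per readiness event) can be sketched with Python's `select.poll`, which wraps the same system call. This is a minimal echo loop under assumptions: the helper names, the `clients` and `pending` dictionaries and the 100 ms timeout are all illustrative, and `select.poll` is not available on Windows.

```python
import select
import socket

def make_listener(port=0):
    """Create a non-blocking listening socket on the loopback interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(16)
    sock.setblocking(False)
    return sock

def close_client(poller, clients, pending, fd):
    """Stop watching fd, close its socket and drop its buffered data."""
    poller.unregister(fd)
    clients.pop(fd).close()
    pending.pop(fd, None)

def run_once(poller, listener, clients, pending, timeout_ms=100):
    """One loop iteration: a single poll() call covers every fd."""
    for fd, events in poller.poll(timeout_ms):
        if fd == listener.fileno():
            conn, _addr = listener.accept()
            conn.setblocking(False)
            clients[conn.fileno()] = conn
            pending[conn.fileno()] = b""
            poller.register(conn, select.POLLIN)
        elif events & select.POLLIN:
            data = clients[fd].recv(4096)
            if data:
                pending[fd] += data  # echo: what was read gets written back
                poller.modify(fd, select.POLLIN | select.POLLOUT)
            else:
                close_client(poller, clients, pending, fd)  # peer closed
        elif events & select.POLLOUT:
            sent = clients[fd].send(pending[fd])
            pending[fd] = pending[fd][sent:]
            if not pending[fd]:
                # Nothing left to write: stop asking for POLLOUT, otherwise
                # poll() would return immediately on every iteration.
                poller.modify(fd, select.POLLIN)
        else:
            close_client(poller, clients, pending, fd)  # POLLHUP / POLLERR
```

A caller would create the listener, register it with `select.POLLIN` on a `select.poll()` object, and then call `run_once` in a loop.
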
#### parsing request HTTP (fields, ...)
- [ ] We will consider that NGINX is HTTP 1.1 compliant and may be used to compare headers and answer behaviors.
#### response HTTP (fields, ...)
- [ ] Your HTTP response status codes must be accurate.
- [ ] Your server must have default error pages if none are provided.
- [ ] You can't use fork for something else than CGI (like PHP, or Python, and so forth).
- [ ] You must be able to serve a fully static website.
#### upload files
- [ ] Clients must be able to upload files.
#### CGI
- [ ] You need at least GET, POST, and DELETE methods.
- [ ] Do you wonder what a CGI is?
- [ ] Because you won't call the CGI directly, use the full path as PATH_INFO.
- [ ] Just remember that, for a chunked request, your server needs to unchunk it and the CGI will expect EOF as end of the body.
- [ ] Same things for the output of the CGI. If no content_length is returned from the CGI, EOF will mark the end of the returned data.
- [ ] Your program should call the CGI with the file requested as first argument.
- [ ] The CGI should be run in the correct directory for relative path file access.
- [ ] Your server should work with one CGI (php-CGI, Python, and so forth).
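
The "unchunk" step mentioned above can be sketched as follows. This assumes the whole chunked body has already been received into one buffer; chunk extensions and trailers are ignored, and `unchunk` is a hypothetical helper name.

```python
def unchunk(body: bytes) -> bytes:
    """Decode a complete HTTP/1.1 chunked body into the raw payload."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)  # ValueError if the size line is cut
        # The size line is hexadecimal, optionally followed by ";extension".
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk: any trailer fields after it are ignored here
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        out += chunk
        pos += size + 2
    return bytes(out)
```

The decoded bytes are what gets written to the CGI's standard input, after which the input is closed so the script sees EOF.
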
#### write tests
- [ ] Stress test your server. It must stay available at all costs.
- [ ] Do not test with only one program.
- [ ] Write your tests with a more convenient language such as Python or Golang, and so forth. Even in C or C++ if you want to.
#### persistent connection
- [ ] Your server must be able to listen to multiple ports (see Configuration file)
- [ ] Your server should never die.
---
## cgi rfc
[rfc 3875](https://www.rfc-editor.org/rfc/rfc3875)
#### output cgi script :
#### summary :
- the cgi-script will send back at least one header field followed by an empty line
- this header field will be one of three :
  - "Content-Type"
  - "Location"
  - "Status"
- the cgi-script may send back more header fields
- the server must check and modify a few things :
  - there is no duplicate field (resolve conflicts)
  - there is no space between the field name and ":"
  - replace every '\n' with '\r\n'
  - if no Location field && no Status field -> status code = 200
  - handle the Location field, either :
    - local : starts with '/' --> rerun the request with the new uri
    - client : starts with `'<scheme>:'` --> send back status code 302
  - there is at least one header field followed by '\r\n\r\n' :
    - "Content-Type"
    - "Location"
    - "Status"
  - if there is a Status field, replace the server status with this one
- to pass the message-body to the cgi-script, we write it into the temporary fd from which the script reads its standard input
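
The checks in this summary can be sketched as one function that turns raw CGI output into the head of an HTTP response. This is an illustration under assumptions: `normalize_cgi_output` is a hypothetical name, a duplicated field keeps its last value (an arbitrary way to resolve the conflict), and Location handling is left out.

```python
import re

def normalize_cgi_output(raw: bytes) -> bytes:
    """Validate CGI header fields and rewrite them with a status line and CRLF."""
    # The header block ends at the first empty line; scripts may emit \n or \r\n.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        raise ValueError("no empty line after the CGI header fields")
    head, body = raw[:match.start()], raw[match.end():]
    lines = head.decode("latin-1").splitlines()
    if not lines:
        raise ValueError("at least one header field is required")
    status = "200 OK"  # default when the script sends no Status field
    fields = {}
    for line in lines:
        name, colon, value = line.partition(":")
        # No whitespace is allowed between the field name and the colon.
        if not colon or not name or name != name.strip():
            raise ValueError("malformed header field: %r" % line)
        if name.lower() == "status":
            status = value.strip()
        else:
            fields[name.lower()] = (name, value.strip())  # last duplicate wins
    out = ["HTTP/1.1 " + status]
    out += ["%s: %s" % pair for pair in fields.values()]
    return ("\r\n".join(out) + "\r\n\r\n").encode("latin-1") + body
```

The body is passed through untouched; only the header block is rewritten.
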
[3.1: server responsibilities](https://www.rfc-editor.org/rfc/rfc3875#section-3.1)
- The server [...] receives the request from the client
- selects a CGI script to handle the request
- converts the client request to a CGI request
- executes the script and converts the CGI response into a response for the client
[3.3: script uri](https://www.rfc-editor.org/rfc/rfc3875#section-3.3)
- the 'Script-URI' [...] MUST have the property that if the client had accessed this URI instead, then the script would have been executed
[4: how the server prepares the cgi requests](https://www.rfc-editor.org/rfc/rfc3875#section-4)
- the cgi receives 2 different sets of information :
  - the request meta-variables (in UNIX, by env variables)
  - and the message-body
[4.1: request meta-variables](https://www.rfc-editor.org/rfc/rfc3875#section-4.1)
- a header field that spans multiple lines MUST be merged onto a single line
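
Merging a header field that spans several lines can be sketched like this, assuming continuation lines are the ones starting with a space or a tab. `unfold_headers` and `meta_name` are hypothetical helpers; the `HTTP_` naming follows the rule in RFC 3875 section 4.1.18 (upper case, `-` replaced by `_`).

```python
def unfold_headers(lines):
    """Merge continuation lines (leading space or tab) into the previous field."""
    merged = []
    for line in lines:
        if merged and line[:1] in (" ", "\t"):
            merged[-1] = merged[-1].rstrip() + " " + line.strip()
        else:
            merged.append(line)
    return merged

def meta_name(field_name):
    """Name of the meta-variable carrying a request header field."""
    return "HTTP_" + field_name.upper().replace("-", "_")
```
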
[4.2: request message-body](https://www.rfc-editor.org/rfc/rfc3875#section-4.2)
- unless defined otherwise, the script accesses request data by reading stdin
[6: how the response from the script is returned to the server](https://www.rfc-editor.org/rfc/rfc3875#section-6)
- The response comprises 2 parts, separated by a blank line :
- a message-header
- and a message-body
- The message-header contains one or more header fields
- The body may be NULL
[6.2: response types](https://www.rfc-editor.org/rfc/rfc3875#section-6.2)
- four types of responses :
  - document response
  - local redirect response
  - client redirect response
  - client redirect response with document
- document response :
  - it must return a Content-Type header field
  - a Status header field is optional (200 is assumed if omitted)
  - the server must check the cgi-script output, and modify it to comply with the protocol version
- local redirect response :
  - it must return only a Location field
  - it contains a local path URI and query string ('local-pathquery')
  - the local path URI must start with a "/"
  - the server must generate the response for this local-pathquery
- client redirect response :
  - it must return only a Location field
  - it contains an absolute URI path, to indicate to the client that it should reprocess the request with this URI
  - the absolute URI always starts with the name of the scheme followed by ":"
  - the http server must generate a 302 'Found' message
- client redirect response with document :
  - it must return a Location field with an absolute URI path
  - it must return the Status header field, with a value of 302 'Found'
  - the server must check the cgi-script output, and modify it to comply with the protocol version
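
Telling the four response types apart can be sketched from the header fields alone. Assumptions: `headers` is a dict whose keys are already lower-cased, the function name and the returned labels are illustrative, and the presence of a Status field is what separates the two client redirects.

```python
import re

# A URI scheme: a letter followed by letters, digits, "+", "-" or ".", then ":".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def classify_cgi_response(headers):
    """Return which of the four CGI response types the header fields describe."""
    location = headers.get("location")
    if location is None:
        if "content-type" not in headers:
            raise ValueError("a document response needs a Content-Type field")
        return "document"
    if location.startswith("/"):
        return "local redirect"  # the server reruns the request itself
    if SCHEME.match(location):
        if "status" in headers:
            return "client redirect with document"
        return "client redirect"  # the server answers 302 Found
    raise ValueError("Location is neither a local path nor an absolute URI")
```
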
[6.3: cgi header fields](https://www.rfc-editor.org/rfc/rfc3875#section-6.3)
- whitespace is permitted between the ":" and the field-value
- but not between the field-name and the ":"
- the CGI script can set three different fields :
  - Content-Type
  - Location
  - Status
- Content-Type :
  - if there is a body in the response, a Content-Type field must be present
  - if there is no Content-Type, the server must not attempt to determine one
- Location :
  - the local URI path must be an absolute path, not a relative path, nor NULL
  - the local URI path must, then, start with "/"
  - the absolute URI starts with `"<name-of-scheme>:"`
- Status :
  - a 3-digit integer code
  - 4 standards :
    - 200 'OK' indicates success, it's the default value
    - 302 'Found' with Location header and response message-body
    - 400 'Bad Request' an unknown request format, like a missing CONTENT_TYPE
    - 501 'Not Implemented' the script received an unsupported REQUEST_METHOD
  - construction: `Status:400 "explanation of the error"\n`
- the cgi-script can return other header fields, concerning the response message
- the server must translate cgi-headers syntax into http-header syntax
  - for example, newline can be encoded in different ways
- the cgi-script must not return header fields concerning client-side communication
  - the server can remove such fields
  - (not sure : https://www.rfc-editor.org/rfc/rfc3875#section-6.3.4)
- the server must resolve conflicts between the header fields returned by the script and the ones it would otherwise send itself
[6.4: cgi message body](https://www.rfc-editor.org/rfc/rfc3875#section-6.4)
- the server must read it until EOF
- the server must not modify it, except to convert charset if needed
[7 and 8: useful information about implementation and security](https://www.rfc-editor.org/rfc/rfc3875#section-7)
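
Reading the script's output until EOF can be tried out with `subprocess`, as a test-bench sketch rather than a model for the server itself (the real server has to go through poll() for the pipes). Assumptions: `run_cgi` is a hypothetical helper, the "script" is a one-line Python program that stands in for a real CGI, and the environment is inherited from the parent for convenience.

```python
import os
import subprocess
import sys

def run_cgi(argv, env, body: bytes, timeout=5.0) -> bytes:
    """Feed `body` to the script's stdin, then read its stdout until EOF."""
    completed = subprocess.run(
        argv, input=body, stdout=subprocess.PIPE,
        env=env, timeout=timeout, check=False,
    )
    return completed.stdout  # returned as-is: the body must not be modified

# Stand-in CGI: echoes the request method and whatever it reads from stdin.
SCRIPT = (
    "import os, sys; body = sys.stdin.read(); "
    'sys.stdout.write("Content-Type: text/plain\\n\\n" '
    '+ os.environ["REQUEST_METHOD"] + " " + body)'
)

env = dict(os.environ, REQUEST_METHOD="POST", CONTENT_LENGTH="5")
output = run_cgi([sys.executable, "-c", SCRIPT], env, b"hello")
```

Closing stdin after the body is written is what lets the script see EOF; `subprocess.run` does this when `input` is given.
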
---
## cgi env variables
[cgi env variables](https://www.rfc-editor.org/rfc/rfc3875#section-4.1)
[wikipedia variables environnements cgi](https://fr.wikipedia.org/wiki/Variables_d%27environnement_CGI)
[cgi server variables on adobe](https://helpx.adobe.com/coldfusion/cfml-reference/reserved-words-and-variables/cgi-environment-cgi-scope-variables/cgi-server-variables.html)
```None
AUTH_TYPE : if the script is protected, the authentication method used to validate the user
CONTENT_LENGTH : length of the request body-message
CONTENT_TYPE : (Content-Type field) if there is attached information, as with method POST or PUT, this is the content type of the data (e.g. "text/plain", it is set by the attribute "enctype" in html <form> as three values : "application/x-www-form-urlencoded", "multipart/form-data", "text/plain")
GATEWAY_INTERFACE : CGI version (e.g. CGI/1.1)
PATH_INFO : if any, path of the request in addition to the cgi script path (e.g. for cgi script path = "/usr/web/cgi-bin/script.cgi", and the url = "http://server.org/cgi-bin/script.cgi/house", the PATH_INFO would be "/house")
PATH_TRANSLATED : full path of the request, like path-to-cgi/PATH_INFO, null if PATH_INFO is null (e.g. for "http://server.org/cgi-bin/prog/the/path", PATH_INFO would be : "/the/path" and PATH_TRANSLATED would be : "/usr/web/cgi-bin/prog/the/path")
QUERY_STRING : everything following the ? in the url sent by client (e.g. for url "http://server.org/query?var1=val1&var2=val2", it would be : "var1=val1&var2=val2")
REMOTE_ADDR : ip address of the client
REMOTE_HOST : host name of the client, empty if not known, or equal to REMOTE_ADDR
REMOTE_IDENT : if known, username of the client, otherwise empty, use for logging only
REMOTE_USER : username of client, if script is protected and the server supports user authentication
REQUEST_METHOD : method used for the request (for http, usually POST or GET)
SCRIPT_NAME : path to the cgi, relative to the root, used for self-referencing URLs (e.g. "/cgi-bin/script.cgi")
SERVER_NAME : name of the server, as hostname, IP address, or DNS (e.g. dns : "www.server.org")
SERVER_PORT : the port number your server is listening on (e.g. 80)
SERVER_PROTOCOL : protocol used for the request (e.g. HTTP/1.1)
SERVER_SOFTWARE : the server software you're using (e.g. Apache 1.3)
```
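
Building these variables from a parsed request could look like the sketch below. `build_cgi_env` and its parameters are hypothetical, the PATH_INFO split follows the RFC (the project subject asks for the full path instead, so that line may need adapting), and only a subset of the variables is filled in.

```python
from urllib.parse import urlsplit

def build_cgi_env(method, url, script_name, headers, body=b"",
                  remote_addr="127.0.0.1", server_port=80):
    """Return a dict of CGI meta-variables for one request."""
    parts = urlsplit(url)
    # PATH_INFO is whatever follows the script path in the URL path.
    path_info = ""
    if parts.path.startswith(script_name):
        path_info = parts.path[len(script_name):]
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path_info,
        "QUERY_STRING": parts.query,
        "REMOTE_ADDR": remote_addr,
        "SERVER_NAME": parts.hostname or "",
        "SERVER_PORT": str(server_port),
        "SERVER_PROTOCOL": "HTTP/1.1",
    }
    if body:
        env["CONTENT_LENGTH"] = str(len(body))
    for name, value in headers.items():
        key = name.upper().replace("-", "_")
        if key == "CONTENT_TYPE":
            env["CONTENT_TYPE"] = value
        elif key != "CONTENT_LENGTH":
            env["HTTP_" + key] = value  # other request fields become HTTP_*
    return env
```
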
[redirect status for php-cgi](https://woozle.org/papers/php-cgi.html)
```None
REDIRECT_STATUS : for example, 200
```
g 50 34 48
p 30 23 32
l 20 14 20
71
---
## http status
[rfc 2616](https://datatracker.ietf.org/doc/html/rfc2616#section-10)
#### Informational
- 100 Continue
- 101 Switching Protocols
#### Successful
- 200 OK
- 201 Created
- 202 Accepted
- 203 Non-Authoritative Information
- 204 No Content
- 205 Reset Content
- 206 Partial Content
#### Redirection
- 300 Multiple Choices
- 301 Moved Permanently
- 302 Found
- 303 See Other
- 304 Not Modified
- 305 Use Proxy
- 306 (Unused)
- 307 Temporary Redirect
#### Client Error
- 400 Bad Request
- 401 Unauthorized
- 402 Payment Required
- 403 Forbidden
- 404 Not Found
- 405 Method Not Allowed
- 406 Not Acceptable
- 407 Proxy Authentication Required
- 408 Request Timeout
- 409 Conflict
- 410 Gone
- 411 Length Required
- 412 Precondition Failed
- 413 Request Entity Too Large
- 414 Request-URI Too Long
- 415 Unsupported Media Type
- 416 Requested Range Not Satisfiable
- 417 Expectation Failed
#### Server Error
- 500 Internal Server Error
- 501 Not Implemented
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
- 505 HTTP Version Not Supported
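
A status line and a default error page can be produced from a table like the one above. This is a sketch: only a few of the listed codes are included, the fallback phrase uses the class headings of this section, and the HTML layout and helper names are assumptions.

```python
REASONS = {
    200: "OK", 201: "Created", 204: "No Content",
    301: "Moved Permanently", 302: "Found",
    400: "Bad Request", 403: "Forbidden", 404: "Not Found",
    405: "Method Not Allowed", 413: "Request Entity Too Large",
    500: "Internal Server Error", 501: "Not Implemented",
    505: "HTTP Version Not Supported",
}
CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}

def status_line(code, version="HTTP/1.1"):
    """Build the first line of a response, e.g. for code 404."""
    reason = REASONS.get(code) or CLASSES.get(code // 100, "Unknown")
    return "%s %d %s\r\n" % (version, code, reason)

def default_error_page(code):
    """Build a complete response carrying a tiny built-in HTML error page."""
    title = status_line(code, version="").strip()  # "404 Not Found"
    body = ("<html><body><h1>%s</h1></body></html>" % title).encode("ascii")
    head = (status_line(code)
            + "Content-Type: text/html\r\n"
            + "Content-Length: %d\r\n\r\n" % len(body))
    return head.encode("ascii") + body
```
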
---
## resources
- [correction](https://github.com/AliMaskar96/42-Correction-Sheets/blob/master/ng_5_webserv.pdf)
- [create an http server](https://medium.com/from-the-scratch/http-server-what-do-you-need-to-know-to-build-a-simple-http-server-from-scratch-d1ef8945e4fa)
- [guide to network programming](https://beej.us/guide/bgnet/)
- [same, translated into French](http://vidalc.chez.com/lf/socket.html)
- [bind() vs connect()](https://stackoverflow.com/questions/27014955/socket-connect-vs-bind)
- [INADDR_ANY for bind](https://stackoverflow.com/questions/16508685/understanding-inaddr-any-for-socket-programming)
- [hack with CGI](https://www.youtube.com/watch?v=ph6-AKByBU4)
- [http headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers)
- [list of http headers fields](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields)
- [http request ibm](https://www.ibm.com/docs/en/cics-ts/5.3?topic=protocol-http-requests)
- [http request other](https://www.tutorialspoint.com/http/http_requests.htm)
- [request line uri](https://stackoverflow.com/questions/40311306/when-is-absoluteuri-used-from-the-http-request-specs)