## work together

#### TODO

- `_is_cgi()` and `_fill_cgi_path()`
- `_cgi_output()` change status in `client->status`
- two cgi tests ?
  - a basic form with "name" and "something", that returns an html page with them ?
  - for GET and POST ?
  - a script called by a file extension in the URI

#### output cgi script :

- ! TODO : replace every `\n` with `\r\n`
- ! TODO : check that there is at least one of these header fields, followed by `\r\n\r\n` :
  - "Content-Type"
  - "Location"
  - "Status"
- ! TODO : check that there is no space between the field name and ":"
- ! TODO?: handle the Location field, either :
  - local : starts with '/' --> rerun the request with the new uri
  - client : starts with a scheme name followed by ':' (e.g. `http:`) --> send back status code 302
- -> TODO : check that there is no duplicate field (resolve conflicts)
- -> TODO : if there is a Status field, use its code as the server status
- -> TODO : if there is no Location field && no Status field -> status code = 200

#### questions

- in client.cpp I fill the port, is there a default one in case it's not in the request ?
- timeout server but still works ?
- path contains a double "//" from `Webserv::_get()` in response.cpp
- cgi path ? defined in config ? and root path ? :
  - `Client.cpp : fill_script_path()`
  - `cgi.cpp : is_cgi()`
  - `cgi.cpp : set_env()`
- what if the uri contains a php file, and the config says php must be handled by cgi, but the path to this php file in the uri is wrong ?
  - is it ok ? `http://my_site.com/cgi-bin/php-cgi` (real path)
  - is it ok ? `http://my_site.com/php-cgi` (reconstruct path ?)
  - is it ok ? `http://my_site.com/something/php-cgi` (what about 'something' ?)
  - is it ok ? `http://my_site.com/something/cgi-bin/php-cgi` (real path with 'something' before ?)
- I don't save STDIN and STDOUT before dup2 in the child process, is it wrong ?
- the response page is received long after the cgi-script is done, why ?
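
For the `\n` → `\r\n` item of the output cgi script TODO list, a minimal sketch of how the header section could be normalized, written C++98-style (assuming the project follows that standard). `normalize_cgi_headers` is a hypothetical name, not a function of the project, and it assumes the whole CGI output is already buffered in a `std::string`. Only the header part (up to the first empty line) is rewritten, because RFC 3875 section 6.4 says the body must not be modified.

```cpp
#include <string>

// Hypothetical helper: makes every line of the CGI header section end with
// "\r\n" (lines may end with "\n" or "\r\n"). Everything after the first empty
// line is the body and is copied unchanged (RFC 3875, section 6.4).
std::string normalize_cgi_headers(const std::string& raw)
{
    std::string out;
    std::string::size_type pos = 0;
    while (pos < raw.size())
    {
        std::string::size_type nl = raw.find('\n', pos);
        if (nl == std::string::npos)
        {
            out += raw.substr(pos);                // unterminated last line, kept as is
            break;
        }
        std::string::size_type end = nl;
        if (end > pos && raw[end - 1] == '\r')
            --end;                                 // treat "\r\n" like "\n"
        std::string line = raw.substr(pos, end - pos);
        out += line + "\r\n";
        pos = nl + 1;
        if (line.empty())                          // empty line: end of the headers
        {
            out += raw.substr(pos);                // body, untouched
            break;
        }
    }
    return out;
}
```
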
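
The other items of the output cgi script TODO list (no space before ":", no duplicate field, Status and Location handling, default 200) could be checked in a single pass over the header section. This is a sketch under assumptions: `CgiHeaders` and `parse_cgi_headers` are hypothetical names, the header section is assumed to be already separated from the body and to use `\r\n` line endings, and field names are stored in lowercase like the request header map described in the notifications. Duplicates are simply rejected here, which is only one possible way of resolving conflicts (keeping the first occurrence or merging values are others).

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical result type for the parsed CGI header section.
struct CgiHeaders
{
    bool ok;                                   // false if the CGI output is invalid
    int status;                                // status code to send to the client
    std::map<std::string, std::string> fields; // lowercase field name -> value
};

static std::string to_lower(const std::string& s)
{
    std::string out(s);
    for (std::string::size_type i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    return out;
}

// `head` is the header section only: lines separated by "\r\n",
// without the empty line that ends it.
CgiHeaders parse_cgi_headers(const std::string& head)
{
    CgiHeaders res;
    res.ok = false;
    res.status = 200;                          // default when no Status/Location
    std::string::size_type pos = 0;
    while (pos < head.size())
    {
        std::string::size_type eol = head.find("\r\n", pos);
        if (eol == std::string::npos)
            eol = head.size();
        std::string line = head.substr(pos, eol - pos);
        pos = eol + 2;
        std::string::size_type colon = line.find(':');
        // No ':', empty name, or whitespace before ':' is invalid (RFC 3875 6.3).
        if (colon == std::string::npos || colon == 0
            || line[colon - 1] == ' ' || line[colon - 1] == '\t')
            return res;
        std::string name = to_lower(line.substr(0, colon));
        std::string::size_type v = line.find_first_not_of(" \t", colon + 1);
        if (res.fields.count(name))            // duplicate: rejected in this sketch
            return res;
        res.fields[name] = (v == std::string::npos) ? std::string() : line.substr(v);
    }
    bool has_type = res.fields.count("content-type") > 0;
    bool has_location = res.fields.count("location") > 0;
    bool has_status = res.fields.count("status") > 0;
    if (!has_type && !has_location && !has_status)
        return res;                            // at least one CGI field is required
    if (has_status)
    {
        // "404 Not Found" -> 404 (atoi stops at the first non-digit)
        res.status = std::atoi(res.fields["status"].c_str());
        if (res.status < 100 || res.status > 599)
            return res;
    }
    else if (has_location && res.fields["location"].compare(0, 1, "/") != 0)
        res.status = 302;                      // client redirect: absolute URI
    // A Location starting with '/' is a local redirect: the server reruns the
    // request with that uri, so the status here is only a placeholder.
    res.ok = true;
    return res;
}
```
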
#### notifications

- I changed the Client getters into two categories :
  - getters for request infos : `get_rq_`
  - getters for client side infos : `get_cl_` (such as the ip of the client)
- I changed the variables of the request struct in Client :
  - `path` became `uri` (ex. `/path/to/file?var=val`)
  - added `abs_path` (ex. `/path/to/file`)
  - added `query` (ex. `var=val`)
- the header field names, used as keys in the map, are stored in lowercase, and the getters are case-insensitive

response.cpp

```
_response() {
    _determine_process_server()
    _send_response() {
        _append_base_headers()
        _construct_response() {
            _process_method() {
                _get() {
                    _exec_cgi()
                }
            }
            _insert_status_line()
            ::send(headers)
            ::send(body)
        }
    }
}
```

---

## man

- **htons, htonl, ntohs, ntohl :** convert the unsigned short or integer argument between host byte order and network byte order
- **poll :** waits for one of a set of file descriptors to become ready to perform I/O
  - alternatives : select, epoll (epoll_create, epoll_ctl, epoll_wait), kqueue (kqueue, kevent)
- **socket :** creates an endpoint for communication and returns a file descriptor that refers to that endpoint
- **listen :** marks a socket as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept()
- **accept :** used with connection-based socket types. It extracts the first connection request on the queue of pending connections for the listening socket, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket is unaffected by this call
- **send :** (~write) used to transmit a message to another socket. May be used only when the socket is in a connected state (so that the intended recipient is known). The only difference between send() and write() is the presence of flags. With a zero flags argument, send() is equivalent to write()
- **recv :** (~read) used to receive messages from a socket. May be used to receive data on both connectionless and connection-oriented sockets. The only difference between recv() and read() is the presence of flags. With a zero flags argument, recv() is generally equivalent to read()
- **bind :** associates a socket fd with a local address. When a socket is created with socket(), it exists in a name space (address family) but has no address assigned to it. It is normally necessary to assign a local address using bind() before a socket may receive connections (see accept())
- **connect :** connects a socket fd to a remote address
- **inet_addr :** converts the Internet host address cp from IPv4 numbers-and-dots notation into binary data in network byte order. Use of this function is problematic because in case of error it returns -1, which is a valid address (255.255.255.255). Avoid its use in favor of inet_aton(), inet_pton(), or getaddrinfo()
- **setsockopt :** manipulates options for a socket fd. Options may exist at multiple protocol levels; they are always present at the uppermost socket level
- **getsockname :** returns the current address to which a socket fd is bound
- **fcntl :** manipulates an open fd, by performing actions like duplicating it or changing its flags

---

## todo

- [ ] read the RFC and do some tests with telnet and NGINX

#### parsing config

- [ ] Your program has to take a configuration file as argument, or use a default path.
- [ ] Choose the port and host of each ’server’.
- [ ] Setup the server_names or not.
- [ ] The first server for a host:port will be the default for this host:port (that means it will answer to all the requests that don’t belong to another server).
- [ ] Setup default error pages.
- [ ] Limit client body size.
- [ ] Setup routes with one or multiple of the following rules/configuration (routes won't be using regexp):
  - [ ] Define a list of accepted HTTP methods for the route.
  - [ ] Define a HTTP redirection.
  - [ ] Define a directory or a file from where the file should be searched (for example, if url /kapouet is rooted to /tmp/www, url /kapouet/pouic/toto/pouet is /tmp/www/pouic/toto/pouet).
  - [ ] Turn on or off directory listing.
  - [ ] Set a default file to answer if the request is a directory.
  - [ ] Execute CGI based on certain file extension (for example .php).
  - [ ] Make the route able to accept uploaded files and configure where they should be saved.

#### connection basic

- [ ] You can’t execve another web server.
- [ ] Your server must never block and the client can be bounced properly if necessary.
- [ ] It must be non-blocking and use only 1 poll() (or equivalent) for all the I/O operations between the client and the server (listen included).
- [ ] poll() (or equivalent) must check read and write at the same time.
- [ ] You must never do a read or a write operation without going through poll() (or equivalent).
- [ ] Checking the value of errno is strictly forbidden after a read or a write operation.
- [ ] You don’t need to use poll() (or equivalent) before reading your configuration file.
  Because you have to use non-blocking file descriptors, it is possible to use read/recv or write/send functions with no poll() (or equivalent), and your server wouldn’t be blocking. But it would consume more system resources. Thus, if you try to read/recv or write/send in any file descriptor without using poll() (or equivalent), your grade will be 0.
- [ ] You can use every macro and define like FD_SET, FD_CLR, FD_ISSET, FD_ZERO (understanding what and how they do it is very useful).
- [ ] A request to your server should never hang forever.
- [ ] Your server must be compatible with the web browser of your choice.

#### parsing request HTTP (fields, ...)

- [ ] We will consider that NGINX is HTTP 1.1 compliant and may be used to compare headers and answer behaviors.

#### response HTTP (fields, ...)

- [ ] Your HTTP response status codes must be accurate.
- [ ] Your server must have default error pages if none are provided.
- [ ] You can’t use fork for something else than CGI (like PHP, or Python, and so forth).
- [ ] You must be able to serve a fully static website.

#### upload files

- [ ] Clients must be able to upload files.

#### CGI

- [ ] You need at least GET, POST, and DELETE methods.
- [ ] Do you wonder what a CGI is?
- [ ] Because you won’t call the CGI directly, use the full path as PATH_INFO.
- [ ] Just remember that, for chunked requests, your server needs to unchunk them and the CGI will expect EOF as the end of the body.
- [ ] Same thing for the output of the CGI. If no content_length is returned from the CGI, EOF will mark the end of the returned data.
- [ ] Your program should call the CGI with the file requested as first argument.
- [ ] The CGI should be run in the correct directory for relative path file access.
- [ ] Your server should work with one CGI (php-CGI, Python, and so forth).

#### write tests

- [ ] Stress test your server. It must stay available at all cost.
- [ ] Do not test with only one program.
- [ ] Write your tests with a more convenient language such as Python or Golang, and so forth. Even in C or C++ if you want to.

#### persistent connection

- [ ] Your server must be able to listen to multiple ports (see Configuration file).
- [ ] Your server should never die.
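
The route rule from the parsing config list (url /kapouet rooted to /tmp/www) is a prefix substitution. A minimal sketch, assuming the route matching already selected the route; `join_path` and `map_to_root` are hypothetical helpers, not project functions. Joining with exactly one '/' at the junction is also one way to avoid the double "//" noted in the questions, although it does not collapse slashes that are already inside the uri itself.

```cpp
#include <string>

// Joins two path pieces with exactly one '/' between them, so that
// "/tmp/www/" + "/pouic" does not give "/tmp/www//pouic".
std::string join_path(const std::string& left, const std::string& right)
{
    if (left.empty())
        return right;
    if (right.empty())
        return left;
    bool left_slash = left[left.size() - 1] == '/';
    bool right_slash = right[0] == '/';
    if (left_slash && right_slash)
        return left + right.substr(1);
    if (!left_slash && !right_slash)
        return left + "/" + right;
    return left + right;
}

// Hypothetical helper: replaces the route prefix of abs_path with the route's
// root. Returns an empty string if abs_path is not under this route.
std::string map_to_root(const std::string& route, const std::string& root,
                        const std::string& abs_path)
{
    if (abs_path.compare(0, route.size(), route) != 0)
        return std::string();
    std::string rest = abs_path.substr(route.size());
    // "/kapouetX" must not match the route "/kapouet"
    if (!rest.empty() && rest[0] != '/'
        && (route.empty() || route[route.size() - 1] != '/'))
        return std::string();
    return join_path(root, rest);
}
```
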
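
Several entries of the man section fit together when opening one listening socket per configured port. A hedged sketch: `make_listener` is a hypothetical name, it binds to 127.0.0.1 only for the example (a real server would use the host from the config file, or INADDR_ANY), and port 0 asks the kernel for a free port, which getsockname() then reports. inet_pton() is used instead of inet_addr(), as the man notes recommend, and `fcntl(fd, F_SETFL, O_NONBLOCK)` makes the socket non-blocking.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper: opens a non-blocking TCP socket listening on
// 127.0.0.1:port. With port 0 the kernel picks a free port, reported in
// bound_port. Returns the listening fd, or -1 on error.
int make_listener(unsigned short port, unsigned short& bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                   // host -> network byte order
    socklen_t len = sizeof(addr);

    // SO_REUSEADDR lets the port be bound again right after a restart.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0
        || fcntl(fd, F_SETFL, O_NONBLOCK) < 0      // accept() will never block
        || inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1  // not inet_addr()
        || bind(fd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr)) < 0
        || listen(fd, SOMAXCONN) < 0
        || getsockname(fd, reinterpret_cast<struct sockaddr*>(&addr), &len) < 0)
    {
        close(fd);
        return -1;
    }
    bound_port = ntohs(addr.sin_port);             // network -> host byte order
    return fd;
}
```

The returned fd would then be added to the single poll() set with POLLIN, so that accept() is only called once poll() reports a pending connection.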
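
For the chunked-request item of the CGI list, the body has to be decoded before it is handed to the CGI. A sketch of a decoder, assuming the whole chunked body is already in memory (a real server receives it piece by piece through poll()); `unchunk` is a hypothetical name. Chunk extensions after ';' are skipped, and the trailer section after the last chunk is not inspected.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a body sent with "Transfer-Encoding: chunked".
// Each chunk is "<hex size>[;extension]\r\n<data>\r\n"; a size of 0 ends the
// body. Returns false if the input is malformed or incomplete.
bool unchunk(const std::string& in, std::string& out)
{
    std::string::size_type pos = 0;
    out.clear();
    for (;;)
    {
        std::string::size_type eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;                          // size line not complete yet
        std::string size_line = in.substr(pos, eol - pos);
        if (size_line.empty() || !std::isxdigit(static_cast<unsigned char>(size_line[0])))
            return false;                          // must start with a hex digit
        char* end = 0;
        unsigned long size = std::strtoul(size_line.c_str(), &end, 16);
        if (*end != '\0' && *end != ';')
            return false;                          // garbage after the size
        pos = eol + 2;
        if (size == 0)
            return true;                           // last chunk: trailer not inspected
        std::string::size_type left = in.size() - pos;
        if (size > left || left - size < 2 || in.compare(pos + size, 2, "\r\n") != 0)
            return false;                          // data missing or no CRLF after it
        out.append(in, pos, size);
        pos += size + 2;
    }
}
```
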
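
The meta-variables from the cgi env variables section reach the script through the envp argument of execve(). A sketch of turning a `std::map` of variables into the NULL-terminated `NAME=value` array; `build_envp` is a hypothetical helper, and choosing which variables to fill (PATH_INFO, QUERY_STRING, ...) is left to the caller.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds the NULL-terminated "NAME=value" array that
// execve() expects as envp. The strings live in `storage`, which must stay
// alive (and unmodified) as long as the returned pointers are used.
std::vector<char*> build_envp(const std::map<std::string, std::string>& vars,
                              std::vector<std::string>& storage)
{
    storage.clear();
    for (std::map<std::string, std::string>::const_iterator it = vars.begin();
         it != vars.end(); ++it)
        storage.push_back(it->first + "=" + it->second);

    std::vector<char*> envp;
    for (std::vector<std::string>::size_type i = 0; i < storage.size(); ++i)
        envp.push_back(const_cast<char*>(storage[i].c_str()));  // execve does not write to them
    envp.push_back(0);                               // terminating NULL pointer
    return envp;
}
```

In the child, `&envp[0]` can then be passed as the third argument of execve().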
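
The CGI items about EOF (EOF marks the end of the body sent to the CGI, and EOF marks the end of its output) come down to two pipes around fork() and execve(). The sketch is deliberately blocking to keep it short: in the server, every read() and write() on these pipes must go through poll(), as the connection basic rules require. `run_cgi` is a hypothetical name; for a real CGI, `argv` would typically be `{interpreter, requested_file, NULL}` and the child would also chdir() into the script's directory, which the sketch omits.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical, blocking sketch: runs a CGI program, sends `body` on its
// stdin and collects its stdout until EOF. Returns true if the program
// exited with status 0.
bool run_cgi(const char* path, char* const argv[], char* const envp[],
             const std::string& body, std::string& output)
{
    int to_cgi[2];    // server writes to_cgi[1], CGI reads it as stdin
    int from_cgi[2];  // CGI writes its stdout, server reads from_cgi[0]
    if (pipe(to_cgi) < 0)
        return false;
    if (pipe(from_cgi) < 0)
    {
        close(to_cgi[0]); close(to_cgi[1]);
        return false;
    }
    pid_t pid = fork();
    if (pid < 0)
    {
        close(to_cgi[0]); close(to_cgi[1]);
        close(from_cgi[0]); close(from_cgi[1]);
        return false;
    }
    if (pid == 0)
    {
        // Child: dup2 changes the child's stdin/stdout only; the parent's
        // descriptors are unaffected, and execve replaces this process anyway.
        dup2(to_cgi[0], STDIN_FILENO);
        dup2(from_cgi[1], STDOUT_FILENO);
        close(to_cgi[0]); close(to_cgi[1]);
        close(from_cgi[0]); close(from_cgi[1]);
        execve(path, argv, envp);
        _exit(1);                                   // reached only if execve failed
    }
    // Parent: close the ends used by the child. If from_cgi[1] stayed open
    // here, read() below would never return 0 (EOF).
    close(to_cgi[0]);
    close(from_cgi[1]);
    // Writing everything first only suits small bodies: with a big body both
    // pipes could fill up and deadlock, which poll() avoids. A CGI that exits
    // without reading stdin makes write() raise SIGPIPE (often ignored).
    std::string::size_type sent = 0;
    while (sent < body.size())
    {
        ssize_t n = write(to_cgi[1], body.data() + sent, body.size() - sent);
        if (n <= 0)
            break;
        sent += static_cast<std::string::size_type>(n);
    }
    close(to_cgi[1]);                               // EOF marks the end of the body
    output.clear();
    char buf[4096];
    ssize_t n;
    while ((n = read(from_cgi[0], buf, sizeof(buf))) > 0)
        output.append(buf, static_cast<std::string::size_type>(n));
    close(from_cgi[0]);                             // EOF marked the end of the output
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```
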

---

## cgi rfc

[rfc 3875](https://www.rfc-editor.org/rfc/rfc3875)

#### summary :

- the cgi-script sends back at least one header field followed by an empty line
- this header field is one of three :
  - "Content-Type"
  - "Location"
  - "Status"
- the cgi-script may send back more header fields
- the server must check and modify a few things :
  - there is no duplicate in the header fields (if there is, resolve the conflict)
  - there is no space between the field name and the ":"
  - the newlines are of the form `\r\n`, and not `\n` only
  - if the Location field is not present, and the Status field is not present either, then the status code is 200
- the cgi-script can return a Location field, of two types :
  - local redirection : starts with a "/", the server must answer as if this was the request uri
  - client redirection : an absolute URI, starting with a scheme name followed by ":" (e.g. "http:"), the server must send back a status 302 with this uri to the client
- to pass the message-body to the cgi-script, we write it into the temporary fd from which the script reads its standard input

[3.1: server responsibilities](https://www.rfc-editor.org/rfc/rfc3875#section-3.1)

- The server [...] receives the request from the client
- selects a CGI script to handle the request
- converts the client request to a CGI request
- executes the script and converts the CGI response into a response for the client

[3.3: script uri](https://www.rfc-editor.org/rfc/rfc3875#section-3.3)

- the 'Script-URI' [...] MUST have the property that if the client had accessed this URI instead, then the script would have been executed

[4: how the server prepares the cgi requests](https://www.rfc-editor.org/rfc/rfc3875#section-4)

- the cgi receives 2 different sets of information :
  - the request meta-variables (in UNIX, by env variables)
  - and the message-body

[4.1: request meta-variables](https://www.rfc-editor.org/rfc/rfc3875#section-4.1)

- a header field that spans multiple lines MUST be merged onto a single line

[4.2: request message-body](https://www.rfc-editor.org/rfc/rfc3875#section-4.2)

- unless defined otherwise, the script accesses the request data by reading stdin

[6: how the response from the script is returned to the server](https://www.rfc-editor.org/rfc/rfc3875#section-6)

- The response comprises 2 parts, separated by a blank line :
  - a message-header
  - and a message-body
- The message-header contains one or more header fields
- The body may be NULL

[6.2: response types](https://www.rfc-editor.org/rfc/rfc3875#section-6.2)

- four types of responses :
  - document response
  - local redirect response
  - client redirect response
  - client redirect response with document
- document response :
  - it must return a Content-Type header field
  - a Status header field is optional (200 is assumed if omitted)
  - the server must check the cgi-script output, and modify it to comply with the protocol version
- local redirect response :
  - it must return only a Location field
  - it contains a local path URI and query string ('local-pathquery')
  - the local path URI must start with a "/"
  - the server must generate the response for this local-pathquery
- client redirect response :
  - it must return only a Location field
  - it contains an absolute URI, to tell the client that it should reprocess the request with this URI
  - the absolute URI always starts with the name of a scheme followed by ":"
  - the http server must generate a 302 'Found' message
- client redirect response with document :
  - it must return a Location field with an absolute URI
  - it must return the Status header field, with a value of 302 'Found'
  - the server must check the cgi-script output, and modify it to comply with the protocol version

[6.3: cgi header fields](https://www.rfc-editor.org/rfc/rfc3875#section-6.3)

- whitespace is permitted between the ":" and the field-value
- but not between the field-name and the ":"
- the CGI script can set three different fields :
  - Content-Type
  - Location
  - Status
- Content-Type :
  - if there is a body in the response, a Content-Type field must be present
  - if there is no Content-Type, the server must not attempt to determine one
- Location :
  - the local URI path must be an absolute path, not a relative path, nor NULL
  - the local URI path must, then, start with "/"
  - the absolute URI starts with a scheme name followed by ":"
- Status :
  - a 3-digit integer code
  - 4 standard values :
    - 200 'OK' indicates success, it's the default value
    - 302 'Found' with a Location header and a response message-body
    - 400 'Bad Request' for an unknown request format, like a missing CONTENT_TYPE
    - 501 'Not Implemented' the script received an unsupported REQUEST_METHOD
  - construction : `Status: 400 Bad Request\n` (status code, space, reason phrase)
- the cgi-script can return other header fields, concerning the response message
  - the server must translate the cgi-header syntax into http-header syntax
  - for example, newlines can be encoded in different ways
- the cgi-script must not return header fields concerning client-side communication
  - the server can remove such fields
  - (not sure : https://www.rfc-editor.org/rfc/rfc3875#section-6.3.4)
- the server must resolve conflicts between the header fields returned by the script and the ones it would otherwise send itself

[6.4: cgi message body](https://www.rfc-editor.org/rfc/rfc3875#section-6.4)

- the server must read it until EOF
- the server must not modify it, except to convert the charset if needed

[7 and 8: useful information about implementation and security](https://www.rfc-editor.org/rfc/rfc3875#section-7)

---

## cgi env variables

[cgi
env variables](https://www.rfc-editor.org/rfc/rfc3875#section-4.1)
[wikipedia cgi environment variables (in french)](https://fr.wikipedia.org/wiki/Variables_d%27environnement_CGI)
[cgi server variables on adobe](https://helpx.adobe.com/coldfusion/cfml-reference/reserved-words-and-variables/cgi-environment-cgi-scope-variables/cgi-server-variables.html)

```None
AUTH_TYPE         : if the script is protected, the authentication method used to validate the user
CONTENT_LENGTH    : length of the request message-body
CONTENT_TYPE      : (Content-Type field) if there is attached information, as with method POST or PUT, the content type of the data (e.g. "text/plain"); in html forms it is set by the "enctype" attribute, which has three possible values : "application/x-www-form-urlencoded", "multipart/form-data", "text/plain"
GATEWAY_INTERFACE : CGI version (e.g. CGI/1.1)
PATH_INFO         : if any, the path of the request in addition to the cgi script path, starting with "/" (e.g. for the cgi script path = "/usr/web/cgi-bin/script.cgi", and the url = "http://server.org/cgi-bin/script.cgi/house", PATH_INFO would be "/house")
PATH_TRANSLATED   : the server's translation of PATH_INFO into a local file path (typically the document root followed by PATH_INFO), null if PATH_INFO is null (e.g. for "http://server.org/cgi-bin/prog/the/path", PATH_INFO would be "/the/path" and, with a document root of "/usr/web/htdocs", PATH_TRANSLATED would be "/usr/web/htdocs/the/path")
QUERY_STRING      : everything following the "?" in the url sent by the client (e.g. for the url "http://server.org/query?var1=val1&var2=val2", it would be "var1=val1&var2=val2")
REMOTE_ADDR       : ip address of the client
REMOTE_HOST       : host name of the client, empty if not known, or equal to REMOTE_ADDR
REMOTE_IDENT      : if known, username of the client, otherwise empty, used for logging only
REMOTE_USER       : username of the client, if the script is protected and the server supports user authentication
REQUEST_METHOD    : method used for the request (for http, usually GET or POST)
SCRIPT_NAME       : path to the cgi, relative to the root, used for self-referencing URLs (e.g. "/cgi-bin/script.cgi")
SERVER_NAME       : name of the server, as hostname, IP address, or DNS name (e.g. dns : "www.server.org")
SERVER_PORT       : the port number your server is listening on (e.g. 80)
SERVER_PROTOCOL   : protocol used for the request (e.g. HTTP/1.1)
SERVER_SOFTWARE   : the server software you're using (e.g. Apache 1.3)
```

[redirect status for php-cgi](https://woozle.org/papers/php-cgi.html)

```None
REDIRECT_STATUS : for example, 200
```

---

## resources

- [correction](https://github.com/AliMaskar96/42-Correction-Sheets/blob/master/ng_5_webserv.pdf)
- [create an http server](https://medium.com/from-the-scratch/http-server-what-do-you-need-to-know-to-build-a-simple-http-server-from-scratch-d1ef8945e4fa)
- [guide to network programming](https://beej.us/guide/bgnet/)
- [same guide, translated into french](http://vidalc.chez.com/lf/socket.html)
- [bind() vs connect()](https://stackoverflow.com/questions/27014955/socket-connect-vs-bind)
- [INADDR_ANY for bind](https://stackoverflow.com/questions/16508685/understanding-inaddr-any-for-socket-programming)
- [hack with CGI](https://www.youtube.com/watch?v=ph6-AKByBU4)
- [http headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers)
- [list of http header fields](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields)
- [http request ibm](https://www.ibm.com/docs/en/cics-ts/5.3?topic=protocol-http-requests)
- [http request other](https://www.tutorialspoint.com/http/http_requests.htm)
- [request line uri](https://stackoverflow.com/questions/40311306/when-is-absoluteuri-used-from-the-http-request-specs)