
CGI (Common Gateway Interface)

NHariman edited this page Sep 29, 2022 · 3 revisions

What is it?

CGI stands for Common Gateway Interface. It is a standard for external gateway programs to interface with information servers such as HTTP servers.

CGI applications were traditionally written in scripting languages such as Perl, but nowadays they are also written in many other languages. They generally use the .cgi file extension, but other extensions are common too, such as .py for Python.

CGI is used when the webserver needs to interact dynamically with a user, typically when the user fills in a form and submits it to the server. The CGI program retrieves the submitted data, processes it, and returns the result to the webserver, which sends it back to the user.
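As a minimal sketch of that round trip, the Python function below takes URL-encoded form data, processes it, and produces the text a CGI script would print: a header, an empty line, then an HTML body. The form field "name" and the greeting are illustrative assumptions, not part of this server.

```python
import html
import os
from urllib.parse import parse_qs

def handle_form(form_data: str) -> str:
    """Turn URL-encoded form data into a CGI response (header, empty line, body)."""
    fields = parse_qs(form_data)                 # e.g. "name=Ada" -> {"name": ["Ada"]}
    name = fields.get("name", ["stranger"])[0]   # "name" is a hypothetical form field
    body = "<html><body><p>Hello, " + html.escape(name) + "!</p></body></html>\n"
    # A CGI script prints its headers, then an empty line, then the document.
    return "Content-type: text/html\n\n" + body

if __name__ == "__main__":
    # For a GET form submission the data arrives in the QUERY_STRING variable.
    print(handle_form(os.environ.get("QUERY_STRING", "")), end="")
```

html.escape matters here: anything the user typed is echoed into the page, so it must not be able to inject markup.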

Features of a CGI

  • applications run on the server
  • reusable pieces of code
  • well-defined standard supported by most web servers; the browser only receives ordinary HTTP responses, so it needs no CGI support of its own
  • consistent interface; scripts can be written in many languages such as C, C++, Python, Java and Perl
  • the person writing the CGI can write it independently of the operating system the server uses
  • simple, basic way of passing information about the user's request from the webserver to the application program and getting a response back
  • by default, CGI scripts run in the security context of the server

How it works on the webserver

This webserver's cgi_pass directive can be set in two different ways:

  • 2 arguments (extension, executable), e.g. "cgi_pass .pl /usr/bin/perl;". If the location is a directory (the CGI bin), the server finds the first CGI script in it with the given extension and runs it with the given executable; this form can be used when the script itself is not executable. Otherwise, the server checks whether the requested file has the given extension and, if so, runs it with the executable.
  • 1 argument (executable), e.g. "cgi_pass ./get_query.pl;". The server runs the given CGI executable when the request targets the CGI directory or that specific script.
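The decision described by the two forms can be sketched as a function that maps the directive's arguments and the requested target to the argv that would be executed. This is a Python sketch under assumptions: the directive arrives already split into a list, the name build_cgi_argv is hypothetical, and the "first" script in a directory is taken in sorted order because the text does not say how scripts are ordered.

```python
import os
from typing import List, Optional

def build_cgi_argv(cgi_pass: List[str], target: str) -> Optional[List[str]]:
    """Return the argv to execute for a request target, or None if cgi_pass does not apply."""
    if len(cgi_pass) == 2:
        # Two arguments: cgi_pass <extension> <executable>, e.g. [".pl", "/usr/bin/perl"]
        extension, interpreter = cgi_pass
        if os.path.isdir(target):
            # Directory (CGI bin): run the first script with the extension (sorted order is an assumption)
            scripts = sorted(n for n in os.listdir(target) if n.endswith(extension))
            return [interpreter, os.path.join(target, scripts[0])] if scripts else None
        # Single file: run it with the interpreter only if its extension matches
        return [interpreter, target] if target.endswith(extension) else None
    if len(cgi_pass) == 1:
        # One argument: cgi_pass <executable>, e.g. ["./get_query.pl"]
        executable = cgi_pass[0]
        # Run it if the CGI directory or that specific script was requested
        if os.path.isdir(target) or os.path.basename(target) == os.path.basename(executable):
            return [executable]
        return None
    raise ValueError("cgi_pass takes one or two arguments")
```

In the two-argument form the script is passed as an argument to the interpreter, which is why the script file itself does not need execute permission.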

Additional Info

General Info

How to use it?

A CGI script is usually reached at a specific URL, and requesting that URL runs the script. The process generally follows these steps:

  • a directory containing the scripts is created within the webserver. This folder is usually called cgi-bin.
  • The user sends a request to the server in the form of http://mywebsite.com/cgi-bin/mycgiscript.pl
  • The server recognises the file being requested is a CGI script and instead of sending back the file it runs the script and passes the output of the script to the web client.
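A rough sketch of the recognition step: split the requested URL into path and query string, map the path onto the document root, and treat anything under /cgi-bin/ as a script to run rather than a file to send. The function name route_request and the /var/www root in the usage are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit

def route_request(url: str, document_root: str, cgi_prefix: str = "/cgi-bin/") -> Tuple[str, str, str]:
    """Classify a request as 'cgi' or 'static' and map it to a file path."""
    parts = urlsplit(url)                            # separates the path from the query string
    file_path = document_root.rstrip("/") + parts.path
    kind = "cgi" if parts.path.startswith(cgi_prefix) else "static"
    return kind, file_path, parts.query              # the query later becomes QUERY_STRING
```

For example, route_request("http://mywebsite.com/cgi-bin/mycgiscript.pl?name=Ada", "/var/www") classifies the request as "cgi" with the path /var/www/cgi-bin/mycgiscript.pl. This sketch does not normalise ".." segments; a real server must reject path traversal before touching the filesystem.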

CGI processing

When an HTTP server receives a request for a CGI script, the server gives the script the details of the request. An HTTP server and a CGI script communicate in four main ways:

  1. Environment variables. The HTTP server uses environment variables to pass information about the request to the CGI script. Depending on the type of request, a given variable may or may not contain information the script needs to function properly.

  2. The command line. This is mostly used for isindex queries, which are discouraged because passing user input directly on the command line can create security risks.

  3. Standard input, for HTTP POST and PUT requests. The HTTP server passes the request body to the CGI script on standard input. The number of bytes written to standard input is stored in the CONTENT_LENGTH environment variable.

  4. Standard output. The script writes its output to standard output. The output can be a document generated by the script, or instructions telling the server where to retrieve the desired output.
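From the script's side, the input channels can be sketched as one function: for POST and PUT it reads exactly CONTENT_LENGTH bytes from standard input, otherwise it takes the QUERY_STRING variable. It also relies on REQUEST_METHOD, a standard CGI variable that the table further down does not list. The function name is hypothetical, and the main block simulates a POST instead of reading the real environment so it can be tried from a shell.

```python
import io
from typing import BinaryIO, Mapping

def read_request_data(environ: Mapping[str, str], stdin: BinaryIO) -> bytes:
    """Fetch the request data through the channel CGI uses for the method."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method in ("POST", "PUT"):
        # Body arrives on standard input; CONTENT_LENGTH says how many bytes to read
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # Otherwise the data arrives in the QUERY_STRING environment variable
    return environ.get("QUERY_STRING", "").encode()

if __name__ == "__main__":
    # Simulated POST: only CONTENT_LENGTH bytes are taken from "standard input"
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}
    data = read_request_data(fake_env, io.BytesIO(b"name=Ada&ignored"))
    # Output channel: a header, an empty line, then the document on standard output
    print("Content-type: text/plain\n")
    print(data.decode())   # prints name=Ada
```

Reading exactly CONTENT_LENGTH bytes, rather than reading until end of file, is what lets the script stop even if the server keeps standard input open.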

Appropriate HTTP headers

| HTTP header | Description |
|---|---|
| Content-type: string | MIME type of the document being returned, as a string (e.g. text/html). |
| Content-length: N | Length of the returned data in bytes. The browser uses it to know how much data to expect, e.g. to estimate how long the download will take. |
| Location: URL string | Redirects the request to another resource. The URL string gives the URL to return instead of the one that was requested. |
| Expires: Date string | Date after which the browser treats the page as expired and in need of a refresh, written as an HTTP date such as Sun, 06 Nov 1994 08:49:37 GMT. Not used by this server. |
| Set-Cookie: string | Cookie to be set, passed as a string. Not used by this server. |
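A small sketch of how a script might emit the headers this server uses, leaving out Expires and Set-Cookie since they are marked as unused. The function name cgi_output is hypothetical. Headers go one per line, followed by an empty line and then the body.

```python
from typing import Optional

def cgi_output(body: bytes, content_type: str = "text/html", location: Optional[str] = None) -> bytes:
    """Build the full output of a CGI script: headers, an empty line, then the body."""
    lines = ["Content-type: " + content_type,
             "Content-length: " + str(len(body))]    # length in bytes, not characters
    if location is not None:
        lines.append("Location: " + location)        # return this URL instead of the requested one
    header_block = "\n".join(lines) + "\n\n"         # the empty line ends the headers
    return header_block.encode("ascii") + body
```

Because Content-length counts bytes, the body is taken as bytes: a UTF-8 string like "é" is one character but two bytes.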

Environment variables

CGI programs also receive request information through environment variables. The following variables are available to every CGI program:

| Variable | Description |
|---|---|
| CONTENT_TYPE | Media type of the attached request body, e.g. when the user uploads a file. |
| HTTP_USER_AGENT | Information about the browser that initiated the request. |
| QUERY_STRING | Used in GET requests; the URL-encoded information sent by the browser (the part of the URL after the ?). |
| CONTENT_LENGTH | Used with POST requests; the length in bytes of the request body. |
| SCRIPT_NAME | Name of the CGI script. |
| PATH_INFO | Full path where the CGI script is kept (in the CGI specification, PATH_INFO is the extra path that follows the script name in the URL). |
| DOCUMENT_ROOT | Root directory of the documents served. |
| REMOTE_HOST | Host name of the client that made the request. |
| SCRIPT_FILENAME | File name of the script. |
| SERVER_NAME | Name of the server (host). |
| SERVER_PORT | Port the server is listening on. |
| SERVER_PROTOCOL | Always HTTP/1.1 for this server. |
| SERVER_SOFTWARE | foodserv for this server. |
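The sketch below fills most of these variables from a request, assuming a hypothetical Request dataclass in place of the server's real request class. It follows this page's convention that PATH_INFO holds the script's full path, adds the standard REQUEST_METHOD variable, and skips REMOTE_HOST because that needs the client connection. to_envp shows the NAME=value strings an envp array holds; Python's own os.execve accepts the dict directly.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Request:
    """Hypothetical stand-in for the server's request class."""
    method: str
    script_name: str                       # URL path of the script, e.g. "/cgi-bin/hello.py"
    query_string: str = ""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""

def build_cgi_env(req: Request, document_root: str, server_name: str, server_port: int) -> Dict[str, str]:
    """Collect the CGI environment variables from the table for one request."""
    headers = {k.lower(): v for k, v in req.headers.items()}   # header names are case-insensitive
    script_filename = os.path.join(document_root, req.script_name.lstrip("/"))
    env = {
        "REQUEST_METHOD": req.method,      # standard CGI variable, not listed in the table
        "QUERY_STRING": req.query_string,
        "SCRIPT_NAME": req.script_name,
        "SCRIPT_FILENAME": script_filename,
        "PATH_INFO": script_filename,      # this page's convention: full path of the script
        "DOCUMENT_ROOT": document_root,
        "SERVER_NAME": server_name,
        "SERVER_PORT": str(server_port),
        "SERVER_PROTOCOL": "HTTP/1.1",
        "SERVER_SOFTWARE": "foodserv",
    }
    if "user-agent" in headers:
        env["HTTP_USER_AGENT"] = headers["user-agent"]
    if req.method == "POST":
        env["CONTENT_LENGTH"] = str(len(req.body))
        env["CONTENT_TYPE"] = headers.get("content-type", "")
    return env

def to_envp(env: Dict[str, str]) -> List[str]:
    """Flatten to the NAME=value strings an envp array holds."""
    return [f"{name}={value}" for name, value in env.items()]
```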

The internals

The CGI class takes the request object, finds the requested URI and its matching target configuration, and then calls setup(). setup() builds the absolute path of the script starting from the current directory (which it uses for path finding) and sets up the argv and envp lists. While doing so it validates that:

  1. the file exists
  2. the file is an executable
  3. the file is allowed to execute according to the config file.

If any of these checks fails, the class throws an error.
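The three checks can be sketched in Python with os.path.isfile and os.access(path, os.X_OK). The function names are hypothetical, and allowed_extensions is an assumed stand-in for "what the config file says", since the page does not specify what the config permits.

```python
import os
from typing import Iterable, Optional

def cgi_target_error(path: str, allowed_extensions: Iterable[str]) -> Optional[str]:
    """Return why the script may not run, or None if all three checks pass."""
    if not os.path.isfile(path):
        return "file does not exist"
    if not os.access(path, os.X_OK):
        return "file is not executable"
    if not path.endswith(tuple(allowed_extensions)):
        return "not allowed by the configuration"
    return None

def setup_script_path(relative_path: str, allowed_extensions: Iterable[str]) -> str:
    """Build the absolute path from the current directory and validate it, raising on failure."""
    absolute = os.path.abspath(relative_path)   # resolved against the current working directory
    error = cgi_target_error(absolute, allowed_extensions)
    if error is not None:
        raise RuntimeError(f"CGI setup failed for {absolute}: {error}")
    return absolute
```

Returning a reason string from the check keeps the error message specific, which helps when deciding which HTTP error to send back.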

If setup succeeds, the class calls execute(), which runs the CGI and captures its output. It creates two pipes: one for reading (capturing the script's output) and one for writing (passing the request body for POST). If the request is not a POST, it immediately closes the write end.

It then forks. In the child process, the argument and environment variable arrays are built, the appropriate fds are duplicated and/or closed, and the CGI script is executed with execve. If the execution fails, the child exits with status 1. In the parent process, if the method is POST, it writes the request body to the write end of the input pipe, which is connected to the child's standard input. It then reads the child's standard output from the output pipe and stores it for later use. Finally it returns the child's exit code: 0 on success, which leads to HTTP status 200; any other value makes the server respond with 502 Bad Gateway.
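The same sequence can be sketched with Python's os module, whose pipe, fork, dup2, execve and waitpid functions wrap the POSIX calls of the same names. The names execute_cgi and http_status_for are hypothetical, and body is None for non-POST requests.

```python
import os
from typing import Dict, List, Optional, Tuple

def execute_cgi(argv: List[str], env: Dict[str, str], body: Optional[bytes]) -> Tuple[int, bytes]:
    """Run a CGI program, feed it the POST body (if any), and capture its stdout."""
    out_read, out_write = os.pipe()   # child's stdout -> parent (capture output)
    in_read, in_write = os.pipe()     # parent -> child's stdin (POST body)
    if body is None:
        # Not a POST: close the write end at once so the child sees EOF on stdin
        os.close(in_write)
        in_write = -1

    pid = os.fork()
    if pid == 0:
        # Child: connect the pipes to stdin/stdout, close the originals, exec the script
        try:
            os.dup2(in_read, 0)
            os.dup2(out_write, 1)
            for fd in (in_read, in_write, out_read, out_write):
                if fd > 2:
                    os.close(fd)
            os.execve(argv[0], argv, env)
        finally:
            os._exit(1)  # only reached if execve failed

    # Parent: close the pipe ends that belong to the child
    os.close(in_read)
    os.close(out_write)
    if in_write != -1:
        try:
            view = memoryview(body)
            while view:
                written = os.write(in_write, view)
                view = view[written:]
        except BrokenPipeError:
            pass  # the script exited without reading its whole input
        finally:
            os.close(in_write)

    # Read the script's output until it closes its stdout
    chunks = []
    while True:
        chunk = os.read(out_read, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(out_read)

    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1
    return exit_code, b"".join(chunks)

def http_status_for(exit_code: int) -> int:
    """200 OK when the script succeeded, 502 Bad Gateway otherwise."""
    return 200 if exit_code == 0 else 502
```

Like the design described above, this writes the whole body before reading any output. That is fine for small requests, but if a script writes more output than the pipe buffer holds before it finishes reading a large body, both sides block; a production server would typically multiplex the two pipes with poll() or select().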
