↩︎ | ↖︎ | ↑︎ | ↘︎ | ↪︎ |
WASD has a global configuration, which applies characteristics to the entire running server, as well as per-service (virtual server) and conditional configuration, which applies characteristics or behaviours to specific requests. All configuration is provided via files located by logical names.
Historically, WASD has been configured by direct editing of four of the five primary configuration files listed below.
As of v12.1 there is another option — WASD_CONFIG_INLINE where configuration occurs in a single file, supplemented by [includeFile]s.
See 2.2 Inline Configuration for further information.
Name | Scope | Description |
---|---|---|
WASD_CONFIG_AUTH | loadable | request authorization control |
WASD_CONFIG_GLOBAL | global | global server configuration |
WASD_CONFIG_MAP | loadable | request processing control |
WASD_CONFIG_MSG | global | provides server messages |
WASD_CONFIG_SERVICE | global | specifies services (virtual servers) |
Simple editing of these files change the configuration. Comment lines may be included by prefixing them with the hash ("#") character. Comment lines prefixed with a quote and then a hash ("!#") are displayed in Server Admin reports and are WATCHable during rule proceessing. Configuration file directives are not case-sensitive. Any changes to global configuration file can only be enabled by restarting the HTTPd process using the following command on the server system.
Changes to request mapping or authorization configuration files also can be dynamically reloaded into the running server using the administration command-line interface.
Changes to configuration files can be validated at the command-line before reload or restart. This detects and reports any syntactical and fatal configuration errors but of course cannot check the intent of the rules.
The config check sequentially processes each of the authorization, global, mapping, message and service configuration files.
If additional server startup qualifiers are required to enable specific configuration features then these must also be provided when checking. For example:
A server's currently loaded configuration can be interrogated from the Server Administration menu (see Server Administration of WASD Features and Facilities).
WASD uses multiple configuration files for a server and its site, each one providing for a different functional aspect … configuration, virtual services, path mapping, authorization, etc. Generally these configuration files are "flat", with all required directives included in a single file. This provides a simple and straight-forward approach suitable for most sites and allows for the provision of Server Administration page online configuration of several aspects.
Included files are also the organisational basis for 2.2 Inline Configuration.
It is also possible to build site configurations by including the contents of referenced files. This may provide a structure and flexibility not possible using the flat-file approach. All WASD configuration files allow the use of an [IncludeFile] directive. This takes a VMS file specification parameter. The file's contents are then loaded and processed as if part of the parent configuration file. These included files are allowed to be nested to a depth of two (i.e. the configuration file can include a file which may then include another file).
The following is an example used to build up the mapping rules for four virtual services supported on the one server.
As mentioned above, historically WASD has been configured by direct editing of of the primary configuration files WASD_CONFIG_AUTH, _GLOBAL, _MAP and _SERVICE.
As of v12.1 there is another option — WASD_CONFIG_INLINE where configuration occurs in a single file, supplemented by [includeFile]s. This should be considered a dot-zero facility, that is first release and somewhat less exercised than the standard approach to WASD configuration.
With multiple virtual services and sites, each requiring some customisation, it has become increasingly cumbersome maintaining service, mapping, and authorisation configurations across multiple files. So this inline configuration allows those aspects to be combined, with emphemeral (temporary) WASD configuration files generated and fed to the standard WASD configuration mechanisms.
[IncludeFile]s are inspected for indication of content; does it look like WASD_CONFIG_GLOBAL directives, _SERVICE, _MAP or _AUTH directives? If not recognised an error is flagged and configuration stops. The following inline-specific directives can be used to explicitly specify a file's content.
Directive | Destination (into ephemeral file) | |
---|---|---|
[Auth] | WASD_CONFIG_AUTH | |
[Global] | WASD_CONFIG_GLOBAL | |
[Map] | WASD_CONFIG_MAP | |
[Service] | WASD_CONFIG_SERVICE |
These may also be used to specify the content at any point in a file during configuration. The destination remains current until the end-of-file or next destination directive.
Where the PRE_MAP_AUTH.CONF file contains general mapping and authorisation rules that should be applied to all services prior to specific rules, and POST_MAP_AUTH.CONF mapping and authorisation rules that ensure services that tidy up anything left undone by specific rules.
The WASD_ROOT:[LOCAL]INLINE_ALPHA_443.CONF could contain the likes of
Global directives provide overall operating characteristics to all services. If not explicitly [FileInclude]d at the beginning of the WASD_CONFIG_INLINE the inline configuration will use whatever is currently present in the WASD_CONFIG_GLOBAL file (creating a new ephemeral file).
All of the usual Server Administration interface and command-line facilities continue to be available. So after INLINE edits the configuration can be /DO=MAP=CHECK and /DO=MAP, etc., as with non-INLINE configuration.
A server's currently loaded configuration can be interrogated from the Server Administration menu (see Server Administration of WASD Features and Facilities). When using inline configuration each of the configuration reports clearly indicates that the file is generated From: inline.
The term emphemeral file has been used multiple times so far. Without a major addition to the code base of WASD, and to allow the historical configuration to continue to be used, the approach employed by WASD_CONFIG_INLINE configuration is to use the contents of this file (including of course [IncludeFile]s) to populate temporary WASD_CONFIG_... files and have the server startup using those. These temporary files are deleted at server image shutdown.
It is recommended that the server distribution tree and any document and other web-specific data areas be kept separate and distinct.
The former in WASD_ROOT:[000000], the latter perhaps in something like WEB:[000000]. This logical device could be provided with the following DCL introduced into the site or server startup procedures:
See 10.2 VMS File System Specifications for further information on the use of logical names in locating and defining the content and structure of a site.
Note that logical device names like this need not appear in in the structure of the Web site. The root of the Web-accessible path can be concealed using a final mapping rule similar to the following
Mapping rules are the tools used to build a logical structure to a site from the physical area, perhaps multiple areas, used to house the associated files. The logical organisation of served data is largely hierarchical, organised under the Web-server path root, and is achieved via two mechanisms.
Physically distinct areas are used for good physical reasons (e.g. the area can best be hosted on a task-local disk), for historical reasons (e.g. the area existed before any Web environment existed) or for reasons of convenience (e.g. lets put this where access controls already allow the maintainers to manage it).
There are no good reasons for having site-specific documents integrated into the package directory structure!
All site-served files should be located in an autonomous, dedicated area or areas. The only reason to place script files into WASD_ROOT:[CGI-BIN] or WASD_ROOT:[architecture_BIN] is that the script script is traditionally accessible via a /cgi-bin/ path or that the site is a small and/or low usage environment where this directory is conveniently available for the few extra scripts being made available.
For any significant site (size that as best suits your perception), or for when a specific software system or systems is being built or exists and it is being "Web-ified", design that software system as you would be any other. That is place the documentation in one directory are, executables and support procedures in their own, management files in another, data in yet another area, etc. Then make those portions that are required to be accessible via the Web interface accessible via the logical associations afforded through the use of the server's mapping rules (10. Request Processing Configuration). Of course existing areas that are to be now made available via the Web can be mapped in the same way. This includes the active components - executable scripts. There is no reason (apart from historical) why the /cgi-bin/ path should be used to activate scripts associated with a dedicated software system. Use a specific and unique path for scripts associated with each such system.
When making a directory structure available via the Web care must be taken that only the portions required to be accessed can be. Other areas should or must not be accessible. The server process can only access files that are world-accessible, it is specifically granted access via VMS protection mechanisms (e.g. ACLs), or that the individual SYSUAF-authorized accessor can access and which have specifically been made available via server authorization rules. Use the recommendations in 3.2 Recommended Package Security as guidlines when designing your own site's protections and permissions.
A particular area of the file system may be specified as the root of a particular (virtual) sites documents. This is done using the WASD_CONFIG_MAP SET map=root=<string> mapping rule. After this rule is applied all subsequent rules have the specified string prefixed to mapped strings before file-system resolution.
For example, the following WASD_CONFIG_MAP rule set
when applied to the following request URLs results in the described mappings being applied.
With the request for a directory icon using
And a request for a script using
Care must be taken in getting the sequence of mapping rules correct for access to non-site resources before actually setting the document root which then ties every other resource to that root.
A single WASD server process is capable of concurrently supporting the same host name on different port numbers and a number of different host names (DNS aliased or multi-homed) using the same port number. This capability is generally known as a virtual server. There is no design limitation on the number of these services that WASD will concurrently support. Virtual services offer versatile and powerful multi-site capabilities using the one system and server. Service determination is based on the contents of the request's "Host:" header field. If none is present it defaults to base service for the interface's IP address and port.
The logical name WASD_CONFIG_SERVICE locates the virtual services configuration. It is a mandatory logical and file name.
See 7.7 Service Directives for further detail.
The essential profile of a site is established by its mapped resources and any authorization controls, the WASD_CONFIG_MAP and WASD_CONFIG_AUTH configuration files respectively, and these two files support directives that allow configuration rules to be applied to all virtual services (i.e. a default), to a host name (all ports), or to a single specified service (host name and specific port).
To restrict rules to a specified server (virtual or real) add a line containing the server host name, and optionally a port number, between double-square brackets. All following rules will be applied only to that service. If a port number is not present it applies to all ports for that service name, otherwise only to the service using that port. To resume applying rules to all services use a single asterisk instead of a host name. In this way default (all service) and server-specific rules may be interleaved to build a composite environment, server-specific yet with defaults. Note that service-specific and service-common rules may be mixed in any order allowing common rules to be shared. This descriptive example shows a file with one rule per line.
Both the mapping and authorization modules report if rules are provided for services that are not configured for the particular server process (i.e. not in the server's [Service] or /SERVICE parameter list). This provides feedback to the site administrator about any configuration problems that exist, but may also appear if a set of rules are shared between multiple processes on a system or cluster where processes deliver differing services. In this latter case the reports can be considered informational, but should be checked initially and then occasionally for misconfiguration.
If a service is not configured for the particular host address and port of a request one of two actions will be taken.
This applies to dotted-decimal addresses as well as alpha-numeric. Therefore if there is a requirement to connect via a numeric IP address such a service must have been configured.
Note also that the converse is possible. That is, it's possible to configure a service that the server cannot ever possibly respond to because it does not have an interface using the IP address represented by the service host.
WASD can apply GZIP compression (gzip, deflate) to any suitable response body and can accept similarly compressed request bodies. It dynamically maps required functions from a ZLIB shareable image. Originally developed against the ZLIB v1.2.n port by Jean-François Piéronne, the VMS-PORTS (GNV) LIBZ package is also supported.
WASD dynamically maps the associated shareable image by successively accessing the (optionally defined) WASD_LIBZ_SHR32 logical name, then GNV$LIBZSHR32, then LIBZ_SHR32, before reporting GZIP unavailable.
The shareable image must be INSTALLed (without any particular privileges) before it can be activated by the privileged WASD HTTPd image (the WASD startup will automatically do this if necessary). The server process log and the Server Administration page, Statistics Report panel named Environment contains the version activated or a VMS status message if an error was encountered.
The WASD_CONFIG_GLOBAL directive [GzipResponse] controls whether this feature is enabled for the gzip content-encoding of suitable response bodies. This directive requires at least one parameter, the compression level in the range 1..9. Smaller values provide faster but poorer compression ratios while larger values better compression at the cost of more CPU cycles and latency. This corresponds to the GZIP utility's -1..-9 CLI switches. Two optional parameters could allow ZLIB's 'memLevel' and 'windowBits' to be adjusted by ZLIB afficiendos (level[,memory,window]). A small amount of experimentation by this author indicates minor changes in memory usage and compression ratio by fiddling with these.
Be aware that GZIP encoding is memory intensive. From 132kB to 265kB has been observed per compressing request (WATCH provides this in a summary line). These values apply across a wide range of transfer sizes (from kilobytes to tens of megabytes). It also is CPU intensive and adds response latency, though that might be well be offset by significant reductions in transfer time on the Internet or other slower, non-intranet infrastructures. Text content compression has been observed from 30% to 10% of the original file size (even down to 1% in the case of the extremely redundant content of [EXAMPLE]64K.TXT). VMS executables (for want of another binary test case) at around 40%. In other words, GZIP encoding may not be suitable or efficient for every site or every request!
Once enabled WASD will GZIP the responses for all suitable contents provided the client accepts the encoding and the response is not one of the following:
Additional control may be exercised with the following path SETings:
Using path settings GZIP compression may be disabled for specified file types (apart from those already suppressed as described above).
A script using the Script-Control: X-content-encoding-gzip=0 CGI response header can similarly suppress GZIP compression of its output if required. See "Scripting Overview" for further detail.
By default GZIP encoding flushes the internal buffer only when full. Most commonly this is not an issue because of high rates of output. However with slow output sources, such as from some classes of script, this can result in considerable latency before a client sees an initial response, and then between transmission of further output. By default output is initially flushed after 5 seconds and thereafter at a maximum interval of 15 seconds. The WASD_CONFIG_GLOBAL directive [GzipFlushSeconds] allows this period to be adjusted.
Decoding of GZIP content-encoded request bodies is enabled using the WASD_CONFIG_GLOBAL directive [GzipAccept]. Enabling this using a value 15 (or 1) results in the server advertising its acceptance of GZIPed requests using the "Accept-Encoding: gzip, deflate" response header. Requests containing bodies GZIP compressed will have these decoded as they are read from the client and before further processing, such as the upload of files into server accessible file-system space. This decoding is optional and not the default with DCL and DECnet script processing. That is, a request body will be passed to the script still encoded unless specific mapping directs otherwise. Decoding by the server into the original data prior to transfering to the script can be enabled for all or selected scripts using the following path settings:
Note that scripts need to be specially aware of both GZIP encoded bodies and those already decoded by the server. In the first case the stream must be read to the specified content-length and then decoded. In the second case, a content-length cannot be provided by the server (without unencoding the entire stream ahead of time it cannot predict the final size). Where the server is to decode the request body before transfering it to the script it changes the CGI variable CONTENT_LENGTH to a single question-mark ("?"). Scripts may use this to detect the server's intention and then must ignore any transfer-encoding and/or content-encoding header information and read the request body until end-of-file is received.
GZIP decoding (decompression) is understandably much less memory and CPU intensive. Experimentation indicates it does not contribute significantly to latency either.
Request "throttling" is a term adopted to describe controlling the number of requests that can be processing against any specified path at any one time. Requests in excess of this value are First-In-First-Out (FIFO) queued, up to an optional limit, waiting for a currently processing request to conclude allowing the next queued request to resume processing. This is primarily intended to limit concurrent resource-intensive script execution but could be applied to any resource path. Here's one dictionary description.
throttle n 1: a valve that regulates the supply of fuel to the engine [syn: accelerator, throttle valve] 2: a pedal that controls the throttle valve; "he stepped on the gas" [syn: accelerator, accelerator pedal, gas pedal, gas, gun] v 1: place limits on; "restrict the use of this parking lot" [syn: restrict, restrain, trammel, limit, bound, confine] 2: squeeze the throat of; "he tried to strangle his opponent" [syn: strangle, strangulate] 3: reduce the air supply; of carburetors [syn: choke]
This is applied to a path (or paths) using the WASD_CONFIG_MAP mapping SET THROTTLE= rule (10.5.5 SET Rule). The general format is
One way to read a throttle rule is "begin to throttle (queue) requests from the n1 value up to the n2 value, after which the queue is FIFOed up to the n3 value when it resumes queuing-only, up until the busy n4 value".
Each integer represents the number of concurrent requests against the throttle rule path. Parameters not required may be specified as zero or omitted in a comma-separated list. The schema of the rule requires that each successive parameter be larger than that preceding it. This basic consistency check is performed when the rule is loaded.
For any rule the possible maximum number of requests that can be processed at any one time may be simply calculated through the addition of the n1 value to the difference of the n3 and n2 values (i.e. max = n1 + (n3 - n2)). The maximum concurrently queued as the difference of the n4 and the maximum concurrently processed.
A comprehensive throttle statistics report is available from the Server Administration facility.
If the concurrent processing value (n1) has a second, slash-delimited integer, this serves to limit the number of authenticated user-associated requests that can be concurrently processing.
When a request is available for processing the associated remote user name is checked for activity against the queue. The u1 (or per-user throttle value) is a limit on that user name's concurrent processing. If it would exceed the specified value the request is queued until the number of requests processing drops below the u1 value. All other values in the throttle rule are applied as for non-per-user throttling.
If an unauthenticated request is matched against the throttle rule (i.e. there is no authorization rule matching the request path) the client has a 500 (server error) response returned. Obviously per-user throttling must have a remote user name to throttle against and this is a configuration issue.
Requests up to 10 are concurrently processed. When 10 is reached futher requests are queued to server capacity.
Concurrent requests to 10 are processed immediately. From 11 to 20 requests are queued. After 20 all requests are queued but also result in a request FIFOing off the queue to be processed (queue length is static, number being processed increases to server capacity).
Concurrent requests up to 15 are immediately processed. Requests 16 through to 30 are queued, while 31 to 40 requests result in the new requests being queued and waiting requests being FIFOed into processing. Concurrent requests from 41 onwards are again queued, in this scenario to server capacity.
Concurrent requests up to 10 are immediately processed. Requests 11 through to 20 will be queued. Concurrent requests from 21 to 30 are queued too, but at the same time waiting requests are FIFOed from the queue (resulting in 10 (n1) + 10 (n3-n2) = 20 being processed). From 31 onwards requests are just queued. Up to 40 concurrent requests may be against the path before all new requests are immediately returned with a 503 "busy" status. With this scenario no more than 20 can be concurrently processed with 20 concurrently queued.
Concurrent requests up to 10 are processed. When 10 is reached requests are queued up to request 30. When request 31 arrives it is immediately given a 503 "busy" status.
This is basically the same as scenario 4) but with a resume-on-timeout of two minutes. If there are currently 15 (or 22 or 28) requests (n1 exceeded, n3 still within limit) the queued requests will begin processing on timeout. Should there be 32 processing (n3 has reached limit) the request will continue to sit in the queue. The timeout would not be reset.
This is basically the same as scenario 3) but with a busy-on-timeout of three minutes. When the timeout expires the request is immediately dequeued with a 503 "busy" status.
Concurrent requests up to 10 are processed. The requests must be of authenticated users. Each authenticated user is allowed to execute at most one concurrent request against this path. When 10 is reached, or if less than 10 users are currently executing requests, then further requests are queued to server capacity.
This is basically the same as scenario 8) but with a busy-on-timeout of three minutes. When the timeout expires any requests still queued against the user name is immediately dequeued with a 503 "busy" status.
Throttling is applied using mapping rules. The set of these rules may be changed within an executing server using map reload functionality. This means the number of, and/or contents of, throttle rules may change during server execution. The throttle functionality needs to be independent of the the mapping functionality (requests are processed independently of mapping rules once the rules have been applied). After a mapping reload the contents of the throttle data structures may be at variance with the constraints currently executing requests began processing under.
This should have little deleterious effect. The worst case is mis-applied constraints on the execution limits of changed request paths, and slightly confusing data in the Throttle Report. This quickly passes as requests being processed under the previous throttle constraints conclude and an entirely new collection of requests created using the constraints of the currently loaded rules are processed.
The "client_connect_gt:" mapping conditional (5. Conditional Configuration) attempts to allow some measurement of the number of requests a particular client currently has being processed. Using this decision criterion appropriate request mapping for controlling the additional requests can be undertaken. It is not intended to provide fine-grained control over activities, rather just to prevent a single client using an unreasonable proportion of the resources.
For example. If the number of requests from one particulat client looks like it has got out of control (at the client end) then it becomes possible to queue (throttle) or reject further requests. In WASD_CONFIG_MAP
While not completely foolproof it does offer some measure of control over gross client concurrency abuse or error.
HTTP uses an implementation of the MIME (Multi-purpose Internet Mail Extensions) specification for identifying the type of data returned in a response. A MIME content-type consists of a plain text string describing the data as a type and slash-separated subtype, as illustrated in the following examples:
In common with most HTTP servers WASD uses a file's suffix (extension, type, e.g. .HTML, .TXT, .GIF) to identify the data type within the file. The [AddType] directive is used during configuration to bind a file type to a MIME content-type. To make the server recognise and return specific content-types these directives map file types to content-types.
With the VMS file system there is no effective file characteristic or algorithm for identifying a file's content without an exhaustive examination of the data contained there-in … a very expensive process (and probably still inconclusive in many cases), hence the reliance on the file type.
Mappings using [AddType] look like these.
To allow the server to share content-type definitions with other MIME-aware applications, and for WASD scripts to be able to perform their own mapping on a shared understanding of MIME content it is possible to move the file suffix to content-type mapping from a collection of [AddType]s in WASD_CONFIG_GLOBAL to an external file. This file is usually named MIME.TYPES and is specified in WASD_CONFIG_GLOBAL using the [AddMimeTypesFile] directive.
Mappings using MIME.TYPES look like these.
A leading content-type is mapped to single or multiple file suffixes. A general MIME.TYPES file commonly has content-types listed with no corresponding file suffix. These are ignored by WASD. Where a file suffix is repeated during configuration the latter version completely supercedes the former (with the Server Administration page showing an italicised and struck-through content-type to help identify duplicates).
To allow the configuration information used by the server to generate directory listings with additional detail, WASD-specific extensions to the standard MIME.TYPES format are provided. These are "hidden" in comment structures so as not to interfere with non-WASD application use. All begin with a hash then an exclamation character ("#!") then another reserved character indicating the purpose of the extension. Existing comments are unaffected provided the second character is anything but an exclamation mark!
These directives are placed following the MIME-type entry they apply to. An example of the contents of a MIME.TYPES file with various WASD extensions.
Other reserved characters have been specified for development purposes but are not (perhaps currently) employed by the HTTP server.
If a file type is not recognised (i.e. no [AddType] or [AddMimeTypesFile] mapping corresponding to the file type) then by default WASD identifies its data as application/octet-stream (i.e. essentially binary data). Most browsers respond to this content-type with a download dialog, allowing the data to be saved as a file. Most commonly these unknown types manifest themselves when authors use "interesting" file names to indicate their purpose. Here are some examples the author has encountered:
If the site administrator would prefer another default content-type, perhaps "text/plain" so that any unidentified files default to plain text, then this may be configured by specifying that content-type as the description of the catch-all file type entry. Examples (use one of):
When accessing files it is possible to explicitly specify the identifying content-type to be returned to the browser in the HTTP response header. Of course this does not change the actual content of the file, just the header content-type! This is primarily provided to allow access to plain-text documents that have obscure, non-"standard" or non-configured file extensions.
It could also be used for other purposes, "forcing" the browser to accept a particular file as a particular content-type. This can be useful if the extension is not configured (as mentioned above) or in the case where the file contains data of a known content-type but with an extension conflicting with an already configured extension specifying data of a different content-type.
Enter the file path into the browser's URL specification field ("Location:", "Address:"). Then, for plain-text, append the following query string:
For another content-type substitute it appropriately. For example, to retrieve a text file in binary (why I can't imagine :-) use
This is an example:
It is posssible to "force" the content-type for all files in a particular directory. Enter the path to the directory and then add
(or what-ever type is desired). Links to files in the listing will contain the appropriate "?httpd=content&type=..." appended as a query string.
This is an example:
Language-specific variants of a document may be configured to be served automatically and transparently. This is organized as a basic file and name with language-specific variant indicated by an additional "tag", one of ISO language abbreviations used by the "Accept-Language:" request header field, e.g. en for English, fr for French, de for German, ru for Russian, etc.
Two variants of the basic file specification are possible; file name (the default) and file type. Hence if the basic file name is EXAMPLE.HTML then specifically German, English, French and Russian language versions in the directory would be either
A path must be explicitly SET using the accept=lang mapping rule as containing language variants. As searching for variants is a relatively expensive operation the rule(s) applying this functionality should be carefully crafted. The accept=lang rule accepts an optional default language representing the contents of the basic, untagged files. This provides an opportunity to more efficiently handle requests with a language first preference matching that of the default. In this case no variant search is undertaken, the basic file is simply served. The following example sets a path to contain files with a default language of French and possibly containing other language variants.
In this case the behaviour would be as follows. With the default language set to "fr" a request's "Accept-Language:" field is initially processed to check if the first preference is for "fr". If it is then there is no need for further accept language processing and the basic file is returned as the response. If not then the directory is searched for other files matching the EXAMPLE_*.HTML specification. All files matching this wildcard have the "*" portion (e.g. "EN", "FR", "DE", "RU") added to a list of variants. When the search is complete this list is compared to the request's "Accept-Language:" list. The first one to be matched has the contents of the corresponding file returned. If none are matched the default version would be returned.
This example of the behaviour is based on the contents of the directory described above. A request that specifies
will have EXAMPLE.HTML returned (without having searched for any other variants). For a request specifying
then the EXAMPLE_RU.HTML file is returned, and if no "Accept-Language:" is supplied with the request EXAMPLE.HTML would be returned. One or other file is always returned, with the default, non-language file always the fallback source of data. If it does not exist and no other language variant is selected the request returns a 404 file-not-found error.
When using the accept=lang=(variant=type) form of the rule (i.e. the variant is placed on the file type rather than the default file name) each possible file extension must also must have its content-type made known to the server. Using the example above the variants would need to be configured in a similar way to the following.
Normally only files with a content-type of "text/.." are subject to variant searching. If the rule path includes a file type then those files matching the rule are also variant-searched. In this way images, audio files, etc., may also have language-specific versions supplied transparently. The following illustrates this usage
The default character set sent in the response header for text documents (plain and HTML) is set using the [CharsetDefault] directive and/or the SET charset mapping rule. English language sites should specify ISO-8859-1, other Latin alphabet sites, ISO-8859-2, 3, etc. Cyrillic sites might wish to specify ISO-8859-5 or KOI8-R, and so on.
Document and CGI script output may be dynamically converted from one character set to another using the standard VMS NCS conversion library. The [CharsetConvert] directive provides the server with character set aliases (those that are for all requirements the same) and which NCS conversion function may be used to convert one character set into another.
When this directive is configured the server compares each text response's character set (if any) to each of the directive's document charset string. If it matches it then compares each of the accepted charset (if multiple) to the request "Accept-Charset:" list of accepted characters sets.
At least one doc-charset and one accept-charset must be present. If only these two are present (i.e. no NCS-conversion-function) it indicates that the two character sets are aliases (i.e. the same set of characters, different name) and no conversion is necessary.
If an NCS-conversion-function is supplied it indicates that the document doc-charset can be converted to the request "Accept-Charset:" preference of the accept-charset using the NCS conversion function name specified.
A factor parameter can be appended to the conversion function. Some conversion functions require more than one output byte to represent one input byte for some characters. The 'factor' is an integer between 1 and 4 indicating how much more buffer space may be required for the converted string. It works by allocating that many times more output buffer space than is occupied by the input buffer. If not specified it defaults to 1, or an output buffer the same size as the input buffer.
Multiple comma-separated accept-charsets may be included as the second component for either of the above behaviours, with each being matched individually. Wildcard * (asterisk) and % (percentage) may be used in the doc-charset and accept-charset strings.
By default the server provides its own internal error reporting facility. These reports may be configured as basic or detailed on a per-path basis, as well as determining the basic "look-and-feel". For more demanding requirements the [ErrorReportPath] configuration directive allows a redirection path to be specified for error reporting, permitting the site administrator to tailor both the nature and format of the information provided. A Server Side Include document, CGI script or even standard HTML file(s) may be specified. Generally an SSI document would be recommended for the simplicity yet versatility.
Internally generated error reports are the most efficient. These can be delivered with two levels of error information. The default is more detailed.
ERROR 404 - The requested resource could not be found.
Document not found ... /wasd_root/index.html
(document, bookmark, or reference requires revision)
Additional information: 1xx, 2xx, 3xx, 4xx, 5xx, help
WASD/10.0.0 server at www.example.com port 80
There is also the more basic.
ERROR 404 - The requested resource could not be found.
Additional information: 1xx, 2xx, 3xx, 4xx, 5xx, help
WASD/10.0.0 server at www.example.com port 80
These can be set per-server using the [ReportBasicOnly] configuration directive, or on a per-path basis in the WASD_CONFIG_MAP configuration file. The basic report is intended for environments where traditionally a minimum of information might be provided to the user community, both to reduce site configuration information leakage but also where a general user population may only need or want the information that a document was either found or not found. The detailed report often provides far more specific information as to the nature of the event and so may be more appropriate to a more technical group of users. Either way it is relatively simple to provide one as the default and the other for specific audiences. Note that the detailed report also includes in page <META> information the code module and line references for reported errors.
To default to a basic report for all but selected resource paths introduce the following to the top of the WASD_CONFIG_MAP configuration file.
To provide the converse, default to a detailed report for all but selected paths use the following.
The additional reference information included in the report may be disabled using the appropriate WASD_CONFIG_MSG [status] message item. Emptying this message results in an error report similar to the following.
ERROR 404 - The requested resource could not be found.
WASD/10.0.0 server at www.example.com port 80
The server signature may be disabled using the WASD_CONFIG_GLOBAL [ServerSignature] configuration directive. This results in a minimal error report.
ERROR 404 - The requested resource could not be found.
A simple approach to providing a site-specific "look-and-feel" to server reports is to customize the [ServerReportBodyTag] WASD_CONFIG_GLOBAL configuration directive. Using this directive report page background colour, background image, text and link colours, etc., may be specified for all reports. It is also possible to more significantly change the report format and contents (within some constraints), without resorting to the site-specific mechansims refered to below, by changing the contents of the appropriate WASD_CONFIG_MSG [status] item. This should be undertaken with care.
Customized error reports can be generated for all or selected HTTP status status associated with errors reported by the server using the WASD_CONFIG_GLOBAL [ErrorReportPath] and WASD_CONFIG_SERVER [ServiceErrorReportPath] configuration directives. To explicitly handle all error reports specify the path to the error reporting mechanism (see description below) as in the following example.
To handle only selected error reports add the HTTP status codes following the report path. In this example only 403 and 404 errors are explicitly handled, the rest remain server-generated. This is particularly useful for static error documents.
To exclude selected error reports (and handle all others by default) add the HTTP status codes preceded by a hyphen following the report path. In this example 401 and 500 errors are server-generated.
Site-specific error reporting works by internal redirection. When an error is reported the original request is concluded and the request reconstructed using the error report path before internally being reprocessed. For SSI and CGI script handlers error information becomes available via a specially-built query string, and from that as CGI variables in the error report context. One implication is the original request path and query string are no longer available. All error information must be obtained from the error information in the new query string.
It is suggested with any use of this facility the reporting document(s) be located somewhere local, probably WASD_ROOT:[RUNTIME.HTTPD], and then enabled by placing the appropriate path into the [ErrorReportPath] configuration directive.
Note that virtual services can subsequently have this path mapped to other documents (or even scripts) so that some or all services may have custom error reports. For instance the following arrangement provides each host (service) with an customized error report.
Static HTML documents are a good choice for site-specific error messages. They are very low overhead and are easily customizable. One per possible response error status code is required. When providing an error report path including a "!UL" introduces the response status code into the file path, providing a report path that includes a three digit number representing the HTTP status code. A file for each possible or configured code must then be provided, in this example for 403 (authorization failure), 404 (resource not found) and 502 (bad gateway/script).
This mapping will generate paths such as the following, and require the three specified to respond to those errors.
SSI documents provide the versatility of dynamic report generation for but they do take time and CPU for processing, and this may be a significant consideration on busy sites.
Three example SSI error report documents are provided.
The following SSI variables are available specifically for generating error reports. The <!--#printenv --> statement near the top of the file may be uncommented to view all SSI and CGI variables available.
Variable | Description |
---|---|
ERROR_LINE | The HTTPd source code line from where the error was generated. |
ERROR_MODULE | The HTTPd source code module corresponding to the line described above. |
ERROR_REPORT | A single HTML string providing a detailed error message. |
ERROR_REPORT2 | A single HTML comment providing more detailed VMS error information if available |
ERROR_REPORT3 | A server-generated HTML string providing a brief explanation of the error if available |
ERROR_STATUS_CLASS | Essentially the single hundreds digit from the status code (e.g. 4). |
ERROR_STATUS_CODE | The HTTP response status code representing the error (e.g. 404). |
ERROR_STATUS_EXPLANATION | The HTTP response status code descriptive meaning (e.g. "The requested resource could not be found.") |
ERROR_STATUS_TEXT | The HTTP response status code abbreviated meaning (e.g. "Not Found"). |
ERROR_STATUS_TYPE | "basic" or "detailed". |
ERROR_STATUS_URI | The HTML-escaped URI of the request reporting the error. |
FORM_ERROR_… | A series of CGI variables providing the sources for the above SSI variables, as well as other general environment information. |
It is also possible to report using a script. The same error information is available via corresponding CGI variables. The source code WASD_ROOT:[SRC.MISC]REPORTERROR.C provides such an implementation example.
Significant server events may be optionally displayed via a selected operator's console and recorded in the operator log. Various categories of these events may be selectively enabled via WASD_CONFIG_GLOBAL directives (6. Global Configuration).
Some significant server events are always logged to OPCOM if any one of the above categories is enabled.
WASD provides a versatile access log, allowing data to be collected in Web-standard common and combined formats, as well as allowing customization of the log record format. It is also possible to specify a log period. If this is done log files are automatically changed according to the period specified.
Where multiple access log files are generated with per-instance, per-period and/or per-service logging (see below) these can be merged into single files for administrative or archival purposes using the CALOGS utility.
The Quick-and-Dirty LOG STATisticS utility can be used to provide elementary ad hoc log analysis from the command-line or CGI interface.
Exclude requests from specified hosts using the [LogExcludeHosts] configuration parameter, or using the "SET NOLOG" mapping directive.
The configuration parameter [LogFormat] and the server qualifier /FORMAT specifies one of three pre-defined formats, or a user-definable format. Most log analysis tools can process the three pre-defined formats. There is a small performance impost when using the user-defined format, as the log entry must be specially formatted for each request.
The user-defined format allows customised log formats to be specified using a selection of commonly required data. The specification must begin with a character that is used as a substitute when a particular field is empty (use "0" for no substitute, as in the "windows log format" example below).
Two different "escape" characters introduce the following parameters:
Characters | Description |
---|---|
AR | authentication realm (if any) |
AU | authenticated user name (if any) |
BB | bytes in body (excludes response header) |
BQ | quadword bytes in response (includes header) |
BY | bytes in response (includes header) |
CA | client address |
CC | X509 client certificate authorization distinguishing name |
CI | SSL session cipher (e.g. "AES128-SHA", "AES256-SHA256") |
CL | value provided by "Content-Length:" header (cf. "PL") |
CN | client host name (or address if DNS lookup disabled) |
CP | client port |
DI | specified dictionary value |
ID | session track ID - obsolete |
EM | request elapsed time in milliseconds |
ES | request elapsed time in fractional seconds |
ME | request method |
NP | specified notepad value |
PA | request path (not to be confused with "RQ") |
PL | actual body (payload) length received with POST or PUT (cf. "CL") |
PR | request URL (includes protocol scheme) |
QS | request query string (if any) |
RF | referer (if any) |
RQ | complete request string (see below) |
RP | request protocol |
RS | response status code |
SN | server host name |
SC | script name (if any) |
SM | request scheme (http: or https:) |
SP | server port |
SR | SSL session reused |
SV | SSL protocol (e.g. "SSLv3", "TLSv1") |
TC | request time (common log format) |
TI | request time (local in ISO 8601 extended format) |
TS | request time (UTC in ISO 8601 basic format) sortable |
TU | request time (UTC) |
TV | request time (VMS format) |
UA | user agent |
VS | virtual service (service host:port) |
XX | custom, usually site/client-specific, logging item see module [SRC.HTTPD]LOGGING.C functions LoggingCustom..() |
Character | Description |
---|---|
0 | a null character (used to define the empty field character) |
! | insert an "!" |
^ | insert a "^" |
n | insert a newline |
q | insert a quote (so that in DCL the quotes won't need escaping!) |
t | insert a TAB |
Any other character is directly inserted into the log entry.
It is possible to use one of the pre-defined log format keywords with additional user-defined directive appended. The appended directives must include ALL additional literal characters and directives required in the log entry. The syntax is <pre-defined keyword>+<appended format> as in "COMMON+ !EM".
The access log file may have a period specified against it, producing an automatic generation of log file based on that period. This allows logs to be systematically named, ordered and kept to a managable size. This is also known as log rotation. The period specified can be one of
The log file changes on the first request after the entering of the new period.
When using a periodic log file, the file name specified by WASD_CONFIG_LOG or the configuration parameter [LogFile] is partially ignored, only partially because the directory component of it is used to located the generated file name. The periodic log file name generated comprises
as in the following example
For the daily period the date represents the request date. For the weekly period it is the date of the previous (or current) day specified. That is, if the request occurs on the Wednesday for a weekly period specified by Monday the log date show the last Monday's. For the monthly period it uses the first.
By default a single access log file is created for each HTTP server process. Using the [LogPerService] configuration directive a log file for each service provided by the HTTPd is generated (2.4 Virtual Services). The [LogNaming] format can be any of "NAME" (default) which names the log file using the first period-delimited component of the IP host name, "HOST" (which uses as much of the IP host name as can be accomodated within the maximum 39 character filename limitation under ODS-2), or "ADDRESS" which uses the full IP host address in the name. Both HOST and ADDRESS have hyphens substituted for periods in the string. If these are specified then by default the service port follows the host name component. This may be suppressed using the [LogPerServiceHostOnly] directive, allowing a minimum extra 3 characters in the name, and combining entries for all ports associated with the host name (for example, a standard HTTP service on port 80 and an SSL service on port 443 would have entries in the one file).
To reduce physical disk activity, and thereby significantly improve performance, the RMS characteristics of the logging stream are set to buffer records for as long as possible and only write to disk when buffer space is exhausted (a periodic flush ensures records from times of low activity are written to disk). However when multiple server processes (either in the case of multiple instances on a single node, single instance on each of multiple clustered nodes, or a combination of the two) have the same log files open for write then this buffering and defered write-to-disk is disabled by RMS, it insisting that all records must be flushed to disk for correct serialization and coherency.
This introduces measurable latency and a potentially significant bottleneck to high-demand processing. Note that it only becomes a real issue under load. Sites with a low load should not experience any impact.
Sites that may be affected by this issue can revert to the original buffered log stream by enabling the [LogPerInstance] configuration directive. This ensures that each log stream has only one writer by creating a unique log file for each instance process executing on the node and/or cluster. It does this by appending the node and process name to the file type. This would change the log name from something like
Of course the number-of and naming-of log files is beginning to become a little itimidating at this stage! To assist with managing this seeming plethora of access log files is the calogs utility, which allows multiple log files to be merged whilst keeping the records in timestamp order.
When per-period or per-service logging is enabled the access log file has a specific name generated. Part of this name is the host's name or IP address. By default the host name is used, however if the host IP address is specified the literal address is used, hyphens being substituted for the periods. Accepted values for the [LogNaming] configuration directive are:
Examples of generated per-service (non-per-period) log names:
Examples of generated per-period (with/without per-service) log names:
Examples of generated per-instance (per-service and per-period) log names:
Connections can be accepted and rejected with considerable granularity. If rejected the connection is immediately closed without further interaction. This can be useful for excluding problematic hosts, domains, specific IP addresses, and ranges of IP address.
If [DNSLookup] directive enabled and an IP address is resolved to a host name then wildcard pattern matching against the name can be used. If not enabled or the IP address does not resolve then IP addresses or IP address ranges must be used.
The global configuration directive [Accept] can permit a host or IP to access the server unconditionally, even if otherwise forbidden by a [Reject] host name or IP address.
As noted in [Accept] and [Reject] directives can contain logical or physical file names providing a source of directives indepedent of the global configuration (which requires a server restart to "reload"). For consistency it is suggested to use the logical names WASD_CONFIG_ACCEPT and WASD_CONFIG_REJECT to locate these files and associated directives. The contents of these files can be reloaded into the running server using
simple IP address, range of addresses, CIDR masks
WASD provides mapping rule to specific HTTP status allowing complex evaluations of requests and query strings. So the complex mapping rule
Using the [Reject] rule
Global directives must be located before any list of host name or IP address matching patterns.
Directive | Action | |
---|---|---|
$DNS | reject connection if host name lookup failed ** | |
$LOG | add entry to access log, e.g. **
152.208.29.248 - - [25/Aug/2023:10:37:03 -0400] "CONNECT /REJECT~152.208.29.248 HTTP/1.0" 418 0 "-" "-" | |
$NOTE | report to server process log, e.g.
%HTTPD-I-REJECT, 25-AUG-2023 10:39:34, 18, 152.208.29.248 the.site.name:443 | |
$OFF | disable processing | |
$OPCOM | report to OPCOM (essentially as per $NOTE) | |
$4nn or $5nn | map specified HTTP status to IP address rejection **
HTTP-status maximum-IP-addresses timeout-minutes (see example above) | |
$400 | include requests with signficantly broken headers ** | |
$403 | include authorisation failures exceeding limit ** |
** applies only to reject not to accept
Access tracking has been obsoleted with WASD v11.0.
It is possible to mark a path as being of specific interest. When this is accessed by a request the server puts a message into the the server process log and perhaps of greater immediate utility the increase in alert hits is detected by HTTPDMON and this (optionally) provides an audible alert allowing immediate attention. This is enabled on a per-path basis using the SET mapping rule. Variations on the basic rule allow some control over when the alert is generated.
The special case ALERT=integer allows a path to be alerted if the final response HTTP status is the same as the integer specified (e.g. 501, 404) or within the category specified (599, 499).
↩︎ | ↖︎ | ↑︎ | ↘︎ | ↪︎ |