|1Document Searching|
|^ The |/query| and |/extract| scripts provide real-time searching of
plain-text and HTML documents, and document retrieval. The search is a
simple-string search, not a GREP-style search. It is designed to provide a
useful mechanism for locating documents containing a keyword, not for document
analysis. It has the useful feature for plain-text documents of allowing the
selective extraction of only the portion near the |/hit||.
|^ Only files with a plain-text or HTML MIME data type (see |link|Document
Access and Specification||) will be searched. Others may be specified, or be
selected from wildcard file specification, but they will not actually have
their contents searched.
|^ Directory specifications may include a wildcard ellipsis (allowing a
directory tree to be traversed) and/or file name wildcards. In other words,
anything acceptable as VMS file system syntax may be used (though expressed in
URL-format, of course).
See examples in |link|Standard Search Form||.
|2Plain-Text Search|
|^ A search of a plain-text file is straightforward. Each line in the file
is searched for the required string. The first occurrence on a line is
considered a |/hit||. The line is not searched for any further
occurrences.
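|^ The line-by-line behaviour described above can be sketched as follows. This
is a minimal illustration in Python, not the query script's actual code; the
function name |/find_hits| and the sample lines are assumptions made for the
example. It defaults to case-insensitive matching, as described under Search
Options.

```python
def find_hits(lines, keyword, case_sensitive=False):
    """Return (line_number, line) pairs for lines containing keyword.

    Each line counts as at most one hit: only the presence of the string
    is tested, so repeated occurrences on a line are not reported again.
    """
    needle = keyword if case_sensitive else keyword.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:  # plain substring test, no pattern matching
            hits.append((number, line))
    return hits


# Hypothetical sample text, for illustration only.
sample = [
    "HTTP is a protocol; the protocol is simple.",
    "Nothing of interest here.",
    "Protocol details follow.",
]
print(find_hits(sample, "protocol"))
```

|^ The first sample line contains the keyword twice but contributes only one
hit, because a line is either a hit or it is not.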
|^ Searches of plain text files allow the subsequent selection of partial
documents (i.e. the retrieval of only a number of lines around any actual
hit). This allows the user to selectively extract a portion of a document,
avoiding the need to explicitly scan through to the section of interest.
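|^ Partial document retrieval amounts to selecting a window of lines around a
hit. The sketch below shows one way this could be done; the function name
|/extract_around| and the |/context| parameter are hypothetical and do not
correspond to parameters of the extract script.

```python
def extract_around(lines, hit_line, context=2):
    """Return the lines within `context` lines of a 1-based hit line number.

    The window is clipped at the start and end of the document.
    """
    start = max(hit_line - 1 - context, 0)
    end = min(hit_line + context, len(lines))
    return lines[start:end]


# Hypothetical ten-line document.
document = ["line %d" % n for n in range(1, 11)]
print(extract_around(document, 5))
```

|^ A hit near the beginning or end of a file simply yields a shorter extract.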
|2HTML Search|
|^ A search of an HTML file is a little more complex. As might be expected,
only text presented in the document text is searched; markup text is ignored.
That is, all text not part of an HTML |/tag| construct is extracted and
searched. For example, out of the following HTML fragment
|code|
The document entitled "Example Document"
provides only an overview of the full capabilities of HTML.
|!code|
only the following text would actually be searched
|code|
The document entitled "Example Document" provides only an overview
of the full capabilities of HTML.
|!code|
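|^ The idea of searching only the text outside of tags can be illustrated
with Python's standard |/html.parser| module. This is a sketch of the general
technique and not the script's own implementation; the class name, function
name and the markup in the sample fragment are assumptions made for the
example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data, discarding tags and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called only for text between tags, never for the tags themselves.
        self.parts.append(data)


def searchable_text(html):
    """Return the text of an HTML fragment with markup removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse runs of white-space so source line breaks do not matter.
    return " ".join("".join(parser.parts).split())


# Hypothetical markup, for illustration only.
fragment = ('<P>The <I>document</I> entitled <B>"Example Document"</B>\n'
            'provides only an overview.')
print(searchable_text(fragment))
```

|^ This naive version also keeps the content of elements such as scripts and
style sheets; a real implementation would need to decide how to treat those.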
|^ The mechanism for partial document retrieval available with plain-text
files is |*not| present with HTML documents. HTML files generally must be
treated as a whole, with the formatting of current sections often very
dependent on the formatting of previous sections. This makes extracting a
subsection perilous without extensive syntactical analysis. On the positive
side, HTML documents tend to be already divided into meaningful subdocuments
(files), making retrieval of a hit naturally more-or-less within context.
|^ Instead of partial document retrieval, the document is processed to place
anchors for each hit, making it possible to jump directly to a particular
section of interest. Generally this works well but may occasionally distort
the presentation of a document.
|2Search Syntax|
|^ A search may be initiated in three basic ways:
|number|
|item| Appending a question-mark and search string to a file specification (the
simple syntax of "ISINDEX"-style searching). This is standard HTTP, and of
course must conform to HTTP syntax.
|item| Providing the name of the query script followed by the directory path to
be searched. The script then returns a standard search form.
|item| |/Forms||-based search, which allows the format and mechanism of
the search to be controlled.
|!number|
|note|
|0ISINDEX tag obsolete (as of HTML4)|
|^ Placing the HTML tag "ISINDEX" within a document's text is sufficient to
inform the browser that searching is available for that document. The browser
will inform the user of this and allow a search of that document to be
initiated at any time. Note that it is limited to the one document.
|^ Using the keyword search syntax explicitly is another method of initiating
a search, and additionally can use a wildcard in the document specification.
For example:
|code|
/wasd_root/doc/env/*.*?formatted
|!code|
|^ The following link provides an online demonstration search using the above
syntax. Note the difference in the way plain-text file hits are presented
compared with those of HTML files.
|^+ |link%=|/wasd_root/wasdoc/env/*.*?formatted|
|!note|
|3Standard Search Form|
|^ Using the "QUERY" script name followed by a URL-format path
specifying the directory to be searched returns a standard, script-generated
search form.
|^ The following link provides an online demonstration of the standard search
form.
|^+ |link%=|/cgi-bin/query/wasd_root/wasdoc/env/|
|^ As with all search specifications, the directory specification may include
a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name
wildcards. In other words, anything acceptable as VMS file system syntax may be
used (though expressed in URL-format, of course). See the following examples.
|table|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/env/*.html|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../*.html|
|!table|
|3Forms-Based Search|
|^ A "forms-based" search is initiated by the server receiving a file
specification, which of course may contain wildcards, followed by a |/search|
parameter. This is a typical HTML |/forms| format URL. For example:
|code|
*.txt?search=SIMPLE
/web/.../*.*?search=THIS
sub_directory/*.*?search=THAT
../sibling_directory/*.HTML?search=OTHER
|!code|
|^ The following link provides an online demonstration search using the
forms-based syntax.
|^+ |link%=|/wasd_root/wasdoc/env/*.*?search=formatted|
|3Search Options|
|^ Additional URI components may be appended after the initial "search="
parameter. These are appended with intervening ampersand ("&") characters.
|bullet|
|item| |*Case-Sensitivity |-|| An optional URI component of
"case=yes" or "case=no" makes the search case-sensitive or
case-insensitive (the default). The following example illustrates the use of
this syntax:
|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=Protocol&case=yes|
|. case-sensitive search for "Protocol"
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=PrOtOcOl&case=no|
|. case-|*in||sensitive search for "PrOtOcOl"
|!table|
|item| |*Hits |-|| An optional URI component of "hits=document" or "hits=line"
makes the search results be presented by document (file) or line by line
(the default). The following example illustrates the use of this syntax:
/web/html/.../*.html?search=protocol&hits=document
/web/html/.../*.html?search=protocol&hits=line
|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=document|
|. search result granularity by document
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=line|
|. search result granularity by line (the default)
|!table|
|!bullet|
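|^ A client wishing to construct such URLs programmatically might do so along
the following lines. This is a sketch only, using Python's standard
|/urllib.parse| module; the function name |/build_search_url| and its
arguments are hypothetical, while the "search", "case" and "hits" parameter
names and values are those described above.

```python
from urllib.parse import urlencode


def build_search_url(file_spec, keyword, case=None, hits=None):
    """Build a forms-based search URL from a file specification.

    case: True or False to add "case=yes" or "case=no"; None omits it.
    hits: "document" or "line"; None omits it.
    """
    params = [("search", keyword)]  # the "search=" parameter comes first
    if case is not None:
        params.append(("case", "yes" if case else "no"))
    if hits is not None:
        if hits not in ("document", "line"):
            raise ValueError("hits must be 'document' or 'line'")
        params.append(("hits", hits))
    # urlencode joins the parameters with "&" and escapes the values.
    return file_spec + "?" + urlencode(params)


print(build_search_url("/wasd_root/wasdoc/env/*.html", "Protocol", case=True))
print(build_search_url("/wasd_root/wasdoc/env/*.html", "protocol", hits="document"))
```

|^ Omitting both options leaves the server defaults in effect, that is, a
case-insensitive search with hits presented line by line.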
|3Example Search Form|
|^ To allow the client to enter a search string and submit a search to the
server an HTML level 2 |/form| construct can be used. Here is an example:
|code|
|!code|
|^ The following provides an online demonstration of the form used above:
|asis+|
||||
|0Bells and Whistles|
|^ A form providing all the options referred to in |link|Search Options|| is
shown below (some additional white-space introduced for clarity):
|code|
|!code|
|^ The following provides an online demonstration of the form used above:
|asis+|
||||