Configuration Logging

27 06 2009

Apache is able to report to a client a great deal of what is happening to it internally. The necessary module is contained in the mod_info.c file, which should be included at build time. It provides a comprehensive overview of the server configuration, including all installed modules and directives in the configuration files. This module is not compiled into the server by default. To enable it, either load the corresponding module if you are running Win32 or Unix with DSO support enabled, or add the following line to the server build Config file and rebuild the server:

AddModule modules/standard/mod_info.o

It should also be noted that if mod_info is compiled into the server, its handler capability is available in all configuration files, including per-directory files (e.g., .htaccess). This may have security-related ramifications for your site. To demonstrate how this facility can be applied to any site, the Config file on …/site.info is the …/site.authent file slightly modified:

User webuser
Group webgroup
ServerName www.butterthlies.com

NameVirtualHost 192.168.123.2

LogLevel debug

<VirtualHost www.butterthlies.com>
#CookieLog logs/cookies
AddModuleInfo mod_setenvif.c "This is what I've added to mod_setenvif"
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.info/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.info/logs/error_log
TransferLog /usr/www/APACHE3/site.info/logs/customers/access_log
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin

<Location /server-info>
SetHandler server-info
</Location>

</VirtualHost>

<VirtualHost sales.butterthlies.com>
CookieLog logs/cookies
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.info/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.info/logs/error_log
TransferLog /usr/www/APACHE3/site.info/logs/salesmen/access_log
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
<Directory /usr/www/APACHE3/site.info/htdocs/salesmen>
AuthType Basic
#AuthType Digest
AuthName darkness

AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups

#AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
#AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups

#AuthDigestFile /usr/www/APACHE3/ok_digest/sales
require valid-user
satisfy any
order deny,allow
allow from 192.168.123.1
deny from all
#require user daphne bill
#require group cleaners
#require group directors
</Directory>

<Directory /usr/www/APACHE3/cgi-bin>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
#AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
#AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
require valid-user
</Directory>

</VirtualHost>

Note the AddModuleInfo line and the <Location ...> block.

1 AddModuleInfo

The AddModule directive allows the content of string to be shown as HTML-interpreted additional information for the module module-name.

AddModuleInfo module-name string
Server config, virtual host

For example:

AddModuleInfo mod_auth.c 'See <A HREF="http://www.apache.org/docs/mod/
    mod auth.html">http://www.apache.org/docs/mod/mod_auth.html</A>'

To invoke the module, browse to www.butterthlies.com/server-info,and you will see something like the following:

Apache Server Information
Server Settings, mod_setenvif.c, mod_usertrack.c, mod_auth_digest.c, mod_auth_db.c,
mod_auth_anon.c, mod_auth.c, mod_access.c, mod_rewrite.c, mod_alias.c, mod_userdir.c,
mod_actions.c, mod_imap.c, mod_asis.c, mod_cgi.c, mod_dir.c, mod_autoindex.c, mod_
include.c, mod_info.c, mod_status.c, mod_negotiation.c, mod_mime.c, mod_log_config.c,
mod_env.c, http_core.c
Server Version: Apache/1.3.14 (Unix)
Server Built: Feb 13 2001 15:20:23
API Version: 19990320:10
Run Mode: standalone
User/Group: webuser(1000)/1003
Hostname/port: www.butterthlies.com:0
Daemons: start: 5 min idle: 5 max idle: 10 max: 256
Max Requests: per child: 0 keep alive: on max per connection: 100
Threads: per child: 0
Excess requests: per child: 0
Timeouts: connection: 300 keep-alive: 15
Server Root: /usr/www/APACHE3/site.info
Config File: /usr/www/APACHE3/site.info/conf/httpd.conf
PID File: logs/httpd.pid
Scoreboard File: logs/apache_runtime_status

Module Name: mod_setenvif.c
Content handlers: none
Configuration Phase Participation: Create Directory Config, Merge Directory Configs,
Create Server Config, Merge Server Configs
Request Phase Participation: Post-Read Request, Header Parse
Module Directives:
SetEnvIf - A header-name, regex and a list of variables.
SetEnvIfNoCase - a header-name, regex and a list of variables.
BrowserMatch - A browser regex and a list of variables.
BrowserMatchNoCase - A browser regex and a list of variables.
Current Configuration:
Additional Information:
This is what I've added to mod_setenvif
............

The file carries on to document all the compiled-in modules.





Apache’s Logging Facilities

27 06 2009

Apache offers a wide range of options for controlling the format of the log files. In line with current thinking, older methods (RefererLog, AgentLog, and CookieLog) have now been replaced by the config_log_module. To illustrate this, we have taken … /site.authent and copied it to … /site.logging so that we can play with the logs:

User webuser
Group webgroup
ServerName www.butterthlies.com

IdentityCheck	on
NameVirtualHost 192.168.123.2
<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r,
    status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/customers/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
</VirtualHost>
<VirtualHost sales.butterthlies.com>
LogFormat "sales: agent %{httpd_user_agent}i, cookie: %{http_Cookie}i,
    referer: %{Referer}o, host %!200h, logname %!200l, user %u, time %t,
    request %r, status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/salesmen/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
<Directory /usr/www/APACHE3/site.logging/htdocs/salesmen>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
require valid-user
</Directory>
<Directory /usr/www/APACHE3/cgi_bin>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
#AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
#AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
require valid-user
</Directory>
</VirtualHost>

There are a number of directives.

ErrorLog

ErrorLog filename|syslog[:facility]
Default: ErrorLog logs/error_log
Server config, virtual host

The ErrorLog directive sets the name of the file to which the server will log any errors it encounters. If the filename does not begin with a slash (/), it is assumed to be relative to the server root.

figs/unix.gif

If the filename begins with a pipe (|), it is assumed to be a command to spawn a file to handle the error log.

Apache 1.3 and Above

figs/unix.gif

Using syslog instead of a filename enables logging via syslogd(8) if the system supports it. The default is to use syslog facility local7, but you can override this by using the syslog:facility syntax, where facility can be one of the names usually documented in syslog(1). Using syslog allows you to keep logs for multiple servers in a centralized location, which can be very convenient in larger installations.

Your security could be compromised if the directory where log files are stored is writable by anyone other than the user who starts the server.

TransferLog

TransferLog [ file | "| command "]
Default: none
Server config, virtual host

TransferLog specifies the file in which to store the log of accesses to the site. If it is not explicitly included in the Config file, no log will be generated.

file
This is a filename relative to the server root (if it doesn’t start with a slash), or an absolute path (if it does).

command
Note the format: "| command". The double quotes are needed in the Config file. command is a program to receive the agent log information on its standard input. Note that a new program is not started for a virtual host if it inherits the TransferLog from the main server. If a program is used, it runs using the permissions of the user who started httpd. This is root if the server was started by root, so be sure the program is secure. A useful Unix program to which to send is rotatelogs,[1] which can be found in the Apache support subdirectory. It closes the log periodically and starts a new one, and it’s useful for long-term archiving and log processing. Traditionally, this is done by shutting Apache down, moving the logs elsewhere, and then restarting Apache, which is obviously no fun for the clients connected at the time!

AgentLog

AgentLog file-pipe
AgentLog logs/agent_log
Server config, virtual host
Not in Apache v2

The AgentLog directive sets the name of the file to which the server will log the User-Agent header of incoming requests. file-pipe is one of the following:

A filename
A filename relative to the ServerRoot.
"| <command>"

This is a program to receive the agent log information on its standard input. Note that a new program will not be started for a VirtualHost if it inherits the AgentLog from the main server.

This directive is provided for compatibility with NCSA 1.4.

LogLevel

LogLevel level
Default: error
Server config, virtual host

LogLevel controls the amount of information recorded in the error_log file. The levels are as follows:

emerg
The system is unusable — exiting. For example:

"Child cannot open lock file. Exiting"
alert
Immediate action is necessary. For example:

"getpwuid: couldn't determine user name from uid"
crit
Critical condition. For example:

"socket: Failed to get a socket, exiting child"
error
Client is not getting a proper service. For example:

"Premature end of script headers"
warn
Nonthreatening problems, which may need attention. For example:

"child process 1234 did not exit, sending another SIGHUP"
notice
Normal events, which may need to be evaluated. For example:

"httpd: caught SIGBUS, attempting to dump core in ..."
info
For example:

"Server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers)..."
debug
Logs normal events for debugging purposes.

Each level will report errors that would have been printed by higher levels. Use debug for development, then switch to, say, crit for production. Remember that if each visitor on a busy site generates one line in the error_log, the hard disk will soon fill up and stop the system.

LogFormat

LogFormat format_string [nickname]
Default: "%h %l %u %t \"%r\" %s %b"
Server config, virtual host

LogFormat sets the information to be included in the log file and the way in which it is written. The default format is the Common Log Format (CLF), which is expected by off-the-shelf log analyzers such as wusage (http://www.boutell.com/) or ANALOG, so if you want to use one of them, leave this directive alone. The CLF format is as follows:

host ident authuser date request status bytes
host
Hostname of the client or its IP number.

ident
If IdentityCheck is enabled and the client machine runs identd, the identity information reported by the client. (This can cause performance issues as the server makes identd requests that may or may not be answered.)

authuser
If the request was for a password-protected document, is the user ID.

date
The date and time of the request, in the following format:

[day/month/year:hour:minute:second  tzoffset].
request
Request line from client, in double quotes.

status
Three-digit status code returned to the client.

bytes
The number of bytes returned, excluding headers.

The log format can be customized using a format_string. The commands in it have the format %[condition]key_letter ; the condition need not be present. If it is and the specified condition is not met, the output will be a -. The key_letter s are as follows:

%...a: Remote IP-address
%...A: Local IP-address
%...B: Bytes sent, excluding HTTP headers.
%...b: Bytes sent, excluding HTTP headers. In CLF format i.e. a '-' rather than a 0
when no bytes are sent.
%...{Foobar}C: The contents of cookie "Foobar" in the request sent to the server.
%...D: The time taken to serve the request, in microseconds.
%...{FOOBAR}e: The contents of the environment variable FOOBAR
%...f: Filename
%...h: Remote host
%...H The request protocol
%...{Foobar}i: The contents of Foobar: header line(s) in the request sent to the
server.
%...l: Remote logname (from identd, if supplied)
%...m The request method
%...{Foobar}n: The contents of note "Foobar" from another module.
%...{Foobar}o: The contents of Foobar: header line(s) in the reply.
%...p: The canonical Port of the server serving the request
%...P: The process ID of the child that serviced the request.
%...q The query string (prepended with a ? if a query string exists, otherwise an
empty string) %...r: First line of request
%...s: Status. For requests that got internally redirected, this is the status of the
*original* request ---
%...>s for the last.
%...t: Time, in common log format time format (standard english format) %...
{format}t: The time, in the form given by format, which should be in strftime(3)
format. (potentially localized)
%...T: The time taken to serve the request, in seconds.
%...u: Remote user (from auth; may be bogus if return status (%s) is 401)
%...U: The URL path requested, not including any query string.
%...v: The canonical ServerName of the server serving the request.
%...V: The server name according to the UseCanonicalName setting.
%...X: Connection status when response is completed. 'X' = connection aborted before
the response completed. '+' = connection may be kept alive after the response is
sent. '-' = connection will be closed after the response is sent. (This directive was
%...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...{var}c syntax.)

The format string can contain ordinary text of your choice in addition to the % directives.

CustomLog

CustomLog file|pipe format|nickname
Server config, virtual host

The first argument is the filename to which log records should be written. This is used exactly like the argument to TransferLog; that is, it is either a full path, relative to the current server root, or a pipe to a program.

The format argument specifies a format for each line of the log file. The options available for the format are exactly the same as those for the argument of the LogFormat directive. If the format includes any spaces (which it will in almost all cases), it should be enclosed in double quotes.

Instead of an actual format string, you can use a format nickname defined with the LogFormat directive.

10.2.1 site.authent — Another Example

site.authent is set up with two virtual hosts, one for customers and one for salespeople, and each has its own logs in … /logs/customers and … /logs/salesmen. We can follow that scheme and apply one LogFormat to both, or each can have its own logs with its own LogFormats inside the <VirtualHost> directives. They can also have common log files, set up by moving ErrorLog and TransferLog outside the <VirtualHost> sections, with different LogFormats within the sections to distinguish the entries. In this last case, the LogFormat files could look like this:

<VirtualHost www.butterthlies.com>
LogFormat "Customer:..."
...
</VirtualHost>

<VirtualHost sales.butterthlies.com>
LogFormat "Sales:..."
...
</VirtualHost>

Let’s experiment with a format for customers, leaving everything else the same:

<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r
    status %s, bytes %b,"
...

We have inserted the words host, logname, and so on to make it clear in the file what is doing what. In real life you probably wouldn’t want to clutter the file up in this way because you would look at it regularly and remember what was what or, more likely, process the logs with a program that would know the format. Logging on to www.butterthlies.com and going to summer catalog produces this log file:

customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:28:46 +0000], request GET / HTTP/1.0, status 200,bytes -
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:28:49 +0000], request GET /hen.jpg HTTP/1.0, status 200,
    bytes 12291,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov
    /1996:14:29:04 +0000], request GET /tree.jpg HTTP/1.0, status 200,
    bytes 11532,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:29:19 +0000], request GET /bath.jpg HTTP/1.0, status 200,
    bytes 5880,

This is not too difficult to follow. Notice that while we have logname unknown, the user is -, the usual report for an unknown value. This is because customers do not have to give an ID; the same log for salespeople, who do, would have a value here.

We can improve things by inserting lists of conditions based on the error codes after the % and before the command letter. The error codes are defined in the HTTP 1.0 specification:

200 OK
302 Found
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not found
500 Server error
503 Out of resources
501 Not Implemented
502 Bad Gateway

The list from HTTP 1.1 is as follows:

100  Continue
101  Switching Protocols
200  OK
201  Created
202  Accepted
203  Non-Authoritative Information
204  No Content
205  Reset Content
206  Partial Content
300  Multiple Choices
301  Moved Permanently
302  Moved Temporarily
303  See Other
304  Not Modified
305  Use Proxy
400  Bad Request
401  Unauthorized
402  Payment Required
403  Forbidden
404  Not Found
405  Method Not Allowed
406  Not Acceptable
407  Proxy Authentication Required
408  Request Time-out
409  Conflict
410  Gone
411  Length Required
412  Precondition Failed
413  Request Entity Too Large
414  Request-URI Too Large
415  Unsupported Media Type
500  Internal Server Error
501  Not Implemented
502  Bad Gateway
503  Service Unavailable
504  Gateway Time-out
505  HTTP Version not supported

You can use ! before a code to mean “if not.” !200 means “log this if the response was not OK.” Let’s put this in salesmen:

<VirtualHost sales.butterthlies.com>
LogFormat "sales: host %!200h, logname %!200l, user %u, time %t, request %r,
    status %s,bytes %b,"
...

An attempt to log in as fred with the password don't know produces the following entry:

sales: host 192.168.123.1, logname unknown, user fred, time [19/Aug/
    1996:07:58:04 +0000], request GET HTTP/1.0, status 401, bytes -

However, if it had been the infamous bill with the password theft, we would see:

host -, logname -, user bill, ...

because we asked for host and logname to be logged only if the request was not OK. We can combine more than one condition, so that if we only want to know about security problems on sales, we could log usernames only if they failed to authenticate:

LogFormat "sales: bad user: %400,401,403u"

We can also extract data from the HTTP headers in both directions:

%[condition]{user-agent}i

This prints the user agent (i.e., the software the client is running) if condition is met. The old way of doing this was AgentLog logfile and ReferLog logfile.





Logging by Script and Database

27 06 2009

If your site uses a database manager, you could sidestep this cumbersome procedure by writing scripts on the fly to log everything you want to know about your visitors, reading data about them from the environment variables, and recording their choices as they work through the site. Depending on your needs, it can be much easier to log the data directly than to mine it out of the log files. For instance, one of the authors (PL) has a medical encyclopedia web site (www.Medic-Planet.com). Simple Perl scripts write database records to keep track of the following:

  • How often each article has been read
  • How visitors got to it
  • How often search engine spiders visit and who they are
  • How often visitors click through the many links on the site and where they go

Having stored this useful information in the database manager, it is then not hard to write a script, accessed via an SSL connection (see Chapter 11), which can only be accessed by the site management to generate HTML reports with totals and statistics that illuminate marketing problems.





Performance

27 06 2009

The proxy server’s performance can be improved by caching incoming pages so that the next time one is called for, it can be served straight up without having to waste time going over the Web. We can do the same thing for outgoing pages, particularly pages generated on the fly by CGI scripts and database accesses (bearing in mind that this can lead to stale content and is not invariably desirable).

1 Inward Caching

Another reason for using a proxy server is to cache data from the Web to save the bandwidth of the world’s clogged telephone systems and therefore to improve access time on our server. Note, however, that it in practice it often saves bandwidth at the expense of increased access times.

The directive CacheRoot, cunningly inserted in the Config file shown earlier, and the provision of a properly permissioned cache directory allow us to show this happening. We start by providing the directory … /site.proxy/cache, and Apache then improves on it with some sort of directory structure like … /site.proxy/cache/d/o/j/gfqbZ@49rZiy6LOCw.

The file gfqbZ@49rZiy6LOCw contains the following:

320994B6 32098D95 3209956C 00000000 0000001E
X-URL: http://192.168.124.1/message
HTTP/1.0 200 OK
Date: Thu, 08 Aug 1996 07:18:14 GMT
Server: Apache/1.1.1
Content-length: 30
Last-modified Thu, 08 Aug 1996 06:47:49 GMT

I am a web site far out there

Next time someone wants to access http://192.168.124.1/message, the proxy server does not have to lug bytes over the Web; it can just go and look it up.

There are a number of housekeeping directives that help with caching.

CacheRoot

CacheRoot directory
Default: none
Server config, virtual host

This directive sets the directory to contain cache files; must be writable by Apache.

CacheSize

CacheSize size_in_kilobytes
Default: 5
Server config, virtual host

This directive sets the size of the cache area in kilobytes. More may be stored temporarily, but garbage collection reduces it to less than the set number.

CacheGcInterval

CacheGcInterval hours
Default: never
Server config, virtual host

This directive specifies how often, in hours, Apache checks the cache and does a garbage collection if the amount of data exceeds CacheSize.

CacheMaxExpire

CacheMaxExpire hours
Default: 24
Server config, virtual host

This directive specifies how long cached documents are retained. This limit is enforced even if a document is supplied with an expiration date that is further in the future.

CacheLastModifiedFactor

CacheLastModifiedFactor factor
Default: 0.1
Server config, virtual host

If no expiration time is supplied with the document, then estimate one by multiplying the time since last modification by factor. CacheMaxExpire takes precedence.

CacheDefaultExpire

CacheDefaultExpire hours
Default: 1
Server config, virtual host

If the document is fetched by a protocol that does not support expiration times, use this number. CacheMaxExpire does not override it.

CacheDirLevels and CacheDirLength

CacheDirLevels number
Default: 3
CacheDirLength number
Default: 1
Server config, virtual host

The proxy module stores its cache with filenames that are a hash of the URL. The filename is split into CacheDirLevels of directory using CacheDirLength characters for each level. This is for efficiency when retrieving the files (a flat structure is very slow on most systems). So, for example:

CacheDirLevels 3
CacheDirLength 2

converts the hash “abcdefghijk” into ab/cd/ef/ghijk. A real hash is actually 22 characters long, each character being one of a possible 64 (26), so that three levels, each with a length of 1, gives 218 directories. This number should be tuned to the anticipated number of cache entries (218 being roughly a quarter of a million, and therefore good for caches up to several million entries in size).

CacheNegotiatedDocs

CacheNegotiatedDocs
Default: none
Server config, virtual host

If present in the Config file, this directive allows content-negotiated documents to be cached by proxy servers. This could mean that clients behind those proxys could retrieve versions of the documents that are not the best match for their abilities, but it will make caching more efficient.

This directive only applies to requests that come from HTTP 1.0 browsers. HTTP 1.1 provides much better control over the caching of negotiated documents, and this directive has no effect on responses to HTTP 1.1 requests. Note that very few browsers are HTTP 1.0 anymore.

NoCache

NoCache [host|domain] [host|domain] ...

This directive specifies a list of hosts and/or domains, separated by spaces, from which documents are not cached, such as the site delivering your real-time stock market quotes .





Apparent Bug

27 06 2009

When a server is set up as a proxy, then requests of the form:

GET http://someone.else.com/ HTTP/1.0

are accepted and proxied to the appropriate web server. By default, Apache does not proxy, but it can appear that it is prepared to — requests like the previous will be accepted and handled by the default configuration. Apache assumes that someone.else.com is a virtual host on the current machine. People occasionally think this is a bug, but it is, in fact, correct behavior. Note that pages served will be the same as those that would be served for any real unknown virtual host on the same machine, so this does not pose a security risk.





Proxy Directives

27 06 2009

We are not concerned here with firewalls, so we take them for granted. The interesting thing is how we configure the proxy Apache to make life with a firewall tolerable to those behind it.

site.proxy has three subdirectories: cache, proxy, real. The Config file from … /site. proxy/proxy is as follows:

User webuser
Group webgroup
ServerName www.butterthlies.com

Port 8000
ProxyRequests on
CacheRoot /usr/www/APACHE3/site.proxy/cache
CacheSize 1000

The points to notice are as follows:

  • On this site we use ServerName www.butterthlies.com.
  • The Port number is set to 8000 so we don’t collide with the real web server running on the same machine.
  • We turn ProxyRequests on and provide a directory for the cache, which we will discuss later in this chapter.
  • CacheRoot is set up in a special directory.
  • CacheSize is set to 1000 kilobytes.
AllowCONNECT

AllowCONNECT port [port] ...
AllowCONNECT 443 563
Server config, virtual host
Compatibility: AllowCONNECT is only available in Apache 1.3.2 and later.

The AllowCONNECT directive specifies a list of port numbers to which the proxy CONNECT method may connect. Today’s browsers use this method when a https connection is requested and proxy tunneling over http is in effect.

By default, only the default https port (443) and the default snews port (563) are enabled. Use the AllowCONNECT directive to override this default and allow connections to the listed ports only.

ProxyRequests

ProxyRequests [on|off]
Default: off
Server config

This directive turns proxy serving on. Even if ProxyRequests is off, ProxyPass directives are still honored.

ProxyRemote

ProxyRemote match remote-server
Server config

This directive defines remote proxies to this proxy (that is, proxies that should be used for some requests instead of being satisfied directly). match is either the name of a URL scheme that the remote server supports, a partial URL for which the remote server should be used, or * to indicate that the server should be contacted for all requests. remote-server is the URL that should be used to communicate with the remote server (i.e., it is of the form protocol://hostname[:port]). Currently, only HTTP can be used as the protocol for the remote-server. For example:

  ProxyRemote ftp http://ftpproxy.mydomain.com:8080
  ProxyRemote http://goodguys.com/ http://mirrorguys.com:8000
  ProxyRemote * http://cleversite.com
ProxyPass

ProxyPass path url
Server config

This command runs on an ordinary server and translates requests for a named directory and below to a demand to a proxy server. So, on our ordinary Butterthlies site, we might want to pass requests to /secrets onto a proxy server darkstar.com:

ProxyPass /secrets http://darkstar.com

Unfortunately, this is less useful than it might appear, since the proxy does not modify the HTML returned by darkstar.com. This means that URLs embedded in the HTML will refer to documents on the main server unless they have been written carefully. For example, suppose a document one.html is stored on darkstar.com with the URL http://darkstar.com/one.html, and we want it to refer to another document in the same directory. Then the following links will work, when accessed as http://www.butterthlies.com/secrets/one.html:

<A HREF="two.html">Two</A>
<A HREF="/secrets/two.html">Two</A>
<A HREF="http://darkstar.com/two.html">Two</A>

But this example will not work:

<A HREF="/two.html">Not two</A>

When accessed directly, through http://darkstar.com/one.html, these links work:

<A HREF="two.html">Two</A>
<A HREF="/two.html">Two</A>
<A HREF="http://darkstar.com/two.html">Two</A>

But the following doesn’t:

<A HREF="/secrets/two.html">Two</A>
ProxyDomain

ProxyDomain domain
Server config

This directive tends to be useful only for Apache proxy servers within intranets. The ProxyDomain directive specifies the default domain to which the Apache proxy server will belong. If a request to a host without a fully qualified domain name is encountered, a redirection response to the same host with the configured domain appended will be generated. The point of this is that users on intranets often only type the first part of the domain name into the browser, but the server requires a fully qualified domain name to work properly.

NoProxy

NoProxy { domain | subnet | ip_addr | hostname }
Server config

The NoProxy directive specifies a list of subnets, IP addresses, hosts, and/or domains, separated by spaces. A request to a host that matches one or more of these is always served directly, without forwarding to the configured ProxyRemote proxy server(s).

ProxyPassReverse

ProxyPassReverse path url
Server config, virtual host

A reverse proxy is a way to masquerade one server as another — perhaps because the “real” server is behind a firewall or because you want part of a web site to be served by a different machine but not to look that way. It can also be used to share loads between several servers — the frontend server simply accepts requests and forwards them to one of several backend servers. The optional module mod_rewrite has some special stuff in it to support this. This directive lets Apache adjust the URL in the Location response header. If a ProxyPass (or mod_rewrite) has been used to do reverse proxying, then this directive will rewrite Location headers coming back from the reverse-proxied server so that they look as if they came from somewhere else (normally this server, of course).

ProxyVia

ProxyVia on|off|full|block
Default: ProxyVia off
Server config, virtual host

This directive controls the use of the Via: HTTP header by the proxy. Its intended use is to control the flow of proxy requests along a chain of proxy servers. See RFC2068 (HTTP 1.1) for an explanation of Via: header lines.

  • If set to off, which is the default, no special processing is performed. If a request or reply contains a Via: header, it is passed through unchanged.
  • If set to on, each request and reply will get a Via: header line added for the current host.
  • If set to full, each generated Via: header line will additionally have the Apache server version shown as a Via: comment field.
  • If set to block, every proxy request will have all its Via: header lines removed. No new Via: header will be generated.
ProxyReceiveBufferSize

ProxyReceiveBufferSize bytes
Default: None
Server config, virtual host

The ProxyReceiveBufferSize directive specifies an explicit network buffer size for outgoing HTTP and FTP connections for increased throughput. It has to be greater than 512 or set to 0 to indicate that the system’s default buffer size should be used.

Example

ProxyReceiveBufferSize 2048
ProxyBlock

ProxyBlock *|word|host|domain [word|host|domain] ...
Default: None
Server config, virtual host

The ProxyBlock directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP, HTTPS, and FTP document requests to sites whose names contain matched words, hosts, or domains that are blocked by the proxy server. The proxy module will also attempt to determine IP addresses of list items that may be hostnames during startup and cache them for match test as well. For example:

ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu

rocky.wotsamattau.edu would also be matched if referenced by IP address.

Note that wotsamattau would also be sufficient to match wotsamattau.edu.

Note also that:

ProxyBlock *

blocks connections to all sites.





Security

27 06 2009

An important concern on the Web is keeping the Bad Guys out of your network. One established technique is to keep the network hidden behind a firewall; this works well, but as soon as you do it, it also means that everyone on the same network suddenly finds that their view of the Net has disappeared (rather like people living near Miami Beach before and after the building boom). This becomes an urgent issue at Butterthlies, Inc., as competition heats up and naughty-minded Bad Guys keep trying to break our security and get in. We install a firewall and, anticipating the instant outcries from the marketing animals who need to get out on the Web and surf for prey, we also install a proxy server to get them out there.

So, in addition to the Apache that serves clients visiting our sites and is protected by the firewall, we need a copy of Apache to act as a proxy server to let us, in our turn, access other sites out on the Web. Without the proxy server, those inside are safe but blind.





Rewrite

27 06 2009

The preceding section described the Alias module and its allies. Everything these directives can do, and more, can be done instead by mod_rewrite.c, an extremely compendious module that is almost a complete software product in its own right. But for simple tasks Alias and friends are much easier to use.

The documentation is thorough, and the reader is referred to http://www.engelschall.com/pw/apache/rewriteguide/ for any serious work. You should also look at http://www.apache.org/docs/mod/mod_rewrite.html. This section is intended for orientation only.

Rewrite takes a rewriting pattern and applies it to the URL. If it matches, a rewriting substitution is applied to the URL. The patterns are regular expressions familiar to us all in their simplest form — for example, mod.*\.c, which matches any module filename. The complete science of regular expressions is somewhat extensive, and the reader is referred to … /src/regex/regex.7, a manpage that can be read with nroff -man regex.7 (on FreeBSD, at least). Regular expressions are also described in the POSIX specification and in Jeffrey Friedl’s Mastering Regular Expressions (O’Reilly, 2002).

It might well be worth using Perl to practice with regular expressions before using them in earnest. To make complicated expressions work, it is almost essential to build them up from simple ones, testing each change as you go. Even the most expert find that convoluted regular expressions often do not work the first time.

The essence of regular expressions is that a number of special characters can be used to match parts of incoming URLs. The substitutions available in mod_rewrite can include mapping functions that take bits of the incoming URL and look them up in databases or even apply programs to them. The rules can be applied repetitively and recursively to the evolving URL. It is possible (as the documentation says) to create “rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy module throughout.” The functionality is so extensive that it is probably impossible to master it in the abstract. When and if you have a problem of this sort, it looks as if mod_rewrite can solve it, given enough intellectual horsepower on your part!

The module can be used in four situations:

  • By the administrator inside the server Config file to apply in all contexts. The rules are applied to all URLs of the main server and all URLs of the virtual servers.
  • By the administrator inside <VirtualHost> blocks. The rules are applied only to the URLs of the virtual server.
  • By the administrator inside <Directory> blocks. The rules are applied only to the specified directory.
  • By users in their .htaccess files. The rules are applied only to the specified directory.

The directives look simple enough.

RewriteEngine

RewriteEngine on_or_off
Server config, virtual host, directory

Enables or disables the rewriting engine. If off, no rewriting is done at all. Use this directive to switch off functionality rather than commenting out Rewrite-Rule lines.

RewriteLog

RewriteLog filename
Server config, virtual host

Sends logging to the specified filename. If the name does not begin with a slash, it is taken to be relative to the server root. This directive should appear only once in a Config file.

RewriteLogLevel

RewriteLogLevel number
Default number: 0
Server config, virtual host

Controls the verbosity of the logging: 0 means no logging, and 9 means that almost every action is logged. Note that any number above 2 slows Apache down.

RewriteMap

RewriteMap mapname {txt,dbm,prg,rnd,int}: filename
Server config, virtual host

Defines an external mapname file that inserts substitution strings through key lookup.Keys may be stored in a variety of formats, described as follows. The module passes mapname a query in the form:

$(mapname : Lookupkey | DefaultValue)

If the Lookupkey value is not found, DefaultValue is returned.

The type of mapname must be specified by the next argument:

txt
Indicates plain-text format — that is, an ASCII file with blank lines, comments that begin with #, or useful lines, in the format:

MatchingKey
SubstituteValue
dbm
Indicates DBM hashfile format — that is, a binary NDBM (the “new” dbm interface, now about 15 years old, also used for dbm auth) file containing the same material as the plain-text format file. You create it with any ndbm tool or by using the Perl script dbmmanage from the support directory of the Apache distribution.

prg
Indicates program format — that is, an executable (a compiled program or a CGI script) that is started by Apache. At each lookup, it is passed the key as a string terminated by newline on stdin and returns the substitution value, or the word NULL if lookup fails, in the same way on stdout. The manual gives two warnings:

  • Keep the program or script simple because if it hangs, it hangs the Apache server.
  • Don’t use buffered I/O on stdout because it causes a deadlock. In C, use:

    setbuf(stdout,NULL)

    In Perl, use:

    select(STDOUT); $|=1;]

rnd
Indicates randomized plain text, which is similar to the standard plain-text variant but has a special postprocessing feature: after looking up a value, it is parsed according to contained “|” characters that have the meaning of “or”. In other words, they indicate a set of alternatives from which the actual returned value is chosen randomly. Although this sounds crazy and useless, it was actually designed for load balancing in a reverse-proxy situation, in which the looked-up values are server names — each request to a reverse proxy is routed to a randomly selected server behind it. See also Section 12.6 in Chapter 12.

int
Indicates an internal Apache function. Two functions exist: toupper( ) and tolower( ), which convert the looked-up key to all upper- or all lowercase.

RewriteBase

RewriteBase BaseURL
directory, .htaccess

The effects of this command can be fairly easily achieved by using the rewrite rules, but it may sometimes be simpler to encapsulate the process. It explicitly sets the base URL for per-directory rewrites. If RewriteRule is used in an .htaccess file, it is passed a URL that has had the local directory stripped off so that the rules act only on the remainder. When the substitution is finished, RewriteBase supplies the necessary prefix. To quote the manual’s example in .htaccess:

Alias /xyz /abc/def"
RewriteBase   /xyz
RewriteRule   ^oldstuff\.html$  newstuff.html

In this example, a request to /xyz/oldstuff.html gets rewritten to the physical file /abc/def/newstuff.html. Internally, the following happens:

Request
/xyz/oldstuff.html

Internal processing
/xyz/oldstuff.html     -> /abc/def/oldstuff.html  (per-server Alias)
/abc/def/oldstuff.html -> /abc/def/newstuff.html  (per-dir    RewriteRule)
/abc/def/newstuff.html -> /xyz/newstuff.html      (per-dir    RewriteBase)
/xyz/newstuff.html     -> /abc/def/newstuff.html  (per-server Alias)
Result
/abc/def/newstuff.html

RewriteCond

RewriteCond TestString CondPattern
Server config, virtual host, directory

One or more RewriteCond directives can precede a RewriteRule directive to define conditions under which it is to be applied. CondPattern is a regular expression matched against the value retrieved for TestString, which contains server variables of the form %{NAME_OF_VARIABLE}, where NAME_OF_VARIABLE can be one of the following list:

API_VERSION PATH_INFO SERVER_PROTOCOL
AUTH_TYPE QUERY_STRING SERVER_SOFTWARE
DOCUMENT_ROOT REMOTE_ADDR THE_REQUEST
ENV:any_environment_variable REMOTE_HOST TIME
HTTP_ACCEPT REMOTE_USER TIME_DAY
HTTP_COOKIE REMOTE_IDENT TIME_HOUR
HTTP_FORWARDED REQUEST_FILENAME TIME_MIN
HTTP_HOST REQUEST_METHOD TIME_MON
HTTP_PROXY_CONNECTION REQUEST_URI TIME_SEC
HTTP_REFERER SCRIPT_FILENAME TIME_WDAY
HTTP_USER_AGENT SERVER_ADMIN TIME_YEAR
HTTP:any_HTTP_header SERVER_NAME
IS_SUBREQ SERVER_PORT

These variables all correspond to the similarly named HTTP MIME headers, C variables of the Apache server, or the current time. If the regular expression does not match, the RewriteRule following it does not apply.

RewriteLock

RewriteLock Filename
Server config

This directive sets the filename for a synchronization lockfile, which mod_rewrite needs to communicate with RewriteMap programs. Set this lockfile to a local path (not on a NFS-mounted device) when you want to use a rewriting map program. It is not required for other types of rewriting maps.

RewriteOptions

RewriteOptions Option
Default: None
Server config, virtual host, directory, .htaccess

The RewriteOptions directive sets some special options for the current per-server or per-directory configuration. Currently, there is only one Option:

inherit

This forces the current configuration to inherit the configuration of the parent. In per-virtual-server context this means that the maps, conditions, and rules of the main server are inherited. In per-directory context this means that conditions and rules of the parent directory’s .htaccess configuration are inherited.

RewriteRule

RewriteRule Pattern Substitution [flags]
Server config, virtual host, directory

This directive can be used as many times as necessary. Each occurrence applies the rule to the output of the preceding one, so the order matters. Pattern is matched to the incoming URL; if it succeeds, the Substitution is made. An optional argument, flags, can be given. The flags, which follow, can be abbreviated to one or two letters:

redirect|R
Force redirect.

proxy|P
Force proxy.

last|L
Last rule — go to top of rule with current URL.

chain|C
Apply following chained rule if this rule matches.

type|T= mime-type
Force target file to be mime-type.

nosubreq|NS
Skip rule if it is an internal subrequest.

env|E=VAR:VAL
Set an environment variable.

qsappend|QSA
Append a query string.

passthrough|PT
Pass through to next handler.

skip|S= num
Skip the next num rules.

next|N
Next round — start at the top of the rules again.

gone|G
Returns HTTP response 410 — “URL Gone.”

forbidden|F
Returns HTTP response 403 — “URL Forbidden.”

nocase|NC
Makes the comparison case insensitive.

For example, say we want to rewrite URLs of the form:

/Language/~Realname/.../File

into:

/u/Username/.../File.Language

We take the rewrite map file and save it under /anywhere/map.real-to-user. Then we only have to add the following lines to the Apache server Config file:

RewriteLog   /anywhere/rewrite.log
RewriteMap   real-to-user  txt:/anywhere/map.real-to-host
RewriteRule  ^/([^/]+)/~([^/]+)/(.*)$   /u/${real-to-user:$2|nobody}/$3.$1

1 A Rewrite Example

The Butterthlies salespeople seem to be taking their jobs more seriously. Our range has increased so much that the old catalog based around a single HTML document is no longer workable because there are too many cards. We have built a database of cards and a utility called cardinfo that accesses it using the arguments:

cardinfo cardid query

where cardid is the number of the card and query is one of the following words: “price,” “artist,” or “size.” The problem is that the salespeople are too busy to remember the syntax, so we want to let them log on to the card database as if it were a web site. For instance, going to http://sales.butterthlies.com/info/2949/price would return the price of card number 2949. The Config file is in … /site.rewrite :

User webuser
Group webgroup
# Apache requires this server name, although in this case it will
# never be used.
# This is used as the default for any server that does not match a
# VirtualHost section.
ServerName www.butterthlies.com

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/customers/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.rewrite/htdocs/salesmen
Options ExecCGI indexes
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.rewrite/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.rewrite/logs/salesmen/access_log
RewriteEngine on
RewriteLog logs/rewrite
RewriteLogLevel 9
RewriteRule ^/info/([^/]+)/([^/]+)$   /cgi-bin/cardinfo?$2+$1 [PT]
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
</VirtualHost>

In real life cardinfo would be an elaborate program. However, here we just have to show that it could work, so it is extremely simple:

#!/bin/sh
#
echo "content-type: text/html"
echo sales.butterthlies.com
echo "You made the query $1 on the card $2"

To make sure everything is in order before we do it for real, we turn RewriteEngine off and access http://sales.butterthlies.com/cgi-bin/cardinfo. We get back the following message:

The requested URL /info/2949/price was not found on this server.

This is not surprising. We now stop Apache, turn RewriteEngine on and restart with ./go. Look at the crucial line in the Config file:

RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]

Translated into English, this means the following: at the start of the string, match /info/, followed by one or more characters that aren’t /, and put those characters into the variable $1 (the parentheses do this; $1 because they are the first set). Then match a /, then one or more characters aren’t /, and put those characters into $2. Then match the end of the string, and pass the result through [PT] to the next rule, which is ScriptAlias. We end up as if we had accessed http://sales.butterthlies.com/cgi-bin/cardinfo?<card ID>+<query>.

If the CGI script is on a different web server for some reason, we could write:

RewriteRule ^/info/([^/]+)/([^/]+)$ http://somewhere.else.com/cgi-bin/
    cardinfo?$2+$1 [PT]

Note that this pattern won’t match /info/123/price/fred because it has too many slashes in it.

If we run all this with ./go and access http://sales.butterthlies.com/info/2949/price from the client, we see the following message:

You made the query price on card 2949




Alias

27 06 2009

One of the most useful directives is Alias, which lets you store documents elsewhere. We can demonstrate this simply by creating a new directory, /usr/www/APACHE3/somewhere_else, and putting in it a file lost.txt, which has this message in it:

I am somewhere else

httpd2.conf has an extra line:

...
Alias /somewhere_else /usr/www/APACHE3/somewhere_else
...

Stop Apache and run ./go 2. From the browser, access http://www.butterthlies.com/somewhere_else/. We see the following:

Index of /somewhere_else
. Parent Directory
. lost.txt

If we click on Parent Directory, we arrive at the DocumentRoot for this server, /usr/www/APACHE3/site.alias/htdocs/customers, not, as might be expected, at /usr/www/APACHE3. This is because Parent Directory really means “parent URL,” which is http://www.butterthlies.com/ in this case.

What sometimes puzzles people (even those who know about it but have temporarily forgotten) is that if you go to http://www.butterthlies.com/ and there’s no ready-made index, you don’t see somewhere_else listed.

1 A Subtle Problem

Note that you do not want to write:

Alias /somewhere_else/ /usr/www/APACHE3/somewhere_else

The trailing / on the alias will prevent things working. To understand this, imagine that you start with a web server that has a subdirectory called fred in its DocumentRoot. That is, there’s a directory called /www/docs/fred, and the Config file says:

DocumentRoot /www/docs

The URL http://your.webserver.com/fred fails because there is no file called fred. However, the request is redirected by Apache to http://your.webserver.com/fred/, which is then handled by looking for the directory index of /fred.

So, if you have a web page that says:

<a href="/fred">Take a look at fred</a>

it will work. When you click on “Take a look at fred,” you get redirected, and your browser looks for:

 http://your.webserver.com/fred/

as its URL, and all is well.

One day, you move fred to /some/where/else. You alter your Config file:

Alias /fred/ /some/where/else

or, equally ill-advisedly:

Alias /fred/ /some/where/else/

You put the trailing / on the aliases because you wanted to refer to a directory. But either will fail. Why?

The URL http://your.webserver.com/fred fails because there is no file /www/docs/fred anymore. In spite of the altered line in the Config file, this is what the URL still maps to, because /fred doesn’t match /fred/, and Apache no longer has a reason to redirect.

But using this Alias (without the trailing / on the alias):

Alias /fred /some/where/else

means that http://your.webserver.com/fred maps to /some/where/else instead of /www/docs/fred. It is once more recognized as a directory and is automatically redirected to the right place.

Note that it would be wrong to make Apache detect this and do the redirect, because it is legitimate to actually have both a file called fred in /www/docs and an alias for /fred/ that sends requests for /fred/* elsewhere.

It would also be wrong to make Apache bodge the URL and add a trailing slash when it is clear that a directory is meant rather than a filename. The reason is that if a file in that directory wants to refer visitors to a subdirectory …/fred/bill, the new URL is made up by the browser. It can only do this if it knows that fred is a directory, and the only way it can get to know this is if Apache redirects the request for …/fred to /fred/.

The same effect was produced on our system by leaving the ServerName directive outside the VirtualHost block. This is because, being outside the VirtualHost block, it doesn’t apply to the virtual host. So the previously mentioned redirect doesn’t work because it uses ServerName in autogenerated redirects. Presumably this would only cause a problem depending on IPs, reverse DNS, and so forth.

Script

Script method cgi-script
Server config, virtual host, directory
Script is only available in Apache 1.1 and later; arbitrary method use is only
available with 1.3.10 and later.

This directive adds an action, which will activate cgi-script when a file is requested using the method of method. It sends the URL and file path of the requested document using the standard CGI PATH_INFO and PATH_TRANSLATED environment variables. This is useful if you want to compress on the fly, for example, or implement PUT.

Prior to Apache 1.3.10, method can only be one of GET, POST, PUT, or DELETE. As of 1.3.10, any arbitrary method name may be used. Method names are case sensitive, so Script PUT and Script put have two entirely different effects.

Note that the Script command defines default actions only. If a CGI script is called, or some other resource that is capable of handling the requested method internally, it will do so. Also note that Script with a method of GET will only be called if there are query arguments present (e.g., foo.html?hi). Otherwise, the request will proceed normally.

Examples

# For <ISINDEX>-style searching
Script GET /cgi-bin/search
# A CGI PUT handler
Script PUT /~bob/put.cgi
ScriptAlias

ScriptAlias url_path directory_or_filename 
Server config, virtual host

ScriptAlias allows scripts to be stored safely out of the way of prying fingers and, moreover, automatically marks the directory where they are stored as containing CGI scripts. For instance, see …site.cgi/conf/httpd0.conf:

...
ScriptAlias /cgi-bin/ /usr/www/apache3/cgi-bin/
...
ScriptAliasMatch

ScriptAliasMatch regex directory_or_filename 
Server config, virtual host

The supplied regular expression is matched against the URL; if it matches, the server will substitute any parenthesized matches into the given string and use them as a filename. For example, to activate the standard /cgi-bin, one might use:

ScriptAliasMatch ^/cgi-bin/(.*) /usr/local/apache/cgi-bin/$1

.* is a regular expression like those in Perl that match any character (.) any number of times (*). Here, this will be the name of the file we want to execute. Putting it in parentheses (.*) stores the characters in the variable $1, which is then invoked:

/usr/local/apache/cgi-bin/$1.

You can start the matching further along. If all your script filenames start with the letters “BT,” you could write:

ScriptAliasMatch ^/cgi-bin/BT(.*) /usr/local/apache/cgi-bin/BT$1

If the visitor got here by following a link on the web page:

...<a href="/cgi-bin/BTmyscript/customer56/ice_cream">...

ScriptAliasMatch will run BTmyscript. If it accesses the environment variable PATH_INFO , it will find /customer56/ice_cream.

You can have as many of these useful directives as you like in your Config file to cover different situations. For more information on regular expressions, see Mastering Regular Expressions by Jeffrey Friedl (O’Reilly, 2002) or Programming Perl by Larry Wall, Jon Orwant, and Tom Christiansen (O’Reilly, 2001).

ScriptInterpreterSource

ScriptInterpreterSource registry|script
Default: ScriptInterpreterSource script
directory, .htaccess

figs/win32.gif

This directive is used to control how Apache 1.3.5 and later finds the interpreter used to run CGI scripts. The default technique is to use the interpreter pointed to by the #! line in the script. Setting the ScriptInterpreterSource registry will cause the Windows registry to be searched using the script file extension (e.g., .pl) as a search key.

Alias

Alias url_path directory_or_filename
Server config, virtual host

Alias is used to map a resource’s URL to its physical location in the filesystem, regardless of where it is relative to the document root. For instance, see …/site.alias/conf/httpd.conf:

...
Alias /somewhere_else/ /usr/www/APACHE3/somewhere_else/
...

There is a directory /usr/www/APACHE3/somewhere_else/, which contains a file lost.txt. If we navigate to www.butterthlies.com/somewhere_else, we see:

Index of /somewhere_else
    Parent Directory
    lost.txt
AliasMatch

AliasMatch regex directory_or_filename
Server config, virtual host

Again, like ScriptAliasMatch, this directive takes a regular expression as the first argument. Otherwise, it is the same as Alias.

UserDir

UserDir directory
Default: UserDir public_html
Server config, virtual host

The basic idea here is that the client is asking for data from a user’s home directory. He asks for http://www.butterthlies.com/~peter, which means “Peter’s home directory on the computer whose DNS name is www.butterthlies.com.” The UserDir directive sets the real directory in a user’s home directory to use when a request for a document is received from a user. directory is one of the following:

  • The name of a directory or a pattern such as those shown in the examples that follow.
  • The keyword disabled. This turns off all username-to-directory translations except those explicitly named with the enabled keyword.
  • The keyword disabled followed by a space-delimited list of usernames. Usernames that appear in such a list will never have directory translation performed, even if they appear in an enabled clause.
  • The keyword enabled followed by a space-delimited list of usernames. These usernames will have directory translation performed even if a global disable is in effect, but not if they also appear in a disabled clause.

If neither the enabled nor the disabled keyword appears in the UserDir directive, the argument is treated as a filename pattern and is used to turn the name into a directory specification. A request for http://www.foo.com/~bob/one/two.html will be translated as follows:

UserDir public_html     -> ~bob/public_html/one/two.html
UserDir /usr/web        -> /usr/web/bob/one/two.html
UserDir /home/*/www/APACHE3     -> /home/bob/www/APACHE3/one/two.html

The following directives will send the redirects shown to their right to the client:

UserDir http://www.foo.com/users -> http://www.foo.com/users/bob/one/two.html
UserDir http://www.foo.com/*/usr -> http://www.foo.com/bob/usr/one/two.html
UserDir http://www.foo.com/~*/   -> http://www.foo.com/~bob/one/two.html

Be careful when using this directive; for instance, UserDir ./ would map /~root to /, which is probably undesirable. If you are running Apache 1.3 or above, it is strongly recommended that your configuration include a UserDir disabled root declaration.

figs/win32.gif

Under Win32, Apache does not understand home directories, so translations that end up in home directories on the righthand side (see the first example) will not work.

Redirect

Redirect [status] url-path url
Server config, virtual host, directory, .htaccess

The Redirect directive maps an old URL into a new one. The new URL is returned to the client, which attempts to fetch the information again from the new address. url-path is a (%-decoded) path; any requests for documents beginning with this path will be returned a redirect error to a new (%-encoded) URL beginning with url.

Example

Redirect /service http://foo2.bar.com/service

If the client requests http://myserver/service/foo.txt, it will be told to access http://foo2.bar.com/service/foo.txt instead.

If no status argument is given, the redirect will be “temporary” (HTTP status 302). This indicates to the client that the resource has moved temporarily. The status argument can be used to return other HTTP status codes:

permanent
Returns a permanent redirect status (301) indicating that the resource has moved permanently.

temp
Returns a temporary redirect status (302). This is the default.

seeother
Returns a “See Other” status (303) indicating that the resource has been replaced.

gone
Returns a “Gone” status (410) indicating that the resource has been permanently removed. When this status is used, the url argument should be omitted.

Other status codes can be returned by giving the numeric status code as the value of status. If the status is between 300 and 399, the url argument must be present, otherwise it must be omitted. Note that the status must be known to the Apache code (see the function send_error_response in http_protocol.c).

RedirectMatch

RedirectMatch regex url
Server config, virtual host, directory, .htaccess

Again, RedirectMatch works like Redirect, except that it takes a regular expression (discussed earlier under ScriptAliasMatch) as its first argument.

In the Butterthlies business, sad to relate, the salespeople have been abusing their powers and perquisites, and it has been decided to teach them a lesson by hiding their beloved secrets file and sending them to the ordinary customers’ site when they try to access it. How humiliating! Easily done, though.

The Config file is httpd3.conf :

...
<VirtualHost sales.butterthlies.com>
ServerAdmin sales_mgr@butterthlies.com
Redirect /secrets http://www.butterthlies.com
DocumentRoot /usr/www/APACHE3/site.alias/htdocs/salesmen
...

The exact placing of the Redirect doesn’t matter, as long as it is somewhere in the <VirtualHost> section. If you now access http://sales.butterthlies.com/secrets, you are shunted straight to the customers’ index at http://www.butterthlies.com /.

It is somewhat puzzling that if the Redirect line fails to work because you have misspelled the URL, there may be nothing in the error_log because the browser is vainly trying to find it out on the Web.

An important difference between Alias and Redirect is that the browser becomes aware of the new location in a Redirect, but not in an Alias, and this new location will be used as the basis for relative hot links found in the retrieved HTML.

RedirectTemp

RedirectTemp url-path url
Server config, virtual host, directory, .htaccess

This directive makes the client know that the Redirect is only temporary (status 302). This is exactly equivalent to Redirect temp.

RedirectPermanent

RedirectPermanent url-path url
Server config, virtual host, directory, .htaccess

This directive makes the client know that the Redirect is permanent (status 301). This is exactly equivalent to Redirect permanent.





Redirection

27 06 2009

Few things are ever in exactly the right place at the right time, and this is as true of most web servers as of anything else. Alias and Redirect allow requests to be shunted about your filesystem or around the Web. Although in a perfect world it should never be necessary to do this, in practice it is often useful to move HTML files around on the server — or even to a different server — without having to change all the links in the HTML document.[1] A more legitimate use — of Alias, at least — is to rationalize directories spread around the system. For example, they may be maintained by different users and may even be held on remotely mounted filesystems. But Alias can make them appear to be grouped in a more logical way.

A related directive, ScriptAlias, allows you to run CGI scripts, discussed in Chapter 16. You have a choice: everything that ScriptAlias does, and much more, can be done by the new Rewrite directive (described later in this chapter), but at a cost of some real programming effort. ScriptAlias is relatively simple to use, but it is also a good example of Apache’s modularity being a little less modular than we might like. Although ScriptAlias is defined in mod_alias.c in the Apache source code, it needs mod_cgi.c (or any module that does CGI) to function — it does, after all, run CGI scripts. mod_alias.c is compiled into Apache by default.

Some care is necessary in arranging the order of all these directives in the Config file. Generally, the narrower choices should come first, with the “catch-all” versions at the bottom. Be prepared to move them around (restarting Apache each time, of course) until you get the effect you want.

Our base httpd1.conf file on … /site.alias, to which we will add some directives, contains the following:

User webuser
Group webgroup

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/site.alias/htdocs/customers
ErrorLog /usr/www/APACHE3/site.alias/logs/error_log
TransferLog /usr/www/APACHE3/site.alias/logs/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
DocumentRoot /usr/www/APACHE3/site.alias/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.alias/logs/error_log
TransferLog /usr/www/APACHE3/site.alias/logs/access_log
</VirtualHost>

Start it with ./go 1. It should work as you would expect, showing you the customers’ and salespeople’s directories.