|
Softpanorama |
May the source be with you, but remember the KISS principle ;-)
|
| News | Recommended Books | Recommended Links | Installation | Usage | Troubleshooting | Download |
| Modules | Authentication | Security Fixes | Commercial support | Humor | Etc |
Apache operation proceeds in two phases: start-up and operational. System start-up takes place as root, and includes parsing the configuration file(s), loading modules, and initializing system resources such as log files, shared memory segments, and database connections. For normal operation, Apache relinquishes its system privileges and runs as an unprivileged user before accepting and processing connections from clients over the network. This basic security measure helps to prevent a simple bug in Apache (or a module or script) from becoming a devastating system vulnerability, like those exploited by malware such as "Code Red" and "Nimda" in MS IIS.
This two-stage operation has some implications for applications architecture. First, anything that requires system privileges must be run at system start-up. Second, it is good practice to run as much initialization as possible at start-up, so as to minimize the processing required to service each request. Conversely, because so many slow and expensive operations are concentrated in system start-up, it would be hugely inefficient to try to run Apache from a generic server such as inetd or tcpserver.
One non-intuitive quirk of the architecture is that the configuration code is, in fact, executed twice at start-up (although not at restart).
The first time through checks that the configuration is valid (at least to the point that Apache can successfully start)
the second pass is "live" and leads into the operational phase.
The purpose of Apache's start-up phase is to read the configuration, load modules and libraries, and initialize required resources. Each module may have its own resources, and has the opportunity to initialize those resources. At start-up, Apache runs as a single-process, single-thread program and has full system privileges.
Apache's main configuration file is normally called httpd.conf. However, this nomenclature is just a convention, and third-party Apache distributions such as those provided as .rpm or .deb packages may use a different naming scheme. In addition, httpd.conf may be a single file, or it may be distributed over several files using the Include directive to include different configuration files. Some distributions have highly intricate configurations. For example, Debian GNU/Linux ships an Apache configuration that relies heavily on familiarity with Debian, rather than with Apache.
The httpd.conf configuration file is a plain text file and is parsed line-by-line at server start-up. The contents of httpd.conf comprise directives, containers, and comments. Blank lines and leading whitespace are also allowed, but will be ignored.
Most of the contents of httpd.conf are directives. A directive may have zero or more arguments, separated by whitespace. Each directive determines its own syntax, so different directives may permit different numbers of arguments, and different argument types (e.g., string, numeric, enumerated, Boolean on/off, or filename). Each directive is implemented by some module or the core, as described in Chapter 9.
For example:
LoadModule foo_module modules/mod_foo.so
This directive is implemented by mod_so and tells it to load a module. The first argument is the module name (string, alphanumeric). The second argument is a filename, which may be absolute or relative to the server root.
DocumentRoot /usr/local/apache/htdocs
This directive is implemented by the core, and sets the directory that is the root of the main document tree visible from the Web.
SetEnv hello "Hello, World!"
This directive is implemented by mod_env and sets an environment variable. Note that because the second argument contains a space, we must surround it with quotation marks.
Choices On
This directive is implemented by mod_choices (Chapter 6) and activates that module's options.
A container is a special form of directive, characterized by a syntax that superficially resembles markup, using angle brackets. Containers differ semantically from other directives in that they comprise a start and an end on separate lines, and they affect directives falling between the start and the end of the container. For example, the <VirtualHost> container is implemented by the core and defines a virtual host:
<VirtualHost 10.31.2.139> ServerName www.example.com DocumentRoot /usr/www/example ServerAdmin webmaster@example.com CustomLog /var/log/www/example.log </VirtualHost>
The container provides a context for the directives within it. In this case, the directives apply to requests to www.example.com, but not to requests to any other names this server responds to. Containers can be nested unless a module explicitly prevents it. Directives, including containers, may be context sensitive, so they are valid only in some specified type of context.
Any line whose first character is a hash is read as a comment.
# This line is a comment
A hash within a directive doesn't in general make a comment, unless the module implementing the directive explicitly supports it.
If a module is not loaded, directives that it implements are not recognized, and Apache will stop with a syntax error when it encounters them. Therefore mod_so must be statically linked to load other modules. This is pretty much essential whenever you're developing new modules, as without LoadModule you'd have to rebuild the entire server every time you change your module!
At the end of the start-up phase, control passes to the Multi-Processing Module (see Section 2.3). The MPM is responsible for managing Apache's operation at a systems level. It typically does so by maintaining a pool of worker processes and/or threads, as appropriate to the operating system and other applicable constraints (such as optimization for a particular usage scenario). The original process remains as "master," maintaining a pool of worker children. These workers are responsible for servicing incoming connections, while the parent process deals with creating new children, removing surplus ones as necessary, and communicating signals such as "shut down" or "restart."
Because of the MPM architecture, it is not possible to describe the operational phase in definite terms. Whereas the standard MPMs use worker children in some manner, they are not constrained to work in only one way. Thus another MPM could, in principle, implement an entirely different server architecture at the system level.
There is no shutdown phase as such. Instead, anything that needs be done on shutdown is registered as a cleanup, as described in Chapter 3. When Apache stops, all registered cleanups are run.
At the end of the start-up phase, after the configuration has been read, overall control of Apache passes to a Multi-Processing Module. The MPM provides the interface between the running Apache server and the underlying operating system. Its primary role is to optimize Apache for each platform, while ensuring the server runs efficiently and securely.
As indicated by the name, the MPM is itself a module. But the MPM is uniquely a systems-level module (so developing an MPM falls outside the scope of a book on applications development). Also uniquely, every Apache instance must contain exactly one MPM, which is selected at build-time.
The old NCSA server, and Apache 1, grew up in a UNIX environment. It was a multiprocess server, where each client would be serviced by one server instance. If there were more concurrent clients than server processes, Apache would fork additional server processes to deal with them. Under normal operation, Apache would maintain a pool of available server processes to deal with incoming requests.
Whereas this scheme works well on UNIX-family[1] systems, it is an inefficient solution on platforms such as Windows, where forking a process is an expensive operation. So making Apache truly cross-platform required another solution. The approach adopted for Apache 2 is to turn the core processing into a pluggable module, the MPM, which can be optimized for different environments. The MPM architecture also allows different Apache models to coexist even within a single operating system, thus providing users with options for different usages.
[1] Here and elsewhere in this book, terms such as "UNIX-family" imply both UNIX itself and other POSIX-centered operating systems such as Linux and MacOSX.
In practice, only UNIX-family operating systems offer a useful[2] choice: Other supported platforms (Windows, Netware, OS/2, BeOS) have a single MPM optimized for each platform. UNIX has two production-quality MPMs (Prefork and Worker) available as standard, a third (Event) that is thought to be stable for non-SSL uses in Apache 2.2, and several experimental options unsuitable for production use. Third-party MPMs are also available.
[2] MPMs are not necessarily tied to an operating system (most systems have some kind of POSIX support and might be able to use it to run Prefork, for instance). But if you try to build Apache with a "foreign" MPM, you're on your own!
The Prefork MPM is a nonthreaded model essentially similar to Apache 1.x. It is a safe option in all cases, and for servers running non-thread-safe software such as PHP, it is the only safe option. For some applications, including many of those popular with Apache 1.3 (e.g., simple static pages, CGI scripts), this MPM may be as good as anything.[3]
[3] This depends on the platform. On Linux versions without NPTL, Prefork is commonly reported to be as fast as Worker. On Solaris, Worker is reported to be much faster than Prefork. Your mileage may vary.
The Worker MPM is a threaded model, whose advantages include lower memory usage (important on busy servers) and much greater scalability than that provided by Prefork in certain types of applications. We will discuss some of these cases later when we introduce SQL database support and mod_dbd.
Both of the stable MPMs suffer from a limitation that affects very busy servers. Whereas HTTP Keepalive is necessary to reduce TCP connection and network overhead, it ties up a server process or thread while the keepalive is active. As a consequence, a very busy server may run out of available threads. The Event MPM is a new model that deals with this problem by decoupling the server thread from the connection. Cases where the Event MPM may prove most useful are servers with extremely high hit rates but for which the server processing is fast, so that the number of available threads is a critical resource limitation. A busy server with the Worker MPM may sustain tens of thousands of hits per second (as happens, for example, with popular news outlets at peak times), but the Event MPM might help to handle high loads more easily. Note that the Event MPM will not work with secure HTTP (HTTPS).
There are also several experimental MPMs for UNIX that are not, at the time of this book's writing, under active development; they may or may not ever be completed. The Perchild MPM promised a much-requested feature: It runs servers for different virtual hosts under different user IDs. Several alternatives offer similar features, including the third-party Metux[4] and Peruser[5] MPMs, and (for Linux only) mod_ruid.[6] For running external programs, other options include fastcgi/mod_fcgid[7] and suexec (CGI). The author does not have personal knowledge of these third-party solutions and so cannot make recommendations about them.
The one-sentence summary: MPMs are invisible to applications and should be ignored!
Applications developed for Apache should normally be MPM-agnostic. Given that MPM internals are not part of the API, this is basically straightforward, provided programmers observe basic rules of good practice (namely, write thread-safe, cross-process-safe, reentrant code), as briefly discussed in Chapter 4. This issue is closely related to the broader question of developing platform-independent code. Indeed, it is sometimes useful to regard the MPM, rather than the operating system, as the applications platform.
Sometimes an application is naturally better suited to some MPMs than others. For example, database-driven or load-balancing applications benefit substantially from connection pooling (discussed later in this book) and therefore from a threaded MPM. In contrast, forking a child process (the original CGI implementation or mod_ext_filter) creates greater overhead in a threaded program and, therefore, works best with the Prefork MPM. Nevertheless, an application should work even when used with a suboptimal MPM, unless there are compelling reasons to limit it.
If you wish to run Apache on an operating system that is not yet supported, the main task is to add support for your target platform to the APR, which provides the operating system layer. A custom MPM may or may not be necessary, but is likely to deliver better performance than an existing one. From the point of view of Apache, this is a systems programming task, and hence it falls outside the scope of an applications development book.
|
|
|
|