Diagnosing suspected problems with mod_deflate

Introduction to some unfortunate issues with compression

First, several problems have been fixed in mod_deflate since it was introduced in IHS 2.0.42. These fixes are in IHS 2.0.42.2+PQ78925 and IHS 2.0.47. Applying these fixes to the system exhibiting the problem before researching the symptoms in depth is likely to save a considerable amount of debugging time.

Beyond the mod_deflate fixes, customer experience with HTTP compression depends on the type of data being compressed, because different browsers have problems with certain types of compressed data. Compressing plain HTML works fine on any modern browser, though people have experienced browser problems when HTML with embedded JavaScript is compressed. There is some indication that in Internet Explorer the decompression path changes the timing of JavaScript loading, so that some JavaScript which would otherwise work fails. This has happened not just with mod_deflate but also with mod_gzip, which has been available for Apache 1.3 for a long time.

The Adobe Acrobat plug-in has known problems dealing with PDF files that mod_deflate has compressed. This is not a mod_deflate problem: the same type of problem has occurred with mod_gzip, which is a completely independent implementation of HTTP compression.

A couple of our IHS 2 customers have had problems using mod_deflate to compress JavaScript. In the most recent case, the customer discovered that their JavaScript, when compressed, ran fine with Netscape but not with Internet Explorer. In both cases, the customers gathered traces and the data sent by the server was valid, but for some reason IE would not run the JavaScript properly if it had arrived compressed. In situations like these, IHS configuration directives have to be used to disable compression for certain URLs and/or certain browsers.

Here's an interesting article regarding an IE 6 bug (there is a similar one for IE 5.5):

http://support.microsoft.com/default.aspx?scid=kb;[LN];Q312496

Here's another article about an IE bug with JavaScript in the presence of Cache-Control: no-cache:

http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B327286

Here's another one, outlining one person's experience with compression changing the timing of JavaScript execution enough that the JavaScript no longer worked:

http://lists.over.net/pipermail/mod_gzip/2001-March/001708.html

Symptoms

Common symptoms are blank pages or error messages from the browser, JavaScript execution failures, and problems displaying PDF files in Adobe Acrobat.

Configuring mod_deflate to refuse to compress certain types of content

Disabling compression is straightforward whenever the requests to exclude can be identified by some text in the URI. The following examples disable compression for all URIs ending in .pdf or .jpg (upper or lower case):

 SetEnvIfNoCase Request_URI "\.pdf$" no-gzip
 SetEnvIfNoCase Request_URI "\.jpg$" no-gzip
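
The same approach can be sketched for the other cases mentioned in the introduction, namely JavaScript files and a particular browser. The patterns below are illustrative assumptions rather than a recommended configuration; BrowserMatch is the mod_setenvif directive that tests the User-Agent request header, and the match string should be adjusted to the browsers actually affected.

```
# Assumed patterns for illustration; adjust to the affected URLs and browsers.
# Do not compress any URI ending in .js (upper or lower case).
SetEnvIfNoCase Request_URI "\.js$" no-gzip
# Do not compress anything sent to browsers whose User-Agent contains "MSIE".
BrowserMatch "MSIE" no-gzip
```

Each directive sets the no-gzip environment variable independently, so a request is left uncompressed if either condition matches.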

Isolating the guilty party

Assuming that the known mod_deflate problems have been corrected with available fixes, the most likely cause is that the browser or plug-in cannot handle compressed data in the specific context. The browser or plug-in may not be able to uncompress certain media types at all, may not be able to uncompress certain media types when they are received in a certain order, or may have some other limitation. The problem could also depend on whether or not SSL is used.

Theoretical problems that mod_deflate itself could cause include:

  • the response body isn't compressed properly
  • the proper HTTP headers aren't specified, so that the browser doesn't realize that the response body should be uncompressed

    Here are the steps for determining whether or not mod_deflate generated the proper response to deliver to the browser:

    1. Identify a test case (a particular request) that reproduces the symptom, ideally one that shows the symptom consistently.
    2. Set up mod_net_trace to trace the data flows between IHS and a particular client (IP address) that will be used to reproduce the problem. Configure mod_net_trace according to these instructions. On the NetTrace directive, be sure to specify the IP address of the client that you'll use for reproducing the problem, as well as a large senddata value so that the entire response is traced.

      Example:

      NetTraceFile /tmp/nettrace
      NetTrace client 111.222.333.444 dest file event senddata=5000000 event recvdata=1024
      

      (The data sent to the server in the request body isn't normally a factor in compression problems, so we'll only trace the first 1024 bytes of the request body, if any.)

      Also, it is recommended that you set LogLevel to Debug and use the DeflateFilterNote directive to log the request, compression ratio, and user agent string from the browser (see the DeflateFilterNote documentation).

    3. With these configuration changes in place, restart IHS and reproduce the problem from the browser. Then close the browser to ensure that its connections are closed, and copy the trace file (/tmp/nettrace in the example above) to another location so that additional browser traffic isn't written to the trace file that we'll examine next.
    4. Disable mod_net_trace and revert LogLevel to the prior setting and then restart IHS.
    5. The file with the network trace (e.g., /tmp/nettrace) is the key file to send to IHS support if there is a PMR open. But additional verification can be done by the customer relatively quickly using the ServerDoc tool to parse the network trace.

      Here is an example where the input file is /tmp/nettrace and the results of parsing it are to be stored in a new directory called /tmp/nettrace.parsed:

      $ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace /tmp/nettrace.parsed
       checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/0/sent.body.0
       checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0
       checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.1
       checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.2
      

      (In this example trace, there were four compressed response bodies.)

      If the compressed data is invalid and could cause a problem for the browser, errors will be encountered and displayed by ServerDoc, as in the following example:

      $ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace.bad /tmp/nettrace.bad.parsed
       checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0
          /tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0 is not properly gzipped!
          java.util.zip.ZipException: invalid bit length repeat
       checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.0
       checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.1
       checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.2
      

      (For this example, we took a valid network trace generated by mod_net_trace but replaced some of the hex data in the trace file with a different sequence of bytes to simulate a corrupted response.)
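
    For the logging recommended in step 2, a minimal sketch of the DeflateFilterNote configuration follows. The note name (ratio), the log format nickname (deflate), and the log file path are assumptions chosen for illustration; see the DeflateFilterNote documentation for the options available at your IHS level.

```
# Assumed names for illustration: note "ratio", format "deflate", file logs/deflate_log.
LogLevel debug
# Store the compression ratio of each response in a note named "ratio".
DeflateFilterNote ratio
# Log the request line, bytes sent, compression ratio, and user agent string.
LogFormat '"%r" %b (%{ratio}n) "%{User-agent}i"' deflate
CustomLog logs/deflate_log deflate
```

    Remove these directives along with the mod_net_trace configuration when reverting the server in step 4.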

    Beyond the automatic gzip integrity checking performed by ServerDoc, the response headers and the uncompressed data may need to be examined as well. ServerDoc writes the response headers to files called sent.hdr.0, sent.hdr.1, and so on. The header field Content-Encoding must be present whenever the response body is compressed. ServerDoc does not check the gzip integrity of responses that lack the Content-Encoding header field, so a gzipped body that ServerDoc did not check may indicate invalid or missing header information.
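
    As a rough aid for the header check described above, the following sketch lists the parsed header files that do not contain a Content-Encoding field. It assumes that ServerDoc writes each sent.hdr.N file as plain text with one header field per line; the function name missing_content_encoding is hypothetical and not part of ServerDoc.

```shell
#!/bin/sh
# Print the path of every sent.hdr.* file under the given directory
# that has no Content-Encoding header field.
# Assumption: header files are plain text, one header field per line.
missing_content_encoding() {
    dir=$1
    find "$dir" -type f -name 'sent.hdr.*' | sort | while IFS= read -r hdr; do
        # Header field names are case-insensitive, so match with -i.
        if ! grep -qi '^content-encoding:' "$hdr"; then
            printf '%s\n' "$hdr"
        fi
    done
}

# Example use: missing_content_encoding /tmp/nettrace.parsed
```

    A file listed by this function is only a concern if the corresponding response body was actually compressed; responses that were never compressed legitimately lack the field.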

    It is also possible that the content was incomplete before it was gzipped. In that case the response is valid from a gzip encoding perspective, yet information is missing once the browser uncompresses it (e.g., a truncation occurred). To uncompress the response bodies and see what content was sent, use the gunzip utility.

    $ gunzip < /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0 > /tmp/uncompressed
    

    The uncompressed data in file /tmp/uncompressed will have to be examined by someone who knows what is expected in order to determine whether the data is truncated or otherwise malformed.
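
    When the expected size of the original content is known (for example, a static file on disk), comparing it with the uncompressed size of the traced body is a quick first check for truncation. The sketch below is one way to do that with standard gzip tools; the function name check_body is hypothetical, and the check is a supplement to examining the content, not a replacement for it.

```shell
#!/bin/sh
# Report whether a captured response body is a valid gzip stream.
# Prints "invalid", or "valid N" where N is the uncompressed size in bytes.
check_body() {
    body=$1
    # Read from stdin so that the missing .gz suffix does not matter.
    if gzip -t < "$body" 2>/dev/null; then
        size=$(gunzip -c < "$body" | wc -c | tr -d ' ')
        printf 'valid %s\n' "$size"
    else
        printf 'invalid\n'
        return 1
    fi
}

# Example use: check_body /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0
```

    If the reported size is smaller than the size of the content that should have been sent, the response was probably truncated before or during compression.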

    If a problem is discovered in the data written by mod_deflate, or a problem is suspected in the HTTP headers, send the following documentation to IBM:

  • your modified httpd.conf
  • the request that led to the problem (i.e., "what URL?")
  • the network trace generated when recreating the problem
  • the uncompressed data that should have been sent to the browser
  • any messages written to IHS log files (e.g., access_log, error_log, any other custom logs) during the test case