HTTPbis Working Group R. Peon
Internet-Draft Google, Inc
Intended status: Informational H. Ruellan
Expires: January 02, 2014 Canon CRF
July 2013

HPACK
draft-ietf-httpbis-header-compression-01

Abstract

This document describes HPACK, a format adapted to efficiently represent HTTP headers in the context of HTTP/2.0.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 02, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

This document describes HPACK, a format adapted to efficiently represent HTTP headers in the context of HTTP/2.0.

2. Overview

In HTTP/1.X, HTTP headers, which are necessary for the functioning of the protocol, are transmitted with no transformations. Unfortunately, the amount of redundancy in both the keys and the values of these headers is high, and it increases latency on lower-bandwidth links. A more compact encoding for headers would therefore reduce latency, and that is what is proposed here.

As shown by SPDY [SPDY], Deflate compresses HTTP very effectively. However, the use of a compression scheme which allows for arbitrary matches against the previously encoded data (such as Deflate) exposes users to security issues. In particular, the compression of sensitive data, together with other data controlled by an attacker, may lead to leakage of that sensitive data, even when the resultant bytes are transmitted over an encrypted channel.

Another consideration is that processing and memory costs of a compressor such as Deflate may also be too high for some classes of devices, for example when doing forward or reverse proxying.

2.1. Outline

The HTTP header encoding described in this document is based on a header table that maps (name, value) pairs to index values. This scheme is believed to be safe against all attacks on the compression context known today. Header tables are incrementally updated during the HTTP/2.0 session.

The encoder is responsible for deciding which headers to insert as new entries in the header table. The decoder then does exactly what the encoder prescribes, ending in a state that exactly matches the encoder's state. This enables decoders to remain simple and understand a wide variety of encoders.

As two consecutive sets of headers often have headers in common, each set of headers is coded as a difference from the previous set of headers. The goal is to only encode the changes (headers present in one of the sets and not in the other) between the two sets of headers.

An example illustrating the use of these different mechanisms to represent headers is available in Appendix C.

3. Header Encoding

3.1. Encoding Concepts

The encoding and decoding of headers rely on a few components and concepts. Together, these components form an encoding context.

Header Table:
The header table (see Section 3.1.2) is a component used to associate headers to index values.
Reference Set:
The reference set (see Section 3.1.3) is a component containing a group of headers used as a reference for the differential encoding of a new set of headers.
Header Set:
A header set (see Section 3.1.4) is a group of headers that are encoded jointly. A header set usually consists of all the headers contained in an HTTP request or an HTTP response.
Header Representation:
A header can be represented in encoded form either as a literal or as an index (see Section 3.1.5). The indexed representation is based on the header table.
Header Emission:
When decoding a set of headers, some operations emit a header (see Section 3.1.6). An emitted header is added to the set of headers. Once emitted, a header can't be removed from the set of headers.

3.1.1. Encoding Context

The set of components used to encode or decode a header set forms an encoding context: an encoding context contains a header table and a reference set.

Using HTTP, messages are exchanged between a client and a server in both directions. To keep the encoding of headers in each direction independent from the other direction, there is one encoding context for each direction.

The headers contained in a PUSH_PROMISE frame sent by a server to a client are encoded within the same context as the headers contained in the HEADERS frame corresponding to a response sent from the server to the client.

3.1.2. Header Table

A header table consists of an ordered list of (name, value) pairs. The first entry of a header table is assigned the index 0.

A header can be represented by an entry of the header table if they match. A header and an entry match if both their name and their value match. A header name and an entry name match if they are equal using a character-based, case insensitive comparison (the case insensitive comparison is used because HTTP header names are defined in a case insensitive way). A header value and an entry value match if they are equal using a character-based, case sensitive comparison.
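
The matching rule above can be sketched as follows. This is only an illustration: the function name header_matches and the representation of headers and entries as (name, value) tuples are assumptions made for this sketch, not part of the format.

```python
def header_matches(header, entry):
    """Return True if a (name, value) header matches a header table entry.

    Names are compared case-insensitively and values case-sensitively,
    following the matching rule of Section 3.1.2.
    """
    header_name, header_value = header
    entry_name, entry_value = entry
    return (header_name.lower() == entry_name.lower()
            and header_value == entry_value)
```

Under these assumptions, ("Accept", "text/html") matches the entry ("accept", "text/html"), while ("accept", "Text/HTML") does not.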

Generally, the header table will not contain duplicate entries. However, implementations MUST be prepared to accept duplicates without signalling an error. If duplicates are added to the table, they MUST be treated as distinct entries with their own index positions.

Initially, a header table contains a list of common headers. Two initial lists of headers are provided in Appendix B. One list is for headers transmitted from a client to a server, the other for the reverse direction.

A header table is modified by either adding a new entry at the end of the table, or by replacing an existing entry.

The encoder decides how to update the header table and as such can control how much memory is used by the header table. To limit the memory requirements on the decoder side, the header table size is bounded (see the SETTINGS_MAX_BUFFER_SIZE in Section 5).

The size of an entry is the sum of its name's length in bytes (as defined in Section 4.1.2), of its value's length in bytes and of 32 bytes. The 32 bytes are an accounting for the entry structure overhead. For example, an entry structure using two 64-bit pointers to reference the name and the value of the entry, and two 64-bit integers for counting the number of references to the name and the value, would use these 32 bytes.

The size of a header table is the sum of the size of its entries.
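
As an illustration of this accounting, the following sketch computes entry and table sizes. The names ENTRY_OVERHEAD, entry_size and table_size are hypothetical, and strings are assumed to be measured by the length of their UTF-8 representation, as in Section 4.1.2.

```python
ENTRY_OVERHEAD = 32  # per-entry accounting overhead, in bytes


def entry_size(name, value):
    """Size of one entry: name bytes + value bytes + 32 bytes of overhead."""
    return len(name.encode("utf-8")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


def table_size(entries):
    """Size of a header table given as a list of (name, value) pairs."""
    return sum(entry_size(name, value) for name, value in entries)
```

For instance, the entry (":path", "/") accounts for 5 + 1 + 32 = 38 bytes.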

3.1.3. Reference Set

A reference set is defined as an unordered set of references to entries of the header table.

The initial reference set is the empty set.

The reference set is updated during the processing of a set of headers.

Using the differential encoding, a header that is not present in the reference set can be encoded either with an indexed representation (if the header is present in the header table), or with a literal representation (if the header is not present in the header table).

A header that is to be removed from the reference set is encoded with an indexed representation.

3.1.4. Header Set

A header set is a group of header fields that are encoded as a whole. Each header field is a (name, value) pair.

A header set is encoded using an ordered list of zero or more header representations. All the header representations describing a header set are grouped into a header block.

3.1.5. Header Representation

A header can be represented either as a literal or as an index.

Literal Representation:
A literal representation defines a new header. The header name is represented either literally or as a reference to an entry of the header table. The header value is represented literally.
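Three different literal representations are provided: a literal representation without indexing, which does not change the header table (see Section 4.3.1); a literal representation with incremental indexing, which adds the header at the end of the header table (see Section 4.3.2); and a literal representation with substitution indexing, which replaces an existing entry of the header table (see Section 4.3.3).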

Indexed Representation:
The indexed representation defines a header as a reference in the header table (see Section 4.2).

3.1.6. Header Emission

The emission of a header is the process of adding a header to the current set of headers. Once a header is emitted, it can't be removed from the current set of headers.

The concept of header emission allows a decoder to know when it can pass a header safely to a higher level on the receiver side. This allows a decoder to be implemented in a streaming way, and as such to only keep in memory the header table and the reference set. With such an implementation, the amount of memory used by the decoder is bounded, even in the presence of a very large set of headers. The management of memory for handling very large sets of headers can therefore be deferred to the application.

3.2. Header Set Processing

The processing of an encoded header set to obtain a list of headers is defined in this section. To ensure a correct decoding of a header set, a decoder MUST obey the following rules.

3.2.1. Header Representation Processing

All the header representations contained in a header block are processed in the order in which they occur, as specified below.

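An indexed representation corresponding to an entry not present in the reference set entails the following actions: the header corresponding to the entry is emitted, and the entry is added to the reference set.

An indexed representation corresponding to an entry present in the reference set entails the following action: the entry is removed from the reference set.

A literal representation that is not added to the header table entails the following action: the header is emitted.

A literal representation that is added to the header table entails the following actions: the header is emitted, the header is added to the header table at the location defined by the representation, and the new entry is added to the reference set.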

3.2.2. Reference Set Emission

Once all the representations contained in a header block have been processed, the headers that are in common with the previous header set are emitted, during the reference set emission.

For the reference set emission, each header contained in the reference set that has not been emitted during the processing of the header block is emitted.
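
The processing of a header block, including the final reference set emission, can be sketched as follows. This is a minimal sketch under stated assumptions, not a complete decoder: representations are assumed to be already parsed into tuples, an indexed representation is assumed to toggle the membership of its entry in the reference set (emitting the header when the entry is added), substitution indexing and table size limits are left out, and all names are hypothetical.

```python
def process_header_block(representations, table, reference_set):
    """Process one header block and return the list of emitted headers.

    representations: already-parsed tuples (an assumption of this sketch):
        ("indexed", index)
        ("literal", name, value)          # not added to the header table
        ("literal-indexed", name, value)  # appended to the header table
    table: list of (name, value) entries, modified in place
    reference_set: set of header table indexes, modified in place
    """
    emitted = []
    emitted_entries = set()  # entries emitted while processing this block

    for rep in representations:
        kind = rep[0]
        if kind == "indexed":
            index = rep[1]
            if index in reference_set:
                # Entry already referenced: remove it, nothing is emitted.
                reference_set.remove(index)
            else:
                emitted.append(table[index])
                reference_set.add(index)
                emitted_entries.add(index)
        elif kind == "literal":
            emitted.append((rep[1], rep[2]))
        elif kind == "literal-indexed":
            table.append((rep[1], rep[2]))
            index = len(table) - 1
            emitted.append(table[index])
            reference_set.add(index)
            emitted_entries.add(index)
        else:
            raise ValueError("unknown representation: %r" % (kind,))

    # Reference set emission: emit the headers kept from the previous
    # header set that were not already emitted during this block.
    for index in sorted(reference_set):
        if index not in emitted_entries:
            emitted.append(table[index])
    return emitted
```

Because emitted headers are only ever appended, a streaming decoder could hand each header to the application as soon as it is emitted instead of collecting them in a list.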

3.2.3. Header Table Management

The header table can be modified by either adding a new entry to it or by replacing an existing one. Before doing such a modification, it has to be ensured that the header table size will stay lower than or equal to the SETTINGS_MAX_BUFFER_SIZE limit (see Section 5). To achieve this, the first entry of the header table is removed repeatedly until enough space is available for the modification.

A consequence of removing one or more entries at the beginning of the header table is that the remaining entries are renumbered. The first entry of the header table is always associated to the index 0.

When the modification of the header table is the replacement of an existing entry, the replaced entry is the one indicated in the literal representation before any entry is removed from the header table. If the entry to be replaced is removed from the header table when performing the size adjustment, the replacement entry is inserted at the beginning of the header table.

The addition of a new entry with a size greater than the SETTINGS_MAX_BUFFER_SIZE limit causes all the entries from the header table to be dropped and the new entry not to be added to the header table. The replacement of an existing entry with a new entry with a size greater than the SETTINGS_MAX_BUFFER_SIZE has the same consequences.
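
The eviction rules of this section can be sketched as follows. This is one possible reading, with assumptions stated here and in the comments: the space freed by a replaced entry is counted as available for its replacement, the header table is a plain list, the renumbering of reference set entries is not handled, and all function names are hypothetical.

```python
ENTRY_OVERHEAD = 32  # per-entry accounting overhead, in bytes


def entry_size(entry):
    name, value = entry
    return len(name.encode("utf-8")) + len(value.encode("utf-8")) + ENTRY_OVERHEAD


def table_size(table):
    return sum(entry_size(entry) for entry in table)


def add_entry(table, entry, max_size):
    """Append entry, evicting from the front; return its index or None."""
    if entry_size(entry) > max_size:
        table.clear()  # an oversized entry empties the table and is not added
        return None
    while table_size(table) + entry_size(entry) > max_size:
        table.pop(0)  # remaining entries are implicitly renumbered from 0
    table.append(entry)
    return len(table) - 1


def replace_entry(table, index, entry, max_size):
    """Replace table[index], evicting from the front; return index or None."""
    if entry_size(entry) > max_size:
        table.clear()
        return None
    # Assumption: the size after replacement is the current size, minus the
    # replaced entry, plus the new entry.
    while index >= 0 and (table_size(table) - entry_size(table[index])
                          + entry_size(entry)) > max_size:
        table.pop(0)
        index -= 1  # the entry to replace moves down, or is itself evicted
    if index < 0:
        # The replaced entry was evicted: make room, then insert at the front.
        while table_size(table) + entry_size(entry) > max_size:
            table.pop(0)
        table.insert(0, entry)
        return 0
    table[index] = entry
    return index
```

A decoder following this sketch would also have to update its reference set whenever entries are evicted and the remaining ones are renumbered.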

3.2.4. Specific Use Cases

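Three occurrences of the same indexed representation, corresponding to an entry not present in the reference set, emit the associated header twice: the first occurrence emits the header and adds the entry to the reference set, the second occurrence removes the entry from the reference set, and the third occurrence emits the header again and adds the entry back to the reference set.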

The first occurrence of the indexed representation can be replaced by a literal representation creating an entry for the header.

4. Detailed Format

4.1. Low-level representations

4.1.1. Integer representation

Integers are used to represent name indexes, pair indexes or string lengths. To allow for optimized processing, an integer representation always finishes at the end of a byte.

An integer is represented in two parts: a prefix that fills the current byte and an optional list of bytes that are used if the integer value does not fit in the prefix. The number of bits of the prefix (called N) is a parameter of the integer representation.

The N-bit prefix allows filling the current byte. If the value is small enough (strictly less than 2^N-1), it is encoded within the N-bit prefix. Otherwise all the bits of the prefix are set to 1 and the value is encoded using an unsigned variable length integer representation.

The algorithm to represent an integer I is as follows:

  1. If I < 2^N - 1, encode I on N bits
  2. Else, encode 2^N - 1 on N bits and do the following steps:
    1. Set I to (I - (2^N - 1)) and Q to 1
    2. While Q > 0
      1. Compute Q and R, quotient and remainder of I divided by 2^7
      2. If Q is strictly greater than 0, write one 1 bit; otherwise, write one 0 bit
      3. Encode R on the next 7 bits
      4. I = Q
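
The algorithm above can be transcribed almost directly into code. The following sketch is an illustration only: the function name encode_integer and the pattern parameter (the bits of the first byte that lie above the prefix, for example 0x40 for a '010' pattern) are assumptions of this sketch. A zero-bit prefix is handled by emitting no prefix byte at all.

```python
def encode_integer(value, prefix_bits, pattern=0):
    """Encode a non-negative integer with an N-bit prefix (0 <= N <= 8).

    pattern holds the bits of the first byte above the prefix.
    """
    max_prefix = (1 << prefix_bits) - 1  # 2^N - 1
    if value < max_prefix:
        return bytes([pattern | value])
    # With a zero-bit prefix there is no prefix byte to write.
    out = [] if prefix_bits == 0 else [pattern | max_prefix]
    value -= max_prefix
    while True:
        quotient, remainder = divmod(value, 128)
        # The top bit is 1 while more bytes follow, 0 on the last byte.
        out.append((0x80 if quotient > 0 else 0x00) | remainder)
        if quotient == 0:
            break
        value = quotient
    return bytes(out)
```

With a 5-bit prefix, this sketch encodes 10 as the single byte 0x0a and 1337 as the three bytes 0x1f 0x9a 0x0a. Note that the value 31 itself does not fit in a 5-bit prefix and is encoded as 0x1f 0x00.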

4.1.1.1. Example 1: Encoding 10 using a 5-bit prefix

The value 10 is to be encoded with a 5-bit prefix.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 |   10 stored on 5 bits
+---+---+---+---+---+---+---+---+

4.1.1.2. Example 2: Encoding 1337 using a 5-bit prefix

The value I=1337 is to be encoded with a 5-bit prefix.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |   Prefix = 31
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |   Q>=1, R=26
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |   Q=0 , R=10
+---+---+---+---+---+---+---+---+
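
The reverse operation can be sketched in the same way. The function name decode_integer and its (value, next offset) return convention are assumptions of this sketch. It performs no bounds checking and would raise an IndexError on truncated input.

```python
def decode_integer(data, prefix_bits, offset=0):
    """Decode an integer with an N-bit prefix starting at data[offset].

    Returns a (value, next_offset) pair. Bits above the prefix in the
    first byte are ignored.
    """
    max_prefix = (1 << prefix_bits) - 1  # 2^N - 1
    value = 0
    if prefix_bits > 0:
        value = data[offset] & max_prefix
        offset += 1
        if value < max_prefix:
            return value, offset
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        # Each byte carries 7 bits, least significant group first.
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # a 0 top bit marks the last byte
            return value, offset
```

Applied to the bytes of Example 2 (0x1f 0x9a 0x0a) with a 5-bit prefix, this yields 31 + 26 + (10 << 7) = 1337.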

4.1.2. String Literal Representation

Literal strings can represent header names or header values. They are encoded in two parts:

  1. The string length, defined as the number of bytes needed to store its UTF-8 representation, is represented as an integer with a zero bits prefix. If the string length is strictly less than 128, it is represented as one byte.
  2. The string value represented as a list of UTF-8 characters.
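
A string literal encoder following these two steps might look as follows. The names encode_length and encode_string are hypothetical. The length is written as an integer with a zero-bit prefix, that is, as a plain sequence of 7-bit groups, as described in step 1.

```python
def encode_length(length):
    """Encode a length as an integer with a zero-bit prefix."""
    out = bytearray()
    while True:
        quotient, remainder = divmod(length, 128)
        out.append((0x80 if quotient > 0 else 0x00) | remainder)
        if quotient == 0:
            return bytes(out)
        length = quotient


def encode_string(text):
    """Encode a string literal: UTF-8 byte length, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_length(len(raw)) + raw
```

The length counts bytes, not characters: a string holding the single character U+00E9 is encoded with a length of 2.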

4.2. Indexed Header Representation

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 1 |        Index (7+)         |
+---+---------------------------+

Indexed Header

This representation starts with the '1' 1-bit pattern, followed by the index of the matching pair, represented as an integer with a 7-bit prefix.
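
As a sketch, an indexed header can be produced by setting the top bit and encoding the index on the remaining 7 bits, continuing on extra bytes when the index is 127 or more. The function name encode_indexed_header is an assumption of this sketch.

```python
def encode_indexed_header(index):
    """Encode an indexed header: a '1' bit, then the index on a 7-bit prefix."""
    if index < 0:
        raise ValueError("index must be non-negative")
    if index < 127:  # fits in the 7-bit prefix
        return bytes([0x80 | index])
    out = [0xFF]  # '1' pattern bit plus a prefix of all ones
    remaining = index - 127
    while True:
        quotient, remainder = divmod(remaining, 128)
        out.append((0x80 if quotient > 0 else 0x00) | remainder)
        if quotient == 0:
            return bytes(out)
        remaining = quotient
```

With this sketch, index 38 is encoded as 0xa6 and index 40 as 0xa8, the byte values used for indexed headers in Appendix C.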

4.3. Literal Header Representation

4.3.1. Literal Header without Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header without Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header without Indexing - New Name

This representation, which does not involve updating the header table, starts with the '011' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix. Note that if the index is strictly below 30, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name, represented as a literal string.

Header name representation is followed by the header value represented as a literal string as described in Section 4.1.2.

4.3.2. Literal Header with Incremental Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header with Incremental Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header with Incremental Indexing - New Name

This representation starts with the '010' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix. Note that if the index is strictly below 30, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name, represented as a literal string.

Header name representation is followed by the header value represented as a literal string as described in Section 4.1.2.
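
The following sketch assembles a literal header with incremental indexing from the pieces described above. All function names are hypothetical, and string lengths are written with a zero-bit prefix as described in Section 4.1.2. It illustrates the layout and is not a complete encoder.

```python
def encode_integer(value, prefix_bits, pattern=0):
    """Integer representation of Section 4.1.1 (pattern = bits above prefix)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([pattern | value])
    out = [] if prefix_bits == 0 else [pattern | max_prefix]
    value -= max_prefix
    while True:
        quotient, remainder = divmod(value, 128)
        out.append((0x80 if quotient > 0 else 0x00) | remainder)
        if quotient == 0:
            return bytes(out)
        value = quotient


def encode_string(text):
    """String literal of Section 4.1.2: byte length, then UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_integer(len(raw), 0) + raw


def encode_literal_incremental(value, name_index=None, name=None):
    """Literal header with incremental indexing ('010' pattern, 0x40)."""
    if name_index is not None:
        # Indexed name: index + 1 on a 5-bit prefix.
        head = encode_integer(name_index + 1, 5, 0x40)
    else:
        # New name: 0 on 5 bits, then the name as a string literal.
        head = encode_integer(0, 5, 0x40) + encode_string(name)
    return head + encode_string(value)
```

For example, a value of "first" with the new name "x-my-header" yields 0x40, 0x0b, the 11 bytes of the name, 0x05, and the 5 bytes of the value, which is the byte sequence shown for this header in Appendix C.1.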

4.3.3. Literal Header with Substitution Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |      Index (6+)       |
+---+---+-----------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header with Substitution Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |           0           |
+---+---+-----------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Literal Header with Substitution Indexing - New Name

This representation starts with the '00' 2-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 6-bit prefix. Note that if the index is strictly below 62, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 6 bits followed by the header name, represented as a literal string.

The index of the substituted (name, value) pair is inserted after the header name representation as a 0-bit prefix integer.

The index of the substituted pair MUST correspond to a position in the header table containing a non-void entry. An index for the substituted pair that corresponds to an empty position in the header table MUST be treated as an error.

This index is followed by the header value represented as a literal string as described in Section 4.1.2.
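
Since each representation of Section 4 starts with a distinct bit pattern ('1', '011', '010' or '00'), a decoder can select the representation from the first byte alone. The sketch below illustrates this dispatch. The function name and the returned labels are assumptions of this sketch.

```python
def representation_type(first_byte):
    """Classify a header representation from its first byte."""
    if first_byte & 0x80:
        return "indexed"                    # '1' pattern (Section 4.2)
    if (first_byte & 0xC0) == 0x00:
        return "literal-substitution"       # '00' pattern (Section 4.3.3)
    if (first_byte & 0xE0) == 0x60:
        return "literal-without-indexing"   # '011' pattern (Section 4.3.1)
    return "literal-incremental"            # '010' pattern (Section 4.3.2)
```

After this dispatch, the remaining bits of the first byte form the 7-bit, 6-bit or 5-bit prefix of the index that follows the pattern.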

5. Parameter Negotiation

A few parameters can be used to accommodate client and server processing and memory requirements.

SETTINGS_MAX_BUFFER_SIZE:
Allows the sender to inform the remote endpoint of the maximum size it accepts for the header table.
The default value is 4096 bytes.

When the remote endpoint receives a SETTINGS frame containing a SETTINGS_MAX_BUFFER_SIZE setting with a value smaller than the one currently in use, it MUST send as soon as possible a HEADERS frame with a stream identifier of 0x0 containing a value smaller than or equal to the received setting value.

A HEADERS frame with a stream identifier of 0x0 indicates that the sender has reduced the maximum size of the header table. The new maximum size of the header table is encoded on 32 bits. The decoder MUST reduce its own header table by dropping entries from it until the size of the header table is lower than or equal to the transmitted maximum size.

6. Security Considerations

TODO

7. IANA Considerations

This memo includes no request to IANA.

8. Informative References

[SPDY] Belshe, M. and R. Peon, "SPDY Protocol", February 2012.

Appendix A. Change Log (to be removed by RFC Editor before publication)

A.1. Since draft-ietf-httpbis-header-compression-01

Appendix B. Initial Header Tables


B.1. Requests

The following table lists the pre-defined headers that make up the initial header table used to represent requests sent from a client to a server.

Initial Header Table for Requests
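+-------+---------------------+--------------+
| Index | Header Name         | Header Value |
+-------+---------------------+--------------+
| 0     | :scheme             | http         |
| 1     | :scheme             | https        |
| 2     | :host               |              |
| 3     | :path               | /            |
| 4     | :method             | GET          |
| 5     | accept              |              |
| 6     | accept-charset      |              |
| 7     | accept-encoding     |              |
| 8     | accept-language     |              |
| 9     | cookie              |              |
| 10    | if-modified-since   |              |
| 11    | user-agent          |              |
| 12    | referer             |              |
| 13    | authorization       |              |
| 14    | allow               |              |
| 15    | cache-control       |              |
| 16    | connection          |              |
| 17    | content-length      |              |
| 18    | content-type        |              |
| 19    | date                |              |
| 20    | expect              |              |
| 21    | from                |              |
| 22    | if-match            |              |
| 23    | if-none-match       |              |
| 24    | if-range            |              |
| 25    | if-unmodified-since |              |
| 26    | max-forwards        |              |
| 27    | proxy-authorization |              |
| 28    | range               |              |
| 29    | via                 |              |
+-------+---------------------+--------------+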

B.2. Responses

The following table lists the pre-defined headers that make up the initial header table used to represent responses sent from a server to a client. The same header table is also used to represent request headers sent from a server to a client in a PUSH_PROMISE frame.

Initial Header Table for Responses
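+-------+-----------------------------+--------------+
| Index | Header Name                 | Header Value |
+-------+-----------------------------+--------------+
| 0     | :status                     | 200          |
| 1     | age                         |              |
| 2     | cache-control               |              |
| 3     | content-length              |              |
| 4     | content-type                |              |
| 5     | date                        |              |
| 6     | etag                        |              |
| 7     | expires                     |              |
| 8     | last-modified               |              |
| 9     | server                      |              |
| 10    | set-cookie                  |              |
| 11    | vary                        |              |
| 12    | via                         |              |
| 13    | access-control-allow-origin |              |
| 14    | accept-ranges               |              |
| 15    | allow                       |              |
| 16    | connection                  |              |
| 17    | content-disposition         |              |
| 18    | content-encoding            |              |
| 19    | content-language            |              |
| 20    | content-location            |              |
| 21    | content-range               |              |
| 22    | link                        |              |
| 23    | location                    |              |
| 24    | proxy-authenticate          |              |
| 25    | refresh                     |              |
| 26    | retry-after                 |              |
| 27    | strict-transport-security   |              |
| 28    | transfer-encoding           |              |
| 29    | www-authenticate            |              |
+-------+-----------------------------+--------------+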

Appendix C. Example

Here is an example that illustrates different representations and how tables are updated.

C.1. First header set

:path: /my-example/index.html
user-agent: my-user-agent
x-my-header: first

0x44      (literal header with incremental indexing, name index = 3)
0x16      (header value string length = 22)
/my-example/index.html
0x4D      (literal header with incremental indexing, name index = 12)
0x0D      (header value string length = 13)
my-user-agent
0x40      (literal header with incremental indexing, new name)
0x0B      (header name string length = 11)
x-my-header
0x05      (header value string length = 5)
first

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/index.html    | added header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             | added header
+---------+----------------+---------------------------+
|   40    | x-my-header    | first                     | added header
+---------+----------------+---------------------------+

Reference Set:
:path, /my-example/index.html
user-agent, my-user-agent
x-my-header, first

As all the headers of the first header set are added to the header table, all of them are kept in the reference set.

C.2. Second header set

:path: /my-example/resources/script.js
user-agent: my-user-agent
x-my-header: second

0xa6       (indexed header, index = 38: removal from reference set)
0xa8       (indexed header, index = 40: removal from reference set)
0x04       (literal header, substitution indexing, name index = 3)
0x26       (replaced entry index = 38)
0x1f       (header value string length = 31)
/my-example/resources/script.js
0x5f 0x0a  (literal header, incremental indexing, name index = 40)
0x06       (header value string length = 6)
second

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/resources/    | replaced
|         |                |     script.js             | header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             |
+---------+----------------+---------------------------+
|   40    | x-my-header    | first                     |
+---------+----------------+---------------------------+
|   41    | x-my-header    | second                    | added header
+---------+----------------+---------------------------+

Reference Set:
:path, /my-example/resources/script.js
user-agent, my-user-agent
x-my-header, second

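For the second header set, the :path entry is replaced in the header table, a new x-my-header entry is added, and the unchanged user-agent header is emitted from the reference set without being encoded again.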

Authors' Addresses

Roberto Peon Google, Inc EMail: fenix@google.com
Hervé Ruellan Canon CRF EMail: herve.ruellan@crf.canon.fr