link + html cleanup

author: Kristian Aune <kraune@verizonmedia.com> 2023-09-20 15:48:55 +0200
committer: Kristian Aune <kraune@verizonmedia.com> 2023-09-20 15:48:55 +0200
commit: 09a2a7c21a1ca82a4ca56a449283217fedf8125b (patch)
tree: 6d442e73905068c18f86f64a658d33b59b26cda2 /document/doc
parent: 16050559773b552387bc702aa15c24ab30ece982 (diff)
1 files changed, 730 insertions, 517 deletions
diff --git a/document/doc/document-format.html b/document/doc/document-format.html
index ce985b8a10d..bf67f1723e8 100644
--- a/document/doc/document-format.html
+++ b/document/doc/document-format.html
@@ -1,546 +1,759 @@
 <!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
-<html>
-<head>
-<title>Developers guide to the serialized document format</title>
-</head>
-<body>
-<h1>Developers guide to the serialized document format</h1>
+<html lang="en">
+  <head>
+    <title>Developers guide to the serialized document format</title>
+  </head>
+  <body>
+    <h1>Developers guide to the serialized document format</h1>
 
-<p>When a Vespa document is stored or transferred from one application to
-another, it is serialized. The serialization format tries to achieve
-serialization robustness and speed. The most important fields are kept in a
-header that is accessible at low cost. The other fields are located by table
-look-ups.</p>
+    <p>
+      When a Vespa document is stored or transferred from one application to
+      another, it is serialized. The serialization format tries to achieve
+      serialization robustness and speed. The most important fields are kept in
+      a header that is accessible at low cost. The other fields are located by
+      table look-ups.
+    </p>
 
-<h2>Purpose</h2>
+    <h2>Purpose</h2>
 
-<p>The purpose of the serialized format is
-<ul>
-<li><b>Robustness</b>. The format shall detect errors gracefully.</li>
-<li><b>Speed</b>. Deserialization shall be fast, especially for basic fields like <b>DocumentId</b>.</li> 
-<li><b>Size</b>. The serialized format shall be compact and allow for efficient storage and transfer.
-</ul>
-</p>
+    <p>The purpose of the serialized format is</p>
+    <ul>
+      <li><b>Robustness</b>. The format shall detect errors gracefully.</li>
+      <li>
+        <b>Speed</b>. Deserialization shall be fast, especially for basic fields
+        like <b>DocumentId</b>.
+      </li>
+      <li>
+        <b>Size</b>. The serialized format shall be compact and allow for
+        efficient storage and transfer.
+      </li>
+    </ul>
 
-<p><strong>All fields are in network byte order.</strong></p>
+    <p><strong>All fields are in network byte order.</strong></p>
 
-<h2>Changelog</h2>
+    <h2>Changelog</h2>
 
-<h3>Current version: 8</h3>
+    <h3>Current version: 8</h3>
 
-<ul>
-<li>CRC removed from document format. There used to be a 4 byte CRC in the end
-of a header or header + body serialization, calculated as a crc32 of all the
-other data in the serialization. This CRC was included in the document length.
-</ul>
+    <ul>
+      <li>
+        CRC removed from document format. There used to be a 4 byte CRC in the
+        end of a header or header + body serialization, calculated as a crc32 of
+        all the other data in the serialization. This CRC was included in the
+        document length.
+      </li>
+    </ul>
 
+    <h3>Version: 7</h3>
 
-<h3>Version: 7</h3>
+    <ul>
+      <li>
+        The document length is now a static sized 4 byte value, instead of a
+        variable 2,4,8 byte value.
+      </li>
+      <li>
+        (Anything else? I wrote this changelog when bopping from 7 to 8. Dunno
+        if more was changed in 7.)
+      </li>
+    </ul>
 
-<ul>
-<li>The document length is now a static sized 4 byte value, instead of a variable 2,4,8 byte value.
-<li>(Anything else? I wrote this changelog when bopping from 7 to 8. Dunno if more was changed in 7.)
-</ul>
+    <h3>Version: 6</h3>
 
-<h3>Version: 6</h3>
+    This is the oldest version that we currently support. No known installation
+    stores documents with a version smaller than this.
 
-This is the oldest version that we currently support. No known installation stores documents with a version smaller than this.
+    <h2>Document Format</h2>
 
-<h2>Document Format</h2>
+    <p>This is the description of the serialized document format.</p>
 
-<p>This is the description of the serialized document format.</p>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Document serialization format</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <td>Version</td>
+        <td>Short integer</td>
+        <td>2</td>
+        <td>Version number. Current is 6.</td>
+      </tr>
+      <tr>
+        <td>Length</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>Total length of object (excluding this field and version).</td>
+      </tr>
+      <tr>
+        <td>Document ID</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
+      </tr>
+      <tr>
+        <td>Field Map</td>
+        <td>Bytes</td>
+        <td>See below</td>
+        <td>
+          Placeholder for fields. (Note: Fieldmaps may contain other fieldmaps)
+        </td>
+      </tr>
+    </table>
 
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Version</td>
-<td>Short integer</td>
-<td>2</td>
-<td>Version number. Current is 6.</td>
-</tr>
-<tr><td>Length</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Total length of object (excluding this field and version).</td>
-</tr>
-<tr><td>Document ID</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
-</tr>
-<tr><td>Field Map</td>
-<td>Bytes</td>
-<td>See below</td><td>Placeholder for fields. (Note: Fieldmaps may contain other fieldmaps)</td>
-</tr>
-</table>
+    <p>Field maps are serialized like this</p>
+    <p></p>
 
-<p>Field maps are serialized like this</p></p><p>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Fieldmap serialization format</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length (bytes)</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <td>Inventory bit mask</td>
+        <td>Byte</td>
+        <td>1</td>
+        <td>
+          Inventory bits describing the FieldMap element with data:<br />
+          &nbsp;Bit 0 set: FieldMap has document type <br />
+          &nbsp;Bit 1 set: FieldMap has header fields <br />
+          &nbsp;Bit 2 set: FieldMap has body fields <br />
+          &nbsp;Bit 3 set: FieldMap has external body fields<br />
+        </td>
+      </tr>
 
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Fieldmap serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length (bytes)</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Inventory bit mask</td>
-<td>Byte</td>
-<td>1</td>
-<td>
-Inventory bits describing the FieldMap element with data:<br>
-&nbsp;Bit 0 set: FieldMap has document type <br>
-&nbsp;Bit 1 set: FieldMap has header fields <br>
-&nbsp;Bit 2 set: FieldMap has body fields <br>
-&nbsp;Bit 3 set: FieldMap has external body fields<br>
-</tr>
-<tr><td colspan = "4"><b>Below section is present when bit 0 of inventory is set</b></td></tr>
-<tr><td>Document Type</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Document type. (0-terminated string, UTF-8 encoding.)</td>
-</tr>
-<tr><td>Version</td>
-<td>Short integer</td>
-<td>2</td>
-<td>Document type version number.</td></tr>
-<tr><td colspan = "4"><b>Below section is present when bit 1 of inventory is set</b></td></tr>
-<tr><td>Header data</td>
-<td>Data array</td>
-<td>See below</td>
-<td>Header data packed in data array</td></tr>
-<tr><td colspan = "4"><b>Below section is present when bit 2 of inventory is set</b></td></tr>
-<tr><td>Body data</td>
-<td>Data array</td>
-<td>See below</td>
-<td>Body data packed in data array</td></tr>
-</table>
+      <tr>
+        <td colspan="4">
+          <b>Below section is present when bit 0 of inventory is set</b>
+        </td>
+      </tr>
+      <tr>
+        <td>Document Type</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>Document type. (0-terminated string, UTF-8 encoding.)</td>
+      </tr>
+      <tr>
+        <td>Version</td>
+        <td>Short integer</td>
+        <td>2</td>
+        <td>Document type version number.</td>
+      </tr>
+      <tr>
+        <td colspan="4">
+          <b>Below section is present when bit 1 of inventory is set</b>
+        </td>
+      </tr>
+      <tr>
+        <td>Header data</td>
+        <td>Data array</td>
+        <td>See below</td>
+        <td>Header data packed in data array</td>
+      </tr>
+      <tr>
+        <td colspan="4">
+          <b>Below section is present when bit 2 of inventory is set</b>
+        </td>
+      </tr>
+      <tr>
+        <td>Body data</td>
+        <td>Data array</td>
+        <td>See below</td>
+        <td>Body data packed in data array</td>
+      </tr>
+    </table>
 
+    <p></p>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Data array serialization</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length (bytes)</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <td>Data length</td>
+        <td>Integer_2_4_8</td>
+        <td>2, 4 or 8</td>
+        <td>
+          Length of data block (see below). NOTE THAT THIS LENGTH INCLUDE
+          ITSELF.
+        </td>
+      </tr>
+      <tr>
+        <td>Number of fields</td>
+        <td>Integer_1_4</td>
+        <td>1 or 4</td>
+        <td>Number of fields in data array</td>
+      </tr>
 
-<p>
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data array serialization</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length (bytes)</td>
-<td><b>Description</b></td>
-</tr>
-<tr><td>Data length</td>
-<td>Integer_2_4_8</td>
-<td>2, 4 or 8</td>
-<td>Length of data block (see below). NOTE THAT THIS LENGTH INCLUDE ITSELF.</td>
-</tr>
-<tr><td>Number of fields<td>Integer_1_4</td>
-<td>1 or 4</td>
-<td>Number of fields in data array</td>
-<tr><td colspan = "4"><b>Below block is repeated "Number of fields" times</b></td></tr>
-<tr><td>Field ID<td>Integer_1_4</td>
-<td>1 or 4</td>
-<td>ID of field.</td>
-<tr><td>Field Size<td>Integer_1_2_4</td>
-<td>1, 2 or 4</td>
-<td>Length of field.</td>
-</td>
-<tr><td colspan = "4"><b>End of repeated block </b></td></tr>
-<tr><td>Data block<td>Bytes</td>
-<td>&nbsp;</td>
-<td>The data block.<br> 
-&nbsp; - Items are ordered the same way field array is sorted.<br> 
-&nbsp; - Use lengths from field array above to find item offset and length.<br>
-&nbsp; - If the block is compressed, lengths refer to decompressed version</td>
-</table>
+      <tr>
+        <td colspan="4">
+          <b>Below block is repeated "Number of fields" times</b>
+        </td>
+      </tr>
+      <tr>
+        <td>Field ID</td>
+        <td>Integer_1_4</td>
+        <td>1 or 4</td>
+        <td>ID of field.</td>
+      </tr>
+      <tr>
+        <td>Field Size</td>
+        <td>Integer_1_2_4</td>
+        <td>1, 2 or 4</td>
+        <td>Length of field.</td>
+      </tr>
+      <tr>
+        <td colspan="4"><b>End of repeated block </b></td>
+      </tr>
+      <tr>
+        <td>Data block</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>
+          The data block.<br />
+          &nbsp; - Items are ordered the same way field array is sorted.<br />
+          &nbsp; - Use lengths from field array above to find item offset and
+          length.<br />
+          &nbsp; - If the block is compressed, lengths refer to decompressed
+          version
+        </td>
+      </tr>
+    </table>
 
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Data type serialization</em>
+      </caption>
+      <tr>
+        <td width="15%"><b>Data type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Serialization</b></td>
+      </tr>
+      <tr>
+        <td>Integer (ID 0)</td>
+        <td>4</td>
+        <td>Signed integer, two's complement notation, network byte order.</td>
+      </tr>
+      <tr>
+        <td>Floating point number (ID 1)</td>
+        <td>4</td>
+        <td>IEEE 754, single precision, network byte order.</td>
+      </tr>
+      <tr>
+        <td>String (ID 2)</td>
+        <td>1 + (1 or 4) + length + 1</td>
+        <td>
+          Strings are serialization format:<br />
+          &nbsp;- First byte represents coding. This has traditionally denoted
+          the maximum number of bits per character in the UTF-8 encoded string,
+          but has never been used in deserialization code.
+          <ul>
+            <li>Set to 32 if not used.</li>
+            <li>
+              Set to &lt;32 if you know the UTF-8 string uses less bits per
+              character; e.g. ASCII could use 8.
+            </li>
+            <li>
+              Set bit 6 (decimal 64) if the string has an
+              <a href="#annotations">annotation tree</a>.
+            </li>
+          </ul>
+          <br />
+          &nbsp;- Integer_1_4 with length of string. <br />
+          &nbsp;- The string (UTF-8 encoding), including 0-terminating byte.<br />
+          &nbsp;- An annotation tree, if bit 6 (decimal 64) of coding byte is
+          set:
+          <ul>
+            <li>
+              total length of all span trees excl. itself:
+              <strong>uint32</strong>
+            </li>
+            <li>number of span trees <strong>int_1_2_4</strong></li>
+            <li>
+              for each root node:
+              <ol>
+                <li>tree name serialized as String</li>
+                <li>
+                  serialized SpanNode as given below, see
+                  <a href="#annotations">annotation serialization</a>
+                </li>
+              </ol>
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>Raw bytes (ID 3)</td>
+        <td>Length of buffer</td>
+        <td>Byte for byte copy</td>
+      </tr>
+      <tr>
+        <td>Long integer (ID 4)</td>
+        <td>8</td>
+        <td>Signed integer, two's complement notation, network byte order.</td>
+      </tr>
+      <tr>
+        <td>Double floating point number (ID 5)</td>
+        <td>8</td>
+        <td>IEEE 754, double precision, network byte order.</td>
+      </tr>
+      <tr>
+        <td>Array (ID 6)</td>
+        <td>At least 8 bytes</td>
+        <td>
+          Arrays of any fields are serialized like this:<br />
+          &nbsp;- 4 bytes: Data type array consists of <br />
+          &nbsp;- 4 bytes: Number of elements in array<br />
+          &nbsp;Below sequence is repeated "number of element" times<br />
+          &nbsp;- 4 bytes: Length of element<br />
+          &nbsp;- Serialized element<br />
+        </td>
+      </tr>
+      <tr>
+        <td>Fieldmap (ID 7)</td>
+        <td>&nbsp;</td>
+        <td>Field maps (embedded or not) are defined above</td>
+      </tr>
+      <tr>
+        <td>Document (ID 8)</td>
+        <td>&nbsp;</td>
+        <td>Document objects (embedded or not) are defined above</td>
+      </tr>
+      <tr>
+        <td>Timestamp (ID 9)</td>
+        <td>&nbsp;</td>
+        <td>Same as long integer</td>
+      </tr>
+      <tr>
+        <td>Uri (ID 10)</td>
+        <td>&nbsp;</td>
+        <td>Same as string</td>
+      </tr>
+      <tr>
+        <td>Exact string (ID 11)</td>
+        <td>&nbsp;</td>
+        <td>Same as string</td>
+      </tr>
+      <tr>
+        <td>Content (ID 12)</td>
+        <td>At least 11 bytes</td>
+        <td>
+          Content fields are serialized like this:<br />
+          &nbsp;- Content type length (1 byte)<br />
+          &nbsp;- Content type (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Content encoding length (1 byte)<br />
+          &nbsp;- Content encoding (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Content language length (1 byte)<br />
+          &nbsp;- Content language (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Content length (Integer, 4 bytes)<br />
+          &nbsp;- Content (including 0-terminating char)<br />
+        </td>
+      </tr>
+      <tr>
+        <td>Content meta (ID 13)</td>
+        <td>At least 12 bytes</td>
+        <td>
+          Content (attachment) meta data are serialized like this:<br />
+          &nbsp;- Attachment size (Integer, 4 bytes)<br />
+          &nbsp;- Attachment name (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Attachment encoding (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Attachment content type (0 terminated string, UTF-8
+          encoding.)<br />
+          &nbsp;- Attachment part (0 terminated string, UTF-8 encoding.)<br />
+          &nbsp;- Attachment flag (Integer, 4 bytes)<br />
+        </td>
+      </tr>
+      <tr>
+        <td>Term boost (ID 15)</td>
+        <td>&nbsp;</td>
+        <td>Same as string</td>
+      </tr>
+      <tr>
+        <td>Byte (ID 16)</td>
+        <td>1</td>
+        <td>One single byte</td>
+      </tr>
+      <tr>
+        <td>Set (ID 17)</td>
+        <td>At least 8 bytes</td>
+        <td>
+          Set of any fields are serialized like this:<br />
+          &nbsp;- Integer (4 bytes): Data type set is made up of<br />
+          &nbsp;- Integer (4 bytes): Number of elements in set<br />
+          &nbsp;Below sequence is repeated "number of element" times<br />
+          &nbsp;- Serialized element<br />
+        </td>
+      </tr>
+    </table>
 
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data type serialization</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td width="10%"><b>Length</td>
-<td><b>Serialization</td>
-</tr>
-<tr><td>Integer (ID 0)</td>
-<td>4</td>
-<td>Signed integer, two's complement notation, network byte order.</td>
-</tr>
-<tr><td>Floating point number (ID 1)</td>
-<td>4</td>
-<td>IEEE 754, single precision, network byte order.</td>
-</tr>
-<tr><td>String (ID 2)</td>
-<td>1 + (1 or 4) + length + 1</td>
-<td>Strings are serialization format:<br />
-&nbsp;- First byte represents coding. This has traditionally denoted the maximum number of bits
-        per character in the UTF-8 encoded string, but has never been used in deserialization code.
-<ul>
-  <li>Set to 32 if not used.</li>
-  <li>Set to &lt;32 if you know the UTF-8 string uses less bits per character; e.g. ASCII could use 8.</li>
-  <li>Set bit 6 (decimal 64) if the string has an <a href="#annotations">annotation tree</a>.</li>
-</ul>
-<br />
-&nbsp;- Integer_1_4 with length of string. <br />
-&nbsp;- The string (UTF-8 encoding), including 0-terminating byte.<br>
-&nbsp;- An annotation tree, if bit 6 (decimal 64) of coding byte is set:
-<ul>
-  <li>total length of all span trees excl. itself: <strong>uint32</strong></li>
-  <li>number of span trees <strong>int_1_2_4</strong></li>
-  <li>for each root node:
-  <ol>
-    <li>tree name serialized as String</li>
-    <li>serialized SpanNode as given below, see <a href="#annotations">annotation serialization</a></li>
-  </ol>
-</ul>
-</td>
-</tr>
-<tr><td>Raw bytes (ID 3)</td>
-<td>Length of buffer</td>
-<td>Byte for byte copy</td>
-</tr>
-<tr><td>Long integer (ID 4)</td>
-<td>8</td>
-<td>Signed integer, two's complement notation, network byte order.</td>
-</tr>
-<tr><td>Double floating point number (ID 5)</td>
-<td>8</td>
-<td>IEEE 754, double precision, network byte order.</td>
-</tr>
-<tr><td>Array (ID 6)</td>
-<td>At least 8 bytes</td>
-<td>Arrays of any fields are serialized like this:<br>
-&nbsp;- 4 bytes: Data type array consists of <br>
-&nbsp;- 4 bytes: Number of elements in array<br>
-&nbsp;Below sequence is repeated "number of element" times<br>
-&nbsp;- 4 bytes: Length of element<br>
-&nbsp;- Serialized element<br>
-</td></tr>
-<tr><td>Fieldmap (ID 7)</td>
-<td>&nbsp;</td>
-<td>Field maps (embedded or not) are defined above</td>
-</tr>
-<tr><td>Document (ID 8)</td>
-<td>&nbsp;</td>
-<td>Document objects (embedded or not) are defined above</td>
-</tr>
-<tr><td>Timestamp (ID 9)</td>
-<td>&nbsp;</td>
-<td>Same as long integer</td>
-</tr>
-<tr><td>Uri (ID 10)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Exact string (ID 11)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Content (ID 12)</td>
-<td>At least 11 bytes</td>
-<td>Content fields are serialized like this:<br>
-&nbsp;- Content type length (1 byte)<br>
-&nbsp;- Content type (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content encoding length (1 byte)<br>
-&nbsp;- Content encoding (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content language length (1 byte)<br>
-&nbsp;- Content language (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content length (Integer, 4 bytes)<br>
-&nbsp;- Content (including 0-terminating char)<br>
-</td>
-</tr>
-<tr><td>Content meta (ID 13)</td>
-<td>At least 12 bytes</td>
-<td>Content (attachment) meta data are serialized like this:<br>
-&nbsp;- Attachment size (Integer, 4 bytes)<br>
-&nbsp;- Attachment name  (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment encoding (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment content type (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment part (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment flag (Integer, 4 bytes)<br>
-</td>
-</tr>
-<tr><td>Term boost (ID 15)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Byte (ID 16)</td>
-<td>1</td>
-<td>One single byte</td>
-</tr>
-<tr><td>Set (ID 17)</td>
-<td>At least 8 bytes</td>
-<td>Set of any fields are serialized like this:<br>
-&nbsp;- Integer (4 bytes): Data type set is made up of<br>
-&nbsp;- Integer (4 bytes): Number of elements in set<br>
-&nbsp;Below sequence is repeated "number of element" times<br>
-&nbsp;- Serialized element<br>
-</td></tr>
-</table>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption id="annotations">
+        <em>Annotation tree serialization</em>
+      </caption>
+      <tr>
+        <td width="15%"><b>Data type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Serialization</b></td>
+      </tr>
+      <tr>
+        <td>SpanNode (base class)</td>
+        <td>1 + (1, 2 or 4) + Annotation serialization + subclass payload</td>
+        <td>
+          <ul>
+            <li>
+              type <strong>byte</strong> (1: Span, 2: SpanList, 4:
+              AlternateSpanList)
+            </li>
+            <li>number of annotations <strong>int_1_2_4</strong></li>
+            <li>each annotation as given below</li>
+            <li>
+              (remaining payload serialized as given below by subclasses Span,
+              SpanList and AlternateSpanList)
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>Annotation</td>
+        <td>4 + (1, 2 or 4) + (possibly 4 + FieldValue serialization)</td>
+        <td>
+          <ul>
+            <li>
+              MD5 name hash (4 LSBytes) <strong>uint32</strong> (NOTE: 0-127
+              reserved for internal Vespa usage.)
+            </li>
+            <li>length <strong>int_1_2_4</strong></li>
+            <li>
+              the following fields are <em>only</em> present if length &gt; 0:
+              <ul>
+                <li>data type id <strong>uint32</strong></li>
+                <li>
+                  NOTE: no sequence id, as we will rely on annotations being
+                  serialized/deserialized in particular order, so we don't need
+                  to write this explicitly
+                </li>
+                <li>FieldValue as given by its own serialization</li>
+              </ul>
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>Span</td>
+        <td>SpanNode serialization + (1, 2 or 4) + (1, 2 or 4)</td>
+        <td>
+          <ul>
+            <li>serialization from SpanNode base class</li>
+            <li>
+              from index, as given by Java String (UTF-16)
+              <strong>int_1_2_4</strong>
+            </li>
+            <li>
+              length, as given by Java String (UTF-16)
+              <strong>int_1_2_4</strong>
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>SpanList</td>
+        <td>
+          SpanNode serialization + (1, 2 or 4) + n times SpanNode serialization
+        </td>
+        <td>
+          <ul>
+            <li>serialization from SpanNode base class</li>
+            <li>number of children <strong>int_1_2_4</strong></li>
+            <li>
+              each child node serialized as SpanNode (Span, SpanList,
+              AlternateSpanList)
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>AlternateSpanList</td>
+        <td>
+          SpanNode serialization + (1, 2 or 4) + n times (8 + SpanList
+          serialization)
+        </td>
+        <td>
+          <ul>
+            <li>serialization from SpanNode base class</li>
+            <li>number of child trees <strong>int_1_2_4</strong></li>
+            <li>
+              for each child tree:
+              <ul>
+                <li>probability <strong>double</strong></li>
+                <li>serialization as given by SpanList above</li>
+              </ul>
+            </li>
+          </ul>
+        </td>
+      </tr>
+      <tr>
+        <td>AnnotationRef</td>
+        <td>1, 2 or 4</td>
+        <td>
+          AnnotationRef serialization
+          <ul>
+            <li>
+              unique sequence id of annotation being referred to
+              <strong>int1_2_4</strong>
+            </li>
+          </ul>
+        </td>
+      </tr>
+    </table>
 
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Data types used in serialized format</em>
+      </caption>
+      <tr>
+        <td width="15%"><b>Data type</b></td>
+        <td><b>Serialization</b></td>
+      </tr>
+      <tr>
+        <td>Integer_1_4</td>
+        <td>
+          If bit 7 of first byte is unset, coded using 1 byte.<br />
+          If bit 7 of first byte is set, coded using 4 bytes (bit 7 of first
+          byte must be masked away).<br />
+          <em>Range: 0 - 2**31-1.</em>
+        </td>
+      </tr>
+      <tr>
+        <td>Integer_1_2_4</td>
+        <td>
+          If bit 7 of first byte is unset, coded using 1 byte.<br />
+          If bit 7 of first byte is set and bit 6 of first byte is unset, coded
+          using 2 bytes (bit 7 and 6 of first byte must be masked away).<br />
+          If bit 7 and 6 of first byte are set, coded using 4 bytes (bit 7 and 6
+          of first byte must be masked away).<br />
+          <em>Range: 0 - 2**30-1.</em>
+        </td>
+      </tr>
+      <tr>
+        <td>Integer_2_4_8</td>
+        <td>
+          If bit 7 of first byte is unset, coded using 2 byte.<br />
+          If bit 7 of first byte is set and bit 6 of first byte is unset, coded
+          using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
+          If bit 7 and 6 of first byte are set, coded using 8 bytes (bit 7 and 6
+          of first byte must be masked away).<br />
+          <em>Range: 0 - 2**62-1.</em>
+        </td>
+      </tr>
+    </table>
 
-<a name="annotations"><table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Annotation tree serialization</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td width="10%"><b>Length</td>
-<td><b>Serialization</td>
-</tr>
-<tr>
-<td>SpanNode (base class)</td>
-<td>1 + (1, 2 or 4) + Annotation serialization + subclass payload</td>
-<td>
- <ul>
-<li> type <strong>byte</strong> (1: Span, 2: SpanList, 4: AlternateSpanList)
-</li> <li> number of annotations <strong>int_1_2_4</strong>
-</li> <li> each annotation as given below
-</li> <li> (remaining payload serialized as given below by subclasses Span, SpanList and AlternateSpanList)
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>Annotation</td>
-<td>4 + (1, 2 or 4) + (possibly 4 + FieldValue serialization)</td>
-<td>
-<ul>
-<li>MD5 name hash (4 LSBytes) <strong>uint32</strong> (NOTE: 0-127 reserved for internal Vespa usage.)
-</li> <li> length <strong>int_1_2_4</strong>
-</li> <li> the following fields are <em>only</em> present if length &gt; 0: <ul>
-<li> data type id <strong>uint32</strong>
-</li> <li> NOTE: no sequence id, as we will rely on annotations being serialized/deserialized in particular order, so we don't need to write this explicitly
-</li> <li> FieldValue as given by its own serialization
-</li></ul> 
-</li></ul> 
-</td>
-</tr>
-<tr>
-<td>Span</td>
-<td>SpanNode serialization + (1, 2 or 4) + (1, 2 or 4)</td>
-<td><ul>
-<li>serialization from SpanNode base class</li>
-<li> from index, as given by Java String (UTF-16) <strong>int_1_2_4</strong>
-</li> <li> length, as given by Java String (UTF-16) <strong>int_1_2_4</strong>
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>SpanList</td>
-<td>SpanNode serialization + (1, 2 or 4) + n times SpanNode serialization</td>
-<td>
-<ul>
-<li>serialization from SpanNode base class</li>
-<li> number of children <strong>int_1_2_4</strong>
-</li> <li> each child node serialized as SpanNode (Span, SpanList, AlternateSpanList)
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>AlternateSpanList</td>
-<td>SpanNode serialization + (1, 2 or 4) + n times (8 + SpanList serialization)</td>
-<td><ul>
-<li>serialization from SpanNode base class</li>
-<li> number of child trees <strong>int_1_2_4</strong>
-</li> <li> for each child tree: <ul>
-<li> probability <strong>double</strong>
-</li> <li> serialization as given by SpanList above
-</li></ul> 
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>AnnotationRef</td>
-<td>1, 2 or 4</td>
-<td>AnnotationRef serialization <ul>
-<li> unique sequence id of annotation being referred to <strong>int1_2_4</strong>
-</li></ul>
-</td>
-</tr>
-</table>
+    <h2>Document Update Format</h2>
 
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data types used in serialized format</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td><b>Serialization</b></td>
-</tr>
-<tr><td>Integer_1_4</td>
-<td>If bit 7 of first byte is unset, coded using 1 byte.<br />
-    If bit 7 of first byte is set, coded using 4 bytes (bit 7 of first byte must be masked away).<br />
-    <em>Range: 0  -  2**31-1.</em></td>
-</tr>
-<tr><td>Integer_1_2_4</td>
-<td>If bit 7 of first byte is unset, coded using 1 byte.<br />
-    If bit 7 of first byte is set and bit 6 of first byte is unset, coded using 2 bytes (bit 7 and 6 of first byte must be masked away).<br />
-    If bit 7 and 6 of first byte are set, coded using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
-    <em>Range: 0  -  2**30-1.</em></td>
-</td>
-</tr>
-<tr><td>Integer_2_4_8</td>
-<td>If bit 7 of first byte is unset, coded using 2 byte.<br />
-    If bit 7 of first byte is set and bit 6 of first byte is unset, coded using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
-    If bit 7 and 6 of first byte are set, coded using 8 bytes (bit 7 and 6 of first byte must be masked away).<br />
-    <em>Range: 0  -  2**62-1.</em></td>
-</td>
-</tr>
-</table>
+    <p>This is the description of the serialized document update format.</p>
 
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Document update serialization format</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <td>Document ID</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
+      </tr>
+      <tr>
+        <td>Content byte</td>
+        <td>Byte</td>
+        <td>1 byte</td>
+        <td>Always set to 1</td>
+      </tr>
+      <tr>
+        <td>Document Type</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>Document type. (0-terminated string, UTF-8 encoding.)</td>
+      </tr>
+      <tr>
+        <td>Number of fields to update</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>The number of fields to update</td>
+      </tr>
+      <tr>
+        <td>Serialized field updates</td>
+        <td>Field Update</td>
+        <td>&nbsp;</td>
+        <td>The serialized field updates. See below.</td>
+      </tr>
+    </table>
 
-<h2>Document Update Format</h2>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Document update serialization format</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <td>Field Id</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>Field id within document type.</td>
+      </tr>
+      <tr>
+        <td>Number of value updates</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>Numer of value updates to this field.</td>
+      </tr>
+      <tr>
+        <td>Serialized field update values</td>
+        <td>Bytes</td>
+        <td>&nbsp;</td>
+        <td>The serialized field update values. See below.</td>
+      </tr>
+    </table>
 
-<p>This is the description of the serialized document update format.</p>
-
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Document ID</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
-</tr>
-<tr><td>Content byte</td>
-<td>Byte</td>
-<td>1 byte</td>
-<td>Always set to 1</td>
-</tr>
-<tr><td>Document Type</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Document type. (0-terminated string, UTF-8 encoding.)</td>
-</tr>
-<tr><td>Number of fields to update</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>The number of fields to update</td>
-</tr>
-<tr><td>Serialized field updates</td>
-<td>Field Update</td>
-<td>&nbsp;</td>
-<td>The serialized field updates. See below.</td>
-</tr>
-</table>
-
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Field Id</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Field id within document type.</td>
-</tr>
-<tr><td>Number of value updates</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Numer of value updates to this field.</td>
-</tr>
-<tr><td>Serialized field update values</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>The serialized field update values. See below.</td>
-</tr>
-</table>
-
-<table  border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update value serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><th colspan="4">Add Value Update</th></tr>
-<tr><td>Add Value Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>25 + 0x1000 for value updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>Serialization of the field to add.</td>
-</tr>
-<tr><td>Weight</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Weight. Used if update applies to weighted set.</td>
-</tr>
-<tr><th colspan="4">Arithmetic Update</th></tr>
-<tr><td>Arithmetic Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>26 + 0x1000 for arithmetic updates.</td>
-</tr>
-<tr><td>Operator ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Identifies whether this does add, subtract, multiply or divide.</td>
-</tr>
-<tr><td>Operand</td>
-<td>Double</td>
-<td>8 bytes</td>
-<td>The right operand to use in the arithmetic operation.</td>
-</tr>
-<tr><th colspan="4">Assign Update</th></tr>
-<tr><td>Assign Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>27 + 0x1000 for assign updates.</td>
-</tr>
-<tr><td>Content flag</td>
-<td>Byte</td>
-<td>1 bytes</td>
-<td>Contains 1 if we have content, 0 if not.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>Serialization of the field to assign.</td>
-</tr>
-<tr><th colspan="4">Clear Update</th></tr>
-<tr><td>Clear Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>28 + 0x1000 for clear updates.</td>
-</tr>
-<tr><th colspan="4">Map Value Update</th></tr>
-<tr><td>Map Value Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>29 + 0x1000 for map value updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>4 bytes</td>
-<td>The field indicating what entry to update.</td>
-</tr>
-<tr><td>Value Update</td>
-<td>Document Value Update</td>
-<td>&nbsp;</td>
-<td>The update operation to apply to the field indicated above.</td>
-</tr>
-<tr><th colspan="4">Remove Value Update</th></tr>
-<tr><td>Remove Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>30 + 0x1000 for remove updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>The field indicating what entry to update.</td>
-</tr>
-</table>
-
-</body>
+    <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+      <caption>
+        <em>Document update value serialization format</em>
+      </caption>
+      <tr>
+        <td width="10%"><b>Field</b></td>
+        <td width="10%"><b>Type</b></td>
+        <td width="10%"><b>Length</b></td>
+        <td><b>Description</b></td>
+      </tr>
+      <tr>
+        <th colspan="4">Add Value Update</th>
+      </tr>
+      <tr>
+        <td>Add Value Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>25 + 0x1000 for value updates.</td>
+      </tr>
+      <tr>
+        <td>Field serialization</td>
+        <td>FieldValue</td>
+        <td>&nbsp;</td>
+        <td>Serialization of the field to add.</td>
+      </tr>
+      <tr>
+        <td>Weight</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>Weight. Used if update applies to weighted set.</td>
+      </tr>
+      <tr>
+        <th colspan="4">Arithmetic Update</th>
+      </tr>
+      <tr>
+        <td>Arithmetic Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>26 + 0x1000 for arithmetic updates.</td>
+      </tr>
+      <tr>
+        <td>Operator ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>Identifies whether this does add, subtract, multiply or divide.</td>
+      </tr>
+      <tr>
+        <td>Operand</td>
+        <td>Double</td>
+        <td>8 bytes</td>
+        <td>The right operand to use in the arithmetic operation.</td>
+      </tr>
+      <tr>
+        <th colspan="4">Assign Update</th>
+      </tr>
+      <tr>
+        <td>Assign Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>27 + 0x1000 for assign updates.</td>
+      </tr>
+      <tr>
+        <td>Content flag</td>
+        <td>Byte</td>
+        <td>1 bytes</td>
+        <td>Contains 1 if we have content, 0 if not.</td>
+      </tr>
+      <tr>
+        <td>Field serialization</td>
+        <td>FieldValue</td>
+        <td>&nbsp;</td>
+        <td>Serialization of the field to assign.</td>
+      </tr>
+      <tr>
+        <th colspan="4">Clear Update</th>
+      </tr>
+      <tr>
+        <td>Clear Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>28 + 0x1000 for clear updates.</td>
+      </tr>
+      <tr>
+        <th colspan="4">Map Value Update</th>
+      </tr>
+      <tr>
+        <td>Map Value Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>29 + 0x1000 for map value updates.</td>
+      </tr>
+      <tr>
+        <td>Field serialization</td>
+        <td>FieldValue</td>
+        <td>4 bytes</td>
+        <td>The field indicating what entry to update.</td>
+      </tr>
+      <tr>
+        <td>Value Update</td>
+        <td>Document Value Update</td>
+        <td>&nbsp;</td>
+        <td>The update operation to apply to the field indicated above.</td>
+      </tr>
+      <tr>
+        <th colspan="4">Remove Value Update</th>
+      </tr>
+      <tr>
+        <td>Remove Update ID</td>
+        <td>Integer</td>
+        <td>4 bytes</td>
+        <td>30 + 0x1000 for remove updates.</td>
+      </tr>
+      <tr>
+        <td>Field serialization</td>
+        <td>FieldValue</td>
+        <td>&nbsp;</td>
+        <td>The field indicating what entry to update.</td>
+      </tr>
+    </table>
+  </body>
 </html>
author	Kristian Aune <kraune@verizonmedia.com>	2023-09-20 15:48:55 +0200
committer	Kristian Aune <kraune@verizonmedia.com>	2023-09-20 15:48:55 +0200
commit	09a2a7c21a1ca82a4ca56a449283217fedf8125b (patch)
tree	6d442e73905068c18f86f64a658d33b59b26cda2 /document/doc
parent	16050559773b552387bc702aa15c24ab30ece982 (diff)