author | Geir Storli <geirstorli@yahoo.no> | 2017-05-03 10:53:30 +0200
---|---|---
committer | GitHub <noreply@github.com> | 2017-05-03 10:53:30 +0200
commit | 478f3db664db1a335cf90bf022297371783cdd77 |
tree | 21ee513bbf3fd883341bf13746919823ec229b86 /eval |
parent | 3cd9c747e62ee82edfb5fedf4415f5daa45b3ab3 |
parent | b4af39d746ad0993b646fabf5114f3cdfb49030e |
Merge pull request #2340 from yahoo/havardpe/mixed-tensor-binary-format-spec
added description of new mixed serialization format
Diffstat (limited to 'eval')
-rw-r--r-- | eval/src/vespa/eval/tensor/serialization/format.txt | 42 |
1 files changed, 42 insertions, 0 deletions
diff --git a/eval/src/vespa/eval/tensor/serialization/format.txt b/eval/src/vespa/eval/tensor/serialization/format.txt
new file mode 100644
index 00000000000..8c5d3b331d2
--- /dev/null
+++ b/eval/src/vespa/eval/tensor/serialization/format.txt
@@ -0,0 +1,42 @@
+This file explains how the typed binary formats of serialized tensors
+for different archetypes (sparse[1], dense[2] and mixed[3]) can be
+interpreted as a single unified binary format. The description below
+uses data types defined by document serialization (nbostream) combined
+with some comments and Python-inspired flow control. The mixed[3]
+binary format is defined so that it coincides as closely as possible
+with both existing formats.
+
+//-----------------------------------------------------------------------------
+
+byte: type (1:sparse, 2:dense, 3:mixed)
+    bit 0 -> 'sparse'
+    bit 1 -> 'dense'
+    (mixed tensors are tagged as both 'sparse' and 'dense')
+
+if ('sparse'):
+    1_4_int: number of mapped dimensions -> 'n_mapped'
+    'n_mapped' times: (sorted by dimension name)
+        small_string: dimension name
+
+if ('dense'):
+    1_4_int: number of indexed dimensions -> 'n_indexed'
+    'n_indexed' times: (sorted by dimension name)
+        small_string: dimension name
+        1_4_int: dimension size (must be at least 1) -> 'size_i'
+
+if ('n_mapped' > 0 || !'dense'):
+    1_4_int: number of named dense sub-spaces -> 'n_blocks'
+else:
+    'n_blocks' = 1 (a single dense space)
+
+'n_blocks' times:
+    'n_mapped' times:
+        small_string: dimension label (same order as dimension names)
+    prod('size_i') times: (product of all indexed dimension sizes)
+        double: cell value (last indexed dimension is nested innermost)
+
+//-----------------------------------------------------------------------------
+
+Note: A tensor with no dimensions should not be serialized as
+sparse[1]; if it is, 'n_blocks' is still written and equals the
+number of cells, since each block then holds exactly one cell.
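The unified layout can be walked with a small reader. The following is a minimal Python sketch, not Vespa's implementation. It rests on several assumptions. nbostream is assumed to write big-endian ("network byte order") values. A 1_4_int is assumed to take 1 byte for values below 0x80 and otherwise 4 bytes with the top bit set. A small_string is assumed to be a 1_4_int byte length followed by UTF-8 bytes. All function names (`read_1_4_int`, `write_small_string`, `encode`, `decode`, ...) are hypothetical. 'n_mapped' is treated as 0 when the sparse bit is clear.

```python
import math
import struct

# ASSUMPTION: nbostream writes multi-byte values big-endian (network byte order).
# ASSUMPTION: 1_4_int is 1 byte for values < 0x80, else 4 bytes with the top bit set.

def write_1_4_int(val):
    """Encode a non-negative int as a 1_4_int (assumed layout, see above)."""
    if val < 0x80:
        return bytes([val])
    return struct.pack(">I", val | 0x80000000)

def read_1_4_int(buf, pos):
    """Return (value, new_pos) for the 1_4_int starting at buf[pos]."""
    if (buf[pos] & 0x80) == 0:
        return buf[pos], pos + 1
    (val,) = struct.unpack_from(">I", buf, pos)
    return val & 0x7FFFFFFF, pos + 4

# ASSUMPTION: small_string = 1_4_int byte length followed by UTF-8 bytes.
def write_small_string(text):
    data = text.encode("utf-8")
    return write_1_4_int(len(data)) + data

def read_small_string(buf, pos):
    length, pos = read_1_4_int(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

def encode(type_byte, mapped, indexed, blocks):
    """Hypothetical writer. type_byte is 1, 2 or 3; mapped is a list of names and
    indexed a list of (name, size), both expected to be sorted by name; blocks is
    a list of (labels, cells) with the last indexed dimension innermost."""
    is_sparse, is_dense = bool(type_byte & 1), bool(type_byte & 2)
    n_cells = math.prod(size for _, size in indexed)  # empty product is 1
    out = bytearray([type_byte])
    if is_sparse:
        out += write_1_4_int(len(mapped))
        for name in mapped:
            out += write_small_string(name)
    if is_dense:
        out += write_1_4_int(len(indexed))
        for name, size in indexed:
            out += write_small_string(name) + write_1_4_int(size)
    if mapped or not is_dense:  # 'n_mapped' > 0 || !'dense'
        out += write_1_4_int(len(blocks))
    elif len(blocks) != 1:
        raise ValueError("a pure dense tensor has exactly one block")
    for labels, cells in blocks:
        if len(labels) != len(mapped) or len(cells) != n_cells:
            raise ValueError("block shape does not match the dimensions")
        for label in labels:
            out += write_small_string(label)
        out += struct.pack(f">{n_cells}d", *cells)
    return bytes(out)

def decode(buf):
    """Hypothetical reader; returns (mapped_names, [(name, size)], [(labels, cells)])."""
    type_byte, pos = buf[0], 1
    is_sparse, is_dense = bool(type_byte & 1), bool(type_byte & 2)
    mapped = []  # stays empty ('n_mapped' == 0) when the sparse bit is clear
    if is_sparse:
        n_mapped, pos = read_1_4_int(buf, pos)
        for _ in range(n_mapped):
            name, pos = read_small_string(buf, pos)
            mapped.append(name)
    indexed = []
    if is_dense:
        n_indexed, pos = read_1_4_int(buf, pos)
        for _ in range(n_indexed):
            name, pos = read_small_string(buf, pos)
            size, pos = read_1_4_int(buf, pos)
            indexed.append((name, size))
    if mapped or not is_dense:
        n_blocks, pos = read_1_4_int(buf, pos)
    else:
        n_blocks = 1  # single dense space: no count is stored
    n_cells = math.prod(size for _, size in indexed)  # empty product is 1
    blocks = []
    for _ in range(n_blocks):
        labels = []
        for _ in range(len(mapped)):
            label, pos = read_small_string(buf, pos)
            labels.append(label)
        cells = list(struct.unpack_from(f">{n_cells}d", buf, pos))
        pos += 8 * n_cells
        blocks.append((tuple(labels), cells))
    if pos != len(buf):  # this sketch treats leftover bytes as an error
        raise ValueError("trailing bytes after the last block")
    return mapped, indexed, blocks
```

For example, `encode(1, [], [], [((), [5.0])])` produces the bytes `01 00 01` followed by one 8-byte double. With no dimensions, the integer after 'n_mapped' is 'n_blocks', and it equals the cell count. A pure dense tensor (`type_byte` 2) stores no 'n_blocks' field at all. This is why the old dense format reads unchanged under the unified description.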