author Haavard <havardpe@yahoo-inc.com> 2017-05-02 15:27:11 +0000
committer Haavard <havardpe@yahoo-inc.com> 2017-05-02 15:27:11 +0000
commit 9b3b15a8f6ea14cd0d81d4714b253761cd4309d4
tree 6b1d812e9ec80857a5c19439ddae19213b61d695 /eval
parent c638dea6f73bb3b250326f2c386b18a2120abfc3
added description of new mixed serialization format
... and how it relates to old formats (sparse/dense)
Diffstat (limited to 'eval')
-rw-r--r-- eval/src/vespa/eval/tensor/serialization/format.txt | 37
1 files changed, 37 insertions, 0 deletions
diff --git a/eval/src/vespa/eval/tensor/serialization/format.txt b/eval/src/vespa/eval/tensor/serialization/format.txt
new file mode 100644
index 00000000000..9d0a387c36a
--- /dev/null
+++ b/eval/src/vespa/eval/tensor/serialization/format.txt
@@ -0,0 +1,37 @@
+This file explains how the typed binary formats of serialized tensors
+for different archetypes (sparse[1], dense[2] and mixed[3]) can be
+interpreted as a single unified binary format. The description below
+uses data types defined by document serialization (nbostream) combined
+with comments and Python-inspired flow control. The mixed[3] binary
+format is defined so that it overlays both existing formats as
+directly as possible. To convert a sparse[1] or dense[2] serialized
+tensor to the mixed[3] format, set the type byte to 3 and add a
+single byte (a count of 0) indicating that there are no dimensions
+of the other kind (indexed or mapped, respectively).
+
+byte: type (1:sparse, 2:dense, 3:mixed)
+    bit 0 -> 'sparse'
+    bit 1 -> 'dense'
+    (mixed tensors are tagged as both 'sparse' and 'dense')
+
+if ('sparse'):
+    1_4_int: number of mapped dimensions -> 'n_mapped'
+    'n_mapped' times: (sorted by dimension name)
+        small_string: dimension name
+
+if ('dense'):
+    1_4_int: number of indexed dimensions -> 'n_indexed'
+    'n_indexed' times: (sorted by dimension name)
+        small_string: dimension name
+        1_4_int: dimension size (must be at least 1) -> 'size_i'
+
+if ('n_mapped' > 0):
+    1_4_int: number of named dense sub-spaces -> 'n_blocks'
+else:
+    'n_blocks' = 1 (a single dense space)
+
+'n_blocks' times:
+    'n_mapped' times:
+        small_string: dimension label (same order as dimension names)
+    prod('size_i') times: (product of all indexed dimension sizes)
+        double: cell value (last indexed dimension is nested innermost)
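
Reviewer note (not part of the patch): the layout described above can be
sketched as a small Python decoder. This is only an illustration under
stated assumptions, not the Vespa implementation. It assumes that 1_4_int
is one byte for values below 0x80 and otherwise four big-endian bytes with
the top bit set, that small_string is a 1_4_int length followed by that
many bytes, and that double is an 8-byte big-endian IEEE 754 value. The
names Reader and decode_tensor are hypothetical.

```python
import struct
from math import prod


class Reader:
    """Cursor over a bytes object. The wire encodings are assumptions."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("truncated input")
        self.pos += n
        return chunk

    def int_1_4(self):
        # Assumed: 1 byte if value < 0x80, else 4 big-endian bytes, top bit set.
        first = self.take(1)[0]
        if first < 0x80:
            return first
        return struct.unpack(">I", bytes([first & 0x7F]) + self.take(3))[0]

    def small_string(self):
        # Assumed: 1_4_int length prefix followed by UTF-8 bytes.
        return self.take(self.int_1_4()).decode("utf-8")

    def double(self):
        return struct.unpack(">d", self.take(8))[0]


def decode_tensor(data):
    """Return (mapped names, [(indexed name, size)], [(labels, cells)])."""
    r = Reader(data)
    kind = r.take(1)[0]
    if kind not in (1, 2, 3):
        raise ValueError("unknown tensor type byte: %d" % kind)
    mapped, indexed = [], []
    if kind & 1:  # bit 0 -> 'sparse'
        mapped = [r.small_string() for _ in range(r.int_1_4())]
    if kind & 2:  # bit 1 -> 'dense'
        for _ in range(r.int_1_4()):
            name = r.small_string()
            size = r.int_1_4()
            indexed.append((name, size))
    n_blocks = r.int_1_4() if mapped else 1
    block_size = prod(size for _, size in indexed)  # empty product is 1
    blocks = []
    for _ in range(n_blocks):
        labels = [r.small_string() for _ in mapped]
        cells = [r.double() for _ in range(block_size)]
        blocks.append((labels, cells))
    return mapped, indexed, blocks


# Mixed tensor: mapped dimension "x", indexed dimension "y" of size 2.
example = (
    bytes([3])                             # type: mixed
    + bytes([1, 1]) + b"x"                 # n_mapped = 1, name "x"
    + bytes([1, 1]) + b"y" + bytes([2])    # n_indexed = 1, name "y", size 2
    + bytes([2])                           # n_blocks = 2
    + bytes([1]) + b"a" + struct.pack(">2d", 1.0, 2.0)
    + bytes([1]) + b"b" + struct.pack(">2d", 3.0, 4.0)
)
```

With these assumptions, decode_tensor(example) yields the mapped dimension
names, the indexed dimensions with their sizes, and one (labels, cells)
pair per block. A dense tensor and its mixed counterpart (type byte 3 plus
a zero count of mapped dimensions) decode to the same cells.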