Diffstat (limited to 'vespaclient-java/src/main/sh/vespa-visit.1')
-rw-r--r-- vespaclient-java/src/main/sh/vespa-visit.1 160
1 files changed, 160 insertions, 0 deletions
diff --git a/vespaclient-java/src/main/sh/vespa-visit.1 b/vespaclient-java/src/main/sh/vespa-visit.1
new file mode 100644
index 00000000000..7b8f7521865
--- /dev/null
+++ b/vespaclient-java/src/main/sh/vespa-visit.1
@@ -0,0 +1,160 @@
+.\" Copyright 2017 Yahoo Inc. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
+.TH VESPAVISIT 1 2008-03-07 "Vespa" "Vespa Documentation"
+.SH NAME
+vespa-visit \- Visit documents from a Vespa installation
+.SH SYNOPSIS
+.B vespa-visit
+[\fIOPTION\fR]...
+.SH DESCRIPTION
+.PP
+In the regular case, retrieve documents stored in VESPA, and either print
+them to STDOUT or send them to a given MessageBus route.
+.PP
+A Vespa visit operation processes a set of stored documents, in undefined
+order, locally on the storage nodes where they are stored. A visitor library
+available on all storage nodes will receive the documents stored locally, and
+can process these and send messages to the visitor data handler. The regular
+case is to use the DumpVisitor library to merely send the documents themselves
+in blocks back to the data handler, which by default is this client that will
+write the documents to STDOUT.
+.PP
+Mandatory arguments to long options are mandatory for short options too.
+Short options cannot currently be combined into a single argument.
+.TP
+\fB\-s\fR, \fB\-\-selection\fR \fISELECTION\fR
+A document selection string, specifying which documents to visit. The
+selection language is described in the Vespa documentation. Quote this
+argument so that your shell does not interpret special characters in the
+selection.
+.TP
+\fB\-f\fR, \fB\-\-from\fR \fITIME\fR
+If this option is given, only documents with the given timestamp or newer will
+be visited. The time is given in microseconds since 1970.
+.TP
+\fB\-t\fR, \fB\-\-to\fR \fITIME\fR
+If this option is given, only documents up to and including the given timestamp
+will be visited. The time is given in microseconds since 1970.
+.TP
+\fB\-e\fR, \fB\-\-headersonly\fR
+By default, whole documents are processed. If this option is given,
+only the header parts of documents will be processed. By defining the big
+document fields as body fields, you can efficiently visit all the header fields
+using this option.
+.TP
+\fB\-i\fR, \fB\-\-printids\fR
+With this option, only the document identifiers will be printed to STDOUT.
+In addition, if visiting removes, an additional tag will be added so you can
+see whether a document has been removed or not. This option implies headers-only
+visiting, and can only be used if no data handler is specified.
+.TP
+\fB\-d\fR, \fB\-\-datahandler\fR \fIVISITTARGET\fR
+The data handler is the destination of messages sent from the visitor library.
+By default, the data handler is the vespa-visit process you start, which will
+merely print all returned data to STDOUT. A visit target can be specified
+instead. See the VISIT TARGET section below.
+.TP
+\fB\-p\fR, \fB\-\-progress\fR \fIFILE\fR
+By setting a progress file, current visitor progress will be saved to this
+file at regular intervals. If this file exists on startup, the visitor will
+continue from this point.
+.TP
+\fB\-o\fR, \fB\-\-timeout\fR \fITIMEOUT\fR
+Time out the visitor after the given number of milliseconds.
+.TP
+\fB\-r\fR, \fB\-\-visitremoves\fR
+By default, only documents existing in Vespa will be processed. With
+this option, entries identifying previously existing documents will also
+be returned. This is useful for secondary copies of the data that need to know
+whether documents they have stored have been removed. Note that documents deleted
+a long time ago will no longer be tracked. Vespa keeps remove entries for
+a configurable amount of time.
+.TP
+\fB\-m\fR, \fB\-\-maxpending\fR \fINUM\fR
+Maximum pending docblock messages to data handlers. This may be used to
+increase or reduce visiting speed, but should not be set so high that data
+handlers run out of memory. To get an estimate of memory consumption on each
+data handler, multiply maxpending by defaultdocblocksize in the stor-visitor
+config and divide by the number of data handlers. The default value for
+maxpending is 16.
+.TP
+\fB\-c\fR, \fB\-\-cluster\fR \fICLUSTER\fR
+Visit the given VDS cluster.
+.TP
+\fB\-v\fR, \fB\-\-verbose\fR
+More verbose output. Indent XML and add progress and info to STDERR.
+.TP
+\fB\-h\fR, \fB\-\-help\fR
+Shows a short syntax reminder.
+.PP
+Advanced options:
+.PP
+The options below are intended for advanced usage or for testing.
+.TP
+\fB\-\-visitlibrary\fR \fILIBRARY\fR
+By default, the DumpVisitor library, sending documents back to the data handler,
+is used when visiting. Another library can be specified using this option. The
+library filename should be the name given here, with lib prepended and .so
+appended.
+.TP
+\fB\-\-libraryparam\fR \fIKEY\fR \fIVALUE\fR
+The default DumpVisitor library has no options to set, but custom libraries
+may need user-specifiable options, which can be given here. See the
+visitor library documentation for legal parameters.
+.TP
+\fB\-\-polling\fR \fIarg\fR
+The document API implements both a polling and a callback visitor API. The
+callback API is most efficient and used by default. The polling API might be
+simpler for users used to such APIs. Some VESPA system tests use this option
+to test that the polling API works.
+.TP
+\fB\-\-visitinconsistentbuckets\fR
+In some cases Vespa may temporarily be in an inconsistent state, that is,
+different nodes contain different copies of the data. Collections of documents
+are grouped into so-called buckets. The normal behavior of visiting is to wait
+for the inconsistencies to resolve before actually visiting the data. This
+might be a problem for time-critical applications. With this option, the
+bucket copy with the most documents is visited in case of
+inconsistencies, which means that the data returned by the visitor is not
+guaranteed to be correct.
+.SH VISIT TARGET
+Results from visiting can be sent to several different kinds of targets.
+.TP
+\fBMessage bus routes\fR
+You can specify a message bus route name directly, and this route will be used
+to send the results. This is typically used when doing reprocessing within
+Vespa. Message bus routes are set up in the application package. In addition,
+some routes may have been autogenerated in simple setups; for instance, a
+route called \fIdefault\fR is generated if your setup is so simple that Vespa
+can guess where you want to send your data.
+.TP
+\fBSlobrok address\fR
+You can also specify a slobrok address for data to be sent to. A slobrok address
+is a slash-separated path where you can use an asterisk to mean any element within
+this path. For instance, if you have a docproc cluster called \fImydpcluster\fR
+it will have registered its nodes with slobrok names like
+\fIdocproc/cluster.mydpcluster/docproc/0/feed_processor\fR, where the 0 here
+indicates the first node in the cluster. You can thus specify to send visit data
+to this docproc cluster by stating a slobrok address of
+\fIdocproc/cluster.mydpcluster/docproc/*/feed_processor\fR. Note that this
+neither sends all the data to a single node nor copies it to every node. The
+data sent from the visitor is distributed among the matching nodes, and each
+message is sent to exactly one of them.
+
+Slobrok names may also be used if you use the \fBvespa-visit-target\fR tool to
+retrieve the data at some location. If you start vespa-visit-target on two nodes,
+listening to slobrok names \fImynode/0/visit-destination\fR and
+\fImynode/1/visit-destination\fR you can send the results to these nodes by
+specifying \fImynode/*/visit-destination\fR as the data handler. See
+\fBman vespa-visit-target\fR for naming conventions used for such targets.
+.TP
+\fBTCP socket\fR
+TCP sockets can also be specified directly. This requires that the endpoint
+speaks FNET RPC. This is typically done either by using the
+\fBvespa-visit-target\fR tool, or by creating a visitor destination programmatically
+with a utility class in the document API. A socket address looks like the
+following: tcp/\fIhostname\fR:\fIport\fR/\fIservicename\fR. For instance, an
+address generated by the \fBvespa-visit-target\fR tool might look like the
+following: \fItcp/myhost.com:12345/visit-destination\fR.
+
+.SH AUTHOR
+Written by Haakon Humberset.