vespa-fbench - fastserver benchmarking program
----------------------------------------------

1 Installing vespa-fbench
-------------------------

vespa-fbench is distributed together with Vespa in the published RPM
and Docker image. Using the pre-built vespa-fbench is preferred, but if
you have to compile it yourself, consult the README.md in
https://github.com/vespa-engine/vespa.

Installing on Vespa CentOS / RHEL 7:

    yum-config-manager --add-repo \
        https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
    yum -y install epel-release centos-release-scl
    yum -y install vespa

The above installation provides the following vespa-fbench executables:

    /opt/vespa/bin/vespa-fbench
    /opt/vespa/bin/vespa-fbench-filter-file
    /opt/vespa/bin/vespa-fbench-geturl
    /opt/vespa/bin/vespa-fbench-split-file

It is also possible to use Docker to execute vespa-fbench directly by
using the pre-built Vespa docker image:

    docker run --entrypoint /opt/vespa/bin/vespa-fbench \
        docker.io/vespaengine/vespa

2 Benchmark overview
--------------------

vespa-fbench measures the performance of the server by running a number
of clients that send requests to the server in parallel. Each client
has its own input file containing urls that should be requested from
the server. When benchmarking fastserver, the urls contained in these
files correspond to searches.

Before you can start benchmarking, you must collect the query urls to
be used and distribute them into a number of files, depending on how
many clients you are planning to run in parallel. The most realistic
results are obtained by using access logs collected by fastserver
itself from actual usage (AllTheWeb is a good place to look for such
logs). You should always collect enough query urls to perform a single
test run without having to reuse any queries.

3 Preparing the test data
-------------------------

This step assumes you have obtained some fastserver access log files.
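If you are using the Docker image from section 1, the access logs and
the query files you are about to generate can be shared with the
container through a volume mount. A minimal sketch, assuming the
current directory holds a log file named 'logs' (the paths and file
names here are hypothetical examples, not part of the tool):

```shell
# Assumption: the current directory contains the fastserver access
# logs in a file called 'logs'. Mount it into the container and run
# the preparation tool from there; -i keeps stdin open for the pipe.
docker run -i -v "$(pwd)":/work -w /work \
    --entrypoint /opt/vespa/bin/vespa-fbench-filter-file \
    docker.io/vespaengine/vespa < logs > urls.txt
```

The same mount can later be reused to run vespa-fbench itself against
the generated query files.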
The first step is to extract the query urls from the logs. This is
done with the 'vespa-fbench-filter-file' program.

| usage: vespa-fbench-filter-file [-a] [-h]
|
| Read concatenated fastserver logs from stdin and write
| extracted query urls to stdout.
|
| -a : all parameters to the original query urls are preserved.
|      If the -a switch is not given, only 'query' and 'type'
|      parameters are kept in the extracted query urls.
| -h : print this usage information.

You then need to split the query urls into a number of files. This is
done with the 'vespa-fbench-split-file' program.

| usage: vespa-fbench-split-file [-p pattern] <numparts> [<file>]
|
| -p pattern : output name pattern ['query%03d.txt']
| <numparts> : number of output files to generate.
|
| Reads from <file> (stdin if <file> is not given) and
| randomly distributes each line between the <numparts> output
| files. The names of the output files are generated by
| combining the pattern with sequential numbers using
| the sprintf function.

Since each parallel client should have its own file, you should split
the query urls into at least as many files as the number of clients
you are planning to run.

Example: the file 'logs' contains access logs from fastserver. You
want to extract the query urls from it and save them into 200 separate
files (because you are planning to run 200 clients when benchmarking).
You may do the following:

$ cat logs | bin/vespa-fbench-filter-file | bin/vespa-fbench-split-file 200

This will create 200 files named 'query000.txt', 'query001.txt',
'query002.txt' and so on. You may control the filename pattern of the
output files by using the -p switch with the 'vespa-fbench-split-file'
program.

4 Running a single test
-----------------------

You are now ready to begin benchmarking. The actual benchmarking is
done with the vespa-fbench program.
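The filename patterns used in the preparation step are expanded with
sprintf-style numbering, the same mechanism vespa-fbench's own -q
option (described below) uses to find its input files. The shell's
printf behaves the same way, so you can preview the names a given
pattern will produce; the pattern 'part%02d.txt' and the count of 4
below are arbitrary examples:

```shell
# Preview the filenames that splitting into 4 files with the
# (hypothetical) pattern 'part%02d.txt' would generate. printf expands
# %02d exactly like the sprintf call inside the tool.
for i in 0 1 2 3; do
    printf 'part%02d.txt\n' "$i"
done
```

This prints part00.txt through part03.txt, which is also what you
would pass to vespa-fbench as '-q part%02d.txt'.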
vespa-fbench usage information ([] are used to mark optional
parameters and default values):

| usage: vespa-fbench [-n numClients] [-c cycleTime] [-l limit] [-i ignoreCount]
|                     [-s seconds] [-q queryFilePattern] [-o outputFilePattern]
|                     [-r restartLimit] [-k] <hostname> <port>
|
| -n <num> : run with <num> parallel clients [10]
| -c <num> : each client will make a request each <num> milliseconds [0]
|            ('-1' -> cycle time should be twice the response time)
| -l <num> : minimum response size for successful requests [0]
| -i <num> : do not log the first <num> results. -1 means no logging [0]
| -s <num> : run the test for <num> seconds. -1 means forever [60]
| -q <pat> : pattern defining input query files ['query%03d.txt']
|            (the pattern is used with sprintf to generate filenames)
| -o <pat> : save query results to output files with the given pattern
|            (default is not saving.)
| -r <num> : number of times to re-use each query file. -1 means no limit [-1]
| -k       : disable HTTP keep-alive.
|
| <hostname> : the host you want to benchmark.
| <port>     : the port to use when contacting the host.

The only mandatory parameters are the hostname and the port of the
server you want to benchmark.

Example: You want to test just how well fastserver does under massive
pressure by letting 200 clients search continuously as fast as they
can (each client issues a new query immediately after the results from
the previous query are obtained). Assuming you have at least 200 query
files with the default filename pattern, you may do the following:

$ bin/vespa-fbench -n 200 -c 0 <hostname> <port>

This will run the test over a period of 60 seconds. Use the -s option
to change the duration of the test.

Example: You want to manually observe fastserver under a certain
amount of load. You may use vespa-fbench to produce 'background noise'
by using the -s option with argument 0, like this:

$ bin/vespa-fbench -n 50 -c 1000 -s 0 <hostname> <port>

This will start 50 clients that each issue at most 1 query per second,
giving a maximum load of 50 queries per second if the server allows
it. This test run will run forever due to the '-s 0' option.
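The 'background noise' example above generalizes: with n clients and a
cycle time of t milliseconds, the offered load is at most n * 1000 / t
queries per second, so for a target QPS you can solve for t. A small
sketch with hypothetical numbers (50 clients, a target of 100 QPS):

```shell
# Compute the -c value needed for a target offered load.
# offered QPS <= clients * 1000 / cycle_time_ms
# => cycle_time_ms = clients * 1000 / target_qps
# The numbers below are hypothetical examples.
clients=50
target_qps=100
awk -v n="$clients" -v q="$target_qps" \
    'BEGIN { printf "use -c %d\n", n * 1000 / q }'
```

This prints 'use -c 500': 50 clients firing every 500 ms offer at most
100 queries per second, matching the 50-client / '-c 1000' / 50 QPS
relationship in the example above.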
5 Understanding Benchmarking Results
------------------------------------

After a test run has completed, vespa-fbench outputs various test
results. This section explains what each of these numbers means.

'connection reuse count'
This value indicates how many times HTTP connections were reused to
issue another request. Note that this number will only be displayed if
the -k switch (disable HTTP keep-alive) is not used.

'clients'
Echo of the -n parameter.

'cycle time'
Echo of the -c parameter.

'lower response limit'
Echo of the -l parameter.

'skipped requests'
Number of requests that were skipped by vespa-fbench. vespa-fbench
will typically skip a request if the line containing the query url
exceeds a pre-defined limit. Skipped requests have minimal impact on
the statistical results.

'failed requests'
The number of failed requests. A request is marked as failed if an
error occurred while reading the result or if the result contained
fewer bytes than 'lower response limit'.

'successful requests'
Number of successful requests. Each performed request is counted as
either successful or failed. Skipped requests (see above) are not
performed and therefore not counted.

'cycles not held'
Number of cycles not held. The cycle time is specified with the -c
parameter. It defines how often a client should perform a new request.
However, a client may not perform another request before the result
from the previous request has been obtained. Whenever a client is
unable to initiate a new request 'on time' because the previous
request has not yet completed, this value is increased.

'minimum response time'
The minimum response time. The response time is measured as the time
period from just before the request is sent to the server until the
result is obtained from the server.

'maximum response time'
The maximum response time. The response time is measured the same way
as for the minimum response time.
'average response time'
The average response time. The response time is measured the same way
as for the minimum response time.

'X percentile'
The X percentile of the response time samples; a value selected such
that X percent of the response time samples are below this value. In
order to calculate percentiles, a histogram of response times is
maintained for each client at runtime and merged after the test run
ends. If a percentile value exceeds the upper bound of this histogram,
it will be approximated (and thus less accurate) and marked with
'(approx)'.

'max query rate'
The cycle time tells each client how often it should perform a
request. If a client is unable to perform a new request on time
because a previous request is still outstanding, the overtime counter
is increased and the client performs the next request as soon as the
previous one completes. The opposite may also happen: a request may
complete in less than the cycle time, in which case the client waits
out the remainder of the cycle before performing another request. The
max query rate is an extrapolated value indicating what the query rate
would be if no client ever had to wait for a cycle to complete and the
average response time did not increase. NOTE: This number is only
supplied as an inverse of the average response time and should NEVER
be used to describe the query rate of a server.

'actual query rate'
The average number of queries per second; QPS.

'utilization'
The percentage of time spent waiting for the server to complete
(successful) requests. Note that if a request fails, the utilization
drops, since the client has 'wasted' the time spent on the failed
request.
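The extrapolation behind 'max query rate' can be reproduced by hand.
Assuming, per the description above, that it scales the inverse of the
average response time by the number of clients (the concrete numbers
below, 200 clients and a 100 ms average, are hypothetical):

```shell
# Sketch of the 'max query rate' extrapolation: with c clients each
# able to complete a request every ms milliseconds on average, the
# extrapolated ceiling is c * 1000 / ms queries per second.
# Hypothetical inputs, not real measurements:
clients=200
avg_response_ms=100
awk -v c="$clients" -v ms="$avg_response_ms" \
    'BEGIN { printf "max query rate: %.2f q/s\n", c * 1000 / ms }'
```

This prints 2000.00 q/s, which illustrates why the number should not
be read as server capacity: it assumes the average response time stays
flat no matter how hard the clients push.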