vespa-fbench - fastserver benchmarking program
----------------------------------------------


1 Installing vespa-fbench
-------------------------

vespa-fbench is distributed together with Vespa in the published RPM
and Docker image. Using the pre-built vespa-fbench is preferred, but
if you have to compile it yourself consult the README.md in
https://github.com/vespa-engine/vespa. 

Installing on Vespa CentOS / RHEL 7:
  yum-config-manager --add-repo \
    https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
  yum -y install epel-release centos-release-scl
  yum -y install vespa

The above installation provides the follwing vespa-fbench executables:
  /opt/vespa/bin/vespa-fbench
  /opt/vespa/bin/vespa-fbench-filter-file
  /opt/vespa/bin/vespa-fbench-geturl
  /opt/vespa/bin/vespa-fbench-result-filter.pl
  /opt/vespa/bin/vespa-fbench-split-file

Additional utilities referenced in this document can be fetched from
https://github.com/vespa-engine/vespa/tree/master/fbench/util:
  plot.pl
  pretest.sh
  runtests.sh
  separate.pl

It is also possible to use Docker to directly execute vespa-fbench by
using the pre-built Vespa docker image:
  docker run --entrypoint /opt/vespa/bin/vespa-fbench \
    docker.io/vespaengine/vespa <your fbench options>

2 Benchmark overview
--------------------

vespa-fbench measures the performance of the server by running a number of
clients that send requests to the server in parallel. Each client has
its own input file containing urls that should be requested from the
server. When benchmarking fastserver, the urls contained in these
files correspond to searches. Before you may start benchmarking you
must collect the query urls to be used and distribute them into a
number of files depending on how many clients you are planning to run
in parallel. The most realistic results are obtained by using access
logs collected by fastserver itself from actual usage (AllTheWeb is a
good place to look for such logs). You should always collect enough
query urls to perform a single test run without having to reuse any
queries.


3 Preparing the test data
-------------------------

This step assumes you have obtained some fastserver access log
files. The first step is to extract the query urls from the logs. This
is done with the 'vespa-fbench-filter-file' program.

| usage: vespa-fbench-filter-file [-a] [-h]
| 
| Read concatenated fastserver logs from stdin and write
| extracted query urls to stdout.
| 
|  -a : all parameters to the original query urls are preserved.
| 	If the -a switch is not given, only 'query' and 'type'
| 	parameters are kept in the extracted query urls.
|  -h : print this usage information.

You then need to split the query urls into a number of files. This is
done with the 'splitfile' program.

| usage: splitfile [-p pattern] <numparts> [<file>]
| 
|  -p pattern : output name pattern ['query%03d.txt']
|  <numparts> : number of output files to generate.
| 
| Reads from <file> (stdin if <file> is not given) and
| randomly distributes each line between <numpart> output
| files. The names of the output files are generated by
| combining the <pattern> with sequential numbers using
| the sprintf function.

Since each parallel client should have its own file, you should split
the query urls into at least as many files as the number of clients
you are planning to run.

Example: the file 'logs' contains access logs from fastserver. You
want to extract the query urls from it and save the query urls into
200 separate files (because you are planning to run 200 clients when
benchmarking). You may do the following:

$ cat logs | bin/vespa-fbench-filter-file | bin/splitfile 200

This will create 200 files with names 'query000.txt', 'query001.txt',
'query002.txt' etc. You may control the filename pattern of the output
files by using the -p switch with the 'splitfile' program.


4 Running a single test
-----------------------

You are now ready to begin benchmarking. The actual benchmarking is
done with the vespa-fbench program. vespa-fbench usage information ([] are used to
mark optional parameters and default values):

| usage: vespa-fbench [-n numClients] [-c cycleTime] [-l limit] [-i ignoreCount]
| 		[-s seconds] [-q queryFilePattern] [-o outputFilePattern]
| 		[-r restartLimit] [-k] <hostname> <port>
| 
|  -n <num> : run with <num> parallel clients [10]
|  -c <num> : each client will make a request each <num> milliseconds [1000]
| 	      ('-1' -> cycle time should be twice the response time)
|  -l <num> : minimum response size for successful requests [0]
|  -i <num> : do not log the <num> first results. -1 means no logging [0]
|  -s <num> : run the test for <num> seconds. -1 means forever [60]
|  -q <str> : pattern defining input query files ['query%03d.txt']
| 	      (the pattern is used with sprintf to generate filenames)
|  -o <str> : save query results to output files with the given pattern
| 	      (default is not saving.)
|  -r <num> : number of times to re-use each query file. -1 means no limit [-1]
|  -k       : disable HTTP keep-alive.
| 
|  <hostname> : the host you want to benchmark.
|  <port>     : the port to use when contacting the host.

The only mandatory parameters are the hostname and the port of the
server you want to benchmark.

Example: You want to test just how well fastserver does under massive
preassure by letting 200 clients search continuously as fast as they
can (they issue new queries immediately after the results from the
previous query are obtained). Assuming you have at least 200 query
files with default filename pattern you may do the following:

$ bin/vespa-fbench -n 200 -c 0 <host> <port>

This will run the test over a period of 60 seconds. Use the -s option
to change the duration of the test.

Example: You want to manually observe fastserver with a certain amount
of load. You may use vespa-fbench to produce 'background noise' by using the
-s option with argument 0, like this:

$ bin/vespa-fbench -n 50 -c 1000 -s 0 <host> <port>

This will start 50 clients that ask at most 1 query per second each,
giving a maximum load of 50 queries per second if the server allows
it. This test run will run forever due to the '-s 0' option given.


5 Understanding Benchmarking Results
------------------------------------

After a test run has completed, vespa-fbench outputs various test
results. This section will explain what each of these numbers mean.

'connection reuse count' This value indicates how many times HTTP
                         connections were reused to issue another
                         request. Note that this number will only be
                         displayed if the -k switch (disable HTTP
                         keep-alive) is not used.

'clients'                Echo of the -n parameter.

'cycle time'             Echo of the -c parameter.

'lower response limit'   Echo of the -l parameter.

'skipped requests'       Number of requests that was skipped by
                         vespa-fbench. vespa-fbench will typically skip a request
                         if the line containing the query url exceeds
                         a pre-defined limit. Skipped requests will
                         have minimal impact on the statistical
                         results.

'failed requests'        The number of failed requests. A request will be
                         marked as failed if en error occurred while
                         reading the result or if the result contained
                         less bytes than 'lower response limit'.

'successful requests'    Number of successful requests. Each performed
                         request is counted as either successful or
                         failed. Skipped requests (see above) are not
                         performed and therefore not counted.

'cycles not held'        Number of cycles not held. The cycle time is
                         specified with the -c parameter. It defines
                         how often a client should perform a new
                         request. However, a client may not perform
                         another request before the result from the
                         previous request has been obtained. Whenever a
                         client is unable to initiate a new request
                         'on time' due to not being finished with the
                         previous request, this value will be
                         increased.

'minimum response time'  The minimum response time. The response time
                         is measured as the time period from just
                         before the request is sent to the server,
                         till the result is obtained from the server.

'maximum response time'  The maximum response time. The response time
                         is measured as the time period from just
                         before the request is sent to the server,
                         till the result is obtained from the server.

'average response time'  The average response time. The response time
                         is measured as the time period from just
                         before the request is sent to the server,
                         till the result is obtained from the server.

'X percentile'           The X percentile of the response time samples;
                         a value selected such that X percent of the
                         response time samples are below this
                         value. In order to calculate percentiles, a
                         histogram of response times is maintained for
                         each client at runtime and merged after the
                         test run ends. If a percentile value exceeds
                         the upper bound of this histogram, it will be
                         approximated (and thus less accurate) and
                         marked with '(approx)'.

'max query rate'         The cycle time tells each client how often it
                         should perform a request. If a client is not
                         able to perform a new request on time due to
                         a previous outstanding request, this
                         increases the overtime counter, and the
                         client will preform the next request as soon
                         as the previous one is completed. The
                         opposite may also happen; a request may
                         complete in less than the cycle time. In this
                         case the client will wait the remaining time
                         of the cycle before performing another
                         request. The max query rate is an
                         extrapolated value indicating what the query
                         rate would be if no client would wait for the
                         completion of cycles, and that the average
                         response time would not increase. NOTE: This
                         number is only supplied as an inverse of the
                         average response time and should NEVER be
                         used to describe the query rate of a server.

'actual query rate'      The average number of queries per second; QPS.

'utilization'            The percentage of time used waiting for
                         the server to complete (successful)
                         requests. Note that if a request fails, the
                         utilization will drop since the client has
                         'wasted' the time spent on the failed
                         request.


6 Running test series
---------------------

For more complete benchmarking you will want to combine the results
from several test runs and present them together in a graph or maybe a
spreadsheet. The perl script vespa-fbench-result-filter.pl may be used to convert
the output from vespa-fbench into a single line of numbers. Lines of numbers
produced from several test runs may then be concatenated into the same
text file and used to plot a graph with gnuplot or imported into an
application accepting structured text files (like Excel).

The task described above is performed by the runtests.sh script. It
runs vespa-fbench several times with varying client count and cycle
time. Between each test run, the script pretest.sh (located in the bin
directory) is run. The pretest.sh script should make sure that the
server you want to benchmark is in the same state before each of the
test runs. This typically means that the caches should be cleared. The
supplied pretest.sh file does nothing, and should therefore be
modified to fit your needs before you start benchmarking with the
runtests.sh script. NOTE: 'runtests.sh' must be run from the vespa-fbench
install directory in order to find the scripts and programs it depends
on.  (vespa-fbench is run as 'bin/vespa-fbench' etc.).

| usage: runtests.sh [-o] [-l] <minClients> <maxClients> <deltaClients>
| 	   <minCycle> <maxCycle> <deltaCycle> [vespa-fbench options] <hostname> <port>
| 
| The number of clients varies from <minClients> to <maxClients> with
| <deltaClients> increments. For each client count, the cycle time will
| vary in the same way according to <minCycle>, <maxCycle> and <deltaCycle>.
| vespa-fbench is run with each combination of client count and cycle time, and
| the result output is filtered with the 'vespa-fbench-result-filter.pl' script.
| If you want to save the results you should redirect stdout to a file.
| 
|  -o : change the order in which the tests are performed so that client
| 	count varies for each cycle time.
|  -l : output a blank line between test subseries. If -o is not specified this
| 	will output a blank line between test series using different client count.
| 	If -o was specified this will output blank lines between test series
| 	using different cycle time.
| 
| [vespa-fbench options] <hostname> <port>: These arguments are passed to vespa-fbench.
|   There are 2 things to remember: first; do not specify either of the -n
|   or -c options since they will override the values for client count and
|   cycle time generated by this script. secondly; make sure you specify
|   the correct host and port number. See the vespa-fbench usage (run vespa-fbench
|   without parameters) for more info on how to invoke vespa-fbench.

Example: You want to see how well fastserver performs with varying
client count and cycle time. Assume that you have already prepared 200
query files. To test with client count
from 10 to 200 with intervals of 10 clients and cycle time from 0 to
5000 milliseconds with 500 ms intervals you may do the following:

$ bin/runtests.sh 10 200 10 0 5000 500 <host> <port>

The duration of each test run will be 60 seconds (the default). This
may be a little short. You will also get all results written directly
to your console. Say you want to run each test run for 5 minutes and
you want to collect the results in the file 'results.txt'. You may
then do the following:

$ bin/runtests.sh 10 200 10 0 5000 500 -s 300 <host> <port> > result.txt

The '-s 300' option will be given to vespa-fbench causing each test run to
have a duration of 300 seconds = 5 minutes. The standard output is
simply redirected to a file to collect the results for future use.

The perl utility scripts separate.pl and plot.pl may be used to create
graphs using gnuplot.

| usage: separate.pl <sepcol>
| 	 Separate a tabular numeric file into chunks using a blank
| 	 line whenever the value in column 'sepcol' changes.

| usage: plot.pl [-h] [-x] <plotno>
| Plot the contents of 'result.txt'.
| 	  -h      This help
| 	  -x      Output to X11 window (default PS-file 'graph.ps')
| 	  plotno: 1: Response Time Percentiles by NumCli
| 		  2: Rate by NumCli
| 		  3: Response Time Percentiles by Rate

Note that the separate.pl script does the same thing as the -l option
of runtests.sh; it inserts blank lines into the result to let gnuplot
interpret each chunk as a separate dataseries.