vespa-fbench - fastserver benchmarking program
----------------------------------------------


1 Installing vespa-fbench
-------------------------

vespa-fbench is distributed together with Vespa in the published RPM
and Docker image. Using the pre-built vespa-fbench is preferred, but
if you have to compile it yourself, consult the README.md in
https://github.com/vespa-engine/vespa.

Installing Vespa on CentOS / RHEL 7:

  yum-config-manager --add-repo \
      https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
  yum -y install epel-release centos-release-scl
  yum -y install vespa

The above installation provides the following vespa-fbench
executables:

  /opt/vespa/bin/vespa-fbench
  /opt/vespa/bin/vespa-fbench-filter-file
  /opt/vespa/bin/vespa-fbench-geturl
  /opt/vespa/bin/vespa-fbench-result-filter.pl
  /opt/vespa/bin/vespa-fbench-split-file

Additional utilities referenced in this document can be fetched from
https://github.com/vespa-engine/vespa/tree/master/fbench/util:

  plot.pl
  pretest.sh
  runtests.sh
  separate.pl

It is also possible to execute vespa-fbench directly with Docker by
using the pre-built Vespa Docker image:

  docker run --entrypoint /opt/vespa/bin/vespa-fbench \
      docker.io/vespaengine/vespa


2 Benchmark overview
--------------------

vespa-fbench measures the performance of the server by running a
number of clients that send requests to the server in parallel. Each
client has its own input file containing the urls that should be
requested from the server. When benchmarking fastserver, the urls
contained in these files correspond to searches.

Before you can start benchmarking, you must collect the query urls to
be used and distribute them into a number of files, depending on how
many clients you are planning to run in parallel. The most realistic
results are obtained by using access logs collected by fastserver
itself from actual usage (AllTheWeb is a good place to look for such
logs).
You should always collect enough query urls to perform a single test
run without having to reuse any queries.


3 Preparing the test data
-------------------------

This step assumes you have obtained some fastserver access log files.
The first step is to extract the query urls from the logs. This is
done with the 'vespa-fbench-filter-file' program.

| usage: vespa-fbench-filter-file [-a] [-h]
|
| Read concatenated fastserver logs from stdin and write
| extracted query urls to stdout.
|
|  -a : all parameters to the original query urls are preserved.
|       If the -a switch is not given, only 'query' and 'type'
|       parameters are kept in the extracted query urls.
|  -h : print this usage information.

You then need to split the query urls into a number of files. This is
done with the 'splitfile' program (listed among the installed
executables in section 1 as vespa-fbench-split-file).

| usage: splitfile [-p pattern] <numparts> [<file>]
|
|  -p pattern : output name pattern ['query%03d.txt']
|  <numparts> : number of output files to generate.
|
| Reads from <file> (stdin if <file> is not given) and
| randomly distributes each line between <numparts> output
| files. The names of the output files are generated by
| combining the <pattern> with sequential numbers using
| the sprintf function.

Since each parallel client should have its own file, you should split
the query urls into at least as many files as the number of clients
you are planning to run.

Example: the file 'logs' contains access logs from fastserver. You
want to extract the query urls from it and save them into 200
separate files (because you are planning to run 200 clients when
benchmarking). You may do the following:

  $ cat logs | bin/vespa-fbench-filter-file | bin/splitfile 200

This will create 200 files named 'query000.txt', 'query001.txt',
'query002.txt' and so on. You may control the filename pattern of the
output files by using the -p switch with the 'splitfile' program.


4 Running a single test
-----------------------

You are now ready to begin benchmarking. The actual benchmarking is
done with the vespa-fbench program.
vespa-fbench usage information ([] are used to mark optional
parameters and default values):

| usage: vespa-fbench [-n numClients] [-c cycleTime] [-l limit] [-i ignoreCount]
|               [-s seconds] [-q queryFilePattern] [-o outputFilePattern]
|               [-r restartLimit] [-k] <hostname> <port>
|
|  -n <num>     : run with <num> parallel clients [10]
|  -c <num>     : each client will make a request each <num> milliseconds [1000]
|                 ('-1' -> cycle time should be twice the response time)
|  -l <num>     : minimum response size for successful requests [0]
|  -i <num>     : do not log the first <num> results. -1 means no logging [0]
|  -s <num>     : run the test for <num> seconds. -1 means forever [60]
|  -q <pattern> : pattern defining input query files ['query%03d.txt']
|                 (the pattern is used with sprintf to generate filenames)
|  -o <pattern> : save query results to output files with the given pattern
|                 (default is not saving.)
|  -r <num>     : number of times to re-use each query file. -1 means no limit [-1]
|  -k           : disable HTTP keep-alive.
|
|  <hostname>   : the host you want to benchmark.
|  <port>       : the port to use when contacting the host.

The only mandatory parameters are the hostname and the port of the
server you want to benchmark.

Example: You want to test how well fastserver does under massive
pressure by letting 200 clients search continuously as fast as they
can (they issue a new query immediately after the results from the
previous query are obtained). Assuming you have at least 200 query
files with the default filename pattern, you may do the following:

  $ bin/vespa-fbench -n 200 -c 0 <hostname> <port>

This will run the test over a period of 60 seconds. Use the -s option
to change the duration of the test.

Example: You want to manually observe fastserver under a certain
amount of load. You may use vespa-fbench to produce 'background
noise' by using the -s option with argument -1, like this:

  $ bin/vespa-fbench -n 50 -c 1000 -s -1 <hostname> <port>

This will start 50 clients that each ask at most 1 query per second,
giving a maximum load of 50 queries per second if the server allows
it. This test run will run forever due to the '-s -1' option given.
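
The load figure in the last example follows from simple arithmetic:
each client issues at most one request per cycle, so the ceiling is
the client count times 1000 divided by the cycle time in milliseconds.
The sketch below works this out in plain shell. The helper name
max_offered_qps is hypothetical and not part of vespa-fbench. It
assumes integer arithmetic is precise enough, and it reports
'unbounded' for a cycle time of 0 or below, where the request rate is
governed by the response time rather than by the cycle time.

```shell
# Hypothetical helper, not part of vespa-fbench.
# max_offered_qps NUM_CLIENTS CYCLE_TIME_MS
# Prints the highest query rate the clients can offer, as an integer.
max_offered_qps() {
    clients=$1
    cycle_ms=$2
    # With -c 0 (or -c -1) the cycle time sets no fixed ceiling.
    if [ "$cycle_ms" -le 0 ]; then
        echo "unbounded"
        return 0
    fi
    # One request per client per cycle; 1000 ms per second.
    echo $(( clients * 1000 / cycle_ms ))
}

max_offered_qps 50 1000   # prints 50
max_offered_qps 200 0     # prints unbounded
```

This is only an upper bound on what the clients attempt. The rate the
server actually sustained is reported as 'actual query rate' after the
run.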


5 Understanding Benchmarking Results
------------------------------------

After a test run has completed, vespa-fbench outputs various test
results. This section explains what each of these numbers means.

'connection reuse count'
  The number of times an HTTP connection was reused to issue another
  request. Note that this number is only displayed if the -k switch
  (disable HTTP keep-alive) is not used.

'clients'
  Echo of the -n parameter.

'cycle time'
  Echo of the -c parameter.

'lower response limit'
  Echo of the -l parameter.

'skipped requests'
  Number of requests that were skipped by vespa-fbench. vespa-fbench
  will typically skip a request if the line containing the query url
  exceeds a pre-defined limit. Skipped requests have minimal impact
  on the statistical results.

'failed requests'
  The number of failed requests. A request is marked as failed if an
  error occurred while reading the result or if the result contained
  fewer bytes than 'lower response limit'.

'successful requests'
  Number of successful requests. Each performed request is counted as
  either successful or failed. Skipped requests (see above) are not
  performed and therefore not counted.

'cycles not held'
  Number of cycles not held. The cycle time is specified with the -c
  parameter. It defines how often a client should perform a new
  request. However, a client may not perform another request before
  the result from the previous request has been obtained. Whenever a
  client is unable to initiate a new request 'on time' because it has
  not finished the previous request, this value is increased.

'minimum response time'
  The minimum response time. The response time is measured as the
  time period from just before the request is sent to the server
  until the result is obtained from the server.

'maximum response time'
  The maximum response time. The response time is measured as the
  time period from just before the request is sent to the server
  until the result is obtained from the server.

'average response time'
  The average response time. The response time is measured as the
  time period from just before the request is sent to the server
  until the result is obtained from the server.

'X percentile'
  The X percentile of the response time samples; a value selected
  such that X percent of the response time samples are below this
  value. In order to calculate percentiles, a histogram of response
  times is maintained for each client at runtime and merged after the
  test run ends. If a percentile value exceeds the upper bound of
  this histogram, it will be approximated (and thus less accurate)
  and marked with '(approx)'.

'max query rate'
  The cycle time tells each client how often it should perform a
  request. If a client is not able to perform a new request on time
  due to a previous outstanding request, this increases the 'cycles
  not held' counter, and the client will perform the next request as
  soon as the previous one is completed. The opposite may also
  happen; a request may complete in less than the cycle time. In this
  case the client will wait the remaining time of the cycle before
  performing another request. The max query rate is an extrapolated
  value indicating what the query rate would be if no client waited
  for the completion of cycles and the average response time did not
  increase. NOTE: This number is only supplied as an inverse of the
  average response time and should NEVER be used to describe the
  query rate of a server.

'actual query rate'
  The average number of queries per second; QPS.

'utilization'
  The percentage of time used waiting for the server to complete
  (successful) requests. Note that if a request fails, the
  utilization will drop since the client has 'wasted' the time spent
  on the failed request.


6 Running test series
---------------------

For more complete benchmarking you will want to combine the results
from several test runs and present them together in a graph or maybe
a spreadsheet.
The perl script vespa-fbench-result-filter.pl may be used to convert
the output from vespa-fbench into a single line of numbers. Lines of
numbers produced from several test runs may then be concatenated into
the same text file and used to plot a graph with gnuplot, or imported
into an application accepting structured text files (like Excel).

The task described above is performed by the runtests.sh script. It
runs vespa-fbench several times with varying client count and cycle
time. Between test runs, the script pretest.sh (located in the bin
directory) is run. The pretest.sh script should make sure that the
server you want to benchmark is in the same state before each of the
test runs. This typically means that the caches should be cleared.
The supplied pretest.sh file does nothing, and should therefore be
modified to fit your needs before you start benchmarking with the
runtests.sh script.

NOTE: 'runtests.sh' must be run from the vespa-fbench install
directory in order to find the scripts and programs it depends on.
(vespa-fbench is run as 'bin/vespa-fbench' etc.).

| usage: runtests.sh [-o] [-l] <minClients> <maxClients> <deltaClients>
|        <minCycle> <maxCycle> <deltaCycle> [vespa-fbench options] <hostname> <port>
|
| The number of clients varies from <minClients> to <maxClients> with
| <deltaClients> increments. For each client count, the cycle time will
| vary in the same way according to <minCycle>, <maxCycle> and <deltaCycle>.
| vespa-fbench is run with each combination of client count and cycle time, and
| the result output is filtered with the 'vespa-fbench-result-filter.pl' script.
| If you want to save the results you should redirect stdout to a file.
|
|  -o : change the order in which the tests are performed so that client
|       count varies for each cycle time.
|  -l : output a blank line between test subseries. If -o is not specified this
|       will output a blank line between test series using different client count.
|       If -o was specified this will output blank lines between test series
|       using different cycle time.
|
| [vespa-fbench options] : These arguments are passed to vespa-fbench.
|   There are 2 things to remember: first, do not specify either of the -n
|   or -c options since they will override the values for client count and
|   cycle time generated by this script. Secondly, make sure you specify
|   the correct host and port number. See the vespa-fbench usage (run vespa-fbench
|   without parameters) for more info on how to invoke vespa-fbench.

Example: You want to see how well fastserver performs with varying
client count and cycle time. Assume that you have already prepared
200 query files. To test with client counts from 10 to 200 in steps
of 10 clients and cycle times from 0 to 5000 milliseconds in steps of
500 ms, you may do the following:

  $ bin/runtests.sh 10 200 10 0 5000 500 <hostname> <port>

The duration of each test run will be 60 seconds (the default), which
may be a little short. All results will also be written directly to
your console. Say you want each test run to last 5 minutes and you
want to collect the results in the file 'result.txt'. You may then do
the following:

  $ bin/runtests.sh 10 200 10 0 5000 500 -s 300 <hostname> <port> > result.txt

The '-s 300' option is passed on to vespa-fbench, causing each test
run to have a duration of 300 seconds = 5 minutes. The standard
output is simply redirected to a file to collect the results for
future use.

The perl utility scripts separate.pl and plot.pl may be used to
create graphs using gnuplot.

| usage: separate.pl <sepcol>
| Separate a tabular numeric file into chunks using a blank
| line whenever the value in column 'sepcol' changes.

| usage: plot.pl [-h] [-x] <plotno>
| Plot the contents of 'result.txt'.
|  -h      This help
|  -x      Output to X11 window (default PS-file 'graph.ps')
|  plotno: 1: Response Time Percentiles by NumCli
|          2: Rate by NumCli
|          3: Response Time Percentiles by Rate

Note that the separate.pl script does the same thing as the -l option
of runtests.sh; it inserts blank lines into the result to let gnuplot
interpret each chunk as a separate data series.
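
The behaviour described for separate.pl can be approximated with a
short awk filter. This is only a sketch of the idea, not the script
itself. The function name separate_sketch is hypothetical, and the
sketch assumes whitespace-separated columns, a 1-based column number
and input on stdin.

```shell
# Hypothetical stand-in for the behaviour described for separate.pl.
# separate_sketch SEPCOL
# Copies stdin to stdout, printing a blank line before every row whose
# value in column SEPCOL differs from the previous row's value.
separate_sketch() {
    awk -v col="$1" '
        NR > 1 && $col != prev { print "" }   # value changed: start a new chunk
        { print; prev = $col }                # pass the row through, remember its key
    '
}

# Three made-up rows, split on column 1: a blank line appears
# between the second and third row.
printf '10 0 1.5\n10 500 1.2\n20 0 2.9\n' | separate_sketch 1
```

gnuplot treats blank lines in a data file as breaks between chunks,
which is why both this filter and the -l option of runtests.sh insert
them.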