vespa-fbench - fastserver benchmarking program
----------------------------------------------

1 Installing vespa-fbench
-------------------------

vespa-fbench is distributed together with Vespa in the published RPM
and Docker image. Using the pre-built vespa-fbench is preferred, but if
you have to compile it yourself, consult the README.md in
https://github.com/vespa-engine/vespa.

Installing on Vespa CentOS / RHEL 7:

    yum-config-manager --add-repo \
        https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
    yum -y install epel-release centos-release-scl
    yum -y install vespa

The above installation provides the following vespa-fbench executables:

    /opt/vespa/bin/vespa-fbench
    /opt/vespa/bin/vespa-fbench-filter-file
    /opt/vespa/bin/vespa-fbench-geturl
    /opt/vespa/bin/vespa-fbench-split-file

It is also possible to use Docker to execute vespa-fbench directly by
using the pre-built Vespa docker image:

    docker run --entrypoint /opt/vespa/bin/vespa-fbench \
        docker.io/vespaengine/vespa

2 Benchmark overview
--------------------

vespa-fbench measures the performance of the server by running a number
of clients that send requests to the server in parallel. Each client
has its own input file containing urls that should be requested from
the server. When benchmarking fastserver, the urls contained in these
files correspond to searches.

Before you can start benchmarking, you must collect the query urls to
be used and distribute them into a number of files, depending on how
many clients you are planning to run in parallel. The most realistic
results are obtained by using access logs collected by fastserver
itself from actual usage (AllTheWeb is a good place to look for such
logs). You should always collect enough query urls to perform a single
test run without having to reuse any queries.

3 Preparing the test data
-------------------------

This step assumes you have obtained some fastserver access log files.
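If you are using the Docker image from section 1, the access logs and
the query files you are about to generate can be shared with the
container through a volume mount. A minimal sketch, assuming the
current directory holds a log file named 'logs' (the paths and file
names here are hypothetical examples, not part of the tool):

```shell
# Assumption: the current directory contains the fastserver access
# logs in a file called 'logs'. Mount it into the container and run
# the preparation tool from there; -i keeps stdin open for the pipe.
docker run -i -v "$(pwd)":/work -w /work \
    --entrypoint /opt/vespa/bin/vespa-fbench-filter-file \
    docker.io/vespaengine/vespa < logs > urls.txt
```

The same mount can later be reused to run vespa-fbench itself against
the generated query files.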
The first step is to extract the query urls from the logs. This is
done with the 'vespa-fbench-filter-file' program.

| usage: vespa-fbench-filter-file [-a] [-h]
|
| Read concatenated fastserver logs from stdin and write
| extracted query urls to stdout.
|
| -a : all parameters to the original query urls are preserved.
|      If the -a switch is not given, only 'query' and 'type'
|      parameters are kept in the extracted query urls.
| -h : print this usage information.

You then need to split the query urls into a number of files. This is
done with the 'vespa-fbench-split-file' program.

| usage: vespa-fbench-split-file [-p pattern] <numparts> [<file>]
|
| -p pattern : output name pattern ['query%03d.txt']
| <numparts> : number of output files to generate.
|
| Reads from <file> (stdin if <file> is not given) and
| randomly distributes each line between the <numparts> output
| files. The names of the output files are generated by
| combining the pattern with sequential numbers using
| the sprintf function.

Since each parallel client should have its own file, you should split
the query urls into at least as many files as the number of clients
you are planning to run.

Example: the file 'logs' contains access logs from fastserver. You
want to extract the query urls from it and save them into 200 separate
files (because you are planning to run 200 clients when benchmarking).
You may do the following:

$ cat logs | bin/vespa-fbench-filter-file | bin/vespa-fbench-split-file 200

This will create 200 files named 'query000.txt', 'query001.txt',
'query002.txt' and so on. You may control the filename pattern of the
output files by using the -p switch with the 'vespa-fbench-split-file'
program.

4 Running a single test
-----------------------

You are now ready to begin benchmarking. The actual benchmarking is
done with the vespa-fbench program.
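The filename patterns used in the preparation step are expanded with
sprintf-style numbering, the same mechanism vespa-fbench's own -q
option (described below) uses to find its input files. The shell's
printf behaves the same way, so you can preview the names a given
pattern will produce; the pattern 'part%02d.txt' and the count of 4
below are arbitrary examples:

```shell
# Preview the filenames that splitting into 4 files with the
# (hypothetical) pattern 'part%02d.txt' would generate. printf expands
# %02d exactly like the sprintf call inside the tool.
for i in 0 1 2 3; do
    printf 'part%02d.txt\n' "$i"
done
```

This prints part00.txt through part03.txt, which is also what you
would pass to vespa-fbench as '-q part%02d.txt'.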
vespa-fbench usage information ([] are used to mark optional
parameters and default values):

| usage: vespa-fbench [-n numClients] [-c cycleTime] [-l limit] [-i ignoreCount]
|                     [-s seconds] [-q queryFilePattern] [-o outputFilePattern]
|                     [-r restartLimit] [-k] <hostname> <port>
|
| -n <num> : run with <num> parallel clients [10]
| -c <num> : each client will make a request each <num> milliseconds [0]
|            ('-1' -> cycle time should be twice the response time)
| -l <num> : minimum response size for successful requests [0]
| -i <num> : do not log the first <num> results. -1 means no logging [0]
| -s <num> : run the test for <num> seconds. -1 means forever [60]
| -q <pat> : pattern defining input query files ['query%03d.txt']
|            (the pattern is used with sprintf to generate filenames)
| -o <pat> : save query results to output files with the given pattern
|            (default is not saving.)
| -r <num> : number of times to re-use each query file. -1 means no limit [-1]
| -k       : disable HTTP keep-alive.
|
| <hostname> : the host you want to benchmark.
| <port>     : the port to use when contacting the host.

The only mandatory parameters are the hostname and the port of the
server you want to benchmark.

Example: You want to test just how well fastserver does under massive
pressure by letting 200 clients search continuously as fast as they
can (each client issues a new query immediately after the results from
the previous query are obtained). Assuming you have at least 200 query
files with the default filename pattern, you may do the following:

$ bin/vespa-fbench -n 200 -c 0 <hostname> <port>

This will run the test over a period of 60 seconds. Use the -s option
to change the duration of the test.

Example: You want to manually observe fastserver under a certain
amount of load. You may use vespa-fbench to produce 'background noise'
by using the -s option with argument 0, like this:

$ bin/vespa-fbench -n 50 -c 1000 -s 0 <hostname> <port>

This will start 50 clients that each issue at most 1 query per second,
giving a maximum load of 50 queries per second if the server allows
it. This test run will run forever due to the '-s 0' option.
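The 'background noise' example above generalizes: with n clients and a
cycle time of t milliseconds, the offered load is at most n * 1000 / t
queries per second, so for a target QPS you can solve for t. A small
sketch with hypothetical numbers (50 clients, a target of 100 QPS):

```shell
# Compute the -c value needed for a target offered load.
# offered QPS <= clients * 1000 / cycle_time_ms
# => cycle_time_ms = clients * 1000 / target_qps
# The numbers below are hypothetical examples.
clients=50
target_qps=100
awk -v n="$clients" -v q="$target_qps" \
    'BEGIN { printf "use -c %d\n", n * 1000 / q }'
```

This prints 'use -c 500': 50 clients firing every 500 ms offer at most
100 queries per second, matching the 50-client / '-c 1000' / 50 QPS
relationship in the example above.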
5 Understanding Benchmarking Results
------------------------------------

After a test run has completed, vespa-fbench outputs various test
results. This section explains what each of these numbers means.

'connection reuse count'
This value indicates how many times HTTP connections were reused to
issue another request. Note that this number will only be displayed if
the -k switch (disable HTTP keep-alive) is not used.

'clients'
Echo of the -n parameter.

'cycle time'
Echo of the -c parameter.

'lower response limit'
Echo of the -l parameter.

'skipped requests'
Number of requests that were skipped by vespa-fbench. vespa-fbench
will typically skip a request if the line containing the query url
exceeds a pre-defined limit. Skipped requests have minimal impact on
the statistical results.

'failed requests'
The number of failed requests. A request is marked as failed if an
error occurred while reading the result or if the result contained
fewer bytes than 'lower response limit'.

'successful requests'
Number of successful requests. Each performed request is counted as
either successful or failed. Skipped requests (see above) are not
performed and therefore not counted.

'cycles not held'
Number of cycles not held. The cycle time is specified with the -c
parameter. It defines how often a client should perform a new request.
However, a client may not perform another request before the result
from the previous request has been obtained. Whenever a client is
unable to initiate a new request 'on time' because the previous
request has not yet completed, this value is increased.

'minimum response time'
The minimum response time. The response time is measured as the time
period from just before the request is sent to the server until the
result is obtained from the server.

'maximum response time'
The maximum response time. The response time is measured the same way
as for the minimum response time.
'average response time'
The average response time. The response time is measured the same way
as for the minimum response time.

'X percentile'
The X percentile of the response time samples; a value selected such
that X percent of the response time samples are below this value. In
order to calculate percentiles, a histogram of response times is
maintained for each client at runtime and merged after the test run
ends. If a percentile value exceeds the upper bound of this histogram,
it will be approximated (and thus less accurate) and marked with
'(approx)'.

'max query rate'
The cycle time tells each client how often it should perform a
request. If a client is unable to perform a new request on time
because a previous request is still outstanding, the overtime counter
is increased and the client performs the next request as soon as the
previous one completes. The opposite may also happen: a request may
complete in less than the cycle time, in which case the client waits
out the remainder of the cycle before performing another request. The
max query rate is an extrapolated value indicating what the query rate
would be if no client ever had to wait for a cycle to complete and the
average response time did not increase. NOTE: This number is only
supplied as an inverse of the average response time and should NEVER
be used to describe the query rate of a server.

'actual query rate'
The average number of queries per second; QPS.

'utilization'
The percentage of time spent waiting for the server to complete
(successful) requests. Note that if a request fails, the utilization
drops, since the client has 'wasted' the time spent on the failed
request.
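The extrapolation behind 'max query rate' can be reproduced by hand.
Assuming, per the description above, that it scales the inverse of the
average response time by the number of clients (the concrete numbers
below, 200 clients and a 100 ms average, are hypothetical):

```shell
# Sketch of the 'max query rate' extrapolation: with c clients each
# able to complete a request every ms milliseconds on average, the
# extrapolated ceiling is c * 1000 / ms queries per second.
# Hypothetical inputs, not real measurements:
clients=200
avg_response_ms=100
awk -v c="$clients" -v ms="$avg_response_ms" \
    'BEGIN { printf "max query rate: %.2f q/s\n", c * 1000 / ms }'
```

This prints 2000.00 q/s, which illustrates why the number should not
be read as server capacity: it assumes the average response time stays
flat no matter how hard the clients push.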