author	Jon Marius Venstad <venstad@gmail.com>	2020-12-02 15:09:25 +0100
committer	Jon Marius Venstad <venstad@gmail.com>	2020-12-02 15:09:25 +0100
commit	ac5041033b8782e2cb8b1f64f880f004c510bea6 (patch)
tree	45b4236f1d68ed9c4e34ee7fed3c7c1897b86784 /messagebus/src/main
parent	2461523aca39b5af3b5bafef44e51c7bb13be76b (diff)
Add unit tests and some doc for DynamicThrottlePolicy
Diffstat (limited to 'messagebus/src/main')
-rw-r--r--  messagebus/src/main/java/com/yahoo/messagebus/DynamicThrottlePolicy.java  | 75
1 file changed, 57 insertions(+), 18 deletions(-)
diff --git a/messagebus/src/main/java/com/yahoo/messagebus/DynamicThrottlePolicy.java b/messagebus/src/main/java/com/yahoo/messagebus/DynamicThrottlePolicy.java
index 514f1234e89..63ac44fca74 100644
--- a/messagebus/src/main/java/com/yahoo/messagebus/DynamicThrottlePolicy.java
+++ b/messagebus/src/main/java/com/yahoo/messagebus/DynamicThrottlePolicy.java
@@ -8,11 +8,43 @@ import java.util.logging.Logger;
/**
* This is an implementation of the {@link ThrottlePolicy} that offers dynamic limits to the number of pending messages a
- * {@link SourceSession} is allowed to have.
- *
- * <b>NOTE:</b> By context, "pending" is referring to the number of sent messages that have not been replied to yet.
+ * {@link SourceSession} is allowed to have. Pending means the number of sent messages that have not been replied to yet.
+ * <p>
+ * The algorithm works by increasing the number of messages allowed to be pending, the <em>window size</em>, until
+ * this no longer increases throughput. At this point, the algorithm is driven by synthetic attraction towards latencies
+ * which satisfy <code>log10(1 / latency) % 1 = e</code>, for some constant <code>0 < e < 1</code>. Weird? Most certainly!
+ * </p><p>
+ * The effect is that the window size the algorithm produces, for a saturated ideal server, has a level for each power
+ * of ten with an attractor the window size tends towards while on this level, determined by the <code>e</code> above.
+ * The {@link #setEfficiencyThreshold} determines the <code>e</code> constant. When <code>e</code> is set so the
+ * attractor is close to the start of the interval, this has an inhibiting effect on the algorithm, and it is basically
+ * reduced to "increase window size until this no longer increases throughput enough that it defeats the random noise".
+ * As the attractor moves towards the end of the intervals, the algorithm becomes increasingly eager in increasing
+ * its window size past what it measures as effective — if moved to the very end of the interval, the algorithm explodes.
+ * The default setting has the attractor at <code>log10(2)</code> of the way from start to end of these levels.
+ * </p><p>
+ * Because the algorithm stops increasing the window size when increases in throughput drown in random variations, the
+ * {@link #setWindowSizeIncrement} directly influences the efficient work domain of the algorithm. With the default
+ * setting of <code>20</code>, it seems to struggle to increase past window sizes of a couple thousand. Using a static
+ * increment (and a backoff factor) seems to be necessary to effectively balance the load that different, competing policies
+ * are allowed to produce.
+ * </p><p>
+ * When there are multiple policies that compete against each other, they will increase load until saturating the server.
+ * If all policies but one were kept fixed, that one policy would still see an increase in throughput with increased
+ * window size, even with a saturated server, as it would be given a greater share of the server's resources. However,
+ * since all algorithms adjust their windows concurrently, they will all reduce the throughput of the other algorithms.
+ * The result is that they commonly see the server as saturated, and commonly reach the behaviour where some increases in
+ * window size lead to measurable throughput gains, while others don't.
+ * </p><p>
+ * Now the weighting ({@link #setWeight}) comes into play: with equal weights, two algorithms would have a break-even
+ * between being governed by the attractors (above), which eventually limits window size, and increases due to positive
+ * measurements, at the same point along the window size axis. With smaller weights, i.e., smaller increases to window
+ * size, this break-even occurs where the curve is steeper, i.e., where the client has a smaller share of the server.
+ * Thus, competing algorithms with different weights end up with a resource distribution roughly proportional to weight.
+ * </p>
*
* @author Simon Thoresen Hult
+ * @author jonmv
*/
public class DynamicThrottlePolicy extends StaticThrottlePolicy {
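
A minimal standalone sketch (not part of this patch) of the level behaviour described in the javadoc above. It assumes an ideal saturated server, whose throughput stays constant regardless of window size, and an efficiencyThreshold of 1, and it exercises only the normalisation used in the resize logic further down in this diff; class name and values are illustrative.

    public class ThrottleLevelSketch {

        // Same normalisation as in the patch: scale by powers of ten until the ratio lands in (0.2, 2].
        static double levelPosition(double throughput, double windowSize) {
            double period = 1;
            while (throughput * period / windowSize < 2) period *= 10;
            while (throughput * period / windowSize > 2) period *= 0.1;
            return throughput * period / windowSize;
        }

        public static void main(String[] args) {
            double throughput = 1000;        // constant, since the server is assumed saturated
            double efficiencyThreshold = 1;  // assumed threshold; see setEfficiencyThreshold below
            for (double windowSize = 10; windowSize <= 100_000; windowSize *= 1.5) {
                double position = levelPosition(throughput, windowSize);
                System.out.printf("windowSize %8.0f  level position %.3f  -> %s%n",
                                  windowSize, position, position < efficiencyThreshold ? "back off" : "grow");
            }
            // With these numbers the grow/back off boundary, i.e. the attractor, sits at each power of
            // ten of the window size (10, 100, 1000, ...), matching the "level per power of ten" above.
        }
    }
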
@@ -97,30 +129,31 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
numSent = 0;
numOk = 0;
-
if (maxThroughput > 0 && throughput > maxThroughput * 0.95) {
// No need to increase window when we're this close to max.
- } else if (throughput >= localMaxThroughput) {
+ // TODO jonmv: Not so sure — what if we're too high, and should back off?
+ } else if (throughput > localMaxThroughput) {
localMaxThroughput = throughput;
- windowSize += weight*windowSizeIncrement;
+ windowSize += weight * windowSizeIncrement;
if (log.isLoggable(Level.FINE)) {
log.log(Level.FINE, "windowSize " + windowSize + " throughput " + throughput + " local max " + localMaxThroughput);
}
} else {
// scale up/down throughput for comparing to window size
double period = 1;
- while(throughput * period/windowSize < 2) {
+ while(throughput * period / windowSize < 2) {
period *= 10;
}
- while(throughput * period/windowSize > 2) {
+ while(throughput * period / windowSize > 2) {
period *= 0.1;
}
- double efficiency = throughput*period/windowSize;
+ double efficiency = throughput * period / windowSize; // "efficiency" is a strange name. This is where on the level it is.
if (efficiency < efficiencyThreshold) {
windowSize = Math.min(windowSize * windowSizeBackOff, windowSize - decrementFactor * windowSizeIncrement);
localMaxThroughput = 0;
} else {
- windowSize += weight*windowSizeIncrement;
+ windowSize += weight * windowSizeIncrement;
}
if (log.isLoggable(Level.FINE)) {
log.log(Level.FINE, "windowSize " + windowSize + " throughput " + throughput + " local max " + localMaxThroughput + " efficiency " + efficiency);
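
A worked example of the normalisation and threshold check above, with illustrative numbers: for measured throughput 1500 and windowSize 4000, the ratio 1500 / 4000 = 0.375 is scaled up to 3.75 (period = 10) by the first loop and back down to 0.375 by the second, so efficiency = 0.375; with an efficiencyThreshold of 1 (assumed here) this is below threshold and the window backs off. At windowSize 1200 the same computation gives 1.25, which is above threshold, so the window keeps growing. The boundary between the two outcomes, the attractor for this level, lies at windowSize 1500.
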
@@ -139,6 +172,11 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
}
/**
+ * Determines where on each latency level the attractor sits. 2 is at the very end, and makes the algorithm explode;
+ * 0.2 is at the very start, and makes the algorithm more conservative. It is probably best to leave this setting alone.
+ */
+ // The original javadoc below is nonsense, but is kept for historical reasons.
+ /*
* Sets the lower efficiency threshold at which the algorithm should perform window size back off. Efficiency is
* the correlation between throughput and window size. The algorithm will increase the window size until efficiency
* drops below the efficiency of the local maxima times this value.
@@ -177,8 +215,7 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
/**
* Sets the factor of window size to back off to when the algorithm determines that efficiency is not increasing.
- * A value of 1 means that there is no back off from the local maxima, and means that the algorithm will fail to
- * reduce window size to something lower than a previous maxima. This value is capped to the [0, 1] range.
+ * Capped to the [0, 1] range.
*
* @param windowSizeBackOff the back off to set
* @return this, to allow chaining
@@ -190,20 +227,20 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
/**
* Sets the rate at which the window size is updated. The larger the value, the less responsive the resizing
- * becomes. However, the smaller the value, the less accurate the measurements become.
+ * becomes. However, the smaller the value, the less accurate the measurements become. Capped to a minimum of 2.
*
* @param resizeRate the rate to set
* @return this, to allow chaining
*/
public DynamicThrottlePolicy setResizeRate(double resizeRate) {
- this.resizeRate = resizeRate;
+ this.resizeRate = Math.max(2, resizeRate);
return this;
}
/**
* Sets the weight for this client. The larger the value, the more resources
- * will be allocated to this clients. Resources are shared between clients
- * proportionally to the set weights.
+ * will be allocated to this client. Resources are shared between clients roughly
+ * proportionally to the set weights. Must be a positive number.
*
* @param weight the weight to set
* @return this, to allow chaining
@@ -214,7 +251,7 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
}
/**
- * Sets the maximium number of pending operations allowed at any time, in
+ * Sets the maximum number of pending operations allowed at any time, in
* order to avoid using too much resources.
*
* @param max the max to set
@@ -236,7 +273,7 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
/**
- * Sets the minimium number of pending operations allowed at any time, in
+ * Sets the minimum number of pending operations allowed at any time, in
* order to keep a level of performance.
*
* @param min the min to set
@@ -273,4 +310,6 @@ public class DynamicThrottlePolicy extends StaticThrottlePolicy {
return (int) windowSize;
}
+ double getWindowSize() { return windowSize; }
+
}
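
The setters documented in this file return the policy itself ("this, to allow chaining"), so a policy is typically configured by chaining them. A hedged configuration sketch, with illustrative values rather than recommendations, and with signatures assumed from the javadoc shown in this diff:

    import com.yahoo.messagebus.DynamicThrottlePolicy;

    class ThrottlePolicyConfigSketch {
        static DynamicThrottlePolicy configuredPolicy() {
            return new DynamicThrottlePolicy()
                    .setWeight(2)                 // roughly twice the server share of a weight-1 client
                    .setWindowSizeIncrement(20)   // the default increment mentioned in the class javadoc
                    .setWindowSizeBackOff(0.9)    // back off to 90 % of the window; capped to [0, 1]
                    .setResizeRate(3)             // capped to a minimum of 2, per the patch above
                    .setEfficiencyThreshold(1);   // where on each level the attractor sits
        }
    }
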