Description
As per the supported load balancing strategies in the initial design, a failover strategy should be implemented to ensure the guarantees stated:
Failover - Pinned to a specified primary cluster until that cluster has no available Pods, upon which the next available cluster's Ingress node IPs will be resolved. When Pods are again available on the primary cluster, the primary cluster will once again be the only eligible cluster for which Ingress node IPs will be resolved.
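To make the intended selection behaviour concrete, here is a minimal sketch of that decision logic in Go. It is purely illustrative and not part of ohmyglb; the function and type names (`pickClusters`, `clusterIngressIPs`) and the health map are assumptions made for this example.

```go
package main

import "fmt"

// clusterIngressIPs maps a cluster name to the IPs of its Ingress-accepting worker nodes.
type clusterIngressIPs map[string][]string

// pickClusters returns the clusters whose Ingress node IPs should be resolved.
// The primary cluster is returned exclusively while it has healthy Pods;
// otherwise every other cluster that still has healthy Pods is eligible.
func pickClusters(primary string, healthy map[string]bool, all []string) []string {
	if healthy[primary] {
		return []string{primary}
	}
	var failover []string
	for _, c := range all {
		if c != primary && healthy[c] {
			failover = append(failover, c)
		}
	}
	return failover
}

func main() {
	ips := clusterIngressIPs{
		"cluster-x": {"10.0.1.10"},
		"cluster-y": {"10.1.1.11"},
	}
	all := []string{"cluster-x", "cluster-y"}

	// Both clusters healthy: only the primary's IPs are eligible.
	healthy := map[string]bool{"cluster-x": true, "cluster-y": true}
	for _, c := range pickClusters("cluster-x", healthy, all) {
		fmt.Println(c, ips[c]) // cluster-x [10.0.1.10]
	}

	// Primary has no healthy Pods: fail over to the remaining healthy cluster(s).
	healthy["cluster-x"] = false
	for _, c := range pickClusters("cluster-x", healthy, all) {
		fmt.Println(c, ips[c]) // cluster-y [10.1.1.11]
	}
}
```

Because the selection depends only on current Pod health, fail-back to the primary cluster (the second half of Scenario 2 below) follows automatically from the same logic.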
Scenario 1:
- Given 2 separate Kubernetes clusters, X and Y
- Each cluster has a healthy `Deployment` with a backend `Service` called `app`, and that backend `Service` is exposed with a `Gslb` resource on both clusters as:
```yaml
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
      - host: app.cloud.example.com
        http:
          paths:
            - backend:
                serviceName: app
                servicePort: http
              path: /
  strategy: failover
  primary: cluster-x
```
- Each cluster has one worker node that accepts Ingress traffic. The worker node in each cluster has the following name and IP:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
When issuing the following command, `curl -v http://app.cloud.example.com`, I would expect the IPs resolved to reflect as follows (if this command was executed 3 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.0.1.10...
...
The resolved node IPs to which Ingress traffic will be sent should be "pinned" to the `primary` cluster named explicitly in the `Gslb` resource above. Even though there is a healthy `Deployment` in cluster Y, the Ingress node IPs for cluster Y would not be resolved.
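The same expectation could also be checked programmatically rather than with curl. The short Go sketch below resolves the Gslb host a few times and prints the returned A records; it assumes it runs on a machine whose resolver delegates app.cloud.example.com to the Gslb-managed DNS zone, and is only a convenience for illustration.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Resolve the Gslb host three times, mirroring the three curl executions above.
	// With the failover strategy and a healthy primary, only 10.0.1.10 should appear.
	for i := 1; i <= 3; i++ {
		ips, err := net.LookupIP("app.cloud.example.com")
		if err != nil {
			fmt.Printf("execution %d: lookup failed: %v\n", i, err)
			continue
		}
		fmt.Printf("execution %d: %v\n", i, ips)
	}
}
```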
Scenario 2:
- Same configuration as Scenario 1, except that the `Deployment` only has healthy Pods on one cluster, cluster Y. I.e. the `Deployment` on cluster X has no healthy Pods.
When issuing the following command, `curl -v http://app.cloud.example.com`, I would expect the IPs resolved to reflect as follows (if this command was executed 3 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.1.1.11...
...
In this scenario, only the Ingress node IPs for cluster Y are resolved, given that there is not a healthy `Deployment` for the `Gslb` host on the `primary` cluster, cluster X. Therefore, the "failover" cluster(s) are resolved instead (cluster Y in this scenario).
Now, given that the `Deployment` on cluster X (the primary cluster) becomes healthy once again, I would expect the IPs resolved to reflect as follows (if this command was executed 2 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
The primary cluster's Ingress node IPs are now resolved exclusively once again.
NOTE:
- The way the primary cluster is indicated in the `Gslb` specification in this issue is solely for the purpose of describing the scenario. It should not be considered a design proposal.
- The existence of multiple "secondary" failover clusters should also be considered. For example, if there were 3 clusters (X, Y and Z) in Scenario 2 above, could the Ingress node IPs for both secondary clusters (Y and Z) be resolved? If so, how (in terms of load balancing) would the Ingress node IPs across those secondary/failover clusters be resolved? Would they use the default round-robin strategy, if any strategy at all? (A minimal sketch of one option follows below.)
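Purely as an illustration of one possible answer (not a design), the secondary clusters' Ingress node IPs could be rotated round-robin while the primary is unhealthy. The sketch below assumes a hypothetical IP 10.2.1.12 for a cluster-z worker node.

```go
package main

import "fmt"

// roundRobin cycles through the Ingress node IPs of the healthy secondary clusters,
// so successive resolutions rotate across cluster Y and cluster Z.
type roundRobin struct {
	ips  []string
	next int
}

func (r *roundRobin) pick() string {
	ip := r.ips[r.next%len(r.ips)]
	r.next++
	return ip
}

func main() {
	// Primary cluster X is unhealthy; Y and Z are the healthy secondaries.
	rr := &roundRobin{ips: []string{"10.1.1.11", "10.2.1.12"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // 10.1.1.11, 10.2.1.12, 10.1.1.11, 10.2.1.12
	}
}
```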