Horizontal Pod Autoscaling (HPA)

Introduction

IMPORTANT:

HPA can be enabled on self-managed deployments starting with version L2025.1 SP2.

Horizontal Pod Autoscaling (HPA) is a built-in Kubernetes mechanism that dynamically adjusts the number of pod replicas based on resource utilization, so that available system resources are used efficiently for the current workload.

At a high level, the process can be described in the following steps:

  1. Ongoing monitoring of resource consumption.

  2. Evaluation of scaling actions against target metrics.

  3. Scaling action – adjusting the number of pod replicas by scaling up or down.
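
For reference, step 2 uses the standard Kubernetes HPA formula to compute the desired number of replicas:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, if 2 replicas average 1400% CPU utilization against a 700% target, HPA requests ceil(2 * 1400 / 700) = 4 replicas.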

The core benefit of HPA is its ability to provide automatic scaling for Kubernetes microservices, ensuring optimal resource utilization based on real-time demand. By continuously monitoring resource usage, such as CPU and memory, HPA helps maintain application responsiveness during peak traffic while minimizing resource waste during periods of lower activity. This automated scaling improves application reliability, optimizes infrastructure costs, simplifies management, and enables organizations to efficiently handle fluctuating workloads without manual intervention.

How HPA Works

Core Components

  • HorizontalPodAutoscaler - A Kubernetes API object that defines the scaling behavior for a specific workload. It specifies the target resource metrics and the desired range for the number of replicas, including conditions for scaling up or down, as well as minimum and maximum replica limits. A minimal example manifest is shown after this list.

  • Metrics Provider - The source of metrics used by the system to determine whether pod replicas should be scaled up or down. This can be the native Kubernetes Metrics Adapter (Metrics Server) or a Custom Metrics Adapter, which allows integration with external monitoring systems such as Prometheus.

  • Controller Manager - Runs the HPA controller, which periodically evaluates current resource utilization against the defined scaling thresholds and adjusts the number of replicas accordingly.

HPA in Sisense is based on CPU consumption as the primary resource metric for scaling decisions.
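
For illustration, the generated HorizontalPodAutoscaler object for a service looks roughly like the following. This is a generic sketch based on the standard autoscaling/v2 API and the query-hpa values shown later in this article; you do not author these manifests directly, as the Sisense Helm chart generates them from the values described below.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: query-hpa
  namespace: sisense
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: query
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 2100
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300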

Supported Services

The following Sisense services support HPA (listed by their HPA object names):

  • ai-integration-hpa

  • analyticalengine-hpa

  • api-gateway-hpa

  • configuration-hpa

  • exporter-xlsx-hpa

  • exporting-hpa

  • galaxy-hpa

  • identity-hpa

  • model-graphql-hpa

  • nlq-compile-hpa

  • nlq-rt-hpa

  • oxygen-hpa

  • pivot2-be-hpa

  • query-hpa

  • reporting-hpa

  • translation-hpa

Note:

Do not create scaling configuration files for services that are not included in this list. Such configurations are not certified by Sisense and can lead to unpredictable behavior.

Configuration

HPA Settings

  • minReplicaCount - Defines the minimum number of pod replicas that must always be running. Even if workload demand is low, the system will not scale below this value to ensure baseline availability.

  • maxReplicaCount - Defines the maximum number of pod replicas that HPA can scale up to. This prevents excessive scaling, ensuring that resource usage and costs remain controlled.

  • stabilizationWindowSeconds - Specifies the cooldown period (in seconds) that HPA waits before applying another scaling action. This prevents rapid fluctuations (scaling up and down too frequently), ensuring stable operations.

  • pollingInterval - Determines how often (in seconds) HPA checks resource metrics (e.g., CPU, memory) to decide whether scaling is needed. Frequent polling provides more responsive scaling but may increase system overhead.

  • scaleUpPercent - Defines the CPU utilization threshold, expressed as a percentage of the pod's CPU request, that triggers HPA to scale up pod replicas.

The following example shows how these settings work together:

autoscaling:
  minReplicaCount: 2
  maxReplicaCount: 20
  stabilizationWindowSeconds: 300
  pollingInterval: 30
  scaleUpPercent: 700

  • The system starts with at least 2 pods and can scale up to 20 pods.

  • HPA checks resource usage every 30 seconds (pollingInterval).

  • When the average CPU utilization of the pods reaches 700% of their CPU request (0.7 CPU, or 70% of a single core, assuming a 100m CPU request), HPA recognizes increased demand and triggers a scale-up.

  • Scaling actions (up or down) will not occur more frequently than every 5 minutes (stabilizationWindowSeconds: 300) to prevent instability.

Sample Configuration

The configuration is largely common to all supported services; the values differ depending on whether the deployment is single-node or multi-node.

## Sisense chart extra values

## Single node sample:
service:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 10
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    scaleUpPercent: 700

## Multi node sample:
service:
  autoscaling:
    minReplicaCount: 2
    maxReplicaCount: 30
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    scaleUpPercent: 700

For the following services, the configuration differs from the common configuration:

exporter-xlsx:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 30
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    scaleUpPercent: 700
nlq-compile:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 20
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    scaleUpPercent: 5600
nlq-rt:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 20
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    ## scaleUpPercent depends on the deployment size - small or large (cluster_config.yaml)
    small:
      scaleUpPercent: 2100
    large:
      scaleUpPercent: 3500
query:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 20
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    ## scaleUpPercent depends on the deployment size - small or large (cluster_config.yaml)
    small:
      scaleUpPercent: 2100
    large:
      scaleUpPercent: 3500
translation:
  autoscaling:
    minReplicaCount: 1
    maxReplicaCount: 20
    stabilizationWindowSeconds: 300
    pollingInterval: 30
    ## scaleUpPercent depends on the deployment size - small or large (cluster_config.yaml)
    small:
      scaleUpPercent: 2100
    large:
      scaleUpPercent: 3500

Enablement

To enable HPA on your deployment:

  1. Enable HPA in the extra values - In the installer/extra_values/installer/installer-values.yaml file, set hpa.enabled to true.
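
    For example (a minimal sketch; the exact nesting under an hpa key is an assumption based on the hpa.enabled path):

      hpa:
        enabled: true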

  2. Modify the HPA Settings:

    Sisense has a default configuration that is continuously tuned based on performance insights and feedback from all Managed Cloud customers. If the configuration in the deployment file is left empty, the default settings will be applied automatically.

    Important Note:

    It is strongly recommended to use the Sisense default configuration, as it provides certified, optimized, and constantly improved settings for optimal performance.

    However, if you need to override the default configuration, you can do so by updating the /installer/extra_values/helm/sisense.yml file, adding an autoscaling block for each service you want to override.

    For example, in Galaxy, you can configure it as follows:

    • Single node:

      galaxy:
        autoscaling:
          minReplicaCount: 1
          maxReplicaCount: 10
          stabilizationWindowSeconds: 300
          pollingInterval: 30
          scaleUpPercent: 700
    • Multi node:

      galaxy:
        autoscaling:
          minReplicaCount: 2
          maxReplicaCount: 30
          stabilizationWindowSeconds: 300
          pollingInterval: 30
          scaleUpPercent: 700

Verifying the Deployment

Check that the HPA has been created and is functioning correctly:

kubectl -n sisense get hpa
kubectl -n sisense describe hpa <hpa-name>

Sample output:

[sisense]$ kubectl -n sisense get hpa
NAME                   REFERENCE                     TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
ai-integration-hpa     Deployment/ai-integration     cpu: 3%/700%          2         30        2          47h
analyticalengine-hpa   Deployment/analyticalengine   cpu: 1%/700%          2         30        2          47h
api-gateway-hpa        Deployment/api-gateway        cpu: 5%/700%          2         30        2          47h
configuration-hpa      Deployment/configuration      cpu: 19%/700%         2         30        2          47h
exporter-xlsx-hpa      Deployment/exporter-xlsx      cpu: 33%/700%         1         30        1          47h
exporting-hpa          Deployment/exporting          cpu: 1%/700%          2         30        2          47h
galaxy-hpa             Deployment/galaxy             cpu: 1%/700%          2         30        2          47h
identity-hpa           Deployment/identity           cpu: 1%/700%          2         30        2          47h
model-graphql-hpa      Deployment/model-graphql      cpu: 1%/700%          2         30        2          47h
oxygen-hpa             Deployment/oxygen             cpu: 1%/700%          2         30        2          47h
pivot2-be-hpa          Deployment/pivot2-be          cpu: 1%/700%          2         30        2          47h
query-hpa              Deployment/query              cpu: 2%/2100%         2         30        2          47h
reporting-hpa          Deployment/reporting          cpu: 1%/700%          2         30        2          47h
translation-hpa        Deployment/translation        cpu: 2%/2100%         2         30        2          47h
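
To observe scaling decisions as they happen, you can watch the HPA objects and inspect their associated events (standard kubectl usage; <hpa-name> is a placeholder):

kubectl -n sisense get hpa -w
kubectl -n sisense get events --field-selector involvedObject.name=<hpa-name>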

Tuning Recommendations

A prerequisite for optimizing the existing configuration is reliable monitoring. The suggested monitoring tool is Grafana: Sisense provides preset dashboards that can be used to tune the configuration, and the official HPA Grafana dashboards can also be used.
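
Independently of Grafana, you can spot-check live CPU consumption against the configured targets using the Kubernetes Metrics Server:

kubectl -n sisense top pods   ## current CPU/memory usage per pod (requires Metrics Server)
kubectl -n sisense get hpa    ## compare current utilization against the configured targets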

Note:

If the current configuration is not suitable for your needs or you observe unexpected behavior, Sisense recommends that you open a ticket with Support so that the official default configuration can be adjusted. This ensures that certified and supported settings will be used.

FAQ