Enable TiDB Cluster Auto-scaling in Kubernetes
Since v5.0, TiDB has provided an auto-scaling mechanism based on Kubernetes, which you can enable through TiDB Operator. This document introduces how to enable and use the auto-scaling feature for a TiDB cluster in Kubernetes.
Warning
- The auto-scaling feature is in the alpha stage. It is strongly recommended not to enable this feature in critical production environments.
- It is recommended to try this feature in a test environment. PingCAP welcomes your comments and suggestions to help improve this feature.
- Currently, the auto-scaling feature is based solely on CPU utilization.
Enable the auto-scaling feature
The auto-scaling feature is disabled by default. To enable this feature, perform the following steps:
Edit the values.yaml file of TiDB Operator and enable AutoScaling in the features option:

features:
  - AutoScaling=true

Update TiDB Operator to make the configuration take effect:
helm upgrade tidb-operator pingcap/tidb-operator --version=${chart_version} --namespace=tidb-admin -f ${HOME}/tidb-operator/values-tidb-operator.yaml
Confirm the resource configuration of the target TiDB cluster.

Before using the auto-scaling feature on the target TiDB cluster, you need to configure the CPU setting of the corresponding components. For example, you need to configure spec.tikv.requests.cpu in TiKV:

spec:
  tikv:
    requests:
      cpu: "1"
  tidb:
    requests:
      cpu: "1"
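After TiDB Operator is updated, you can verify that the feature gate is active. The following is a minimal sketch, assuming the default tidb-admin namespace and a controller manager Deployment named tidb-controller-manager; adjust both names to match your installation:

# List the TiDB Operator workloads in the operator namespace (assumed: tidb-admin).
kubectl get deployments -n tidb-admin

# Check whether the controller manager now runs with AutoScaling enabled.
# The Deployment name below is an assumption based on a default Helm installation.
kubectl get deployment tidb-controller-manager -n tidb-admin -o yaml | grep -i autoscaling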
Define the auto-scaling behavior
To define the auto-scaling behavior in the TiDB cluster, configure the TidbClusterAutoScaler CR object. The following is an example:
apiVersion: pingcap.com/v1alpha1
kind: TidbClusterAutoScaler
metadata:
  name: auto-scaling-demo
spec:
  cluster:
    name: auto-scaling-demo
  tikv:
    resources:
      storage_small:
        cpu: 1000m
        memory: 2Gi
        storage: 10Gi
        count: 3
    rules:
      cpu:
        max_threshold: 0.8
        min_threshold: 0.2
        resource_types:
          - storage_small
    scaleInIntervalSeconds: 500
    scaleOutIntervalSeconds: 300
  tidb:
    resources:
      compute_small:
        cpu: 1000m
        memory: 2Gi
        count: 3
    rules:
      cpu:
        max_threshold: 0.8
        min_threshold: 0.2
        resource_types:
          - compute_small
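After writing the manifest, apply it and confirm that the CR exists. The following is a minimal sketch; auto-scaler.yaml is a placeholder name for a local file containing the example above:

# Apply the TidbClusterAutoScaler manifest (auto-scaler.yaml is a placeholder file name).
kubectl apply -f auto-scaler.yaml -n ${namespace}

# Confirm the CR was created and inspect the spec that TiDB Operator acts on.
kubectl get tidbclusterautoscaler auto-scaling-demo -n ${namespace} -o yaml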
Implementation principles
According to the configuration of the TidbClusterAutoScaler CR, TiDB Operator sends requests to PD to query the scaling result. Based on that result, TiDB Operator uses the heterogeneous cluster feature to create, update, or delete a heterogeneous TiDB cluster (in which only the TiDB component or the TiKV component is configured). In this way, the auto-scaling of the TiDB cluster is achieved.
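Because scaling is implemented through heterogeneous clusters, scaled-out instances show up as additional TidbCluster objects rather than as a changed replica count on the original cluster. A minimal sketch for observing this (the name of the auto-created cluster is chosen by TiDB Operator and is not assumed here):

# List all TidbCluster objects in the namespace. An extra heterogeneous cluster
# appears when the autoscaler scales out and is removed again when it scales in.
kubectl get tidbcluster -n ${namespace}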
Related fields
- spec.cluster: the TiDB cluster to be elastically scheduled.
  - name: the name of the TiDB cluster.
  - namespace: the namespace of the TiDB cluster. If not configured, this field is set to the same namespace as the TidbClusterAutoScaler CR by default.
- spec.tikv: the configuration related to TiKV elastic scheduling.
  - spec.tikv.resources: the resource types that TiKV can use for elastic scheduling. If not configured, this field is set to the same value as spec.tikv.requests in the TidbCluster CR corresponding to spec.cluster.
    - cpu: CPU configuration.
    - memory: memory configuration.
    - storage: storage configuration.
    - count: the number of resources that the current configuration can use. If this field is not configured, there is no limit on resources.
  - spec.tikv.rules: the rules of TiKV elastic scheduling. Currently, only CPU-based rules are supported.
    - max_threshold: if the average CPU utilization of all Pods is higher than max_threshold, the scaling-out operation is triggered.
    - min_threshold: if the average CPU utilization of all Pods is lower than min_threshold, the scaling-in operation is triggered.
    - resource_types: the resource types that can be used for CPU-based elastic scheduling. This field corresponds to the keys in spec.tikv.resources. If not configured, this field is set to all keys in spec.tikv.resources by default.
  - spec.tikv.scaleInIntervalSeconds: the interval between the current scaling-in operation and the last scaling-in or scaling-out operation. If not configured, this field is set to 500 by default, which means 500 seconds.
  - spec.tikv.scaleOutIntervalSeconds: the interval between the current scaling-out operation and the last scaling-in or scaling-out operation. If not configured, this field is set to 300 by default, which means 300 seconds.
- spec.tidb: the configuration related to TiDB elastic scheduling. Its fields are the same as those of spec.tikv.
For more information about configuration fields, refer to API references.
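To relate the thresholds to observed load, you can compare the Pods' current CPU usage with their requests. This is a minimal sketch, assuming a metrics pipeline that makes kubectl top work and that TiKV Pods carry the app.kubernetes.io/component=tikv label (both are assumptions about your environment):

# Show current CPU usage of the TiKV Pods. With requests of 1000m and
# max_threshold: 0.8, sustained average usage above roughly 800m triggers a scale-out.
kubectl top pods -n ${namespace} -l app.kubernetes.io/component=tikv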
Example
Run the following commands to quickly deploy a TiDB cluster with 3 PD instances, 3 TiKV instances, and 2 TiDB instances, with the monitoring and auto-scaling features enabled:
kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/auto-scale/tidb-cluster.yaml -n ${namespace}
kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/auto-scale/tidb-monitor.yaml -n ${namespace}
kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/auto-scale/tidb-cluster-auto-scaler.yaml -n ${namespace}
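Before running sysbench, wait for the cluster to become ready and obtain the address to use as ${tidb_service_ip} below. The service name auto-scaling-demo-tidb is an assumption based on TiDB Operator's usual <cluster-name>-tidb naming convention:

# Wait until the cluster Pods are running.
kubectl get pods -n ${namespace}

# Look up the TiDB service. Its ClusterIP (or LoadBalancer address) can be used
# as ${tidb_service_ip} in the sysbench configuration below.
kubectl get svc auto-scaling-demo-tidb -n ${namespace}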
Prepare data using sysbench.

Copy the following content and paste it into the local sysbench.config file:

mysql-host=${tidb_service_ip}
mysql-port=4000
mysql-user=root
mysql-password=
mysql-db=test
time=120
threads=20
report-interval=5
db-driver=mysql

Prepare data by running the following command:
sysbench --config-file=${path}/sysbench.config oltp_point_select --tables=1 --table-size=20000 prepare
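Optionally, confirm that the data was prepared. This is a minimal sketch using the MySQL client; sbtest1 is the table name that sysbench creates by default for a single-table run:

# Count the rows that sysbench inserted (expected: 20000).
mysql -h ${tidb_service_ip} -P 4000 -u root test -e "SELECT COUNT(*) FROM sbtest1;"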
Start the stress test:
sysbench --config-file=${path}/sysbench.config oltp_point_select --tables=1 --table-size=20000 run
The command above will return the following result:
Initializing worker threads...
Threads started!
[ 5s ] thds: 20 tps: 37686.35 qps: 37686.35 (r/w/o: 37686.35/0.00/0.00) lat (ms,95%): 0.99 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 20 tps: 38487.20 qps: 38487.20 (r/w/o: 38487.20/0.00/0.00) lat (ms,95%): 0.95 err/s: 0.00 reconn/s: 0.00

Create a new terminal session and view the changing status of the TiDB cluster Pods by running the following command:
watch -n1 "kubectl -n ${namespace} get pod"
The output is as follows:
auto-scaling-demo-discovery-fbd95b679-f4cb9 1/1 Running 0 17m
auto-scaling-demo-monitor-6857c58564-ftkp4 3/3 Running 0 17m
auto-scaling-demo-pd-0 1/1 Running 0 17m
auto-scaling-demo-tidb-0 2/2 Running 0 15m
auto-scaling-demo-tidb-1 2/2 Running 0 15m
auto-scaling-demo-tikv-0 1/1 Running 0 15m
auto-scaling-demo-tikv-1 1/1 Running 0 15m
auto-scaling-demo-tikv-2 1/1 Running 0 15m

View the changing status of Pods and the TPS and QPS of sysbench. When new TiKV and TiDB Pods are created, the TPS and QPS of sysbench increase significantly.
After sysbench finishes the test, the newly created Pods in TiKV and TiDB disappear automatically.
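You can correlate these Pod changes with the autoscaler's decisions. A minimal sketch:

# Inspect the autoscaler object and recent events to see when scale-out and
# scale-in actions were taken.
kubectl describe tidbclusterautoscaler auto-scaling-demo -n ${namespace}
kubectl get events -n ${namespace} --sort-by=.lastTimestamp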
Destroy the environment by running the following commands:
kubectl delete tidbcluster auto-scaling-demo -n ${namespace}
kubectl delete tidbmonitor auto-scaling-demo -n ${namespace}
kubectl delete tidbclusterautoscaler auto-scaling-demo -n ${namespace}