Kubernetes resources alerts
k8s.rules, kube-apiserver-availability.rules, kube-apiserver-slos
KubeAPIErrorBudgetBurn
The Kubernetes API server is burning too much error budget.
kube-state-metrics
KubeStateMetricsListErrors, KubeStateMetricsWatchErrors
The Kube State Metrics collector is not able to collect metrics from the cluster without errors. This means important alerts may not fire. Contact UiPath® Support.
See also: Kube state metrics at release.
KubernetesMemoryPressure
This alert indicates that memory usage is very high on the Kubernetes node.
A MemoryPressure incident occurs when a Kubernetes cluster node runs low on memory, which can be caused by a memory leak in an application. It requires immediate attention to prevent downtime and to ensure the proper functioning of the Kubernetes cluster.
If this alert fires, try to identify the pod on the node that is consuming more memory, by taking these steps:
- Retrieve the node's CPU and memory stats:
  kubectl top node
- Retrieve the pods running on the node:
  kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${NODE_NAME}
- Check the memory usage of the pods in a namespace, and check the logs of any suspect pod:
  kubectl top pod --namespace <namespace>
  kubectl logs -f <pod-name> -n <ns>
If you are able to identify any pod with high memory usage, check the logs of the pod and look for memory leak errors.
To address the issue, increase the memory spec for the nodes if possible.
If the issue persists, generate the support bundle and contact UiPath® Support.
kubernetes-apps
KubePodCrashLooping
A pod keeps restarting unexpectedly. This can happen due to an out-of-memory (OOM) error, in which case the memory limits can be adjusted. Check the pod events with kubectl describe, and the logs with kubectl logs, to see details on possible crashes. If the issue persists, contact UiPath® Support.
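For example, you can inspect the pod events and the logs of the previously crashed container instance, replacing <pod-name> and <namespace> with your values:
# Inspect pod events and state transitions
kubectl describe pod <pod-name> -n <namespace>
# View the logs of the previous (crashed) container instance
kubectl logs <pod-name> -n <namespace> --previous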
KubePodNotReady
A pod has started, but it is not responding to the health probe with success. This may mean that it is stuck and is not able to serve traffic. You can check pod logs with kubectl logs to see if there is any indication of progress. If the issue persists, contact UiPath® Support.
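For example, you can check why the health probe is failing by describing the pod and filtering its warning events, replacing the placeholders with your values:
# Show probe failures and other pod events
kubectl describe pod <pod-name> -n <namespace>
# List only the warning events for the pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>,type=Warning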
KubeDeploymentGenerationMismatch, KubeStatefulSetGenerationMismatch
There has been an attempted update to a deployment or statefulset, but it has failed, and a rollback has not yet occurred. Contact UiPath® Support.
KubeDeploymentReplicasMismatch, KubeStatefulSetReplicasMismatch
In high availability clusters with multiple replicas, this alert fires when the number of replicas is not optimal. This may occur when there are not enough resources in the cluster to schedule all replicas. Check resource utilization, and add capacity as necessary. Otherwise, contact UiPath® Support.
KubeStatefulSetUpdateNotRolledOut
An update to a statefulset has failed. Contact UiPath® Support.
See also: StatefulSets.
KubeDaemonSetRolloutStuck
Daemonset rollout has failed. Contact UiPath® Support.
See also: DaemonSet.
KubeContainerWaiting
A container is stuck in the waiting state. It has been scheduled to a worker node, but it cannot run on that machine. Check kubectl describe of the pod for more information. The most common cause of waiting containers is a failure to pull the image. For air-gapped clusters, this could mean that the local registry is not available. If the issue persists, contact UiPath® Support.
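For example, you can retrieve the waiting reason directly, which typically shows values such as ImagePullBackOff when the image cannot be pulled:
# Show why the container is waiting (for example, ImagePullBackOff)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'
# Check the pod events for image pull errors
kubectl describe pod <pod-name> -n <namespace>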
KubeDaemonSetNotScheduled, KubeDaemonSetMisScheduled
This may indicate an issue with one of the nodes. Check the health of each node, and remediate any known issues. Otherwise, contact UiPath® Support.
KubeJobCompletion
A job is taking more than 12 hours to complete, which is not expected. Contact UiPath® Support.
KubeJobFailed
A job has failed; however, most jobs are retried automatically. If the issue persists, contact UiPath® Support.
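For example, you can review the job status and the logs of its pods, replacing the placeholders with your values:
# List jobs and their completion status across all namespaces
kubectl get jobs -A
# Inspect the failed job and the logs of one of its pods
kubectl describe job <job-name> -n <namespace>
kubectl logs job/<job-name> -n <namespace>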
KubeHpaReplicasMismatch
The autoscaler cannot scale the targeted resource as configured. If desired is higher than actual, then there may be a lack of resources. If desired is lower than actual, pods may be stuck while shutting down. If the issue persists, contact UiPath® Support.
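For example, you can compare the desired and current replica counts and check for pods blocked in the Pending state:
# Compare desired and current replicas, and review scaling events
kubectl describe hpa <hpa-name> -n <namespace>
# Check whether pods are stuck pending due to insufficient resources
kubectl get pods -n <namespace> --field-selector status.phase=Pending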
See also: Horizontal Pod Autoscaling.
KubeHpaMaxedOut
The number of replicas for a given service has reached its maximum. This happens when the amount of requests being made to the cluster is very high. If high traffic is expected and temporary, you may silence this alert. However, this alert is a sign that the cluster is at capacity and cannot handle much more traffic. If more resource capacity is available on the cluster, you can increase the number of maximum replicas for the service by following these instructions:
# Find the horizontal autoscaler that controls the replicas of the desired resource
kubectl get hpa -A
# Increase the number of max replicas of the desired resource, replacing <namespace> <resource> and <maxReplicas>
kubectl -n <namespace> patch hpa <resource> --patch '{"spec":{"maxReplicas":<maxReplicas>}}'
See also: Horizontal Pod Autoscaling.
kubernetes-resources
KubeCPUOvercommit, KubeMemoryOvercommit
These warnings indicate that the cluster cannot tolerate node failure. For single-node evaluation clusters, this is known, and these alerts may be silenced. For multi-node HA-ready production setups, these alerts fire when too many nodes become unhealthy to support high availability, and they indicate that the nodes should be brought back to health or replaced.
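For example, you can compare the resources requested by pods against the allocatable capacity of the nodes:
# Review CPU and memory requests versus allocatable capacity per node
kubectl describe nodes | grep -A 8 "Allocated resources"
# Check current node utilization
kubectl top node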
KubeCPUQuotaOvercommit, KubeMemoryQuotaOvercommit, KubeQuotaAlmostFull, KubeQuotaFullyUsed, KubeQuotaExceeded
These alerts pertain to namespace resource quotas that only exist in the cluster if added through customization. Namespace resource quotas are not added as part of Automation Suite installation.
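For example, you can verify whether any resource quotas have been added to the cluster:
# List any namespace resource quotas configured in the cluster
kubectl get resourcequota --all-namespaces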
See also: Resource Quotas.
AggregatedAPIErrors, AggregatedAPIDown, KubeAPIDown, KubeAPITerminatedRequests
Indicates problems with the Kubernetes control plane. Check the health of master nodes, resolve any outstanding issues, and contact UiPath® Support if the issues persist.
kubernetes-system-kubelet
KubeNodeNotReady, KubeNodeUnreachable, KubeNodeReadinessFlapping, KubeletPlegDurationHigh, KubeletPodStartUpLatencyHigh, KubeletDown
These alerts indicate a problem with a node. In multi-node HA-ready production clusters, pods are likely rescheduled onto other nodes. If the issue persists, you should drain and remove the node to maintain the health of the cluster, as shown in the example below. In clusters without extra capacity, join another node to the cluster first.
If the issues persist, contact UiPath® Support.
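For example, once a replacement node is available, you can drain and then remove the unhealthy node; the exact procedure may vary with your environment and kubectl version:
# Move workloads off the unhealthy node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Remove the node from the cluster after it is drained
kubectl delete node <node-name>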
KubeletTooManyPods
There are too many pods running on the specified node.
Join another node to the cluster.
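For example, you can compare the number of pods scheduled on the node against its pod capacity:
# Count the pods currently running on the node
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name> | wc -l
# Compare against the node's pod capacity and allocatable pods
kubectl describe node <node-name> | grep -i pods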
kubernetes-system
KubeVersionMismatch
There are different semantic versions of Kubernetes components running. This can happen as a result of an unsuccessful Kubernetes upgrade.
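For example, you can list the component versions reported by each node to spot the mismatch:
# Compare the kubelet and container runtime versions on each node
kubectl get nodes -o wide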
KubeClientErrors
A Kubernetes API server client is experiencing an error rate greater than 1%. There may be an issue with the node the client is running on, or with the Kubernetes API server itself.
etcd alerts
EtcdInsufficientMembers
This alert indicates that the etcd cluster has an insufficient number of members. Note that the cluster must have an odd number of members. The severity of this alert is critical.
Make sure that there is an odd number of server nodes in the cluster, and all of them are up and healthy.
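For example, in an RKE2-based cluster such as Automation Suite, server nodes typically carry the etcd node role label, so you can verify their presence and status as follows:
# Verify that all etcd (server) nodes are present and Ready
kubectl get nodes -l node-role.kubernetes.io/etcd=true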
EtcdNoLeader
This alert shows that the etcd cluster has no leader. The severity of this alert is critical.
EtcdHighNumberOfLeaderChanges
This alert indicates that the etcd leader changes more than twice in 10 minutes. This is a warning.
EtcdHighNumberOfFailedGrpcRequests
This alert indicates that a certain percentage of gRPC request failures was detected in etcd.
EtcdGrpcRequestsSlow
This alert indicates that etcd gRPC requests are slow. This is a warning.
If this alert persists, contact UiPath® Support.
EtcdHighNumberOfFailedHttpRequests
This alert indicates that a certain percentage of HTTP failures was detected in etcd.
EtcdHttpRequestsSlow
This alert indicates that HTTP requests are slowing down. This is a warning.
EtcdMemberCommunicationSlow
This alert indicates that etcd member communication is slowing down. This is a warning.
EtcdHighNumberOfFailedProposals
This alert indicates that the etcd server received more than 5 failed proposals in the last hour. This is a warning.
EtcdHighFsyncDurations
This alert indicates that etcd WAL fsync duration is increasing. This is a warning.
EtcdHighCommitDurations
This alert indicates that etcd commit duration is increasing. This is a warning.
kube-api
KubernetesApiServerErrors
This alert indicates that the Kubernetes API server is experiencing a high error rate. This issue could lead to other failures, so it is recommended that you investigate the problem proactively.
Check the logs of the api-server pod to find the root cause of the issue, using the kubectl logs <pod-name> -n kube-system command.
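For example, assuming the API server runs as a static pod in the kube-system namespace, you can locate it and then check its logs:
# Locate the API server pod(s)
kubectl get pods -n kube-system | grep kube-apiserver
# Check the logs for errors
kubectl logs <pod-name> -n kube-system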