Automation Suite on Linux installation guide
Running the diagnostics tool
The Automation Suite diagnostics tool runs a set of checks to generate a report on the cluster health, which you can analyze to identify issues and their potential root causes. The tool helps you find common issues, such as lost database connectivity or invalid or expired credentials.
The Automation Suite diagnostics tool is available in both uipathctl and uipathtools, which you can download on your management machine.
uipathtools is a CLI tool that contains a subset of uipathctl capabilities specific to health commands. The tool is backward compatible and works with any of the supported Automation Suite versions. We recommend using uipathtools as a first step whenever you encounter an issue.
Quick validation
The check and test commands provide quick insights into the state of the cluster without running a deep analysis.
- check relies on the ArgoCD health and sync status and does not modify any state in the cluster.
- test looks into the applications, deployments, or pods and temporarily mutates the state of the cluster to provide you with those insights.
Health check
To run a health check, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./bin/uipathctl health check
- If you use uipathtools, run:
  ./bin/uipathtools health check
Sample output of the generated report:
INFO[0038] Found 3 pods for etcd
INFO[0038] Running the health command - [etcdctl endpoint health --endpoints https://localhost:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key]
INFO[0039] No credentials provided for registry: registry.uipath.com
INFO[0039] Checking if pods for component argocd-server exist
INFO[0039] Checking if pods for component argocd-repo-server exist
INFO[0039] Found 2 pods for Istio
INFO[0039] Checking if pods for component argocd-application-controller exist
INFO[0039] Checking if pods for component redis-ha exist
INFO[0040] application actioncenter-actions has sync enabled
INFO[0040] application actioncenter-bupproxyservice has sync enabled
INFO[0040] application actioncenter-processes has sync enabled
INFO[0040] application ai-app-deployment has sync enabled
INFO[0040] application ai-appmanager-deployment has sync enabled
INFO[0040] application ai-deployer-deployment has sync enabled
INFO[0040] application ai-helper-deployment has sync enabled
INFO[0040] application ai-pkgmanager-deployment has sync enabled
INFO[0040] application ai-trainer-deployment has sync enabled
INFO[0040] application aievents-deploy has sync enabled
INFO[0040] application ailoadbalancer-cleanup has sync enabled
INFO[0040] application ailoadbalancer-service has sync enabled
INFO[0040] application aimetering has sync enabled
INFO[0040] application airflow-scheduler has sync enabled
INFO[0040] application airflow-statsd has sync enabled
INFO[0040] application airflow-webserver has sync enabled
INFO[0040] application aistorage has sync enabled
INFO[0040] application aistorage-cleanup has sync enabled
INFO[0040] application apps-designer has sync enabled
INFO[0040] application apps-runtime has sync enabled
INFO[0040] application apps-server has sync enabled
INFO[0040] application apps-signalr has sync enabled
INFO[0040] application asrobots has sync enabled
INFO[0040] application auth-dex has sync enabled
INFO[0040] application auth-oauth2-proxy has sync enabled
INFO[0040] application automationhub-ah-frontdoor-service has sync enabled
INFO[0040] application automationhub-ah-open-api-service has sync enabled
INFO[0040] application automationhub-ah-tenant-service has sync enabled
INFO[0040] application automationhub-ah-web-client has sync enabled
INFO[0040] application automationsolutions has sync enabled
INFO[0040] application datapipeline-api has sync enabled
INFO[0040] application dataservice-designer has sync enabled
INFO[0040] application dataservice-runtime has sync enabled
INFO[0040] application dataservice-taskrunner has sync enabled
INFO[0040] application du-aimodelhost-2404 has sync enabled
INFO[0040] application du-aimodelhost-classifier-2404 has sync enabled
INFO[0040] application du-annotations has sync enabled
INFO[0040] application du-annotations-background-tasks has sync enabled
INFO[0040] application du-app-service has sync enabled
INFO[0040] application du-audit-cleanup has sync enabled
INFO[0040] application du-audit-service has sync enabled
INFO[0040] application du-classifier has sync enabled
INFO[0040] application du-deployments has sync enabled
INFO[0040] application du-digitizer has sync enabled
INFO[0040] application du-digitizer-cleanup has sync enabled
INFO[0040] application du-digitizer-worker-deployment has sync enabled
INFO[0040] application du-document-processor-cleanup has sync enabled
INFO[0040] application du-document-processor-service has sync enabled
INFO[0040] application du-document-types-service has sync enabled
INFO[0040] application du-document-types-service-cleanup has sync enabled
INFO[0040] application du-documentmanager-dm has sync enabled
INFO[0040] application du-documents-service has sync enabled
INFO[0040] application du-documents-service-cleanup has sync enabled
INFO[0040] application du-extended-ocr has sync enabled
INFO[0040] application du-extended-ocr-proxy has sync enabled
INFO[0040] application du-framework has sync enabled
INFO[0040] application du-framework-cleanup has sync enabled
INFO[0040] application du-framework-worker has sync enabled
INFO[0040] application du-frontend has sync enabled
INFO[0040] application du-measure-service has sync enabled
INFO[0040] application du-ocr has sync enabled
INFO[0040] application du-provisioning has sync enabled
INFO[0040] application du-services-fe has sync enabled
INFO[0040] application du-services-ikc has sync enabled
INFO[0040] application du-ssde has sync enabled
INFO[0040] application du-training-classifier-2404 has sync enabled
INFO[0040] application du-training-service has sync enabled
INFO[0040] application du-training-service-cleanup has sync enabled
INFO[0040] application du-training-ssde-2404 has sync enabled
INFO[0040] application identity-service-api has sync enabled
INFO[0040] application identity-service-web has sync enabled
INFO[0040] application insights-insightsportal has sync enabled
INFO[0040] application insights-insightsprovisioning has sync enabled
INFO[0040] application notificationcoreworker has sync enabled
INFO[0040] application notificationserviceapi has sync enabled
INFO[0040] application orchestrator has sync enabled
INFO[0040] application platform-authorization-service has sync enabled
INFO[0040] application platform-license-accountant has sync enabled
INFO[0040] application platform-license-accountant-worker has sync enabled
INFO[0040] application platform-license-resource-manager has sync enabled
INFO[0040] application platform-license-resource-manager-worker has sync enabled
INFO[0040] application platform-location-service has sync enabled
INFO[0040] application platform-messagebus-service has sync enabled
INFO[0040] application platform-organization-management-service has sync enabled
INFO[0040] application platform-portal has sync enabled
INFO[0040] application platform-resource-catalog-service has sync enabled
INFO[0040] application process-mining has sync enabled
INFO[0040] application process-mining-dbt-exec has sync enabled
INFO[0040] application process-mining-frontend has sync enabled
INFO[0040] application process-mining-technology-webapi has sync enabled
INFO[0040] application process-mining-technology-workerservice has sync enabled
INFO[0040] application publishermetaservice has sync enabled
INFO[0040] application pushgateway-prometheus-pushgateway has sync enabled
INFO[0040] application reloader-reloader has sync enabled
INFO[0040] application robotube has sync enabled
INFO[0040] application studio-governance-api has sync enabled
INFO[0040] application studio-governance-web has sync enabled
INFO[0040] application testmanager has sync enabled
INFO[0040] application usergroupresolverworker has sync enabled
INFO[0040] application usersubscriptionservice has sync enabled
INFO[0040] application webhook-service has sync enabled
INFO[0040] Pod etcd-server0 is healthy
INFO[0040] Running the health command - [etcdctl endpoint health --endpoints https://localhost:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key]
INFO[0041] Pod etcd-server1 is healthy
INFO[0041] Running the health command - [etcdctl endpoint health --endpoints https://localhost:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key]
INFO[0042] Pod etcd-server2 is healthy
INFO[0044] Waited for job uipath-check/grafana-vj8m5-curl-podfzvvb to reach status COMPLETED, desiredStatus [COMPLETED] with message CompletionsReached
INFO[0044] status COMPLETED is contained in desiredStatus [COMPLETED]
INFO[0044] Querying for running pod in namespace uipath-check, and job.Name grafana-vj8m5-curl-podfzvvb
INFO[0060] Fetching from helm-credential URL in argocd namespace
INFO[0061] Fetching from helm-credential URL in argocd namespace
INFO[0064] There were errors when reading cluster config: cannot parse proxy enabled: strconv.ParseBool: parsing "": invalid syntax
Ran cluster/ checks...
Ran ACTIONCENTER checks...
✔ successful
Ran AICENTER checks...
✔ successful
Ran AIEVENTS checks...
✔ successful
Ran AIMETERING checks...
✔ successful
Ran AIRFLOW checks...
✔ successful
Ran ARGOCD checks...
✔ successful
Ran ASROBOTS checks...
✔ successful
Ran AUTOMATIONHUB checks...
✔ successful
Ran AUTOMATIONOPS checks...
✔ successful
Ran AUTOMATIONSOLUTIONS checks...
✔ successful
Ran BA checks...
✔ successful
Ran CERT-MANAGER checks...
✔ successful
Ran CILIUM checks...
✔ successful
Ran DATAPIPELINE-API checks...
✔ successful
Ran DATASERVICE checks...
✔ successful
Ran DOCUMENTUNDERSTANDING checks...
✔ successful
Ran ETCD checks...
✔ successful
Ran GATEKEEPER checks...
✔ successful
Ran GRAFANA checks...
✔ successful
Ran INSIGHTS checks...
✔ successful
Ran ISTIO checks...
✔ successful
Ran LOGGING checks...
✔ successful
Ran MAINTENANCE checks...
✔ successful
Ran NODE checks...
✔ successful
Ran NOTIFICATIONSERVICE checks...
✔ successful
Ran ORCHESTRATOR checks...
✔ successful
Ran PLATFORM checks...
✔ successful
Ran POD checks...
✔ successful
Ran PROCESSMINING checks...
✔ successful
Ran RELOADER checks...
✔ successful
Ran REPLICAS checks...
✔ successful
Ran ROBOTUBE checks...
✔ successful
Ran SFCORE checks...
✔ successful
Ran TESTMANAGER checks...
✔ successful
Ran WEBHOOK checks...
✔ successful
Checks complete!
By default, the health check command checks the health of all the components. However, you can also restrict the run to only the components that you are interested in:
- If you want to exclude components from the execution, use the --excluded flag. For example, if you do not want to check the health of SQL, run the following command:
  ./bin/uipathctl health check --excluded SQL
  The command checks the health of all components except for SQL.
- If you want to include only certain components in the execution, use the --included flag. For example, if you only want to check the health of DNS and objectstore, run the following command:
  ./bin/uipathctl health check --included DNS,OBJECTSTORAGE
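Because the report is plain text, you can capture it and filter it for failing components instead of scanning it by eye. The following is a minimal sketch; the report content below is hardcoded so the snippet is self-contained, and in practice you would produce the file by redirecting the command output (for example, ./bin/uipathctl health check 2>&1 | tee health-report.txt):

```shell
# Hypothetical saved health-check report (hardcoded sample for illustration).
cat > health-report.txt <<'EOF'
Ran DATASERVICE checks...
❌ [DATASERVICE_HEALTH] Application health check failed
Ran ORCHESTRATOR checks...
✔ successful
EOF

# List only the components whose checks reported a failure marker.
grep -B1 '❌' health-report.txt | grep '^Ran' | sed 's/^Ran \(.*\) checks\.\.\.$/\1/'
```

For the sample above, this prints only DATASERVICE, letting you jump straight to the failing component in a long report.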
Analyzing the logs
- After running a health check, the logs show that the health check for the Data Service application failed:
  ❌ [DATASERVICE] ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
- After further investigation, it becomes clear that the Data Service application is failing because the dataservice-runtime-8f5bb7d56-v5krg and dataservice-taskrunner-787df76c74-98h5l pods are in a failed state. If you analyze further, you can find that the dataservice-external-storage-secret secret is missing:
  ❌ [POD] ✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
- To fix this issue, ensure that you provided the correct credentials for the objectstore in cluster_config.json.
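When many pods report mount failures, it can help to pull the missing secret names out of a saved report rather than reading each line. A minimal sketch, using a hardcoded sample of the CANNOT_MOUNT_VOLUME lines (the file name is illustrative):

```shell
# Illustrative saved pod-check output containing the mount failures.
cat > pod-check.log <<'EOF'
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
EOF

# Extract each quoted secret name from the "not found" lines, de-duplicated.
grep 'secret "' pod-check.log | sed 's/.*secret "\([^"]*\)" not found.*/\1/' | sort -u
```

Once you have the secret name, you can confirm whether it exists in the cluster with kubectl, for example: kubectl -n uipath get secret dataservice-external-storage-secret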
Health test
To run a health test, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./bin/uipathctl health test
- If you use uipathtools, run:
  ./bin/uipathtools health test
By default, the health test command executes health tests on all the components. However, you can also restrict the run to only the components that you are interested in:
- If you want to exclude components from the execution, use the --excluded flag. For example, if you do not want to test the health of SQL, run the following command:
  ./bin/uipathctl health test --excluded SQL
  The command tests the health of all components except for SQL.
- If you want to include only certain components in the execution, use the --included flag. For example, if you only want to test the health of DNS and objectstore, run the following command:
  ./bin/uipathctl health test --included DNS,OBJECTSTORAGE
If you compare the output of the check and test commands for the Data Service application, you can see that the former validates the health of the application, whereas the latter checks the routing.
Known issue
You might get an error message similar to the following sample. You can safely ignore it; no action is required on your side.
E0621 23:32:56.426321 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.426392 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.444420 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.446150 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.513357 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
Deep validation
The diagnose command provides deep insights into the state of the cluster. It helps you identify issues at all levels, such as SQL, objectstore, node, secret, Istio, and networking.
- It covers both the check and test commands.
- It runs the prerequisite checks performed before the Automation Suite installation to validate changes to the environment configuration that were made post-installation and that can be the potential cause of the issue.
- It runs on all the nodes to gather any node-specific issues, such as resource unavailability or network interference.
To run a diagnostic check, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./bin/uipathctl health diagnose cluster_config.json --versions versions/helm-charts.json
- If you use uipathtools, run:
  ./bin/uipathtools health diagnose cluster_config.json --versions versions/helm-charts.json
The samples in this section are trimmed down; actual logs have more information. You can notice that the diagnose command runs at multiple levels, such as infrastructure, networking, storage, pods, and DNS.
Analyzing the logs
There are two potential issues that you can notice in the previous logs:
- Istio has a bad configuration, which can cause issues when accessing the Document Understanding platform:
  ❌ [ISTIO]
      ✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
      ❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
- Data Service is unavailable. See the Ceph references in the output.
  ❌ [DATASERVICE]
      ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
      ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
      ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
      ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
Known issue
You might get an error message similar to the sample shown in the Known issue section under Health test. You can ignore it, as no action is required from your side.
Additional utilities
All the Automation Suite diagnostics tool commands (check, test, and diagnose) support additional filtering and output formats.
Filtering
| Filter | Description | Usage |
|---|---|---|
| --included | Comma-separated list of the services to include in the validation | ./bin/uipathctl health diagnose cluster_config.json --versions versions/helm-charts.json --included ISTIO,INSIGHTS This command runs the diagnosis only against Istio and Insights. |
| --excluded | Comma-separated list of the services to exclude from the validation | ./bin/uipathctl health test --excluded ISTIO,INSIGHTS This command runs the test on the entire cluster, except for Istio and Insights. |
Output format
The Automation Suite diagnostics tool can generate reports in multiple formats: json, yaml, text, and junit. You can pass these values to any of the commands via the --output flag. These output formats are handy when you want to build your own troubleshooting framework on top of these tools.
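For example, a wrapper script can consume a machine-readable report and alert only on failures. The sketch below is illustrative only: the JSON shape (a checks array with name, status, and message fields) is a hypothetical stand-in, so run the tool once with --output json and inspect the real schema before relying on any field names.

```python
import json

# Hypothetical report shape for illustration; the actual schema produced
# by `--output json` may differ. Inspect real output before using it.
report = json.loads("""
{
  "checks": [
    {"name": "ISTIO_SYNC_STATUS", "status": "passed"},
    {"name": "DATASERVICE_HEALTH", "status": "failed",
     "message": "Application health check failed"}
  ]
}
""")

# Collect failed checks so a wrapper can alert on them.
failures = [c for c in report["checks"] if c["status"] == "failed"]
for check in failures:
    print(f"{check['name']}: {check.get('message', '')}")
```

The same approach works with the junit output if your CI system already understands JUnit XML.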
Reading Diagnostics Reports
INFO Logs
INFO logs in green show that the required checks passed. However, you should still properly check the disk/memory usage to avoid hidden errors.
WARN Messages
Even though these messages do not signal a high risk, you might have to rectify them, as they might be affecting some services in certain scenarios.
ERROR Messages
You must fix the issues described by these messages, as they impact services in the cluster.
Rke2-server or Rke2-agent Service Down
If these services are down, the node is down. Try restarting the service using systemctl restart <service-name>; this should fix the issue.
Directory Size Mounted at /var/lib
The report displays the directory size mounted at /var/lib as Kubernetes uses it to store its data. If the directory is full, various issues might arise. To prevent these problems, make sure to increase its size.
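To check the available space yourself between report runs, standard tooling is enough. The sketch below uses Python's shutil.disk_usage (equivalent to df for the filesystem backing a path); the /var/lib path is the one the report refers to, and the 80% threshold is an arbitrary example, not a documented limit.

```python
import shutil

def usage_percent(path: str) -> float:
    """Return the used percentage of the filesystem backing `path`."""
    usage = shutil.disk_usage(path)          # total, used, free (bytes)
    return 100.0 * usage.used / usage.total

# Example threshold of 80% is illustrative only; pick your own limit.
pct = usage_percent("/var/lib")
print(f"/var/lib filesystem is {pct:.1f}% full")
if pct > 80.0:
    print("Consider increasing the disk size backing /var/lib")
```

Running `df -h /var/lib` from a shell gives the same figure interactively.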
Rke2 Version
The report displays the rke2 version for reference.
Disk Pressure or Memory Pressure
For all the nodes, we specify if they are under Disk Pressure or Memory Pressure. If that happens, workloads on these nodes might start showing issues. Check if there are any other processes running on these nodes that are consuming resources and remove them if that is the case.
Ceph Services Status
We use Ceph as S3-compatible object storage for storing logs and files from different applications. You can view the status of its services. If they are down, you might have to restart them. Also check whether the Ceph disk usage is full.
Ports 443 and 31443
Ports 443 and 31443 must be open and reachable on the hostname that was provided. The report indicates if they are not accessible. Make sure to open the appropriate ports if they are flagged here.
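If you want to reproduce this reachability check by hand, a plain TCP connect is sufficient. The sketch below is a minimal illustration: a local listener stands in for your Automation Suite FQDN, which you would substitute along with ports 443 and 31443 in a real check.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustration against a local listener; replace host/port with your
# Automation Suite FQDN and ports 443 / 31443 in a real check.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # ephemeral port stands in for 443
listener.listen(1)
host, port = listener.getsockname()

print(port_open(host, port))      # connection accepted
listener.close()
```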
Certificate Validity
The tool checks if the uploaded certificate is valid for the given hostname and if it has not expired. If the certificate does not meet these criteria, errors occur. To prevent this, make sure to check your uploaded certificate and change it if required.
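You can perform the same expiry check yourself before uploading a certificate. The sketch below parses a certificate's notAfter field with Python's stdlib ssl module; the certificate is a throwaway self-signed one generated on the fly (the hostname in it is a placeholder), so point the check at your actual certificate file instead.

```python
import ssl
import subprocess
import tempfile
from datetime import datetime, timezone

def cert_expired(cert_path: str) -> bool:
    """Return True if the certificate at cert_path has expired."""
    info = ssl._ssl._test_decode_cert(cert_path)  # stdlib helper for PEM files
    not_after = datetime.strptime(
        info["notAfter"], "%b %d %H:%M:%S %Y %Z"
    ).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) > not_after

# Generate a throwaway 30-day self-signed cert for a placeholder hostname
# (requires the openssl CLI; replace with your real certificate file).
tmp = tempfile.mkdtemp()
subprocess.run(
    ["openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
     "-keyout", f"{tmp}/key.pem", "-out", f"{tmp}/cert.pem",
     "-days", "30", "-subj", "/CN=automationsuite.example.com"],
    check=True, capture_output=True,
)
print(cert_expired(f"{tmp}/cert.pem"))  # False: still valid for 30 days
```

Note that ssl._ssl._test_decode_cert is an undocumented stdlib helper; for production tooling, a library such as cryptography is the safer choice.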
GPU
Since some services require a GPU to be present on some of the nodes in the cluster, the Automation Suite Diagnostics Tool checks whether there are GPU nodes and prints the number of such nodes. If you expect GPU nodes to be present and they do not show up here, something went wrong in the GPU setup.
DockerRegistry
DockerRegistry is an important component that some services use. If it is down, you need to investigate the issue and perform a restart.
ArgoCD Services Down
ArgoCD is our application lifecycle management (ALM) tool. If any of its services are down, then other applications may become outdated or have other issues. Recovering these services is important, and might need further debugging.
Missing or Degraded ArgoCD Applications
The Automation Suite Diagnostics Tool shows whether ArgoCD applications are missing or degraded.
- If applications are missing, go to the ArgoCD UI and sync them.
- If applications are degraded, additional debugging is needed to investigate the errors thrown by ArgoCD.