We often emphasize the benefits of custom or out-of-the-box application-level preflight checks, which help ensure your software will install successfully in your customers’ environments. Preflight checks are powered by the open source project Troubleshoot (contributed to the community by Replicated) and run before the application is installed to identify issues in advance. This gives your customers a better experience and can help reduce your support burden at install time. We recently enhanced this capability in application manager v1.67.0 to give you even greater control over what happens when one or more application preflight checks fail.
This enhancement lets you specify whether a failed preflight check should block a requested application deployment. If you set one or more Troubleshoot analyzers to ‘strict’ mode, the Replicated application manager picks up this parameter and prevents your customer from deploying until the fail condition is remedied. A strict application preflight check failure cannot be skipped by the end user.
Let’s take a look at an example. In the snippet below, we can see one of the default analyzers available. This container runtime analyzer uses data from the clusterResources collector to evaluate which container runtime is present. In this example, the container runtime must be containerd; if it is not, the analyzer fails.
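As a minimal sketch, a strict container runtime analyzer might look like the following. It assumes the Troubleshoot `v1beta2` Preflight spec; the metadata name and the outcome messages are illustrative placeholders, not taken from a real application.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: strict-runtime-check        # hypothetical name, for illustration only
spec:
  analyzers:
    - containerRuntime:
        # strict: true asks the app manager to block deployment if this analyzer fails
        strict: true
        outcomes:
          # pass only when the detected runtime is containerd
          - pass:
              when: "== containerd"
              message: The containerd container runtime was found.   # illustrative message
          # any other runtime falls through to the fail outcome
          - fail:
              message: The containerd container runtime was not found.   # illustrative message
```

Leaving out `strict: true` (or setting it to `false`) would keep the same analyzer logic but would not make a failure blocking.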
Because this analyzer is set to strict: true, the app manager enforces the preflight check and makes it non-skippable. The installation can proceed only after this check returns a non-failure outcome. Below, we can see how the deploy button in the Replicated app manager is unavailable to the user while the preflight checks are still running:
If the user mouses over the deploy button, they are also told why they can’t click it:
If the user looks at the preflight check results (once the checks finish running), they are given more detail on why the deployment is being blocked:
At this point, the user must resolve this failure before they can re-run the preflight checks and deploy the application.
We've created the reference table below to illustrate the possible outcomes of different test results combined with whether strict blocking preflight checks are being enforced.
The intent is to better inform and guide the user when there are certain failures that you as the vendor know can’t be ignored. For example, maybe your application is only supported with certain versions of Kubernetes, and you want to perform checks on the cluster version. Maybe you know that each node needs a minimum amount of resources for the application to function correctly. Maybe you have created your own collector and analyzer using Troubleshoot and want to evaluate that condition. The possibilities are endless, but most importantly, this functionality can help prevent your users from getting themselves into an unsupported or known-bad configuration. Sometimes, failure just isn’t an option.
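The first two scenarios above could be sketched roughly as follows, using Troubleshoot’s `clusterVersion` and `nodeResources` analyzers. The minimum Kubernetes version, the memory threshold, the names, and the messages are all assumptions chosen for illustration; substitute whatever your application actually requires.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: strict-requirements-sketch   # hypothetical name
spec:
  analyzers:
    - clusterVersion:
        strict: true                 # a failure here blocks deployment
        outcomes:
          - fail:
              when: "< 1.22.0"       # assumed minimum supported version
              message: This application requires Kubernetes 1.22.0 or later.
          - pass:
              message: The Kubernetes version is supported.
    - nodeResources:
        checkName: Minimum node memory   # illustrative check name
        strict: true
        outcomes:
          - fail:
              # assumed threshold: every node must have at least 8Gi of memory
              when: "min(memoryCapacity) < 8Gi"
              message: Every node must have at least 8Gi of memory.
          - pass:
              message: All nodes meet the minimum memory requirement.
```

Marking only the truly non-negotiable requirements as strict keeps the remaining checks advisory.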
Looking to share your experience with us? Slack us in the #kots or #troubleshoot channels - we would love to hear your story or feedback on this capability!