A heads-up with a few initial Runtime Fabric troubleshooting hints.
While installing Runtime Fabric on the servers, the points below need to be monitored to make sure the tool is deployed with all required packages and dependencies, and to find solutions for issues:
1: Validate that the provided infrastructure has all required resources available and that the recommended network configuration is in place before creating Runtime Fabric in the selected environment in the Anypoint Platform control plane:
i: If any node/controller crashes or needs to be formatted, the Runtime Fabric that was created in the Anypoint control plane cannot be deleted. [critical] Follow up with the MuleSoft support team.
2: Before associating Runtime Fabric with a non-production environment, a minimum of two workers and one controller should be up and running with 'healthy' status in Anypoint Runtime Manager, and the cluster must be in 'Active' state.
3: Following point #2, at least 3 controllers are required to associate with a production environment. [Must]
4: If you need to reboot or shut down nodes or controllers, the fabric status in the Runtime control plane will change from active to degraded/disconnected. Once the nodes or controllers are available and online again, it takes some time to share all metadata with the control plane in CloudHub before the status changes back to active. In some cases this takes longer than expected.
5: Keep monitoring the process with the available logs:
- /var/log/rtf-init.log: This log contains the output of the init.sh execution, which lets you monitor the status of the Runtime Fabric installation.
- gravity commands to find the status and to validate logs.
6: If an Anypoint Runtime Fabric update needs to be performed, plan it for non-business hours, because some processes may be unable to handle requests while the update is running.
7: Validate which load balancer (shared or dedicated) needs to be configured for high availability and request processing.
i: A dedicated load balancer is managed by the organization, and the required resources need to be allocated to install and configure it.
ii: The shared load balancer is the default; it does not require any assigned resources or any dependency on the provider.
iii: An external load balancer is another option for managing inbound traffic. All controller and worker IP addresses need to be assigned to it.
8: Last and most important, validate with the MuleSoft team before buying a license. An accurate calculation needs to be done first, because a gap between the license you have and the actual requirement affects system preparation and application deployment.
9: Adhere to the license the organization has, to avoid unexpected platform behavior or MuleSoft notifications about the resources assigned to the fabric setup.
10: Follow MuleSoft's troubleshooting suggestions; they help identify issues and the required solutions.
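The node-count rules from points 2 and 3 can be turned into a quick pre-check before associating an environment. The sketch below is only an illustration: the `Node` class and the `association_problems` function are hypothetical names, the node list has to be filled in by hand from what Runtime Manager shows, and carrying the two-worker minimum over to production is an assumption, since the points above only state the controller count for production.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str    # "controller" or "worker"
    status: str  # status as shown in Runtime Manager, e.g. "healthy"


def association_problems(nodes, cluster_state, production):
    """Return a list of reasons the fabric is not ready to be associated."""
    problems = []

    # Only nodes reported as healthy count towards the minimums.
    healthy = [n for n in nodes if n.status.lower() == "healthy"]
    controllers = sum(1 for n in healthy if n.role == "controller")
    workers = sum(1 for n in healthy if n.role == "worker")

    min_controllers = 3 if production else 1
    min_workers = 2  # assumption: same worker minimum for production

    if controllers < min_controllers:
        problems.append(
            f"need {min_controllers} healthy controller(s), found {controllers}"
        )
    if workers < min_workers:
        problems.append(f"need {min_workers} healthy worker(s), found {workers}")
    if cluster_state.lower() != "active":
        problems.append(f"cluster state is '{cluster_state}', expected 'Active'")
    return problems


if __name__ == "__main__":
    # Hand-entered example inventory, not read from any API.
    inventory = [
        Node("controller-1", "controller", "healthy"),
        Node("worker-1", "worker", "healthy"),
        Node("worker-2", "worker", "healthy"),
    ]
    for label, is_prod in (("non-production", False), ("production", True)):
        issues = association_problems(inventory, "Active", is_prod)
        print(label, "->", issues or "ready")
```

An empty list means the stated minimums are met; anything else is a reason to hold off on the association.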
Regarding fault tolerance, work with the resources available in the organization and follow the best practices suggested in the MuleSoft documentation.
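For the log-monitoring point in the list above, a small script can help pick problem lines out of /var/log/rtf-init.log while the installation runs. This is a rough sketch, not a MuleSoft tool: the `suspicious_lines` helper is a hypothetical name, and the keyword list is an assumption about what a failure might look like, since the actual log format is not described here.

```python
from pathlib import Path

LOG_PATH = Path("/var/log/rtf-init.log")

# Assumption: these keywords are a guess at what failures look like in the log.
KEYWORDS = ("error", "fail", "fatal")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" keeps the scan going if the log has odd bytes.
        with LOG_PATH.open(errors="replace") as handle:
            for number, text in suspicious_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"{LOG_PATH} not found on this host")
```

Run it on the node where init.sh is executing; matching is by plain substring, so treat the hits as a starting point and read the surrounding log lines before acting on them.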
Hope it helps.