Cloud Application Architecture Design: Effective Operations

Running an application in the cloud has dramatically changed how operations work compared with the pre-cloud era. The operations team is no longer responsible for the hardware and infrastructure that host the application, and it carries far less responsibility for tasks such as patching, updates, and upgrades. Even so, operations remains a critical part of running applications successfully in the cloud. The focus has shifted to areas such as the following:

  • Automated deployments
  • Monitoring
  • Observability
  • Incident response
  • Security
  • Auditing

The development team, working with the operations team, can implement parts of logging and tracing so that the application generates the data and events the operations team needs to be successful.

The following are some recommendations for operating cloud-hosted applications effectively:

  • Treating configuration as code helps the operations team update application configuration easily, without needing to know all the details.
  • To respond to application failures, whether functional or non-functional, there should be enough logging in place for the operations team to investigate. Complexity increases as the number of services increases. A standard logging framework not only helps the operations team filter through logs, it can also be integrated with additional services that generate alerts, reports, and visualizations from the logging data.
  • Application metrics such as health, request/response times, and thresholds can help the application leverage the scaling capabilities of the underlying cloud infrastructure.
  • Instrumentation using either cloud SDKs or generic frameworks helps dependent systems fetch meaningful data from the application. The application team can add such instrumentation for monitoring, tracing, root-cause analysis, and so on.
  • For an application to use distributed tracing effectively, it needs to correlate inter-service communication with a common ID, or correlation ID. The supporting tools can then use that ID to build the trace and pinpoint the failure.
  • Automate recurring tasks such as deployments, configuration updates, and monitoring changes as much as possible to reduce manual intervention, and run them through a defined pipeline.
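
The logging and correlation ID recommendations can be combined in a small sketch. It uses only the Python standard library and is an illustration under assumptions, not a prescribed implementation: the `X-Correlation-ID` header name, the JSON field names, and the helpers `build_logger` and `handle_request` are all hypothetical choices made for this example.

```python
import contextvars
import json
import logging
import sys
import uuid

# Holds the correlation ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object per line (assumed field names)."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "correlation_id": record.correlation_id,
            "message": record.getMessage(),
        })


def build_logger(service_name, stream=sys.stdout):
    """Hypothetical helper: one logger per service, same format everywhere."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationFilter())
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


def handle_request(logger, incoming_headers):
    """Reuse the caller's ID if present, otherwise start a new one."""
    cid = incoming_headers.get("X-Correlation-ID") or str(uuid.uuid4())
    correlation_id.set(cid)
    logger.info("request received")
    # Headers to forward on downstream calls so the ID travels with them.
    return {"X-Correlation-ID": cid}
```

Because every service emits the same fields, a log aggregation or tracing tool can filter on `correlation_id` to follow a single request across services.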

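The point about application metrics can be sketched in a similar way. The example below keeps a rolling window of request outcomes and reports a health status that a health endpoint or scaling rule could consume. The class names, the window size, and the threshold values are assumptions chosen for illustration; real limits depend on the application and the cloud platform's scaling configuration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_avg_latency_ms: float = 500.0  # hypothetical limit
    max_error_rate: float = 0.05       # hypothetical limit


class RequestMetrics:
    """Keeps the most recent request outcomes in a fixed-size window."""

    def __init__(self, window=100):
        self._samples = deque(maxlen=window)

    def record(self, latency_ms, ok):
        self._samples.append((latency_ms, ok))

    def snapshot(self):
        count = len(self._samples)
        if count == 0:
            return {"count": 0, "avg_latency_ms": 0.0, "error_rate": 0.0}
        total_latency = sum(latency for latency, _ in self._samples)
        errors = sum(1 for _, ok in self._samples if not ok)
        return {
            "count": count,
            "avg_latency_ms": total_latency / count,
            "error_rate": errors / count,
        }


def health(metrics, thresholds):
    """Reports 'degraded' when either threshold is exceeded, else 'healthy'."""
    snap = metrics.snapshot()
    degraded = (
        snap["avg_latency_ms"] > thresholds.max_avg_latency_ms
        or snap["error_rate"] > thresholds.max_error_rate
    )
    return {"status": "degraded" if degraded else "healthy", **snap}
```

Exposing the result of `health` through an endpoint gives the platform a signal to act on, for example to add instances while the status is degraded.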

