Save your DNS: follow these 8 simple steps
Lee Atchison
Co-Founder & CTO, Product Genius Corporation. Thought Leader, Cloud Expert, Best Selling Author. O'Reilly Media, LinkedIn Learning. Host Software Engineering Daily. Ex-Amazon, Ex-AWS. softwarearchitectureinsights.com.
DNS is essential to the operation of all aspects of the internet and modern digital businesses. DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company’s applications and business operations. A failure in your DNS can bring business to a halt, jeopardizing your company’s future.
The problem with DNS is that a tiny mistake in a configuration file can have a ripple effect through the entire DNS and impact all aspects of your company’s operations. A DNS failure will impede your customers’ ability to use your products and your company’s ability to make money. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes.
DNS changes are so common and so simple that they are rarely considered risky business operations. For smaller organizations, the development team probably manages its own DNS servers or has some other way to make DNS changes on the fly. As organizations get larger and more complex, the number of DNS servers and the number of people who can make changes to them tend to multiply.
With so many people making changes, it’s not surprising that something goes wrong occasionally. In fact, it would be more surprising if things didn’t go wrong.
DNS outages are caused by a variety of factors, including human error, software issues, and hardware failures. But the most common cause of DNS outages is incorrect configuration files being deployed to DNS servers.
What steps can a smaller company that lacks quality DNS hygiene take to put a high-quality DNS management process in place? Here are eight things any company can do to improve its overall DNS quality and keep applications operational and healthy.
Step 1. Manage DNS configuration using revision control
This is the simplest and most basic thing you can do to improve the quality of your DNS infrastructure. At the core, DNS configurations are simply flat text files.
Many DNS providers give you a front-end control panel to these configuration files in order to let you make changes more easily. However, they also obscure the impact of the changes you are making. Don’t use these control panels! Instead, manage your configuration files using the standard flat text file format.
Once you have moved to the flat file format, you can easily manage these configuration files using the same revision control program you use for managing your application source code. For most companies, this is some variation of Git.
You undoubtedly have processes in place today in your company for managing your source code, so use the same or similar process for managing your DNS configuration files as well.
This simple change will allow many other process improvements to come naturally, such as configuration reviews, approval workflows, and the ability to track when specific changes were made that may have impacted your application. This is an essential base necessary to keep your DNS service operating and error-free.
Step 2. Review all needed DNS changes
Once you are managing your changes using a revision control program, make sure all changes you make are reviewed and approved. This can be accomplished just like with your application source code, using branches, pull requests, and merges.
Establish a process for approvals of all changes. Make sure at least one other person reviews every change before it is incorporated into your production configuration. This review process should include checks for syntax errors, incorrect DNS settings, and other potential problems. Problems with DNS configurations can be subtle, so a thorough and methodical review should be performed by a knowledgeable reviewer.
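Some of the mechanical checks in a review can be run by a script before a human ever looks at the change. The following is a minimal sketch, not a complete zone file parser. It assumes a simplified format in which every record sits on one line as "name TTL class type data", and the function name lint_records is made up for this example. It flags malformed lines, non-numeric TTLs, invalid addresses in A and AAAA records, and a CNAME that shares a name with other record types.

```python
import ipaddress


def lint_records(zone_text):
    """Return a list of problems found in simplified zone text.

    Assumption: one record per line, written as
    "name TTL class type data". Real zone files allow more
    (directives, omitted fields, multi-line records, quoted ";").
    """
    problems = []
    types_by_name = {}
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # naive comment removal
        if not line:
            continue
        fields = line.split(None, 4)
        if len(fields) != 5:
            problems.append(f"line {lineno}: expected 5 fields, got {len(fields)}")
            continue
        name, ttl, _rclass, rtype, rdata = fields
        if not ttl.isdigit():
            problems.append(f"line {lineno}: TTL {ttl!r} is not a number")
        if rtype in ("A", "AAAA"):
            try:
                addr = ipaddress.ip_address(rdata)
            except ValueError:
                problems.append(f"line {lineno}: {rdata!r} is not a valid IP address")
            else:
                expected_version = 4 if rtype == "A" else 6
                if addr.version != expected_version:
                    problems.append(f"line {lineno}: {rtype} record holds an IPv{addr.version} address")
        types_by_name.setdefault(name, set()).add(rtype)
    # A CNAME must not share its name with other record types.
    for name, types in sorted(types_by_name.items()):
        if "CNAME" in types and len(types) > 1:
            problems.append(f"{name}: CNAME cannot coexist with other record types")
    return problems


if __name__ == "__main__":
    sample = "www 300 IN A 999.0.2.10\nwww 300 IN CNAME example.net.\n"
    for problem in lint_records(sample):
        print(problem)
```

A script like this could run automatically on every pull request, leaving the human reviewer free to focus on whether the change itself is the right one.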
Step 3. Document the intent of all changes
Every change you make should be documented. If you are following the above steps, then this can be accomplished using the code check-in comment and pull request process.
Documenting DNS changes will help you later if a problem exists or an incompatible change is proposed. Understanding why a previous change was made will help you repair future problems and help you understand why a specific change may or may not be appropriate.
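If you want to enforce this, a small check on the change description can run as part of your review tooling. The sketch below assumes a team convention, invented for this example, that every description contains a line starting with "Why:" that explains the intent. The function name check_change_message and the five-word minimum are likewise illustrative choices.

```python
def check_change_message(message, min_words=5):
    """Return a list of problems with a change description.

    Assumption: the team requires a "Why:" line of at least
    min_words words. An empty list means the message is acceptable.
    """
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines:
        return ["change description is empty"]
    problems = []
    reasons = [line for line in lines if line.lower().startswith("why:")]
    if not reasons:
        problems.append('no line starting with "Why:" explains the intent')
    elif len(reasons[0][4:].split()) < min_words:
        problems.append(f'the "Why:" line should have at least {min_words} words')
    return problems


if __name__ == "__main__":
    print(check_change_message("update dns"))
```

The exact rule matters less than making sure the reason for each change is captured somewhere a future reader can find it.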
Step 4. Automate the configuration deployment process
Once you have the process in place to manage the configuration files, establish a process to automate the deployment of configuration file updates to your production DNS. By automating this process, you reduce the likelihood that an incorrect change will be pushed to production, or that a simple human error will cause your DNS to fail or produce bad results.
If you find yourself copying and pasting changes from one configuration file to another during a deployment process, you will be much more likely to make a mistake and introduce a bug into the DNS. Automatically deploying changes will ensure that changes are applied in a consistent and reliable manner.
Your automated deployment system should include an automated rollback mechanism. This may be a natural extension of your revision control process, or a separate deployment rollback process. But being able to quickly and effectively undo a change may mean the difference between a mistake that causes a small inconvenience and one that causes a massive outage.
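The deploy-then-verify-then-rollback idea can be sketched in a few lines. This is an illustration under assumptions, not a real deployment tool: apply_config and is_healthy are hypothetical callables that you would implement against your own DNS servers or your provider's tooling, and the configurations are whatever objects those callables accept.

```python
def deploy_with_rollback(new_config, previous_config, apply_config, is_healthy):
    """Apply new_config and restore previous_config if verification fails.

    apply_config and is_healthy are hypothetical callables you supply.
    Returns "deployed" or "rolled_back".
    """
    try:
        apply_config(new_config)
        healthy = is_healthy()
    except Exception:
        # Treat any failure while applying or checking as unhealthy.
        healthy = False
    if healthy:
        return "deployed"
    apply_config(previous_config)
    return "rolled_back"


if __name__ == "__main__":
    applied = []
    result = deploy_with_rollback("v2", "v1", applied.append, lambda: False)
    print(result, applied)
```

In practice, previous_config would come from your revision control history, which is one reason the earlier steps make rollback easier.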
Step 5. Grow into a more sophisticated change management system
As your DNS grows in complexity, you may want to consider putting an entire change management system on top of the simple version control system you’ve already established. Full-blown change management would involve using change request forms, requests for authorization, multi-team sign-offs, and other such procedures.
These changes may seem onerous, but DNS configuration is not a place for slacking off on process. A simple DNS change can impact many teams within your organization. Soliciting those teams’ input before the change is made—or even before the change proposal is accepted—can save you many headaches later on.
The size and complexity of your change management system will naturally depend on the size and complexity of your organization and other software management processes.
Step 6. Use an independent DNS provider
A high-quality DNS requires more than configuration management. It requires a high-quality operational environment as well.
Many of your existing service providers might provide DNS services that you can easily leverage. In particular, the leading cloud providers provide high-quality DNS services.
However, be careful using a DNS service that is provided by a company that provides you with other services, including other cloud services.
During a service outage, the most critical tool that must operate normally is your DNS. You need DNS to help you diagnose and repair most other outages. If your DNS is also down, the length of your outage will extend significantly.
The reverse is also true: If you are dealing with a DNS issue, the last thing you need is an outage caused by another service in your application ecosystem.
Avoid these problems by using a high-quality DNS provider that provides only DNS services to you and nothing else. This allows you to isolate your DNS (and any problems with your DNS) from any other service in your application, reducing the likelihood of a DNS-related extended outage.
Make sure the provider you select isn’t dependent on service providers, such as cloud providers, that you are also already relying on! If AWS has an outage, you want your independent DNS provider to keep operating. That doesn’t happen if your DNS provider also depends on AWS.
Some organizations run their own DNS. If you decide to run your own DNS, make sure you operate it using resources that are independent of the rest of your application. This means operating DNS in different data centers, availability zones, and even cloud regions than the rest of your application.
Step 7. Separate internal and external DNS
Let’s take that last point one step further. You have DNS needs that are internal to your company and external DNS needs that your customers depend on. Your internal DNS provides access to internal documentation, internal systems including email and communications tools, and other internal processes and systems. Your external DNS provides access to your company’s applications, products, and services to your customers.
Make sure these two DNS needs are handled by different providers. If your external DNS goes down, fixing that problem will be substantially harder if your internal DNS is also down. This is one reason Meta took so long to fix its application when Facebook went down in October 2021.
And conversely, if your internal DNS goes down, you don’t want that problem to trickle out to your external customers.
Using different providers along with different DNS configurations and configuration processes is critical to avoiding these sorts of problems.
Step 8. Duplicate your DNS in another provider
Going one more step, set up your production DNS using two different providers. Use one as a primary provider, and the second as a backup provider. This way, if your primary DNS provider should go down, you may be able to switch your production DNS over to your backup provider quickly.
The backup provider should have a complete, operational, and fully tested copy of your DNS configuration set up and operating, so it can be put into play immediately if needed. This process will be easier if you have implemented the automated deployment process recommended above. This automated process can help ensure that your changes are kept in sync between your primary and backup providers.
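One way to confirm that the two providers really are in sync is to compare their record sets on a schedule. The sketch below assumes you have already exported each provider's records as (name, type, value) tuples. How you export them depends on the provider and is not shown, and the function name diff_providers is made up for this example.

```python
def diff_providers(primary_records, backup_records):
    """Report records that differ between two providers.

    Assumption: each argument is an iterable of (name, type, value)
    tuples exported from the respective provider.
    """
    primary = set(primary_records)
    backup = set(backup_records)
    return {
        "missing_from_backup": sorted(primary - backup),
        "extra_in_backup": sorted(backup - primary),
    }


if __name__ == "__main__":
    primary = [("www", "A", "192.0.2.10"), ("api", "A", "192.0.2.20")]
    backup = [("www", "A", "192.0.2.10")]
    print(diff_providers(primary, backup))
```

If either list in the result is non-empty, the backup has drifted and should not be trusted for a failover until it is corrected.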
DNS is a critical system that should be designed for high availability and reliability from the start. You also need to think about security when designing your DNS infrastructure. Make sure you have redundant systems in place, and that access to your DNS is tightly controlled.
Finally, monitoring DNS is critical to ensuring your system continues to run smoothly. You need tools that will alert you if problems occur, so you can take steps to mitigate the impact as quickly as possible.
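A very basic monitor simply resolves your important names and compares the answers with what you expect. The following sketch uses Python's standard socket.getaddrinfo, which asks the local system resolver, so answers may be cached and it does not query your authoritative servers directly. The function names and the alert message format are assumptions made for this example.

```python
import socket


def resolve_addresses(hostname):
    """Resolve hostname through the system resolver; return a set of IPs."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def check_name(hostname, expected, resolver=resolve_addresses):
    """Return an alert string if resolution fails or differs, else None.

    resolver is injectable so the check can be tested without a network.
    """
    try:
        actual = set(resolver(hostname))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"ALERT {hostname}: lookup failed ({exc})"
    if actual != set(expected):
        return f"ALERT {hostname}: got {sorted(actual)}, expected {sorted(expected)}"
    return None


if __name__ == "__main__":
    # Example with a fake resolver; replace with real names and addresses.
    print(check_name("www.example.com", {"192.0.2.10"}, resolver=lambda h: ["192.0.2.11"]))
```

Run from a scheduler and wired to your alerting system, even a check this simple can shorten the time between a bad change and someone noticing it.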
DNS outages are a common occurrence, but they don’t have to bring your entire company to a standstill. By using the proper processes and tools, you can minimize the impact of any outages and keep your business running smoothly.
Want more from Lee?
If you are interested in getting more great content from Lee Atchison, sign up for his Software Architecture Insights newsletter. Sign up and you’ll be entered into a contest to win a free, signed copy of one of Lee’s O’Reilly Media books, such as Architecting for Scale, or Overcoming IT Complexity.