How to Monitor Database Availability Groups?

A Database Availability Group (DAG) is the core component of Microsoft Exchange Mailbox servers that provides high availability and site resilience through continuous replication and failover clustering. A DAG can contain up to 16 Exchange servers that host a set of database copies and provide automatic database-level recovery when a failure occurs.

Although DAG member servers continuously monitor each other for database, disk, server, and network failures, administrators should still actively monitor the health of the database copies and member servers.

Importance of Monitoring Database Availability Groups (DAG)

Monitoring DAG member servers for replication health, database copy status, low disk space, and similar issues ensures that the DAG keeps working and can fail over and activate a passive database copy without problems when a server or database fails.

If DAG member servers are not monitored, failures such as the following can go unnoticed and prevent the DAG from performing automatic database recovery, which can disrupt services and cause downtime:

  • Witness Server Failure
  • MAPI Network Failure
  • Virtual Directory Failure
  • Replication Failure

Steps to Monitor Database Availability Group (DAG)

The steps below show how to check and monitor Database Availability Group (DAG) health to maintain high availability and site resilience and to avoid DAG failures. They use Exchange Management Shell (EMS) cmdlets, a registry setting for the low disk space threshold, the CollectOverMetrics.ps1 PowerShell script (which reads DAG member event logs to gather information about database operations), and the Exchange crimson channel event logs.

Step 1: Assign Required Roles and Permissions

To run the PowerShell cmdlets used for monitoring DAG status, the account must be assigned the following management roles:

  • View-Only Configuration
  • Monitoring

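To assign the required roles, use the New-ManagementRoleAssignment cmdlet in EMS:

New-ManagementRoleAssignment -Role <role-name> -User <username>

For example, to assign both roles to the administrator account:

New-ManagementRoleAssignment -Role "View-Only Configuration" -User "administrator"
New-ManagementRoleAssignment -Role "Monitoring" -User "administrator"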

Once the roles are assigned, you can use the PowerShell cmdlets discussed in the next step to monitor the Database Availability Group status.

Step 2: Check DAG Status

To monitor the status of a Database Availability Group, you can use cmdlets such as Get-MailboxDatabaseCopyStatus and Test-ReplicationHealth in the Exchange Management Shell (EMS).

With these cmdlets, you can check all copies of a particular database or all database copies on a specific server.

Use Get-MailboxDatabaseCopyStatus Cmdlet for Monitoring DAG Database Copy Status

  • To check the status of all copies of a particular database, run the following command in EMS:

Get-MailboxDatabaseCopyStatus -Identity MBXDB01 | Format-List

  • To check the status of all database copies on a particular DAG member server, run the following command:

Get-MailboxDatabaseCopyStatus -Server EXCH01 | Format-List

  • To check the status of all database copies on the local server, use this command:

Get-MailboxDatabaseCopyStatus -Local | Format-List

  • To check the database copy status on all member servers in the DAG at once, execute the following command:

(Get-DatabaseAvailabilityGroup) | ForEach {$_.Servers | ForEach {Get-MailboxDatabaseCopyStatus -Server $_}}        
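To triage this output across a whole DAG, you can export it and scan it with a small script. The Python sketch below assumes the copy status was saved with something like ... | Select-Object Name,Status,CopyQueueLength,ReplayQueueLength | Export-Csv copystatus.csv -NoTypeInformation (the file name is hypothetical). It treats Mounted (active copy) and Healthy (passive copy) as normal states and flags everything else, plus any copy or replay queue longer than a hypothetical limit of 10 logs. Choose a limit that suits your environment.

```python
import csv
import io

# Normal states: "Mounted" for the active copy, "Healthy" for a passive copy.
NORMAL_STATES = {"Mounted", "Healthy"}

# Hypothetical limit (in log files) for CopyQueueLength and ReplayQueueLength.
MAX_QUEUE_LENGTH = 10


def find_unhealthy_copies(csv_text, max_queue=MAX_QUEUE_LENGTH):
    """Return (copy name, reason) pairs for database copies that need attention.

    Assumes the CSV has Name, Status, CopyQueueLength and ReplayQueueLength
    columns, as exported from Get-MailboxDatabaseCopyStatus.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Name"]
        if row["Status"] not in NORMAL_STATES:
            problems.append((name, f"status is {row['Status']}"))
        for column in ("CopyQueueLength", "ReplayQueueLength"):
            if int(row[column]) > max_queue:
                problems.append((name, f"{column} is {row[column]}"))
    return problems


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = (
        "Name,Status,CopyQueueLength,ReplayQueueLength\n"
        "MBXDB01\\EXCH01,Mounted,0,0\n"
        "MBXDB01\\EXCH02,Healthy,0,25\n"
        "MBXDB02\\EXCH02,FailedAndSuspended,0,0\n"
    )
    for copy_name, reason in find_unhealthy_copies(sample):
        print(copy_name, "->", reason)
```

Calling find_unhealthy_copies on the contents of your exported file returns an empty list when every copy is in a normal state and its queues are within the limit.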

Use Test-ReplicationHealth Cmdlet for Monitoring DAG Continuous Replication Health Status

Test-ReplicationHealth checks the continuous replication and replay status of a DAG member server. It also runs tests on other high availability components, such as the cluster service, quorum, and network components.

  • To test the replication health of the DAG member server EXCH01, run the following command in EMS:

Test-ReplicationHealth -Identity EXCH01

If the server is healthy, every check in the output shows Passed in the Result column.

  • To check the replication health of all DAG member servers at once, execute the following command:

(Get-DatabaseAvailabilityGroup) | ForEach {$_.Servers | ForEach {Test-ReplicationHealth -Identity $_}}
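When you run Test-ReplicationHealth against every member, the output can be long. The Python sketch below assumes the results were exported with Select-Object Server,Check,Result,Error and Export-Csv -NoTypeInformation. It groups every check whose Result is not Passed by server. It deliberately does not match any specific failure text, so warnings and failures are both reported.

```python
import csv
import io
from collections import defaultdict


def failed_checks_by_server(csv_text):
    """Group non-passing Test-ReplicationHealth checks by server.

    Assumes Server, Check, Result and Error columns. Any Result other than
    "Passed" is treated as needing attention.
    """
    failures = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Result"].strip() != "Passed":
            failures[row["Server"]].append((row["Check"], row["Error"]))
    return dict(failures)


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = (
        "Server,Check,Result,Error\n"
        "EXCH01,ClusterService,Passed,\n"
        "EXCH01,ReplayService,Passed,\n"
        "EXCH02,DBCopyFailed,Failed,sample error text\n"
    )
    for server, checks in failed_checks_by_server(sample).items():
        for check, error in checks:
            print(f"{server}: {check} - {error}")
```

An empty dictionary means every check on every server passed.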

Step 3: Customize Low Disk Space Threshold

Starting with Exchange 2013 SP1, the DAG monitors only the volumes that store databases and transaction logs. By default, the low disk space threshold is 180 GB. You can raise or lower this threshold to suit your organization's needs by adding a DWORD value under the Exchange Server registry key.

The steps are as follows:

  • Press Windows + R, type regedit, and click OK.
  • Navigate to HKEY_LOCAL_MACHINE\Software\Microsoft\ExchangeServer\v15\Replay\Parameters.
  • Create a new DWORD (32-bit) value named SpaceMonitorLowSpaceThresholdInMB, and then double-click it.
  • Select Decimal as the base and enter the threshold in MB. For example, enter 102400 for 100 GB (100 × 1024 MB).
  • Click OK, and then restart the Microsoft Exchange DAG Management service.
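Because the registry value is expressed in MB, a GB figure has to be converted first. The Python sketch below assumes 1 GB = 1024 MB (with decimal units, 100 GB would be 100000 MB instead). It uses shutil.disk_usage to compare a volume's free space against a threshold. This is only an illustration of the comparison, not how Exchange's own space monitor is implemented, and the path is a placeholder for your database or log volume.

```python
import shutil

BYTES_PER_MB = 1024 * 1024


def gb_to_threshold_mb(gigabytes):
    """Convert GB to the MB value for SpaceMonitorLowSpaceThresholdInMB (1 GB = 1024 MB)."""
    return int(gigabytes * 1024)


def is_below_threshold(path, threshold_mb):
    """Return True if the volume that contains path has less free space than threshold_mb."""
    free_mb = shutil.disk_usage(path).free // BYTES_PER_MB
    return free_mb < threshold_mb


if __name__ == "__main__":
    threshold = gb_to_threshold_mb(100)
    print("Registry value (MB):", threshold)
    # "." is a placeholder: point this at a database or log volume instead.
    print("Below threshold:", is_below_threshold(".", threshold))
```

For example, gb_to_threshold_mb(100) returns 102400, and gb_to_threshold_mb(180) (the 180 GB default mentioned above) returns 184320.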

Step 4: Use CollectOverMetrics.ps1 Script

CollectOverMetrics.ps1 is a PowerShell script that collects metrics for the databases in a DAG. It is included with Exchange and located in the Scripts folder of the Exchange installation directory (%ExchangeInstallPath%Scripts).

The script reads the DAG member servers' event logs to gather information about database operations, such as failovers, mounts, and moves. It writes this information to CSV files, with one operation per row. A separate CSV file is created for each DAG member.

To use the script, follow these steps:

  • Navigate to the Scripts folder in File Explorer.
  • Hold the Shift key and right-click an empty area of the folder.
  • Choose Open PowerShell window here.
  • Run the following command to collect the metrics for your DAG (DAG1 in this example):

.\CollectOverMetrics.ps1 -DatabaseAvailabilityGroup DAG1

  • To summarize the collected CSV files for specific databases only, run the following command:

.\CollectOverMetrics.ps1 -SummariseCsvFiles (dir *.csv) -Database MailboxDatabase123,MailboxDatabase456

If the script fails with an error such as "script is not digitally signed," you can bypass the execution policy for the current PowerShell session only by running the following command:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass        
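If you want a quick tally of the operations that CollectOverMetrics.ps1 recorded, for example to see how often each kind of operation occurred, you can also count the CSV rows yourself. The Python sketch below assumes the CSV files are UTF-8 (with or without a byte order mark) and contain columns named ActionCategory and DatabaseName. Both column names are assumptions, so check the header row of your own files and adjust the parameters if they differ.

```python
import csv
import glob
import io
from collections import Counter


def count_operations(csv_texts, column="ActionCategory",
                     databases=None, db_column="DatabaseName"):
    """Count operations (one per CSV row) by the value of column.

    column and db_column are assumed header names. If databases is given,
    only rows whose db_column value is in databases are counted.
    """
    counts = Counter()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if databases is not None and row[db_column] not in databases:
                continue
            counts[row[column]] += 1
    return counts


def read_csv_files(pattern="*.csv"):
    """Read every CSV file matching pattern in the current folder."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        # utf-8-sig strips a byte order mark if one is present.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            texts.append(handle.read())
    return texts


if __name__ == "__main__":
    totals = count_operations(read_csv_files(),
                              databases={"MailboxDatabase123", "MailboxDatabase456"})
    for value, count in totals.most_common():
        print(f"{value}: {count}")
```

Passing databases=None counts operations for every database in the files.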

You can also download and run the Get-DAGHealth.ps1 script (for Exchange 2010/2013 DAGs only). Created by MVPs, this script runs a series of health checks on the DAG and generates a more detailed report. Use the following command to run the script and collect data:

.\Get-DAGHealth.ps1 -Detailed        

Step 5: Analyze Crimson Channel Log Events

Exchange Server writes many of its log events to crimson channels, which are located under Applications and Services Logs in Event Viewer.

Review these logs to monitor the status of the Microsoft Exchange Replication service and its components, such as Active Manager, the Volume Shadow Copy Service (VSS) writer, and the TCP listener.

The steps are as follows:

  • Open Event Viewer.
  • Navigate to Applications and Services Logs > Microsoft > Exchange.
  • Check the following two crimson channels:
    o High Availability: Contains events about the startup and shutdown of the Microsoft Exchange Replication service and its components. It also records database mount operations and log truncation associated with the DAG.
    o MailboxDatabaseFailureItems: Contains events for failures that affect a database copy.
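For a quick overview of these channels, you can export their events (for example, with Get-WinEvent piped to Select-Object Id,LevelDisplayName and Export-Csv -NoTypeInformation) and count errors and warnings by event ID. The Python sketch below assumes those two columns and English level names, because LevelDisplayName is localized on non-English systems. The script file name in the usage comment is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_events(csv_text, levels=("Error", "Warning")):
    """Count exported events by (level, event ID), keeping only the given levels."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        level = row["LevelDisplayName"]
        if level in levels:
            counts[(level, int(row["Id"]))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarize_events.py events.csv
    if len(sys.argv) < 2:
        print("usage: summarize_events.py EXPORTED_EVENTS.csv")
    else:
        with open(sys.argv[1], newline="", encoding="utf-8-sig") as handle:
            summary = summarize_events(handle.read())
        for (level, event_id), count in summary.most_common():
            print(f"{level} {event_id}: {count}")
```

Event IDs that suddenly appear or spike in this summary are a good starting point for reading the full event details in Event Viewer.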

To Wrap Up

By following the steps in this article, you can efficiently monitor your Database Availability Group (DAG) and keep track of member servers, database copies, storage space, and critical Exchange services. However, you should still take regular VSS backups after deploying a DAG. A DAG is not a substitute for backups, because it only protects against database-level failures. If a disaster strikes, you can use the backup to restore mailboxes to a new server. If a DAG member server crashes or the DAG stops working because of a critical failure and no current backup is available, you can also use Stellar Repair for Exchange to recover mailboxes. It can repair the database, recover mailboxes, and restore them to a live Exchange Server or Office 365 tenant.

More articles by Stellar Information Technology Pvt. Ltd.
