Turbo-Charging Replication Re-initializations

I’ve been working with and supporting SQL Server Replication since SQL 2000. It is not a perfect technology, but it does work well when used for its intended purpose: distributed data processing.

Years ago, that meant a central office and remote offices, or field reps with limited connectivity who used Merge replication to sync up the data at night.

The most common use case I see now is a Publisher for transactional activity and a Subscriber for reporting and analytics. Replication is NOT an HA or DR solution.

Re-initializing replication can be very time-consuming

The standard way to start or re-initialize replication is by generating a new snapshot of the publication and letting the replication agents push the data out to the subscriber(s). Depending on the size of the database and other factors, this can take minutes, hours or days.

Here is how I turned a 2-day replication re-initialization for a client into a 1-hour process:

1. Optimize the SQL Instance:

We found that the Publisher and Subscriber were both running Standard Edition SQL Server, unpatched, with default installation settings. We made the basic changes to Max Server Memory, Cost Threshold for Parallelism, and MAXDOP on both instances.
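
For illustration, those changes look something like the following. The numbers here are placeholders, not recommendations; the right values depend on the server's RAM, core count and workload.

-- Illustrative settings only; size these to the actual server.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 28672;        -- leave headroom for the OS
EXEC sp_configure 'cost threshold for parallelism', 50;   -- up from the default of 5
EXEC sp_configure 'max degree of parallelism', 8;         -- cap runaway parallelism
RECONFIGURE;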

2. Optimize tempdb:

TempDB was already split into 8 files, thanks to a current SQL Server version, but those files were on the same drive as the user database data files. We moved them to a new, fast, dedicated drive.
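
Relocating tempdb is just a matter of pointing each file at the new drive and restarting the service. A minimal sketch, assuming a hypothetical dedicated T: drive and the default logical file names:

-- New paths take effect at the next SQL Server service restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   FILENAME = 'T:\TempDB\tempdb_2.ndf');
-- ...repeat for the remaining data files (temp3 through temp8)...
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');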

3. Analyze the published database:

The database came from a 3rd-party vendor and had hundreds of indexes that were unused or very lightly used. These indexes were being replicated, and at the Subscriber they are built only after the snapshot data is delivered and the clustered indexes are created. In some cases, this step did not complete for almost 48 hours.
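
To find the candidates, we leaned on the index usage DMVs. A hedged starting point (usage counters reset when the instance restarts, so make sure the server has been up long enough to trust them):

-- Nonclustered indexes with no reads since the last restart; review before dropping anything.
SELECT OBJECT_NAME(i.object_id)    AS table_name,
       i.name                      AS index_name,
       ISNULL(us.user_updates, 0)  AS writes_maintained
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0
ORDER BY writes_maintained DESC;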

4. Purge unnecessary data:

One table had data in it going back over 5 years, when only 90 days was actually useful to the business. There were also many indexes on this table.
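
Purging before the re-init means the snapshot only has to carry the data the business actually uses. A sketch of a batched purge, using a hypothetical table and date column (dbo.BigTable and CreatedDate are placeholders):

-- Delete anything older than 90 days in small batches to keep the log and replication latency in check.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM dbo.BigTable
    WHERE CreatedDate < DATEADD(DAY, -90, SYSDATETIME());
    SET @rows = @@ROWCOUNT;
END;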

5. Replication Agent Profile settings:

We created a custom profile for the Distribution Agent, which involved repeated tweaking of the batch and command settings. We also learned that, in this case, the SubscriptionStreams parameter would sometimes deadlock on itself.
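
Custom profiles can be built in Replication Monitor or with the msdb procedures on the Distributor. A sketch, assuming a Distribution Agent profile with larger batches; the values are illustrative, not the ones we finally settled on, and the exact parameter name format is worth checking against existing rows in msdb.dbo.MSagent_parameters.

-- Run on the Distributor. Agent type 3 = Distribution Agent, profile type 1 = custom.
DECLARE @profile_id int;
EXEC msdb.dbo.sp_add_agent_profile
     @profile_id   = @profile_id OUTPUT,
     @profile_name = N'ReInit - bigger batches',
     @agent_type   = 3,
     @profile_type = 1,
     @description  = N'Custom Distribution Agent profile for re-initializations';

EXEC msdb.dbo.sp_add_agent_parameter @profile_id, N'CommitBatchSize', N'1000';
EXEC msdb.dbo.sp_add_agent_parameter @profile_id, N'CommitBatchThreshold', N'2000';
-- We ended up avoiding SubscriptionStreams after seeing it deadlock on itself.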

6. The big winner! Backup and Restore instead of Snapshot:

Even with all of these tweaks, we were still seeing up to 24 hours to re-initialize. Trying the Backup and Restore option became our last hope. We had never been able to get it to work properly before, so a lot of testing and googling was involved until we found the right combination of parameters to make it work.
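
For anyone trying the same route, the moving parts are: enable the publication for initialization from a backup, back up the published database, restore it at the Subscriber, then create the subscription pointing at that backup. A sketch with placeholder names and paths (the publication, server, database and file names are all assumptions):

-- On the Publisher: allow initialization from backup.
EXEC sp_changepublication
     @publication = N'MyPublication',
     @property    = N'allow_initialize_from_backup',
     @value       = N'true';

-- After backing up the published database and restoring it at the Subscriber:
EXEC sp_addsubscription
     @publication       = N'MyPublication',
     @subscriber        = N'SUBSCRIBER01',
     @destination_db    = N'MyDatabase',
     @subscription_type = N'Push',
     @sync_type         = N'initialize with backup',
     @backupdevicetype  = N'disk',
     @backupdevicename  = N'\\backupshare\MyDatabase_init.bak';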

The result:

A process that used to paralyze reporting operations for 48 hours now completes in about 60 minutes.

For one client in the car insurance industry, this improvement allowed them to maintain 24/7 operations even during necessary re-initializations.

Bottom Line:

This isn't just about saving time. It's about business continuity.

With faster re-initialization, our clients can maintain data consistency and availability, crucial for industries like banking and insurance where every minute of downtime can cost thousands.


Sweet Deals for our Newsletter Subscribers:

Pocket DBA – first month free for new clients

Fractional DBA – first month 50% off for new clients

We are primarily a SQL Server shop, but we do have a Workday consultant available for up to 10 hours per week!


Recent posts, interesting stuff and SQL tidBITs are in the full version of the newsletter here: https://accidental-dba.beehiiv.com/subscribe


Please share with your favorite Barista. Re-initializing replication at 3am pays better than making coffee!


Siraj Jamdar

Cloud Database SME at Grays

1 month ago

Thanks for sharing some cool insights. I agree that it takes a long time for the snapshot creation and subsequent index creation on the subscriber. (Non-primary key Indexes on the subscriber are different from the Publisher). In my case, I had CI/CD pipelines to reinitialize replication from scratch with standard settings, but the snapshot process had to be monitored carefully. That meant a whole weekend of re-running pipelines if a snapshot failed on any publisher. After I implemented a custom archiving solution, that currently has a retention of 7 years and a separate purging solution with variable retention, the size of the data set has come down to a more manageable level. Now it takes 3-4 hours for the snapshot. Can't remember how long it takes to create the non-primary key indexes but it's no more than a couple of hours.

Randall Brian Chesley

Director of Information Technology at Providence

1 month ago

Business continuity is another use, especially with ESX to move the servers in the background.

Joe Fleming

Your SQL Server Lifeline: Immediate Fixes, Ongoing Optimization | Save Costs & Boost Efficiency | Custom Packages | Veteran Expertise

1 month ago

Some great tips here, just be aware of the drawback of using backups to sync: If you ever want to add another article, you will have to either do a full re-sync or manually sync the article, which can be tricky in a highly concurrent environment. "But why not just build a new publication for that article?" Great question, that can be an option, but if that article has relationships to tables in the original publication, things can get hairy.

All of the items you numbered were known to me except one, that being #5 (I have had my hands in replication off and on since SQL 2000 as well).

Brian Rogers

Building high performing teams and platforms -> unconventional problem solver transforming teams and companies

1 month ago

and distributing snapshots really do work well to jumpstart a target. The initial first sync correlating change tracking can touch a lot of data though.
