Slow SQL Server Replication while using SAN storage

Slow SQL Server Replication while using SAN storage

Thomas_Helm
Enthusiast Enthusiast
1,086 Views
6 Replies
Message 1 of 7

Slow SQL Server Replication while using SAN storage

Thomas_Helm
Enthusiast
Enthusiast

Hello,

 

we migrated from Vault 2014 to 2017 and moved to a new environment. For version 2014 we used SQL Server 2008 on a local hard disk RAID. With version 2017 we moved to a SQL 2014 SP2 cluster using SAN via fibre channel. The SQL replication has a delay of 3 hours between publisher and subscriber. In 2014 there was no delay with the same amount of users.

 

Asking the storage server team I got the answer, that our application send the requests with with a 256K size. The allocation of the SQL disk is formatted to 64K during initial setup of MS SQL, which is "normal" and recommended by Microsoft. From the storage monitor they saw, that with this size nothing can be stored in cache and a full disk read is needed all the time.

 

Does someone know if something changed between version 2014 and 2017. Can we do something on SQL site to set another parameter for replication. I tried already some changes in the merge agent profile but without any luck so far.

 

Cheers

Thomas Helm

TH
0 Likes
Accepted solutions (2)
1,087 Views
6 Replies
Replies (6)
Message 2 of 7

hans_martin_haga
Advocate
Advocate

The performance bottleneck is likely caused by missing optimization/tuning of the SQL server databases. SQL server tries to avoid disk reads at all costs, so if you see high levels of disc activity you should investigate why (using SQL performance monitor data). What is the SQL server cache hit ratio? Full table scans etc?

 

https://knowledge.autodesk.com/search-result/caas/CloudHelp/cloudhelp/2017/ENU/Vault-Admin/files/GUI...

 

Follow this guideline, it should help...

 

 

 

Message 3 of 7

Jason.Courtemanche
Collaborator
Collaborator

Hi @Thomas_Helm,

 

I'm just checking in to see if you need more help with this. Did the suggestion that @hans_martin_haga provided work for you?

If so, please click Accept as Solution on the posts that helped you so others in the community can find them easily.

 

Regards,

Jason

0 Likes
Message 4 of 7

Thomas_Helm
Enthusiast
Enthusiast

Hi Jason,

 

no, because the things described in the document are still implemented. We are using a storage which was setup with 64k. The SQL reads with 256k which results in a disk read as the storage cannot keep things in the cache. This was the answer from the storage team. Before we was using physical disks within the hardware itself and not faced any issues with the replication. My question is now, did we setup the SQL or storage in a wrong way or was there any changes in the replication setup of Vault.

TH
0 Likes
Message 5 of 7

Jason.Courtemanche
Collaborator
Collaborator

Hi @Thomas_Helm,

 

 

Thank you for the update. I am not aware of any changes to SQL Replication that would add 3 hours to the replication time. How did you test this did you check in one file or assembly at the subscriber and wait 3 hours for the data to be replicated to the publisher? 

 

As @hans_martin_haga  mentioned please double check the SQL Servers to make sure that the initial size and autogrowth for the .mdf and .ldf are set to the recommendations in the SQL Maintenance Plan step 9. Where you able to run SQL Performance Monitor?

 

What size is the database that is getting replicated? Are you also replicating read-only databases like the content center library (CCL)?

What is the spec of the servers so I have an idea of your environment? 

 

There are a lot of factors that affect performance, especially when looking at SQL Replication because of all the hardware and steps involved with replicating a SQL table and data from one server to another. Without a baseline, it's hard to know how different the new servers are compared to before upgrading. Also when upgrading the Vault Server from 2014 to 2017 you have to upgrade the Server hardware, OS, SQL Version. The easiest scenario for testing is just switching the local RIAD hard drive for a NAS and not make any other changes to the production server. This way only one thing has changed so you have a control and can run the same test twice. 

 

If the new hard drive was set up for 64K and the old hard drive RAID was set up as 64K both hard drives, SQL should have been writing to the disk in the same way. Do you know what tool the IT group used to capture the request size of 256K from SQL to the hard drive? 

 

Traditional RIAD systems have the same hard drive in the array so access times are constant. SAN can use a hybrid approach like SSD and HHD to move files from one hard drive type to the other depending on how frequent the access is. 

  What RAID level where you using with Vault 2014? 

  How is the SAN configured does it also use RAID or a hybrid approach?

  Is the SAN setup with 1024 KB starting offset, and a 64Kb block size?

  What is the SAN Cache set to?

 

A good read for anyone who wants to know more about Disk partitions alignment and how it can affect SQL servers.

Or if you want to check if your volume was created with Allocation unit size of 64K

 

Regards,

Jason

0 Likes
Message 6 of 7

Thomas_Helm
Enthusiast
Enthusiast
Accepted solution

Hi,

 

seems the issue and large delay is solved. Last weekend we had our monthly server patch day which reboot all server. Maybe some cache was overloaded after our migration and replication setup 4 weeks ago. Lets see if the performance goes down during the next days/weeks until our next patch day. Another point can be, that one of the two cluster nodes have an issue. before the shutdown, we used node 2 and now we are using node 1. I will check this with our IT and let you know what exactly causes the delay.

 

Cheers,

 

TH
0 Likes
Message 7 of 7

Jason.Courtemanche
Collaborator
Collaborator
Accepted solution

Hi @Thomas_Helm,

 

Thank you for the update. I am glad that installing windows updates and rebooting all of the servers resolved the slow SQL replication issue. 

That is interesting that there was an issue with the SQL Cluster on shutdown, please keep me update if the performance issue comes back. 

 

I will mark your post and mine as the solution. We can update this when there are more details.

 

Anyone looking for more information about configuring the Vault Server to use SQL Failover Cluster (WSFC).  

 

Regards,

Jason

 

 

If one or more of these posts helped answer your question, please click Accept as Solution on the posts that helped you so others in the community can find them easily.

0 Likes