I was hoping to get a response to my latest email, so I am repeating it here to ensure it doesn't get lost.
Again, thank you for the timely response. Our package Replication - Azure QDB01 - Temporal Store - Salesforce Objects – 03 includes the replication of the Salesforce CASE SObject to an Azure SQL store. We currently have 1.7 million records in the SQL database; the query “select Id from [Case] where SystemModStamp < '2019-10-10 15:00:11'” returns 1,701,489 IDs, each of which is a char(18). Let’s assume that in a single replication period exactly 1 CASE record changes in SFDC. If I’m following your description, then I think you:
- Go to the SFDC source and request all records changed since the last sync date/time, returning 1 record.
- Go to the Azure SQL target and pull all 1.7 million IDs.
- Look for the 1 required ID among the 1.7 million to determine whether to execute an insert or an update.
- Apply the update transaction against Azure SQL.
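To make sure I'm describing the same thing you are, here is a minimal in-memory sketch of the cycle above. Source and Target are stand-ins for SFDC and the Azure SQL store; every class and method name here is illustrative, not your product's actual API.

```python
# In-memory sketch of the per-cycle flow described in the steps above.
# All names (Source, Target, fetch_changed_since, ...) are hypothetical.

class Source:
    """Stand-in for the SFDC source."""
    def __init__(self, records):
        self.records = records  # dicts with Id and SystemModStamp

    def fetch_changed_since(self, ts):
        # Step 1: only records modified since the last sync (often 1 record).
        return [r for r in self.records if r["SystemModStamp"] > ts]

class Target:
    """Stand-in for the Azure SQL target table."""
    def __init__(self, rows):
        self.rows = {r["Id"]: r for r in rows}
        self.inserts = self.updates = 0

    def fetch_all_ids(self):
        # Step 2: every Id in the target -- 1.7 million in our case.
        return list(self.rows)

    def upsert(self, rec, existing_ids):
        # Steps 3-4: membership test decides insert vs. update, then apply.
        if rec["Id"] in existing_ids:
            self.updates += 1
        else:
            self.inserts += 1
        self.rows[rec["Id"]] = rec

def sync_cycle(source, target, last_sync):
    changed = source.fetch_changed_since(last_sync)
    existing = set(target.fetch_all_ids())  # cost grows with target size
    for rec in changed:
        target.upsert(rec, existing)
    return len(changed)
```

The point of the sketch is the `fetch_all_ids` call: its cost is proportional to the target table, not to the number of changes.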
I agree this will work, but it will use a lot of CPU, I/O, and network resources. The alternative is to restrict the query to only the IDs required. This is more complex, but it has significant performance advantages; in fact, one approach is to load the changes into a staging table and then do a set-level insert/update. I'm happy to discuss further.

I also understand that you are working within the limitations of your product architecture, and this may simply not be a sweet spot for the product. I would be interested to hear of any testing that mimics the example I provided, perhaps increasing the target table size by a couple of orders of magnitude to really prove the point. If 1 record changes in the source and the target holds 100 million records, could your current algorithm still work?
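For contrast, here is a sketch of the restricted-lookup alternative I'm suggesting: probe only the changed IDs rather than pulling every target ID. In SQL terms this is roughly "select Id from [Case] where Id in (...)" (or a staging table plus a set-level merge); the dict below simulates the keyed lookup. Again, all names are illustrative.

```python
# Hypothetical sketch of the restricted-lookup alternative.
# target_rows is a dict keyed by Id, standing in for the Azure SQL table.

def sync_cycle_targeted(changed, target_rows):
    inserts = updates = 0
    # Probe only the changed Ids -- cost is O(changes), not O(target size).
    for rec in changed:
        if rec["Id"] in target_rows:  # "where Id in (...)" equivalent
            updates += 1
        else:
            inserts += 1
        target_rows[rec["Id"]] = rec  # apply the insert/update
    return inserts, updates
```

With 1 changed record against a 100-million-row target, this touches one row instead of pulling 100 million IDs across the network.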