select id query causing performance issue

API User Reporting posted this 02 October 2019

I've noticed substantial performance issues around some of our larger tables that I believe may be coming from Skyvia.  There are queries of the form:

select id from <tableName> where LastModifiedDate < @0

This could return many millions of records, which doesn't seem ideal. Is this part of your approach to matching records for replication? If so, I might suggest refining the algorithm. If not, please let me know so I can see if this is coming from a different source.

 

Happy to discuss or show examples if that would be helpful.

 

Thanks,

Mort

 

Mariia Zaharova posted this 02 October 2019

Hello Mort,

 

We have contacted you by email.

 

Best regards,

Mariia

API User Reporting posted this 09 October 2019

I was hoping to get a response to my latest email. I am repeating it here to ensure it doesn't get lost.

 

Mariia:

 

Again, thank you for the timely response. Our package, Replication - Azure QDB01 - Temporal Store - Salesforce Objects – 03, includes the replication of the Salesforce CASE SObject to an Azure SQL store. We currently have 1.7 million records in the SQL database. The query "select Id from [Case] where SystemModStamp < '2019-10-10 15:00:11'" will in fact return 1,701,489 IDs, each of which is char(18). Let's assume that in a single replication period there is 1 CASE record changed in SFDC. If I'm following your description, then I think you:

 

  1. Go to the SFDC source and request all records that have changed since the last sync date/time, returning 1 record.
  2. Go to the Azure SQL target and pull all 1.7 million IDs (roughly the query sketched below).
  3. Look for the 1 required ID among the 1.7 million to determine whether you need to execute an insert or an update.
  4. Apply the update transaction against Azure SQL.
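
To make the cost concrete, here is roughly what I believe step 2 executes against our Azure SQL target, alongside the targeted lookup I would expect instead. The literal ID below is made up for illustration:

  -- Step 2 as I understand it: pull every ID older than the sync point,
  -- roughly 1.7 million rows here, no matter how many source records changed.
  select Id from [Case] where SystemModStamp < '2019-10-10 15:00:11'

  -- A targeted alternative: check only the IDs returned by step 1
  -- (a single record in this example).
  select Id from [Case] where Id in ('500xx000001AbCdAAG')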

 

I agree this will work, but it will use a lot of CPU, I/O, and network resources. The alternative is to restrict the query to only the IDs required. This is more complex, but it has significant performance advantages. In fact, one approach is to load the changes into a staging table and then do a set-level insert/update. I'm happy to discuss further. I also understand that you are working within the limitations of your product architecture, and this may just not be a sweet spot for the product. I would be interested to hear of any testing that mimics the example I provided, perhaps increasing the target table size by a couple of orders of magnitude to really prove the point. If 1 record changes in the source and the target holds 100 million records, could your current algorithm still work?
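
For example, here is a rough sketch of the staging-table pattern I have in mind. The table and column names are purely illustrative, not a claim about how Skyvia works internally:

  -- 1. Land only the changed rows from SFDC in a staging table
  --    (a real Case table has many more columns).
  create table #CaseStaging (
      Id char(18) not null primary key,
      Subject nvarchar(255) null,
      SystemModStamp datetime2 not null
  );

  -- ...bulk insert the changed records from step 1 into #CaseStaging...

  -- 2. One set-level insert/update that touches only the staged IDs.
  merge [Case] as tgt
  using #CaseStaging as src
      on tgt.Id = src.Id
  when matched then
      update set tgt.Subject = src.Subject,
                 tgt.SystemModStamp = src.SystemModStamp
  when not matched then
      insert (Id, Subject, SystemModStamp)
      values (src.Id, src.Subject, src.SystemModStamp);

With this shape the work scales with the number of changed rows, not with the size of the target table.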

 

Thanks,

Mort

Mariia Zaharova posted this 10 October 2019

Hello Mort,

 

We are sorry for the delay. Your request is now being reviewed by our project manager and developers. As soon as I have any results, I will contact you by email.

 

Best regards,

Mariia
