DynamoDB Parallel Scans
A standard DynamoDB scan operation reads a table sequentially. For large tables, this can cause the scan to take a long time to complete. AWS DynamoDB also provides a parallel scan operation that allows multiple threads or workers to scan different segments of a table simultaneously. This can improve scan performance.
The two key inputs to the parallel scan process are the number of threads or workers to use, and the maximum number
of rows each worker should return. RazorSQL supports DynamoDB parallel scans using an SQL hint that is included in the
select query against the DynamoDB table. Below is an example. This page also has more information on the syntax:
https://razorsql.com/docs/dynamodb_sql_support.html
SQL Select Example:
select /*parallel:10:1000*/ name, id from employee where salary > 30000;
In the above query, the number of threads or workers to use for the scan is 10, and the maximum number of rows to return for each thread or worker is 1000. When RazorSQL executes the query, 10 threads scan the DynamoDB table simultaneously, and each thread retrieves up to 1000 rows, so the query returns at most 10,000 rows. Once every thread has either retrieved its maximum number of rows or reached the end of its segment, the results are returned to the user in the RazorSQL query results section.
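The behavior described above can be sketched in Python. This is a minimal sketch under stated assumptions, not RazorSQL's actual implementation: the function names parse_parallel_hint, scan_segment, and parallel_scan are hypothetical, the where clause filter is omitted, and client is assumed to be a boto3 DynamoDB client or any object with a compatible scan method. Segment and TotalSegments are the DynamoDB Scan API parameters that divide a table among workers.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches a hint such as /*parallel:10:1000*/ (workers, then max rows per worker).
HINT_RE = re.compile(r"/\*\s*parallel:(\d+):(\d+)\s*\*/")


def parse_parallel_hint(query):
    """Return (workers, max_rows_per_worker), or None if the query has no hint."""
    match = HINT_RE.search(query)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def scan_segment(client, table_name, segment, total_segments, max_rows):
    """Scan one segment, paging until max_rows is reached or the segment ends."""
    items = []
    start_key = None
    while len(items) < max_rows:
        kwargs = {
            "TableName": table_name,
            "Segment": segment,                # which slice this worker reads
            "TotalSegments": total_segments,   # how many slices exist in total
            "Limit": max_rows - len(items),    # never ask for more than needed
        }
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        response = client.scan(**kwargs)
        items.extend(response.get("Items", []))
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no more pages: end of this segment
            break
    return items[:max_rows]


def parallel_scan(client, table_name, workers, max_rows):
    """Run one scan_segment call per worker thread and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, workers, max_rows)
            for seg in range(workers)
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

With boto3 installed and AWS credentials configured, this could be called as parallel_scan(boto3.client("dynamodb"), "employee", 10, 1000), which mirrors the hint values in the example query.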
As with all operations on the AWS platform, especially for databases like DynamoDB and SimpleDB, care should be taken with regard to how much provisioned throughput is made available for a table. Scans in general, and parallel scans in particular, can consume all of a table's provisioned throughput. If the provisioned throughput is too low, requests can be throttled, which results in performance issues. If the provisioned throughput is too high, this can result in unexpected monetary charges from AWS.
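To get a feel for how quickly a scan can use up read capacity, the rough arithmetic below may help. It is a sketch, not a billing calculator: the function name estimate_scan_rcu is hypothetical, and it assumes the standard DynamoDB rule that one read capacity unit covers a strongly consistent read of up to 4 KB, with eventually consistent reads (the scan default) costing half as much. Capacity is charged on the data read, not on the rows returned after filtering.

```python
import math


def estimate_scan_rcu(bytes_read, strongly_consistent=False):
    """Rough read capacity units consumed by scanning bytes_read bytes."""
    units = math.ceil(bytes_read / 4096)  # one unit per 4 KB read, rounded up
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return units if strongly_consistent else units / 2


# One full 1 MB scan page with the default eventually consistent reads:
print(estimate_scan_rcu(1024 * 1024))  # 128.0
```

Under these assumptions, 10 workers that each read a full 1 MB page per second would consume roughly 1280 read capacity units per second.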