Rizvi's Blog: Scan Vs Parallel Scan in AWS DynamoDB and Avoid Sudden Bursts of Read Activity

Wednesday, December 21, 2016

Scan Vs Parallel Scan in AWS DynamoDB and Avoid Sudden Bursts of Read Activity

1. How does Scan work in AWS DynamoDB?

Ans:
i) A Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
ii) By default, Scan operations proceed sequentially.
iii) By default, Scan uses eventually consistent reads when accessing the data in a table.
iv) If the total size of the scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and the results so far are returned to the user together with a LastEvaluatedKey value, which is used to continue the scan in a subsequent operation.
v) Because a Scan performs eventually consistent reads by default and can return up to 1 MB (one page) of data, a single Scan request can consume:

(1 MB page size / 4 KB item size) / 2 (eventually consistent reads) = 128 read operations.
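The paging behaviour and the arithmetic above can be sketched in a few lines of Python. This is only an illustration: `scan_all` and `read_units_per_page` are hypothetical helper names, and `scan` is assumed to be any callable that behaves like the DynamoDB Scan API (for example the `scan` method of a boto3 DynamoDB client, which is not something the sources below use).

```python
def read_units_per_page(page_kb=1024, unit_kb=4, eventually_consistent=True):
    """Read operations consumed by one full page, per the formula above."""
    units = page_kb / unit_kb  # 1 MB page / 4 KB per read = 256
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return units / 2 if eventually_consistent else units


def scan_all(scan, **params):
    """Yield every item, following LastEvaluatedKey from page to page.

    `scan` is assumed to accept Scan-style keyword arguments and to return
    a dict with "Items" and, while more data remains, "LastEvaluatedKey".
    """
    params = dict(params)
    while True:
        page = scan(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no key returned: the scan has reached the end
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key
```

A call such as `list(scan_all(client.scan, TableName="MyTable"))` would then walk the whole table one page at a time; `client` and `"MyTable"` are placeholders.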

2. How does parallel Scan work in AWS DynamoDB?

Ans:
i) For faster performance on a large table or secondary index, applications can request a parallel Scan operation.
ii) You can run multiple worker threads or processes in parallel. Each worker scans a separate segment of the table concurrently with the other workers. To support this, DynamoDB’s Scan operation accepts two additional parameters:

TotalSegments denotes the total number of segments the table is divided into, which is typically the number of workers that will access the table concurrently.
Segment denotes the segment of the table to be accessed by the calling worker (a zero-based index, from 0 to TotalSegments - 1).

iii) The two parameters, when used together, limit the scan to a particular block of items in the table. You can also use the existing Limit parameter to control how much data is returned by an individual Scan request.
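A minimal sketch of the worker pattern described above, using a thread pool with one worker per segment. The function names are hypothetical, and `scan` is again assumed to be a callable with Scan-style keyword arguments (such as a boto3 client's `scan` method) that is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan, segment, total_segments, **params):
    """Read one segment to the end, following LastEvaluatedKey."""
    params = dict(params, Segment=segment, TotalSegments=total_segments)
    items = []
    while True:
        page = scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # this segment is exhausted
        params["ExclusiveStartKey"] = last_key


def parallel_scan(scan, total_segments, **params):
    """Scan all segments concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan, seg, total_segments, **params)
            for seg in range(total_segments)  # segments are numbered from 0
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

Every worker must pass the same TotalSegments value and its own Segment number. The merged result has no particular order, because the segments finish independently.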

3. Scan vs Parallel Scan in AWS DynamoDB?

Ans:
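i) A sequential Scan can only read one partition at a time. A parallel scan lets several workers read from multiple partitions at the same time, which makes large scans faster.
ii) A sequential Scan might not always be able to fully utilize the provisioned read throughput capacity. A parallel scan can put that unused capacity to work.
iii) The phrase "Parallel Scans, 4x Cheaper Reads" comes from the AWS announcement in link 3 below and should not be read as a property of parallel scans: a parallel scan reads the same data and consumes the same total read capacity as a sequential scan, only in less time.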

4. When should a Parallel Scan be preferred?

Ans:
A parallel scan can be the right choice if the following conditions are met:

The table size is 20 GB or larger.
The table's provisioned read throughput is not being fully utilized.
Sequential Scan operations are too slow.

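5. Is a filter expression applied before the scan?

Ans: No. A FilterExpression is applied after the items have already been read. Filtering does not consume any additional read capacity units, but it does not save any either: a Scan consumes the same read capacity whether or not a filter expression is present.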
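The effect of filtering after the read can be illustrated with a small in-memory model. This is a simplified simulation, not a call to DynamoDB: `simulate_filtered_scan` and the `read_units` key are hypothetical, while `Count` and `ScannedCount` mirror the fields of the same name in a Scan response. The model assumes eventually consistent reads and fixed-size items.

```python
import math


def simulate_filtered_scan(items, item_size_kb, predicate):
    """Model one Scan page: capacity is charged on what is read, not returned."""
    scanned_kb = len(items) * item_size_kb
    # Assumption: 4 KB per read, halved for eventually consistent reads.
    read_units = math.ceil(scanned_kb / 4) / 2
    # The filter runs only after every item has been read.
    returned = [item for item in items if predicate(item)]
    return {
        "Items": returned,
        "Count": len(returned),      # items left after filtering
        "ScannedCount": len(items),  # items read before filtering
        "read_units": read_units,    # hypothetical field for this model
    }
```

Running it with a selective filter and with a filter that matches everything gives different `Count` values but the same `ScannedCount` and `read_units`, which is the point made in the answer above.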

1. http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
2. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScanParallelScan
3. https://aws.amazon.com/blogs/aws/amazon-dynamodb-parallel-scans-and-other-good-news/
4. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html#QueryAndScanGuidelines.BurstsOfActivity
5. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html#DocumentAPIJavaParallelScanExample

Another blog link

https://egkatzioura.wordpress.com/2016/10/03/scan-dynamodb-items-with-dynamodbmapper/
