Thursday, December 29, 2016

Exception

[19:00:47] DEBUG impl.execchain.MainClientExec  - Connection can be kept alive for 60000 MILLISECONDS
[19:00:47] DEBUG impl.conn.PoolingHttpClientConnectionManager  - Connection [id: 0][route: {s}->https://dynamodb.ap-northeast-1.amazonaws.com:443] can be kept alive for 60.0 seconds
[19:00:47] DEBUG impl.conn.PoolingHttpClientConnectionManager  - Connection released: [id: 0][route: {s}->https://dynamodb.ap-northeast-1.amazonaws.com:443][total kept alive: 1; route allocated: 1 of 50; total allocated: 1 of 50]
Exception in thread "main" java.lang.UnsupportedOperationException: ListIterators are not supported for this list
    at com.amazonaws.services.dynamodbv2.datamodeling.PaginatedList.listIterator(PaginatedList.java:459)
    at java.util.List.sort(List.java:479)
    at java.util.Collections.sort(Collections.java:175)
    at me.otta.backend.repository.datastore.LocationHistoryRepository.getLocationHistoryData(LocationHistoryRepository.java:124)
    at me.otta.backend.main.Application.main(Application.java:185)
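
The root cause: DynamoDBMapper's query and scan methods return a lazily-loaded PaginatedList, whose listIterator() is unsupported, and java.util.List.sort (which Collections.sort delegates to) needs one. A common workaround is to copy the results into a plain ArrayList before sorting. A minimal sketch, where LocationHistory (an annotated mapper class with a getTimestamp getter) is a stand-in for the application's own code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBScanExpression;

DynamoDBMapper mapper = new DynamoDBMapper(AmazonDynamoDBClientBuilder.defaultClient());

// mapper.scan(...) returns a lazily-loaded PaginatedList; sorting it in place fails
List<LocationHistory> results = mapper.scan(LocationHistory.class, new DynamoDBScanExpression());

// Copying into an ArrayList forces the remaining pages to load and gives a sortable copy
List<LocationHistory> sortable = new ArrayList<>(results);
Collections.sort(sortable, Comparator.comparing(LocationHistory::getTimestamp));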

DynamoDB indexing

What about indexes? Well, in an RDBMS you generally index any field or combination of fields you'll be querying on (though this depends on the RDBMS you use and how exactly you want to index). In DynamoDB, there are a number of limitations on "local secondary indexes" (LSIs). First and foremost, you can only have LSIs if the table has a Hash and Range primary key. Second, LSIs are single-attribute only (DynamoDB calls them "attributes"; an RDBMS would call them columns). Third, you can only have 5 LSIs per table. Lastly, you cannot add, modify, or remove an LSI after you've created the table. Since this is NoSQL, all attributes beyond your primary key are flexible – but if you want to use LSIs, plan carefully.
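
To make "plan carefully" concrete, here is a sketch of declaring an LSI at table-creation time with the AWS SDK for Java (table, attribute, and index names are invented for illustration). Note the LSI reuses the table's hash key and names exactly one extra attribute as its range key:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

client.createTable(new CreateTableRequest()
    .withTableName("LocationHistory")
    .withAttributeDefinitions(
        new AttributeDefinition("userId", ScalarAttributeType.S),
        new AttributeDefinition("timestamp", ScalarAttributeType.N),
        new AttributeDefinition("city", ScalarAttributeType.S))
    .withKeySchema(
        new KeySchemaElement("userId", KeyType.HASH),      // hash key
        new KeySchemaElement("timestamp", KeyType.RANGE))  // range key (a prerequisite for LSIs)
    .withLocalSecondaryIndexes(new LocalSecondaryIndex()
        .withIndexName("city-index")
        .withKeySchema(
            new KeySchemaElement("userId", KeyType.HASH),  // must match the table's hash key
            new KeySchemaElement("city", KeyType.RANGE))   // the single indexed attribute
        .withProjection(new Projection().withProjectionType(ProjectionType.ALL)))
    .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)));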

Resource Link:

Wrapping my brain around DynamoDB

Friday, December 23, 2016

DynamoDB pros and cons


DynamoDB is awesome, but…

So. A few friends and I got together a while ago to build an app on the newly launched Mxit APIs. We didn't plan on making it big, just having some fun. So we also decided to learn new stuff across the board. I've worked with Heroku in the past, but never EC2. So we got a completely new stack going: EC2, Node.js and DynamoDB.
The first thing we are running into that isn’t panning out as we had hoped is DynamoDB.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
You pay for throughput. Amazon does the rest. They replicate the data, manage it, and let it scale automatically. If you need more throughput (reads/writes per second), you simply dial it up. This in itself is quite awesome and clever.
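
"Dialing it up" is a single UpdateTable call. A sketch with the AWS SDK for Java (the original post used Node.js; the table name and capacity numbers here are invented):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

// Raise the table to 100 provisioned reads/sec and 50 writes/sec
client.updateTable(new UpdateTableRequest()
    .withTableName("Messages")
    .withProvisionedThroughput(new ProvisionedThroughput(100L, 50L)));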

However, we have run into some problems that are a pain to iron out. Hopefully, by telling someone about them, they can be avoided.

There are 4 ways to fetch items from DynamoDB: Get, BatchGet, Query and Scan.
  • For Get and Query, you can only use the hash key (think of the primary key in SQL databases) and the range key. 
  • If you want to fetch items based on any other values, you have to use a Scan and add your own conditions (here you can use anything, not just the hash and range key). The problem is: Scan is expensive. It reads the entire table and then fetches what is relevant (see the sketch after this list).
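
The sketch, using the AWS SDK for Java's legacy condition API (table and attribute names invented): the Query touches only the items under one hash key, while the Scan reads the whole table and filters afterwards.

import java.util.Collections;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

// Query: keyed lookup; only the matching items are read (and paid for)
QueryResult queried = client.query(new QueryRequest()
    .withTableName("Messages")
    .withKeyConditions(Collections.singletonMap("userId", new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withS("user-123")))));

// Scan: reads every item in the table, then applies the filter
ScanResult scanned = client.scan(new ScanRequest()
    .withTableName("Messages")
    .withScanFilter(Collections.singletonMap("status", new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withS("unread")))));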

Cool. For now, we aren’t worried that much about performance. We are just playing around after all. WRONG. You can’t sort on Scans! FFFuuu.

You can only fetch sorted results based on the range key you specified for the table (i.e. one column), using a Query call. Not really knowing we had to do that (our own stupidity) means not one of our tables has a range key. You can't retroactively add one: the only way to assign a new range key is to create a new table and port over all the data. So now, wanting to fetch any kind of sorted data, we have to put aside a whole afternoon to code up the table migration, etc. Not ideal.

Chris Moyer has a solution to this: create a new table whose hash and range key reference the hashes of the parent table. Now you can 'sort' on any column. The problem is that you have to open up new tables and provision throughput for them too, which costs more money.

We have to do something about it soon, because we have in any case been hitting throttled requests (probably due to all the scans).

So, we have to decide now: are we sticking with DynamoDB, or moving to our own MongoDB cluster? I think for our needs right now, DynamoDB is causing more headaches than not. We haven't hit the mass scale yet where DynamoDB's provisioning model will help us, and we don't want to be restricted by how DynamoDB handles the data.

So if you want to use DynamoDB, remember that!

Resource Link: http://simondlr.com/post/26360955465/dynamodb-is-awesome-but

DynamoDB Query Filtering

Query Filtering

DynamoDB’s Query function retrieves items using a primary key or an index key from a Local or Global Secondary Index. Each query can use Boolean comparison operators to control which items will be returned.
With today’s release, we are extending this model with support for query filtering on non-key attributes. You can now include a QueryFilter as part of a call to the Query function. The filter is applied after the key-based retrieval and before the results are returned to you. Filtering in this manner can reduce the amount of data returned to your application while also simplifying and streamlining your code.
The QueryFilter that you pass to the Query API must include one or more conditions. Each condition references an attribute name and includes one or more attribute values, along with a comparison operator. In addition to the usual Boolean comparison operators, you can also use CONTAINS, NOT_CONTAINS, and BEGINS_WITH for string matching, BETWEEN for range checking, and IN to check for membership in a set.
In addition to the QueryFilter, you can also supply a ConditionalOperator. This logical operator (either AND or OR) is used to connect each of the elements in the QueryFilter.
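
Putting those pieces together, a sketch with the AWS SDK for Java (table and attribute names invented): one key condition, then two non-key filter conditions joined with AND.

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

// Key condition: the usual key-based retrieval
Map<String, Condition> keyConditions = new HashMap<>();
keyConditions.put("userId", new Condition()
    .withComparisonOperator(ComparisonOperator.EQ)
    .withAttributeValueList(new AttributeValue().withS("user-123")));

// QueryFilter: non-key conditions, applied before the results are returned
Map<String, Condition> queryFilter = new HashMap<>();
queryFilter.put("status", new Condition()
    .withComparisonOperator(ComparisonOperator.EQ)
    .withAttributeValueList(new AttributeValue().withS("active")));
queryFilter.put("city", new Condition()
    .withComparisonOperator(ComparisonOperator.BEGINS_WITH)
    .withAttributeValueList(new AttributeValue().withS("Tok")));

QueryResult result = client.query(new QueryRequest()
    .withTableName("LocationHistory")
    .withKeyConditions(keyConditions)
    .withQueryFilter(queryFilter)
    .withConditionalOperator(ConditionalOperator.AND)); // connects the filter conditions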

Resource Link:

https://aws.amazon.com/blogs/aws/improved-queries-and-updates-for-dynamodb/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+AmazonWebServicesBlog+%28Amazon+Web+Services+Blog%29


Thursday, December 22, 2016

40 Interview Questions You Should Be Prepared to Ask & Answer

When is Query better than Scan in AWS DynamoDB?

Last week, Amazon announced the launch of a new product, DynamoDB. Within the same day, Mitch Garnaat quickly released support for DynamoDB in Boto. I quickly worked with Mitch to add on some additional features, and work out some of the more interesting quirks that DynamoDB has, such as the provisioned throughput, and what exactly it means to read and write to the database.

One very interesting and confusing part that I discovered was how Amazon actually measures this provisioned throughput. When creating a table (or at any time in the future), you set up a provisioned amount of "Read" and "Write" units individually. At a minimum, you must have at least 5 Read and 5 Write units provisioned. What isn't as clear, however, is that read and write units are measured in terms of 1KB operations. That is, if you're reading a single value that's 5KB, that counts as 5 Read units (same with Write). If you choose to operate in eventually consistent mode, you're charged for half of a read operation, so you can essentially get double your provisioned read throughput if you're willing to put up with eventually consistent reads.

OK, so read operations are essentially just look-up operations. This is a database after all, so we're probably not just going to be looking up items we already know, right?

Wrong.

Amazon does offer a "Scan" operation, but they state that it is very "expensive". This isn't just in terms of speed, but also in terms of provisioned throughput. A scan operation iterates over every item in the table, then filters the returned results based on some very crude filtering options that are not full SQL-like (nothing close to what SDB or any relational database offers). What's worse, a single Scan operation can read up to 1MB of data at a time. Since Scan operates only in eventually consistent mode, that means it can use up to 500 Read units in a single operation (1,000KB of 1KB items / 2 for eventually consistent reads = 500). If you have 5 provisioned Read units per second, that means you're going to have to wait 100 seconds (almost 2 minutes) before you can perform another Read operation of any sort.

So, if you have 1 million 1KB records in your table, that's approximately 1,000 Scan operations to perform. Assuming you provisioned 1,000 Read operations per second, that's roughly 17 minutes to iterate through the entire database. Now yes, you could easily increase your read capacity to cut that time down significantly, but let's assume that at a minimum it takes at least 10ms for a single scan operation. That still means the fastest you could get through your meager 1 million records is 10 seconds. Now extend that out to a billion records. Scan just isn't effective.


So what's the alternative? Well, there's this other rather obscure ability DynamoDB has: you may set your primary key to a Hash and Range key. You always need to provide the Hash key, but with the Query operation you may also constrain the Range key as greater than, less than, equal to, greater than or equal to, less than or equal to, between, or begins-with.

Unlike Scan, Query only operates on matching records, not all records. This means that you only pay for the throughput of the items that match, not for everything scanned.

So how do you effectively use this operation? Simply put, you have to build your own special indexes. This lends itself to the concept of "Ghost Records", which simply point back to the original record, letting you keep a separate index of the original for specific attributes. Let's assume we're dealing with a record representing a Person. This Person may have several things that identify it, but let's use a unique identifier as the Hash key, with no Range key. Then we'll create several separate ghost records in a different table; let's call this table "PersonIndex".


Now if we want to search for someone by their first name, we simply issue a Query with a hash key of property = "First Name", and a range key of the first name we're looking for, or even "begins with" to match things like "Sam" against "Samuel". We can also insert "alias" records, for things like "Dick" to match "Richard". Once we retrieve the index record, we can use the "Stories" property to go back and retrieve the Person records.
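
A sketch of that index lookup with the AWS SDK for Java; the PersonIndex attribute names "property" and "value" are guesses at the scheme described (hash key = which attribute is indexed, range key = the indexed value):

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

Map<String, Condition> keyConditions = new HashMap<>();
keyConditions.put("property", new Condition()
    .withComparisonOperator(ComparisonOperator.EQ)
    .withAttributeValueList(new AttributeValue().withS("First Name")));
keyConditions.put("value", new Condition()
    .withComparisonOperator(ComparisonOperator.BEGINS_WITH)
    .withAttributeValueList(new AttributeValue().withS("Sam"))); // matches "Sam", "Samuel", ...

QueryResult result = client.query(new QueryRequest()
    .withTableName("PersonIndex")
    .withKeyConditions(keyConditions));
// Each ghost record points back to a Person hash key; fetch those with Get/BatchGet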

So now a search takes us 1 Read operation, plus 1 Read operation for each matching record, which is a heck of a lot cheaper than one million! The only negative is that you also have to maintain this secondary table of indexes; keeping them up to date is the hardest part of maintaining your own separate indexes. However, if you can do this, you can search and return records within milliseconds instead of seconds, or even minutes.


How are you using or planning to use Amazon DynamoDB?


Resource Link: http://blog.coredumped.org/2012/01/amazon-dynamodb.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ChrisMoyer+%28Chris+Moyer%29

  1. http://dynamo.jcabi.com/example-query-scan.html

Wednesday, December 21, 2016

Faisal Vai's Blog

Scan Vs Parallel Scan in AWS DynamoDB and Avoid Sudden Bursts of Read Activity

1. How scan works in AWS DynamoDB?
Ans:
i) Scan operation returns one or more items.
ii) By default, Scan operations proceed sequentially.
iii) By default, Scan uses eventually consistent reads when accessing the data in a table.
iv) If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and the results are returned to the user, along with a LastEvaluatedKey value that can be used to continue the scan in a subsequent operation.
v) A Scan operation performs eventually consistent reads by default, and it can return up to 1 MB (one page) of data. Therefore, a single Scan request can consume
(1 MB page size / 4 KB item size) / 2 (eventually consistent reads) = 128 read operations.
2. How parallel scan works in AWS DynamoDB?
Ans:
i) For faster performance on a large table or secondary index, applications can request a parallel Scan operation.
ii) You can run multiple worker threads or processes in parallel. Each worker will be able to scan a separate segment of a table concurrently with the other workers. DynamoDB’s Scan function now accepts two additional parameters:
  • TotalSegments denotes the number of workers that will access the table concurrently.
  • Segment denotes the segment of the table to be accessed by the calling worker.
iii) The two parameters, when used together, limit the scan to a particular block of items in the table. You can also use the existing Limit parameter to control how much data is returned by an individual Scan request. A worker sketch follows below.
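
Here is that sketch, with the AWS SDK for Java (table name and segment count invented): every worker passes the same TotalSegments value plus its own Segment, and pages through its slice via LastEvaluatedKey.

import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

final AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
final int totalSegments = 4; // one segment per worker thread

for (int i = 0; i < totalSegments; i++) {
    final int segment = i;
    new Thread(() -> {
        Map<String, AttributeValue> lastKey = null;
        do {
            ScanResult result = client.scan(new ScanRequest()
                .withTableName("LocationHistory")
                .withTotalSegments(totalSegments)  // how many workers in total
                .withSegment(segment)              // which slice this worker reads
                .withExclusiveStartKey(lastKey));  // resume where the last page ended
            // ... process result.getItems() here ...
            lastKey = result.getLastEvaluatedKey();
        } while (lastKey != null);
    }).start();
}
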
3. Scan vs Parallel Scan in AWS DynamoDB?
Ans:
i) A Scan operation can only read one partition at a time, so a parallel scan is needed for faster reads across multiple partitions at a time.
ii) A sequential Scan might not always be able to fully utilize the provisioned read throughput capacity, so a parallel scan helps there.
iii) Parallel Scan was announced alongside 4x cheaper reads (see the AWS blog post linked below).
4. When Parallel Scan will be preferred?
Ans:
A parallel scan can be the right choice if the following conditions are met:
  • The table size is 20 GB or larger.
  • The table's provisioned read throughput is not being fully utilized.
  • Sequential Scan operations are too slow.
5. Is a filter expression applied before the scan?
Ans: No. A FilterExpression is applied after the items have already been read; the filtering process does not consume any additional read capacity units.
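
One way to observe this is to ask for consumed capacity; a sketch (table and attribute names invented) where the reported capacity reflects the items read, not the smaller filtered set returned:

import java.util.Collections;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

ScanResult result = client.scan(new ScanRequest()
    .withTableName("LocationHistory")
    .withFilterExpression("city = :c")
    .withExpressionAttributeValues(
        Collections.singletonMap(":c", new AttributeValue().withS("Dhaka")))
    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL));

// Capacity is consumed for everything scanned, however few items come back
System.out.println(result.getConsumedCapacity().getCapacityUnits() + " RCUs, "
    + result.getCount() + " returned of " + result.getScannedCount() + " scanned");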

1. http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
2. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScanParallelScan
3. https://aws.amazon.com/blogs/aws/amazon-dynamodb-parallel-scans-and-other-good-news/
4. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html#QueryAndScanGuidelines.BurstsOfActivity
5. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html#DocumentAPIJavaParallelScanExample

Another blog link
  1. https://egkatzioura.wordpress.com/2016/10/03/scan-dynamodb-items-with-dynamodbmapper/

Query and Scan: Find replies to a forum thread posted in the last 15 days using DynamoDBMapper
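
That is the classic example from the AWS developer guide; a sketch with DynamoDBMapper, assuming a Reply class annotated for the Reply table with Id as hash key and ReplyDateTime as range key:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
DynamoDBMapper mapper = new DynamoDBMapper(client);

// Cut-off timestamp: 15 days ago, in the ISO-8601 string format the table stores
SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
String cutoff = fmt.format(new Date(System.currentTimeMillis() - 15L * 24 * 60 * 60 * 1000));

Map<String, AttributeValue> values = new HashMap<>();
values.put(":thread", new AttributeValue().withS("Amazon DynamoDB#DynamoDB Thread 1"));
values.put(":cutoff", new AttributeValue().withS(cutoff));

DynamoDBQueryExpression<Reply> query = new DynamoDBQueryExpression<Reply>()
    .withKeyConditionExpression("Id = :thread and ReplyDateTime > :cutoff")
    .withExpressionAttributeValues(values);

List<Reply> latestReplies = mapper.query(Reply.class, query); // sorted by the range key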

Note: the HTTP DEBUG lines shown in the Exception section above are what requests look like with wire-level logging turned on.

Monday, December 19, 2016

Is there a way to collapse all code blocks in Eclipse?

There is a hotkey, mapped by default to Ctrl+Shift+NUM_KEYPAD_DIVIDE.
You can change it to something else via Window -> Preferences, search for "Keys", then for "Collapse All".
To open all code blocks the shortcut is Ctrl+Shift+NUM_KEYPAD_MULTIPLY.
In the Eclipse extension PyDev, collapsing all code blocks is Ctrl + 9.
To expand all blocks, it is Ctrl + 0.


Resource Link: http://stackoverflow.com/questions/1726525/is-there-a-way-to-collapse-all-code-blocks-in-eclipse

HTTP Status 405 - Request method 'POST' not supported (Spring MVC)

Check if you are returning a @ResponseBody or a @ResponseStatus
I had a similar problem. My controller looked like this:
@RequestMapping(value="/user", method = RequestMethod.POST)
public String updateUser(@RequestBody User user){
    return userService.updateUser(user).getId();
}
When calling with a POST request I always got the following error:
HTTP Status 405 - Request method 'POST' not supported
After a while, I figured out that the method was actually called, but because there is no @ResponseBody and no @ResponseStatus, Spring MVC raises the error.
To fix this, simply add a @ResponseBody
@RequestMapping(value="/user", method = RequestMethod.POST)
public @ResponseBody String updateUser(@RequestBody User user){
    return userService.updateUser(user).getId();
}
or a @ResponseStatus to your method.
@RequestMapping(value="/user", method = RequestMethod.POST)
@ResponseStatus(value=HttpStatus.OK)
public String updateUser(@RequestBody User user){
    return userService.updateUser(user).getId();
}

Resource Link: http://stackoverflow.com/questions/11145884/http-status-405-request-method-post-not-supported-spring-mvc 

Friday, December 2, 2016

Amazon S3 - How To Create User And Grant Permissions To Access And Manage Account Files

AWS Policy Generator

Design Pattern: great tutorial for beginners...

Best Tutorial: Hello World Web Service Example

SOAP: Generating Stub Code for a Java Client

Generating Stub Code for a Java Client

Use the following steps to generate the stub code for a Java web services client application, using the wsimport tool that is included with the Java Development Kit.
  1. Open an xterm or command prompt window and change to the directory that contains the wsimport program, usually JDK_install_dir/bin/.
  2. At the command prompt, type wsimport -s source_dir -d classes_dir URL_TO_WSDL and press Enter.
    For example, to generate stub classes for the SAAS version of the Named Resource Service, you might enter the following command:
    wsimport -s output/source -d output/classes http://dev.pb.com/NamedResourceService/services/NamedResourceService?wsdl
After the command has been executed, the generated source .java files are placed within the directory you specified with the -s option, and the compiled .class files are placed within the directory you specified with the -d option.

Various AWS access rules

  1. Bucket Object expires after 24 hours:

IAM Policy Warning

Please make sure that if you use this code, you create an appropriate IAM account policy to prevent misuse. For example, a policy like the following would allow only PUT access to the bucket for a specific IAM user. You could also set the bucket objects to automatically expire after 24 hours, which would prevent people from flooding your account.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt126637111000",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your_bucket_name"
      ]
    }
  ]
}
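
For the 24-hour expiry mentioned above, a sketch of a bucket lifecycle rule with the AWS SDK for Java (the bucket name is a placeholder; lifecycle expiration is measured in whole days, so one day is the shortest "24 hours"):

import java.util.Arrays;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

BucketLifecycleConfiguration.Rule rule = new BucketLifecycleConfiguration.Rule()
    .withId("expire-uploads-after-1-day")
    .withPrefix("")                   // apply to every object in the bucket
    .withExpirationInDays(1)          // objects are deleted about a day after creation
    .withStatus(BucketLifecycleConfiguration.ENABLED);

s3.setBucketLifecycleConfiguration("your_bucket_name",
    new BucketLifecycleConfiguration().withRules(Arrays.asList(rule)));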