Monday, August 10, 2015

Azure – Best practices, Learnings & Performance tips


Azure is in many ways simple to code and work with, but sometimes it gets tricky. Remember when you were stuck on some problem, searching all over the internet for a solution, and wished there were a consolidated list where you could quickly read all the standards, tips, and learnings from other people, and find answers to queries ranging from best practices to performance improvements?

This document tries to put all of this under one umbrella for work related to Azure Table storage. It skips basics such as connections and coding, and assumes that the reader has a good understanding of Azure Table storage.

Best practices

a.      Create a table storage helper class for all the operations that can be performed on Table storage (Insert/Update/Delete/Get, etc.). All classes in the Data Access Layer should call these helper methods to perform any and all operations on Azure tables. This keeps the code maintainable.
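A minimal sketch of such a helper, assuming the Microsoft.WindowsAzure.Storage SDK (class and member names here are illustrative, not a prescribed design):

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical helper that centralizes all table operations so the
// Data Access Layer never talks to CloudTable directly.
public class TableStorageHelper
{
    private readonly CloudTable table;

    public TableStorageHelper(CloudTable table)
    {
        this.table = table;
    }

    public void Insert(ITableEntity entity) =>
        table.Execute(TableOperation.Insert(entity));

    public void InsertOrMerge(ITableEntity entity) =>
        table.Execute(TableOperation.InsertOrMerge(entity));

    public void Delete(ITableEntity entity) =>
        table.Execute(TableOperation.Delete(entity));

    public T Get<T>(string partitionKey, string rowKey) where T : ITableEntity, new() =>
        (T)table.Execute(TableOperation.Retrieve<T>(partitionKey, rowKey)).Result;
}
```

Cross-cutting concerns such as retry policies, logging, and key normalization can then live in one place instead of being scattered across the Data Access Layer.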

b.      Data comparisons in Azure are case sensitive; make sure you address this in all applicable places. The standard way is to call .ToLowerInvariant() on the values, both when inserting entities and when reading them from table storage.
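A minimal sketch of this convention (NormalizeKey is a hypothetical helper name):

```csharp
// Hypothetical helper: normalize keys once, call it at every read and write
// so that case differences can never cause a missed match.
static string NormalizeKey(string key) => key.ToLowerInvariant();

// At insert time:
//   entity.RowKey = NormalizeKey(customerEmail);
// At query time:
//   TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, NormalizeKey(searchEmail));
```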

Learnings

a.      Azure Table storage inserts/updates, i.e. .Insert() or .InsertOrMerge(), can sometimes return Bad Request without giving additional details. Because this exception is very generic, developers search the internet for solutions with little to no success. Below are the most common mistakes that cause these Bad Requests.

·     DateTime

The most common source of Bad Request. Never send DateTime.MinValue to an Azure table; it is outside the supported DateTime range, which starts at January 1, 1601 (UTC).
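A simple guard (names here are illustrative) that clamps values to the minimum DateTime the Table service accepts:

```csharp
// Minimum DateTime value accepted by Azure Table storage.
static readonly DateTime AzureTableMinDateTime =
    new DateTime(1601, 1, 1, 0, 0, 0, DateTimeKind.Utc);

// Hypothetical guard: never let DateTime.MinValue (or anything earlier
// than the supported range) reach the table.
static DateTime ToSupportedTableDate(DateTime value) =>
    value < AzureTableMinDateTime ? AzureTableMinDateTime : value;
```

Calling this on every DateTime property before an Insert/InsertOrMerge removes this whole class of Bad Request errors.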

·     DataType

The second most common source. Make sure the data types of the values being sent match the property types supported by Azure Table storage (string, int, long, double, bool, DateTime, Guid, and byte[]).
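For example, decimal is not among the supported property types; a common workaround (one option among several) is to persist it as double or as a string:

```csharp
// decimal is NOT a supported table property type; convert before writing.
decimal price = 19.99m;
double storedPrice = decimal.ToDouble(price);   // store as double
string exactPrice = price.ToString(System.Globalization.CultureInfo.InvariantCulture); // or as string, avoiding rounding
```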

b.      When using .ExecuteQuerySegmented() to fetch more than 1,000 records or to paginate, keep in mind that the Table service returns at most 1,000 entities per call, or as many entities as it can retrieve within 5 seconds. If the query takes longer than that, the call can come back with zero entities and a continuation token.
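A sketch of the standard continuation-token loop (the entity type and table variable are assumed from the examples below), which keeps querying even when a segment comes back empty:

```csharp
// Keep calling ExecuteQuerySegmented until the continuation token is null;
// an individual segment may contain anywhere from 0 to 1,000 entities.
TableContinuationToken token = null;
var allCustomers = new List<CustomerDataEntity>();
do
{
    TableQuerySegment<CustomerDataEntity> segment =
        customerDataTable.ExecuteQuerySegmented(query, token);
    allCustomers.AddRange(segment.Results);
    token = segment.ContinuationToken;   // null once all entities are returned
} while (token != null);
```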

Performance tips

a.      I am sure we all know that it is always faster, and good practice, to fetch data from table storage based on PartitionKey and RowKey, but:

-        Never combine filters on two or more PartitionKey values in a single query; this results in a full table scan.

Wrong way of fetching entities:
TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, value1),
    TableOperators.Or,
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, value2));

In this scenario, spin up a task for each PartitionKey filter condition, wait for the tasks to complete, and then operate on the combined results. Below is an example of how this can be achieved.

Correct way of fetching entities:
var partitionKeyFilter1 = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, value1);
var partitionKeyFilter2 = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, value2);
var partitionKeyQuery1 = new TableQuery<CustomerDataEntity>().Where(partitionKeyFilter1);
var partitionKeyQuery2 = new TableQuery<CustomerDataEntity>().Where(partitionKeyFilter2);

var customerTaskA = Task.Run<IEnumerable<CustomerDataEntity>>(() => this.customerDataTable.ExecuteQuery(partitionKeyQuery1));
var customerTaskB = Task.Run<IEnumerable<CustomerDataEntity>>(() => this.customerDataTable.ExecuteQuery(partitionKeyQuery2));

// If one or more customers exist for the given conditions, process the data
if (customerTaskA.Result != null || customerTaskB.Result != null)
{
   . . .
}

Note: Accessing the .Result property of a Task internally blocks until the task completes and then returns its result.

b.      Try to keep the results from Azure Table storage in an IEnumerable object for as long as you can in the code. If any filtering, sorting, etc. must be applied to the data, use LINQ and perform these operations on the IEnumerable object. Only when you are done with these operations should you call .ToList() on it; the line of code where you call .ToList() is where the query chain actually executes. This is called deferred execution.

So the next time IEnumerable&lt;Object&gt;.ToList() is taking a lot of time or showing up as a performance hit, don't look anywhere else; look at the query that is being executed.
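A small standalone illustration of deferred execution with LINQ:

```csharp
var numbers = new List<int> { 1, 2, 3, 4 };

// Nothing executes here; the filter is only recorded.
IEnumerable<int> bigNumbers = numbers.Where(n => n > 2);

numbers.Add(5);   // added AFTER the query was defined

// The query executes here, so it sees the later addition.
List<int> result = bigNumbers.ToList();   // 3, 4, 5
```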

This post will be updated periodically with new learnings. Thank you!

*********************************************************************************

Update - 8/21/2015

Learnings 
             
           ·     Unsupported characters
Below characters are not supported for PartitionKey and RowKeys –
i.                 The forward slash (/) character
ii.                Backward slash (\) character
iii.               Number sign (#) character
iv.               Question mark (?) character
v.                Control characters(\t, \n, \r, etc)
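A hypothetical SanitizeKey helper that replaces the disallowed characters before a value is used as a key (the replacement character '-' is an arbitrary choice):

```csharp
// Replace characters that are invalid in PartitionKey/RowKey values.
static string SanitizeKey(string key)
{
    var builder = new StringBuilder(key.Length);
    foreach (char c in key)
    {
        if (c == '/' || c == '\\' || c == '#' || c == '?' || char.IsControl(c))
            builder.Append('-');   // arbitrary replacement character
        else
            builder.Append(c);
    }
    return builder.ToString();
}
```

Running every candidate key through such a helper at write time avoids another common source of Bad Request errors.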
