Wednesday, February 27, 2013

RavenDB is a bad choice for Asp.Net Session Provider

I was considering using RavenDb as a custom session state provider in a web application. Essentially, I was looking for a document database that could serve as a session state provider and also meet other application needs. NServiceBus bundles this database, so I thought I would not have to introduce yet another document database.

This is my first encounter with RavenDb. I have read a lot of good things about this database, and I wanted to give it a try to see if it fits my requirements.

With a little googling, you can find a few session state provider implementations for RavenDb here and here. All of these implementations are based on Microsoft's ODBC session state provider sample here.

All the important methods you need to implement for your own provider are explained here.

A session state provider is a little tricky to implement. With respect to session state, ASP.NET can serve three types of page requests:
  • Pages that require no session
  • Pages that require read-only session
  • Pages that require writable session
Thanks to Ajax calls, all of these requests can reach your provider concurrently, and all three types have the potential to update the session document at the same time.

Pages that require a writable session use the 'LockId' and 'Locked' properties to protect the session data from corruption by concurrent writes.

Pages that require no session data just extend the expiry time of the session. Pages that require a read-only session retrieve the session data for a given session id, and they wait for any exclusive lock to be released.

All of the RavenDb session state provider implementations mentioned above buckle under concurrent user load. They try to work around the following RavenDb limitations for this scenario.

Queries require Indexes

You cannot use RavenDb queries to get the session document(s), because RavenDb queries require indexes and these indexes are updated in the background. Under heavy load, indexing falls behind and queries return stale results. If you wait for indexing to finish, you will run into timeouts.

No Find And Modify Support

RavenDb does not support find-and-modify operations over a collection, so the following kind of query is not possible. You are left with loading the session by its id and then modifying it after examining its properties.

UPDATE Sessions SET Locked = true WHERE Id = 'xyz' and Locked = false

Without indexes, and without support for atomic operations like the one above, you are forced to write code like the following.


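var doc = documentSession.Load<SessionState>("xyz");
if (!doc.Locked)
{
    // Another request can lock the document between this check and SaveChanges().
    doc.Locked = true;
    documentSession.SaveChanges();
}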

This kind of check-then-act code opens a window for race conditions: two requests can both see Locked as false and both believe they hold the lock.
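To make the race concrete, here is a small JavaScript sketch. It uses a hypothetical in-memory store (not the RavenDb client API) whose load returns a copy of the document, the way a document session does. Two requests that both load the session before either one saves will both see Locked as false, while an atomic compare-and-set, the equivalent of the UPDATE statement above, lets only one of them win.

```javascript
// Hypothetical in-memory store; load() returns a copy, like a document session.
const docs = new Map();
const reset = () => docs.set("xyz", { id: "xyz", locked: false, lockId: 0 });
const load = (id) => ({ ...docs.get(id) });
const save = (doc) => docs.set(doc.id, { ...doc });

// Check-then-act: both requests load the document before either one saves.
function racyLocks() {
  reset();
  const requests = [
    { doc: load("xyz"), lockId: 1 }, // request A reads
    { doc: load("xyz"), lockId: 2 }, // request B reads before A saves
  ];
  let acquired = 0;
  for (const { doc, lockId } of requests) {
    if (!doc.locked) {
      // Both copies still say locked === false.
      doc.locked = true;
      doc.lockId = lockId;
      save(doc);
      acquired++;
    }
  }
  return acquired;
}

// Atomic compare-and-set: check and update in one step, like
// UPDATE Sessions SET Locked = true WHERE Id = 'xyz' AND Locked = false
function tryLockAtomic(id, lockId) {
  const doc = docs.get(id);
  if (doc.locked) return false;
  doc.locked = true;
  doc.lockId = lockId;
  return true;
}

function atomicLocks() {
  reset();
  return [1, 2].filter((lockId) => tryLockAtomic("xyz", lockId)).length;
}

console.log(racyLocks());   // 2: both requests think they hold the lock
console.log(atomicLocks()); // 1: only the first request gets it
```

The interleaving is arranged by hand here; with real concurrent requests it happens whenever two load-check-save sequences overlap.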

Optimistic Concurrency

A page that doesn't require a session attempts to update the "Expires" property of the session document, while at the same time a page that requires a writable session might attempt to update the "LockId" property of the same document.

Even though these are completely different properties, they belong to the same document, so we are forced to deal with concurrency conflicts. There might be a way to do fine-grained concurrency on specific fields, but I found it far more trouble than it is worth.

When the session module calls the GetItemExclusive method and we run into a concurrency conflict, we can simply return null (with the locked output parameter set to true), which tells the session module to retry getting the document. But how many times should we let this happen? Every retry slows down the page.
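The following JavaScript sketch shows the shape of that logic under some assumptions: EtagStore is a hypothetical stand-in for RavenDb's etag-based optimistic concurrency, and getItemExclusive only mimics the contract of the real GetItemExclusive method, where returning no item with locked set to true asks the session module to try again. A page that merely resets the timeout is enough to make the exclusive request fail, even though it touched a different property.

```javascript
// Hypothetical stand-in for etag-based optimistic concurrency (not the RavenDb API).
class ConcurrencyError extends Error {}

class EtagStore {
  constructor() {
    this.docs = new Map(); // id -> { doc, etag }
  }
  load(id) {
    const entry = this.docs.get(id);
    return entry ? { doc: { ...entry.doc }, etag: entry.etag } : null;
  }
  // Saves only when expectedEtag still matches; pass undefined to skip the check.
  save(id, doc, expectedEtag) {
    const entry = this.docs.get(id);
    const current = entry ? entry.etag : 0;
    if (expectedEtag !== undefined && expectedEtag !== current) {
      throw new ConcurrencyError(`stale etag ${expectedEtag}, current is ${current}`);
    }
    this.docs.set(id, { doc: { ...doc }, etag: current + 1 });
  }
}

// Mimics the GetItemExclusive contract: item null + locked true means "retry later".
// beforeSave is a test hook that lets another request write in between.
function getItemExclusive(store, id, lockId, beforeSave = () => {}) {
  const loaded = store.load(id);
  if (!loaded) return { item: null, locked: false }; // no session: caller creates one
  if (loaded.doc.locked) return { item: null, locked: true };
  loaded.doc.locked = true;
  loaded.doc.lockId = lockId;
  beforeSave();
  try {
    store.save(id, loaded.doc, loaded.etag);
    return { item: loaded.doc.items, locked: false };
  } catch (err) {
    if (err instanceof ConcurrencyError) return { item: null, locked: true };
    throw err;
  }
}

const store = new EtagStore();
store.save("s1", { locked: false, lockId: 0, expires: 0, items: { cart: 3 } });

// A page without session state extends Expires between our load and our save.
const resetTimeout = () => {
  const { doc, etag } = store.load("s1");
  doc.expires = Date.now() + 20 * 60 * 1000;
  store.save("s1", doc, etag);
};

const first = getItemExclusive(store, "s1", 1, resetTimeout);
console.log(first);  // item is null, locked is true: the module has to retry
const second = getItemExclusive(store, "s1", 1);
console.log(second); // item is { cart: 3 }, locked is false: the lock is ours
```

The first call fails only because an unrelated Expires update bumped the etag, which is exactly the friction described above.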

You could attempt to use a RavenDb patch command to update the "Expires" property, thus avoiding concurrency conflicts. But you will soon find that this idea fails when you try to use the RavenDb Expiration Bundle.

Expiring documents

RavenDb comes with an expiration bundle that removes expired documents, such as stale session documents. To make this bundle work, you need to set a metadata entry like the following. Unfortunately, you must include this setter in the same unit of work that saves the document.

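var expires = DateTime.UtcNow.AddMinutes(20);
// The expiration bundle reads this metadata entry, so it has to be set on every save.
documentSession.Advanced.GetMetadataFor(sessionState)["Raven-Expiration-Date"] = new RavenJValue(expires);
sessionState.Expires = expires;
documentSession.SaveChanges();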

This prevents us from doing partial (patch) updates to the document, because the metadata must be updated every time you save the document.

Other minor but annoying issues  
  • As of build #2261, there are still bugs.
  • Master-Master replication won't work when you use API Keys.
  • The expiration bundle randomly deletes session documents.
  • Raven Studio doesn't give you the comfortable feeling of using a professional-grade database.
The following changes gave me a relatively stable RavenDb session provider implementation under higher loads:
  • Do not use the expiration bundle. Use a server-side trigger or a scheduled task to expire documents. This allows us to use a patch command to update the "Expires" property in the "ResetItemTimeout" method.
  • Do not use concurrency checks while removing the item, saving the session data, or releasing exclusive locks. These calls must succeed; if they fail, you might run into logic errors in the app.
  • Use optimistic concurrency checks only in the GetItemExclusive/GetItem routines. If the concurrency check fails, simply return null; this forces the session module to call these methods again.
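A rough JavaScript sketch of these changes, using a hypothetical in-memory Map in place of the RavenDb collection: resetItemTimeout touches only the expiry, like a patch command; setAndReleaseItemExclusive writes without a concurrency check; and sweepExpired plays the role of the scheduled task that replaces the expiration bundle. Times are plain millisecond timestamps, and none of these names come from RavenDb or ASP.NET.

```javascript
// Hypothetical in-memory stand-in for the session collection (times in ms).
const sessions = new Map();
const minutes = (n) => n * 60 * 1000;

// ResetItemTimeout as a patch-style update: only Expires changes and no
// concurrency check is made, so it cannot conflict with a lock update.
function resetItemTimeout(id, now, timeoutMinutes = 20) {
  const doc = sessions.get(id);
  if (doc) doc.expires = now + minutes(timeoutMinutes);
}

// Saving the data and releasing the lock: an unconditional write that must not fail.
function setAndReleaseItemExclusive(id, items, now, timeoutMinutes = 20) {
  sessions.set(id, { items, locked: false, lockId: 0, expires: now + minutes(timeoutMinutes) });
}

// Scheduled task replacing the expiration bundle: delete sessions whose Expires has passed.
function sweepExpired(now) {
  let removed = 0;
  for (const [id, doc] of sessions) {
    if (doc.expires <= now) {
      sessions.delete(id); // deleting the current entry during Map iteration is safe
      removed++;
    }
  }
  return removed;
}

const t0 = Date.UTC(2013, 1, 27, 12, 0, 0);
setAndReleaseItemExclusive("a", { cart: 1 }, t0);
setAndReleaseItemExclusive("b", { cart: 2 }, t0);
resetItemTimeout("a", t0 + minutes(15));     // "a" now expires at t0 + 35 min
console.log(sweepExpired(t0 + minutes(25))); // 1: only "b" (expired at t0 + 20 min) is removed
console.log([...sessions.keys()]);           // [ 'a' ]
```

Because the sweep runs outside any request, expiry no longer forces a metadata write into every save, which is what makes the patch-style timeout reset possible.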
All in all, I am not happy about the friction. I smell maintenance headaches.

I would like to try another document database to see if it fits this use case better.

Sunday, October 7, 2012

TypeScript is neat

JavaScript is not picky about line endings. The following code looks fine but causes trouble.

   1: function func() {
   2:     return 
   3:     {
   4:         greet: "Hello World"
   5:     };
   6: }
   7: console.log(func().greet);

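Line 7 does not print "Hello World": func() returns undefined, so reading greet from it throws a TypeError. This is because the return statement on line 2 is followed by a line break, and JavaScript's automatic semicolon insertion turns it into return;. If you compile the first code snippet with TypeScript, you can catch this error: TypeScript marks the usage of greet on line 7 as an error.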



   Error    1    Expected var, class, interface, or module   …\TypeScript1\TypeScript1\app.ts    7    13    app.ts

Maybe they could do better with the error message.


This problem can be fixed by moving the opening curly brace on line 3 up to line 2.



   1: function func() {
   2:     return {
   3:         greet: "Hello World"
   4:     };
   5: }
   6: console.log(func().greet);

Now the error goes away, and you also get IntelliSense for the string property “greet” after func(). on line 6.
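You can confirm the runtime behavior in plain JavaScript (for example in Node.js). The first function below returns undefined because of automatic semicolon insertion, so reading greet from its result throws a TypeError, while the second one returns the object.

```javascript
function broken() {
  return // a semicolon is inserted here, so the object below is never returned
  {
    greet: "Hello World"
  };
}

function fixed() {
  return {
    greet: "Hello World"
  };
}

console.log(broken());      // undefined
console.log(fixed().greet); // Hello World

try {
  broken().greet;
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

In broken, the braces after return parse as a block containing a labeled statement, which is why plain JavaScript accepts it without complaint.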

Sunday, January 15, 2012

Tips for windows developer working on Mac–Tip # 2

Often you want to run a text editor from the terminal, maybe to edit a .bashrc or a .gitignore file. I use my favorite editor, Sublime Text, which works on both Windows and Mac. Create a symbolic link to its command-line tool, subl, in /usr/local/bin:

ln -s "/Applications/Sublime Text 2.app/Contents/SharedSupport/bin/subl" /usr/local/bin

Once you have run that command, you can open files in Sublime Text directly from the terminal by typing

subl ~/.bashrc

Tips for windows developer working on Mac–Tip # 1

Files and folders whose names start with a period are hidden by default in Finder, so files like .bashrc or .gitignore can't be found there. Running the following commands from the terminal fixes that:

defaults write com.apple.finder AppleShowAllFiles TRUE

killall Finder

The second command restarts Finder.

Sunday, November 20, 2011

Running Asp.Net MVC controller actions on STA threads

Recently, while we were converting a legacy ASP application (yes, they still exist) to ASP.NET MVC, we had to work with a set of business-critical third-party components.

These were legacy COM components. In load testing, we found that controller actions containing calls to these components were crashing the w3wp process almost every minute. A little research into this problem turned up the following article:

Running ASMX Web Services on STA Threads

In summary, MVC action methods run on COM multithreaded apartment (MTA) threads. These legacy components were being created from MTA threads, so COM placed them on a single STA thread and serialized every call to them through it. On top of that, these components load tons of data into memory, and under load this was causing memory corruption.

So the solution is to make the MVC action method run on an STA thread, allowing COM to place the object instances on the creator’s thread.

Here is an MSDN thread that ties STA threads to ASP.NET MVC:

AspCompat=true does not work with MVC

The above solution doesn’t support ASP.NET sessions, and it was written for ASP.NET MVC 1.0.

Here is the solution adjusted for ASP.NET MVC 3.0.

First, the route handler. It derives from MvcRouteHandler and instantiates an IHttpHandler implementation.

using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

public class STARouteHandler : MvcRouteHandler
{
   protected override IHttpHandler GetHttpHandler(RequestContext requestContext)
   {
       return new STARequestHandler(requestContext);
   }
}


Next, the STARequestHandler. The key to making this work is in the BeginProcessRequest and EndProcessRequest methods. These two methods wrap the actual execution of the action method in ASP compatibility mode (AspCompatBeginProcessRequest/AspCompatEndProcessRequest), which runs the page, and therefore the controller call in OnInit, on an STA thread. The handler also implements the IRequiresSessionState marker interface to support sessions.


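using System;
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;
using System.Web.SessionState;
using System.Web.UI;

public class STARequestHandler : Page, IHttpAsyncHandler, IRequiresSessionState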
{
    public STARequestHandler(RequestContext requestContext)
    {
        if (requestContext == null)
            throw new ArgumentNullException("requestContext");
        this.RequestContext = requestContext;
    }

    private ControllerBuilder _controllerBuilder;

    internal ControllerBuilder ControllerBuilder
    {
        get { return this._controllerBuilder ?? (this._controllerBuilder = ControllerBuilder.Current);}
    }

    public RequestContext RequestContext { get; set; }

    protected override void OnInit(EventArgs e)
    {
        string requiredString = this.RequestContext.RouteData.GetRequiredString("controller");
        var controllerFactory = this.ControllerBuilder.GetControllerFactory();
        var controller = controllerFactory.CreateController(this.RequestContext, requiredString);
        if (controller == null)
            throw new InvalidOperationException("Could not find controller: " + requiredString);
        try
        {
            controller.Execute(this.RequestContext);
        }
        finally
        {
            controllerFactory.ReleaseController(controller);
        }
        this.Context.ApplicationInstance.CompleteRequest();
    }

    public override void ProcessRequest(HttpContext httpContext)
    {
        throw new NotSupportedException("This should not get called for an STA");
    }

    public IAsyncResult BeginProcessRequest(HttpContext context, AsyncCallback cb, object extraData)
    {
        return this.AspCompatBeginProcessRequest(context, cb, extraData);
    }

    public void EndProcessRequest(IAsyncResult result)
    {
        this.AspCompatEndProcessRequest(result);
    }

    void IHttpHandler.ProcessRequest(HttpContext httpContext)
    {
        this.ProcessRequest(httpContext);
    }
}

And lastly, the usage of this handler.

When creating a route for the action method, simply attach this handler to the route definition.


context.MapRoute("STARoute", "{controller}/{action}",
                    new { controller = "Home", action = "Index" })
                    .RouteHandler = new STARouteHandler();


You can verify the apartment state by evaluating the following expression in the action method; it should return "STA".

Thread.CurrentThread.GetApartmentState().ToString();

Tuesday, October 25, 2011

MongoDB mapreduce for stackoverflow.com recent tags

I was playing with MongoDB's mapreduce and wanted to write a query that simulates the 'Recent Tags' list on the stackoverflow home page.

I am using mongo-csharp-driver for this experiment.

I am taking a guess at stackoverflow's domain model. Here is a Question model with just enough properties to demonstrate the mapreduce query: every question is associated with a list of tags and has a CreatedOn property.
using System;
using System.Collections.Generic;
using MongoDB.Bson;

public class Question {
    public ObjectId Id { get; set; }
    public DateTime CreatedOn { get; set; }
    public ICollection<string> Tags { get; set; }
}


Create a database (any name will do) and add a 'Questions' collection to it.

We are going to make the following call to find the recent tags.

var recentTags = questions.MapReduce(map, reduce).GetResultsAs<RecentTagResult>();


RecentTagResult holds the results of the mapreduce query and is defined as:

using MongoDB.Bson.Serialization.Attributes;

public class RecentTagResult {
    public string Id;
    [BsonElement("value")]
    public RecentTag Value;
}


MongoDB's mapreduce call outputs a result set whose documents have two properties, _id and value. So here I am defining a mapping class with a property 'Id' and a property 'Value' of type RecentTag. The BsonElement attribute in the above code simply maps the lowercase property 'value' from the result set to the title-case property 'Value'. The _id property of each result is mapped to Id automatically when 'GetResultsAs' deserializes the results.

RecentTag is defined as follows. I expect each query result to contain the tag, the number of times it was used, and the last time a question was created with it. As you can guess, our map and reduce functions must emit values that match this class definition.
public class RecentTag {
public string Tag { get; set; }
public int Count { get; set; }
public DateTime LastSeenOn { get; set; }

}

Now to the meat of the problem. A question can have more than one tag, and we are looking for the list of tags used by questions asked in the last month. Here is the map function (written entirely in JavaScript). It goes over the entire collection of questions and, for each question asked in the last month, emits one result per tag.

private string map =
@"function() {
    // Questions from October 2011; the upper bound (Nov 1, 2011) is assumed.
    if (this.CreatedOn >= new Date('Oct 1, 2011') && this.CreatedOn < new Date('Nov 1, 2011')) {
        var lastseen = this.CreatedOn;
        this.Tags.forEach(function (tag) { emit(tag, { Tag: tag, LastSeenOn: lastseen, Count: 1 }); });
    }
}";


It is very important that the value passed to emit matches our RecentTag type definition.

emit(tag, {Tag: tag, LastSeenOn:lastseen, Count: 1})

Now to reduce. We have a bunch of results emitted by the map function, and we simply add them up to find how many times each tag was used in the last month. A tag might appear more than once in the last month; if a tag appears only once, the reduce function is never called for it.

private string reduce =
@"function (key, arr_values) {
    // Collect the LastSeenOn dates of every value emitted (or previously reduced) for this tag.
    var dates = [];
    arr_values.forEach(function (val) { dates.push(val.LastSeenOn); });
    // The returned object must have the same shape as the emitted values.
    var result = { Tag: key, LastSeenOn: new Date(Math.max.apply(Math, dates)), Count: 0 };
    arr_values.forEach(function (val) { result.Count += val.Count; });
    return result;
}";

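Here arr_values contains all the values emitted for a single tag. Again, it is important that the returned value matches the RecentTag definition, just like the values emitted by the map function, because MongoDB may call reduce again on the output of an earlier reduce.
We start with that definition: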

var result = {Tag: key, LastSeenOn: new Date(Math.max.apply(Math, dates)), Count: 0};

Then we iterate through arr_values and add up the counts to get the final result.

Filling the LastSeenOn property is a little tricky. Here we are trying to find the most recent CreatedOn date (emitted as LastSeenOn) across all values emitted for a single tag.

var dates = [];
arr_values.forEach(function(val) {dates.push(val.LastSeenOn)});

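Here we gather all the dates from the LastSeenOn property into an array (JavaScript property names are case-sensitive, so it must be spelled exactly as emitted). Then, while defining the result, we apply JavaScript's Math.max function to find the last seen date. Math.max converts each Date to its millisecond timestamp, which is why the result is wrapped in new Date(...) again.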

LastSeenOn: new Date(Math.max.apply(Math, dates))

That is all to it.
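Before running this against a server, it can be handy to exercise the map and reduce functions locally. The following JavaScript is a hypothetical harness, not part of MongoDB or the C# driver: it supplies its own emit, groups the emitted values by key, and calls reduce only for keys with more than one value. It assumes the LastSeenOn spelling in reduce (matching what map emits), an assumed upper bound of Nov 1, 2011 for the date filter, and dates built with numeric constructors (months are 0-based) so they parse the same everywhere.

```javascript
// Hypothetical local harness: emulates emit, grouping by key, and reduce.
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// map from the post, with an assumed upper bound of Nov 1, 2011.
function map() {
  if (this.CreatedOn >= new Date(2011, 9, 1) && this.CreatedOn < new Date(2011, 10, 1)) {
    var lastseen = this.CreatedOn;
    this.Tags.forEach(function (tag) { emit(tag, { Tag: tag, LastSeenOn: lastseen, Count: 1 }); });
  }
}

// reduce from the post, reading LastSeenOn with the same casing that map emits.
function reduce(key, arr_values) {
  var dates = [];
  arr_values.forEach(function (val) { dates.push(val.LastSeenOn); });
  var result = { Tag: key, LastSeenOn: new Date(Math.max.apply(Math, dates)), Count: 0 };
  arr_values.forEach(function (val) { result.Count += val.Count; });
  return result;
}

function runMapReduce(docs) {
  emitted = [];
  docs.forEach(function (doc) { map.call(doc); });
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // Like MongoDB, skip reduce for keys that were emitted only once.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return results;
}

const questions = [
  { CreatedOn: new Date(2011, 9, 3), Tags: ["c#", "mongodb"] },
  { CreatedOn: new Date(2011, 9, 20), Tags: ["mongodb"] },
  { CreatedOn: new Date(2011, 8, 28), Tags: ["c#"] }, // September: filtered out by map
];

const recent = runMapReduce(questions);
recent.forEach(function (r) {
  console.log(r._id, r.value.Count, r.value.LastSeenOn.toDateString());
});
// c# 1 Mon Oct 03 2011
// mongodb 2 Thu Oct 20 2011
```

Because MongoDB may call reduce again on the output of an earlier reduce, it is also worth checking that reduce(key, [previousResult, newValue]) still produces the right count; returning the same shape as the emitted values is what makes that work.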

Monday, July 18, 2011

Open Source Dependencies

Total Cost of Ownership is a critical metric that I like to pay attention to in software development. A project with lots of open source dependencies can become very difficult to maintain.
With the advent of modern package systems like gems and Nuget, and of CDNs, it has never been this easy to use open source in software projects. As of this writing, there are thousands of JQuery plugins, hundreds of ruby gems, and hundreds of Nuget packages. I have seen developers argue about their favorite ORM tool, but I have not seen nearly as much debate about which JQuery lightbox plugin should be used in the user interface. If you take stock of all the dependencies (both commercial and open source) in your code, you might be surprised by the length of the list.

I surveyed a few .Net web applications. A single web application can have the following components, not counting major dependencies like an object-relational mapper:

  1. PDF
  2. Charting
  3. Ajax
  4. Half a dozen JQuery plugins/other JavaScript alternatives
  5. Spread sheet
  6. Dashboard
  7. Social networking
  8. Payment gateway
  9. Reporting
  10. Other value added services like support, feedback, live help
  11. External web services
  12. Scheduling
  13. Email
  14. JSON
  15. Mocking
  16. Unit Testing
.. and the list goes on.

Now imagine using an open source component for each one of these dependencies. That is a lot of code to maintain!

All non-trivial abstractions, to some degree, are leaky

At one point or another, your team will need to know the internals of every open source library used in your project. Some of these libraries have hard dependencies on other libraries, which can prevent future upgrades.

Never underestimate the testing burden. Take JQuery plugins as an example. These plugins depend on JQuery, so when you upgrade JQuery you are forced to upgrade the dependent plugins too. Depending on the quality of a plugin, you can end up spending hours debugging why a web page fails and hunting for a version of the plugin that works. Cross-browser testing is also a time killer. All of this becomes very complex when all you are trying to do is upgrade JQuery to its latest version, which by itself is a trivial task.

You might not see this kind of risk with popular open source libraries, as they operate much like commercial offerings. But not all open source projects are popular.

Some tips to control the proliferation of these dependencies in your projects:

  1. Always maintain a list of open source dependencies. Make it available to QA team and developers.
  2. Create a test suite that exercises all these dependencies, and run the tests with every deployment.
  3. Always check the source code of each open source project into source control.
  4. Try to keep these dependencies to a minimum. Remember that your team needs to re-learn these dependencies on every upgrade or major functionality rewrite.
  5. If possible, try to stick to a known set of controls from a single vendor.
  6. When choosing libraries, weigh your team’s skill set, composition and future direction. If you don’t have dedicated resources for the Ajax work on the website, it is better to stick to a commercial solution than to use a laundry list of open source offerings.