Friday, July 31, 2015

Memory usage patterns and index sorting--how you manage to efficiently sort the data set which exceeds system memory

Memory usage patterns and index sorting
When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from the disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements to one another happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.
For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.
One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".[5]
Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantages of the strength of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is less efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.
Techniques can also be combined. For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.

Tuesday, July 28, 2015

Forcing Computed Observables to Refresh in Knockout

Computed Observables in Knockout are a useful way of calculating values based on other observables:
1
2
3
4
5
6
var ExampleViewModel = function() {
    this.name = ko.observable();
    this.isNameValid = ko.computed(function() {
        return this.name().length > 0;
    });
};​
One of the great features here is that when the name property changes the Knockout framework will automatically re-evaluate the computed observable and notify anything that is bound to that value. This works by automatically subscribing to the change notifications from the name observable when the value if first acquired.
Unfortunately this does not work when the value of the computed observable is dependent on a non-observable value:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
var ExampleViewModel = function(externalValidationDefinition) {
    this.name = ko.observable();
    this.isNameValid = ko.computed(function() {
        return this.name().length > externalValidationDefinition.minLength;
    });
};
 
var validationDefinition = {
    minLength: 5;       
};
 
var vm = new ExampleViewModel(validationDefinition);
 
//change the validation definition...
validationDefinition.minLength = 10;
//...but the computed observable is not re-evaluated
In this situation we need to manually force the re-evaluation of the computed observable, but there is no native way of doing so.
To get around this we can add a dummy observable to the view model, then retrieve and ignore its value from within the computed observable. This will cause the Knockout framework to re-evaluate the computed observable whenever the dummy observable changes:
1
2
3
4
5
6
7
8
9
var ExampleViewModel = function(externalValidationDefinition) {
    var _dummyObservable = ko.observable();
 
    this.name = ko.observable();
    this.isNameValid = ko.computed(function() {
        _dummyObservable(); //retrieve and ignore the value
        return this.name().length > externalValidationDefinition.minLength;
    });
};
Now we can invalidate the computed observable just by invalidating the value of the dummy one:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
var ExampleViewModel = function(externalValidationDefinition) {
    var _dummyObservable = ko.observable();
 
    this.name = ko.observable();
    this.isNameValid = ko.computed(function() {
        _dummyObservable(); //retrieve and ignore the value
        return this.name().length > externalValidationDefinition.minLength;
    });
    this.invalidateIsNameValid = function() {
        _dummyObservable.notifySubscribers(); //fake a change notification
    };
};
 
var validationDefinition = {
    minLength: 5;
};
 
var vm = new ExampleViewModel(validationDefinition);
 
//change the validation definition...
validationDefinition.minLength = 10;
//...and manually invalidate
vm.invalidateIsNameValid();​
Now we can manually force the computed observable to be updated when we know that an external factor has changed. If I get the time I may convert this into an extender or helper for reusability later…