Sunday, 25 November 2012

The unit of a unit test

As the popularity of unit testing and TDD increased, so did the demand for the right tools: testing frameworks, test runners, test reporters and so on. This in turn made testing more approachable and applicable to legacy code. The result of this snowball effect is that developers now have a wide variety of tools to choose from in practically every popular language.

One of the most crucial requirements of successful testing is being able to get the test results fast. Of the many factors that affect speed, the most important is isolating the code under test from everything outside the specific test's scope. Most of the time this means running the test without a database, UI, web service calls or disk access (except when you're specifically testing that part). This can be achieved by substituting the costly calls (by means of stubbing or mocking) for the duration of the test run.

The separation is of course easier if the code was written with testing in mind (Dependency Injection, Factories), but it can be done either way: if we're lucky we can fiddle with the bytecode (as TypeMock does) or use language features that allow substituting any object (as we can in JavaScript). In the end, with enough effort, perfect isolation is achievable.
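To make the substitution concrete, here is a minimal Python sketch using the standard unittest.mock module. The ReportService class and its orders_for collaborator are hypothetical names made up for this illustration; the point is only that an injected dependency can be replaced by a stub, so the test never touches a database.

```python
from unittest import mock


class ReportService:
    """Hypothetical class under test; `repository` is injected."""

    def __init__(self, repository):
        self.repository = repository

    def total(self, customer_id):
        # In production this call would hit a database.
        orders = self.repository.orders_for(customer_id)
        return sum(order["amount"] for order in orders)


# The stub stands in for the database-backed repository during the test.
repository = mock.Mock()
repository.orders_for.return_value = [{"amount": 10}, {"amount": 32}]

service = ReportService(repository)
total = service.total(7)
```

Because the repository is passed in through the constructor, no bytecode tricks are needed; the test simply hands over a different object.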

Let's look at the common understanding of a unit:

Intuitively, one can view a unit as the smallest testable part of an application. In procedural programming a unit could be an entire module but is more commonly an individual function or procedure. In object-oriented programming a unit is often an entire interface, such as a class, but could be an individual method.

If we treat a class method as a unit, then using the available tools we can achieve both the ideal isolation and a high degree of code coverage (even 95-100%). What we get as a result is a set of classes that are working perfectly in isolation.

But do they work together?

Well, we can't really say anything without running other kinds of tests. Many of the errors will be caught by a type system, but only if that's available. If not, even changing a method signature will most probably break the system.
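A small Python sketch of that last point, assuming a hypothetical send_invoice function whose signature has just gained a second parameter. A plain mock accepts any call, so the isolated test keeps passing; a mock built with unittest.mock.create_autospec copies the real signature and exposes the mismatch.

```python
from unittest import mock


def send_invoice(customer_id, amount):
    """Hypothetical collaborator; imagine `amount` was added recently."""
    return "invoice for %s: %s" % (customer_id, amount)


def bill(customer_id, sender):
    # The caller still uses the old one-argument signature.
    return sender(customer_id)


# A plain mock accepts any arguments, so the isolated test passes.
loose = mock.Mock(return_value="ok")
loose_result = bill(7, loose)

# A mock that copies the real signature rejects the outdated call.
strict = mock.create_autospec(send_invoice, return_value="ok")
try:
    bill(7, strict)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

This narrows the gap a little, but it is still no replacement for a test that runs the real units together.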

The way to solve the problem is to test the integration of the units. But we really want this process to be fast enough so that the code-test cycle can be run often. Basically, we need another level of tests, or - to put it another way - we need tests with less granular units. That's a lot of work, and in the end it's a not-so-obvious way of introducing duplication: we test every part of a scenario and then the whole scenario itself.

Then how about being less strict with the definition of a unit and redefining it so that it's not only a class method anymore? Unfortunately, for now I can only think of a rather vague definition of a unit: a closed set of code that makes sense in isolation.

If we have some kind of framework class (date handling, i18n, general serialization), then it makes a perfectly fine unit of test. On the other hand, there might be a lot of technical code that is there just to support a set of web page interaction scenarios - in that case the whole page seems like a better choice for the unit of test. But again, handling the negative cases (the different paths of failure) is easier when the units are as small as possible.

Varying the size of a test unit is important for maintaining good quality of the tests: bigger units cover more production code while requiring less test code; smaller units make sense for widely used types and are useful for pinpointing failure paths.

Tuesday, 20 November 2012

Abusing Python's with statement

Python's with statement makes it easier to manage resources:
import io

with io.open("file.txt") as f:
    data = f.read()  # do something with the file
# f.close() has been called automatically at this point
To make an object work with the with statement, it has to be a context manager, which means it has to have both an __enter__() and an __exit__() method. I found the with statement useful for scripts that need to perform several tasks:
with step('rebuilding application'):
 
    step('cleaning output...')
    # clean the output
 
    with step('building...'):
     
        with step('copying files...'):

            step('copy the files')
            # copy the files
            
            step('extra check')
            # do the extra check
 
        step('postprocessing...')
        # do postprocessing
        
    step('removing temp files')
    # remove temp files
The output would look like this:
rebuilding application
    cleaning output...
    building...
        copying files...
            copy the files
            extra check
        postprocessing...
    removing temp files
To make it work, we just need a proper context manager:
import traceback    # for printing the traceback

class step:
    indent_size = 0

    def __init__(self, text):
        print(' ' * step.indent_size + text)

    def __enter__(self):
        step.indent_size += 4

    def __exit__(self, exc_type, exc_val, exc_tb):
        step.indent_size -= 4
        
        if exc_val is not None:
            # handle the exception
            print(''.join(traceback.format_exception(exc_type, exc_val, exc_tb)))

        # suppress the exception?
        return True
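The return value of __exit__() is what decides whether an exception escapes the with block. Here is a minimal sketch of that rule; the Guard class is a hypothetical helper written only for this demonstration.

```python
class Guard:
    """Hypothetical context manager that records what __exit__ receives."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.seen = None

    def __enter__(self):
        return self  # bound to the name after `as`

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.seen = exc_type  # None when the block finished normally
        return self.suppress  # a true value swallows the exception


with Guard(suppress=True) as quiet:
    raise ValueError("swallowed")
# Execution continues here because __exit__ returned True.

propagated = False
try:
    with Guard(suppress=False) as loud:
        raise ValueError("propagated")
except ValueError:
    propagated = True
```

So the step class above, which always returns True, swallows every exception after printing its traceback.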
But what if, based on some condition, we would like to skip further processing of the current step? The with statement provides no flow control: break and continue cannot be used here. And what if the user wants to exit more than just the innermost step?
with step('rebuilding application'):
 
    step('cleaning output...')
    # clean the output
 
    with step('building...'):
     
        with step('copying files...'):

            step('copy the files')
            # copy the files
            
            # check some condition
            skip_check = True
            if skip_check:
                # exit two levels:
                # don't perform the extra check
                # and don't do the postprocessing
                step_exit(2)
                
            step('extra check')
            # do the extra check
 
        step('postprocessing...')
        # do postprocessing
        
    step('removing temp files')
    # remove temp files
Short of returning from an enclosing function, the only way to exit a with statement early is to raise an exception. Let's do just that - here are the required changes to achieve the expected result:
import traceback    # for printing the traceback
import os           # for the early exit

# the exception to be thrown to early-exit the with statement
class StepExitException(BaseException):
    def __init__(self, levels=1):
        # keep track of which level we're at
        self.level = levels

class step:
    indent_size = 0

    def __init__(self, text):
        print(' ' * step.indent_size + text)

    def __enter__(self):
        step.indent_size += 4

    def __exit__(self, exc_type, exc_val, exc_tb):
        step.indent_size -= 4
        
        if exc_val is None:
            return True
            
        if exc_type is not StepExitException:
            print(''.join(traceback.format_exception(exc_type, exc_val, exc_tb)))
            # early exit without throwing exception, warning: does not close handles
            os._exit(1)
        
        # only need to exit the current with statement
        if exc_val.level == 1:
            # swallow the exception
            return True
            
        # propagate the exception one level up
        step_exit(exc_val.level - 1)

# to be called to early exit the with statement
def step_exit(levels=1):
    raise StepExitException(levels)
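For readers who want to check the multi-level exit without watching the console, here is a compact variation written for Python 3. It is a sketch, not the code above: the Step and StepExit names are made up, lines are collected in a list instead of being printed, the level counter is decremented on the same exception object instead of raising a new one, and unrelated errors simply propagate.

```python
class StepExit(BaseException):
    def __init__(self, levels=1):
        self.levels = levels


class Step:
    log = []      # collected output lines (used instead of print)
    depth = 0

    def __init__(self, text):
        Step.log.append("    " * Step.depth + text)

    def __enter__(self):
        Step.depth += 1

    def __exit__(self, exc_type, exc_val, exc_tb):
        Step.depth -= 1
        if isinstance(exc_val, StepExit):
            if exc_val.levels > 1:
                exc_val.levels -= 1
                return False   # let the same exception reach the outer step
            return True        # last level: swallow it
        return False           # normal exit, or an unrelated error propagates


with Step("rebuilding application"):
    with Step("building..."):
        with Step("copying files..."):
            Step("copy the files")
            raise StepExit(2)
            Step("extra check")       # skipped
        Step("postprocessing...")     # skipped
    Step("removing temp files")
```

After the run, Step.log holds the first four lines of the expected output followed directly by the indented "removing temp files" line.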

Wednesday, 17 October 2012

The Factory pattern in JavaScript

I'd like to present a small JavaScript library that implements the Factory pattern. The code is free and can be accessed on GitHub.

The idea behind it is to let you dynamically configure which objects are going to be created. This is mostly useful for mocking (or stubbing) the troublesome parts (e.g. RESTful HTTP requests). But due to the way it works, it also makes it possible to achieve significant unit isolation for testing purposes.

Let's take a look at how it works. I'll paste some bits of the code and the spec written using Jasmine.

The library defines a globally accessible function:

var $factory = function (scope) {
    // ...
};

The function creates a factory that will include all the constructors defined in the specified scope. Inspired by Angular.js, it only considers functions whose names start with the '$' character:

it('recognizes all constructor functions within the specified scope that start with the $ sign', function () {
    var scope = {
            $service: function () {
            },
            $other: { },
            prop: { }
        },
        $f = $factory(scope);
    expect($f.constructorCount).toBe(1);
});
it('ignores jQuery', function () {
    var scope = {
            $: function () {
            }
        },
        $f = $factory(scope);
    expect($f.constructorCount).toBe(0);
});

All the constructor functions are introduced as properties of the factory:

it('introduces all constructor functions into the scope of the factory', function () {
    var scope = {
            $service: function () {
            }
        },
        $f = $factory(scope);
    expect($f.service).toBe(scope.$service);
});

So now whenever you need to create an object, use the factory instead of using the constructor function directly:
var $service = function (x) {
    // ...
};

// instead of this:
// var service = $service(100);

// use this:
var $f = $factory(this); // or $factory(window);
var service = $f.service(100);

Most of the time the objects have dependencies that they construct and use. In the simplest form of Dependency Injection, the dependencies are passed as the parameters of a constructor. Here, the object can easily just create the objects it needs. To avoid playing around with global objects, the constructor function can receive the factory as the last argument which is automatically injected:

it('injects the factory instance as the last parameter of constructor functions', function () {
    var scope = {
            $service: function (prop, $factory) {
                return {
                    prop: prop,
                    factory: $factory
                };
            }
        },
        $f = $factory(scope),
        service = $f.service(1);
    expect(service.prop).toBe(1);
    expect(service.factory).toBe($f);
});

The real use case would be a view object that uses a repository that uses an abstraction of XHR:

var $http = function () {
    return {
        get: function (url, query) {
            // issue a GET request to the specified url
        }
    };
};
var $itemRepo = function ($factory) {
    var http = $factory.http();
    return {
        getItem: function (query) {
            // use http to send a specific RESTful request
        }
    };
};
var $itemView = function (name, $factory) {
    var itemRepo = $factory.itemRepo();
    return {
        item: itemRepo.getItem(name)
    };
};

var $f = $factory(this);
var itemView = $f.itemView('my_item');

For testing purposes, when we want to test the $itemView, we want to replace the $itemRepo. This can easily be done in a setup, by preparing a proper factory:

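var factory = {
    // $itemView calls $factory.itemRepo(), so the stub must be a function
    itemRepo: function () {
        return {
            // stubbed $itemRepo
            getItem: function (query) { /* return a canned item */ }
        };
    }
};
var itemView = $itemView('my_item', factory);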

The same effect can be achieved by constructing a proper scope, but this solution is cleaner and simpler.
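The same idea can be sketched in Python. This is a rough analogue under my own assumptions, not a port of the library: the Factory class, the "new_" prefix (standing in for '$') and all the item names are hypothetical, and unlike the library it wraps each constructor so that the factory is appended as the last argument.

```python
from types import SimpleNamespace


class Factory:
    """Hypothetical analogue: exposes every callable in `scope` whose name
    starts with `prefix`, and passes itself as the last argument."""

    def __init__(self, scope, prefix="new_"):
        self.constructor_count = 0
        for name, value in scope.items():
            short = name[len(prefix):]
            if name.startswith(prefix) and short and callable(value):
                setattr(self, short, self._bind(value))
                self.constructor_count += 1

    def _bind(self, constructor):
        def create(*args):
            return constructor(*args, self)  # factory injected last
        return create


def new_item_repo(factory):
    # A real repository would ask the factory for an HTTP abstraction here.
    return SimpleNamespace(get_item=lambda name: {"name": name, "source": "http"})


def new_item_view(name, factory):
    repo = factory.item_repo()
    return SimpleNamespace(item=repo.get_item(name))


# Production wiring: the factory finds both constructors in the scope.
factory = Factory({"new_item_repo": new_item_repo,
                   "new_item_view": new_item_view,
                   "prop": 1})
view = factory.item_view("my_item")

# Test wiring: a hand-made factory replaces the repository with a stub.
stub_repo = SimpleNamespace(get_item=lambda name: {"name": name, "source": "stub"})
stub_factory = SimpleNamespace(item_repo=lambda: stub_repo)
stub_view = new_item_view("my_item", stub_factory)
```

The view never learns where its repository came from, which is exactly what makes the stubbed setup possible.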

And that's it!

But what is it that we've actually created? Is this an Abstract Factory? Let's quote the book:
"Provide an interface for creating families of related or dependent objects without specifying their concrete classes"
Well, our factory adheres to the definition. The last part is debatable, because on the one hand there is no concept of concrete classes in JavaScript and on the other hand we need to explicitly say which constructor function we want to call.

So here's where the definition of the Factory pattern is very useful. In the context of what I wrote in a previous post, this piece of code is indeed a Factory, because it's full of Creation Methods. It uses the JavaScript concept of constructor functions and allows you to dynamically configure which objects are going to be actually constructed.

Saturday, 13 October 2012

On Creation Method and Factory patterns

There's a really good book by Joshua Kerievsky titled "Refactoring to Patterns". It contains a catalog of refactorings that take a design to, towards, or away from a pattern. I'd like to quote two neat ideas from the book.

The first one is the Creation Method [RtP]:
a method that creates instances of a class
The method can be either static or nonstatic. Also, the method can return an instance of either interface/abstract class or concrete class. It follows from the definition that every Factory Method [DP] is a Creation Method, but not always the other way round.

The Factory Method pattern allows encapsulating the changing concrete type that needs to be instantiated. But what is the use of a Creation Method?

The Creation Method is useful when there are many constructors with a considerable number of parameters. The complexity of instantiation can be easily encapsulated:
class Task
{
    // last two arguments are optional
    public Task(
            string name,
            TimeSpan plannedTime,
            TimeSpan? actualTime,
            TeamMember owner)
    {
        // ...
    }

    public static Task UnassignedTask(string name, int plannedHours)
    {
        return new Task(name, new TimeSpan(plannedHours, 0, 0), null, null);
    }
}
As there are no limitations on naming the Creation Methods, they can better reveal the programmer's intention:
class Point
{
    public static Point Zero
    {
        get { return new Point(0, 0); }
    }
}
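For comparison, a Creation Method looks much the same in Python, where a classmethod plays the role of the static method. This is only a sketch: the field names mirror the C# Task above, while the owner type and the unassigned name are my own choices.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Task:
    name: str
    planned_time: timedelta
    actual_time: Optional[timedelta] = None
    owner: Optional[str] = None

    @classmethod
    def unassigned(cls, name, planned_hours):
        """Creation Method: names the intent and hides the defaults."""
        return cls(name, timedelta(hours=planned_hours))


task = Task.unassigned("write spec", planned_hours=3)
```

The call site now says what kind of task is being created instead of listing nulls.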

The second pattern is the Factory [RtP]:
a class that implements one or more Creation Methods
Again, it follows from the definition that every Abstract Factory [DP] is a Factory, but not always the other way round.

The idea behind an Abstract Factory is to be able to build a family of product objects (concrete classes that implement a set of interfaces/extend abstract classes). Additionally, the Abstract Factory can be substituted at runtime. On the other hand, a Factory is merely a class that has any sort of method that creates new instances. So what's the point of using a Factory?

Well, as the Factory is a place to accommodate all the Creation Methods, all their benefits apply. So whenever you feel like you're doing Shotgun Surgery [R] (having to modify lots of places to introduce even a small change) and the change has to do with object creation, introducing a Factory is a sure way to improve the quality of the code.

But there's another interesting way to utilise Factory - to encapsulate concrete classes:
abstract class SqlType
{
    public abstract void SetValue(object value);

    public static SqlType Varchar(int length) { /*...*/ }
    public static SqlType Number(int precision, int scale) { /*...*/ }
}
The concrete types can be hidden from the user and moved out of the public API.
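A Python sketch of the same trick, assuming a hypothetical render() method in place of SetValue: the concrete classes carry a leading underscore, so callers only ever see SqlType and its Creation Methods.

```python
from abc import ABC, abstractmethod


class SqlType(ABC):
    @abstractmethod
    def render(self):
        """Return the SQL spelling of the type."""

    @staticmethod
    def varchar(length):
        return _Varchar(length)

    @staticmethod
    def number(precision, scale):
        return _Number(precision, scale)


class _Varchar(SqlType):   # leading underscore: not part of the public API
    def __init__(self, length):
        self.length = length

    def render(self):
        return "VARCHAR(%d)" % self.length


class _Number(SqlType):
    def __init__(self, precision, scale):
        self.precision, self.scale = precision, scale

    def render(self):
        return "NUMBER(%d, %d)" % (self.precision, self.scale)


column_type = SqlType.varchar(20)
```

Adding or renaming a concrete type later does not touch any calling code, as long as the Creation Methods stay in place.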


Creation Method and Factory stand on their own, but in the end they can be just a stop on the way towards Factory Method and Abstract Factory.


[RtP] "Refactoring to Patterns", Joshua Kerievsky
[DP] "Design Patterns", Gamma, et al.
[R] "Refactoring", Martin Fowler

Friday, 12 October 2012

JavaScript code coverage with Saga

If you're looking for a code coverage tool for JavaScript, here's one: Saga by Timur Strekalov. The tool can produce some very visually appealing coverage reports:
Saga's code coverage report (taken from the official website)

How does it work? It uses a browser emulator written in Java (HtmlUnit, which provides the browser API) to run the unit tests on instrumented code and then produce the coverage report. Saga was meant to be run as part of Continuous Integration, but I'll show you how to use it manually for development.

Saga supports all testing frameworks, because it just runs HTML files to gather statistics. I'll be using Jasmine in the example.

1. Download the Saga command-line package (saga-cli-1.3.0 is used below).
2. Grab the Jasmine standalone package. Luckily, this comes preloaded with sample tests.
3. Extract the files so that you have the following structure:
/
/ js
    / lib
    / spec
    / runner
    / SpecRunner.html
/ saga-cli-1.3.0
    / saga-cli-1.3.0-jar-with-dependencies.jar
    / etc.
4. Create the coverage report
From your root directory run the following command:
java -jar saga-cli-1.3.0\saga-cli-1.3.0-jar-with-dependencies.jar -b js -i *.html -o coverage -n .+jasmine.+

Here's the rundown of the important options:
  • -b: specifies the base directory to scan for test runners (here: run everything in the js folder)
  • -i: specifies the path pattern for unit tests (run all *.html files; you might want to later narrow it to *Tests.html or similar)
  • -o: specifies the output directory for the coverage report (warning: the files will be silently overwritten)
  • -n: specifies an additional file pattern for files that should not be instrumented and therefore not shown in the report (here I exclude Jasmine's files)
The coverage report will be created in the /coverage directory. And that's all!

It would be nice to actually see the test results and I hope that Saga implements that kind of feature in future releases.

Monday, 8 October 2012

The astounding recurring patterns in software development evolution

It's interesting to see history repeat itself. But let's start from the beginning.

There are people who set milestones in software development. One of them is Dennis Ritchie, the creator of the C programming language. The language was not a revolution, but an evolution and compilation of ideas that appeared in prior languages. The language's success stems from the abstraction it introduces. On the one hand, it is far enough from assembler that it significantly speeds up and eases software development. On the other hand, it is close enough to machine language that there is hardly any computer architecture without a C compiler.

Fast forward ten years and we meet another important person - Bjarne Stroustrup, a Danish computer scientist. After working on analysing the UNIX kernel he came to the conclusion that - for practical reasons - it would be beneficial to introduce object-oriented concepts to C. What was first known as "C with classes" became one of the most widely used programming languages. Although the language now has its own unique features, C++ was - just as the name says - meant to be the next increment of C.

A few years later the same ideas were applied to another important language, Pascal. This involved some big names like Niklaus Wirth and Apple. But the chief architect of the success was Anders Hejlsberg. While working at Borland he led the creation of Delphi, the best Rapid Application Development tool. It was badly beating Visual Basic, Microsoft's own product. But the guys from Redmond were always good at doing business, and after a few magic tricks Hejlsberg was at Microsoft, playing a key role in the development of C# and the .NET Framework, which were meant to stop Java from flooding the Windows ecosystem.

In the meantime, the web was getting smarter by means of DHTML and later AJAX. But it wouldn't have been so if it weren't for Brendan Eich's ten-day effort to create a programming language for the web. As the big names were battling for market share, this little language became the standard for dynamic web sites. Even the language's name was influenced by what was going on between the companies. And thus JavaScript was born.

Fast forward to around 2010 and we find people wanting rich, interactive applications not only for mobile devices, but for the whole web experience. To answer those demands, developers dig deeper into JavaScript and create tons of different libraries and tools, but ultimately find JavaScript's shortcomings to have a significant impact. For that reason, many cross-compilers emerge for languages that are popular in other domains: Java, C#, Python and others. It's no surprise that solutions like CoffeeScript and Dart gain a lot of attention and early adopters. But they fail to appeal to those who need to support legacy platforms (e.g. the old versions of IE).

And we're back at what has happened just recently. The man I mentioned before, Anders Hejlsberg, who happens to be a Dane, took the most popular language (of the web) and created its next increment, TypeScript. To me, this looks exactly like the ++ of C++ (and the ++ of C# if you will). The small increments that try to answer the technical challenges developers face every day are what made C++ work. I hope I'm right and the TypeScript project will be as successful.