Scheduling

This page demonstrates how to construct and optimize a multistage pipeline. We'll write the two-stage pipeline that blurs an image similar to the one on the front page.

Dump the following code into the file halide_blur.cpp:

// We'll start it off by including Halide and some other things we'll need.
#include "halide/Halide.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
    
using namespace Halide;
    
int main(int argc, char **argv) {
        
    // First we'll need to acquire an image to blur. If you have image reading/writing code lying around you can use that. You can also grab our png-loading wrapper from http://github.com/halide/halide/tree/master/apps/png.h. For the code on this page, we'll just create a 16-bit image filled with random noise:

    Image<uint16_t> input(1024, 768);

    srand(time(NULL));
    for (int y = 0; y < input.height(); y++) {
        for (int x = 0; x < input.width(); x++) {
            input(x, y) = rand() % 1024;
        }
    }

    // We didn't use the full 16-bit range, and limited ourselves to numbers between 0 and 1023, because we don't want to have to worry about overflow for this example.

    // Now we define a two-stage Halide pipeline that blurs this image:
    Func blur_x, blur_y;
    Var x, y;
    
    blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y))/3;
    blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1))/3;

    // Now we can JIT-compile and evaluate our pipeline like so:
    Image<uint16_t> output = blur_y.realize(1024, 768);

    // For now, we'll do nothing with the returned image, so our main function ends here:   
    return 0;
}

Ok, let's compile and run this:

$ g++-mp-4.7 -std=c++0x halide_blur.cpp -Lhalide -lHalide -o halide_blur

$ ./halide_blur
Error: Function may load image .i0 out of bounds

Halide refused to run the pipeline, because it detected the input image (which has been given the internal name i0) will be read out of bounds. If you consider the blur functions we have defined, you'll see why. Halide infers that computing a 1024x768 window of blur_y requires a slightly taller window of blur_x (1024x770), which in turn requires a slightly wider window of the input (1026x770). This is larger than the input image. Rather than trigger a segmentation fault, Halide checks this before it runs the pipeline.

Halide functions (Funcs) do not have boundaries. They are defined over an infinite integer domain. This means that Halide is free to evaluate them over any range. Input images, however, are bounded.

What we need to do now, is wrap the input image in a Halide function that defines a boundary condition for it:

Func clamped;
clamped(x, y) = input(clamp(x, 0, input.width()-1), clamp(y, 0, input.height()-1));

Now instead of using input in the definition of blur_x, we'll use clamped. clamped is a Func, so it can be evaluated over any domain. Furthermore, Halide can detect that clamped never loads the input image out of bounds, so it won't issue an error. We should also add a check to make sure the output is what we expected. The modified code looks like this:

#include "halide/Halide.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
    
using namespace Halide;
    
int main(int argc, char **argv) {

    Image<uint16_t> input(1024, 768);

    srand(time(NULL));
    for (int y = 0; y < input.height(); y++) {
        for (int x = 0; x < input.width(); x++) {
            input(x, y) = rand() % 1024;
        }
    }

    // We define a three-stage Halide pipeline that adds a boundary condition and then blurs this image:
    Func blur_x, blur_y, clamped;
    Var x, y;
    
    clamped(x, y) = input(clamp(x, 0, input.width()-1), clamp(y, 0, input.height()-1));
    blur_x(x, y) = (clamped(x-1, y) + clamped(x, y) + clamped(x+1, y))/3;
    blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1))/3;

    Image<uint16_t> output = blur_y.realize(1024, 768);

    // While we're at it, let's add a check to make sure the output is
    // what we expect by also writing the algorithm in C
    for (int y = 0; y < input.height(); y++) {
        for (int x = 0; x < input.width(); x++) {
            int left = std::max(0, x-1);
            int right = std::min(input.width()-1, x+1);
            int up = std::max(0, y-1);
            int down = std::min(input.height()-1, y+1);
            
            int16_t blur_x_up = (input(left, up) + input(x, up) + input(right, up))/3;
            int16_t blur_x_here = (input(left, y) + input(x, y) + input(right, y))/3;
            int16_t blur_x_down = (input(left, down) + input(x, down) + input(right, down))/3;
            int16_t blur_y = (blur_x_up + blur_x_here + blur_x_down)/3;
            
            if (blur_y != output(x, y)) {
                printf("Output at %d %d was %d instead of %d\n", 
                       x, y, output(x, y), blur_y);
                return -1;
            }
        }
    }
    
    printf("Success!\n");

    return 0;
}

The check amounted to rewriting the algorithm in C. Notice how much longer it is!

Let's recompile and run:

$ g++-mp-4.7 -std=c++0x halide_blur.cpp -Lhalide -lHalide -o halide_blur

$ ./halide_blur
Success!

That's better! Now let's make it fast.

... coming soon ...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scheduling

Clone this wiki locally