Making of lftextures

December 26, 2009 at 20:08
filed under Blog

Prologue: The Need
Dan Reetz thought to himself, “I have this great idea for a large camera, but I need to have the tools to use the camera before I build it.” Dan had been collecting the cameras and other parts for a while before I’d met him. Once he told me about his idea, we decided that the project was within reach and went for it.

Our plan was to design a program that could display the additional information captured by such a camera, and to do so we needed some data. So we built a “slide-type” camera and made a set of overlapping images of a few different scenes for an initial dataset.
Dan's Original Slide Camera

Step 1: The Method
At the workstation, we fired up Photoshop. Dan’s intuition was to pick the object we wanted in focus, align all the images so that feature was centered in each one, and then take the mean of all the images. Success! We’d refocused the scene! We also experimented with the mode, median, etc., and found the mean and median to look the best.
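
In pixel terms that’s just a shift-and-average. Here’s a minimal sketch of the idea (not the actual lftextures code); the Image struct, its fields, and the per-image offsets are made up for illustration, and it assumes the offsets already center the chosen feature.

```cpp
// Minimal sketch of shift-and-average refocusing (illustration only).
// Assumes at least one image and that dx/dy center the chosen feature.
#include <algorithm>
#include <vector>

struct Image {
    int width, height;
    std::vector<float> rgb;   // width * height * 3 floats
    int dx, dy;               // shift that centers the chosen feature in this shot
};

Image refocus(const std::vector<Image>& shots)
{
    Image out = shots.front();            // borrow the dimensions of the first shot
    out.dx = out.dy = 0;
    std::fill(out.rgb.begin(), out.rgb.end(), 0.0f);

    for (const Image& s : shots) {
        for (int y = 0; y < out.height; ++y) {
            for (int x = 0; x < out.width; ++x) {
                int sx = x + s.dx, sy = y + s.dy;   // sample the shifted position
                if (sx < 0 || sy < 0 || sx >= s.width || sy >= s.height)
                    continue;                        // off this shot: contributes nothing
                for (int c = 0; c < 3; ++c)
                    out.rgb[(y * out.width + x) * 3 + c] +=
                        s.rgb[(sy * s.width + sx) * 3 + c] / shots.size();
            }
        }
    }
    return out;                                      // mean of the aligned images
}
```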

Step 2: Emulation
My task now was to perform the same steps in a manner that was repeatable on any suitable dataset.

First our requirements:

  • We need to display 12 or so images at once
  • We need the user to have interactive control

These requirements led me to choose a graphical language of some sort. I decided to use OpenGL (used in 3D games and applications) and the GLUT toolkit (which adds keyboard, mouse, and window control to OpenGL). I chose them for two reasons: cross-platform compatibility and familiarity.
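
For context, a bare-bones GLUT program looks roughly like this (a sketch, not the original lftextures source): one double-buffered window, a display callback for the drawing, and a keyboard callback for the interactive control we needed.

```cpp
// Minimal GLUT skeleton (sketch): window + display callback + keyboard input.
#include <GL/glut.h>
#include <cstdlib>

static void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    // ...draw the textured image quads here...
    glutSwapBuffers();
}

static void keyboard(unsigned char key, int /*x*/, int /*y*/)
{
    if (key == 27)              // Esc quits
        std::exit(0);
    // other keys would step through images, nudge offsets, etc.
    glutPostRedisplay();
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize(800, 600);
    glutCreateWindow("lftextures sketch");
    glutDisplayFunc(display);
    glutKeyboardFunc(keyboard);
    glutMainLoop();
    return 0;
}
```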

Using OpenGL allowed me to take advantage of blending modes and per-object transparency to perform the “mean” of the images without an intermediate processing step.
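
Here’s a sketch of that blending trick under fixed-function OpenGL. As the comments below work out, drawing the n-th aligned image at opacity 1/n under standard alpha blending leaves each of the N images contributing exactly 1/N of the framebuffer. The texture IDs, per-image offsets, and quad size are assumed to be set up elsewhere.

```cpp
// Sketch (not the original source) of averaging through blending: the n-th
// aligned quad is drawn at alpha = 1/n, so every image ends up contributing
// 1/N of the final framebuffer -- the mean -- with no intermediate pass.
#include <GL/glut.h>

void drawRefocused(int numImages, const GLuint* textures,
                   const float* dx, const float* dy, float w, float h)
{
    glEnable(GL_TEXTURE_2D);   // the rectangle-texture variant below works the same way
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    for (int n = 1; n <= numImages; ++n) {
        glBindTexture(GL_TEXTURE_2D, textures[n - 1]);
        glColor4f(1.0f, 1.0f, 1.0f, 1.0f / n);    // opacity = 1/n for the n-th image
        glPushMatrix();
        glTranslatef(dx[n - 1], dy[n - 1], 0.0f); // this image's alignment offset
        glBegin(GL_QUADS);
            glTexCoord2f(0, 0); glVertex2f(0, 0);
            glTexCoord2f(1, 0); glVertex2f(w, 0);
            glTexCoord2f(1, 1); glVertex2f(w, h);
            glTexCoord2f(0, 1); glVertex2f(0, h);
        glEnd();
        glPopMatrix();
    }

    glDisable(GL_BLEND);
    glDisable(GL_TEXTURE_2D);
}
```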

This introduced a problem: I wanted to make big rectangles and display (texture) the images on them for speed and efficiency, but OpenGL expects texture dimensions to be powers of two, meaning we couldn’t use any input image we wanted without preprocessing it first. (I actually built an initial attempt that sampled the image and made thousands of little “pixel” rectangles, but that method was very slow and memory intensive, using over 2 GB of RAM.)

Of course, others had wanted to texture arbitrary images as well, and the problem had been solved quite a long time ago when graphics cards began to support the GL_TEXTURE_RECTANGLE_ARB extension, which, among other things, allows textures of any dimensions.
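
Roughly, uploading and drawing an arbitrarily sized image as a rectangle texture looks like this (a sketch; the pixel data is assumed to come from whatever image loader is in use). Note that rectangle textures take non-normalized coordinates, 0..width and 0..height, instead of 0..1; depending on the platform the ARB constants may come from <GL/glext.h> or an extension loader.

```cpp
// Sketch: non-power-of-two image as a rectangle texture, then drawn on a quad.
#include <GL/glut.h>
#include <GL/glext.h>

GLuint makeRectTexture(int width, int height, const unsigned char* rgb)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, tex);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   // rows may not be 4-byte aligned for odd widths
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB,
                 width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, rgb);
    return tex;
}

void drawImageQuad(GLuint tex, float w, float h)
{
    glEnable(GL_TEXTURE_RECTANGLE_ARB);
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, tex);
    glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(0, 0);   // texture coords in pixels, not 0..1
        glTexCoord2f(w, 0); glVertex2f(w, 0);
        glTexCoord2f(w, h); glVertex2f(w, h);
        glTexCoord2f(0, h); glVertex2f(0, h);
    glEnd();
    glDisable(GL_TEXTURE_RECTANGLE_ARB);
}
```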

At this point:

  • We could perform our target procedure
  • We had reduced the CPU and memory footprint

All that remained was to implement the steps to calculate the refocused scene. By a stroke of luck our initial attempt worked very well; at first we didn’t realize that its success was due to the careful manner in which we had taken the pictures.

Step 3: Fire-fighting
We now had a working program, but it wouldn’t perform correctly on some of our data, confusing and frustrating us.
Images before alignment

It didn’t take us long to root out the problem; once we had the data open in a viewer we noticed that some datasets were aligned very well while others were not. We quickly did a manual alignment in Photoshop on one of the problem datasets, et voilà!
Images after alignment

Epilogue: Onward
The core of the application hasn’t changed since. Manually aligning each dataset is a major pain, however, so I’ve begun work on a rewrite that aligns the images in the program instead of relying on the user to do so beforehand.

Now that we’d completed our tool, we could see what our plenoptic camera was seeing, and we had a flurry of questions to answer:

  • Why is the plane of focus so shallow? (1-2 cm!?)
  • Can we retain sharpness on the edges of our output?
  • How can we improve the resolution of our compositions?
  • What other sorts of operations can we perform?

And additionally:

  • How many of our limitations are caused by our software?
  • Are some of our problems caused by our hardware?

So we march on.

4 comments

  1. KARILUOMA » Sidetrack: Edge-finding

    on March 2, 2010 at 19:25

    [...] a model of the Large Camera Array Dan and I have been using. Our intention is to have the cameras lined up so that each of the [...]

  2. Mike Warot

    on April 12, 2010 at 15:16

    The focus is so shallow because you effectively have a HUGE aperture. I made the same mistake the first time I tried to do virtual focus as well.

    You can retain sharpness in the edges, if your resampling / remapping algorithms don’t degrade the image too much along the way.

    You might be able to super-sample the output, but you’d have to understand more math than I can cope with right now. The basic theory is to consider each image as frequency limited, and to randomly pick tweaks to a higher resolution version of the image, seeking the tweaks that best result in the pixel outputs you actually have. I STRONGLY suspect that you’d have to work in RAW format to get really useful results.

    I think your software is a good start…. I’m using HUGIN for something that it was never designed for… and it does a bang up good job as well. Of course manually putting in control points isn’t fun, but I find it meditative.

    Keep up the good work.

  3. matti

    on April 12, 2010 at 15:45

    @Mike Yeah, I will admit the resampling algorithm I use currently has zero thought in it.

    I was aiming for a mean or median of the pixel values at any given point, but instead I got lazy and used some old code that does opacity = 1/n, where n is the position of the image in the array. This means the 12th picture only contributes 1/12 to the final output, far, far away from a mean or a median operation.

    It hasn’t bothered me enough to change it; for displayable results I’ve always just used Photoshop/Gimp to do a true mean/median operation.

  4. Mike Warot

    on April 13, 2010 at 08:51

    Matti… my algorithm to do the mean takes 1/N of each new image, and (N-1)/N of the previous average. It sounds like we’re doing it THE SAME WAY. If you do the math long hand, you’ll be delighted to learn that this results in each image contributing exactly 1/N to the final product. 8)