Sidetrack: Edge-finding and Line Representations

So, you want to take two images, slightly offset in space from one another, and combine them into a larger image.

This is an old problem, and it seems “solved” if you consider the Photoshop plugins and similar tools that produce such panoramas. Most people don’t realize that these tools are less than perfect; for instance, the final product can be greatly improved by removing the lens distortion from each image before attempting a panoramic merge.
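To make that concrete, here is a minimal sketch (in OpenCV-flavoured Python) of undistorting a frame before any merging. The filename, camera matrix, and distortion coefficients below are placeholders; in practice they would come from a calibration step (e.g. cv2.calibrateCamera with a checkerboard) or a lens database.

```python
import cv2
import numpy as np

# Sketch only: undo lens distortion before any panoramic merge.
# The camera matrix and distortion coefficients are placeholders.
img = cv2.imread("frame_00.jpg")
h, w = img.shape[:2]

camera_matrix = np.array([[1000.0, 0.0, w / 2.0],
                          [0.0, 1000.0, h / 2.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.12, 0.03, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)
cv2.imwrite("frame_00_undistorted.jpg", undistorted)
```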

But we don’t want to make a panorama with our images; we just want to perform all the steps (correctly) up to the point where we’d try to combine them, i.e. apply the translations/rotations/scaling that make the images lie on the same plane, so that the only difference between images is an offset within that plane (x and y offsets only, no z offset, no rotations in space, etc.).
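For illustration only, here is roughly what “putting two images on the same plane” looks like in code, assuming we somehow already have a handful of matched points between the two frames. The point lists and filenames are invented; after the warp, the two images should differ only by an in-plane offset.

```python
import cv2
import numpy as np

# Matched point pairs between image A and image B (however they were obtained).
pts_a = np.float32([[102, 54], [410, 60], [398, 300], [95, 310]])
pts_b = np.float32([[110, 70], [420, 80], [405, 318], [100, 325]])

# Estimate the planar transform mapping B onto A's plane, then warp B.
H, mask = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 3.0)

img_b = cv2.imread("frame_01.jpg")
h, w = img_b.shape[:2]
aligned_b = cv2.warpPerspective(img_b, H, (w, h))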

Consider a model of the Large Camera Array Dan and I have been using. Our intention is to line the cameras up so that every camera’s sensor lies in the same plane, so that the only information added by each adjacent camera is an image x units to the left. In reality this is impractical: each camera is slightly offset from the others, some even have tilted lenses that make them look slightly down, and so on, as we’ve observed in the data we’ve collected.

In order to find the transformations that will rotate/scale/translate the images so that they all lie in the same plane, we must make a set of assumptions about our problem to have any hope of implementing a solution. (The alternative, trying all possible rotations about all three axes of freedom until the images are planar, is an algorithm that would take prohibitively long [years] to complete.)

We can assume that all such rotations take place about (have their origin at) the center of the image, since our images were taken with a camera and lens system. We can also assume maximum displacements along the x, y, and z axes, based on measurements of our physical system. We could additionally put maximum bounds on our rotations, but we may not want to, in order to allow for flexibility in camera orientations (say, mixes of landscape and portrait).
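As a sketch, those assumptions amount to a small, bounded search space: a rotation about the image center, a scale factor, and an x/y offset. The bounds below are made-up placeholders, not measurements from our rig.

```python
import cv2

# Assumed, bounded search space; the numbers are placeholders.
MAX_ANGLE_DEG = 5.0        # cameras assumed tilted by at most a few degrees
SCALE_RANGE = (0.95, 1.05)
MAX_SHIFT_PX = 200

def apply_candidate(img, angle_deg, scale, dx, dy):
    """Rotate/scale about the image centre, then translate by (dx, dy)."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, scale)
    M[0, 2] += dx
    M[1, 2] += dy
    return cv2.warpAffine(img, M, (w, h))
```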

The goal all of this is working toward is to be able to calibrate an array of cameras, or a collection of images from a single camera moved through space between shots, so as to put their output on the same plane. In the case of an array of cameras, this calibration need only be computed once (or every once in a while) in order to better fit the model of being perfectly planar.

If such an algorithm is successful, it would enable a hastily constructed or ad-hoc array to provide useful Light Field representations, and improve the output for constructed arrays as well.

Currently, our lfmanip only concerns itself with translations in each image’s x,y plane (not the real-world x,y plane, mind you). If we knew the rotation and scaling between each pair of images, we could line up our images much more accurately.
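If we did have matched feature points between a pair of images, recovering that rotation and scale is not hard in principle; newer OpenCV builds will even estimate a rotation + uniform scale + translation directly. A hypothetical sketch (the point lists are invented):

```python
import cv2
import numpy as np

# Invented matched points between image A and image B.
pts_a = np.float32([[120, 80], [400, 95], [390, 310], [130, 300]])
pts_b = np.float32([[128, 92], [409, 110], [397, 326], [137, 315]])

# Recover a similarity transform (rotation, uniform scale, translation).
M, inliers = cv2.estimateAffinePartial2D(pts_b, pts_a)
angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
scale = np.hypot(M[0, 0], M[1, 0])
dx, dy = M[0, 2], M[1, 2]
print(f"rotation {angle:.2f} deg, scale {scale:.3f}, shift ({dx:.1f}, {dy:.1f})")
```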

To tackle this problem, I’ve decided to attempt to reduce each image to a line representation, a vector image. Given each image as a set of lines, it will be considerably easier to compare two sets of lines and decide which transformations make them most similar.
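One baseline way to get such a line representation, purely as a sketch, is edge detection followed by a probabilistic Hough transform; the thresholds below are placeholders that would need tuning per image set, and this is not necessarily the approach I’ll end up with.

```python
import cv2
import numpy as np

# Baseline line representation: Canny edges, then a probabilistic Hough
# transform. All thresholds are placeholders.
gray = cv2.cvtColor(cv2.imread("frame_00.jpg"), cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 1.5)

edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=40, maxLineGap=5)

# Each entry is (x1, y1, x2, y2); this set of segments is the "vector image".
segments = [] if lines is None else [tuple(l[0]) for l in lines]
```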

I’ll be graduating with my Bachelor of Arts in Mathematics/Computer Science this spring, so I’ve decided to tackle the problem of mathematically describing the “optimal” way to prepare an image for edge-detection for my capstone research project. The hope is that by working through the nit-picky parts of this process I’ll be able to extract a set of lines from a given image, particularly those lines that are most helpful in deciding how an image has been rotated/scaled/translated with respect to a previous image.

It’s likely that such a “best” approach does not exist or is computationally infeasible. Whatever the case, I’m optimistic that by slowly adding more assumptions about the problem I’ll be able to narrow the search space to the point where an algorithm can be written to do the heavy lifting.
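As a toy illustration of what that narrowed search might look like (and nothing more; this is not the capstone algorithm), here is a coarse grid search over a bounded rotation and scale, scoring candidates by a deliberately simple nearest-endpoint distance between the two line sets.

```python
import numpy as np

def endpoints(segments):
    """Collect all segment endpoints as an (N, 2) array."""
    pts = np.array(segments, dtype=float).reshape(-1, 4)
    return np.vstack([pts[:, :2], pts[:, 2:]])

def score(pts_a, pts_b, angle_deg, scale, centre):
    """Mean distance from transformed B endpoints to the nearest A endpoint."""
    t = np.radians(angle_deg)
    R = scale * np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    moved = (pts_b - centre) @ R.T + centre
    d = np.linalg.norm(moved[:, None, :] - pts_a[None, :, :], axis=2)
    return d.min(axis=1).mean()   # lower is better

def coarse_search(pts_a, pts_b, centre):
    """Grid-search a bounded rotation and scale; bounds are placeholders."""
    best = None
    for angle in np.arange(-5.0, 5.01, 0.5):
        for scale in np.arange(0.95, 1.051, 0.01):
            s = score(pts_a, pts_b, angle, scale, centre)
            if best is None or s < best[0]:
                best = (s, angle, scale)
    return best
```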

  1. Mike Warot

    on April 12, 2010 at 10:14

    I’ve been doing these things by hand for a few years, and have been pondering how to automate the process. I use a single camera, handheld, and take anywhere between 10 and 100 shots of the same object sequentially from different locations. No rules, no measurements, just plain old moving the camera around a bit. This means that I have NO basis for assuming anything about geometry.

    My way to put the images into focus is to pick a feature, by hand, in Hugin, and create a ring of control points that link that feature across all of the images. This then makes it easier to do a few more points; sometimes 2 is all you need, but more features help constrain the geometry better and give a better virtual focus.

    If you could come up with a way to pick a single point, and find it across all of the images, that gives a good anchor to just do xy offsets, but you still need to deal with rotation. A second point generally is all you need, as long as all of the photos are from the same distance away. (Good for far objects, not good enough for closer ones).
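    Roughly the kind of thing I mean, as a sketch in OpenCV-flavoured Python (the filenames, the chosen point, and the patch size are all made up):

    ```python
    import cv2

    # Cut a small patch around a user-chosen point in the first image and
    # template-match it in each other image. Everything below is a placeholder.
    files = ["shot_%02d.jpg" % i for i in range(10)]
    px, py, half = 320, 240, 24   # picked point and patch half-size

    ref = cv2.imread(files[0], cv2.IMREAD_GRAYSCALE)
    patch = ref[py - half:py + half, px - half:px + half]

    matches = []
    for f in files[1:]:
        img = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(img, patch, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(result)
        matches.append((f, max_loc[0] + half, max_loc[1] + half))
    ```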

    Once you have the ability to put a far object in focus, you could then pick a closer one and find the new XY offsets. If you keep the rotation information from the first attempt, you can save a lot of time and effort.

    It’s my THEORY that if you have the rotation factored out, you could then interpolate between the different XY offsets to put the focus at any desired distance, including past both near and far objects.
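    In code, the interpolation I’m imagining might look something like this sketch, assuming the per-image offsets for one anchor feature have already been measured:

    ```python
    import cv2
    import numpy as np

    # Scaling the measured (dx, dy) offsets by a single factor "alpha" and
    # averaging the shifted images moves the virtual focal plane.
    # alpha = 1 focuses on the anchor feature; other values focus nearer/farther.
    def refocus(paths, offsets, alpha):
        acc = None
        for path, (dx, dy) in zip(paths, offsets):
            img = cv2.imread(path).astype(np.float32)
            h, w = img.shape[:2]
            M = np.float32([[1, 0, alpha * dx], [0, 1, alpha * dy]])
            shifted = cv2.warpAffine(img, M, (w, h))
            acc = shifted if acc is None else acc + shifted
        return (acc / len(paths)).astype(np.uint8)
    ```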

    I could be wrong, and I know the geometry gets complicated, but I think that might offer some food for thought.

    I hope this has been helpful.

    –Mike–

  2. matti

    on April 12, 2010 at 12:16

    @Mike Thanks for the input! Sometimes it takes just a little interest to get old ideas moving again.

    The current system I’m nursing handles the feature points you mention: it locates them and aligns a set of images to one point, saving the x,y offsets to a file.

    As I say in this post, rotation isn’t as easy to model in a computer; my intention is to mimic the process you and I use: that is, look at the image, decide which way to rotate it, then check to see if it “looks” better.
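    Very roughly, that trial-and-error loop could look like the sketch below; the step size, the crop, and the “looks better” score (plain normalized correlation) are placeholders, and it assumes the two images are the same size.

    ```python
    import cv2
    import numpy as np

    def best_rotation(reference, candidate, max_deg=5.0, step=0.25):
        """Rotate the candidate in small steps and keep the angle whose
        centre crop correlates best with the reference image."""
        h, w = candidate.shape[:2]
        crop = (slice(h // 4, 3 * h // 4), slice(w // 4, 3 * w // 4))
        best_angle, best_score = 0.0, -np.inf
        for angle in np.arange(-max_deg, max_deg + step, step):
            M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
            rotated = cv2.warpAffine(candidate, M, (w, h))
            score = cv2.matchTemplate(reference[crop], rotated[crop],
                                      cv2.TM_CCOEFF_NORMED)[0][0]
            if score > best_score:
                best_angle, best_score = angle, score
        return best_angle
    ```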

    The idea of using interpolation in some way to increase the “resolution” of the captured data has been thrown around a few times by Dan and me; it is, as you say, a very exciting idea.

    I’ll be releasing this “last” version of my program shortly after I graduate; it’ll probably be in two parts: a graphical application, and another that will allow batch processing of a set of images, e.g. by dropping a folder on the application icon. Look forward to it!

  3. Mike Warot

    on April 12, 2010 at 15:22

    More thoughts about rotation…

    The way I get rid of rotation (because all of my shots are handheld) is to either pick a horizon, and tell Hugin about it… or to pick 2 points common to all the images in the stack, and let hugin rotate them to match the first image.

    The way that I’d LIKE to do it is to pick a point, and then have it found across all of the photos in the stack using OpenCV or something like that. Hugin automates the process of finding the second point in a pair once it has enough info to make a reasonable guess, so you only have to actually put in both points manually on the first pair.

    If I could get a set of coordinates that match a user selected point in the first image, for each point I choose, I could then output a .pto file which Hugin could then use to optimize and do all the alignments, etc. (Actually, Hugin is just a front end for panotools – http://sourceforge.net/projects/panotools/ – which does the actual optimizations in 3d.)

    You might very well be able to use panotools to solve your rotation and other 3d issues and avoid a lot of gnarly math. It’s open source.

  4. matti

    on April 12, 2010 at 15:37

    @Mike Funny you should mention panotools; I have it linked in the main post. As for OpenCV, that’s also being used in the latest version of lfManip.

    It seems like you have a feature request of sorts; I’ll seriously consider adding some cross-compatibility with Hugin; I know Dan uses Hugin too. If you could, I’d appreciate it if you’d try out the forthcoming software and give me feedback, specifically on the now-planned Hugin-compatible file-format output, to see whether it’s useful for your needs.

    Hope you don’t mind waiting until mid-May, though!

  5. Mike Warot

    on April 13, 2010 at 08:47

    I’ve tried your current version of LfManip, but unfortunately it doesn’t work for me as my source images are totally unconstrained… focus does work on an image, but since there is no “starting position” to scale from, I can’t do any 3d effects.

    I think I’m going to have to dig around the OpenCV libraries as well, to figure out what can be done. I know that matching a template against an image is a compute-intensive task, but it’s the only way for me to do things because of the nature of my image stacks.
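    A coarse-to-fine search is probably what I’ll end up with, something like this sketch (the scale factor and the refinement window are guesses):

    ```python
    import cv2

    def locate(img, patch, scale=0.25, pad=32):
        """Match on a downscaled copy first, then refine in a small
        full-resolution window around that hit."""
        small_img = cv2.resize(img, None, fx=scale, fy=scale)
        small_patch = cv2.resize(patch, None, fx=scale, fy=scale)
        res = cv2.matchTemplate(small_img, small_patch, cv2.TM_CCOEFF_NORMED)
        _, _, _, loc = cv2.minMaxLoc(res)
        cx, cy = int(loc[0] / scale), int(loc[1] / scale)

        ph, pw = patch.shape[:2]
        x0, y0 = max(cx - pad, 0), max(cy - pad, 0)
        window = img[y0:y0 + ph + 2 * pad, x0:x0 + pw + 2 * pad]
        res = cv2.matchTemplate(window, patch, cv2.TM_CCOEFF_NORMED)
        _, _, _, loc = cv2.minMaxLoc(res)
        return x0 + loc[0], y0 + loc[1]
    ```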

    I’ll see about doing some regularly spaced stacks if I get some free time.

    Keep up the good work!


Leave a Comment?

Send me an email, then I'll place our discussion on this page (with your permission).
