An Augmented Reality Graphical Editor

By Robert Schlaff

Sections of this document

1. Updated Abstract

2. Introduction

3. Initial Phases of the Project

3a. Learning Matlab

3b. Learning Affine Transformations

4. My Augmented Reality Editor

5. Conclusion and Future Work

Appendix -- Explanation of Code

1. Updated Abstract

In this project I created an augmented reality drawing program based on the Matdraw drawing system used in Matlab. The idea is to build a general drawing system that will allow a user to interface their drawings with a tracking system or to augment real life scenes with drawings.

The GUI is based on the use of affine transformations to change the coordinate axes of rigid bodies. Any body with three tracked points can be used for this two-dimensional transformation. The current demonstration that I have built allows a user to draw on the faces of a box (three different planes). As the box rotates, the drawings deform as they would if they were attached to the box itself.

(Note: While this project may not seem to have much to do with the original abstract, the two are closely related. Only a few functions need to be changed to allow this system to work with a tracking system.)

2. Introduction (Much of this was inspired by James R. Vallino's Dissertation at the University of Rochester)

Augmented Reality

In the past few years, virtual reality has captured the popular imagination. In everything from video games to virtual walk-throughs of unbuilt houses, virtual reality has something to offer. However, virtual reality has one obvious limitation: it is not very good at providing a realistic rendition of the world. This is true for many of our senses. How well can we model the feel or the taste of something with a computer? Recently, however, many people have proposed combining the real world with virtual objects so that they appear as one environment. In this project I deal with augmented reality, which is the insertion of some virtual elements into a primarily real scene.

Augmented reality creates a view of a real scene and incorporates virtual 3-D objects into that scene. As the user moves through the real scene, they should not be able to tell the difference between real and virtual objects. Augmented reality can thereby add information that makes people more productive. The illustration below shows one such situation, in which CAT scan images are combined with the regular images used during brain surgery to help a surgeon operate.

How is Virtual Reality Different from Augmented Reality?

In a virtual reality environment, the entire system is immersive. The user becomes totally divorced from his real environment. This is a very complicated simulation problem and requires a very large amount of processing to do reasonably well.

In an augmented reality system, the virtual elements are not obvious and the goal is to keep the user immersed in the real world. This is important because it lets the user complete his real-world task more easily, since he is already very familiar with the real world. The major problem in augmented reality is not processing power but placing each virtual item in the correct place in the world. An example of this is shown below. In the image on the right, the wireframe surrounds the camera, and it seems as if the wireframe could actually be part of the real scene. In the image on the left, the wireframe is slightly off to the left and back of the camera. The wireframe looks a lot more out of place in this case and breaks the immersion in the real world.

The chart below shows Milgram's mixed reality scale, which illustrates fairly well the difference between these environments:

My Project and Augmented Reality

In this project, the goal was to create a GUI in which users can draw objects in an affine coordinate system defined by objects in the plane. The idea is to build a general drawing system that will allow a user to interface their drawings with a tracking system. This is useful both as input to trackers and as an editor for augmented reality. For example, in X-Vision, the user can input trackers by drawing shapes on the tracked image. Lines correspond to edge trackers, and patches specify SSD trackers. Using this we could also track an affine coordinate system (which needs only 4 points to be fully specified in three dimensions). The user could therefore click an origin and three lines, and would thereby fully specify the coordinates of any rigid object.

Another use of these affine transformations is in drawing programs. Since we can only draw in two dimensions at a time, these drawing programs would allow the user to specify a specific plane and draw on it. While most drawing programs do that now with layers, these layers are always coplanar. With these transformations, we would be able to have layers that were not coplanar. This would make it significantly easier to do three dimensional drawings.

A drawing program can also be combined with a tracking program to create an augmented reality editor. The user would be able to place objects in the scene and specify an affine transformation, and then the objects would appear to move with the real scene.

I've completed the first phase of this augmented reality editor for my senior project. It allows users to create lines and register them in a real environment in which a box is being rotated. This interface can easily be extended to drawing on any two-dimensional image of a three-dimensional object where the three-dimensional coordinates of at least three points are known.

3. Initial Phases of the Project

3a. Learning Matlab

I spent a good while learning how to use Matlab. In the spirit of "learning by doing" I created a program that played the card game "War" with itself. In the process of writing this program I also learned how to use Matlab to debug the code that I had written.

Building A Simple GUI that Augments Reality

My second project was to write a simple fake tracking GUI in Matlab. In this GUI, the user steps through a series of images of a person hitting a ping pong ball. The user draws a box around the ping pong ball in each frame. These frames are then played back in succession and combined with the user-drawn boxes to make what looks like a tracking video. The user interacts with the program through graphical buttons such as forward one frame, back one frame, and play the movie.

Following is a screen shot of the GUI without the pseudo tracking, and then with the pseudo tracking:

Problems with Matlab

While parts of Matlab encourage structured programming, the language has some idiosyncrasies that make it hard to deal with. In Matlab 5 it was very easy to create the actual GUI using the GUIDE editor. However, GUIDE has some quirks of its own that make it irritating to use. First, callbacks cannot be commented when using the callback editor. Second, one cannot cut and paste in the callback editor. This is why it is often advantageous to have the callback simply call a function.

Another irritating idiosyncrasy is Matlab's handling of variables. In order to make a variable persistent, it must be stored in global space. This lack of variable protection often makes for non-robust code. In addition, each function must be in its own file, and Matlab provides no support for users to create their own function libraries. This means either a very large number of M-files or functions that are passed an argument to select what they do. Either way, the result is a poorly structured program.

3b. Learning Affine Transformations

In phase two of the project I started to deal with affine transforms, which are the transforms that I used to put the virtual objects into the real space. An affine transform maps any two-dimensional (or three-dimensional) coordinate system to any other, and it is fully determined by the positions of only a small number of points. We need only three points that are not collinear to transform one two-dimensional set of points to another.
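As a minimal sketch of the three-point idea, the Python below builds a two-dimensional affine frame from one tracked point used as the origin and two more used as axis endpoints. The function name affine_frame, the tuple layout, and the eps tolerance are assumptions made for illustration; they are not part of the Matlab code in this project.

```python
def affine_frame(origin, p1, p2, eps=1e-12):
    """Build a 2-D affine frame from three tracked points.

    Returns (origin, P1, P2), where P1 and P2 are the axis vectors
    running from the origin to p1 and p2 (hypothetical layout).
    """
    P1 = (p1[0] - origin[0], p1[1] - origin[1])
    P2 = (p2[0] - origin[0], p2[1] - origin[1])
    # Determinant of the 2x2 matrix whose columns are P2 and P1.
    # A zero determinant means the three points are collinear, so
    # they do not define a usable frame.
    det = P2[0] * P1[1] - P1[0] * P2[1]
    if abs(det) < eps:
        raise ValueError("the three points are collinear")
    return origin, P1, P2


# Three tracked corners of a face: origin, end of P1, end of P2.
frame = affine_frame((1.0, 1.0), (3.0, 1.0), (1.0, 4.0))
print(frame)
```

The collinearity check is the code form of the "not collinear" requirement: without it, the axis vectors would be parallel and the frame could not describe every point in the plane.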

Any two-dimensional rigid shape has a number of useful properties. One very important property for us is that all parallelograms stay parallelograms no matter what position the shape is viewed from. This is important because it is also true of affine transformations. Therefore, if we can determine any two non-parallel lines in the source and target systems, we can recreate all of the points in the coordinate system.

The affine transform has to include rotation, shear, and translation. First we translate each coordinate with the following equation:

new point = original point + (new origin - original origin)

The following shows a transform from any affine coordinate system (specified by the axis vectors P1 and P2) to Euclidean coordinates (x,y), where A and B are the affine coordinates along P2 and P1 respectively. Notice that after the translation the origins of both systems are at the same point.

x = P2x*A + P1x*B
y = P2y*A + P1y*B

or, in matrix form

[x] = [P2x P1x] *  [A]
[y] = [P2y P1y]    [B]

This is clear from the diagram above. How do we determine the Euclidean x coordinate? We multiply the number of x units per unit of the P2 axis by the number of P2 units (A), and add the number of x units per unit of the P1 axis times the number of P1 units (B).

P1x (units of x per unit of P1) * B (units of P1) gives us B*P1x units of x
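The two equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name forward_transform, the argument order, and the optional origin argument (which folds in the translation step) are hypothetical and do not come from the project's Matlab files.

```python
def forward_transform(P1, P2, A, B, origin=(0.0, 0.0)):
    """Map affine coordinates (A along P2, B along P1) to Euclidean (x, y).

    P1 and P2 are the axis vectors as (x, y) tuples. The origin is
    added at the end, so with the default origin this is exactly
    x = P2x*A + P1x*B and y = P2y*A + P1y*B.
    """
    x = P2[0] * A + P1[0] * B + origin[0]
    y = P2[1] * A + P1[1] * B + origin[1]
    return x, y


# One and a half units along P2 and half a unit along P1.
point = forward_transform(P1=(0.0, 3.0), P2=(2.0, 0.0), A=1.5, B=0.5)
print(point)
```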

I call this the forward transform in the paper because given the affine coordinates and the transform, we will know where to map on the screen (which has Euclidean coordinates). To determine the reverse transform (the affine coordinates given the Euclidean coordinates and the affine axis) we do the following:

[x]   = [P2x P1x] *  [A]
[y]   = [P2y P1y]    [B]

[P2x P1x]^-1 [x]   = [P2x P1x]^-1 [P2x P1x] *  [A]
[P2y P1y]    [y]   = [P2y P1y]    [P2y P1y]    [B]

[P2x P1x]^-1 [x]   = [1 0] *  [A]
[P2y P1y]    [y]   = [0 1]    [B]

[P2x P1x]^-1 [x]   =  [A]
[P2y P1y]    [y]   =  [B]
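For a 2x2 matrix the inverse can be written out in closed form, so the reverse transform needs no matrix library. The Python sketch below assumes the same conventions as the equations above (A along P2, B along P1); the name reverse_transform, the origin argument, and the eps tolerance are hypothetical. The inverse only exists when P1 and P2 are not parallel, which the determinant check enforces.

```python
def reverse_transform(P1, P2, x, y, origin=(0.0, 0.0), eps=1e-12):
    """Map a Euclidean point (x, y) to affine coordinates (A, B).

    Inverts the matrix with rows [P2x P1x] and [P2y P1y]:
        inverse = 1/det * [[ P1y, -P1x],
                           [-P2y,  P2x]]
    """
    det = P2[0] * P1[1] - P1[0] * P2[1]
    if abs(det) < eps:
        raise ValueError("P1 and P2 are parallel; the matrix has no inverse")
    # Undo the translation first so both origins coincide.
    dx = x - origin[0]
    dy = y - origin[1]
    A = (P1[1] * dx - P1[0] * dy) / det
    B = (-P2[1] * dx + P2[0] * dy) / det
    return A, B


# A sheared frame: recover the affine coordinates of the screen point (5, 0).
A, B = reverse_transform(P1=(1.0, 2.0), P2=(3.0, 1.0), x=5.0, y=0.0)
print(A, B)
```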

4. My Augmented Reality Editor

This was the main part of my project. The idea was to build a simple GUI in which a naive user could draw on the planes of a three-dimensional object. In this way they could stick a painting on the wall of a room or put a new cover on a book in a scene.

Currently I have a reasonably good demonstration of this. At the Matlab prompt the user types AfDraw. What comes up is a set of drawing tools ('text', 'line', 'select', 'ellipse', and 'box') and a main GUI window. In the main window, the user has a choice of faces (front, side, bottom, and none). By clicking on an axis and using the drawing tools, the user can draw on one of the faces.

When the user clicks on an axis, he changes the value of a global variable that keeps track of which axis the line is being drawn in. When the line is drawn, that value is placed in the line's UserData. In addition to keeping track of the axis, the GUI also provides a visual cue as to which axis is current. For instance, if the user clicks on the Front axis, as above, the front face of the box is outlined.

There are two buttons on the bottom of the GUI, 'edit' and 'run'. These refer to the two modes that the GUI can be in: edit mode and run mode. When the user starts the program, he is in edit mode. This allows him to choose an axis and to draw on the image. However, he cannot change the pose of the box. To change the pose of the box, he must enter run mode by clicking 'run'. He can then change the pose by entering a frame number in the edit window labeled 'frame', or he can go to the next frame by pressing 'Next Frame'. Notice that in run mode the axis choice is grayed out.

When the user clicks on the 'run' button, we compute A and B for each line by using revtrans. This data is then stored in the UserData field of the object along with the axis variable. By recomputing A and B only when we go from edit mode to run mode, we decrease the amount of error that is introduced into our calculations.

Whenever we change the pose of the object, we compute the value of the new points by using fortrans (with the appropriate transformation matrix for the new affine axes).
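The edit-to-run round trip described above can be sketched as follows. This is an illustrative Python version, not the project's revtrans and fortrans: the function names, the (origin, P1, P2) frame layout, and the example frames are all assumptions. The affine coordinates are computed once against the frame in which the line was drawn, then reused for every new pose.

```python
def reverse_transform(frame, point):
    """Screen (Euclidean) point -> affine coordinates (A, B) in the frame."""
    origin, P1, P2 = frame
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    det = P2[0] * P1[1] - P1[0] * P2[1]
    A = (P1[1] * dx - P1[0] * dy) / det
    B = (-P2[1] * dx + P2[0] * dy) / det
    return A, B


def forward_transform(frame, coords):
    """Affine coordinates (A, B) -> screen (Euclidean) point in the frame."""
    origin, P1, P2 = frame
    A, B = coords
    return (P2[0] * A + P1[0] * B + origin[0],
            P2[1] * A + P1[1] * B + origin[1])


def reproject(points, edit_frame, run_frame):
    """Move drawn points from the pose they were drawn in to a new pose."""
    # Done once, on entering run mode: store (A, B) for every endpoint.
    stored = [reverse_transform(edit_frame, p) for p in points]
    # Done on every pose change: redraw from the stored coordinates.
    return [forward_transform(run_frame, c) for c in stored]


# Hypothetical frames for one face of the box in two poses.
edit_frame = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))
run_frame = ((10.0, 10.0), (1.0, 2.0), (2.0, 0.0))
line = [(0.0, 0.0), (1.0, 1.0)]
print(reproject(line, edit_frame, run_frame))
```

Because every redraw starts from the same stored (A, B) values, rounding error from one pose is not carried into the next.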

Here's an example. In frame 1 I drew a giant X on the front axis. Then I entered run mode and changed the frame to 700. The X still appears in the same place on the front face of the box.

Mapping Textures with Affine Transforms

We can also use affine transforms to map textures to objects. All we have to do is take all the points on the object's surface and map them to points on the texture or image. This is easily done using revtrans. In the function facemap, I show how to do this by mapping my face onto the box:
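A minimal sketch of this kind of texture mapping, written in Python and not taken from facemap itself: for every output pixel, the reverse transform gives affine coordinates (A, B) on the face, and pixels whose coordinates fall inside the unit square are filled from the texture. The function name map_texture, the nested-list image format, the nearest-neighbor sampling, and the convention that P1 and P2 span the full edges of the face are all assumptions.

```python
def map_texture(texture, frame, width, height, background=0):
    """Paint a texture onto the face spanned by an affine frame.

    texture is a list of rows; frame is (origin, P1, P2), where the
    face covers origin + A*P2 + B*P1 for A and B in [0, 1).
    """
    origin, P1, P2 = frame
    det = P2[0] * P1[1] - P1[0] * P2[1]
    rows, cols = len(texture), len(texture[0])
    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Reverse transform: screen pixel -> affine coordinates.
            dx, dy = x - origin[0], y - origin[1]
            A = (P1[1] * dx - P1[0] * dy) / det
            B = (-P2[1] * dx + P2[0] * dy) / det
            if 0 <= A < 1 and 0 <= B < 1:
                # Nearest-neighbor lookup; min() guards against rounding.
                r = min(int(B * rows), rows - 1)
                c = min(int(A * cols), cols - 1)
                out[y][x] = texture[r][c]
    return out


# A 2x2 texture stretched over a 4-by-2 pixel face inside a 6x4 image.
texture = [[1, 2],
           [3, 4]]
frame = ((1, 1), (0, 2), (4, 0))
for row in map_texture(texture, frame, width=6, height=4):
    print(row)
```

Looping over output pixels and looking backwards into the texture leaves no holes in the mapped face, which would not be guaranteed if texture pixels were pushed forward onto the screen.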

Input Pictures:

Output Pictures with Texture Maps

5. Conclusion and Future Work

This idea has shown great promise in this project. The program that I have built should be developed in two different directions.

Tracking

The interface that we currently have can easily be used as a front end for a tracking engine such as X-Vision. Lines would be interpreted as edge trackers and boxes would be interpreted as SSD trackers. This GUI-Matlab interface would significantly increase the ease of using X-Vision for small projects.

An Augmented Reality Editor

This project can be developed into an augmented reality editor, in which a user can draw images on the planes of a three-dimensional object. This would make many things, like drawing words or lines on a slanted surface, much easier. It would be similar to the layers that appear in normal drawing programs, but the layers themselves would not need to be in parallel planes. One useful feature for this editor would be the ability to map textures onto parts of surfaces, as I have done in my facemap program.

Appendix -- Augmented Editor Files

Changes made to Matdraw: in drwcback.m, I added the following lines where the line object is created:

'Tag','MDLine',...
'UserData',[WhichAx;0],...
'ZData',[1 1]);

These were the only changes needed for Matdraw. The Tag allows us to find the line in order to apply the affine transforms to it. The UserData field allows us to store which axis the line has been drawn in (Front, Side, Bottom). The ZData field is there because it allows us to see the lines even when a new image is drawn under them.

revtrans.m

This file does the reverse affine transformation: it uses the transform (obtained from gettrans.m) to convert the line data from Euclidean coordinates into affine coordinates (A and B) in the appropriate axis.

fortrans.m

This does the forward affine transformation: it uses the affine coordinates (A and B) of each line plus the axes generated by gettrans to generate the Euclidean (screen) coordinates of the line for the current pose.

throwup.m

This function deletes the previous image of the box and puts up a new one.

guieditcback.m

This is called when the 'edit' button is pressed. It turns the frame window and the next frame button off and re-enables the axes listbox.

guiruncback.m

This is the opposite of guieditcback.m. It turns on the frame window and next frame button but sets the axes listbox to none.

listcallback.m

This function sets which axis we'll be drawing on.

Some notes on the GUI:

The GUI (mygui.m) is everything that the user sees except for the drawing tools (provided by Matdraw) and the image of the box. The image of the box is not included in the GUI because it must be deleted and redrawn, and GUIDE does not provide for that. The box is drawn by the function throwup.m.