Gestures in Motion
It has been 2 weeks since my last post and I regret to have to inform you that I have made almost no progress on this project whatsoever. As we are nearing the end of the academic year 2008-2009 there are also other projects that have to be finished and exams to be studied for.
The past couple of weeks were filled with work for 2 other projects. One of them is named Chat 2.0 and the project was to create an Audio/Video client and server for communication via RTP, also using SIP and SDP. The other project was to make a computer game, named Visitor. And it is this project that has a real reason for being mentioned in this blog because, as you may have guessed by now, it uses… GESTURES!
The game Trine (which I mentioned before in this blog) uses gestures to create new objects in the game world and Trine served as an inspiration for the game we made for the project. Our own game uses aMiGoLib (my own gesture library, remember?) to recognize the gestures the player makes. These gestures are not only used to create new objects in the world (like boxes or platforms) but also to perform other actions (like healing and creating a forcefield).
Visitor gameplay video
So whilst I’ve not been able to work on this project per se, the use of my library in the game project makes for a nice proof-of-concept of the usability of gestures and, equally important, it proves my work on the library was useful and that the library can be used in other projects as well.
For the game we used the Vester algorithm because it is still my personal favourite. Most of the time it recognized the gestures very well, although the gesture set contained some possibly ambiguous gestures. As you can see from the graphic below, the gestures for Platform and The Force might be ambiguous if the user doesn’t move far enough towards the bottom at the end of the gesture.
Gestureset used in Visitor
The good news is that these 2 projects are now finally completed, so this week and the next two I will have time to finalize my work on the 3D gesture recognition and test different gesture sets. I also need to complete a first draft of my thesis text before the 1st of June and based on that my counselors will decide if I can present this project in June or if I have to postpone the presentation to August, giving me some more time to finalize. I’m going to try very hard to have it ready by June; let’s hope I’ll be up to the task.
You can expect the first official draft on this blog by Sunday, after which I plan to make regular updates about my progress on 3D recognition and results from testing different gesture sets on the 3 other algorithms I’ve already implemented.
Today was the day I finally solved the problem with the Rubine classifier and let it be no surprise that the reason for its failure was all my own dumb fault. In my three years of study I have often found the smallest errors the most difficult to see and this time it was no different. My first mistake was to add a calculation that Rubine mentioned in his paper but that was nowhere in his final formulas. Because some example code uses this calculation and some doesn’t, I wanted to play it safe and put it in for good measure. This led to my values being totally different from the ones I found in an example. Because I thought the example used the calculation as well, it took me a long time (up until now) to figure out it was this step causing the problems. The second error was that I used the operator /= instead of *=. Fixing this brought my values from -2450 and -8065 to the range of 50 to 170, which you will agree is a lot more fun to work with (and more what one would expect as output values).
The first results of the Rubine algorithm give me a 100% recognition rate. This is a bit surprising because in several comparisons with other algorithms Rubine is definitely not always the best recognizer. But then again, if you see some of the sample code available, you might understand why…
So I’m very happy because I now have 3 full algorithms up and running, which you will agree is much better than just 2. Chart-making here I come! Of course this was not my only feat in the past week. The multi-stroke possibilities were further expanded, also with satisfying results. Now I need to find out if the same method can be applied to Rubine, which I doubt very very very much (let’s try!). There is also an NBestList return value from the recognizer. This will come in handy once I’m comparing the different recognizers but also for users of aMiGoLib, because context-dependent logic can choose to pick the second gesture from the list if its score is not too different from the first.
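To make the N-best idea a bit more concrete, here is a minimal sketch of how such a list and a context-dependent choice could look. It is not the actual aMiGoLib code: the names Candidate, nBest and pickWithContext, the "higher score is better" convention and the margin parameter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one candidate gesture and its score (higher is better).
struct Candidate {
    std::string name;
    double score;
};

// Sort the candidates by descending score and keep at most n of them.
std::vector<Candidate> nBest(std::vector<Candidate> candidates, std::size_t n) {
    std::sort(candidates.begin(), candidates.end(),
              [](const Candidate& a, const Candidate& b) { return a.score > b.score; });
    if (candidates.size() > n) candidates.resize(n);
    return candidates;
}

// Return the first candidate that the current context allows, as long as its
// score is within `margin` of the best score; otherwise fall back to the best.
std::string pickWithContext(const std::vector<Candidate>& best,
                            const std::vector<std::string>& allowed,
                            double margin) {
    assert(!best.empty());
    for (const Candidate& c : best) {
        if (best.front().score - c.score > margin) break;  // too far behind the winner
        if (std::find(allowed.begin(), allowed.end(), c.name) != allowed.end()) {
            return c.name;
        }
    }
    return best.front().name;
}
```

A game could, for example, pass only the gestures that make sense in the current situation as `allowed`, so a close second place can still win when the first place is impossible in that context.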
Lastly I tried capturing some 3D gestures. The method is not perfect and it’s really difficult to get some usable data but I finally managed to record some gestures and transform the input into something more readable. The tracker is not flawless but it provides good enough overall coordinates to make a recognition algorithm work after some preprocessing (in theory). And that’s exactly my next step. Now that I have completed the 3 traditional algorithms for about 95% I’m going to start working on my $1 extension into 3D space. First of all I’m going to determine what kind of preprocessing steps are needed on the points to remove the noise from the data. After that I’m going to try to translate the preprocessing steps of $1 (except for the rotation invariance) to 3D. And finally I’m going to test this approach and see where some tweaking might be needed.
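As a rough idea of what "translating the $1 preprocessing to 3D" could mean, here is a sketch of two of the steps: resampling the path to a fixed number of equally spaced points, and moving the centroid to the origin while scaling the gesture to unit size. The resampling follows the published $1 pseudocode with a z coordinate added. The Point3D type and the function names are assumptions, and scaling uniformly by the largest bounding-box side is a choice made here, not necessarily what the final library does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3D { double x, y, z; };  // hypothetical point type

double dist3(const Point3D& a, const Point3D& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Resample the path to n points spaced equally along its length.
std::vector<Point3D> resample(std::vector<Point3D> pts, std::size_t n) {
    assert(pts.size() >= 2 && n >= 2);
    double length = 0.0;
    for (std::size_t i = 1; i < pts.size(); ++i) length += dist3(pts[i - 1], pts[i]);
    const double interval = length / static_cast<double>(n - 1);

    std::vector<Point3D> out{pts.front()};
    double acc = 0.0;  // distance walked since the last emitted point
    for (std::size_t i = 1; i < pts.size(); ++i) {
        const double d = dist3(pts[i - 1], pts[i]);
        if (d > 0.0 && acc + d >= interval) {
            const double t = (interval - acc) / d;
            const Point3D q{pts[i - 1].x + t * (pts[i].x - pts[i - 1].x),
                            pts[i - 1].y + t * (pts[i].y - pts[i - 1].y),
                            pts[i - 1].z + t * (pts[i].z - pts[i - 1].z)};
            out.push_back(q);
            // q becomes the start of the next segment
            pts.insert(pts.begin() + static_cast<std::ptrdiff_t>(i), q);
            acc = 0.0;
        } else {
            acc += d;
        }
    }
    // Rounding can leave the list one point short; pad with the end point.
    while (out.size() < n) out.push_back(pts.back());
    return out;
}

// Move the centroid to the origin and scale so the largest bounding-box side is 1.
void normalize(std::vector<Point3D>& pts) {
    assert(!pts.empty());
    Point3D lo = pts[0], hi = pts[0], c{0.0, 0.0, 0.0};
    for (const Point3D& p : pts) {
        lo = {std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z)};
        hi = {std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z)};
        c = {c.x + p.x, c.y + p.y, c.z + p.z};
    }
    const double count = static_cast<double>(pts.size());
    c = {c.x / count, c.y / count, c.z / count};
    const double size = std::max({hi.x - lo.x, hi.y - lo.y, hi.z - lo.z});
    const double s = size > 0.0 ? 1.0 / size : 1.0;
    for (Point3D& p : pts) p = {(p.x - c.x) * s, (p.y - c.y) * s, (p.z - c.z) * s};
}
```

Noise filtering of the tracker data would have to happen before these steps and is deliberately left out of the sketch.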
Image for the SiGer algorithm I discuss in my thesis text
At the same time I will start comparing the 3 2D algorithms (3 indeed, I’m still very happy!) for recognition rates and their inherent ability to handle the different preprocessing steps and difficulties such as rotation invariance, scale/position invariance and speed invariance. This will hopefully produce some interesting graphs (also see some of the older posts) to be used in the chapter “Results” of my thesis text. Speaking of which: I’ve also made some very good progress on that. The related work chapter is almost finished and I’ve started making some of the images I’m going to use to illustrate all the interesting stuff I’ve described. I’m at about 40 pages (without images) with 7000+ words so that seems right on schedule.
As it has become a habit over the last few posts, my goals for next week are:
- capture some more 3D gestures and extract usable data from them
- research how the $1 preprocessing can be done in 3D and implement the necessary mathematical functions
- make a bigger corpus of multistroke and singlestroke gesture examples to be used in comparisons. The goal is to make gesture sets to test specific attributes of the recognizers, such as noise sensitivity.
I hope you liked the update; I surely enjoyed finally finding the Rubine-password!
It has been one week since my last post, time for an update.
The first thing on my list was the Rubine algorithm. It really wasn’t working the way it should but because of the nature of the computations it is incredibly hard to debug without an example. After a long search I finally found some interesting data. Comparing my results to the ones on this page revealed a series of small errors in my feature calculations. It even turned out I forgot to calculate one! After fixing the features it was time to go to the statistical part of the recognizer. Like I said before I have some example code in Java, but it seems some more tweaking was needed to get it to work properly in C++. Now I’ve got everything working up until the Matrix.inverse(). This is still an issue because it seems to go completely wrong from there on. I decided to leave Rubine once more and start on multistrokes.
For the multistrokes I first needed to see how I could get this into my existing code. Everything was built upon single strokes. The method I was following was that of the $N recognizer. This method doesn’t really require multi-stroke support in the inner workings of the library because it preprocesses the multistroke input into a range of single-stroke examples, and support for those was already available in my code. The suggested algorithms for this preprocessing were not that difficult, so implementation was rather quick. Getting a testing environment up and running was a little bit more difficult (Qt is not being very friendly on this project) but finally I got everything working and now I can start testing and fine-tuning.
This means I haven’t really gotten around to implementing the math for 3D gestures, nor the rotation invariance for Vester. There is a very positive note however: the library for tracking 3D input with the wiimotes is up and running! This took me some time as well, but now that it’s working I think it shouldn’t take too much work to get a capturing environment ready. Another thing I did was write some more of my thesis document. The related-work chapter is really taking shape now, but I’m afraid I’m going to need a lot of graphics and I’m not really sure if I can just take the images from the original papers or if I need to remake them myself.
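The core of that preprocessing, as I understand the $N approach, is to generate every stroke order and every stroke direction and to glue each combination into one single stroke. The sketch below shows that idea only; the Point2D and Stroke types and the function name toUnistrokes are assumptions, not the names used in my library or in the $N reference code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point2D { double x, y; };  // hypothetical point type
using Stroke = std::vector<Point2D>;

// Turn one multistroke into all single-stroke variants:
// every order of the strokes, each stroke drawn forwards or backwards.
// For n strokes this yields n! * 2^n unistrokes, so n is kept small.
std::vector<Stroke> toUnistrokes(const std::vector<Stroke>& strokes) {
    assert(!strokes.empty() && strokes.size() <= 8);
    std::vector<std::size_t> order(strokes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...

    std::vector<Stroke> result;
    do {
        // Bit k of mask decides whether the k-th stroke in this order is reversed.
        for (unsigned mask = 0; mask < (1u << strokes.size()); ++mask) {
            Stroke uni;
            for (std::size_t k = 0; k < order.size(); ++k) {
                const Stroke& s = strokes[order[k]];
                if (mask & (1u << k)) {
                    uni.insert(uni.end(), s.rbegin(), s.rend());
                } else {
                    uni.insert(uni.end(), s.begin(), s.end());
                }
            }
            result.push_back(uni);
        }
    } while (std::next_permutation(order.begin(), order.end()));
    return result;
}
```

Each resulting unistroke can then go through the existing single-stroke pipeline unchanged, which is exactly why the library internals do not need to know about multistrokes. The factorial growth is the price: 2 strokes give 8 variants, 4 strokes already give 384.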
The plans for this week:
- Test the multistrokes-gestures with a drawing tablet
- Get the 3D-capturing environment up and running
- Revise the Rubine algorithm so it works properly
I hope this will not be overdoing it, but I’ve got high hopes. Thanks for reading.
A small and quick update this time, since there really isn’t all that much to tell. The past week I’ve been busy with some other school projects and I haven’t had a lot of time for my gestures.
I did get around to trying to get the 3D tracking with the wiimotes to work. This was quite difficult. First of all I needed the latest version of Matlab, which is almost impossible to obtain legally. After some “creative searching” I obtained a version, which I couldn’t get installed. After yet another faulty version I finally found one that I got up and running. With Matlab installed I could get the wiist-library, provided by Stijn and Tanja from Antwerp University, compiled. Trying to start the program resulted in a weird error, which even Google couldn’t really help me with. I’ve contacted Tanja and Stijn and I hope they will be able to help me.
Matlab code can be compiled to a C++-compatible library
In the meantime I’ve started writing my thesis document. I’ve completed a large part of the introduction and also a bit of the “related work” chapter about the recognition algorithms I’ve found during my research.
All in all I’m starting to have a little bit more faith that I can get everything ready by the deadline, if I’m willing to cut some corners. To end this post, my planning for this week (in order of priority):
- Get Rubine working and implement other features
- Implement multistrokes
- Implement preprocessing steps for 3D
- Rotation invariance for Vester’s algorithm
And if Tanja and Stijn are able to help me I’m certainly also going to make a capturing program for 3D coordinates. Let’s hope they come to my rescue!
It has been a while since my last post but that doesn’t mean I haven’t been working on this project. Since my last post I’ve tried implementing Rubine, which has given me much grief.
The Rubine algorithm isn’t all that complicated but it takes some getting used to the way of thinking. It is called a “statistical classifier” because it uses things like covariance matrices to calculate weights for its different parameters. I’ve had a course in statistics, so after some freshening up, I knew where the steps were coming from. The main problem was not understanding the algorithm, but implementing it in C++. The different features (which calculate mathematical data about the gesture) were not difficult to implement, but the statistical steps were far from evident. Luckily I had some example code from the $1-project and iGesture, but this code was in C# or Java. That wasn’t such a big problem until the line: matrix.inverse().
Never call a matrix simple
Now there may be a matrix class available in C# and Java, but in standard C++ there is none. Until then I had been using a very simple implementation of my own ( vector<vector<double>> ), but now I needed to perform a rather complex operation on this matrix. I’ve also had a course on basic matrix mathematics and as with the statistics, re-reading the old textbook brought back a lot of insight. There are many ways to determine the inverse, the easiest of which uses the determinant of the matrix. The determinant is easily calculated with a recursive function, which works well for matrices up to about 5×5. Larger than that you can go and get yourself a cup of coffee, because your CPU will be doing overtime. Since the Rubine covariance matrix is at least 10×10 (and in my plans sometimes 20×20), optimization was needed. This could only come from reducing the matrix to its upper-triangular form and I really couldn’t find a decent algorithm for this. To make an even longer story short: after a lot of work I found a class that provided a fast inverse method. I haven’t gotten around to testing the final implementation yet, because I’m all fed up with Rubine for the moment.
As if this wasn’t enough, Qt, the graphical toolkit I use for the UI, was giving me a hard time. It seems that the capturing of mouse-move events gets less accurate and less frequent after a while (going from 50 input points for an S-gesture to about 17 points for the same S in less than 5 minutes). Trying to solve this problem got me nowhere… the same problem existed on a QWidget as well (whereas I was working on a QGraphicsScene). It’ll just have to do for now until I find the problem.
So much for the bad news. Those who read the title know that there is some good stuff to come as well. It wasn’t all non-invertible matrices and weird mouse-events this week. I contacted the makers of $1 about the Vester algorithm and they gave some interesting comments (i.e. they expect it won’t perform as well with some noise in the input points, and that it would be less effective with similar gestures). These are exactly the kind of things I need because I can test them and make some cool graphs (see the previous posts) and try and determine if Vester is better/equal/worse than $1 and why. I’ve actually just completed the basic implementation of the Vester algorithm and so far it’s not missed a match.
Another very interesting thing the makers of $1 told me about was the existence of $N. This is an extension of $1 for multistrokes, also solving the problems $1 had with 1-D gestures (i.e. lines). I’m going to try and implement multi-stroke versions of $N and Vester’s algorithm (and possibly Rubine) and compare them as well.
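For reference, the elimination approach can be written quite compactly on top of a plain vector<vector<double>>. The sketch below is a textbook Gauss-Jordan inversion with partial pivoting, not the class I ended up using; the Matrix alias, the function name invert and the 1e-12 singularity threshold are assumptions. It needs on the order of n³ operations, whereas the recursive determinant expansion grows factorially with n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias, row-major

// Invert a square matrix with Gauss-Jordan elimination and partial pivoting.
// Returns false (and leaves `inverse` untouched) if the matrix is numerically singular.
bool invert(Matrix a, Matrix& inverse) {
    const std::size_t n = a.size();
    for (const std::vector<double>& row : a) {
        assert(row.size() == n);  // must be square
        (void)row;
    }

    // Start with the identity; every row operation on `a` is mirrored on `inv`.
    Matrix inv(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) inv[i][i] = 1.0;

    for (std::size_t col = 0; col < n; ++col) {
        // Partial pivoting: pick the row with the largest entry in this column.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r) {
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        }
        if (std::fabs(a[pivot][col]) < 1e-12) return false;
        std::swap(a[col], a[pivot]);
        std::swap(inv[col], inv[pivot]);

        // Scale the pivot row so the pivot becomes 1.
        const double d = a[col][col];
        for (std::size_t j = 0; j < n; ++j) {
            a[col][j] /= d;
            inv[col][j] /= d;
        }
        // Clear this column in all other rows.
        for (std::size_t r = 0; r < n; ++r) {
            if (r == col) continue;
            const double f = a[r][col];
            for (std::size_t j = 0; j < n; ++j) {
                a[r][j] -= f * a[col][j];
                inv[r][j] -= f * inv[col][j];
            }
        }
    }
    inverse = inv;
    return true;
}
```

Returning a flag instead of throwing keeps the caller in control: a classifier could, for instance, decide to report a training problem when the matrix cannot be inverted.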
Examples of basic 3D gestures
My plans have changed a little bit since last week. I don’t think I’ll have the time to implement the Hidden Markov Model algorithm. I don’t really get all the fine details yet and I haven’t found any comprehensive example code to base my implementation on. But the most important reason is that I have a more interesting idea about the 3D gestures. I’m going to try to expand the basis of the $1 algorithm to 3D. This should be possible because it consists of relatively basic math. This will give me something new, original and useful (because $1 was made for quick prototyping, what about prototyping of 3D gestures?). The only problem here is the capturing of the 3D points. The wiimote-library I’ve been waiting for has arrived but now I need the latest version of Matlab to run it. I’m going to try and obtain it and capture some 3D gestures and recognize them with an adapted version of $1. I’m very excited about this and I hope I can get some good results out of this experiment.
All in all I think it’s looking good now. If I can really compare Vester to $1, get my Rubine working and recognize some basic 3D gestures in my own $1-adaptation I think I’ll have enough interesting results for my thesis to be the success I always hoped it would be.
To be really honest, I’ve been a bit lost since the meeting with Tom Cuypers last Tuesday. I knew it wasn’t going to be all fantastic and that I was still a long way from a decent thesis but as it turns out I have my work cut out for me still.
My original goal was to just research existing algorithms, implement some of them and prove through this that I could research a subject all by myself and use it to implement something usable. Too bad Robin, that’s only the first step. I had already given up hope to create something that was actually new (I simply don’t have enough knowledge about the subject to implement a totally new and innovative recognition algorithm) but now I was told I need to do some real original research. For example: compare the different algorithms in different situations and try to determine which one is best in particular situations. It’s not that this wouldn’t be possible but it takes a lot of work to do these comparisons in a way that matters even a little bit (because I’m really not convinced about the validity of results obtained from one test person, this test person being the one who also made the experiment). So now I’m expected to have some nice graphs which indicate some interesting property of the algorithms I’ve researched. The fact that up until then I had not really thought of such interesting properties didn’t help.
my morale, 0 is right after the meeting
Luckily, as you can see on the above graph (see, I’m already trying to use them) this feeling was only right after the meeting. Since then I’ve tried to see the positive side of things and I have managed pretty well. Without further ado, here are my plans :
- this week / half of the next : Finish aMiGolib with $1 (and Vester, see below), Rubine (with all the features I can find)
- next week 2nd half : train aMiGolib with some gestures and run the algorithms on them. Hopefully this will give some interesting results (graphs to the rescue!)
- 3rd week : implement gesture-to-keyboard-shortcut application (and test it on Photoshop).
- 4th week : try 3D capturing with the wiimote
- 5th week : Hidden Markov Model implementation
- 6th week : testing and concluding
This is quite the tight schedule because I don’t have a huge number of weeks left before I have to present my thesis. Everything has to go as planned or I’m going to have to drop some things, which I really want to prevent at all costs.
For this I’m going to have to make a harsh decision : stop all internet research on gesture algorithms, libraries and applications starting right now. This is because whenever I start searching for “gesture” on the net, I keep on finding new and interesting things and pursuing these mostly takes up the better part of the day as I’ve seen happen again today. I’ve got a good number of things to write about now and probably I’ve missed some important things but I simply don’t have the time to read every interesting paper I find. Less reading, more coding Robin!
This decision is even worse because today I found a very interesting addition to my most interesting source, the $1 recognizer: Vester’s algorithm (used in High Sign). I contacted its creator with some questions about the algorithm and it turns out it’s faster and even easier than the $1 algorithm (at least at first sight; GRAPHS NEEDED!). Vester’s algorithm doesn’t compare the path-distances between two examples, but the differences between the angles of successive points in the examples. This is a very interesting approach which I’m sure to implement and research (I’ve even asked the creators of the $1 algorithm for their comments on this approach). Maybe I can actually get some original data from comparing this algorithm (which has not been documented in a paper or publication yet) with the others and seeing how this works. Also, Vester’s algorithm is not yet rotation invariant. I smell a challenge coming up.
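To illustrate what "comparing angles instead of path-distances" could look like, here is a small sketch. It is only my reading of the idea described above, not the actual High Sign code: it assumes both gestures were already resampled to the same number of points, and the names segmentAngles and angleDistance as well as the choice of the mean absolute angle difference are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };  // hypothetical point type

// Direction (in radians) of every segment between successive points.
std::vector<double> segmentAngles(const std::vector<Point2D>& pts) {
    std::vector<double> angles;
    for (std::size_t i = 1; i < pts.size(); ++i) {
        angles.push_back(std::atan2(pts[i].y - pts[i - 1].y, pts[i].x - pts[i - 1].x));
    }
    return angles;
}

// Mean absolute difference between the segment directions of two gestures.
// 0 means identical directions everywhere; pi is the worst case.
double angleDistance(const std::vector<Point2D>& a, const std::vector<Point2D>& b) {
    assert(a.size() == b.size() && a.size() >= 2);
    const double pi = std::acos(-1.0);
    const std::vector<double> aa = segmentAngles(a);
    const std::vector<double> bb = segmentAngles(b);
    double sum = 0.0;
    for (std::size_t i = 0; i < aa.size(); ++i) {
        double d = std::fabs(aa[i] - bb[i]);
        if (d > pi) d = 2.0 * pi - d;  // wrap around: 350 degrees apart is really 10
        sum += d;
    }
    return sum / static_cast<double>(aa.size());
}
```

Because only directions are compared, a measure like this ignores position and size by construction, but a rotated copy of a gesture scores badly, which matches the remark that the algorithm is not rotation invariant.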
All in all I’ve had a little dip but I’m on the rise again, posing clear goals for myself and there’s maybe even a chance for some original results; wait and see!
Or let me rephrase that a little bit : STL, iterators and templates!
I made big plans to try and implement the $1 recognizer in my library today… I’m never thinking big when it comes to programming in C++ again. I wanted to keep things generic and extensible, and that of course means templates. GestureExample<T> allowed me to use any kind of point for describing my gestures (T = Point2D or T = Point3D or T = Mr. T… it should all be possible). Sadly, programming templates is not really all that easy in C++. After some time I decided to cut the .cpp file and just put everything in the .h; totally oldschool, but it works!
So we have at least a basic working version of a very important class. Now to start on preprocessing the points in that class. I used a std::vector<T> to store my points. Just giving users access to this internal data structure is not a good idea, so I wanted something like GestureExample<Point2D>::iterator and example.begin()… be careful what you wish for. Take a look at these snippets:
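// this does not work: iterator is a dependent name, so the compiler assumes it is not a type
typedef std::vector<T>::iterator iterator;
// neither does this: the alias still depends on T
typedef std::vector<T> pointContainer;
typedef pointContainer::iterator iterator;
// but hey, look at this! 'typename' tells the compiler that iterator names a type
typedef std::vector<T> pointContainer;
typedef typename pointContainer::iterator iterator;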
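Put into context, the working variant could sit in a class like the sketch below. This is a stripped-down illustration, not the real GestureExample from aMiGolib: the add() and size() members and the private points_ member are assumptions added to make the example complete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };  // hypothetical point type

template <typename T>
class GestureExample {
public:
    typedef std::vector<T> pointContainer;
    // pointContainer depends on T, so 'typename' is needed to say
    // that the nested names are types and not static members.
    typedef typename pointContainer::iterator iterator;
    typedef typename pointContainer::const_iterator const_iterator;

    void add(const T& point) { points_.push_back(point); }
    std::size_t size() const { return points_.size(); }

    // Users iterate over the points without seeing the container itself.
    iterator begin() { return points_.begin(); }
    iterator end() { return points_.end(); }
    const_iterator begin() const { return points_.begin(); }
    const_iterator end() const { return points_.end(); }

private:
    pointContainer points_;
};
```

With this in place, code using the class can write GestureExample<Point2D>::iterator and example.begin() while the std::vector stays a private detail that could later be swapped for another container.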
Monty Python's Brave Sir Robin
There are probably all kinds of very good historical and technical reasons why things are as they are… I can’t really wrap my head around it though. Eventually, even the custom-iterator-monster was defeated by Brave Sir Robin; on my way to bigger and better things. I’m not going to bore you with more salient and technical details about today’s endeavor. Let’s just say that “pure virtual template functions”, “vector.splice()” and “resampling algorithms” will send shivers down my spine for years to come.
Sure enough it pays to keep on going because at the moment of writing I have made some minor progress. A few preprocessing steps are implemented and most of the difficult internal work is tackled so I hope to be able to work a little bit quicker in the next few days. I’m curious what Tom Cuypers will have to say about my work on the library and my plans for it. You’ll read all about that tomorrow!
Friday I spent the best part of the day trying to find my way around LaTeX. It has been ages since our last encounter but so far so good. I’ve brought the suggested structure into a basic draft version, together with some short lines of text and I think it looks pretty good. I’m not quite sure about some small details but I’m guessing Tom can help me with those on Tuesday. You can see the first version here: first draft (Dutch).
The other thing I did on Friday was make a gesture recorder: a small application in Flex/ActionScript to quickly draw some examples and save them to XML files. This will come in handy when testing my other applications and training the gesture recognizers.
Amigo is the Spanish word for "friend"
Today I started working on what is supposed to become the most important part of this project: aMiGolib (a Motion input Gesture output library). The plan is to implement 3 different algorithms ($1, Rubine and HMM) and use it in some proof-of-concept projects like Gaptcha and Aether. I’m going to try to keep the library as extensible as possible, with extensive preprocessing options and a lot of configuration possibilities and even threads. So far C++ templates have been very unkind to me but I hope to make some real progress tomorrow and get some of the preprocessing steps up and running.
I also continued doing some research, mainly on existing libraries (to see how they were tackling my issues) but found that there are not many of them out there and that they are not always fully open source. From these and earlier findings, I don’t think there is any good, free, multi-purpose, trainable, well-performing gesture library with decent documentation.
It has been about 1.5 months since my last post and I’m deeply ashamed about that. It has been really busy since then and I’ve had very little time to work on this project. Now the Easter holidays have finally begun and this is my cue to start focussing on this project and making sure I get everything ready before the deadline.
I only have one day of classes every week, so that means I’m going to try spending 3 days a week on this project, starting today. I’m also going to try posting a new post here every day I worked on the project, keeping track of my progress.
I started today reading up on all the links and papers I had found the past weeks but had not yet read. There was some very interesting information in most of them, most notably about Hidden Markov Models and also about the original Rubine recognition features.
LaTeX : Scientific typesetting engine
While reading I made a text file with some remarks which I want to incorporate in my paper. A few weeks ago I made a first proposal for the structure of this paper and recently I got back some feedback from Tom Cuypers. The structure was changed dramatically, but only for the better. I have a really good feeling about it now and am very eager to start writing. I decided to go for LaTeX instead of Microsoft Word after all; hopefully I won’t regret this decision.
I’ve also been thinking a lot about what I’m going to be programming. I had a lot of different ideas in the beginning, but given the current time frame I don’t think I’ll be able to do all of them. My idea at the moment is to just go for a basic recognition library, in C++, with 3 different algorithms ($1, Rubine and something with HMM) and later use it in a few practical applications (Gaptcha, Photoshop shortcuts by drawing letters, basic gestures to create objects in a 3D world and hopefully also something with 3D gestures).
These things seem doable at the moment and are also quite representative of my research and the possible applications of gestures. I have a meeting with Tom Cuypers on Tuesday and I’m eager to hear his views on these plans. Maybe he has some other ideas or suggestions about what I should implement in order to have the best possible result in the end.
In the meantime I’m also waiting for the 3D capture algorithm which some colleagues are making and I hope to use that for some easy capturing of 3D coordinates.
Making objects with gestures in Trine
I also contacted a member of the programming team of the game Trine. They use gestures as a big part of this new game and I asked about some details on how they did the recognition. The reply was very useful for my part on how gestures are already being used in existing applications.
“I made it through the wilderness
Somehow I made it through”
A few weeks ago the exams ended and I found myself with some free time once again. I made you a promise that I would start on Gestris as soon as I could and here are the first results:
First gestris preview
Believe it or not, this simple program has already proven itself much more complicated than I could’ve imagined. My choice for Adobe’s Flex as platform had significant advantages UI-wise (mxml and easy event-structure for the win!) but it took me a while to figure out the best way to draw the board (redraw every frame or move components; I went with the redraw-option in the end, thanks to this article).
Not only the UI-side of Tetris but also detecting collisions between different pieces was not as easy as it sounds (damn my ideals of keeping everything as generic as possible!).
In the end I spent a lot of time making the basic program and making it as flexible as possible. And then I hadn’t even started on the gesture recognition!
For this first proof-of-concept I wanted to keep things as simple as possible and what better way to do just that than by using the simplest algorithm I found during research: the $1 recognizer. The makers promise high recognition rates with few examples and rotation/scale invariance, and one of their main goals was to make the algorithm easy to understand and implement. Sounds ideal to me :).
I started from scratch and used the JavaScript code from the website as a guideline. This implementation was relatively quick and very easy to understand indeed. Once I tried recognizing a few simple gestures (I first trained the recognizer with square, triangle and circle gestures, 1 example each), the algorithm was providing 33.33333% recognition rates, by which I mean it seemed to be randomly choosing gestures as output. After searching for a long time it turned out I was the one to blame; sine and cosine should not be forgotten about lightly, so it seems. After this, the recognizer provided a lot better results. Time to move on to the real gestures for my pieces. Keeping things simple with L, Lmirror, Z and Zmirror (also 1 example each) the recognizer wasn’t perfect but it functioned well enough to make the game playable. After this the problems began.
I had the great idea to not just use gestures for defining the pieces but also for controlling them once they were on the board. Defining different lines for moving down, left and right showed the major flaw of the $1 algorithm: rotation invariance also meant that I could not have the same basic gesture in different rotations for different meanings: it detected down as right, left as down etc. To make things even more complicated it even recognized the 3 lines as an L or Lmirror sometimes.
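The reason is easy to see in code. $1 rotates every gesture so that its "indicative angle" (the angle from the centroid to the first point) becomes zero before comparing. The sketch below follows that step from the published $1 pseudocode and, as a simplification of mine, returns the points relative to the centroid so the translate-to-origin step is folded in; the Point2D type and the function names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };  // hypothetical point type

Point2D centroid(const std::vector<Point2D>& pts) {
    assert(!pts.empty());
    Point2D c{0.0, 0.0};
    for (const Point2D& p : pts) {
        c.x += p.x;
        c.y += p.y;
    }
    c.x /= static_cast<double>(pts.size());
    c.y /= static_cast<double>(pts.size());
    return c;
}

// Rotate the gesture around its centroid so that the angle between the
// centroid and the first point becomes zero, then express all points
// relative to the centroid.
std::vector<Point2D> rotateToZero(const std::vector<Point2D>& pts) {
    const Point2D c = centroid(pts);
    const double theta = std::atan2(c.y - pts[0].y, c.x - pts[0].x);
    const double cs = std::cos(-theta);
    const double sn = std::sin(-theta);
    std::vector<Point2D> out;
    for (const Point2D& p : pts) {
        const double dx = p.x - c.x;
        const double dy = p.y - c.y;
        out.push_back(Point2D{dx * cs - dy * sn, dx * sn + dy * cs});
    }
    return out;
}
```

Feeding a straight "right" line and a straight "down" line through rotateToZero gives the same set of points, so after this step the recognizer has no way left to tell them apart. Making the rotation step optional, or limiting the allowed rotation range, would be one way around this, at the cost of needing examples for every orientation that should be recognized.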
Different gestures for different pieces
I then decided to focus on the gestures for the pieces, adding square and “I”, with the latter being the previous gesture for down. With this somewhat larger gesture set results were a little bit less accurate than with only 4 gestures. Especially the “I”-gesture was difficult because the recognizer kept recognizing L and Lmirror. And that is the status of Gestris at the moment: 90% basic logic implemented, 6 gestures of which 5 are recognized at good rates and one very irritating “I”, leading me to believe that some additional algorithm will be necessary to disambiguate this from the other gestures.
All in all I’m very happy with the status of my thesis so far. I have done some basic research, tried out some of the theories and algorithms I found and discovered that gesture recognition really isn’t all that easy (and neither is making Tetris :D). The next step will be to further develop Gestris with even more pieces, try to find a solution for the “I” gesture and develop a good way to move the pieces without using the keyboard.
In parallel I have also started on the Wii-part of my thesis for which my counselors developed a very nifty piece of equipment for me to use as infrared-gloves. Pictures and a new post about this will follow as soon as I’ve worked on it a little more. Also I will be updating the “source”-page with some new and interesting links and there will be a post about applications which use gesture recognition for user input at present, which will probably be the basis of a chapter in my thesis text, exploring the plethora of possibilities and interesting usages people have already found.