I had originally been experimenting with Depth.js as a way to expose Kinect data to the web browser so we could use gesture-based browsing in an environment controlled with JavaScript. However, during the initial research phase of our latest Head labs project I discovered a new library, also built to intercept the OpenNI and PrimeSense NITE data and expose it to a JavaScript layer for use in the web browser. Enter Zigfu.

As opposed to Depth.js, the Zigfu library is aimed at game development rather than web browsing, and the API is mainly concerned with tracking visible users and skeleton joints. Zigfu has an API for developing with both JavaScript and the Unity 3D game engine. Currently the game engine exports as a Unity plugin for use in the browser, but Zigfu are also aiming to release a Flash export in the future for better cross-platform coverage.

We decided to stick to the JavaScript flavour so we weren't introducing a new technology stack via the Unity game engine into our research, and as we are still in the early prototyping phases we didn't really need the sophistication of the full 3D world that Unity would provide. FYI, the Unity game engine supports JavaScript as a scripting language, so for those of you wishing to explore it further for game development it might not be as big a leap into the unknown as you might have feared.

To begin using Zigfu you need the Zigfu browser plugin. If you don't have it installed, Firefox will give you a missing plugin warning, but Chrome fails silently, which can be a little confusing. Also, when running Zigfu I consistently get a JavaScript error saying this.parentElement is undefined. I believe it's a problem with the plugin inserting something into the page, possibly the unlicensed usage watermark, but as that part of the project isn't open for editing I've been ignoring the error for the time being.
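Because Chrome gives no warning at all, it's worth adding a quick sanity check of your own before initialising anything. Here's a minimal sketch that scans navigator.plugins for something that looks like the Zigfu plugin; the name match and the #status element are assumptions on my part, so check what the plugin actually registers itself as in your browser.

```javascript
// Rough check that the Zigfu browser plugin is installed before we try to use it.
// The name match is an assumption - inspect navigator.plugins yourself to
// confirm what the plugin is actually registered as.
function hasZigfuPlugin() {
  for (var i = 0; i < navigator.plugins.length; i++) {
    if (/zig/i.test(navigator.plugins[i].name)) {
      return true;
    }
  }
  return false;
}

if (!hasZigfuPlugin()) {
  // Chrome won't tell the user anything, so we do it ourselves
  document.getElementById('status').innerHTML =
    'Zigfu browser plugin not found - please install it first.';
}
```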

I started with the basic tutorials, which are all pretty straightforward and easy to follow. The documentation, however, is pretty thin, something Zigfu acknowledged themselves and said they were keen to rectify in the fullness of time, so creating anything beyond the standard tutorial is a case of suck it and see for now.

We came across a lot of teething problems in the early stages of our prototyping. There are several areas of concern between the Kinect sensor's interface and getting useful data displayed on screen. We experienced the same issues with external electronic devices that we came across when developing our Arduino project: there is a certain point in the workflow where you cannot access any debugging information from the hardware, which makes it difficult to ascertain where the fault might lie.

The most difficult issue we encountered was Zigfu detecting no user at all. This happened fairly consistently and we are still unsure of the resolution. It was hard to determine whether it was down to environmental factors, such as lighting, position or wiring, perhaps sunlight levels affecting the infra-red, or a malfunctioning Kinect device, or whether the issues were purely down to incorrectly implemented code.

The issue appeared sporadically after the application had been functioning. If the application was left running for a long time it seemed to stop working completely, as if the plugin had crashed. I also experienced my machine crashing and the mouse and keyboard malfunctioning, and it was unclear whether this was related to having the Kinect device attached or purely coincidental.

The only way to reset the application when it stopped responding completely was to unplug the Kinect, close all the browser processes, make sure the Zigfu plugin process had ended, and then plug everything back in and start again. Another sanity check you can do to ensure the Kinect is wired into the USB and responding is to run the libfreenect GLView test application; instructions on installing libfreenect are available if you don't already have it working.

We also discovered that the user's distance from the Kinect device is very important, and I suspect this was responsible for half of the issues we were having with initial user detection. You really have to be at least 3-4 feet away from the sensor for it to reliably detect a user, and you will also get much more stable skeleton tracking if you keep that distance from the device.
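If you're trying to rule out the distance problem, a simple logging listener can help. The sketch below is based on my reading of the Zigfu tutorials; the onuserfound/onuserlost/ondataupdate listener shape and the millimetre world co-ordinates are assumptions, so check them against the version of zig.js you're running. The 1200mm threshold is just a rough stand-in for the 3-4 feet that worked for us.

```javascript
// Debugging aid: log when users appear and disappear, and warn when a tracked
// user is probably standing too close to the sensor. The listener shape and the
// millimetre co-ordinates are assumptions based on the Zigfu tutorials.
var MIN_DISTANCE_MM = 1200; // roughly the 3-4 feet that gave us reliable detection

zig.addListener({
  onuserfound: function (user) {
    console.log('User found: ' + user.id);
  },
  onuserlost: function (user) {
    console.log('User lost: ' + user.id);
  },
  ondataupdate: function (zigdata) {
    for (var id in zigdata.users) {
      var position = zigdata.users[id].position; // centre of mass, [x, y, z] in mm
      if (position[2] < MIN_DISTANCE_MM) {
        console.log('User ' + id + ' is too close to the sensor (' + position[2] + 'mm)');
      }
    }
  }
});
```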

I wanted to map an object on screen to the user's hand in the X and Y dimensions (2D space). I started by implementing the simple cursor tutorial, which uses the Zigfu hand session detector. This part of the API defines a bounding box that captures when the user's hand is controlling the widget and when it is inactive.
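For anyone following along, the shape of the code is roughly this: the cursor control hands you a position inside its bounding box and you map it onto the page. I'm assuming a normalised 0-1 x/y range and a plain callback in this sketch, so treat the names as placeholders rather than the real Zigfu API.

```javascript
// Sketch of driving an on-screen crosshair from a normalised cursor position.
// onCursorMove is a placeholder for whichever cursor callback you wire up;
// I'm assuming x and y arrive in a 0..1 range relative to the bounding box.
var crosshair = document.getElementById('crosshair');

function onCursorMove(x, y) {
  // map the normalised box co-ordinates onto the viewport
  var left = x * (window.innerWidth - crosshair.offsetWidth);
  var top = y * (window.innerHeight - crosshair.offsetHeight);
  crosshair.style.left = Math.round(left) + 'px';
  crosshair.style.top = Math.round(top) + 'px';
}
```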

You can see a video of our first prototype using Zigfu to control a target crosshair on screen.

However, the cursor can drop out of the bounding box far too easily, making it hard to control, and this made the interface feel temperamental or broken. It also confused me that the hand session would sometimes not begin even when a user was clearly engaged. I had initially assumed there was a problem with my code, as I was consistently getting a value of null for engaged users, although I am now of the opinion it may have been a combination of distance and possibly sunlight interference, though we're still not entirely sure whether sunlight is really a factor at all.

To get around the bounding box interaction issue I switched to using the plain skeleton data instead. We were hoping to use the finger joints as our pointer, but we don't seem to be able to pick up joint data for either the right or left fingers, even when the hand session is detected. This means we can't do pointing or finger gestures yet, but we're hoping that might come in future iterations of the library, as the finger joints are already listed in the joints array.
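The fallback looks something like the sketch below: take the hand joint's world position (millimetres with the sensor at the origin, as far as I can tell from the data), normalise it over a comfortable movement range and map that onto the viewport. The RANGE_MM value and the helper itself are mine, not part of Zigfu.

```javascript
// Sketch of turning a raw hand joint position into a 2D screen position.
// handPosition is assumed to be [x, y, z] in millimetres with the sensor at the
// origin; RANGE_MM is the span of hand movement mapped across the whole screen.
var RANGE_MM = 600; // tuned by eye for a comfortable arm sweep

function handToScreen(handPosition) {
  // normalise into 0..1 around the centre; mirror x if the movement feels backwards
  var nx = 0.5 + handPosition[0] / (2 * RANGE_MM);
  var ny = 0.5 - handPosition[1] / (2 * RANGE_MM); // world y goes up, screen y goes down
  nx = Math.max(0, Math.min(1, nx));
  ny = Math.max(0, Math.min(1, ny));
  return {
    left: Math.round(nx * window.innerWidth),
    top: Math.round(ny * window.innerHeight)
  };
}
```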

I also found out, through lengthy and frustrating trial and error, that the device is quite slow to respond to a user's hand gesture. You either have to hold your hand steady or shake it in the centre of the screen to trigger the hand cursor, which introduces a lag, although once the hand gesture is detected the movement can be sped up.

Our final issue, which became a deal breaker when we came to demonstrate the prototype to our project stakeholder, was a persistent judder in the recorded hand movement. I suspected this was caused by a lack of finesse in the skeleton detection passed from OpenNI/NITE to Zigfu.

You can see our initial 3D prototype here, where we implemented the hand position in X, Y, Z co-ordinates coming straight from the skeleton data.

I tried to smooth the judder by using the jQuery animate method, tweening between the positions set on each frame. The animation has to be relatively fast to keep up with the 30fps skeleton updates while still letting the tweening be visible. However, there are issues with this technique: the tweened animations build up an animation queue that is still running after the user stops moving, which introduces exactly the kind of lag in the interface that we did not want.
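For reference, the tweening looked roughly like this. Calling .stop(true) before each animate is the standard jQuery way to throw away queued animations so they don't pile up behind the user, although it trades the lag for slightly harsher motion.

```javascript
// Roughly the shape of the jQuery tweening: on each skeleton frame, tween the
// crosshair towards the latest hand position. stop(true) clears the animation
// queue so tweens don't stack up and lag behind the user.
function tweenCrosshairTo(left, top) {
  $('#crosshair')
    .stop(true) // throw away any queued animations first
    .animate({ left: left, top: top }, 25, 'linear'); // frames arrive at ~30fps, so keep it short
}
```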

We came across several other people reporting the same juddering issue on various forums and discussion boards, so we contacted Zigfu again to see if they could offer us some more help. They've been super quick to respond any time we've asked for help, and we're really indebted to them for all the support they've given us and for sending us their unminified libraries to read through. Zigfu have told us that the hand-point is unstable because the hand joint uses skeleton data that isn't smooth enough for UI usage. They have recently added an external hand-point API which provides a smoothed hand-point from the NITE middleware.
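If you're wondering what "smoothed" means in practice, the simplest version of the idea is an exponential moving average over the raw positions, trading a little latency for a much steadier point. This is purely an illustration of the general technique, not how the NITE hand-point is actually implemented.

```javascript
// Illustration only: exponential moving average over raw hand positions.
// A lower ALPHA gives a smoother but laggier point. This is not the NITE
// implementation, just the general idea behind a smoothed hand-point.
var ALPHA = 0.3;
var smoothed = null; // last smoothed [x, y, z]

function smoothHandPosition(raw) {
  if (!smoothed) {
    smoothed = raw.slice();
  } else {
    for (var i = 0; i < 3; i++) {
      smoothed[i] = ALPHA * raw[i] + (1 - ALPHA) * smoothed[i];
    }
  }
  return smoothed;
}
```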

This video demonstrates our exploration of the new hand-point API, which is considerably smoother and will allow us to progress onto the final part of the prototyping, so we're really excited about the next stages of the project.