Image-based rendering is a technique that has received considerable
interest in computer graphics for the realistic rendering of complex
scenes. Instead of modeling the shape, material, and reflectance of
objects, as well as light sources and light exchange, with high accuracy
and sophisticated physical models, image-based rendering synthesizes new
views of a scene by interpolating among multiple images taken with one
or more cameras. The use of real pictures leads to natural-looking
scenes and allows the reproduction of fine structures (e.g., hair, fur,
leaves) that are difficult to model with polygonal representations.
Also, the rendering complexity is independent of the scene content,
since interpolation is performed on pixels instead of polygons. As a
result, complex scenes can be rendered with limited computational
effort.
Although image-based rendering has traditionally been applied to view synthesis
of virtual environments, the method can also be applied to dynamic scenes
with more degrees of freedom. We use image-based rendering techniques for
the natural animation of faces. In contrast to existing approaches, we
combine geometry warping with image-based rendering in order to describe
global head motion and to render a correct outline even in the presence
of hair. In order to reduce memory requirements, only head turning,
which causes the most dominant image changes, is interpolated from a set
of initially captured views, whereas other global head motions are
represented with a geometry model. Similarly, jaw movement, which
affects the silhouette of the person viewed from the side, is also
represented by geometry deformations. Local expressions and motion of
the mouth and eyes are extracted directly from the video, warped to the
correct position using the 3D head model, and smoothly blended into the
global head texture. The additional use of geometry in image-based
rendering greatly reduces the number of images required and also enables
the head of the person to be rotated as a postprocessing step in
applications like virtual conferencing.
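The memory argument for storing only head turning as real images can be made concrete with a back-of-the-envelope calculation. The sketch below assumes uncompressed CIF-sized RGB views (352 x 288 pixels) and a sampling of ±20 degrees in 2-degree steps per rotation axis; these numbers and the helper names samples and cube_bytes are illustrative assumptions, not parameters of the system described here. It compares an image cube that samples only head turning with one that would also sample pitch and roll.

```python
# Hypothetical parameters for illustration; not taken from the system above.
WIDTH, HEIGHT, CHANNELS = 352, 288, 3     # assumed CIF-sized RGB views
BYTES_PER_VIEW = WIDTH * HEIGHT * CHANNELS


def samples(range_deg: float, step_deg: float) -> int:
    """Number of views that sample one rotation axis over range_deg at step_deg."""
    return int(round(range_deg / step_deg)) + 1


def cube_bytes(n_axes: int, range_deg: float = 40.0, step_deg: float = 2.0) -> int:
    """Uncompressed size of an image cube that samples n_axes rotation axes."""
    return samples(range_deg, step_deg) ** n_axes * BYTES_PER_VIEW


yaw_only = cube_bytes(1)   # only head turning is stored as real images
all_axes = cube_bytes(3)   # yaw, pitch and roll all stored as real images
print(f"yaw only:       {yaw_only / 2**20:.1f} MiB")
print(f"yaw+pitch+roll: {all_axes / 2**30:.1f} GiB")
```

With these assumed numbers the script reports about 6.1 MiB for the yaw-only cube versus about 2.6 GiB for the three-axis cube, since every additional sampled axis multiplies the number of views.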

Features
- Realistic rendering of faces
- Correct handling of silhouette and occluded areas
- Natural representation of hair
- Lower demands on the accuracy of the 3D head model
- Full control over head pose
Image-based View Interpolation
The rendering of new frames is performed by image-based interpolation
combined with geometry-based warping. Given a set of facial animation
parameters, the frame of the image cube with the closest value of head
rotation is selected as the reference frame for warping. Thus, the
dominant motion changes are already represented by a real image
without any synthetic warping. Deviations of the desired global motion
parameters from the values stored during the initialization step are
compensated using 3D geometry. This combination of geometry warping
with image-based interpolation allows a very flexible trade-off between
accuracy and size of the image cube.
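The reference-frame selection can be sketched as a nearest-neighbor lookup over the head rotations stored in the image cube, followed by a subtraction that yields the residual pose to be compensated with 3D geometry. This is a minimal sketch, assuming each stored view carries the global pose estimated for it during initialization; the Pose fields, their units, and the function select_reference are hypothetical, and selection uses yaw (head turning) only, since that is the rotation sampled by the image cube.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose:
    """Global head pose; field names and units are hypothetical."""
    yaw: float = 0.0    # head turning in degrees (sampled by the image cube)
    pitch: float = 0.0  # degrees
    roll: float = 0.0   # degrees
    tx: float = 0.0     # horizontal translation in pixels
    ty: float = 0.0     # vertical translation in pixels

    def minus(self, other: "Pose") -> "Pose":
        """Component-wise difference self - other."""
        return Pose(self.yaw - other.yaw, self.pitch - other.pitch,
                    self.roll - other.roll, self.tx - other.tx,
                    self.ty - other.ty)


def select_reference(cube: List[Pose], target: Pose) -> Tuple[int, Pose]:
    """Return the index of the stored view whose yaw is closest to the target,
    and the residual pose that geometry-based warping still has to apply."""
    if not cube:
        raise ValueError("image cube is empty")
    best = min(range(len(cube)), key=lambda i: abs(cube[i].yaw - target.yaw))
    return best, target.minus(cube[best])


# Usage: views captured every 5 degrees of yaw during initialization.
cube = [Pose(yaw=float(a)) for a in range(-20, 21, 5)]
idx, residual = select_reference(cube, Pose(yaw=12.0, pitch=3.0))
print(idx, residual)
```

Because the nearest stored yaw is chosen, the residual yaw never exceeds half the yaw spacing of the cube for targets inside the sampled range. A denser cube therefore shrinks the angle the geometry has to cover at the cost of memory, which is one way to read the accuracy/size trade-off.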
Head translation and head roll can be handled by pure 2D motion; only
head pitch requires depth-dependent warping. As long as the remaining
rotation angles are small, which is true in most practical situations,
the quality of the geometry can be rather poor. Local deformations due
to jaw movements are also represented by head model deformations. To
combine both sources, alpha blending is used to smoothly blend between
the warped image and the 3D model.
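The difference between the two kinds of warping can be illustrated on point coordinates. In the sketch below, roll and translation are a 2D rotation plus shift in the image plane, while pitch back-projects each pixel using a depth value (as could be rendered from a head model), rotates it about the x-axis through an assumed head center, and re-projects it with a pinhole camera. The function names, the pinhole intrinsics (focal length f, principal point), and the rotation pivot are assumptions for illustration; a real renderer would warp a whole texture rather than a few points, and the sign of the pitch angle depends on the chosen axis convention.

```python
import numpy as np


def warp_roll_translation(pts, roll_deg, t, center):
    """Pure 2D warp of N x 2 pixel coordinates: rotate by the roll angle about
    `center`, then shift by the translation `t`. No depth is needed."""
    a = np.deg2rad(roll_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return (pts - center) @ R.T + center + t


def warp_pitch(pts, depth, pitch_deg, f, principal, pivot):
    """Depth-dependent warp: back-project N x 2 pixels with per-pixel depth,
    rotate about the x-axis through the 3D point `pivot`, and re-project
    with a pinhole camera of focal length f."""
    a = np.deg2rad(pitch_deg)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), -np.sin(a)],
                   [0.0, np.sin(a),  np.cos(a)]])
    xy = (pts - principal) * depth[:, None] / f   # camera-space X, Y
    P = np.column_stack([xy, depth])              # N x 3 camera-space points
    P = (P - pivot) @ Rx.T + pivot                # rotate about the head center
    return P[:, :2] / P[:, 2:3] * f + principal   # perspective projection


# Usage: the same pixel at two different depths moves differently under pitch,
# whereas roll and translation never look at depth.
principal = np.array([160.0, 120.0])
pts = np.array([[160.0, 130.0], [160.0, 130.0]])
depth = np.array([1.0, 2.0])
pivot = np.array([0.0, 0.0, 1.5])
print(warp_pitch(pts, depth, 10.0, 500.0, principal, pivot))
print(warp_roll_translation(pts, 5.0, np.array([2.0, 0.0]), principal))
```

A zero pitch angle returns the input pixels unchanged, and the depth-dependent part of the displacement grows with the angle. This is consistent with the observation that small rotations tolerate a rough geometry.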
Realistic rendering of moving eyes and mouth is difficult to achieve.
We therefore use the original image data from the camera to achieve
realistic animation of facial features. The areas around the eyes and
the mouth are cut out from the camera frames, warped to the correct
position of the person in the virtual scene using the 3D head model,
and smoothly merged into the synthetic representation using alpha
blending.
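Alpha blending with a soft mask is a common way to merge a cut-out region into a synthetic image without visible seams; the same operation can blend the warped image with the 3D model. The sketch below is a minimal version, assuming the eye or mouth patch has already been warped to its target position. The linear edge feathering, the border width, and the names feathered_mask and blend_patch are hypothetical choices, not the mask used in the described system.

```python
import numpy as np


def feathered_mask(h, w, border):
    """Alpha mask of size h x w: 0 on the patch boundary, rising linearly to 1
    at `border` pixels from the nearest edge (a hypothetical feathering choice)."""
    ys = np.minimum(np.arange(h), np.arange(h)[::-1])  # distance to top/bottom edge
    xs = np.minimum(np.arange(w), np.arange(w)[::-1])  # distance to left/right edge
    dist = np.minimum.outer(ys, xs).astype(float)     # distance to nearest edge
    return np.clip(dist / border, 0.0, 1.0)


def blend_patch(base, patch, top, left, border=4):
    """Alpha-blend `patch` (already warped to its target position) into `base`:
    out = alpha * patch + (1 - alpha) * base inside the patch rectangle."""
    out = base.astype(float)  # astype returns a copy, so `base` is untouched
    h, w = patch.shape[:2]
    alpha = feathered_mask(h, w, border)
    if patch.ndim == 3:       # broadcast the mask over color channels
        alpha = alpha[:, :, None]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * patch + (1.0 - alpha) * region
    return out


# Usage with stand-in data: a dark synthetic face and a bright mouth region.
face = np.zeros((30, 30))
mouth = np.ones((20, 20))
result = blend_patch(face, mouth, top=5, left=5)
```

Pixels on the patch boundary keep the synthetic value, pixels at least border pixels inside the patch take the camera data, and the linear ramp in between hides the seam.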

Manipulation of head pose. The video is rendered with a face orientation different from the original camera sequence.
Virtual Video Conferencing
In virtual conferencing, multiple distant participants can meet in a
virtual room. The use of a synthetic 3D computer graphics scene allows
more than two partners to join the discussion even if they are far
apart from each other. Each partner is recorded by a single camera, and
the video objects are inserted into the artificial scene. In order to
place multiple participants into a common room, viewpoint modification
is necessary, which requires information about the 3D structure of the
person. This geometry information can either be estimated from multiple
frames or supplied as a priori knowledge in the form of a rough generic
head model.
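How much viewpoint modification a participant needs depends on where that person sits in the virtual room relative to the viewer. The sketch below assumes a hypothetical layout: participants evenly spaced around a round table, each recorded frontally so that the recorded direction points toward the table center. It computes the horizontal angle between the recorded direction and the direction to a viewer seated at the same table. The layout, the (x, z) floor coordinates, and the function names are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, z) position on the floor plane, top view


def seat_positions(n: int, radius: float) -> List[Point]:
    """Place n participants evenly around a round virtual table at the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def viewpoint_change_deg(seat: Point, viewer: Point,
                         table_center: Point = (0.0, 0.0)) -> float:
    """Horizontal angle between the direction a participant was recorded from
    (assumed frontal, i.e. toward the table center) and the direction toward
    the virtual viewer, wrapped to [-180, 180] degrees."""
    fx, fz = table_center[0] - seat[0], table_center[1] - seat[1]
    vx, vz = viewer[0] - seat[0], viewer[1] - seat[1]
    ang = math.atan2(vz, vx) - math.atan2(fz, fx)
    return math.degrees(math.atan2(math.sin(ang), math.cos(ang)))


# Usage: four participants; the local viewer sits at seat 0.
seats = seat_positions(4, 1.0)
for k in range(1, 4):
    print(k, viewpoint_change_deg(seats[k], seats[0]))
```

With four seats, the viewer's two neighbors must be rendered about 45 degrees away from their recorded frontal view, in opposite directions, while the person directly opposite is seen frontally. Rotations of this size change which parts of the head are visible, which is why 3D information about the person is needed.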
Since a wide range of different head poses is covered in the image
cube, large deviations from the real orientation can later be applied
when rendering new views in the virtual scene. This enables many
enhancements compared to conventional systems. With small displays, for
example, head motion is very small when a user looks at different
people in the room. In order to show the other participants the current
focus of a local user, these head motions can be amplified and adjusted
to the positions of the chairs at the virtual table. If a user is
connected via a conventional terminal without tracking capabilities,
synthetic head motion can be added to produce a visually more pleasing
result. For that purpose, we added a speech detector that selects the
person currently speaking. The head of the user without tracking
capabilities is then turned toward this person. For a video
conferencing application, the entire rendering of the 3D scene with
image-based warping of the video textures runs in real time at 25
frames per second on a standard PC.
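The head-motion enhancements in this paragraph can be sketched with three small functions: a naive energy-based speech detector, synthetic turning of an untracked user's head toward the detected speaker, and amplification of a small tracked head turn that is then snapped to the nearest chair direction. The detector, the energy threshold, the gain, and the per-chair yaw table are all hypothetical; the source does not specify how its speech detector or the chair adjustment work.

```python
from typing import Dict, Optional, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one short audio frame."""
    return sum(s * s for s in samples) / max(len(samples), 1)


def active_speaker(frames: Dict[str, Sequence[float]],
                   threshold: float) -> Optional[str]:
    """Naive speech detector: the participant whose current audio frame has
    the highest energy is the speaker, if that energy exceeds `threshold`."""
    if not frames:
        return None
    name = max(frames, key=lambda n: frame_energy(frames[n]))
    return name if frame_energy(frames[name]) > threshold else None


def untracked_head_yaw(speaker: Optional[str], chair_yaw: Dict[str, float],
                       current_yaw: float) -> float:
    """Synthetic head motion for a user without tracking: turn toward the
    speaker's chair, or keep the current yaw if no known participant speaks."""
    if speaker is None or speaker not in chair_yaw:
        return current_yaw
    return chair_yaw[speaker]


def enhance_yaw(tracked_yaw: float, gain: float,
                chair_yaws: Sequence[float]) -> float:
    """Amplify a small tracked head turn and snap it to the nearest chair direction."""
    wanted = gain * tracked_yaw
    return min(chair_yaws, key=lambda y: abs(y - wanted))


# Usage: yaw in degrees that user "A" needs to face each other participant.
chairs_seen_from_a = {"B": -45.0, "C": 0.0, "D": 45.0}
audio = {"B": [0.0] * 160, "C": [0.01] * 160, "D": [0.5, -0.5] * 80}
speaker = active_speaker(audio, threshold=1e-3)
print(speaker, untracked_head_yaw(speaker, chairs_seen_from_a, 0.0))
print(enhance_yaw(8.0, 5.0, list(chairs_seen_from_a.values())))
```

A practical detector would also smooth its decision over time so that the head does not jump between participants on short noise bursts; that is omitted here for brevity.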

Virtual video conferencing system. Head pose of participants can be modified in order to illustrate inter-person communication of distant partners.
Publications
P. Eisert and J. Rurainsky,
"Geometry Assisted Image-based Rendering for Facial Analysis and Synthesis,"
Signal Processing: Image Communication,
vol. 21, no. 6, pp. 493-505, July 2006.
P. Eisert and J. Rurainsky,
"Image-based Rendering and Tracking of Faces,"
Proc. International Conference on Image Processing (ICIP05),
Genova, Italy, pp. 1037-1040, September 2005.
Contact
Dr. Peter Eisert
Email: eisert@hhi.fhg.de
Phone: +49 30 31002 614
Fraunhofer Institute for Telecommunications
Einsteinufer 37
D-10587 Berlin
Germany