Image-based Rendering of Faces

Image-based rendering is a technique that has received considerable interest in computer graphics for the realistic rendering of complex scenes. Instead of modeling the shape, material, and reflectance of objects, as well as light sources and light exchange, with high accuracy and sophisticated physical models, image-based rendering synthesizes new views of a scene by interpolating among multiple images taken with one or more cameras. The use of real pictures leads to natural-looking scenes and allows the reproduction of fine structures (e.g., hair, fur, leaves) that are difficult to model with polygonal representations. In addition, the rendering complexity is independent of the scene content, since interpolation is performed on pixels instead of polygons. As a result, complex scenes can be rendered at limited computational cost.

Although image-based rendering has traditionally been applied to view synthesis of virtual environments, the method can also be applied to dynamic scenes with more degrees of freedom. We use IBR techniques for the natural animation of faces. In contrast to existing approaches, we combine geometry warping with image-based rendering in order to describe global head motion and to render a correct outline even in the presence of hair. To reduce memory requirements, only head turning, which causes the most dominant image changes, is interpolated from a set of initially captured views, whereas other global head motions are represented with a geometry model. Similarly, jaw movement, which affects the silhouette of the person when viewed from the side, is also represented by geometry deformations. Local expressions and the motion of the mouth and eyes are extracted directly from the video, warped to the correct position using the 3D head model, and smoothly blended into the global head texture. The additional use of geometry in image-based rendering drastically reduces the number of images required, while still enabling head rotation of the person as a postprocessing step in applications like virtual conferencing.



Features

  • Realistic rendering of faces
  • Correct handling of silhouette and occluded areas
  • Natural representation of hair
  • Lower demands on the accuracy of the 3D head model
  • Full control over head pose

Image-based View Interpolation

The rendering of new frames is performed by image-based interpolation combined with geometry-based warping. Given a set of facial animation parameters, the frame of the image cube with the closest value of head rotation is selected as the reference frame for warping. Thus, the dominant motion changes are already represented by a real image without any synthetic warping. Deviations of the desired global motion parameters from the values stored during the initialization step are compensated using 3D geometry. This combination of geometry warping with image-based interpolation allows a very flexible trade-off between accuracy and the size of the image cube.
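
The reference-frame selection can be sketched as follows. This is a minimal illustration that assumes, for simplicity, that the image cube is indexed by a single head-turn (yaw) angle per stored frame; the function name `select_reference_frame` and the 10° sampling are hypothetical. The returned residual is the part of the desired rotation that would have to be compensated by geometry-based warping.

```python
import numpy as np

def select_reference_frame(cube_yaws_deg, target_yaw_deg):
    """Return (index, residual) of the stored view closest in head yaw.

    cube_yaws_deg: hypothetical list of head-turn angles (degrees), one per
    frame of the image cube. The residual (degrees) is the rotation that the
    selected real image does not cover and that geometry warping must add.
    """
    yaws = np.asarray(cube_yaws_deg, dtype=float)
    idx = int(np.argmin(np.abs(yaws - target_yaw_deg)))  # nearest stored view
    residual = target_yaw_deg - yaws[idx]
    return idx, residual

# Example: an image cube sampled every 10 degrees of head turn.
cube = np.arange(-40, 41, 10)  # [-40, -30, ..., 40]
idx, residual = select_reference_frame(cube, 23.0)
print(idx, cube[idx], residual)  # 6 20 3.0
```

With nearest-neighbour selection and a 10° spacing, the residual never exceeds 5° within the sampled range. This illustrates the trade-off named above: a denser cube leaves smaller residuals for the geometry to cover, at the cost of more stored images.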

Head translation and head roll can be handled by pure 2D motion; only head pitch requires depth-dependent warping. As long as the rotation angles are small, which is true in most practical situations, the quality of the geometry can be rather poor. Local deformations due to jaw movements are likewise represented by head model deformations. To combine both sources, alpha blending is used to smoothly blend between the warped image and the 3D model.
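
The difference between roll/translation and pitch can be checked numerically. The sketch below assumes a weak-perspective camera (all head points treated as lying at the depth of the head centre) with a hypothetical focal length and head distance, and covers only translation parallel to the image plane. Under these assumptions, roll and in-plane translation reduce exactly to a 2D image transform, whereas pitch moves two points with identical image positions by different amounts depending on their depth. Point coordinates and function names are illustrative, not taken from the system.

```python
import numpy as np

F, Z0 = 500.0, 1000.0  # hypothetical focal length (px) and head distance (mm)

def project(p):
    """Weak perspective: every point is scaled by F / Z0, depth is ignored."""
    return p[:, :2] * (F / Z0)

def rot_x(a):  # pitch: nodding about the horizontal axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):  # roll: tilting about the viewing axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def warp_roll_translate_2d(uv, roll, t_xy):
    """Roll + in-plane translation applied purely in the image plane."""
    r2 = rot_z(roll)[:2, :2]
    return uv @ r2.T + np.asarray(t_xy) * (F / Z0)

# Head-centred 3D points (mm): nose tip and cheek share x, y but differ in depth.
pts = np.array([[0.0, 20.0, -100.0],   # nose tip, closer to the camera
                [0.0, 20.0, -40.0]])   # cheek
uv = project(pts)

# Roll + translation: the 2D warp reproduces the 3D motion exactly here.
roll, t = np.radians(8.0), np.array([15.0, -5.0, 0.0])
uv_3d = project(pts @ rot_z(roll).T + t)
uv_2d = warp_roll_translate_2d(uv, roll, t[:2])
print(np.allclose(uv_3d, uv_2d))  # True

# Pitch: identical image positions move by different amounts, so depth is needed.
uv_pitch = project(pts @ rot_x(np.radians(10.0)).T)
print(uv_pitch - uv)  # vertical shift: about 8.5 px (nose) vs 3.3 px (cheek)
```

In this toy setup, a 20 mm depth error changes the pitch-induced shift by only about 1.7 px at 10°, which illustrates why a coarse head model is sufficient for small rotation angles.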

Realistic rendering of moving eyes and mouths is difficult to achieve. We therefore use the original image data from the camera to achieve realistic animation of facial features. The areas around the eyes and the mouth are cut out of the camera frames, warped to the correct position of the person in the virtual scene using the 3D head model, and smoothly merged into the synthetic representation using alpha blending.
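
The blending step for the eye and mouth regions can be illustrated with a feathered alpha mask. This is a minimal sketch: it assumes the patch has already been warped to its target position (the model-based warp itself is omitted), and the elliptical mask shape, the linear falloff, the `feather` parameter, and the function names are illustrative choices, not details of the described system.

```python
import numpy as np

def feathered_ellipse_mask(h, w, feather=0.25):
    """Alpha mask: 1 inside an ellipse, falling linearly to 0 at its border.

    feather (hypothetical) is the fraction of the normalised radius used for
    the soft transition between patch and background.
    """
    y, x = np.mgrid[0:h, 0:w]
    # Normalised elliptical radius: ~0 at the centre, 1 on the ellipse boundary.
    r = np.sqrt(((x - (w - 1) / 2) / (w / 2)) ** 2 + ((y - (h - 1) / 2) / (h / 2)) ** 2)
    return np.clip((1.0 - r) / feather, 0.0, 1.0)

def blend_patch(frame, patch, top, left, alpha):
    """Alpha-blend an (already warped) eye or mouth patch into the rendered frame."""
    out = np.array(frame, dtype=float)  # copy, so the input frame is untouched
    h, w = patch.shape[:2]
    region = out[top:top + h, left:left + w]
    a = alpha[..., None] if patch.ndim == 3 else alpha  # broadcast over colour channels
    out[top:top + h, left:left + w] = a * patch + (1.0 - a) * region
    return out

# Toy example: paste a bright 20x30 "mouth" patch into a dark 64x64 frame.
frame = np.zeros((64, 64, 3))
patch = np.full((20, 30, 3), 255.0)
out = blend_patch(frame, patch, top=30, left=17, alpha=feathered_ellipse_mask(20, 30))
print(out[39, 31, 0], out[30, 17, 0])  # 255.0 0.0
```

The patch centre is taken entirely from the camera data, while its corners keep the synthetic background, so no hard seam appears at the border of the cut-out region.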


Manipulation of head pose. The video is rendered with a face orientation different from the original camera sequence.


Virtual Video Conferencing

In virtual conferencing, multiple distant participants can meet in a virtual room. The use of a synthetic 3D computer graphics scene allows more than two partners to join the discussion even if they are far apart from each other. Each partner is recorded by a single camera, and the video objects are inserted into the artificial scene. In order to place multiple participants into a common room, viewpoint modification is necessary, which requires information about the 3D structure of the person. This geometry information can be estimated from multiple frames, or a priori knowledge can be utilized in the form of a rough generic head model.

Since a wide range of different head poses is covered in the image cube, large changes relative to the real orientation can be applied later when rendering new views in the virtual scene. This enables many enhancements compared to conventional systems. For small displays, for example, head motion is very small when a user looks at different people in the room. To show the other participants the current focus of a local user, these head motions can be amplified and adjusted to the positions of the chairs at the virtual table. If a user is connected via a conventional terminal without tracking capabilities, synthetic head motion can also be added to produce a visually more pleasing result. For that purpose, we added a speech detector that selects the person currently speaking; the head of the user without tracking capabilities is then turned toward this person. For a video conferencing application, the entire rendering of the 3D scene with image-based warping of the video textures runs in real time at 25 frames per second on a standard PC.
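
The speaker-driven head turning for a terminal without tracking can be sketched as below. The speech detector here is a deliberately simple short-time energy threshold, swapped in for illustration because the actual detector is not described; the seat coordinates, the 16 kHz sample rate, the 40 ms frame length (one video frame at 25 fps), and all function names are hypothetical.

```python
import math
import numpy as np

def active_speaker(audio, threshold=1e-3):
    """Energy-based speech detector (illustrative stand-in).

    audio maps participant -> samples of the current frame; returns the
    participant with the highest mean energy if it exceeds threshold, else None.
    """
    energy = {p: float(np.mean(np.square(x))) for p, x in audio.items()}
    loudest = max(energy, key=energy.get)
    return loudest if energy[loudest] > threshold else None

def facing_center(seat):
    """Direction (degrees) from a seat towards the table centre at (0, 0)."""
    return math.degrees(math.atan2(-seat[1], -seat[0]))

def head_turn_towards(seat_from, seat_to):
    """Yaw (degrees, wrapped to [-180, 180)) that turns a head facing the
    table centre towards another seat."""
    target = math.degrees(math.atan2(seat_to[1] - seat_from[1], seat_to[0] - seat_from[0]))
    return (target - facing_center(seat_from) + 180.0) % 360.0 - 180.0

def synthetic_yaw(user, audio, seats):
    """Head turn for an untracked user: look at the current speaker, else straight ahead."""
    speaker = active_speaker(audio)
    if speaker is None or speaker == user:
        return 0.0
    return head_turn_towards(seats[user], seats[speaker])

# Four seats around a round virtual table (hypothetical layout).
seats = {"A": (0.0, -1.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (-1.0, 0.0)}
rng = np.random.default_rng(0)
t = np.arange(640) / 16000.0  # one 40 ms audio frame at 16 kHz
audio = {p: 0.001 * rng.standard_normal(t.size) for p in seats}  # background noise
audio["B"] = 0.3 * np.sin(2 * np.pi * 200.0 * t)  # participant B is talking

print(active_speaker(audio), round(synthetic_yaw("A", audio, seats), 1))  # B -45.0
```

A real system would smooth the detector output over several frames so that the synthetic head does not jump between speakers on short pauses.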


Virtual video conferencing system. Head pose of participants can be modified in order to illustrate inter-person communication of distant partners.


Publications

P. Eisert and J. Rurainsky,
"Geometry Assisted Image-based Rendering for Facial Analysis and Synthesis," Signal Processing: Image Communication, vol. 21, no. 6, pp. 493-505, July 2006.

P. Eisert and J. Rurainsky,
"Image-based Rendering and Tracking of Faces," Proc. International Conference on Image Processing (ICIP05), Genova, Italy, pp. 1037-1040, September 2005.


Contact

Dr. Peter Eisert
Email:
Phone: +49 30 31002 614
Fraunhofer Institute for Telecommunications
Einsteinufer 37
D-10587 Berlin
Germany