Thursday, November 17, 2011

AFX Extensions for Humanoid Animation


The new Humanoid Animation Framework, defined by MPEG-4 SNHC (Preda,
2002; Preda & Prêteux, 2001), is specified as a biomechanical model in AFX and
is based on a rigid skeleton. The skeleton consists of bones, which are rigid
objects that can be transformed (rotated around specific joints), but not deformed.
Attached to the skeleton, a skin model is defined, which smoothly
follows any skeleton movement.
More specifically, defining a skinned model involves specifying its static and
dynamic (animation) properties. From a geometric point of view, a skinned model
consists of a single list of vertices, connected as an indexed face set. All the
shapes, which form the skin, share the same list of vertices, thus avoiding seams
at the skin level during animation. However, each skin facet can contain its own
set of color, texture and material attributes.
The dynamic properties of a skinned model are defined by means of a skeleton
and its properties. The skeleton is a hierarchical structure constructed from
bones, each having an influence on the skin surface. When bone position or
orientation changes, e.g., by applying a set of Body Animation Parameters,
specific skin vertices are affected. For each bone, the list of vertices affected
by the bone motion and corresponding weight values are provided. The weighting
factors can be specified either explicitly for each vertex or more compactly by
defining two influence regions (inner and outer) around the bone. The new
position of each vertex is calculated by taking into account the influence of each
bone, with the corresponding weight factor. BAPs are now applied to bone nodes
and the new 3D position of each point in the global seamless mesh is computed
as a weighted combination of the related bone motions.
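The weighted combination described above can be sketched as follows. This is a minimal illustration of weighted vertex blending, not the normative AFX computation; the function names, the two-bone arm and the equal weights are assumptions made only for the example.

```python
import math

def rotate_z_about(point, center, angle):
    """Rotate a 3D point about a z-axis passing through `center`."""
    x, y, z = (point[i] - center[i] for i in range(3))
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y, center[2] + z)

def skin_vertex(rest_position, influences):
    """Blend the positions proposed by each influencing bone.

    `influences` is a list of (weight, bone_transform) pairs, where
    bone_transform maps a rest-pose point to its posed location.
    Weights are normalised so that they sum to one.
    """
    total = sum(weight for weight, _ in influences)
    if total == 0:
        raise ValueError("vertex has no bone influence")
    blended = [0.0, 0.0, 0.0]
    for weight, transform in influences:
        moved = transform(rest_position)
        for axis in range(3):
            blended[axis] += (weight / total) * moved[axis]
    return tuple(blended)

# Hypothetical two-bone arm: the upper arm stays still while the
# forearm bends 90 degrees about the elbow located at (1, 0, 0).
upper_arm = lambda p: p
forearm = lambda p: rotate_z_about(p, (1.0, 0.0, 0.0), math.pi / 2)

vertex = (2.0, 0.0, 0.0)
skinned = skin_vertex(vertex, [(0.5, upper_arm), (0.5, forearm)])
```

With equal weights the vertex lands halfway between the two positions proposed by the bones, which is what keeps the skin continuous across the joint.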
The skinned model definition can also be enriched with inverse kinematics-related
data. Then, bone positions can be determined by specifying only the
position of an end effector, e.g., a 3D point on the skinned model surface. No
specific inverse kinematics solver is imposed, but specific constraints at bone
level are defined, e.g., related to the rotation or translation of a bone in a certain
direction. Also muscles, i.e., NURBS curves with an influence region on the
model skin, are supported. Finally, interpolation techniques, such as simple linear
interpolation or linear interpolation between two quaternions (Preda & Prêteux,
2001), can be exploited for key-value-based animation and animation compression.
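As a rough illustration of the two interpolation schemes mentioned above, the sketch below interpolates between two key values. It assumes quaternions stored as (w, x, y, z) tuples and renormalises the linearly interpolated quaternion so that it remains a valid rotation; the renormalisation step and the function names are choices made for this example, not details taken from the cited work.

```python
import math

def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))

def quaternion_lerp(q0, q1, t):
    """Linear interpolation between two unit quaternions, renormalised."""
    dot = sum(x * y for x, y in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = tuple(-c for c in q1)
    q = lerp(q0, q1, t)
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

# Two hypothetical key values: no rotation, and 90 degrees about the z-axis.
key_start = (1.0, 0.0, 0.0, 0.0)
key_end = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))

halfway = quaternion_lerp(key_start, key_end, 0.5)
```

At the midpoint this yields the 45 degree rotation about z. Away from the midpoint the angular speed is not exactly constant, which is the usual trade-off of linear quaternion interpolation against spherical interpolation.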

Face and Body Animation in the MPEG-4 Standard


The MPEG-4 SNHC (Synthetic and Natural Hybrid Coding) group has standardized
two types of streams in order to animate avatars:
• The Face/Body Definition Parameters (FDP/BDP) are avatar-specific
and based on the H-anim specifications. More precisely the MPEG-4 BDP
Node contains the H-anim Humanoid Node.
• The Face/Body Animation Parameters (FAP/BAP) are used to animate
face/body models. More specifically, 168 Body Animation Parameters
(BAPs) are defined by MPEG-4 SNHC to describe almost any possible
body posture. A single set of FAPs/BAPs can be used to describe the face/
body posture of different avatars. MPEG-4 has also standardized the
compressed form of the resulting animation stream using two techniques:
DCT-based or prediction-based. Typical bit-rates for these compressed
bit-streams are two kbps for the case of facial animation or 10 to 30 kbps
for the case of body animation.
In addition, complex 3D deformations that can result from the movement of
specific body parts (e.g., muscle contraction, clothing folds, etc.) can be modeled
by using Face/Body Animation Tables (FAT/BATs). These tables specify a set
of vertices that undergo non-rigid motion and a function to describe this motion
with respect to the values of specific FAPs/BAPs. However, a significant
problem with using FAT/BAT Tables is that they are body model-dependent and
require a complex modeling stage. On the other hand, BATs can prevent
undesired body animation effects, such as broken meshes between two linked
segments. In order to solve such problems, MPEG-4 addresses new animation
functionalities in the framework of the AFX group (a preliminary specification was
released in January 2002) by also including a generic seamless virtual model
definition and bone-based animation. Particularly, the AFX specification describes
state-of-the-art components for rendering geometry, textures, volumes
and animation. A hierarchy of geometry, modeling, physics and biomechanical
models is described along with advanced tools for animating these models.
a maximum 3-D displacement for each vertex. An application may uniformly
scale these displacements before applying them to the corresponding vertices.
For example, this field is used to implement Facial Definition and Animation
Parameters of the MPEG-4 standard (FDP/FAP).
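The uniform scaling of displacements mentioned above can be sketched as follows. The function and argument names are hypothetical, chosen to mirror the idea of a Displacer node holding vertex indices and per-vertex maximum displacements; this is an illustration, not an H-anim implementation.

```python
def apply_displacer(coords, coord_index, displacements, scale=1.0):
    """Return a copy of `coords` with scaled displacements applied.

    coords        -- list of (x, y, z) vertices of a segment
    coord_index   -- indices of the vertices the displacer affects
    displacements -- maximum (dx, dy, dz) for each affected vertex
    scale         -- uniform factor applied to every displacement
    """
    if len(coord_index) != len(displacements):
        raise ValueError("one displacement is required per indexed vertex")
    moved = list(coords)
    for index, (dx, dy, dz) in zip(coord_index, displacements):
        x, y, z = moved[index]
        moved[index] = (x + scale * dx, y + scale * dy, z + scale * dz)
    return moved

# Hypothetical segment with three vertices; the displacer moves two of them.
segment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
half_raised = apply_displacer(segment, [1, 2], [(0.0, 0.2, 0.0), (0.0, 0.4, 0.0)], scale=0.5)
```

A scale of 0 leaves the segment untouched and a scale of 1 applies the full displacement, so a single animated value can drive the whole feature.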
Finally, the H-anim 2001 standard does not introduce any major changes, e.g.,
new nodes, but provides better support of deformation engines and animation
tools. Additional fields are provided in the Humanoid and the Joint nodes to
support continuous mesh avatars and a more general context-free grammar is
used to describe the standard (instead of pure VRML97, which is used in the two
older H-anim standards). More specifically, a skeletal hierarchy can be defined
for each H-anim humanoid figure within a Skeleton field of the Humanoid node.
Then, an H-anim humanoid figure can be defined as a continuous piece of
geometry, within a Skin field of the Humanoid node, instead of a set of discrete
segments (corresponding to each body part), as in the previous versions. This
Skin field contains an indexed face set (coordinates, topology and normals of skin
nodes). Each Joint node also contains a SkinCoordWeight field, i.e., a list of
floating point values, which describes the amount of “weighting” that should be
used to affect a particular vertex from a SkinCoord field of the Humanoid node.
Each item in this list has a corresponding index value in the SkinCoordIndex
field of the Joint node, which indicates exactly which coordinate is to be
influenced.
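The pairing between the two per-joint lists can be illustrated with a small sketch. Plain dictionaries stand in for Joint nodes here, and the joint names and weight values are made up for the example; only the index-to-weight correspondence comes from the description above.

```python
def weights_per_vertex(joints):
    """Collect, for every skin vertex, the joints that influence it.

    `joints` maps a joint name to a dict with two parallel lists:
    "skin_coord_index" (vertex indices) and "skin_coord_weight"
    (the weight this joint applies to the vertex at the same position).
    """
    per_vertex = {}
    for name, joint in joints.items():
        indices = joint["skin_coord_index"]
        weights = joint["skin_coord_weight"]
        if len(indices) != len(weights):
            raise ValueError("index and weight lists differ in length for " + name)
        for index, weight in zip(indices, weights):
            per_vertex.setdefault(index, {})[name] = weight
    return per_vertex

# Hypothetical data: vertex 2 lies near the elbow and is shared by both joints.
joints = {
    "l_shoulder": {"skin_coord_index": [0, 1, 2], "skin_coord_weight": [1.0, 1.0, 0.5]},
    "l_elbow": {"skin_coord_index": [2, 3], "skin_coord_weight": [0.5, 1.0]},
}
influences = weights_per_vertex(joints)
```

Inverting the per-joint lists into a per-vertex table like this is convenient when the skin is deformed vertex by vertex.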

The Web3D H-Anim Standards


The Web3D H-anim working group (H-anim) was formed so that developers
could agree on a standard naming convention for human body parts and joints.
The human form has been studied for centuries and most of the parts already
have medical (or Latin) names. This group has produced the Humanoid
Animation Specification (H-anim) standards, describing a standard way of
representing humanoids in VRML. These standards allow humanoids created
using authoring tools from one vendor to be animated using tools from another.
H-anim humanoids can be animated using keyframing, inverse kinematics,
performance animation systems and other techniques. The three main design
goals of H-anim standards are:
• Compatibility: Humanoids should be able to display/animate in any VRML
compliant browser.
• Flexibility: No assumptions are made about the types of applications that
will use humanoids.
• Simplicity: The specification should contain only what is absolutely necessary.
Up to now, three H-anim standards have been produced, following developments
in VRML standards, namely the H-anim 1.0, H-anim 1.1 and H-anim 2001
standards.
The H-anim 1.0 standard specified a standard way of representing humanoids in
VRML 2.0 format. The VRML Humanoid file contains a set of Joint nodes, each
defining the rotation center of a joint, which are arranged to form a hierarchy.
The most common implementation for a joint is a VRML Transform node, which
is used to define the relationship of each body segment to its immediate parent,
although more complex implementations can also be supported. Each Joint node
can contain other Joint nodes and may also contain a Segment node, which
contains information about the 3D geometry, color and texture of the body part
associated with that joint. Joint nodes may also contain hints for inverse-kinematics
systems that wish to control the H-anim figure, such as the upper and
lower joint limits, the orientation of the joint limits, and a stiffness/resistance
value. The file also contains a single Humanoid node, which stores human-readable
data about the humanoid, such as author and copyright information. This
node also stores references to all the Joint and Segment nodes. Additional nodes
can be included in the file, such as Viewpoints, which may be used to display the
figure from several different perspectives.
The H-anim 1.1 standard has extended the previous version in order to specify
humanoids in the VRML97 standard (successor of VRML 2.0). New features
include Site nodes, which define specific locations relative to the segment, and
Displacer nodes that specify which vertices within the segment correspond to
a particular feature or configuration of vertices. Furthermore, a Displacer node
may contain “hints” as to the direction in which each vertex should move, namely


3D Human Body Coding Standards


As mentioned in the previous section, an HBM consists of a number of
segments that are connected to each other by joints. This physical structure can
be described in many different ways. However, in order to animate or interchange
HBMs, a standard representation is required. This standardization allows
compatibility between different HBM processing tools (e.g., HBMs created
using one editing tool could be animated using another completely different tool).
In the following, the Web3D H-anim standards, the MPEG-4 face and body
animation, as well as MPEG-4 AFX extensions for humanoid animation, are
briefly introduced.
1989). A mathematical model will include the parameters that describe the links,
as well as information about the constraints associated with each joint. A model
that only includes this information is called a kinematic model and describes the
possible static states of a system. The state vector of a kinematic model consists
of the model state and the model parameters. A system in motion is modeled
when the dynamics of the system are modeled as well. A dynamic model
describes the state evolution of the system over time. In a dynamic model, the
state vector includes linear and angular velocities, as well as position (Wren &
Pentland, 1998).
After selecting an appropriate model for a particular application, it is necessary
to develop a concise mathematical formulation for a general solution to the
kinematics and dynamics problems, which are non-linear. Different
formalisms have been proposed in order to assign local reference frames to the
links. The simplest approach is to introduce joint hierarchies formed by independent
articulation of one DOF, described in terms of Euler angles. Hence, the body
posture is synthesized by concatenating the transformation matrices associated
with the joints, starting from the root. Despite the fact that this formalism suffers
from singularities, Delamarre & Faugeras (2001) propose the use of compositions
of translations and rotations defined by Euler angles. They solve the
singularity problems by reducing the number of DOFs of the articulation.
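The concatenation of joint transformations starting from the root can be sketched as follows. This is an illustrative forward-kinematics routine under simple assumptions (one rotational DOF per joint, homogeneous 4x4 matrices, a made-up three-joint arm); it is not the formulation used by the cited authors.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def rotation(axis, angle):
    """Homogeneous rotation about one coordinate axis (one DOF)."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]
    if axis == "y":
        return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]
    if axis == "z":
        return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    raise ValueError("axis must be 'x', 'y' or 'z'")

def joint_positions(chain):
    """World position of each joint in a root-to-tip chain.

    Each entry is (offset_from_parent, rotation_axis, angle); the world
    transform of a joint is the product of all transforms above it.
    """
    world = translation(0, 0, 0)  # identity matrix
    positions = []
    for offset, axis, angle in chain:
        local = mat_mul(translation(*offset), rotation(axis, angle))
        world = mat_mul(world, local)
        positions.append((world[0][3], world[1][3], world[2][3]))
    return positions

# Hypothetical arm: shoulder raised 90 degrees, elbow bent back by 90 degrees.
arm = [
    ((0.0, 0.0, 0.0), "z", math.pi / 2),   # shoulder (root)
    ((1.0, 0.0, 0.0), "z", -math.pi / 2),  # elbow
    ((1.0, 0.0, 0.0), "z", 0.0),           # wrist
]
shoulder, elbow, wrist = joint_positions(arm)
```

Because every joint inherits the product of its ancestors' matrices, changing the shoulder angle alone moves both the elbow and the wrist.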

Sappa, Aifanti, Grammalidis & Malassiotis


each arm and each leg. The illustration presented in Figure 1 (left) corresponds
to an articulated model defined by 22 DOF.
On the contrary, in computer graphics, highly accurate representations consisting
of more than 50 DOF are generally selected. Aubel, Boulic & Thalmann
(2000) propose an articulated structure composed of 68 DOF. They correspond
to the real human joints, plus a few global mobility nodes that are used to orient
and position the virtual human in the world.
The simplest 3D articulated structure is a stick representation with no associated
volume or surface (Figure 1 (left)). Planar 2D representations, such as the
cardboard model, have also been widely used (Figure 1 (right)). However,
volumetric representations are preferred in order to generate more realistic
models (Figure 2). Different volumetric approaches have been proposed,
depending upon whether the application is in the computer vision or the computer
graphics field. On one hand, in computer vision, where the model is not the
purpose, but the means to recover the 3D world, there is a trade-off between
accuracy of representation and complexity. The utilized models should be quite
realistic, but they should have a low number of parameters in order to be
processed in real-time. Volumetric representations such as parallelepipeds

3D Human Body Modeling


Modeling a human body first implies the adaptation of an articulated 3D
structure, in order to represent the human body biomechanical features. Secondly,
it implies the definition of a mathematical model used to govern the
movements of that articulated structure.
Several 3D articulated representations and mathematical formalisms have been
proposed in the literature to model both the structure and movements of a human
body. An HBM can be represented as a chain of rigid bodies, called links,
interconnected to one another by joints. Links are generally represented by
means of sticks (Barron & Kakadiaris, 2000), polyhedrons (Yamamoto et al.,
1998), generalized cylinders (Cohen, Medioni & Gu, 2001) or superquadrics
(Gavrila & Davis, 1996). A joint interconnects two links by means of rotational
motions about the axes. The number of independent rotation parameters will
define the degrees of freedom (DOF) associated with a given joint. Figure 1
(left) presents an illustration of an articulated model defined by 12 links (sticks)
and ten joints.
In computer vision, where models with only medium precision are required,
articulated structures with fewer than 30 DOF are generally adequate. For
example, Delamarre & Faugeras (2001) use a model of 22 DOF in a multi-view
tracking system. Gavrila & Davis (1996) also propose the use of a 22-DOF
model without modeling the palm of the hand or the foot and using a rigid head-torso
approximation. The model is defined by three DOFs for the positioning of
the root of the articulated structure, three DOFs for the torso and four DOFs for

An Ode To Conceptualization:

(There is nothing like your first love, or the moment a great idea comes to you) . . .
Pencil poised,
gazing into space,
listening to my thoughts;
waiting to hear
what they have to say.
Soul searching,
that is soul soothing.
A special process
of self examination
and introspection.
Thinking of thoughts
thought about already,
or waiting to be thought.
As I think about these thoughts,
I am the thoughts