Electronic Imaging 2005
By John Latta
San Jose, CA
January 17 – 20, 2005
This is the combined technology event of The Society for
Imaging Science and Technology and SPIE (The International
Society for Optical Engineering). Electronic Imaging runs 6
programs in parallel, and the conference description runs to
232 pages. Thus, the task is to find the sessions and
presentations of highest interest.
It is no longer just imaging but now mobile media. We saw it
in presentations on 3D animation and digital photography. The
large market for personal-use media is reshaping the
ecosystem, from content production to delivery. There is much
more to making media happen than taking or receiving a
picture. Welcome to the Electronic Media conference.
3D stereo technology was everywhere. Many systems required
glasses, but there were quite a few autostereoscopic displays.
Creating a 3D display technology that gains a real presence in
the market holds a magic appeal, and here at Electronic Imaging
there was no lack of effort.
Roadmap for CMOS Image Sensors
Professor Peter Catrysse of the Department of Electrical
Engineering, Stanford University, gave an overview of the
issues around CMOS sensors. This was basically a summary of
the research efforts of the students in the department. An
important part of this research is a tool called the Image
System Evaluation Tool (ISET), which is basically an image
chain tool. It includes the following chain elements:
Original Scene
Optics
Sensor
Image Processing
Perceptual Evaluation
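To make the chain idea concrete, here is a toy Python/NumPy sketch of the five stages as composable functions. ISET itself is a separate toolbox and none of these names come from it; the bar-target scene, the box-blur "optics", the sensor parameters (a 5,000-electron full well and 10-bit output) and the RMS-error "evaluation" are all illustrative assumptions, with RMS error standing in for a real perceptual metric.

```python
import numpy as np

# Toy image chain: each stage is a function and the chain is their
# composition. All names and parameters are illustrative assumptions.
FULL_WELL = 5000.0   # electrons (assumed)
BITS = 10            # ADC bit depth (assumed)

def original_scene(size=32):
    """Original scene: mean photons per pixel, a bright bar on a dim field."""
    scene = np.full((size, size), 200.0)
    scene[:, size // 4 : size // 2] = 1000.0
    return scene

def optics(photons, radius=1):
    """Optics: a box blur standing in for the lens PSF."""
    k = 2 * radius + 1
    padded = np.pad(photons, radius, mode="edge")
    out = np.zeros_like(photons)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy : dy + photons.shape[0], dx : dx + photons.shape[1]]
    return out / k**2

def sensor(photons, rng=None):
    """Sensor: Poisson photon noise (if rng is given), full-well clip, quantization."""
    electrons = photons if rng is None else rng.poisson(photons).astype(float)
    electrons = np.minimum(electrons, FULL_WELL)
    return np.round(electrons / FULL_WELL * (2**BITS - 1))

def image_processing(dn):
    """Image processing: map digital numbers to [0, 1]."""
    return dn / (2**BITS - 1)

def evaluate(image, reference):
    """Evaluation: RMS error against a noise-free reference (not perceptual)."""
    return float(np.sqrt(np.mean((image - reference) ** 2)))

rng = np.random.default_rng(0)
at_sensor = optics(original_scene())
reference = image_processing(sensor(at_sensor))       # noise-free path
captured = image_processing(sensor(at_sensor, rng))   # with photon noise
rms = evaluate(captured, reference)
print(f"RMS error: {rms:.4f}")
```

Keeping each stage a separate function is the point of the chain model: one stage (say, the optics) can be swapped for a more detailed model without touching the others.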
A recent research result is that there is a visual threshold
for noise in imaging at 1,000 photons. This corresponds to 3%
contrast from uniform photon noise. Micro-lenses, which are
used to effectively increase the fill factor of the sensing
array, play an increasing role in the image chain.
The talk concluded that image quality becomes a tradeoff as
sensor size continues to decrease. As the sensor gets smaller,
the likelihood that photon noise becomes visible increases,
while there are also limits on the effective use of micro-lens
arrays.
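The 1,000-photon and 3% figures are consistent with Poisson photon (shot) noise, whose standard deviation is the square root of the mean count, so the noise contrast of a uniform patch is 1/sqrt(N). A minimal sketch, assuming the threshold refers to the mean photon count per pixel and that shrinking the pixel pitch by a factor s cuts that count by s squared (same scene and exposure; the shrink factors are hypothetical):

```python
import math

def noise_contrast(mean_photons):
    """Shot-noise contrast (std / mean) of a uniform patch.

    Photon counts are Poisson distributed: std = sqrt(N), so the
    contrast is sqrt(N) / N = 1 / sqrt(N).
    """
    return 1.0 / math.sqrt(mean_photons)

THRESHOLD = noise_contrast(1000)   # about 3.2%, the reported visibility threshold
print(f"1,000 photons -> {THRESHOLD:.2%} noise contrast")

# Hypothetical pixel shrink: pitch / s collects 1 / s**2 as many photons.
for s in (1.0, 1.4, 2.0):
    n = 1000 / s**2
    c = noise_contrast(n)
    print(f"pitch / {s}: {n:5.0f} photons, contrast {c:.2%}, visible: {c > THRESHOLD}")
```

Halving the pitch quarters the photon count and therefore doubles the noise contrast, which is why shrinking sensors push photon noise toward visibility.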
Integrated Optics Design for Digital Cameras
Patrick Maeda of the Palo Alto Research Center gave an
interesting talk on the techniques for doing an end-to-end
camera simulation. The talk showed how it was possible to
follow the ISET model for image analysis while also performing
traditional lens analysis. At the center of the image chain in
these simulations was either the CODE V or the Zemax lens
design program. The macro and programming languages in these
programs were used to integrate them with the image chain
analysis.
The imaging process is characterized as a convolution: the
ideal irradiance distribution is convolved with the point
spread function (PSF) of the optical system. In the spatial
frequency domain this is a linear filtering process. In a real
lens, however, the process is shift variant. That is, the PSF
varies across the field of view and with wavelength. To get an
accurate image simulation one must segment the object plane
into isoplanatic regions, within each of which the PSF can be
treated as constant. The image analysis becomes:
Generate an irradiance distribution for each
wavelength (object plane)
Generate a PSF for each wavelength and object
position
Generate the image plane image for each wavelength
and sum to form the final image.
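The three steps above can be sketched in Python/NumPy as follows. This illustrates the isoplanatic-region approach in general, not the CODE V or Zemax workflow from the talk: the PSFs are assumed to be given as small odd-sized arrays (in the talk they came from ray tracing), regions are boolean masks over the object plane, and the wavelengths and per-wavelength weights used in the final sum are hypothetical.

```python
import numpy as np

def convolve_same(image, psf):
    """2-D convolution, zero-padded, same size as image (odd-sized PSF)."""
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = psf[::-1, ::-1]          # convolution flips the kernel
    out = np.zeros(image.shape)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i : i + image.shape[0], j : j + image.shape[1]]
    return out

def simulate_image(irradiance, psfs, regions, weights):
    """Shift-variant imaging via isoplanatic regions.

    irradiance: {wavelength: object-plane irradiance array}
    psfs:       {(wavelength, region): PSF array for that region}
    regions:    {region: boolean mask over the object plane}
    weights:    {wavelength: weight used when summing wavelengths}
    """
    final = 0.0
    for wl, obj in irradiance.items():
        plane = np.zeros(obj.shape)
        for name, mask in regions.items():
            # Within one region the PSF is treated as constant.
            plane += convolve_same(obj * mask, psfs[(wl, name)])
        final = final + weights[wl] * plane   # sum over wavelengths
    return final

# Hypothetical example: two wavelengths, left and right halves as regions.
size = 16
left = np.zeros((size, size), dtype=bool)
left[:, : size // 2] = True
regions = {"left": left, "right": ~left}
sharp = np.zeros((3, 3))
sharp[1, 1] = 1.0                       # ideal point image
blurry = np.full((3, 3), 1 / 9)         # 3x3 box blur
psfs = {(450, "left"): sharp, (450, "right"): blurry,
        (650, "left"): blurry, (650, "right"): blurry}
point = np.zeros((size, size))
point[8, 4] = 1.0                       # point source in the left half
image = simulate_image({450: point, 650: point}, psfs, regions, {450: 0.5, 650: 0.5})
print(image[7:10, 3:6].round(3))
```

Masking the object plane before convolving (rather than masking the image) lets light from one region spread across a region boundary, as it would through a real lens.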
The technique was shown for two lens examples. Determining
one PSF required tracing 100m rays, and one complete image
chain analysis took 10m of computation on a PC.
Comparison of CCD and CMOS Sensors in Two DSLR Cameras
Kartik Venkataraman of Micron Technology did an assessment of
the sensors in the Nikon D100 and Canon Rebel 300D cameras.
The tests were done without lenses and were thus a direct
analysis of the sensors. The following tests were performed:
Dark Noise and Signal Analysis
Photo Response – Linearity, S/N and Dynamic Range
Spectral Sensitivity
Spatial Resolution – MTF
Visual Image Quality and S/N
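The talk summary does not describe the measurement procedure. As a rough sketch of how some of these numbers are commonly reduced from frame stacks, the code below computes temporal dark noise, S/N and dynamic range on synthetic data. The offset, noise levels, saturation level and the assumption of one electron per DN are all hypothetical, and Gaussian noise stands in for Poisson photon noise.

```python
import numpy as np

def temporal_noise(frames):
    """Per-pixel standard deviation over a frame stack, averaged over the array."""
    return float(np.mean(np.std(frames, axis=0, ddof=1)))

def snr_db(flat_frames, dark_frames):
    """S/N of a flat field: dark-subtracted mean signal over temporal noise."""
    signal = np.mean(flat_frames) - np.mean(dark_frames)
    return float(20 * np.log10(signal / temporal_noise(flat_frames)))

def dynamic_range_db(saturation_signal, dark_noise):
    """Ratio of the largest usable signal to the dark noise floor."""
    return float(20 * np.log10(saturation_signal / dark_noise))

# Synthetic stacks (hypothetical values; 1 DN assumed to equal 1 electron).
rng = np.random.default_rng(1)
shape = (50, 32, 32)                      # 50 frames of 32 x 32 pixels
offset, read_noise, signal = 64.0, 2.0, 1000.0
dark = offset + rng.normal(0.0, read_noise, shape)
# Gaussian approximation to photon noise plus read noise.
flat = offset + signal + rng.normal(0.0, np.sqrt(signal + read_noise**2), shape)

dark_noise = temporal_noise(dark)
print(f"dark noise     {dark_noise:.2f} DN")
print(f"S/N            {snr_db(flat, dark):.1f} dB")
print(f"dynamic range  {dynamic_range_db(4000.0, dark_noise):.1f} dB")
```

Measuring noise over time at each pixel, rather than across the array, keeps fixed-pattern differences between pixels out of the temporal noise figure.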
What was surprising is that the Nikon D100, with a CCD
sensor, and the Canon, with a CMOS sensor, were in a virtual
dead heat in performance.
Facial Animation on Cell Phones
There were actually two presentations. The first was by Thomas
Riegel of Siemens AG and the other by Enrico Sandali of the
University of Genoa, Italy.
Within MPEG-4 there is the capability to accomplish realistic
animation of the face, speech and facial expression. This is
done using 84 face definition parameters (FDPs). These
parameters include texture points, the scene graph (mesh), the
face texture and face animation rules. The face is then driven
by facial animation parameters (FAPs).
As a result it is possible to show 6 basic facial expressions,
such as joy and anger. A typical compressed FAP stream is
about 0.3 to 2 kbit/s.
One of the advantages of this technique is that FAPs can be
applied either to an avatar with a generic face or to a real
face that is texture mapped onto the face model. If one
expected to see the equivalent of a real person in 3D it would
be a disappointment, but given the very modest bandwidth
requirements the image representation was quite good.
Enrico Sandali of the University of Genoa, Italy, actually
showed a 3D animation engine on a cell phone. The goal of this
work was to show MPEG-4 compliant 3D facial animations on
Symbian smartphones. The problem with these phones is that
there is no floating-point hardware and the computation
resources are quite limited. Tests were run on a Nokia 6600
and the Sony Ericsson P800. For smooth-shaded facial textures,
a 3D frame rate of 14.8 f/s was achieved on the Nokia phone
and 15.1 f/s on the Sony Ericsson. This was shown both in a
video and on a phone that Enrico was carrying. Given the
resources, the results were quite good.
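The talk summary does not say how the engine coped with the lack of floating-point hardware. One common workaround on such processors is fixed-point arithmetic, sketched here in Python purely as an assumption: Q16.16 numbers stored in plain integers, with a multiply that rescales the product, used to rotate a vertex.

```python
import math

FRAC_BITS = 16
ONE = 1 << FRAC_BITS            # 1.0 in Q16.16

def to_fixed(x):
    return int(round(x * ONE))

def to_float(f):
    return f / ONE

def fx_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    # (In C on a 32-bit CPU this needs a 64-bit intermediate.)
    return (a * b) >> FRAC_BITS

def rotate(x, y, cos_a, sin_a):
    """Rotate a 2-D vertex about the origin using only integer math."""
    return (fx_mul(x, cos_a) - fx_mul(y, sin_a),
            fx_mul(x, sin_a) + fx_mul(y, cos_a))

# Sine and cosine are computed once (a real engine would use a lookup table).
angle = math.radians(30)
c, s = to_fixed(math.cos(angle)), to_fixed(math.sin(angle))
x, y = rotate(to_fixed(1.0), to_fixed(0.0), c, s)
print(to_float(x), to_float(y))
```

Every per-vertex operation is an integer multiply, shift or add, which an integer-only ARM core executes quickly; the cost is a fixed precision of 1/65536.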
Mars Pictures – A Sight to Behold
The plenary presentation was by Justin Maki of the Jet
Propulsion Laboratory, on the Mars mission based on the two
rovers, Spirit and Opportunity. The imaging on these vehicles
is called the Rover Imaging System. Each vehicle has 9 cameras
on board, each at 1 megapixel. There is also a descent camera,
which makes a total of 10 cameras per rover. As Justin stated,
this mission has more cameras than all the other Mars missions
combined, and more images have been collected than in all the
other missions combined. So far 70,000 images have been
collected, a large percentage of them in 3D. The Pancam has 4
cameras and sits on a pole above the rover platform. It is
typically used to collect large-area images by rotating about
the pole. The high point of the presentation was the stereo
pictures: with over 500 in the audience, each person had a set
of glasses. Some of the imagery was outstanding. One came away
with a real feel for the isolation of Mars and the appeal of
such distant exploration. To date Spirit has traveled 3,600+
meters and Opportunity 1,600+ meters. It is expected that
these vehicles could go on for a year or more, well beyond the
design lifetime. It was exciting to see the role that imaging
has played, and continues to play, as we explore beyond the
Earth.
Automatically Categorizing Facial Expressions
Brenton McMenamin, University of Wisconsin, presented the
paper “An Anatomically Constrained Neural Network Model for
the Categorization of Facial Expression.” One motivation for
this study is the report that “non-neutral” facial expressions
will not be allowed in photos for new passports, since they
might interfere with automated facial recognition methods.
That is, a smile or a sad expression might make facial
biometrics more difficult. The work Brenton reported set out
to show that an automated method could detect non-neutral
facial expressions.
The innovation is that a model was constructed of how the
human brain processes images. This modeled the flow of an
image from the retina to the thalamus and on to the amygdala
of the brain, where one’s emotional response is formed. The
model also includes a more cognitive path, which goes from the
thalamus to the visual cortex to the amygdala.
The technique was built on a single-layer feed-forward neural
network, modeled in MATLAB. “Back-propagation with momentum
was used for training.” The training data set consisted of
232 images, including some from the CVL Face Database. The
results show accuracies of 84% and 86% for the two models that
were tested. However, the presentation did not show examples
of the facial expressions or of how the process worked. It was
interesting nonetheless.
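The talk did not show code, so the following is only a NumPy stand-in for the training rule it named: a single layer of sigmoid weights trained by gradient descent with momentum, which is what backpropagation reduces to when there is one layer. The two input features and the four training examples are hypothetical (label 1 meaning a non-neutral expression); the real model used anatomically motivated inputs and 232 face images.

```python
import numpy as np

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_single_layer(X, y, lr=0.5, momentum=0.9, epochs=2000):
    """One layer of sigmoid weights, trained by gradient descent with momentum."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    velocity = np.zeros_like(w)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))     # forward pass
        grad = Xb.T @ (p - y) / len(y)          # cross-entropy gradient
        velocity = momentum * velocity - lr * grad
        w = w + velocity                        # momentum update
    return w

def predict(w, X):
    return (add_bias(X) @ w > 0).astype(int)

# Hypothetical features per face: [mouth-corner displacement, brow displacement]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 1])      # 0 = neutral, 1 = non-neutral
w = train_single_layer(X, y)
print(predict(w, X))
```

The momentum term reuses a fraction of the previous step, which speeds progress along consistent gradient directions and damps oscillation.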
Real Time Holography on a PC
The MIT Media Laboratory has a project to create video-rate
holograms for PCs. This presentation by Michael Bove was an
update on that project. The research is being done in the
Object-Based Media Group. The goal of the work is to make a
holo-video display about the same size and cost as a desktop
CRT PC display, and driven by a PC. Prior work had developed
the Cheops, a 3D engine which uses SGI Onyx, and a holographic
display using acousto-optic modulators. The research reported
in this paper used a GPU for the 3D processing, in this case
an NVIDIA Quadro FX3000G chip on a PHY card. The team would
have liked to put everything into one PC chassis, but the
power and space requirements of the cards meant that the three
cards had to reside in 3 PC cabinets. As a result, the outputs
of the cards had to be synchronized to drive a single display.
The output was based on creating a holographic stereogram:
multiple 2D views multiplexed spatially by diffraction. The
system created 32 views in each of red, green and blue, which
took 70ms. The fringe computation took 450ms, and thus the
system ran at a 2 f/s rate. Compared to the much larger Cheops
system, which ran at only 0.5 f/s, the advantage of the GPU
method was evident. New results were reported using a new
rendering method called RIP (Reconfigurable Image Projection),
which combines the efficiency of a stereogram with the
geometric accuracy of interference modeling. This system ran
at an impressive 1.2 f/s, with 140 parallax views at 383 X 144
resolution.
This project took advantage of the continuing improvement in
GPUs, with impressive results.
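The quoted 2 f/s follows from the two timings, assuming the view rendering and the fringe computation run back to back for each frame:

```python
view_ms = 70       # 32 views in each of red, green and blue
fringe_ms = 450    # fringe computation
frame_ms = view_ms + fringe_ms    # assumes the stages run back to back
gpu_fps = 1000 / frame_ms
cheops_fps = 0.5
print(f"{frame_ms} ms/frame -> {gpu_fps:.2f} f/s, {gpu_fps / cheops_fps:.1f}x Cheops")
# prints: 520 ms/frame -> 1.92 f/s, 3.8x Cheops
```

The fringe computation dominates, so it is the stage where further GPU gains would matter most.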
3D Demonstrations
3D Consortium
The 3D Consortium was announced. Its objective is to promote
the formation, expansion and development of the market for
three dimensional imaging.
www.3dc.gr.jp
Toshiba 3D Notebook
Toshiba was showing a notebook computer with a 3D LCD display.
It is autostereoscopic and the quality was quite good.
Video2Stereo
Using technology developed by RAFAEL, an Israeli defense
company, Video2Stereo can convert video movies to
stereoscopic movies.
www.video2stereo.com/
Stereoscopic Player
Peter Wimmer of Austria was showing a stereoscopic movie
player. It plays stereoscopic videos and DVDs (external
decoder required) and can also show live video from a capture
device. Since it is based on DirectShow, it can handle almost
any media format, e.g. AVI, MPEG, WMV and ASF.
mitglied.lycos.de/stereo3d/
StereoGraphics
StereoGraphics has three LCD panels, from 22” to 40”, which
show autostereoscopic images. A lenticular array is placed in
front of the LCD panel, directly against the LCD surface.
www.stereographics.com/
3D Stereoscopic Desktop Display
Planar had an example of its SD displays. These use two
displays with a combiner glass, which Planar calls the
StereoMirror. Polarized glasses are required.
www.planar.com/advantages/innovation/
dep3D – Large Screen 3D Display
dep3D showed a 40” rear-projection 3D display. The depth was
quite good. It requires passive glasses.
www.dep3d.com/
3D Video Camera
21st Century 3D was showing the 3DVX, a small form factor 3D
motion picture camera. It is basically two video cameras
linked together, using the Panasonic AG-DVX100A.
www.21stcentury3d.com/
2D – 3D Conversion Software
VREX was showing its 2D – 3D conversion software for both
still and video images.
www.vrex.com.my/
Watch Movies in 3D
DDD was showing its TriDef DVD player which is capable of
playing conventional 2D DVDs in 3D in real time.
www.ddd.com/
Is Computer Graphics Headed in the Right Direction?
Pat Hanrahan of Stanford University gave the plenary and asked
the question: Realism or Abstraction: The Future of Computer
Graphics? In it he questioned whether the drive for ever
greater 3D realism in computer graphics is the best direction
to go. Along the way Pat also suggested that there may be
better ways than gaming to use the processing power of today’s
GPUs.
The holy grail at the annual SIGGRAPH event has been ever
better computer-generated images of objects, including animate
objects, especially humans. Pat Hanrahan goes so far as to ask
the question: are we focused on the wrong goal for computer
graphics? He traced the progression from Renaissance art,
which brought the invention of perspective and shading, to
computer graphics with realistic image synthesis. Now the next
step is virtual reality, with teleimmersion and complete
control of the sensory field. Yet with this focus on
technology, we are missing the immersive power of works such
as “The Glorification of St. Ignatius” on the ceiling of the
Church of St. Ignazio. This painting, when viewed from the
correct location, is immersive. In his view, computer graphics
is not about creating realistic images but about abstract
forms that will look realistic. He cited examples, including
his own work on subsurface scattering and the use of glare,
contrast and blur in wide dynamic range images, which makes
them look more realistic. Pat then used a number of examples
to show how even simple representations can be very useful.
These included engineering drawings, some nearly 100 years
old, graphical reasoning in plots that goes back 250 years,
and the use of abstraction in thematic maps. He ended by
stating that we should look beyond the realistic
representation of objects and consider representations that
are informative, expressionistic and more. During the Q&A
session we discussed the open issue: can we not use the
powerful GPUs now developed for more than the realistic form
of expression?
Integral Photography – Looking for better 3D Displays
Integral photography dates back to Gabriel Lippmann and his
invention in 1908. Here at Electronic Imaging we saw more on
integral photography than in the previous 20 years. The papers
today point to sophisticated electronic integral photography
systems. This could well point to the future of
autostereoscopic displays.
Integral photography uses an array of lenses to record an
image of an object. Directly behind the lenses is a recording
material, and each lens captures an image of the object from
its own position relative to the others and to the object. It
is the spatial distribution of the lenses and the capture
system that records the 3D detail. When the recorded images
are played back, with light passing from the recorded images
through the lens array, an image of the original object
appears in front of the lens array where the object was. This
image has parallax based on the discrete images that were
recorded, rather than on a continuous re-creation of the
object. The number of discrete views is based on the number of
images taken. If, for example, the lenses were just vertical
cylindrical lenses, the parallax would only be in the
horizontal axis. This form of integral photography is commonly
seen in posters and other displays. What was discussed here at
Electronic Imaging is a more sophisticated integral
photography, where each recording element is a small lens.
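In electronic integral imaging the recording is a grid of elemental images, one behind each lens. A common way to see where the discrete views come from is to gather pixel (u, v) from behind every lens; each such gathering is the scene seen from one direction. A minimal NumPy sketch, assuming square lenses tiled edge to edge with a whole number of pixels behind each; the 13 X 13 lenses and 8 X 8 pixels per lens are hypothetical values.

```python
import numpy as np

def elemental_to_views(recording, lens_px):
    """Rearrange an integral-imaging recording into its discrete views.

    recording: 2-D array of L_y x L_x elemental images, each
               lens_px x lens_px pixels, tiled edge to edge.
    Returns shape (lens_px, lens_px, L_y, L_x): views[u, v] collects
    pixel (u, v) from behind every lens, i.e. one viewing direction.
    """
    H, W = recording.shape
    if H % lens_px or W % lens_px:
        raise ValueError("recording must hold a whole number of elemental images")
    tiles = recording.reshape(H // lens_px, lens_px, W // lens_px, lens_px)
    return tiles.transpose(1, 3, 0, 2)   # (lens_y, u, lens_x, v) -> (u, v, lens_y, lens_x)

# Hypothetical recording: 13 x 13 lenses, 8 x 8 pixels behind each lens.
lenses, px = 13, 8
u = np.arange(lenses * px) % px                            # pixel index within each lens
recording = (10 * u[:, None] + u[None, :]).astype(float)   # encodes (u, v) as 10u + v
views = elemental_to_views(recording, px)
print(views.shape)            # (8, 8, 13, 13): 64 views of 13 x 13 pixels
```

The shapes show the spatial-versus-depth trade-off: more pixels behind each lens give more views, but each view has only as many pixels as there are lenses. With vertical cylindrical lenses only the horizontal index v would vary, giving horizontal parallax only.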
Three-Dimensional Electro-Floating Display Based on Integral
Imaging Technology
S. Min of Seoul National University (also with Samsung)
reported on a technique that uses a floating lens. This is an
electronic system in that the display device is an LCD
projector. The lens array is quite modest at 13 X 13, with each
lens 10 sq mm. The floating lens is 300 sq mm. The technique
has a number of limitations, including a flipped image and the
limited degree of magnification that can be supported.
Projection-type Integral 3D Imaging Using Multi-facet Flat
Mirrors
Sungyong Jung of Samsung gave this presentation on a technique
that uses a segmented mirror instead of an array of lenses.
The technique is also electro-optical in that the image can be
detected in real time and transmitted. Further, it is possible
to create the images on a computer and transmit them. There is
a trade-off between the mirror design and the image quality.
For example, increasing the number of mirror elements allows
more perspectives to be created, but if the radius of
curvature of the mirror surface increases, the viewing angle
goes down and there is greater distortion. Two examples of
mirrors were shown: a 1D curve, which has only horizontal
parallax, and a spherical or 2D curve, which has full
parallax. A floating lens is also used to position and magnify
the final image.
Autostereoscopic Liquid Crystal Display Using a Mosaic Color
Pixel Arrangement
K. Taira of Toshiba gave this presentation on how an integral
display can be put on a notebook computer. This was part of
the demonstration the prior day. There are two examples: a
15.4” display with 12 parallaxes and a 20.8” display with 18
parallaxes. In this case a high-quality LCD panel was used to
make the 3D display, trading spatial resolution for depth
resolution. The 20.8” display uses a QUXGA (3200 X 2400) RGB
panel, and the resulting 3D display is 300 X 800. This large
display showed only still images, while the 15.4” display
showed movies and interactive content. The 15.4” LCD panel was
1920 X 1200 and resulted in a 3D display of 300 X 400. When
asked during the question session whether Toshiba was going to
commercialize the display, the presenter gave no indication
that it would.
A Long Viewing Distance Integral Photography Autostereoscopic
Display
Hongen Liao of the University of Tokyo described an integral
photography technique to achieve large object depths. This
has applications in medicine, such as allowing multiple
individuals to see and participate in a surgery. The intent
was to create a 3D system capable of depths of several meters
both in front of and behind the display. One of the trade-offs
is a reduction of the viewing angle. The design used a 35 X 35
lens array with 10mm lenses. The image depth was 7.5m in front
of the array and 7.5m behind it. A video was shown that
conveyed the image depth. Plans are to make a large scalable
display using micro displays and a convex lens.
When does a Virtual Environment Become Real?
Heinrich Bulthoff of the Max Planck Institute for Biological
Cybernetics, Tubingen, Germany, gave a compelling presentation
on multimodal integration and spatial cognition in virtual
environments. The work at his institute has taken the VR
experience to higher levels of realism. They are building a 6m
2D treadmill that will allow individuals to walk unconstrained
in this walking simulator. This will provide a platform to
integrate all cues: visual, vestibular, haptic, auditory,
tactile and proprioceptive.
Electronic Imaging had a special session on VALVE: Visual,
Action and Locomotion in Virtual (and Real) Environments. The
lead-off speaker was Heinrich Bulthoff, the head of the
institute. He made a good case for the study of psychophysics
using virtual environments. There are three factors to be
considered: control of the environment for testing, the
ability to create an environment for interaction, and the
need for a high degree of realism.
Some of the intriguing work explored how perception works when
faced with conflicting cues from different modalities, in one
case visual and haptic. That is, if the eye perceives one size
and the hand another, which is correct? The result of the work
is that the brain combines visual and haptic information in a
statistically optimal way, weighting each cue according to its
reliability.
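"Statistically optimal" combination is usually modeled as maximum-likelihood estimation with independent Gaussian noise on each cue, which weights each estimate by its inverse variance (its reliability). A minimal sketch with hypothetical size estimates; the numbers are not from the talk.

```python
def combine_cues(estimates, sigmas):
    """Maximum-likelihood combination of independent Gaussian cues.

    Each cue's weight is its reliability 1/sigma**2, normalized to sum to 1;
    the combined estimate is never less reliable than the best single cue.
    """
    reliabilities = [1.0 / s**2 for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma, weights

# Hypothetical trial: vision says 55 mm (sd 2 mm), touch says 60 mm (sd 4 mm).
size, sd, weights = combine_cues([55.0, 60.0], [2.0, 4.0])
print(f"{size:.1f} mm, sd {sd:.2f} mm, weights {[round(w, 2) for w in weights]}")
# prints: 56.0 mm, sd 1.79 mm, weights [0.8, 0.2]
```

Because vision is twice as precise here, it gets four times the weight, and the combined estimate is more precise than either cue alone; degrading the visual cue would shift the weight toward touch.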
Another research effort sought to determine why certain
architectural spaces are more appealing and satisfying than
others. Again using virtual reality, it was possible to
identify five factors that explain individual responses. In
rank order they are: isovist area, enclosure ratio, isovist
roundness, number and density of vertices, and number of
symmetries. An isovist is a “viewshed polygon from one
standpoint.” Another definition is “The space that can be seen
from any vantage point is called an isovist.”
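Given an isovist polygon, the first and third factors are straightforward to compute. The sketch below assumes the polygon is already known (computing it by ray casting from the standpoint is omitted) and uses the shoelace formula for area; for roundness it uses the circularity 4&pi;A/P&sup2;, one common definition that may not match the one used in the study. The room shapes are hypothetical.

```python
import math

def _edges(pts):
    return zip(pts, pts[1:] + pts[:1])

def isovist_area(pts):
    """Shoelace formula: area of a simple polygon with vertices in order."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(pts))) / 2.0

def isovist_perimeter(pts):
    return sum(math.dist(a, b) for a, b in _edges(pts))

def isovist_roundness(pts):
    """Circularity 4*pi*A / P**2: 1.0 for a circle, lower for elongated shapes."""
    return 4 * math.pi * isovist_area(pts) / isovist_perimeter(pts) ** 2

# Hypothetical isovists: from inside an empty convex room the viewer sees
# the whole room, so the isovist is simply the floor plan.
corridor = [(0, 0), (10, 0), (10, 4), (0, 4)]
square = [(0, 0), (6, 0), (6, 6), (0, 6)]
for name, poly in (("corridor", corridor), ("square", square)):
    print(name, isovist_area(poly), round(isovist_roundness(poly), 3))
```

The corridor has the larger area but the lower roundness, so the two measures capture different qualities of a space.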
Heinrich’s work has led to the observation that what is
missing from this research is an architectural walk-through
tool: a surface on which one can walk freely, as if on a floor
or street, while effectively staying in one spot. This implies
a 2D treadmill, which is quite difficult to build while still
having the individual perceive walking. The laboratory has
designed a table based on ball bearings that will simulate
such walking. It will be 6m in diameter and housed in a new
building. The lab already has a 3DOF motion platform. As
Heinrich put it, the 2D treadmill becomes the ultimate
simulator, with the ability to integrate all cues: visual,
vestibular, haptic, auditory, tactile and proprioceptive.
With such an environment it becomes hard to tell the
difference between real and virtual.
WAVE Comments
The dimensions of imaging seen at Electronic Imaging were
striking and lie in three domains: Virtual, Real and Imaging
in Depth.
Virtual Imaging is about creating a world that does not exist.
3D rendering of objects, places and environments that look
real is the goal. Virtually all the effort is focused on
entertainment, with movies being the form where the best
imaging qualities are seen. Computer and console games are
another form. Flight simulators are another aspect of the same
domain, but these are task trainers with very specific
objectives.
Taking pictures, video and any form of recorded imagery is
capturing the real. This can be either real time or after the
fact. The sensory stimulation is based almost exclusively on
imagery and audio. The emphasis is on increasing the quality
of capture and playback. Higher resolution cameras and larger
display screens are other aspects of quality. Compared to
virtual, there is a much broader involvement of the mass
market in real. An excellent example of this is cameras in
cell phones. One aspect of real is that the mass market
creates its own content and thus directly associates with the
end result. There is no expectation to simulate anything.
Imaging in Depth was the surprise at Electronic Imaging: the
seriousness of 3D. Much of the technology has been around for
100 years, but what we saw was credible research to make 3D a
high-quality capture and playback technology. The emphasis
from Asian companies, Samsung and others, is on
autostereoscopic displays. This makes sense in that mass
market products cannot burden the viewer with the requirement
to wear glasses. Early signs of this were seen at CEATEC, and
at this event we saw some of the research efforts to make 3D a
viable electronic technology.
It is important to note that both virtual and real can use the
attributes of imaging in depth. In virtual this is mostly done
with HMDs (head mounted displays), but these are hardly mass
market products. In the real there are IMAX 3D theaters and
some indications that 3D displays and movies may be seeing a
resurgence. Consumer 3D cameras show no sign of a comeback:
there are no 3D digital cameras on the market.
At Electronic Imaging we saw many examples of the impressive
use of imaging. Yet only a small sliver of this has any impact
on the markets. Granted, this reflects the fact that much of
what we saw was research or narrow use, such as the Mars
photographs. But given that vision is the dominant human
modality, it remains a surprise that the technology is not
used more. Today, the only market that exhibits the use of the
technology is entertainment: movies or games. Yet this is on
the fringes of the visual modality, a tiny portion of what we
see.
Wave Issue 0507 2/18/05 Article 1-01