***Ziff Davis 3D Benchmarks - A Hornet's Nest
by John Latta and David Lohse

Following Microsoft's Meltdown, the WAVE Report was alerted by its
readers to the Ziff Davis 3D Benchmark that surfaced in the hardware
vendors' suites at the conference. With that began an effort to
understand the benchmark and its impact on the industry. This seemingly
simple exercise became an entrance into a hornet's nest.

The WAVE Report has been receiving comments from companies as a result
of visits by Ziff Davis as it sought to create a benchmark. Our COMDEX
report (see WAVE #614) covered the annual ZDBop rollout of the 1997
benchmarks where virtually nothing was said about the 3D Benchmarks.
Yet, few open details have emerged until now on the tests and the
response of the 3D industry.

Intel, in Santa Clara and with support from Folsom, has been developing
a 3D benchmark called MonaLisa. On December 20th this benchmark was
turned over to Ziff Davis and it forms the basis for what will become
the Ziff Davis 3D Benchmark (no name has been given). The WAVE Report
has obtained an early version of MonaLisa and performed preliminary
testing with it.

If one thing is consistent about MonaLisa and the ongoing work on the
3D Benchmark, it is that these are works in progress. They should be
taken as no more than that. Given the enormous importance of
benchmarking and
the role that Ziff Davis has in setting benchmarks we believe that even
a review of this early effort is of value.

At the end of the review we offer comments from the industry and our
Points to Ponder.


MonaLisa - An Overview

The MonaLisa interface is rough and the documentation scant. There are
a myriad of rendering options as well as a large number of models and
several scenes for testing purposes. However, on review the various
databases appear crude and poorly assembled. This particular version
of the software tests only Direct3D in Retained Mode, but we understand
that Immediate Mode testing will be implemented in the future.

Usability:

The program provides a number of options and tests, but it is unclear
what many of them actually do and their utility is not evident.
For example, several prominent but cryptic buttons and options on the
main screen are labeled "Scene Analyzer," "Bench Runner," "Rescan" and
"CPU Soaker" with no other indication of their role and impact on the
benchmarks. One can only take from the prominent placement that they
perform important functions. Similarly, a button labeled "Synthetic
Scene Generator" apparently enables the creation of a custom 3D
scene/model, but the editing options lack coherency and functionality.
Although we have not yet run a complete series of tests, we understand
from others who have tried that it can take up to a whole day - hardly
an efficient use of time to measure real time 3D acceleration. Further,
the frame rates are quite low in a number of the tests - 1 frame per
second.

Benchmarking:

There appear to be two primary ways to run tests: either with a
custom-designed batch text file based on input commands (also very
sparsely documented), or interactively via the GUI. Results are given
in frames/sec and kpps, which we assume to be thousands of polygons per
second. Thus, this test appears to evaluate only a limited set of
performance parameters.

Options:

Screen sizes and bit depths are supported from 320x200x8 up to
1024x768x32. The number and types of lights can be customized, as can
the fill and shading modes (flat or Gouraud, with Phong currently
grayed out - this is not supported yet by Direct3D). Other grayed-out
options include fog table and antialiasing. There are also a large
number of texturing options, allowing the user to specify both the
scene and background texture (from a provided collection). Only two
filtering options are currently enabled, nearest and linear, while four
other options remain grayed out (nearest mipmap nearest up to linear
mipmap linear).

Scenes:

A large number of models are included with the benchmark (more than
50), most of which are only moderately complex textured models - as
reported by a user the chapel has 20,382 triangles and the city 44,558
triangles. During testing the camera path can be specified as either
linear, circular or static around the model. In addition, a few of the
models/scenes are actually more complex scenes that have pre-determined
camera paths (flight paths) for fly-through testing. In some of the
scenes the textures were poorly aligned, including some with gaps which
became evident with movement.
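
To relate these scene sizes to the frames/sec and kpps figures the
benchmark reports, a rough calculation can help. The sketch below
assumes that kpps means thousands of polygons per second and that every
triangle in the model is drawn in every frame; neither assumption is
confirmed by the MonaLisa documentation, and the function name is ours.

```python
def kpps(triangles_per_frame, frames_per_sec):
    """Thousands of polygons per second.

    Assumes every triangle in the scene is drawn once per frame.
    """
    return triangles_per_frame * frames_per_sec / 1000.0


# Triangle counts as reported by a user; 1 frame/sec is the low rate
# seen in a number of the tests.
for name, triangles in [("chapel", 20382), ("city", 44558)]:
    print(f"{name}: {kpps(triangles, 1.0):.1f} kpps at 1 frame/sec")
```

Under these assumptions, 1 frame per second on the chapel and city
models corresponds to only about 20.4 and 44.6 kpps respectively.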

Reporting:

MonaLisa can provide very detailed information about a scene or model,
including the number of vertices, triangles, materials, groups and
objects. It also has a separate "Scene Analyzer" function that can
analyze scenes and models for very specific information such as
triangle distribution by size and pixel depth and writes distribution.
However, it appears to only work with .sdl models and scenes.

Logging:

Although you can generate log files from the tests run, the files are
very primitive. Insufficient information is included in the log files;
many of the important parameters used in the tests are not reported.
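
As an illustration of what a more complete log entry might contain, the
sketch below records the test parameters described under Options and
Scenes alongside the two reported results. The field names and the JSON
layout are our own assumptions, not MonaLisa's log format, and the
values shown are placeholders rather than measurements.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TestRun:
    # Hypothetical field names; chosen to mirror the options in the GUI.
    scene: str
    width: int
    height: int
    bit_depth: int
    shading: str        # "flat" or "gouraud"
    filtering: str      # "nearest" or "linear"
    lights: int
    camera_path: str    # "linear", "circular" or "static"
    frames_per_sec: float
    kpps: float


def log_line(run):
    """Serialize one run as a single self-describing line of JSON."""
    return json.dumps(asdict(run), sort_keys=True)


# Placeholder values for illustration only, not a measured result.
example = TestRun("chapel", 640, 480, 16, "gouraud", "linear", 1,
                  "circular", 1.0, 20.382)
print(log_line(example))
```

With every parameter written next to the result, two log files can be
compared without guessing which settings produced which numbers.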


Industry Response

The WAVE Report sent a draft of this article to a number of companies.

Ziff Davis was obviously guarded in its response to the WAVE Report. It
is their policy not to comment on unreleased benchmarks. However, their
work on the 3D Benchmark has become a consuming task and one whose
scope and importance are much greater than anticipated. This is also a
reflection of the importance that the industry places on doing well in
3D and the critical role that benchmarks play. Ziff Davis did agree
to be quoted as stating that their objective is to "create THE 3D
benchmark" for the industry.

One area where Ziff Davis has been criticized is that their Winstone
and Winbench tests are not forward looking but test systems based on
last year's industry hardware availability and software sales. That is,
the industry is not challenged by the tests to set new performance
levels or capabilities. At WAVE we regard it as critical that the
benchmarks on 3D allow companies to excel in both quality and
performance. If there is one significant shortcoming in MonaLisa, it is
that it does not set a performance and quality expectation. We
recognize that the industry will continue to be driven by the single
number rating schemes but that 3D poses an especially difficult problem
of evaluating image quality which has both spatial and temporal
variations. MonaLisa does not address this and, for example, does not
even provide the means to support side by side comparisons between
systems.

Microsoft responded to the benchmark as follows:

These tests are a first step, however, they are focused on fill rate
and frame rate as the primary metric of performance. It would, for
example, be better to have tests which automatically scale to the
performance of the hardware so that measurements are taken based on
load capacity, not frame rate. As chips become available with transform
and lighting acceleration these capabilities also need to be measured.
A beginning but more remains to be done.
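
One way to read the suggestion of measuring load capacity rather than
frame rate is sketched below: hold the target frame rate fixed and
search for the largest scene the hardware can sustain at that rate. This
is our own illustration, not Microsoft's or Intel's method. The
function names are hypothetical, the device is simulated with a fixed
triangle throughput, and the search assumes frame rate never rises as
the load grows.

```python
def find_load_capacity(measure_fps, target_fps=30.0,
                       lo=1000, hi=1000000):
    """Largest triangle count in [lo, hi] that sustains target_fps.

    measure_fps(triangles) returns the frame rate at that load and is
    assumed to be non-increasing in the triangle count.
    Returns 0 if even the smallest load misses the target.
    """
    if measure_fps(lo) < target_fps:
        return 0
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop terminates
        if measure_fps(mid) >= target_fps:
            lo = mid                  # mid is sustainable, try larger
        else:
            hi = mid - 1              # mid is too heavy, try smaller
    return lo


def simulated_device(triangles_per_second):
    """Stand-in for real hardware: constant triangle throughput."""
    def measure_fps(triangles):
        return triangles_per_second / triangles
    return measure_fps


capacity = find_load_capacity(simulated_device(600000), target_fps=30.0)
print(capacity, "triangles per frame at 30 frames/sec")
```

The result is a scene size rather than a frame rate, so faster hardware
yields a larger number instead of running the same fixed scene at a
rate beyond what the display can show.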

A developer also responded that:

The benchmark does not give any information about the driver or
hardware. It should query the hardware and report on what it finds.
Then it should have a test case that tests each cap bit reported and
verify the capability.
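
The developer's proposal can be sketched as a loop over the capability
bits a driver reports, running one verification test per bit. The flag
names below are hypothetical and chosen only to echo options mentioned
in this review; they are not Direct3D's actual capability flags, and
the tests are stubs standing in for real rendering checks.

```python
from enum import IntFlag


class Caps(IntFlag):
    # Hypothetical capability bits, for illustration only.
    GOURAUD = 1
    LINEAR_FILTER = 2
    FOG_TABLE = 4
    ANTIALIAS = 8


def verify_caps(reported, tests):
    """Run the test for each reported cap bit.

    Returns a dict mapping cap name to "pass", "fail" or "untested".
    Bits the driver does not report are skipped.
    """
    results = {}
    for cap in Caps:
        if not reported & cap:
            continue
        test = tests.get(cap)
        if test is None:
            results[cap.name] = "untested"
        else:
            results[cap.name] = "pass" if test() else "fail"
    return results


# Stub driver report and stub tests; a real benchmark would render and
# inspect output for each capability.
reported = Caps.GOURAUD | Caps.LINEAR_FILTER | Caps.FOG_TABLE
tests = {Caps.GOURAUD: lambda: True, Caps.LINEAR_FILTER: lambda: False}
print(verify_caps(reported, tests))
```

A report of this kind separates what the driver claims from what was
actually verified, which is the gap the developer describes.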

Another hardware company stated that, although MonaLisa does a good job
of measuring overall system performance, for the high performance 3D
graphics companies the tests tend to mask this performance behind
sluggish CPU performance. As a result they recommended that there be
two modes: system testing and just 3D acceleration testing using the
Direct3D execute buffer performance evaluation.


Points to Ponder

MonaLisa and the Ziff Davis 3D Benchmark have exposed a raw nerve in
the emerging 3D industry. This is a high stakes game and benchmarking
is a
key battleground. We can only encourage the industry to continue to be
vocal in its expectations and strive for increasing quality and
performance levels which will benefit the consumer and end user.

We also came away with another observation: where is Microsoft in the
definition of a 3D benchmark which tests Direct3D? It seems odd that
the key benchmark testing a Microsoft API was done by Intel. Microsoft
can play a major and important role in allowing developers to set new
performance levels and understand how well their hardware performs.

We recognize that MonaLisa is only a first step. By the same token the
industry needs much more. Given the role which Ziff Davis has played in
the past and can still play, we look forward to the realization of a 3D
Benchmark which fulfills their objective of being THE benchmark. The 3D
industry looks forward to the next and hopefully vastly improved
release.


Wave Issue 9704 2/28/97 Article 3-01