Bullseye
4 Evaluation
4.1 Evaluation of applications
In order to ensure an effective evaluation of the applications, it is necessary to decide on a methodology and the criteria for applying that methodology. This ensures that each application is similarly evaluated and that the results of the evaluation are valid.
The methods to be used in a usability evaluation can be encapsulated in the acronym IMPACT (Turner, in press). Before deciding which evaluation methodologies are to be used, it is necessary to define the parameters of the evaluation.
Intention: this seeks to establish the nature of the data by considering the questions to be answered by the evaluation. In this context they are the questions posed by the original proposal.
• Does SVG have sufficient advantages over Flash; with respect to bandwidth,
client side rendering, usability of interactive features and other key attributes
which research may elicit to enable it to become the primary means of vector graphic
delivery on the Internet?
• What key developments are required to facilitate wider use of SVG?
It is primarily the 1st question that will be addressed by the evaluation; the 2nd question will be addressed in the discussion following the evaluation. It is necessary to be aware that the entertainment value of the applications is not being assessed.
Metrics: this develops the intention into the qualitative and quantitative output of the evaluation and should be considered with respect to the overall aims.
• Bandwidth and file size are significant as they are directly related to download
time. Download time is recognised as a key criterion in section 3.2.
• Client side rendering is also related to download time as well as the specification
of the computer and the format being used. It is important to recognise that the
rendering process is dynamic. Not only is the time taken to display the application's
initial screen to be evaluated, but also the quality of the rendering as the applications
are run. In the case of 'Bullseye' this may be affected by the applications
speeding up as the user progresses through the levels of difficulty.
• Primary criteria for the usability of interactive features include client side
rendering (do the applications keep up with the user input?). This means it should
be 'usable' as discussed in section 3.1.1. Additionally both applications have been
designed to support interactive mouse, sound, image and text based features (the
latter two both static and dynamic).
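The relationship between file size, bandwidth and download time noted in the metrics can be illustrated with a minimal sketch. The file size and link speeds below are hypothetical figures chosen for illustration, not measurements from this evaluation, and the calculation is idealised in that it ignores latency, protocol overhead and compression.

```python
def download_seconds(file_size_bytes: int, link_bits_per_second: float) -> float:
    """Idealised transfer time: size in bits divided by the link rate."""
    return file_size_bytes * 8 / link_bits_per_second


# Hypothetical figures for illustration only (not taken from the evaluation).
EXAMPLE_FILE_BYTES = 50_000
LINKS = {"modem (56 kbit/s)": 56_000, "broadband (512 kbit/s)": 512_000}

for name, rate in LINKS.items():
    print(f"{name}: {download_seconds(EXAMPLE_FILE_BYTES, rate):.1f} s")
```

Under these assumptions the same file takes roughly nine times longer over the modem link, which is why the connection type is recorded in the evaluation.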
People: in order that any user testing can be undertaken using users from the target group for which the application is designed it is necessary to define the main characteristics of the target group(s). For this evaluation users of 'Bullseye' are considered to be comfortable using the Internet in a Windows type environment and have previously downloaded small applications. No other demographic information is relevant.
Activities: these are the typical activities that a user might undertake when using the application, and they are pertinent to defining a scenario that may be used in an evaluation. In this evaluation the activities are playing the game at all levels of difficulty.
Context: these activities may be carried out at work/school or home in an environment that is likely to have both audio and visual distractions. Normally it would be expected that the user undertook this as an individual, but it could also be a group activity with the user interacting with other users as well as the computer.
Technologies: this will be any computer complying with appendix 4 (revised). The computer will probably be static, but could be a wireless networked laptop. The Internet connection type (modem or broadband) is relevant to download times.
The methods used must take into account the IMPACT analysis and the resources available to find the usability and format related problems in the application design. When considering options for expert review, one possibility is a cognitive walkthrough, but as the example demonstrated by Blackmon et al (2002) shows, this method can be complex and resource intensive. Guideline based evaluation is useful for identifying recurring problems, but can miss severe problems and is resource heavy, whereas heuristic evaluation is a suitable method for finding both major and minor usability problems in a user interface and requires relatively little resource (Jeffries et al, 1991), especially in terms of the number of evaluators required (Nielsen and Molich, 1990). A combination of user evaluation and expert heuristic evaluation is considered optimal: while user evaluation may reveal problems in using the application, heuristic testing may help identify the cause of the problem and so suggest a possible solution (Doubleday et al, 1997). Furthermore, user testing can reveal that some problems identified by experts are in fact trivial (Turner, in press).
However an expert and user evaluation will not fully address all the elements identified in the 'intention' and developed in the 'metrics' section above, specifically bandwidth and the specification of the computer being used. In order to assess how the applications perform on a variety of platforms an evaluation website was developed. This encourages users in the wider Internet community to use the applications and via an online form return information for analysis.
Initial steps in conducting an expert evaluation are to develop heuristics against which to judge the applications and to create a suitable scenario of use; the evaluators can then work through the scenario and test for breaches of the heuristics. As a baseline for the heuristics, the set defined by Nielsen and Molich (1990) has been used. It is important to recognise that these heuristics, whilst focussing on usability, do so by primarily addressing the interface design. The intention and metrics of the IMPACT analysis reveal that whilst the evaluation is considering the usability of the applications, it is doing so by assessing more than just the interface design. It is therefore necessary to modify the heuristics to evaluate the download time, speed of client side rendering and the overall ability of the formats to respond to user inputs correctly. The heuristics developed (appendix 11) reflect this and consider both the effectiveness of the formats and the interface design. This latter point should ensure that any identified limitations can be isolated as a feature of the format rather than a problem with the interface design.
Three evaluators with usability evaluation training conducted the evaluation; they were asked to use computers that complied with the revised specifications in appendix 4. This was an attempt to ensure that deficiencies in the format language were the focus of this stage of the evaluation, as any problems related to resource requirements would be identified in the Internet based evaluation. The key results of the evaluation are summarised as follows:
• The layout and interfaces of the applications contain no significant breaches
of heuristics 5 to 9, which relate to differences in the interface design. This is
important as it confirms that both applications are suitable for a comparative evaluation
by non-expert users.
• Evaluators observed that the rollover buttons to select the level of difficulty
on the initial screen (appendix 9), whilst displaying the conventional finger to
imply that they performed an action, only worked when clicked off the text area
of the button. This is a breach of heuristic 3, that the application should respond
correctly.
• Evaluators also noted that the crosshair (which tracks the mouse cursor) disappeared
behind the target when the cursor was moved over the target, so at the instant the
user clicked the mouse to fire, they could not be sure precisely where the crosshair
was aimed. This is a breach of heuristic 4, that the user should remain in control.
On a related matter it was observed that the flash effect created when the crosshair
was fired was not visible if the target was hit, as it was also hidden behind the
target in similar circumstances.
The problem with the rollover buttons is considered to be a bug. As far as can
be ascertained the code used is valid SVG and so should provide the functionality
intended. That such usability problems appear not to have been identified before
underlines that SVG is in some respects an immature technology.
The second problem is related to the rendering model. It is a feature of all current
SVG standards (World Wide Web Consortium (1), 2003) that objects are painted in
the order of listing in the file (outwith the defs section). The DOM (appendix 6c)
further clarifies the order of painting. However if objects are rearranged within
the DOM so that the crosshair and Flash (the SVG flash when the crosshair is fired)
objects render in the correct order, no hit is recorded if the crosshair is fired
over the target. Mong and Brailsford (2003) observe the difficulties that occur
from the painting order being determined from the ordering of the content stream
and note the proposal in SVG 1.2 to introduce the concept of a 'z-index' to alleviate
this. This would mean that objects can have their painting order determined explicitly
without affecting their relationship to each other in the DOM. However at the time
of writing the current working draft for SVG 1.2 (World Wide Web Consortium (5),
2003) states:
'Previous drafts of SVG 1.2 mentioned the possibility of a 'z-index' property to allow separation of document order from drawing order. After long consultations with implementers and content developers, the SVG Working Group has decided not to add the feature.'
They (World Wide Web Consortium (5), 2003) continue:
'It is possible to simulate 'z-index' at the moment either using SMIL animation and multiple use elements, or through scripting (moving an element toward the end of the document). However, both can place restrictions on document structure, and have limitations due to property inheritance.'
The W3C are not transparent as to why they are not implementing this previously recommended feature; however, its absence from SVG 1.2 will prove a limitation to developers. Furthermore the workarounds suggested (multiple use elements) complicate the development process and might prove too demanding for an inexperienced developer. These workarounds were contemplated during the development of 'Bullseye'. However, it was considered that having 2 crosshairs following the mouse cursor would use extra computing resource which, on lower powered computers, could cause lags that create the impression of 2 crosshairs on the screen at once. It was considered that this would create greater usability problems than 1 crosshair which 'disappears' behind the target.
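The scripting workaround quoted from the working draft (moving an element toward the end of the document) can be sketched as follows. This is a minimal illustration in Python using the standard library XML parser as a stand-in for viewer-side scripting; the element ids are hypothetical and are not taken from the 'Bullseye' source. As noted above, reordering in this way may have side effects on other behaviour, such as hit detection.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical document: the ids are illustrative, not taken from 'Bullseye'.
SOURCE = f"""<svg xmlns="{SVG_NS}">
  <circle id="crosshair" r="5"/>
  <circle id="target" r="30"/>
</svg>"""


def bring_to_front(root: ET.Element, element_id: str) -> None:
    """Move the element to the end of its parent so that it is painted last."""
    for parent in root.iter():
        for child in list(parent):
            if child.get("id") == element_id:
                parent.remove(child)
                parent.append(child)
                return
    raise KeyError(element_id)


root = ET.fromstring(SOURCE)
bring_to_front(root, "crosshair")
print([child.get("id") for child in root])  # ['target', 'crosshair']
```

Because painting order follows document order, the crosshair is now drawn after, and therefore over, the target.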
The user evaluation develops the expert evaluation, establishes if the expert findings are valid (some findings may be trivial in the real world) and uncovers problems that may have been missed by the experts. Detailed guidelines for the conduct of the user evaluation are given in appendix 12, but the key criteria and objectives are:
• To carry out an observed evaluation of the applications in a context defined
by the IMPACT analysis.
• To assess objectively (as opposed to the subjective Internet based evaluation)
how users compared the 2 applications and how they reacted to any problems.
• If resources permit some evaluation will be conducted on computers where no SVG
viewer is installed, where Adobe SVG viewer version 2 (ASV2) is installed and where
ASV3 or ASV6 are installed. This will enable the reaction to the different situations
to be fully assessed.
Developing the last 2 points; the Internet based evaluation reveals a number
of users who were not aware of what SVG viewer they had (or knew they had ASV2).
These users complained of many unhelpful error messages and an inability to use
the application. The idea of asking users to play 'Bullseye' in SVG on computers
with ASV2 is to clarify the exact nature of these anecdotal usability problems.
4.5.1 Analysis
The evaluation was undertaken using 4 volunteers who complied with the 'people' requirement in the IMPACT analysis.
The most striking feature of this evaluation was the reaction of those users who were asked to play 'Bullseye' in SVG on a computer with ASV2 installed. Users were confronted with error messages (given in detail in appendix 13) and, in practice, found the application unusable, whereas when playing the Flash application the response was simply 'it worked'. Those users invited to play 'Bullseye' in SVG on computers with ASV3 or ASV6 installed found the difference between the formats much less noticeable. They did observe the usability problems that arose during the expert evaluation, but found them slightly less significant than the expert evaluation suggested. One user commented that the crosshair behaviour in the SVG application was alleviated by the inclusion of different sound effects depending on whether the target was hit or missed.
Overall the user evaluation did not add any new findings to the evaluation process, but it did support the findings related to ASV2 in the Internet based evaluations and confirm the findings of the expert evaluation. Therefore it did serve a useful verification purpose.
As already discussed, in order to address the matter of how interactive SVG applications perform relative to Flash on the Internet, a web based evaluation was developed. This consisted of a site, which encouraged visitors to play 'Bullseye' in both formats and fill in a simple web based form, the results of which were automatically emailed for collation and analysis.
A requirement of developing the site was the need to integrate the applications into HTML pages. In Flash this is achieved by publishing the application with the appropriate check boxes ticked. This simple procedure produces an SWF file and an associated HTML file, which links to the SWF file. No understanding of coding in Flash or HTML is required. This is in contrast to SVG, where the developer attempting to integrate an SVG application into an HTML page is presented with a choice: should the application be integrated using the HTML <object> tag or the <embed> tag? The former is strict HTML whereas the latter is not part of any HTML specification. In practice, as Neumann and Winter (2003) observe and as Adobe Systems (2003) request of developers, it is necessary to use the non standardised <embed> tag. This introduces a complication for developers, as the de jure <object> tag will cause errors in some browsers, especially if the SVG application contains scripting (Neumann and Winter, 2003). The <embed> tag was used for the evaluation website. However, in the 'Bullseye' website the <embed> tag does not give error messages if there is no SVG viewer installed. Therefore a page was added to help visitors establish whether they had an SVG viewer and to offer alternative courses of action. The consequences of the requirement to use the <embed> tag are not trivial. It is reasonable to suggest that visitors who have no SVG viewer require prompt and succinct information as to why they cannot see anything and should then be given sufficient information to download a viewer. That they do not receive this is a significant impediment to the wider promulgation of SVG.
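As an illustration of the approach described, the following sketch generates an <embed> element accompanied by a visible link to a help page, so that a visitor without a viewer is not left with a blank area. It is written in Python purely to produce the markup; the file name, dimensions, help page name and link wording are hypothetical and are not those used on the evaluation website.

```python
from html import escape


def svg_embed_markup(src: str, width: int, height: int, help_url: str) -> str:
    """Build an <embed> element plus a plain link to a viewer-help page."""
    return (
        f'<embed src="{escape(src, quote=True)}" type="image/svg+xml" '
        f'width="{width}" height="{height}">\n'
        f'<p><a href="{escape(help_url, quote=True)}">Cannot see the game? '
        f'Check your SVG viewer.</a></p>'
    )


# Hypothetical file names, for illustration only.
print(svg_embed_markup("bullseye.svg", 600, 400, "viewer-help.html"))
```

The link is ordinary HTML and so remains visible whether or not a viewer is installed, which is the role played by the help page described above.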
As a precursor to publicising the website to potential evaluators, an expert evaluation was conducted. The objective of this evaluation was to ensure that the website interface was usable according to heuristics based on Nielsen and Molich's (1990) guidelines and that the website was suitable for its purpose as discussed in section 3.1.2. The detailed guidelines to evaluators, heuristics, scenario and the URL are in appendix 14. The evaluation conducted by 3 evaluators with usability design training revealed some problems. The most significant was that the evaluation form opened in the same frame as the Flash and SVG applications, so if the user returned to an application whilst completing the form, it was empty when the user returned to it. The solution was to make the form open in a new window.
With reference to the 'people' section of the IMPACT analysis, a request to take part in the evaluation was circulated to 2 primary groups:
• General users of the Internet, via email lists related to non-computing interests
known to me. Such users might have little or no knowledge of SVG. As such they
are representative of those who need to be persuaded to download an SVG viewer if
they do not already have one.
• An SVG developer's forum, where it can be expected that all users have an SVG
viewer and an interest in and some understanding of SVG. Such users might be able to
give insightful opinions on the 2 formats.
The website was constructed to ensure that replies from the 2 groups could be
isolated for analysis. The form that users were invited to complete consisted of
a series of radio buttons relating to the specification of the platform they were
using and opportunities to comment on how they felt aspects of the applications
compared to each other.
Results of the Internet based evaluation:
(fields with zero returns are omitted)
| Selection | General users | SVG developers forum | Total |
| OS: | |||
| Windows | 13 | 14 | 27 |
| Macintosh | 1 | 1 | |
| Internet connection: | |||
| Broadband | 8 | 12 | 20 |
| Modem | 3 | 2 | 5 |
| Browser: | |||
| I.E 5.5 or greater | 8 | 10 | 18 |
| I.E 5 or less | 1 | 3 | 4 |
| Other | 2 | 1 | 3 |
| SVG viewer: | |||
| ASV6 | 1 | 1 | 2 |
| ASV3 | 4 | 12 | 16 |
| ASV2 | 3 | | 3 |
| None/Not sure | 4 | | 4 |
| Flash Player: | |||
| 7 | 5 | 5 | 10 |
| 6 | 3 | 3 | 6 |
| 5 | 1 | 1 | |
| Not sure | 2 | 6 | 8 |
| Flash download speed: | |||
| Poor | |||
| Satisfactory | 4 | 2 | 6 |
| Good | 6 | 8 | 14 |
| Graphics quality: | |||
| Satisfactory: Yes | 5 | 5 | 10 |
| Satisfactory: No | 1 | 3 | 4 |
| SVG download speed: | |||
| Poor | 1 | 1 | |
| Satisfactory | 5 | 4 | 9 |
| Good | 1 | 6 | 7 |
| Graphics quality: | |||
| Satisfactory: Yes | 3 | 7 | 10 |
| Satisfactory: No | 3 | 3 | |
| Preferred format: | |||
| Flash | 7 | 7 | 14 |
| SVG | 1 | 4 | 5 |
| Not sure | 2 | 2 |
Table 4: Summary of Internet based evaluation.
Key points of note arising from Table 4 are:
• SVG developers are all using ASV3 or ASV6 and so were able to operate 'Bullseye'
in SVG. This is in contrast to the majority of general users who did not know what
SVG viewer they had (judging by the nature of comments relating to error messages
they very probably had ASV2), and were unable to operate the application.
• In contrast to the previous point, one user who had Flash player 5 (on a Windows
platform) was automatically informed that he required Flash player 6. He was also
offered an automatic download and install of this version.
• For both formats nearly all users found the download times at least satisfactory;
however, more users stated that Flash was good in this respect, with some users saying
that Flash was quicker to load.
• For both formats a significantly higher proportion of general users found the
graphics satisfactory, suggesting that users may be less critical than developers.
But overall a majority of both groups found both formats satisfactory, with Flash
being preferred overall, with some users suggesting it was smoother in operation.
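The points above can be expressed as proportions of the totals in Table 4. The following is a minimal sketch using only the 'Total' column for format preference and download speed. With around 20 responses per question, the percentages are indicative only.

```python
# Totals transcribed from the 'Total' column of Table 4.
preferred = {"Flash": 14, "SVG": 5, "Not sure": 2}
flash_download = {"Poor": 0, "Satisfactory": 6, "Good": 14}
svg_download = {"Poor": 1, "Satisfactory": 9, "Good": 7}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each count as a percentage of the column total (1 decimal place)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


print(shares(preferred))  # {'Flash': 66.7, 'SVG': 23.8, 'Not sure': 9.5}
print(shares(flash_download)["Good"], shares(svg_download)["Good"])  # 70.0 41.2
```

On these totals about two thirds of respondents preferred Flash, and a higher proportion rated the Flash download speed as good than did so for SVG.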
Users also commented that in the SVG game, the rollover buttons for selecting the level of difficulty only worked when the cursor was over the graphical part of the button and not the text part. They noted that the 'flash' when the crosshair was fired went behind the target rather than in front as might be expected. These points have been discussed earlier in the evaluation of applications, but the fact that the wider Internet community noticed them emphasises their significance.
The reason why some users felt that Flash was quicker to load could be due to
the increased processing resource required by SVG viewers (as discussed in the review
of literature), despite the fact that the average processor speed of the platforms
was more than 1GHz with at least 512MB RAM.
One possible explanation for the Flash game being thought smoother by some users
is that the natural progress of the Internet evaluation is to play 'Bullseye' in
Flash first then in SVG. It is possible that the operation of the second
plugin is hampered by the first; if this is the case, the true results might be more
evenly balanced.
The Internet based evaluation implies that virtually all SVG viewing on the Internet
is done with Adobe viewers. As discussed in the 'Evaluation of SVG plugins' section,
this suggests that Adobe SVG viewers are the de facto standard. However, a number
of users have no SVG viewer or (most probably) ASV2, and in both situations do not
get clear error messages explaining what their alternative courses of action are.
SVG was thought by a minority of users to have slightly smoother graphics; these
users were primarily from the SVG developer's forum and might be considered biased
in that judgement. However all the users who thought SVG to be smoother had high-powered
computers (>2GHz CPU and at least 256MB RAM), suggesting that whilst the platform
specifications (given in appendix 4) established at the outset of the design and
development stage of the project were satisfactory for the Flash application, they
were inadequate for the SVG application. Consequently, based on the feedback from
the evaluation the baseline specification for the SVG application was increased
(appendix 4) by an approximate factor of 2. This suggests that Flash is a more accessible
format for delivering interactive vector graphics over the Internet than SVG.
Notwithstanding that some users preferred SVG, the most significant conclusion from
the Internet based evaluation is that SVG is an inferior format to Flash for delivering
interactive multimedia material over the Internet.
4.7 Evaluation of SVG resources
As a result of developing an interactive application in SVG, it is clear that whilst a usability evaluation of the attributes in the 1st research question will provide an answer to that question, the 2nd research question is unlikely to be wholly answered by this usability evaluation. In order to fully address the 2nd question it is apparent from the development experience that an evaluation of current leading SVG authoring applications is necessary. This arises because the Flash application was developed using the graphical user interface (GUI) Macromedia application, whereas the SVG application was developed using a text editor. It is reasonable to examine whether the difficulties and complexities discussed in developing the SVG version of 'Bullseye' are in part due to the development environment. The outcome of such an evaluation may provide guidance as to what developments are required to facilitate wider use of SVG.
In addition to evaluating SVG authoring applications, it is considered appropriate to conduct an evaluation of SVG viewer plugins. The review of literature demonstrated that there is variation in the ability of SVG viewers to implement certain features of the SVG specification and during the development of 'Bullseye' all prototyping was undertaken using ASV3 and to a lesser extent ASV6. Indeed Adobe specific namespaces were called:
xmlns:a3="http://ns.adobe.com/AdobeSVGViewerExtensions/3.0/"
a3:scriptImplementation="Adobe"
xmlns:a="http://www.adobe.com/svg10-extensions"
a:timeline="independent"
These permit the use of features such as sound.
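Since reliance on viewer-specific namespaces affects portability between viewers, it can be useful to check which such namespaces a file declares. The following is a minimal sketch, assuming that a namespace URI containing 'adobe.com' indicates an Adobe extension; the sample root element is constructed from the declarations listed above and is not the full 'Bullseye' source.

```python
import io
import xml.etree.ElementTree as ET

# Sample root element built from the declarations listed above (illustrative only).
SAMPLE = b"""<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:a3="http://ns.adobe.com/AdobeSVGViewerExtensions/3.0/"
     a3:scriptImplementation="Adobe"
     xmlns:a="http://www.adobe.com/svg10-extensions"
     a:timeline="independent"/>"""


def adobe_namespaces(svg_bytes: bytes) -> dict[str, str]:
    """Map prefix to URI for each declared namespace whose URI mentions adobe.com."""
    found = {}
    stream = io.BytesIO(svg_bytes)
    for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        if "adobe.com" in uri:  # assumption: this substring marks an Adobe extension
            found[prefix] = uri
    return found


print(adobe_namespaces(SAMPLE))
```

An empty result would suggest that a file does not depend on Adobe extensions, although it would not establish that the file renders correctly in other viewers.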
The evaluation discussed in the review of literature was conducted by the W3C
(World Wide Web Consortium (6), 2003) and is confined to how viewers render a specific
suite of SVG features, but it does not examine how viewers deal with features such
as scripting. A potentially more significant problem is how viewers deal with error
situations; in particular do they comply with Nielsen and Molich's (1990) 8th heuristic
and provide a useful error message? As with an evaluation of development environments,
the outcome may provide guidance as to any developments that are required to facilitate
wider use of SVG.
4.8 Evaluation of SVG development applications
In order to properly evaluate development environments, a series of tests was developed to ensure that the environments were consistently examined across a range of key SVG features. The tests were carried out by downloading the applications (trial versions in the case of commercial applications) and, after an initial familiarisation process, attempting the tasks. The tests and results are shown in table 5, with more detailed comments on the results in appendix 15.
Task list:
Use the development environments to:
1. Create a bullseye target, made up from 3 circles. Save as an SVG file.
2. Make target traverse viewBox from left to right indefinitely.
3. Insert script to count number of times target traverses viewBox (displaying
count on screen).
4. Create an ellipse shape with a gradient fill.
5. Attach an Internet hyperlink to the ellipse shape, activated by mouse click.
6. Insert text of varying styles (size, colour and font).
7. Open an SVG file of 'Bullseye'.
The primary outcomes of the evaluation are:
| Environment | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Task 7 |
| 1 JASC WebDraw | Fail | Pass | Fail | Pass | Pass | Partial | Fail |
| 2 Corel DRAW11 | Pass | Fail | Fail | Partial | Pass | Fail | Fail |
| 3 SodiPodi | Pass | Fail | Fail | Partial | Pass | Partial | Partial |
| 4 SViGio | Partial | Partial | Fail | Fail | Fail | Partial | Fail |
| 5 EvolGrafix | Pass | Partial | Pass | Pass | Pass | Partial | Pass |
| 6 Adobe Illustrator 10 | Partial | Fail | Fail | Pass | Fail | Pass | Fail |
Table 5: Summary of Development environment evaluation.
Table 5 demonstrates that even the simplest tasks are too challenging for some applications. This is in direct contrast to Trippe and Binder's (2002) assertion that artwork can simply be saved as SVG. Only EvolGrafix emerged from the evaluation as a plausible GUI environment for developing interactive SVG applications. If any of the applications are to be used for such development there would be a requirement (to a significant degree) to edit the source code in a text editor in order to achieve the desired functionality. This means that for SVG applications that are any more than static images some knowledge of SVG programming will be required. In Flash, by contrast, whilst knowledge of ActionScript is desirable, it is not necessary for tasks 1 to 6 above, although it is helpful when developing an application such as 'Bullseye'.
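The results in Table 5 can be tallied per environment to make the comparison explicit. The following is a minimal sketch that simply counts outcomes; it applies no weighting to the tasks, which is a simplification.

```python
from collections import Counter

# Results transcribed from Table 5 (tasks 1 to 7, in order).
TABLE_5 = {
    "JASC WebDraw": ["Fail", "Pass", "Fail", "Pass", "Pass", "Partial", "Fail"],
    "Corel DRAW11": ["Pass", "Fail", "Fail", "Partial", "Pass", "Fail", "Fail"],
    "SodiPodi": ["Pass", "Fail", "Fail", "Partial", "Pass", "Partial", "Partial"],
    "SViGio": ["Partial", "Partial", "Fail", "Fail", "Fail", "Partial", "Fail"],
    "EvolGrafix": ["Pass", "Partial", "Pass", "Pass", "Pass", "Partial", "Pass"],
    "Adobe Illustrator 10": ["Partial", "Fail", "Fail", "Pass", "Fail", "Pass", "Fail"],
}


def tally(results: dict[str, list[str]]) -> dict[str, Counter]:
    """Count Pass, Partial and Fail outcomes for each environment."""
    return {name: Counter(outcomes) for name, outcomes in results.items()}


for name, counts in tally(TABLE_5).items():
    print(f"{name}: {counts['Pass']} pass, {counts['Partial']} partial, {counts['Fail']} fail")
```

On this count EvolGrafix is the only environment with no outright failures, passing 5 tasks and partially completing 2.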
For the evaluation of SVG plugins, tests were developed based on the iterative development of 'Bullseye'. That is, the plugins attempted to render increasingly complex SVG applications. The tests and results are shown in table 6, with more detailed comments on the results in appendix 16.
Task list:
1. Display an image of 3 circles making up a target.
2. Display the target image and an embedded bitmap as a background.
3. Display an image of 3 circles traversing the screen indefinitely.
4. Demonstrate ability to open an Internet link by mouse clicking an object.
5. Display the traversing image and display text counting the traverses (JavaScript
support).
6. Show the crosshair following the mouse, hiding the cursor. (Cursor hiding is
achieved by swapping the standard pointer for a transparent image; this is in
effect a specialised form of custom cursor.)
7. Show the target traversing and changing its 'y' position with each traverse
(dynamic manipulation of the DOM with JavaScript).
8. Open 'Bullseye' on the Internet.
| Viewer | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Task 7 | Task 8 |
| 1 ASV6 | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
| 2 ASV3 | Pass | Pass | Pass | Pass | Pass | Partial | Pass | Pass |
| 3 ASV2 | Pass | Pass | Pass | Pass | Pass | Fail | Pass | Partial |
| 4 Corel | Pass | Pass | Fail | Pass | Fail | Pass | Fail | Partial |
| 5 Batik | Pass | Pass | Fail | Fail | Fail | Fail | Fail | Fail |
| 6 Amaya | Pass | Pass | Fail | Pass | Fail | Fail | Fail | Fail |
| 7 Mozilla | Pass | Fail | Fail | Fail | Fail | Fail | Fail | Fail |
Table 6: Summary of SVG viewer evaluation.
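Because the tasks increase in complexity, one way to read Table 6 is to find the first task at which each viewer does not achieve a full pass. The following is a minimal sketch of that reading, treating 'Partial' as not passing.

```python
from typing import Optional

# Results transcribed from Table 6 (tasks 1 to 8, in order).
TABLE_6 = {
    "ASV6": ["Pass"] * 8,
    "ASV3": ["Pass", "Pass", "Pass", "Pass", "Pass", "Partial", "Pass", "Pass"],
    "ASV2": ["Pass", "Pass", "Pass", "Pass", "Pass", "Fail", "Pass", "Partial"],
    "Corel": ["Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Fail", "Partial"],
    "Batik": ["Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail"],
    "Amaya": ["Pass", "Pass", "Fail", "Pass", "Fail", "Fail", "Fail", "Fail"],
    "Mozilla": ["Pass", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail"],
}


def first_non_pass(outcomes: list[str]) -> Optional[int]:
    """Return the 1-based number of the first task that is not a full 'Pass'."""
    for task_number, outcome in enumerate(outcomes, start=1):
        if outcome != "Pass":
            return task_number
    return None


for viewer, outcomes in TABLE_6.items():
    print(viewer, first_non_pass(outcomes))
```

The non-Adobe viewers first fall short at task 2 or 3, the Adobe viewers ASV3 and ASV2 at task 6, and ASV6 not at all.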
As Table 6 demonstrates, the Adobe ASV6 and ASV3 viewers were the only ones to
render the application sufficiently to allow the game to be played. The other viewers
fared less well. None of the viewers offered clear error messages with a choice
of alternative action when rendering was incomplete. This is clearly unsatisfactory
from a usability point of view and, as the Internet based evaluation demonstrated,
the lack of clear error messages (such as when using ASV2) does not encourage the
user to download a suitable viewer and thereby promulgate SVG. The most detailed
messages were offered by the Amaya browser, which indicated that the use of Adobe
namespaces was unsatisfactory. When developing the mini applications for this part
of the evaluation care was taken to exclude reference to Adobe namespaces unless
they were critical to the functionality of the application. However in order to
include all the modelled functionality in the final application it was necessary
to use some Adobe specific features, such as the ability to play sound. It can be
argued that the inability of non-Adobe viewers to render Adobe specific features
is not significant as the Internet based evaluation demonstrated that users are
only using Adobe viewers.
This situation has come about, in part, due to Adobe's strong presence in the software
market and their introduction of features that require an Adobe viewer. This has
given their viewers a competitive edge, even in a zero price market, as some developers
have optimised their applications to utilise the Adobe features. It can be
argued that this demonstrates that users in the wider Internet community prefer a common
application for a purpose, indeed one that appears proprietary even if it is in
fact an open standards application, as this proprietary behaviour is perceived
by users to give the consistency that the W3C open standard SVG is intended to achieve,
but does not.