Bullseye


4 Evaluation

4.1 Evaluation of applications

In order to ensure an effective evaluation of the applications, it is necessary to decide on a methodology and the criteria for applying that methodology. This ensures that each application is similarly evaluated and that the results of the evaluation are valid.

4.2 IMPACT

The methods to be used in a usability evaluation can be encapsulated in the acronym IMPACT (Turner, in press). Before deciding which evaluation methods are to be used, it is necessary to define the parameters of the evaluation.

Intention: this seeks to establish the nature of the data by considering the questions to be answered by the evaluation. In this context they are the questions posed by the original proposal.

• Does SVG have sufficient advantages over Flash, with respect to bandwidth, client side rendering, usability of interactive features and other key attributes which research may elicit, to enable it to become the primary means of vector graphic delivery on the Internet?
• What key developments are required to facilitate wider use of SVG?

It is primarily the 1st question that will be addressed by the evaluation; the 2nd question will be addressed in the discussion following the evaluation. It is necessary to be aware that the entertainment value of the applications is not being assessed.

Metrics: this develops the intention into the qualitative and quantitative output of the evaluation and should be considered with respect to the overall aims.

• Bandwidth and file size are significant as they are directly related to download time. Download time is recognised as a key criterion in section 3.2.
• Client side rendering is also related to download time, as well as to the specification of the computer and the format being used. It is important to recognise that the rendering process is dynamic. Not only is the time taken to display the application's initial screen to be evaluated, but also the quality of the rendering as the applications are run. In the case of 'Bullseye' this may be affected by the applications speeding up as the user progresses through the levels of difficulty.
• Primary criteria for the usability of interactive features include client side rendering (do the applications keep up with the user input?). This means the applications should be 'usable' as discussed in section 3.1.1. Additionally, both applications have been designed to support interactive mouse, sound, image and text based features (the latter two both static and dynamic).

People: in order that any user testing can be undertaken using users from the target group for which the application is designed it is necessary to define the main characteristics of the target group(s). For this evaluation users of 'Bullseye' are considered to be comfortable using the Internet in a Windows type environment and have previously downloaded small applications. No other demographic information is relevant.

Activities: defined as the typical activities that a user might undertake when using the application and are pertinent to defining a scenario that may be used in an evaluation. In this evaluation the activities are playing the game at all levels of difficulty.

Context: these activities may be carried out at work/school or home in an environment that is likely to have both audio and visual distractions. Normally it would be expected that the user undertook this as an individual, but it could also be a group activity with the user interacting with other users as well as the computer.

Technologies: this will be any computer complying with appendix 4 (revised). The computer will probably be static, but could be a wireless networked laptop. The Internet connection type (modem or broadband) is relevant to download times.
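
The dependence of download time on file size and connection type can be sketched as a simple calculation. The file size and link speeds used here are hypothetical values chosen purely for illustration; they are not measurements from this evaluation, and the calculation ignores latency and protocol overhead.

```python
def download_time_seconds(file_size_bytes: int, bandwidth_bits_per_second: float) -> float:
    """Idealised download time: file size in bits divided by link speed."""
    return (file_size_bytes * 8) / bandwidth_bits_per_second


# Hypothetical values for illustration only (not taken from the evaluation).
EXAMPLE_FILE_SIZE = 70_000   # bytes
MODEM_BPS = 56_000           # nominal 56 kbit/s modem
BROADBAND_BPS = 512_000      # nominal 512 kbit/s broadband

modem_time = download_time_seconds(EXAMPLE_FILE_SIZE, MODEM_BPS)
broadband_time = download_time_seconds(EXAMPLE_FILE_SIZE, BROADBAND_BPS)

print(f"modem: {modem_time:.1f} s, broadband: {broadband_time:.1f} s")
```

Under these assumed figures the same file takes roughly nine times longer over the modem link, which is why the connection type is recorded on the evaluation form.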

4.3 Evaluation methods

The methods used must take into account the IMPACT analysis and the resources available to find the usability and format related problems in the application design. When considering options for expert review, one possibility is a cognitive walkthrough, but as the example demonstrated by Blackmon et al (2002) shows, this method can be complex and resource intensive. Guideline based evaluation is useful for identifying recurring problems, but can miss severe problems and is resource heavy, whereas heuristic evaluation is a suitable method for finding both major and minor usability problems in a user interface and requires relatively little resource (Jeffries et al, 1991), especially in terms of the number of evaluators required (Nielsen and Molich, 1990). A combination of user evaluation and expert heuristic evaluation is considered optimal: while user evaluation may reveal problems in using the application, heuristic testing may help identify the cause of the problem and so suggest a possible solution (Doubleday et al, 1997). Furthermore, user testing can reveal that some problems identified by experts are in fact trivial (Turner, in press).

However an expert and user evaluation will not fully address all the elements identified in the 'intention' and developed in the 'metrics' section above, specifically bandwidth and the specification of the computer being used. In order to assess how the applications perform on a variety of platforms an evaluation website was developed. This encourages users in the wider Internet community to use the applications and via an online form return information for analysis.

4.4 Expert Evaluation

Initial steps in conducting an expert evaluation are to develop heuristics against which to judge the applications and to create a suitable scenario of use. The evaluators can then work through the scenario and test for breaches of the heuristics. As a baseline for the heuristics, the set defined by Nielsen and Molich (1990) has been used. It is important to recognise that these heuristics, whilst focussing on usability, do so by primarily addressing the interface design. The intention and metrics of the IMPACT analysis reveal that whilst the evaluation is considering the usability of the applications, it is doing so by assessing more than just the interface design. It is therefore necessary to modify the heuristics to evaluate the download time, speed of client side rendering and the overall ability of the formats to respond to user inputs correctly. The heuristics developed (appendix 11) reflect this and consider both the effectiveness of the formats and the interface design. This latter point should ensure that any identified limitations can be isolated as a feature of the format rather than a problem with the interface design.

4.4.1 Analysis

Three evaluators with usability evaluation training conducted the evaluation. They were asked to use computers that complied with the revised specifications in appendix 4. This was an attempt to ensure that deficiencies in the format language were the focus of this stage of the evaluation, as any problems related to resource requirement would be identified in the Internet based evaluation. The key results of the evaluation are summarised thus:

• The layout and interfaces of the applications contain no significant breaches of heuristics 5 to 9, which relate to differences in the interface design. This is important as it confirms that both applications are suitable for a comparative evaluation by non-expert users.
• Evaluators observed that the rollover buttons to select the level of difficulty on the initial screen (appendix 9), whilst displaying the conventional finger to imply that they performed an action, only worked when clicked off the text area of the button. This is a breach of heuristic 3, that the application should respond correctly.
• Evaluators also noted that the crosshair (which tracks the mouse cursor) disappeared behind the target when the cursor was moved over the target, so at the instant the user clicked the mouse to fire, they could not be sure precisely where the crosshair was aimed. This is a breach of heuristic 4, that the user should remain in control. On a related matter it was observed that the flash effect created when the crosshair was fired was not visible if the target was hit, as it was also hidden behind the target in similar circumstances.

The problem with the rollover buttons is considered to be a bug. As far as can be ascertained the code used is valid SVG and so should provide the functionality intended. That such usability problems appear not to have been identified before underlines that SVG is in some respects an immature technology.
The second problem is related to the rendering model. It is a feature of all current SVG standards (World Wide Web Consortium (1), 2003) that objects are painted in the order of listing in the file (outwith the defs section). The DOM (appendix 6c) further clarifies the order of painting. However, if objects are rearranged within the DOM so that the crosshair and 'flash' objects (the 'flash' being the SVG effect shown when the crosshair is fired) render in the correct order, no hit is recorded if the crosshair is fired over the target. Mong and Brailsford (2003) observe the difficulties that arise from the painting order being determined by the ordering of the content stream and note the proposal in SVG 1.2 to introduce the concept of a 'z-index' to alleviate this. This would mean that objects could have their painting order determined explicitly without affecting their relationship to each other in the DOM. However, at the time of writing the current working draft for SVG 1.2 (World Wide Web Consortium (5), 2003) states:

'Previous drafts of SVG 1.2 mentioned the possibility of a 'z-index' property to allow separation of document order from drawing order. After long consultations with implementers and content developers, the SVG Working Group has decided not to add the feature.'

They (World Wide Web Consortium (5), 2003) continue:

'It is possible to simulate 'z-index' at the moment either using SMIL animation and multiple use elements, or through scripting (moving an element toward the end of the document). However, both can place restrictions on document structure, and have limitations due to property inheritance.'

The W3C are not transparent as to why they are not implementing this previously recommended feature; however, its absence from SVG 1.2 will prove a limitation to developers. Furthermore, the workarounds suggested (multiple use elements) complicate the development process and might prove too demanding for an inexperienced developer. These workarounds were contemplated during the development of 'Bullseye'. However, it was considered that having 2 crosshairs following the mouse cursor would use extra computing resource, which on lower powered computers could, due to lags, create the impression of 2 crosshairs on the screen at once. It was considered that this would create greater usability problems than 1 crosshair which 'disappears' behind the target.
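
The painting order behaviour, and the scripting workaround quoted above (moving an element toward the end of the document), can be illustrated with a minimal sketch. This is not the 'Bullseye' source: the two-element document and its ids are hypothetical, and Python's standard XML library stands in for the script that would run inside an SVG viewer.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical minimal document; the ids are illustrative, not from 'Bullseye'.
SOURCE = f"""<svg xmlns="{SVG_NS}" width="200" height="200">
  <circle id="crosshair" cx="100" cy="100" r="5" fill="black"/>
  <circle id="target" cx="100" cy="100" r="40" fill="red"/>
</svg>"""


def paint_order(root):
    """Return child ids in document order, which is also the SVG painting order."""
    return [child.get("id") for child in root]


def bring_to_front(root, element_id):
    """Move the child with the given id to the end of its parent so it is painted last."""
    for child in list(root):
        if child.get("id") == element_id:
            root.remove(child)
            root.append(child)
            return
    raise ValueError(f"no child with id {element_id!r}")


root = ET.fromstring(SOURCE)
before = paint_order(root)   # crosshair is listed first, so the target is painted over it
bring_to_front(root, "crosshair")
after = paint_order(root)    # crosshair is now listed last, so it is painted on top

print(before, "->", after)
```

The sketch shows only the reordering itself. As noted above, rearranging the objects in 'Bullseye' in this way changed their relationship in the DOM and stopped hits being recorded, which is the kind of restriction on document structure that the working draft acknowledges.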

4.5 User Evaluation

The user evaluation develops the expert evaluation, establishes if the expert findings are valid (some findings may be trivial in the real world) and uncovers problems that may have been missed by the experts. Detailed guidelines for the conduct of the user evaluation are given in appendix 12, but the key criteria and objectives are:

• To carry out an observed evaluation of the applications in a context defined by the IMPACT analysis.
• To assess objectively (as opposed to the subjective Internet based evaluation) how users compared the 2 applications and how they reacted to any problems.
• If resources permit some evaluation will be conducted on computers where no SVG viewer is installed, where Adobe SVG viewer version 2 (ASV2) is installed and where ASV3 or ASV6 are installed. This will enable the reaction to the different situations to be fully assessed.

Developing the last 2 points: the Internet based evaluation revealed a number of users who were not aware of what SVG viewer they had (or knew they had ASV2). These users complained of many unhelpful error messages and an inability to use the application. The idea of asking users to play 'Bullseye' in SVG on computers with ASV2 is to clarify the exact nature of these anecdotal usability problems.

4.5.1 Analysis

The evaluation was undertaken using 4 volunteers who complied with the 'people' requirement in the IMPACT analysis.

The most striking feature of this evaluation was the reaction of those users who were asked to play 'Bullseye', in SVG, on a computer with ASV2 installed. Users were confronted with error messages (given in detail in appendix 13) and, in practice, found the application unusable, whereas when playing the Flash application the response was simply 'it worked'. Those users invited to play 'Bullseye' in SVG on computers with ASV3 or ASV6 installed found the difference between the formats much less noticeable. They did observe the usability problems that arose during the expert evaluation, but found them slightly less significant than the expert evaluation suggested. One user commented that the crosshair behaviour in the SVG application was alleviated by the inclusion of different sound effects depending on whether the target was hit or missed.

Overall the user evaluation did not add any new findings to the evaluation process, but it did support the findings related to ASV2 in the Internet based evaluations and confirm the findings of the expert evaluation. Therefore it did serve a useful verification purpose.

4.6 Internet based Evaluation

As already discussed, in order to address the matter of how interactive SVG applications perform relative to Flash on the Internet, a web based evaluation was developed. This consisted of a site, which encouraged visitors to play 'Bullseye' in both formats and fill in a simple web based form, the results of which were automatically emailed for collation and analysis.

4.6.1 Site Development

A requirement of developing the site was the need to integrate the applications into HTML pages. In Flash this is achieved by publishing the application with the appropriate check boxes ticked. This simple procedure produces an SWF file and an associated HTML file, which links to the SWF file. No understanding of coding in Flash or HTML is required. This is in contrast to SVG, where the developer attempting to integrate an SVG application into an HTML page is presented with a choice. Should the application be integrated using the HTML <object> tag or the <embed> tag? The former is strict HTML whereas the latter is not part of any HTML specification. In practice, as Neumann and Winter (2003) observe and as Adobe Systems (2003) request of developers, it is necessary to use the non standardised <embed> tag. This introduces a complication for developers, as the de jure <object> tag will cause errors in some browsers, especially if the SVG application contains scripting (Neumann and Winter, 2003). The <embed> tag was used for the evaluation website. However, in the 'Bullseye' website the <embed> tag does not give error messages if there is no SVG viewer installed. Therefore a page was added to help visitors establish if they had an SVG viewer and to offer alternative courses of action. The consequences of the requirement to use the <embed> tag are not trivial. It is reasonable to suggest that visitors who have no SVG viewer require prompt and succinct information as to why they cannot see anything, and should then be given sufficient information to download a viewer. That they are not given this is a significant impediment to the wider promulgation of SVG.

As a precursor to publicising the website to potential evaluators, an expert evaluation was conducted. The objective of this evaluation was to ensure that the website interface was usable according to heuristics based on Nielsen and Molich's (1990) guidelines and that the website was suitable for its purpose as discussed in section 3.1.2. The detailed guidelines to evaluators, heuristics, scenario and the URL are in appendix 14. The evaluation conducted by 3 evaluators with usability design training revealed some problems. The most significant was that the evaluation form opened in the same frame as the Flash and SVG applications, so if the user returned to an application whilst completing the form, it was empty when the user returned to it. The solution was to make the form open in a new window.

4.6.2 Analysis

With reference to the 'people' section of the IMPACT analysis, a request to take part in the evaluation was circulated to 2 primary groups:

• General users of the Internet, via email lists related to non-computing interests known to the author. Such users might have little or no knowledge of SVG. As such they are representative of those who need to be persuaded to download an SVG viewer if they do not already have one.
• An SVG developers' forum, where it can be expected that all users have an SVG viewer and an interest in and some understanding of SVG. Such users might be able to give insightful opinions on the 2 formats.

The website was constructed to ensure that replies from the 2 groups could be isolated for analysis. The form that users were invited to complete consisted of a series of radio buttons relating to the specification of the platform they were using and opportunities to comment on how they felt aspects of the applications compared to each other.

Results of the Internet based evaluation:
(fields with zero returns are omitted)

Selection                 General users   SVG developers forum   Total

OS:
  Windows                 13              14                     27
  Macintosh               1               -                      1
Internet connection:
  Broadband               8               12                     20
  Modem                   3               2                      5
Browser:
  I.E 5.5 or greater      8               10                     18
  I.E 5 or less           1               3                      4
  Other                   2               1                      3
SVG viewer:
  ASV6                    1               1                      2
  ASV3                    4               12                     16
  ASV2                    3               -                      3
  None/Not sure           4               -                      4
Flash Player:
  7                       5               5                      10
  6                       3               3                      6
  5                       1               -                      1
  Not sure                2               6                      8
Flash download speed:
  Poor                    -               -                      -
  Satisfactory            4               2                      6
  Good                    6               8                      14
Flash graphics quality:
  Satisfactory: Yes       5               5                      10
  Satisfactory: No        1               3                      4
SVG download speed:
  Poor                    -               1                      1
  Satisfactory            5               4                      9
  Good                    1               6                      7
SVG graphics quality:
  Satisfactory: Yes       3               7                      10
  Satisfactory: No        -               3                      3
Preferred format:
  Flash                   7               7                      14
  SVG                     1               4                      5
  Not sure                -               2                      2

Table 4: Summary of Internet based evaluation.
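
Because the two groups returned different numbers of responses, the counts in Table 4 are easier to compare as proportions. The following is a minimal sketch of that arithmetic using only the counts in the table; treating an empty cell as zero is an assumption, and the variable names are illustrative.

```python
# Counts copied from Table 4 as (general users, SVG developers forum).
# Empty cells in the table are assumed to mean zero responses.
graphics_satisfactory = {
    "Flash": {"yes": (5, 5), "no": (1, 3)},
    "SVG": {"yes": (3, 7), "no": (0, 3)},
}
# Totals column of the 'Preferred format' rows.
preferred_format = {"Flash": 14, "SVG": 5, "Not sure": 2}


def share(part, whole):
    """Fraction of responses, guarding against an empty group."""
    return part / whole if whole else 0.0


satisfied = {}
for fmt, counts in graphics_satisfactory.items():
    for index, group in enumerate(("general", "developers")):
        yes = counts["yes"][index]
        no = counts["no"][index]
        satisfied[(fmt, group)] = share(yes, yes + no)

total_preferences = sum(preferred_format.values())
preference_share = {
    fmt: share(count, total_preferences) for fmt, count in preferred_format.items()
}

for key, value in satisfied.items():
    print(key, f"{value:.1%}")
for fmt, value in preference_share.items():
    print(fmt, f"{value:.1%}")
```

For Flash graphics this gives 5 of 6 general users against 5 of 8 developers, and for SVG graphics 3 of 3 against 7 of 10. With samples this small the proportions are indicative only.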

Key points of note arising from Table 4 are:

• SVG developers are all using ASV3 or ASV6 and so were able to operate 'Bullseye' in SVG. This is in contrast to the majority of general users, who either had ASV2 or did not know what SVG viewer they had (judging by the nature of comments relating to error messages, the latter very probably had ASV2) and were unable to operate the application.
• In contrast to the previous point, one user who had Flash player 5 (on a Windows platform) was automatically informed that he required Flash player 6. He was also offered an automatic download and install of this version.
• For both formats nearly all users found the download times at least satisfactory. However, more users stated that Flash was good in this respect, with some users saying that Flash was quicker to load.
• For both formats a higher proportion of general users than developers found the graphics satisfactory, suggesting that general users may be less critical than developers. Overall a majority of both groups found both formats satisfactory, with Flash being preferred overall and some users suggesting it was smoother in operation.

Users also commented that in the SVG game, the rollover buttons for selecting the level of difficulty only worked when the cursor was over the graphical part of the button and not the text part. They noted that the 'flash' when the crosshair was fired went behind the target rather than in front as might be expected. These points have been discussed earlier in the evaluation of applications, but the fact that the wider Internet community noticed them emphasises their significance.

The reason why some users felt that Flash was quicker to load could be the increased processing resource required by SVG viewers (as discussed in the review of literature), despite the fact that the average processor speed of the platforms was more than 1 GHz with at least 512 MB RAM.
One possible explanation for the Flash game being thought smoother by some users is that the natural progress of the Internet evaluation is to play 'Bullseye' in Flash first and then in SVG. It is possible that the operation of the second plugin is hampered by the first; if this is the case, the results might otherwise have been more evenly balanced.

4.6.3 Conclusion

The Internet based evaluation implies that virtually all SVG viewing on the Internet is done with Adobe viewers. As discussed in the 'Evaluation of SVG plugins' section, this suggests that Adobe SVG viewers are the de facto standard. However, a number of users have no SVG viewer or (most probably) ASV2, and in both situations do not get clear error messages explaining what their alternative courses of action are.
SVG was thought by a minority of users to have slightly smoother graphics; these users were primarily from the SVG developers' forum and might be considered biased in that judgement. However, all the users who thought SVG to be smoother had high-powered computers (>2 GHz CPU and at least 256 MB RAM), suggesting that whilst the platform specifications (given in appendix 4) established at the outset of the design and development stage of the project were satisfactory for the Flash application, they were inadequate for the SVG application. Consequently, based on the feedback from the evaluation, the baseline specification for the SVG application was increased (appendix 4) by an approximate factor of 2. This suggests that Flash is a more accessible format for delivering interactive vector graphics over the Internet than SVG.
Notwithstanding that some users preferred SVG, the most significant conclusion from the Internet based evaluation is that SVG is an inferior format to Flash for delivering interactive multimedia material over the Internet.

4.7 Evaluation of SVG resources

As a result of developing an interactive application in SVG, it is clear that whilst a usability evaluation of the attributes in the 1st research question will provide an answer to that question, the 2nd research question is unlikely to be wholly answered by this usability evaluation. In order to fully address the 2nd question it is apparent from the development experience that an evaluation of current leading SVG authoring applications is necessary. This arises because the Flash application was developed using the graphical user interface (GUI) Macromedia application, whereas the SVG application was developed using a text editor. It is reasonable to examine whether the difficulties and complexities discussed in developing the SVG version of 'Bullseye' are in part due to the development environment. The outcome of such an evaluation may provide guidance as to what developments are required to facilitate wider use of SVG.

In addition to evaluating SVG authoring applications, it is considered appropriate to conduct an evaluation of SVG viewer plugins. The review of literature demonstrated that there is variation in the ability of SVG viewers to implement certain features of the SVG specification and during the development of 'Bullseye' all prototyping was undertaken using ASV3 and to a lesser extent ASV6. Indeed Adobe specific namespaces were called:

xmlns:a3="http://ns.adobe.com/AdobeSVGViewerExtensions/3.0/"
a3:scriptImplementation="Adobe"
xmlns:a="http://www.adobe.com/svg10-extensions"
a:timeline="independent"

These permit the use of features such as sound.

The evaluation discussed in the review of literature was conducted by the W3C (World Wide Web Consortium (6), 2003) and is confined to how viewers render a specific suite of SVG features, but it does not examine how viewers deal with features such as scripting. A potentially more significant problem is how viewers deal with error situations; in particular do they comply with Nielsen and Molich's (1990) 8th heuristic and provide a useful error message? As with an evaluation of development environments, the outcome may provide guidance as to any developments that are required to facilitate wider use of SVG.

4.8 Evaluation of SVG development applications

In order to properly evaluate development environments, a series of tests was developed to ensure that the environments were consistently examined across a range of key SVG features. The tests were carried out by downloading the applications (trial versions in the case of commercial applications) and, after an initial familiarisation process, attempting the tasks. The tests and results are shown in table 5, with more detailed comments on the results in appendix 15.

Task list:
Use the development environments to:

1. Create a bullseye target, made up from 3 circles. Save as an SVG file.
2. Make target traverse viewBox from left to right indefinitely.
3. Insert script to count number of times target traverses viewBox (displaying count on screen).
4. Create an ellipse shape with a gradient fill.
5. Attach an Internet hyperlink to the ellipse shape, activated by mouse click.
6. Insert text of varying styles (size, colour and font).
7. Open an SVG file of 'Bullseye'.

4.8.1 Analysis

The primary outcomes of the evaluation are:

   Environment            Task 1    Task 2    Task 3    Task 4    Task 5    Task 6    Task 7
1  JASC WebDraw           Fail      Pass      Fail      Pass      Pass      Partial   Fail
2  Corel DRAW11           Pass      Fail      Fail      Partial   Pass      Fail      Fail
3  SodiPodi               Pass      Fail      Fail      Partial   Pass      Partial   Partial
4  SViGio                 Partial   Partial   Fail      Fail      Fail      Partial   Fail
5  EvolGrafix             Pass      Partial   Pass      Pass      Pass      Partial   Pass
6  Adobe Illustrator 10   Partial   Fail      Fail      Pass      Fail      Pass      Fail

Table 5: Summary of Development environment evaluation.

Table 5 demonstrates that even the simplest tasks are too challenging for some applications. This is in direct contrast to Trippe and Binder's (2002) assertion that artwork can simply be saved as SVG. Only EvolGrafix emerged from the evaluation as a plausible GUI environment for developing interactive SVG applications. If any of the applications are to be used for such development, there would be a requirement (to a significant degree) to edit the source code in a text editor in order to achieve the desired functionality. This means that for SVG applications that are any more than static images, some knowledge of SVG programming will be required. In Flash, by contrast, knowledge of ActionScript is desirable but not necessary for tasks 1 to 6 above, although it is helpful when developing an application such as 'Bullseye'.
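
The reading of Table 5 given above can be checked with a simple tally of outcomes per environment. This is a minimal sketch using only the results in the table; the variable names are illustrative.

```python
from collections import Counter

# Results copied from Table 5, tasks 1 to 7 in order.
TABLE_5 = {
    "JASC WebDraw": ["Fail", "Pass", "Fail", "Pass", "Pass", "Partial", "Fail"],
    "Corel DRAW11": ["Pass", "Fail", "Fail", "Partial", "Pass", "Fail", "Fail"],
    "SodiPodi": ["Pass", "Fail", "Fail", "Partial", "Pass", "Partial", "Partial"],
    "SViGio": ["Partial", "Partial", "Fail", "Fail", "Fail", "Partial", "Fail"],
    "EvolGrafix": ["Pass", "Partial", "Pass", "Pass", "Pass", "Partial", "Pass"],
    "Adobe Illustrator 10": ["Partial", "Fail", "Fail", "Pass", "Fail", "Pass", "Fail"],
}

# Count Pass / Partial / Fail outcomes for each environment.
tallies = {name: Counter(results) for name, results in TABLE_5.items()}

# Environments with no outright failure on any task.
no_failures = [name for name, tally in tallies.items() if tally["Fail"] == 0]

# Environments that passed task 3 (inserting a script), index 2 in each list.
scripting_pass = [name for name, results in TABLE_5.items() if results[2] == "Pass"]

for name, tally in tallies.items():
    print(f"{name}: {tally['Pass']} pass, {tally['Partial']} partial, {tally['Fail']} fail")
print("No outright failures:", no_failures)
print("Passed the scripting task:", scripting_pass)
```

On these counts EvolGrafix is the only environment with no outright failure and the only one to pass the scripting task.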

4.9 Evaluation of SVG plugins

For the evaluation of SVG plugins, tests were developed based on the iterative development of 'Bullseye'. That is, the plugins attempted to render increasingly complex SVG applications. The tests and results are shown in table 6, with more detailed comments on the results in appendix 16.

Task list:

1. Display an image of 3 circles making up a target.
2. Display the target image and an embedded bitmap as a background.
3. Display an image of 3 circles traversing the screen indefinitely.
4. Demonstrate ability to open an Internet link by mouse clicking an object.
5. Display the traversing image and display text counting the traverses (JavaScript support).
6. Show the crosshair following the mouse, hiding the cursor (cursor hiding is achieved by swapping the standard pointer for a transparent image; this is in effect a specialised form of custom cursor).
7. Show the target traversing and changing its 'y' position with each traverse (dynamic manipulation of the DOM with JavaScript).
8. Open 'Bullseye' on the Internet.

4.9.1 Analysis

   Viewer    Task 1   Task 2   Task 3   Task 4   Task 5   Task 6    Task 7   Task 8
1  ASV6      Pass     Pass     Pass     Pass     Pass     Pass      Pass     Pass
2  ASV3      Pass     Pass     Pass     Pass     Pass     Partial   Pass     Pass
3  ASV2      Pass     Pass     Pass     Pass     Pass     Fail      Pass     Partial
4  Corel     Pass     Pass     Fail     Pass     Fail     Pass      Fail     Partial
5  Batik     Pass     Pass     Fail     Fail     Fail     Fail      Fail     Fail
6  Amaya     Pass     Pass     Fail     Pass     Fail     Fail      Fail     Fail
7  Mozilla   Pass     Fail     Fail     Fail     Fail     Fail      Fail     Fail

Table 6: Summary of SVG viewer evaluation.
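
Table 6 can also be read by task rather than by viewer, which shows how support falls away as the test applications become more complex. This is a minimal sketch using only the results in the table; the variable names are illustrative.

```python
# Results copied from Table 6, tasks 1 to 8 in order.
TABLE_6 = {
    "ASV6": ["Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass"],
    "ASV3": ["Pass", "Pass", "Pass", "Pass", "Pass", "Partial", "Pass", "Pass"],
    "ASV2": ["Pass", "Pass", "Pass", "Pass", "Pass", "Fail", "Pass", "Partial"],
    "Corel": ["Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Fail", "Partial"],
    "Batik": ["Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail"],
    "Amaya": ["Pass", "Pass", "Fail", "Pass", "Fail", "Fail", "Fail", "Fail"],
    "Mozilla": ["Pass", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail", "Fail"],
}

TASK_COUNT = 8

# Number of viewers recording a full pass on each task.
passes_per_task = [
    sum(1 for results in TABLE_6.values() if results[task] == "Pass")
    for task in range(TASK_COUNT)
]

# Viewers that fully passed task 8 (opening 'Bullseye' on the Internet).
playable = [name for name, results in TABLE_6.items() if results[-1] == "Pass"]

for task, count in enumerate(passes_per_task, start=1):
    print(f"Task {task}: {count} of {len(TABLE_6)} viewers pass")
print("Full pass on task 8:", playable)
```

All 7 viewers display the static target in task 1, but only 2 record a full pass on task 8.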

As Table 6 demonstrates, the Adobe ASV6 and ASV3 viewers were the only ones to render the application sufficiently to allow the game to be played. The other viewers fared less well. None of the viewers offered clear error messages with a choice of alternative action when rendering was incomplete. This is clearly unsatisfactory from a usability point of view and, as the Internet based evaluation demonstrated, the lack of clear error messages (such as when using ASV2) does not encourage the user to download a suitable viewer and thereby promulgate SVG. The most detailed messages were offered by the Amaya browser, which indicated that the use of Adobe namespaces was unsatisfactory. When developing the mini applications for this part of the evaluation, care was taken to exclude reference to Adobe namespaces unless they were critical to the functionality of the application. However, in order to include all the modelled functionality in the final application it was necessary to use some Adobe specific features, such as the ability to play sound. It can be argued that the inability of non-Adobe viewers to render Adobe specific features is not significant, as the Internet based evaluation demonstrated that users are only using Adobe viewers.
This situation has come about, in part, due to Adobe's strong presence in the software market and their introduction of features that require an Adobe viewer. This has given their viewers a competitive edge, even in a zero price market, as some developers have optimised their applications to utilise the Adobe features. This, it can be argued, demonstrates that users in the wider Internet community prefer a common application for a purpose, indeed one that appears proprietary even if it is in fact an open standards application. Such proprietary behaviour is perceived by users to give the consistency that the W3C open standard SVG is intended to achieve, but does not.
