-
Metrics must be simple, objective, easy to collect, easy to interpret, and hard to misinterpret.
-
Metrics collection must be automated and non-intrusive; that is, it must not interfere with the activities of the
developers.
-
Metrics must contribute to quality assessment early in the lifecycle, when efforts to improve software quality are
effective.
-
Metric absolute values and trends must be actively used by management personnel and engineering personnel for
communicating progress and quality in a consistent format.
-
The selection of a minimal or more extensive set of metrics will depend on the project's characteristics and
context: if the project is large or has stringent safety or reliability requirements, and the development and assessment
teams are knowledgeable about metrics, then it may be useful to collect and analyze the technical metrics. The
contract may require certain metrics to be collected, or the organization may be trying to improve its skills and
processes in particular areas. There is no simple answer to fit all circumstances; the Project Manager must select
what is appropriate when the Measurement Plan is produced. When introducing a metrics program for the first time,
though, it is sensible to err on the side of simplicity.
Metrics for certain aspects of the project include:
-
Progress in terms of size and complexity.
-
Stability in terms of rate of change in the requirements or implementation, size, or complexity.
-
Modularity in terms of the scope of change.
-
Quality in terms of the number and type of errors.
-
Maturity in terms of the frequency of errors.
-
Resources in terms of project expenditure versus planned expenditure.
Trends are important: they are generally more informative to monitor than any single absolute value at a point in time.
Metric | Purpose | Sample measures/perspectives
Progress | Iteration planning; completeness | Number of classes; SLOC; function points; scenarios; test cases (these may also be collected by class and by package); amount of rework per iteration (number of classes)
Stability | Convergence | Number and type of changes (bug versus enhancement; interface versus implementation), which may also be collected by iteration and by package; amount of rework per iteration
Adaptability | Convergence; software "rework" | Average person-hours/change (may also be collected by iteration and by package)
Modularity | Convergence; software "scrap" | Number of classes/categories modified per change (may also be collected by iteration)
Quality | Iteration planning; rework indicator; release criterion | Number of errors; defect discovery rate; defect density; depth of inheritance; class coupling; size of interface (number of operations); number of methods overridden; method size (these may also be collected by class and by package)
Maturity | Test coverage/adequacy; robustness for use | Test hours/failure and type of failure (may also be collected by iteration and by package)
Expenditure profile | Financial insight; planned versus actual | Person-days/class; full-time staff per month; % budget expended
Even the smallest projects will want to track progress to determine if the project is on schedule and on budget, and if
not, to re-estimate and determine if scope changes are needed. This minimal metrics set will therefore focus on
progress metrics.
-
Earned Value. This is used to re-estimate the schedule and budget for the remainder of the project, and/or to
identify need for scope changes.
-
Defect Trends. This is used to help project the effort required to work off defects.
-
Test Progress Trend. This is used to determine how much functionality is actually complete.
These are described in more detail below.
Earned Value
The most commonly used method of measuring
progress is Earned Value Analysis ([PMI96]).
The simplest way to measure earned value is to take the sum of the original estimated effort for all completed tasks. A
"percent complete" for the project can be computed as the earned value divided by the total original estimated effort
for the project. Productivity (or Performance Index) is the earned value divided by the actual effort spent on the
completed tasks.
For example, suppose the coding effort has been divided into several tasks, many of which are now complete. The
original estimate for the completed tasks was 30 effort days. The total estimated effort for the project was 100 days,
so we can project that the project is roughly 30% complete.
Suppose the tasks were completed under budget - requiring only 25 days to complete. The Performance Index is 30 / 25 =
1.2 or 120%.
We can project that the project will complete roughly 17% under budget (100 / 1.2 ≈ 83 effort days), and reduce our
estimates accordingly.
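To make the arithmetic concrete, here is a minimal sketch in Python (the task list, field names, and figures are invented for illustration):

```python
# Minimal sketch of the earned-value arithmetic described above.
# Task data and field names are hypothetical.

tasks = [
    {"name": "code module A", "estimate_days": 10, "actual_days": 8,  "done": True},
    {"name": "code module B", "estimate_days": 20, "actual_days": 17, "done": True},
    {"name": "code module C", "estimate_days": 15, "actual_days": 5,  "done": False},
]
total_estimate = 100  # original estimated effort for the whole project, in days

earned_value = sum(t["estimate_days"] for t in tasks if t["done"])   # 30
actual_effort = sum(t["actual_days"] for t in tasks if t["done"])    # 25

percent_complete = earned_value / total_estimate                     # 0.30
performance_index = earned_value / actual_effort                     # 1.2

# Projected effort at completion, assuming the same performance continues.
projected_total = total_estimate / performance_index                 # ~83.3 days

print(f"{percent_complete:.0%} complete, performance index {performance_index:.2f}, "
      f"projected total effort {projected_total:.1f} days")
```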
Considerations
-
The Performance Index must only be used to adjust estimates for similar tasks. Early completion of requirements
gathering tasks does not suggest that coding will complete more quickly. So, compute the Performance Index only for
similar kinds of tasks, and adjust estimates only for similar tasks.
-
Consider other factors. Will future tasks be performed by similarly skilled staff under similar conditions? Has the
data been contaminated by "outliers" - tasks which were severely over-estimated or under-estimated? Is time being
reported consistently (for example, overtime should be included even if not paid)?
-
Are estimates for newer tasks already accounting for the Performance Index? If so, then estimates for new tasks
will tend to be closer to the target, pushing the performance index closer to 100%. You should either consistently
re-estimate all incomplete tasks, or adopt the following practice from Extreme Programming (XP)[JEF01] - refer to the original estimates as "points", and measure new tasks in
terms of these same "points" without adjusting for actual performance . The advantage of "points" is that increases
(or decreases) in performance can be tracked ("project velocity" in XP terminology).
If tasks are large (more than 5 days), or there are a lot of tasks which are partially complete, you may wish to factor
them into your analysis. Apply a subjective "percent completion", multiply this by the task's effort estimate, and
include this in the earned value. Greater consistency in results is obtained if there are clear rules for assigning the
percent complete. For example, one rule could be that a coding task is assigned no more than 80% complete until the
code has passed a code review.
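Partially complete tasks and percent-complete rules such as the code-review cap can be folded in along these lines (a sketch; the task records and the 80% rule are illustrative):

```python
# Illustrative sketch: earned value including partially complete tasks.
# The 80% cap for code that has not passed review follows the example rule above.

def earned_value_with_partials(tasks):
    ev = 0.0
    for t in tasks:
        pct = t["percent_complete"]
        # Example rule: a coding task counts as at most 80% complete
        # until its code has passed a code review.
        if t.get("kind") == "coding" and not t.get("code_reviewed", False):
            pct = min(pct, 0.80)
        ev += pct * t["estimate_days"]
    return ev

tasks = [
    {"kind": "coding", "estimate_days": 10, "percent_complete": 1.0, "code_reviewed": False},
    {"kind": "coding", "estimate_days": 20, "percent_complete": 0.5, "code_reviewed": False},
    {"kind": "test",   "estimate_days": 5,  "percent_complete": 0.4},
]
print(earned_value_with_partials(tasks))  # 10*0.8 + 20*0.5 + 5*0.4 = 20.0
```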
Earned value is discussed further under A Complete Metrics Set: Project Plan, below.
Defect Trend
It is often useful to track the trend of open and closed defects. This provides a rough indication of whether there
is a significant backlog of defect-fixing work to be completed and how quickly defects are being closed.
Defect trends are just one of the metrics provided by Rational ProjectConsole.
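As an illustration of how such a trend can be tallied from a change-request log (the record format and figures are hypothetical):

```python
# Hypothetical sketch: weekly trend of opened versus closed defects,
# and the resulting open backlog.
from collections import Counter

# Each record: (week_opened, week_closed or None if still open)
defects = [(1, 2), (1, None), (2, 3), (2, 2), (3, None), (3, 4), (4, None)]

opened = Counter(week for week, _ in defects)
closed = Counter(week for _, week in defects if week is not None)

backlog = 0
for week in range(1, max(opened) + 1):
    backlog += opened[week] - closed[week]
    print(f"week {week}: opened {opened[week]}, closed {closed[week]}, open backlog {backlog}")
```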
Considerations
-
Not all change requests carry equal weight: one may affect a single line of code, while another causes a major
re-design. This can be addressed by some of the following techniques:
-
Be aware of outliers. Change Requests which require substantial work should be identified as such and be
tracked as separate tasks, not bundled into a bucket of general bug fixing. If lots of tiny fixes are
dominating the trend, then consider grouping them so that each Change Request represents a more consistent
unit of work.
-
You can record more information, such as a subjective "effort category": "less than 1 day", "1 day", "less
than 5 days", or "more than 5 days".
-
You can record estimated SLOCs and actual SLOCs for each Change Request. See A Small Set of Metrics below.
-
A lack of defects being recorded may imply a lack of testing. Be aware of the level of test effort occurring when
examining defect trends.
Test Progress Trend
The ultimate measure of completeness is how much functionality has been integrated.
If each of your development tasks represents a set of integrated functionality, then an earned value trend chart may
be sufficient.
A very simple way to communicate progress is with a Test Progress Trend.
Considerations
Some test cases may represent significantly more value or effort than others. Don't read too much into such a trend
chart - it just provides some assurance that there is progress towards completed functionality.
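A sketch of the underlying calculation, assuming the trend is simply the fraction of planned test cases passing per iteration (numbers invented):

```python
# Illustrative sketch: fraction of planned test cases passing per iteration.
planned_test_cases = 120
passing_by_iteration = {1: 10, 2: 35, 3: 70, 4: 105}

for iteration, passing in passing_by_iteration.items():
    print(f"iteration {iteration}: {passing}/{planned_test_cases} "
          f"({passing / planned_test_cases:.0%}) test cases passing")
```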
The minimal set of metrics described previously is not enough for many projects.
Software Project Management: A Unified Framework [ROY98] recommends
the following set of metrics for all projects. Note that these metrics require Source Lines of Code (SLOC) estimates
and actuals for each change request, which requires some additional effort to gather.
Metrics and Primitives
Total SLOC | SLOCt = Estimated total size of the code. This may change significantly as requirements are better understood and as design solutions mature. Include reused software which is subject to change by the team.
SLOC under configuration control | SLOCc = Current baseline
Critical defects | SCO0 = number of type 0 SCO (where SCO is a Software Change Order - another term for Change Request)
Normal defects | SCO1 = number of type 1 SCO
Improvement requests | SCO2 = number of type 2 SCO
New features | SCO3 = number of type 3 SCO
Number of SCO | N = SCO0 + SCO1 + SCO2
Open rework (breakage) | B = cumulative broken SLOC due to SCO1 and SCO2
Closed rework (fixes) | F = cumulative fixed SLOC
Rework effort | E = cumulative effort expended fixing type 0/1/2 SCO
Usage time | UT = hours that a given baseline has been operating under realistic usage scenarios
Quality Metrics for the End-Product
From this small set of metrics, some more interesting metrics can be derived:
Scrap ratio | B/SLOCt, percentage of product scrapped
Rework ratio | E/Total effort, percentage of rework effort
Modularity | B/N, average breakage per SCO
Adaptability | E/N, average effort per SCO
Maturity | UT/(SCO0 + SCO1), mean time between defects
Maintainability | (scrap ratio)/(rework ratio), maintenance productivity
In-progress Indicators
Rework stability | B - F, breakage versus fixes over time
Rework backlog | (B - F)/SLOCc, currently open rework
Modularity trend | Modularity, over time
Adaptability trend | Adaptability, over time
Maturity trend | Maturity, over time
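To make the derivations concrete, here is a sketch that computes the end-product and in-progress indicators from the primitive measures above (all input values are invented, and "total effort" is assumed to be cumulative project effort to date):

```python
# Sketch: deriving the quality metrics above from the primitive measures.
# The input values are made up for illustration.

SLOCt = 100_000   # estimated total size of the code
SLOCc = 60_000    # SLOC under configuration control (current baseline)
SCO0, SCO1, SCO2 = 5, 120, 40    # critical defects, normal defects, improvement requests
N = SCO0 + SCO1 + SCO2           # number of SCOs
B = 8_000         # open rework: cumulative broken SLOC
F = 6_500         # closed rework: cumulative fixed SLOC
E = 450           # rework effort (staff-days) fixing type 0/1/2 SCOs
total_effort = 3_000   # total project effort to date (staff-days)
UT = 900          # usage time (hours) of the current baseline

scrap_ratio = B / SLOCt                 # percentage of product scrapped
rework_ratio = E / total_effort         # percentage of rework effort
modularity = B / N                      # average breakage per SCO
adaptability = E / N                    # average effort per SCO
maturity = UT / (SCO0 + SCO1)           # mean time between defects
maintainability = scrap_ratio / rework_ratio

rework_stability = B - F                # breakage versus fixes
rework_backlog = (B - F) / SLOCc        # currently open rework

print(f"scrap {scrap_ratio:.1%}, rework {rework_ratio:.1%}, "
      f"modularity {modularity:.0f} SLOC/SCO, adaptability {adaptability:.1f} days/SCO, "
      f"maturity {maturity:.1f} h, maintainability {maintainability:.2f}")
print(f"rework stability {rework_stability} SLOC, backlog {rework_backlog:.1%}")
```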
The things to be measured are:
-
the Process - the sequence of tasks invoked to produce the software product (and other work products);
-
the Product - the work products of the process, including software, documents and models;
-
the Project - the totality of project resources, tasks and work products;
-
the Resources - the people, methods and tools, time, effort and budget, available to the project.
To completely characterize the process, measurements should be made at the lowest level of formally planned task. Tasks
will be planned by the Project Manager using an initial set of estimates. A record should then be kept of actual values
over time and any updated estimates that are made.
Metrics
|
Comments
|
Duration
|
Elapsed time for the task
|
Effort
|
Staff effort units (staff-hours, staff-days, ...)
|
Output
|
Work Products and their size and quantity (note this will include defects as an output of test
activities)
|
Software development environment usage
|
CPU, storage, software tools, equipment (workstations, PCs), disposables. Note that these may be
collected for a project by the Software Engineering Environment Authority (SEEA).
|
Defects, discovery rate, correction rate.
|
Total repair time/effort and total scrap/rework (where these can be measured) also need to be
collected; this information will probably come from data recorded against the defects (considered as work
products).
|
Change requests, imposition rate, disposal rate.
|
Comments as above on time/effort.
|
Other incidents that may have a bearing on these metrics (freeform text)
|
This is a metric in that it is a record of an event that affected the process.
|
Staff numbers, profile (over time) and characteristics
|
|
Staff turnover
|
A useful metric which may explain at a post-mortem review why a process went particularly well, or
badly.
|
Effort application
|
The way effort is spent during the performance of the planned activities (against which time is
formally recorded for cost account management) may help explain variations in productivity: some
subclasses of effort application are, for example:
-
training
-
familiarization
-
management (by team lead, for example)
-
administration
-
research
-
productive work - it's helpful to record this by work product, and attempt a separation of
'think' time and capture time, particularly for documents. This will tell the Project Manager
how much of an imposition the documentation process is on the engineer's time.
-
lost time
-
meetings
-
inspections, walkthroughs, reviews - preparation and meeting effort (some of
these will be separate activities and time and effort for them will be recorded against a
specific review task)
|
Inspections, walkthroughs, reviews (during a task - not separately scheduled reviews)
|
Record the numbers of these and their duration, and the numbers of issues raised.
|
Process deviations (raised as non-compliances, requiring project change)
|
Record the numbers of these and their severity. This is an indicator that more education may be
required, that the process is being misapplied, or that the way the process was configured was
incorrect.
|
Process problems (raised as process defects, requiring process change)
|
Record the number of these and their severity. This will be useful information at the post-mortem
reviews and is essential feedback for the Software Engineering Process Authority (SEPA).
|
The products in the Rational Unified Process (RUP) are the Work Products, which are documents, models, or model elements. The models are collections of like things (the model
elements), so the recommended metrics are listed here with the models to which they apply: it is usually obvious whether a
metric applies to the model as a whole or to an element. Explanatory text is provided where this is not clear.
Work Product Characteristics
In general, the characteristics we are interested in measuring are the following:
-
Size - a measure of the number of things in a model, the length of something, the extent or mass of
something
-
Quality
-
Defects - indications that a work product does not perform as specified or is not compliant with
its specification, or has other undesirable characteristics
-
Complexity - a measure of the intricacy of a structure or algorithm: the greater the complexity,
the more difficult a structure is to understand and modify, and there is evidence that complex
structures are more likely to fail
-
Coupling - a measure of how extensively elements of a system are interconnected
-
Cohesion - a measure of how well an element or component meets the requirement of having a
single, well-defined, purpose
-
Primitiveness - the degree to which operations or methods of a class can be composed from others
offered by the class
-
Completeness - a measure of the extent to which a work product meets all requirements (stated and
implied - the Project Manager should strive to make as much as possible explicit, to limit the risk of
unfulfilled expectations). We have not chosen here to distinguish between sufficient and
complete.
-
Traceability - an indication that the requirements at one level are being satisfied by work products at
a lower level, and, looking the other way, that a work product at any level has a reason to exist
-
Volatility - the degree of change or inconclusiveness in a work product because of defects or changing
requirements
-
Effort - a measure of the work (staff-time units) that is required to produce a work product
Not all of these characteristics apply to all work products: the relevant ones are elaborated with the particular work
product in the following tables. Where several metrics are listed against a characteristic, all are potentially of
interest, because they give a complete description of the characteristic from several viewpoints. For example, when
considering the traceability of Use Cases, ultimately all have to be traceable to a (tested) implementation model: in
the interim, it will still be of interest to the Project Manager to know how many Use Cases can be traced to the
Analysis Model, as a measure of progress.
Documents
The recommended metrics apply to all the RUP documents.
Characteristic | Metrics
Size | Page count
Effort | Staff-time units for production, change and repair
Volatility | Numbers of changes and defects, opened and closed; changed pages
Quality | Measured directly through defect count
Completeness | Not measured directly: judgment made through review
Traceability | Not measured directly: judgment made through review
Models
Requirements
Requirements Attributes
This is actually a model element.
Characteristic
|
Metrics
|
Size
|
-
number of requirements in total (= Nu+Nd+Ni+Nt)
-
number to be traced to use cases ( = Nu)
-
number to be traced to design, implementation, test only ( = Nd)
-
number to be traced to implementation, test only ( = Ni)
-
number to be traced to test only ( = Nt)
Note that this partitions requirements into those that will be modeled by Use Cases and those
that will not. The expectation is that Use-Case traceability will account for those
requirements assigned to Use Cases, to track design, implementation and test.
|
Effort
|
-
Staff-time units (production, change and repair)
|
Volatility
|
-
Number of defects and change requests
|
Quality
|
-
Number of defects, by severity
|
Traceability
|
|
Use-Case Model
Characteristic
|
Metrics
|
Size
|
-
Number of Use Cases
-
Number of Use Case Packages
-
Reported Level of Use Case (see white paper, "The Estimation of Effort and Size
based on Use Cases" from the IBM Web site)
-
Number of scenarios, total and per use case
-
Number of actors
-
Length of Use Case (pages of event flow, for example)
|
Effort
|
-
Staff-time units with production, change and repair
|
Volatility
|
-
Number of defects and change requests
|
Quality
|
-
Reported complexity (0-5, by analogy with COCOMO [BOE81], at class level; complexity range is narrower at
higher levels of abstraction - see white paper, "The Estimation of Effort and Size based on
Use Cases" from the IBM Web site)
-
Defects - number of defects, by severity, open, closed
|
Completeness
|
-
Use Cases completed (reviewed and under configuration management with no defects
outstanding)/use cases identified (or estimated number of use cases)
-
Requirements-to-UC Traceability (from
Requirements Attributes)
|
Traceability
|
-
Analysis
-
Scenarios realized in analysis model/total scenarios
-
Design
-
Scenarios realized in design model/total scenarios
-
Implementation
-
Scenarios realized in implementation model/total scenarios
-
Test
-
Scenarios realized in test model (test cases)/total scenarios
|
Design
Analysis Model
Characteristic
|
Metrics
|
Size
|
-
Number of classes
-
Number of subsystems
-
Number of subsystems of subsystems ...
-
Number of packages
-
Methods per class, internal, external
-
Attributes per class, internal, external
-
Depth of inheritance tree
-
Number of children
|
Effort
|
-
Staff-time units for production, change and repair
|
Volatility
|
-
Number of defects and change requests
|
Quality
|
Complexity
|
-
Response For a Class (RFC): this may be difficult to calculate because a complete set of
interaction diagrams is needed.
|
Coupling
|
-
Number of children
-
Coupling between objects (class fan-out)
|
Cohesion
|
|
Defects
|
-
Number of defects, by severity, open, closed
|
Completeness
|
-
Number of classes completed/number of classes estimated (identified)
-
Analysis traceability (in
Use-Case model)
|
Traceability
|
Not applicable - the analysis model becomes the design model.
|
Here we see some OO-specific technical metrics that may be unfamiliar - depth of inheritance tree, number of
children, response for a class, coupling between objects, and so on. See [HEND96] for details
of the meaning and history of these metrics. Several of these metrics were originally suggested by Chidamber and
Kemerer (see "A metrics suite for object oriented design", IEEE Transactions on Software Engineering, 20(6), 1994), but
we have applied them here as suggested in [HEND96] and have
preferred the definition of LCOM (lack of cohesion in methods) presented in that work.
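As an illustration of two of the structural metrics, here is a small sketch computing depth of inheritance tree (DIT) and number of children (NOC) from a simple parent map (the class names are invented; the definitions follow the common usage described in [HEND96]):

```python
# Sketch: depth of inheritance tree (DIT) and number of children (NOC)
# computed from a simple {class: parent} map. Class names are invented.

parents = {
    "Account": None,
    "SavingsAccount": "Account",
    "CheckingAccount": "Account",
    "StudentChecking": "CheckingAccount",
}

def depth_of_inheritance(cls):
    """Number of ancestors between cls and the root of its hierarchy."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth

def number_of_children(cls):
    """Count of classes that inherit directly from cls."""
    return sum(1 for parent in parents.values() if parent == cls)

for cls in parents:
    print(cls, "DIT =", depth_of_inheritance(cls), "NOC =", number_of_children(cls))
```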
Design Model
Characteristic
|
Metrics
|
Size
|
-
Number of classes
-
Number of design subsystems
-
Number of subsystems of subsystems ...
-
Number of packages
-
Methods per class, internal, external
-
Attributes per class, internal, external
-
Depth of inheritance tree
-
Number of children
|
Effort
|
-
Staff-time units (for production, change and repair)
|
Volatility
|
-
Number of defects and change requests
|
Quality
|
Complexity
|
-
Response For a Class (RFC): this may be difficult to calculate because a complete set of
interaction diagrams is needed.
|
Coupling
|
-
Number of children
-
Coupling between objects (class fan-out)
|
Cohesion
|
|
Defects
|
-
Number of defects, by severity
|
Completeness
|
|
Traceability
|
Number of classes in the Implementation Model/number of classes in the Design Model
|
Implementation
Implementation Model
Characteristic
|
Metrics
|
Size
|
-
Number of classes
-
Number of files
-
Number of implementation subsystems
-
Number of subsystems of subsystems ...
-
Number of packages
-
Methods per class, internal, external
-
Attributes per class, internal, external
-
Size of methods*
-
Size of attributes*
-
Depth of inheritance tree
-
Number of children
-
Estimated size* at completion
|
Effort
|
-
Staff-time units (with production, change and repair separated)
|
Volatility
|
-
Number of defects and change requests
-
Breakage* for each corrective or perfective change, estimated (prior to fix) and actual
(upon closure)
|
Quality
|
Complexity
|
-
Response For a Class (RFC)
-
Cyclomatic complexity of methods**
|
Coupling
|
-
Number of children
-
Coupling between objects (class fan-out)
-
Message passing coupling (MPC)***
|
Cohesion
|
-
Number of children
-
Lack of cohesion in methods (LCOM)
|
Defects
|
-
Number of defects, by severity, open, closed
|
Completeness
|
|
* Some method of measuring code size should be chosen and then consistently applied, for example, non-comment,
non-blank lines. See [ROY98] for a discussion of the merits and application of 'lines of code' as a metric.
Also see the same reference for the definition of 'breakage'.
** The use of cyclomatic complexity is not universally accepted - particularly when applied to the methods of a class.
See [HEND96] for a discussion of this metric.
*** Originally from Li and Henry, "Object-oriented metrics that predict maintainability", J. Systems and Software,
23(2), 1993, and also described in [HEND96].
Test
Test Model
Characteristic
|
Metrics
|
Size
|
-
Number of Test Cases, Test Procedures, Test Scripts
|
Effort
|
-
Staff-time units (with production, change and repair separated) for production of test
cases, and so on
|
Volatility
|
-
Number of defects and change requests filed against the test model
|
Quality
|
-
Defects - number of defects by severity, open, closed (these are defects raised
against the test model itself, not defects raised by the test team against other software)
|
Completeness
|
|
Traceability
|
-
Number of Test Cases reported as successful in Test Evaluation Summary/Number of test cases
|
Management
Change Model - this is a notional model, used here for consistent presentation; the metrics will be collected from
whatever system is used to manage Change Requests.
Characteristic
|
Metrics
|
Size
|
-
Number of defects, change requests by severity and status, also categorized as number of
perfective changes, number of adaptive changes and number of corrective changes.
|
Effort
|
-
Defect repair effort, change implementation effort in staff-time units
|
Volatility
|
-
Breakage (estimated, actual) for the implementation model subset.
|
Completeness
|
-
Number of defects discovered/number of defects predicted (if a reliability model is used)
|
Project Plan (section 4.2 of the Software Development Plan)
These are measures that come from the application of Earned Value Techniques to project management; together they
are called Cost/Schedule Control Systems Criteria (C/SCSC). A simple earned value technique is described above as
part of A Minimal Set of Metrics. More detailed analyses can be
performed using related metrics, including:
-
BCWS, Budgeted Cost for Work Scheduled
-
BCWP, Budgeted Cost for Work Performed
-
ACWP, Actual Cost of Work Performed
-
BAC, Budget at Completion
-
EAC, Estimate at Completion
-
CBB, Contract Budget Base
-
LRE, Latest Revised Estimate (EAC)
and derived factors for cost and schedule variance (sketched below). See [ROY98] for a
discussion of the application of an earned value approach to software project management.
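These derived cost and schedule factors follow the commonly used earned-value formulas; a brief sketch (the input figures are arbitrary, and all values are in the same cost units):

```python
# Sketch of the standard earned-value variance and index calculations.
BCWS = 50.0   # Budgeted Cost for Work Scheduled
BCWP = 45.0   # Budgeted Cost for Work Performed (earned value)
ACWP = 48.0   # Actual Cost of Work Performed
BAC = 100.0   # Budget at Completion

cost_variance = BCWP - ACWP            # negative means over cost
schedule_variance = BCWP - BCWS        # negative means behind schedule
CPI = BCWP / ACWP                      # cost performance index
SPI = BCWP / BCWS                      # schedule performance index
EAC = BAC / CPI                        # a common Estimate at Completion projection

print(f"CV={cost_variance:+.1f}, SV={schedule_variance:+.1f}, "
      f"CPI={CPI:.2f}, SPI={SPI:.2f}, EAC={EAC:.1f}")
```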
The project needs to be characterized in terms of type, size, complexity and formality (although type, size and
complexity usually determine formality), because these aspects will condition expectations about various thresholds for
lower level measures. Other constraints should be captured in the contract (or specifications). Metrics derived from
the process, product and resources will capture all other project level metrics. Project type and domain can be
recorded using a text description, making sure that there is enough detail to accurately characterize the project.
Record the project size by cost, effort, duration, size of code to be developed, function points to be delivered. The
project's complexity can be described - somewhat subjectively - by placing the project on a chart showing
technical and management complexity relative to other completed projects. [ROY98], Figure 14-1
shows such a diagram.
The derived metrics described in [ROY98], which are the
Project Manager's main indicators, can be obtained from the metrics gathered for product and process. These are:
-
Modularity = average breakage (NCNB*) per perfective or corrective change on implementation model
-
Adaptability = average effort per perfective or corrective change on implementation model
-
Maturity = active test time/number of corrective changes
-
Maintainability = Maintenance Productivity/Development Productivity = [actual cumulative fixes/cumulative
effort for perfective and corrective changes]/[estimated number of NCNB at completion/estimated production effort
at completion]
-
Rework stability = cumulative breakage - cumulative fixes
-
Rework backlog = [cumulative breakage - cumulative fixes]/NCNB unit tested
* NCNB is non-comment, non-blank code size.
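Restating the maintainability definition in terms of the primitives introduced earlier (assuming F and E are the cumulative fixes and rework effort measured on the implementation model, in NCNB terms):

$$\text{Maintainability} \;=\; \frac{\text{Maintenance Productivity}}{\text{Development Productivity}} \;=\; \frac{F / E}{\widehat{SLOC}_t \,/\, \widehat{E}_{\text{prod}}}$$

where $\widehat{SLOC}_t$ is the estimated NCNB size at completion and $\widehat{E}_{\text{prod}}$ is the estimated production effort at completion.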
Progress should be reported from the project plan using work product completion metrics - with particular weight (from
an earned value perspective) being given to the production of working software.
If an estimation model such as COCOMO (see [BOE81]) is used, the
various scale factors and cost drivers should be recorded. These actually form a quite detailed characterization of the
project.
The items to be measured include people (experience, skills, cost, performance), methods and tools (in terms of effect
on productivity, quality, and cost), time, effort, and budget (resources consumed, resources remaining).
The staffing profile should be recorded over time, showing type (analyst, designer, and so on), grade (which implies
cost), and the team to which each person is allocated. Both actuals and the plan should be recorded.
Again, the COCOMO model requires the characterization of personnel experience and capability and of the software
development environment, and is a good framework in which to keep these metrics.
Expenditure, budget, and schedule information will come from the Project Plan.