Concept: Metrics
Metrics are numbers used as a standard measurement of quality for comparing different items or time periods.
Main Description

Why do we Measure?

We measure primarily to gain control of a project, and therefore to be able to manage it. We measure to evaluate how close to or far from the objectives set in our plan we are, in terms of completion, quality, compliance with requirements, and so on.

We also measure to be able to better estimate effort, cost and quality for new projects, based on past experience. Finally, we measure to evaluate how we improve on some key aspects of process performance over time, and to see the effects of changes.

Measuring some key aspects of a project adds a non-negligible cost. So we do not measure just anything because we can. We must set very precise goals for this effort, and only collect metrics that will allow us to satisfy these goals.

There are two kinds of goals:

  1. Knowledge goals: they are expressed by the use of verbs like evaluate, predict, monitor. You want to better understand your development process. For example, you may want to assess product quality, obtain data to predict testing effort, monitor test coverage, or track requirements changes.
  2. Change or achievement goals: these are expressed by the use of verbs such as increase, reduce, improve, or achieve. You are usually interested in seeing how things change or improve over time, from one iteration to another, or from one project to another.

Examples:

  • Monitor the progress relative to the plan
  • Improve customer satisfaction
  • Improve productivity
  • Improve predictability
  • Increase reuse

These general management goals do not translate readily into metrics. We have to translate them into some smaller subgoals (or action goals) which identify the actions project members have to take to achieve the goal. And we have to make sure that the people involved understand the benefits.

Examples

The goal to "improve customer satisfaction" would decompose into:

  • Define customer satisfaction
  • Measure customer satisfaction, over several releases
  • Verify that satisfaction improves

The goal to "improve productivity" would decompose into:

  • Measure effort
  • Measure progress
  • Calculate productivity over several iterations or projects.
  • Compare the results
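
As a minimal sketch of the "improve productivity" decomposition above, the following Python computes productivity per iteration and compares successive results. The units (use cases completed, staff-days), the iteration names, the sample figures, and the names `IterationRecord`, `productivity`, and `compare` are all assumptions made for illustration; they are not prescribed by the process.

```python
from dataclasses import dataclass


@dataclass
class IterationRecord:
    """Hypothetical per-iteration measurements (names and units are assumptions)."""
    name: str
    effort_staff_days: float    # subgoal: measure effort
    use_cases_completed: int    # subgoal: measure progress


def productivity(record: IterationRecord) -> float:
    """Progress delivered per unit of effort for one iteration."""
    if record.effort_staff_days <= 0:
        raise ValueError("effort must be positive")
    return record.use_cases_completed / record.effort_staff_days


def compare(records: list[IterationRecord]) -> list[float]:
    """Change in productivity from each iteration to the next."""
    values = [productivity(r) for r in records]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Illustrative figures only.
iterations = [
    IterationRecord("E1", effort_staff_days=100, use_cases_completed=4),
    IterationRecord("C1", effort_staff_days=100, use_cases_completed=5),
    IterationRecord("C2", effort_staff_days=120, use_cases_completed=9),
]
deltas = compare(iterations)  # positive values indicate rising productivity
```

Only the first two subgoals need data to be collected; the last two are calculations over that data.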

Then some of the subgoals (but not all) would require some metrics to be collected.

Example

"Measure customer satisfaction" can be derived from:

  • Customer survey (where customer would give marks for different aspects)
  • Number and severity of calls to a customer support hotline.

For more information, consult [AMI95].

A useful way to categorize these goals is by organization, project and technical need. This gives a framework for the refinement discussed above.

Organizational Needs for Metrics

An organization needs to know, and perhaps improve, its cost per 'item' and shorten its build times (time-to-market), while delivering a product of known quality (objective and subjective) with appropriate maintenance demands. An organization may from time to time (or even continuously) need to improve its performance to remain competitive. To reduce its risks, an organization needs to know the skill level and experience level of its staff, and ensure it has the other resources and capability to compete in its chosen sphere. An organization must be able to introduce new technology and determine the cost-benefit of that technology. The following table lists some examples of the kinds of metrics that are relevant to these needs for a software development organization.

Concern

Metric

Item Cost
  • Cost per line of code, cost per function point, cost per use case. Normalized effort (across defined portion of life cycle, programming language, staff grade, etc.) per line of code, function point or use case. Note that these metrics are not usually simple numbers - they depend on the size of the system to be delivered and whether the schedule is compressed.
Construction Time
  • Elapsed time per line of code or per function point. Note that this will also depend on system size. The schedule can also be shortened by adding staff - but only up to a point. An organization's management ability will determine exactly where the limit is.
Defect Density in Delivered Product
  • Defects (discovered after delivery) per line of code or per function point.
Subjective Quality
  • Ease of use, ease of operation, customer acceptance. Although these are fuzzy, ways of attempting quantification have been devised.
Ease of Maintenance
  • Cost per line of code or function point per year.
Skills Profile, Experience Profile
  • The Human Resources group would presumably keep some kind of skills and experience database.
Technology Capability
  • Tools - an organization should know which are in general use, and the extent of expertise for those not regularly used.
  • Process Maturity - where does the organization rate on the SEI CMM scale, for example?
  • Domain Capability - in which application domains is the organization capable of performing?
Process Improvement Measures
  • Process execution time and effort.
  • Defect rates, causal analysis statistics, fix rates, scrap and rework.
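
Several of the organizational metrics listed above are simple ratios of a total to a size measure. The sketch below shows that normalization in Python. The function name `per_unit` and all figures are hypothetical, and it deliberately ignores the caveat noted above that such ratios vary with system size and schedule compression.

```python
def per_unit(total: float, size: float) -> float:
    """Normalize a total (cost, elapsed time, defects) by a size measure."""
    if size <= 0:
        raise ValueError("size must be positive")
    return total / size


# Illustrative figures only; not data from any real project.
function_points = 400
total_cost = 600_000.0          # currency units
post_delivery_defects = 30      # defects discovered after delivery

item_cost = per_unit(total_cost, function_points)                   # cost per function point
defect_density = per_unit(post_delivery_defects, function_points)   # defects per function point
```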

Project Needs for Metrics

A project must meet the following criteria before delivery:

  • required functional and non-functional capabilities
  • various technical constraints
  • budgetary and scheduling constraints
  • certain transition, operational and maintenance characteristics

The Project Manager must be able to see whether the project is tracking towards these goals. The following table expands them to give some idea of things to consider when thinking about project measurements:

Concern

Project Effort and Budget
  • How is project tracking on effort and cost against plan?
Project Schedule
  • Is the project meeting its milestones?
Transition/Installation
  • Are the predicted effort, cost and skills requirements acceptable?
Operation
  • Are the predicted effort and skills requirements supportable by the customer?
Maintenance/Supportability
  • Are the predicted effort and skills requirements acceptable to the customer?
Functional Requirements
  • Are the requirements valid, complete?
  • Are the requirements allocated to an iteration?
  • Are the requirements being realized according to plan?
Non-Functional Requirements
  • Performance
    • Is the system meeting requirements for responsiveness, throughput, recovery time?
  • Capacity
    • Can the system handle the required number of simultaneous users? Can the web site handle the required number of hits per second? Is there sufficient storage for the required number of customer records?
  • Quality Factors
    • Reliability: how often are system failures allowed, and what constitutes a system failure?
    • Usability: is the system easy and pleasant to use? How long does it take to learn how to use it and what skills are required?
    • Fault tolerance/robustness/resilience/survivability: can the system continue to function if failures occur? Can the system cope with bad input? Is the system capable of automatic recovery after failure?
  • Specialty Engineering Requirements
    • Safety: can the system perform without risk to life or property (tangible and intangible)?
    • Security/privacy: does the system protect sensitive data from unauthorized access? Is the system secure from malicious access?
    • Environmental impact: does the system meet environmental requirements?
  • Other Regulatory or Legal Requirements
  • Constraints
    • External environment: is the system capable of operation in the prescribed environment?
    • Resources, host, target: does the system meet its CPU, memory, language, and hardware/software environment constraints?
    • Use of commercial-off-the-shelf (COTS) or other existing software: is the system meeting its reuse constraints?
    • Staff availability and skills: can the system be built with the number and type of staff available?
    • Interface support/compatibility: can the system support required access to and from other systems?
    • Reusability: what provisions are made for the system to be reusable?
    • Imposed standards: are the system and the development method compliant?
    • Other design constraints (architectural, algorithmic, for example): is the system using the required architectural style? Are the prescribed algorithms being used?

This is an extensive, but not exhaustive, list of concerns for the Project Manager. Many will require the collection and analysis of metrics; some will also require the development of specific tests (to derive measurements) to answer the questions posed.

Technical Needs for Metrics

Many of the project needs will not have direct measures, and even for those that do, it may not be obvious what should be done or changed to improve them. Lower-level quality-carrying attributes can be used to build in quality against various higher-level quality attributes such as those identified in ISO Standard 9126 (Software Quality Characteristics and Metrics) and those mentioned above in Project Needs. These technical measures are of engineering (structural and behavioral) characteristics and effects (covering process and product) that contribute to project-level metrics needs. The attributes in the following table have been used to derive a sample set of metrics for the Rational Unified Process work products and process. This may be found in Guideline: Metrics.

Quality

Attributes

Goodness of Requirements
  • Volatility: frequency of change, rate of introduction of new requirements
  • Validity: are these the right requirements?
  • Completeness: are any requirements missing?
  • Correctness of expression: are the requirements properly stated?
  • Clarity: are the descriptions understandable and unambiguous?
Goodness of Design
  • Coupling: how extensive are the connections between system elements?
  • Cohesion: do the components each have a single, well-defined purpose?
  • Primitiveness: can the methods or operations of a class be constructed from other methods or operations of the class? If so, they are not primitive (primitiveness being a desirable characteristic).
  • Completeness: does the design completely realize the requirements?
  • Volatility: frequency of architectural change.
Goodness of Implementation
  • Size: how close is the implementation to the minimal size (to solve the problem)? Will the implementation meet its constraints?
  • Complexity: is the code algorithmically difficult or intricate? Is it difficult to understand and modify?
  • Completeness: does the implementation faithfully realize all of the design?
Goodness of Test
  • Coverage: how well does the test exercise the software? Are all instructions executed by a set of tests? Does the test exercise many paths through the code?
  • Validity: are the tests themselves a correct reflection of the requirements?
Goodness of Process (at lowest level)
  • Defect rate, defect cause: what is the incidence of defects in a task, and what are the causes?
  • Effort and duration: what duration and how much human effort does an activity require?
  • Productivity: per unit of human effort, what does an activity yield?
  • Goodness of work products: what is the level of defects in the outputs of a task?
Effectiveness of Process/Tool Change (same as Goodness of Process, but percentage changes rather than total values):
  • Defect rate, defect cause
  • Effort and duration
  • Productivity
  • Goodness of work products
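
Because the effectiveness of a process or tool change is expressed as a percentage change rather than a total value, the calculation can be sketched as follows. The function name `percentage_change` and the sample figures are assumptions for illustration.

```python
def percentage_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Hypothetical example: defects found per task before and after a tool change.
defect_rate_change = percentage_change(before=8.0, after=6.0)   # negative means fewer defects
```

Whether a negative or a positive change is the desired outcome depends on the attribute: a falling defect rate is an improvement, whereas falling productivity is not.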

For a deep treatment of metrics concepts, see [WHIT97].

What is a Metric?

We distinguish two kinds of metrics:

  • A metric is a measurable attribute of an entity. For example, project effort is a measure (that is, metric) of project size. To be able to calculate this metric you would need to sum all the time-sheet bookings for the project.
  • A primitive metric is a raw data item that is used to calculate a metric. In the above example the time-sheet bookings are the primitive metrics. A primitive metric is typically a metric that exists in a database but is not interpreted in isolation.

Each metric is made up of one or more primitive metrics. Consequently, each primitive metric has to be clearly identified and its collection procedure defined.
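
The project effort example can be sketched in Python: the time-sheet bookings are the primitive metrics, and the metric is calculated by summing them. The record layout (`TimesheetBooking` and its fields), the project names, and the hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimesheetBooking:
    """A primitive metric: one raw time-sheet booking (fields are assumptions)."""
    project: str
    person: str
    hours: float


def project_effort(bookings: list[TimesheetBooking], project: str) -> float:
    """The metric: total effort for a project, summed from its bookings."""
    return sum(b.hours for b in bookings if b.project == project)


# Illustrative data only.
bookings = [
    TimesheetBooking("billing", "ana", 6.0),
    TimesheetBooking("billing", "raj", 7.5),
    TimesheetBooking("portal", "ana", 2.0),
]
effort = project_effort(bookings, "billing")
```

A single booking says little in isolation; it becomes meaningful only once it is aggregated into the metric.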

Metrics to support change or achievement goals are often "first-derivative" over time (or iterations or projects): we are interested in a trend, not in the absolute value. To "improve quality", for example, we need to check that the residual level of known defects diminishes over time.
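
A discrete version of this "first derivative" is the difference between successive measurements. The sketch below assumes one defect count per iteration; the names `trend` and `is_improving` and the sample counts are hypothetical.

```python
def trend(values: list[float]) -> list[float]:
    """Differences between successive measurements (a discrete first derivative)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def is_improving(values: list[float]) -> bool:
    """True when the measured value diminishes at every step."""
    return len(values) > 1 and all(delta < 0 for delta in trend(values))


# Hypothetical count of known residual defects at the end of each iteration.
open_defects = [42, 35, 31, 24]
defect_trend = trend(open_defects)
quality_improving = is_improving(open_defects)
```

Requiring a decrease at every step is a strict reading of "diminishes over time"; a project might reasonably tolerate an occasional increase and look at a longer window instead.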

Templates

Template for a metric

Name: Name of the metric and any known synonyms.
Definition: The attributes of the entities that are measured using this metric, how the metric is calculated, and which primitive metrics it is calculated from.
Goals: List of goals and questions related to this metric. Also some explanation as to why the metric is being collected.
Analysis procedure: How the metric is intended to be used. Preconditions for the interpretation of the metric (e.g., valid range of other metrics). Target values or trends. Models of analysis techniques and tools to be used. Implicit assumptions (for example, of the environment or models). Calibration procedures. Storage.
Responsibilities: Who will collect and aggregate measurement data, prepare the reports and analyze the data.

Template for a primitive metric

Name: Name of the primitive metric.
Definition: Unambiguous description of the metric in terms of the project's environment.
Collection procedure: Description of the collection procedure. Data collection tool and form to be used. Points in the lifecycle when data are collected. Verification procedure to be used. Where will the data be stored, format, precision.
Responsibilities: Who is responsible for collecting the data. Who is responsible for verifying the data.
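
If the templates are kept in a tool rather than on paper, they could be represented as simple records. The following is one possible sketch, not a prescribed format: the class names, the field names (which mirror the template rows), the `is_complete` check, and the sample content are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveMetric:
    """Mirrors the rows of the primitive metric template."""
    name: str
    definition: str
    collection_procedure: str
    responsibilities: str


@dataclass
class Metric:
    """Mirrors the rows of the metric template."""
    name: str
    definition: str
    goals: list[str]
    analysis_procedure: str
    responsibilities: str
    primitives: list[PrimitiveMetric] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A metric should serve at least one goal and name its primitive metrics."""
        return bool(self.goals) and bool(self.primitives)


# Illustrative content only.
bookings = PrimitiveMetric(
    name="Time-sheet booking",
    definition="Hours booked by one person to one project on one day",
    collection_procedure="Weekly export from the time-sheet system",
    responsibilities="Team members book; project manager verifies",
)
effort = Metric(
    name="Project effort",
    definition="Sum of all time-sheet bookings for the project",
    goals=["Monitor the progress relative to the plan"],
    analysis_procedure="Compare cumulative effort against planned effort per iteration",
    responsibilities="Project manager",
    primitives=[bookings],
)
```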

Metrics Tasks

There are two tasks:

  • Define measurement plan
  • Collect measures

Define measurement plan is done once per development cycle - in the inception phase, as part of the general planning activity, or sometimes as part of the configuration of the process in the development case. The measurement plan may be revisited like any other section of the software development plan during the course of the project.

Collect measures is done repeatedly, at least once per iteration, and sometimes more often; for example, weekly in an iteration spanning many months.

The metrics collected are part of the Status Assessment document, to be exploited in assessing the progress and health of the project. They may also be accumulated for later use in project estimations and trends over the organization.

How are the Metrics Used?

Estimation

The project manager in particular is faced with having to plan - assign resources to tasks with budgets and schedules. Either effort and schedule are estimated from a judgment of what is to be produced, or the inverse - there are fixed resources and schedule and an estimate of what can be produced is needed. Estimation typically has to do with the calculation of resource needs based on other factors - typically size and productivity - for planning purposes.
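
In its simplest form, the two directions of estimation described above reduce to one relationship between size, productivity, and effort. The sketch below assumes a constant productivity figure taken from past projects, which is a strong simplification; the function names, units, and numbers are hypothetical.

```python
def estimate_effort(size: float, productivity: float) -> float:
    """Effort needed to produce `size` units at `productivity` units per staff-day."""
    if productivity <= 0:
        raise ValueError("productivity must be positive")
    return size / productivity


def estimate_scope(effort: float, productivity: float) -> float:
    """The inverse: what can be produced with a fixed amount of effort."""
    if productivity <= 0:
        raise ValueError("productivity must be positive")
    return effort * productivity


# Illustrative figures: function points and function points per staff-day.
effort_needed = estimate_effort(size=400, productivity=0.5)         # staff-days
scope_possible = estimate_scope(effort=500, productivity=0.5)       # function points
```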

Prediction

Prediction is only slightly different from estimation, and is usually about the calculation of the future value of some factor based on today's value of that factor, and other influencing factors. For example, given a sample of performance data, it is useful to know (predict) from it how the system will perform under full load, or in a resource constrained or degraded configuration. Reliability prediction models use defect rate data to predict when the system will reach certain reliability levels. Having planned an activity, the project manager will need data on which to predict completion dates and effort at completion.
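
As a minimal sketch of predicting effort at completion, the following assumes that the remaining work will cost the same effort per unit as the work completed so far. This proportional extrapolation is only one simple technique and is not the reliability prediction modeling mentioned above; the function name and figures are hypothetical.

```python
def predict_effort_at_completion(
    effort_spent: float, units_completed: float, units_planned: float
) -> float:
    """Extrapolate total effort, assuming effort per unit stays constant."""
    if units_completed <= 0:
        raise ValueError("no completed work to extrapolate from")
    return effort_spent * units_planned / units_completed


# Illustrative figures: 30 of 40 planned units done after 300 staff-days.
predicted_total = predict_effort_at_completion(
    effort_spent=300, units_completed=30, units_planned=40
)
predicted_remaining = predicted_total - 300
```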

Assessment

Assessment is used to establish the current position, whether for comparison with a threshold, for identification of trends, for comparison between alternatives, or as the basis for estimation or prediction.
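
Two of these uses, comparison with a threshold and comparison between alternatives, can be sketched as follows. The function names, the choice of test coverage as the measured value, and the figures are assumptions for illustration.

```python
def meets_threshold(value: float, threshold: float, higher_is_better: bool = True) -> bool:
    """Compare the current position with a threshold."""
    return value >= threshold if higher_is_better else value <= threshold


def best_alternative(scores: dict[str, float], higher_is_better: bool = True) -> str:
    """Pick the alternative with the best measured value."""
    if not scores:
        raise ValueError("no alternatives to compare")
    chooser = max if higher_is_better else min
    return chooser(scores, key=scores.get)


# Hypothetical measurements.
coverage_ok = meets_threshold(value=0.82, threshold=0.80)
fastest = best_alternative({"design_a": 120.0, "design_b": 95.0}, higher_is_better=False)
```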

For more on metrics in project management, read [ROY98].