Managing the complexity of AV testing

The most challenging barrier to the deployment of automated or autonomous vehicles (AVs) is assuring that, in the real world, they behave properly and safely under “all” possible conditions. The developers of this safety-critical technology need a robust safety case and a comprehensive Verification and Validation (V&V) process to assure the system's safety.

For a typical safety-critical system in the automotive industry, V&V activities cost about 50% of the entire development budget. Because AV technologies are much more complex and their potential for harm is much greater, their V&V cost is expected to be well over half of the total. CavPoint is developing software tools to reduce the cost of V&V for AV technologies.

Approach to AV testing and development

AV development processes will adopt a scenario-based approach, in which the driving scenarios that the system must handle are identified and documented, rather than a mileage-based approach (for example, one requiring 5 billion miles of driving). These scenarios will be derived from a combination of:

  • The developers identifying them based on their analysis of the system’s use cases and Operational Design Domain (ODD) - the set of environments and conditions that the system is specifically designed to operate in

  • Experiences and situations seen during real-world testing within the system’s intended ODD

  • Accident databases highlighting potential sources of collisions, near-misses and hazardous conditions.

Although some of the testing will be carried out physically on test tracks and on public roads, the vast majority of the testing will be done in the virtual world, in simulation. How do we then decide exactly what tests should be run in simulation?

Handling scenarios and scenario variations

Taking "driving through a roundabout" as an example AV scenario, it is useful to consider what variations there might be, for example using the PEGASUS 6-layer model as a framework. Variations of this scenario include:

  • Road layout (size, shape, number of lanes, temporary roadworks, broken-down vehicle, etc.)

  • Weather and lighting conditions

  • Speed, density and type (e.g. car, truck, bicycle, scooter) of the other traffic

  • Behaviour (e.g. driver aggressiveness, illegal manoeuvres) of the other traffic

Figure: Example of roundabout layout

In theory, the AV system must be able to deal with “all” of the variations, and combinations of variations, of its scenarios. In practice, covering 100% of the possible variations is impossible because the space is infinite. The AV developer must nevertheless cover enough of these variations to assure themselves, the public and the authorities that their system is safe enough to be deployed. There is at present no recognised method to measure this coverage, and a realistic safety case is likely to require multiple arguments and metrics.
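To give a feel for how quickly the combinations grow, the following minimal Python sketch counts the test cases in a coarse, hypothetical discretisation of the roundabout variations listed above. The parameter names and values are assumptions made purely for illustration; they are not taken from the PEGASUS model or from any CavPoint tool.

```python
from itertools import product
from math import prod

# Hypothetical, coarse discretisation of the roundabout variations.
# All names and values below are illustrative assumptions.
VARIATIONS = {
    "lanes": [1, 2, 3],
    "roadworks": [False, True],
    "weather": ["dry", "rain", "fog", "snow"],
    "lighting": ["day", "dusk", "night"],
    "traffic_density": ["low", "medium", "high"],
    "traffic_type": ["car", "truck", "bicycle", "scooter"],
    "driver_aggression": ["calm", "normal", "aggressive"],
}


def count_combinations(variations):
    """Number of distinct test cases in the full factorial grid."""
    return prod(len(values) for values in variations.values())


def enumerate_combinations(variations):
    """Yield each combination as a dict of parameter name -> value."""
    names = list(variations)
    for values in product(*(variations[name] for name in names)):
        yield dict(zip(names, values))


total = count_combinations(VARIATIONS)
first_case = next(enumerate_combinations(VARIATIONS))
print(total)
print(first_case)
```

Even this coarse grid of seven parameters gives 2,592 combinations, and every extra parameter or finer step multiplies the total. Continuous parameters such as speed or position make the space effectively infinite, which is why exhaustive enumeration is not a practical test strategy.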

Finding the needle in the haystack

The purpose of testing is to find defects, so the developer has to test “all” of these variations in order to find the problematic ones. This is like trying to find a needle in a haystack. Moreover, whenever the software or system is changed, new issues may have been introduced, so there is a new, slightly different haystack to search. This is an enormous challenge for the safety assurance process and for the deployment of the AV system.

Here, we define a defect more broadly than a “bug”: we consider a wide range of undesirable behaviours, including:

  • Errors where the system does not behave as specified (i.e. a bug)

  • Errors where the system does not behave as intended or desired, possibly because there was an error or omission in the specifications or requirements

  • Deficiencies in system performance, including subjective deficiencies such as "too slow", "uncomfortable", "too hesitant", "confusing or unpredictable behaviour", etc. These subjective judgements should consider the point of view not only of the passengers, but also of other road users.

CavPoint’s approach

CavPoint is developing software to carry out an automated search for defects in an AV system within a simulated environment. Instead of just testing random variations of the base scenario, an approach sometimes called “fuzzing” or “Monte Carlo” testing, we combine relevant Key Performance Indicators (KPIs) with the latest machine learning and optimization algorithms to actively search for the scenario variations that reveal defects in the AV system. We use a range of objective and subjective KPIs covering safety, performance, comfort and effects on traffic flow.

The AV developers can therefore focus on resolving issues rather than finding them.
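The difference between random testing and a KPI-driven search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CavPoint's algorithm: the simulator is replaced by a hypothetical closed-form KPI (`stopping_margin`, the distance left after braking for a stationary obstacle revealed by a cut-out), the parameter names and ranges are invented, and a simple greedy pattern search stands in for the machine learning and optimization methods mentioned above.

```python
import random

# Hypothetical parameter ranges for a toy cut-out scenario (illustrative only).
BOUNDS = {
    "ego_speed": (10.0, 35.0),    # m/s, AV speed when the obstacle is revealed
    "reveal_gap": (20.0, 150.0),  # m, distance to the stationary obstacle
}


def stopping_margin(case, reaction_time=0.5, max_decel=6.0):
    """Toy safety KPI: distance left over after braking to a stop (metres).

    A negative value means the simplified vehicle cannot stop in time.
    """
    v = case["ego_speed"]
    stopping_distance = v * reaction_time + v ** 2 / (2.0 * max_decel)
    return case["reveal_gap"] - stopping_distance


def random_search(kpi, bounds, budget, seed=0):
    """Baseline: sample variations uniformly and keep the lowest KPI seen."""
    rng = random.Random(seed)
    best_case, best_value = None, float("inf")
    for _ in range(budget):
        case = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = kpi(case)
        if value < best_value:
            best_case, best_value = case, value
    return best_case, best_value


def directed_search(kpi, bounds, start, min_fraction=1e-3):
    """Greedy pattern search that steps each parameter towards lower KPI."""
    case, value = dict(start), kpi(start)
    fraction = 0.25  # step size as a fraction of each parameter's range
    while fraction > min_fraction:
        improved = False
        for name, (lo, hi) in bounds.items():
            for direction in (-1.0, 1.0):
                candidate = dict(case)
                step = direction * fraction * (hi - lo)
                candidate[name] = min(hi, max(lo, case[name] + step))
                candidate_value = kpi(candidate)
                if candidate_value < value:
                    case, value, improved = candidate, candidate_value, True
        if not improved:
            fraction /= 2.0  # shrink the step once no neighbour has a lower KPI
    return case, value


centre = {name: (lo + hi) / 2.0 for name, (lo, hi) in BOUNDS.items()}
worst_case, worst_margin = directed_search(stopping_margin, BOUNDS, centre)
_, random_margin = random_search(stopping_margin, BOUNDS, budget=20)
print(worst_case, round(worst_margin, 1), round(random_margin, 1))
```

Starting from a mid-range variation that passes, the directed search follows the KPI downhill until it reaches a variation with a negative margin, whereas the random baseline only reports the worst case it happened to sample. The toy KPI is smooth and monotonic, so the search is easy here; a real simulator response is noisy and has many local optima, which is where more capable optimisation methods are needed.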

Figure: CavPoint software's interactions with the AV simulator and AV control software

A typical search might involve running tens of thousands of test variations. If several hundred of these revealed problems, presenting the AV development engineer with such a long list of issues would be overwhelming. The search therefore also aims to identify which of the problematic scenario variations share a common cause, cluster these together, and present the smaller number of clusters to the development engineer to address.
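As a rough illustration of the grouping step, the sketch below clusters failing variations that lie close together in normalised parameter space, using single-linkage clustering with a distance threshold. Several things here are assumptions: proximity in parameter space is only a crude proxy for "common cause", the parameter names, ranges and `radius` are hypothetical, and the list of failures is made-up example input rather than real test results.

```python
from math import dist

# Hypothetical parameter ranges, used to normalise each parameter to 0..1.
BOUNDS = {
    "ego_speed": (10.0, 35.0),    # m/s
    "reveal_gap": (20.0, 150.0),  # m
}

# Made-up failing variations, for illustration only (not real results).
FAILURES = [
    {"ego_speed": 34.0, "reveal_gap": 100.0},
    {"ego_speed": 33.0, "reveal_gap": 95.0},
    {"ego_speed": 35.0, "reveal_gap": 110.0},
    {"ego_speed": 15.0, "reveal_gap": 22.0},
    {"ego_speed": 16.0, "reveal_gap": 25.0},
]


def normalise(case, bounds):
    """Scale each parameter to the range 0..1 so distances are comparable."""
    return tuple(
        (case[name] - lo) / (hi - lo) for name, (lo, hi) in bounds.items()
    )


def cluster_failures(cases, bounds, radius=0.2):
    """Single-linkage clustering: cases within `radius` share a cluster."""
    points = [normalise(case, bounds) for case in cases]
    parent = list(range(len(cases)))  # union-find forest over case indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= radius:
                parent[find(j)] = find(i)

    clusters = {}
    for index, case in enumerate(cases):
        clusters.setdefault(find(index), []).append(case)
    return list(clusters.values())


clusters = cluster_failures(FAILURES, BOUNDS)
for number, members in enumerate(clusters, start=1):
    print(f"cluster {number}: {len(members)} failing variations")
```

With this example input, the five failures collapse into two groups (high-speed cases and short-gap cases), so the engineer would review two items instead of five. A production tool would more plausibly cluster on richer features, such as which KPI was violated or signals from the AV control software, rather than on input parameters alone.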

We are developing the core technology to actively search for issues and report them in a clear and user-friendly manner to the AV system developer. It is important to develop these methods using realistic driving scenarios and we have been working with two such scenarios:

  • an urban Level 4 AD (automated driving) scenario of an AV on a minor road pulling out into a main road

  • a highway ADAS (Advanced Driver Assistance System) scenario based on an ACC (Adaptive Cruise Control) Euro NCAP cut-out scenario.

Most of the work has concentrated on the ACC scenario.

To carry out this work, we have connected our proof-of-concept software to AV simulation packages and to AV control software. This allows us to evaluate our techniques and investigate issues using toolchains and software that AV developer organisations, our prospective customers, would also use.


The AV developer must create a safety case for their system: a set of claims, arguments and evidence that the system they have designed, and the procedures they have followed, result in an overall system with an acceptable level of safety within its ODD. Several safety standards are used in the AV industry (e.g. ISO 26262, ISO/PAS 21448, UL 4600), but these do not define in precise detail what must be included within the safety case.

In practice, the safety case will consist of a multitude of arguments. We believe that our directed search approach will form an essential part of it, helping our customers to establish the evidence supporting a robust safety case and compliance with the V&V requirements and recommendations of those standards.

We are currently building a proof-of-concept (PoC) to develop our core technology. The next stage is a pilot project with a customer, an AV developer organisation, to further develop the PoC by applying it to the customer’s driving scenarios and software, and to help improve the customer’s verification and validation process. If you are interested in collaborating with us on this, please contact us.

Beyond our core “directed search” technology, there are broader safety and V&V issues to consider when developing a production-level automated driving system, such as:

  • How do you define the ODD and the base scenarios that your automated system must operate in?

  • How do you know when you have enough coverage of all the possible scenario variations, and how do you handle the most unlikely variations (via technical or procedural means)?

  • How do you use KPIs to define an acceptable level of objective, subjective and behavioural system performance, and use these to:

      • set quantitative targets for subsystems and components

      • assess the maturity and deployment readiness of the system

CavPoint can also help you explore these issues and find solutions tailored specifically for your system and circumstances.