Academics can show governments how to evaluate SIBs more rigorously

23 April 2018

Chris Fox

A wide range of approaches can help identify causality and effectiveness even in complex environments.

We can – and we should – improve our evaluations of social impact bond (SIB) and Payment by Results (PBR) programmes. Evaluations should focus more on causality, rather than simply on contract compliance or implementation.

If we don’t focus on attribution, it will become hard to demonstrate that SIBs are more than a series of interesting pilots. We’ll miss the chance to test an alluring proposition – that SIBs could transform the large-scale commissioning and delivery of health and social welfare programmes.

Getting evaluation right in this field is not – as some might suggest – intrinsically challenging. SIBs and PBR projects do not create unusual difficulties for evaluation techniques. We have the know-how: sophisticated, diverse and well-developed tools exist that could settle most questions thrown up by SIBs. The real issue is whether those who champion SIBs will expose such initiatives to the full rigour of the evaluative tools that exist.

Academic responsibility

The academic community can help ensure that rigour. Management consultants, contracted to perform evaluations, tend to provide what governments specify, and those specifications have so far been limited and have fallen short of what’s required. Academics could set out a wider, more exacting and more suitable range of evaluation options. We should show policy makers clearly how better evaluations could be achieved, particularly if the case for widespread adoption of SIBs is to be made.

This difficulty in properly assessing the impact of SIBs seems to be a particularly British problem. In the United States, most SIBs have been accompanied by fairly rigorous counterfactual evaluations, including randomised controlled trials (RCTs). There, the credibility of the SIB model among commissioners and investors has required demonstration of its ability to deliver tangible outcomes. This may be because, in the US, more funding has come from wealthy individuals or private foundations with an investment ethos. In Britain, funding tends to come from philanthropic organisations, which seem more interested in testing concepts than in demonstrating categorical outcomes.

Evaluations rely too heavily on performance management

Whatever the reasons, SIB pay-outs in the UK typically rely on performance management information, rather than counterfactual evaluation, to demonstrate the achievement of outputs. Supporters of this approach say that complicated counterfactual evaluations add to the already high transaction costs associated with SIBs. That’s understandable for individual SIBs. Cumulatively, however, this approach hinders the quest to find out whether SIBs really work. It undermines the case for wider roll-out.

Evaluations can and should answer two major questions about SIBs. First, there’s “attribution”: whether SIBs actually achieve the outcomes desired. Second, we need to understand SIBs as a mechanism and establish how effective they are compared with other models of commissioning. This is important because there are less expensive, less complicated methods than SIBs for commissioning services in this field.

The attribution issue has become unnecessarily mired in a polarised debate about whether RCTs are suitable for SIB projects. Opponents contend that RCTs are not particularly useful in this field because SIB interventions tend to take place in highly complex environments. While it’s true that these interventions often occur amid complexity, that actually strengthens the case for RCTs. It becomes even more important to understand whether an intervention is indeed responsible for any of the impacts being observed.

Testing theories of change

Good RCTs would strengthen SIB evaluations because they would be theoretically informed. They would start with a theory of change setting out the potential causal mechanisms that are of interest. In contrast, many SIB evaluations rely on contractual frameworks and on demonstrating whether the contracts have worked, rather than testing hypothesised causality. Most good RCTs today are also accompanied by high-quality implementation evaluation, so they have a dual strategy.

Well-organised RCTs avoid a “one-shot” design. They are actually a sequence of evaluations that build by testing, at a granular level, particular moderators of change, rather than simply focussing on the overall social outcome and trying to come to a one-shot conclusion. This is how, in reality, even medical research works. You don’t do a single RCT. You build from small-scale studies through to larger-scale studies.

Sequences of evaluations are good

The wider evaluation world is focussing more on sequencing evaluations and ensuring that the tools employed are appropriate to the point of a programme’s development. This avoids the problems that one-shot evaluations can create: that you evaluate too early; that the throughputs you were promised never arrive; that you end up with an evaluation design which is underpowered to identify the changes you’re looking for; and that inconclusive findings cost a lot of money but don’t provide the hoped-for insights.
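To make the “underpowered” risk concrete, here is a minimal sketch of a standard power calculation for comparing two outcome rates, using the usual normal approximation. The function names and the illustrative figures (a 50% outcome rate in the comparison group, 40% in the programme group) are assumptions for the sake of the example, not numbers from any actual SIB.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def required_n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Participants needed in each arm to detect p_control vs p_treated.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


def achieved_power(n_per_arm, p_control, p_treated, alpha=0.05):
    """Approximate power actually achieved with n_per_arm participants per arm."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    std_error = sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_arm
    )
    return _STD_NORMAL.cdf(abs(p_control - p_treated) / std_error - z_alpha)


if __name__ == "__main__":
    # Hypothetical figures: outcome rate of 50% without the programme, 40% with it.
    print("needed per arm:", required_n_per_arm(0.50, 0.40))
    # What happens if only 100 participants per arm ever arrive?
    print("power with 100 per arm:", round(achieved_power(100, 0.50, 0.40), 2))
```

Under these assumed figures, the design needs roughly 385 participants in each arm for the conventional 80% power. If referrals stall at 100 per arm, power falls to roughly 30%, which is exactly the situation in which an expensive evaluation is likely to return inconclusive findings.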

I advise against the one-shot model. Instead, we like to start evaluations early, without diving straight in with an RCT, and focus on developing a sequence. That’s the strength of the Education Endowment Foundation evaluation model. It begins with small-scale pilot studies that focus on theory of change and early implementation, moves on to efficacy trials that are more like a small RCT, and leads up finally to effectiveness trials. Only at that point – when causalities have been established – is control finally handed over to implementers.

Building commissioner confidence

This sequential approach gives commissioners confidence. You’re saying to them that this isn’t a “one shot, put all your money on the table up-front” model. It’s about gradually building knowledge and providing gate-keeping points where a commissioner can ultimately say: “This isn’t working, we need to rethink. We may need to reinvest or even disinvest.” That’s helpful to commissioners, especially if they are being asked to back innovation that feels risky.

Small ‘n’ designs

In some cases, RCTs are not possible, but there are many alternative models of impact evaluation that could be considered for SIBs. “Small n” designs provide ways to think about causal attribution where a programme does not have sufficient numbers to allow a traditional impact evaluation design. Process tracing is an example of a “small n” design, where one uses theory to identify critical points in a change process that need to be tested. Then one selects cases to test these critical points, using interviews and observations of what’s going on. This Popperian approach acknowledges that there is no absolute objective knowledge. However, it can find ‘smoking gun’ evidence that strongly suggests causality, even if that may not amount to absolute proof.

Process approaches that search out causality would improve on the current evaluations of some SIB or PBR programmes. Where an RCT is not possible, these tend to opt for less demanding process or implementation evaluations – usually interviewing stakeholders and writing a report – that lack a theory-driven approach.

More rigour is needed

I’ve set out ways in which SIB and PBR evaluations could be improved by RCTs or hybrids that avoid the unnecessarily polarised debate between the pro- and anti-RCT lobbies. Beyond RCTs, there are other approaches to evaluating causality, suitable in instances where there are small numbers of cases. We should learn from this wider discussion of evaluation techniques. Academics owe it to those investing and working in SIBs to ensure that policy makers adopt a rigorous approach to evaluation. We need to know what works and what doesn’t if SIBs are ever to be widely adopted.

Chris Fox is Professor of Evaluation and Policy Analysis and Director of the Policy Evaluation and Research Unit at Manchester Metropolitan University. He is co-author of “Payment by results and social impact bonds: Outcome-based payment systems in the UK and US”, published by Policy Press in February 2018.
https://policypress.co.uk/payment-by-results-and-social-impact-bonds