Notes
Article history
The research reported in this issue of the journal was funded by the HS&DR programme or one of its preceding programmes as project number 14/19/19. The contractual start date was in March 2015. The final report began editorial review in March 2017 and was accepted for publication in July 2017. The authors have been wholly responsible for all data collection, analysis and interpretation, and for writing up their work. The HS&DR editors and production house have tried to ensure the accuracy of the authors’ report and would like to thank the reviewers for their constructive comments on the final report document. However, they do not accept liability for damages or losses arising from material published in this report.
Declared competing interests of authors
Geoff Wong is a member of the National Institute for Health Research Health Technology Assessment programme Primary Care Panel, and is a panel member of the Health and Safety Executive External Peer Review Panel Evaluation Governance Group. During the course of the project Gill Westhorp worked as a consultant and consulting academic undertaking realist evaluations and reviews, and provided some capacity building and some PhD supervision on a commercial basis. These activities were not undertaken under the auspices of this project.
Permissions
Copyright statement
© Queen’s Printer and Controller of HMSO 2017. This work was produced by Wong et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
Chapter 1 Background
Many of the problems confronting policy- and decision-makers, evaluators and researchers today are complex. For example, much health service demand results from the effects of smoking, suboptimal diets (including obesity), excessive alcohol, inactivity or adverse family circumstances (e.g. partner violence), all of which, in turn, have multiple causes operating at both individual and societal level. Interventions or programmes designed to tackle such problems are themselves complex, often having multiple, interconnected components delivered individually or targeted at communities or populations. Their success depends both on individuals’ responses and on the wider context in which people strive (or not) to live healthy lives. What works in one family, one organisation or one city may not work in another.
Similarly, the ‘wicked problems’ of contemporary health services research – how to improve quality and assure patient safety consistently across the service, how to meet rising need from a shrinking budget and how to realise the potential of information and communication technologies (which often promise more than they deliver) – require complex delivery programmes with multiple, interlocked components that engage with the particularities of context. What works in hospital A may not work in hospital B.
Designing and evaluating complex interventions is challenging. Randomised trials that compare ‘intervention on’ with ‘intervention off’, and their secondary research equivalent, meta-analyses of such trials, may produce statistically accurate statements (e.g. that the intervention works ‘on average’), but may leave us none the wiser about where to target resources or how to maximise impact.
Realist evaluation seeks to address these problems. It is a form of theory-driven evaluation, based on realist philosophy,1 that aims to advance understanding of why these complex interventions work, how, for whom, in what context and to what extent, as well as to explain the many situations in which a programme fails to achieve the anticipated benefit.
Realist evaluation assumes both that social systems and structures are ‘real’ (because they have real effects) and that human actors respond differently to interventions in different circumstances. To understand how an intervention might generate different outcomes in different circumstances, realism introduces the concept of mechanisms, which may helpfully be conceptualised as underlying changes in participants’ reasoning that are triggered in particular contexts.2 For example, a school-based feeding programme may work by relieving hunger in young children in a low-income rural setting where famine has produced overt nutritional deficiencies, but for teenagers in a troubled inner-city community where many young people are disaffected, it may work chiefly by making pupils feel valued and nurtured.3 What constitutes ‘working’ is also likely to be somewhat different in the two settings.
Realist evaluations have addressed numerous topics of central relevance to health services research, including what works, and for whom, when ‘modernising’ health services,4 introducing breastfeeding support groups,5 using communities of practice to drive change,6 and involving patients and the public in research,7 as well as the impact of robotic surgery on team-working and decision-making within the operating theatre8 and the use of fines for delays in discharge from hospitals.9 They have also been used in fields as diverse as international development, education, crime prevention and climate change.
What is realist evaluation?
Realist evaluation was developed by Pawson and Tilley in the 1990s,10 originally in the field of criminology, to address the question, ‘what works for whom in what circumstances and how?’ in criminal justice interventions. This early work highlighted the following points:
- Social programmes (closely akin to what health service researchers call complex interventions) are an attempt to address an existing social problem (i.e. to create some level of social change).
- Programmes ‘work’ by enabling participants to make different choices (although choice-making is always constrained by such things as participants’ previous experiences, beliefs and attitudes, opportunities and access to resources).
- Making and sustaining different choices may require a change in a participant’s reasoning (e.g. in their values, beliefs, attitudes or the logic they apply to a particular situation) and/or the resources (e.g. information, skills, material resources, support) they have available to them. Programmes provide opportunities and resources. The interaction between what the programme provides and the participant’s ‘reasoning’ is what enables the programme to ‘work’ and is known as a ‘mechanism’.
- Programmes work in different ways for different people (that is, the contexts within programmes can trigger different change mechanisms for different participants).
- The contexts in which programmes operate make a difference to the outcomes they achieve. Programme contexts include features such as social, economic and political structures, organisational context, programme participants, programme staffing, geographical and historical context, and so on. In realist terms, context does not simply denote spatial, geographical or institutional locations. Context refers, among other things, to the sets of ‘social rules, norms, values and interrelationships’ that operate within these locations.10
- Some aspects of the context enable particular mechanisms to be triggered. Other aspects of the context may prevent particular mechanisms from being triggered. That is, there is always an interaction between context and mechanism, and that interaction is what creates the programme’s impacts or outcomes: context + mechanism = outcome.
- Because programmes work differently in different contexts and through different change mechanisms, they cannot simply be replicated from one context to another and automatically achieve the same outcomes. Theory-based understandings about ‘what works for whom, in what contexts, and how’ are, however, transferable.
- Therefore, one of the tasks of evaluation is to learn more about: ‘what works’, in what respects and to what extent, including intended and unintended outcomes; ‘for whom’, that is, for which subgroups of participants; ‘in which contexts’; and ‘what mechanisms are triggered by what programmes in what contexts’.
A realist evaluation approach assumes that programmes are ‘theories incarnate’. That is, whenever a programme is implemented, it rests on a theory about what ‘might cause change’, even though that theory may not be explicit. One of the tasks of a realist evaluation is, therefore, to make the theories underpinning a programme explicit, by developing clear hypotheses about how, and for whom, programmes might ‘work’. The implementation of the programme, and the evaluation of it, then test those hypotheses. This means collecting data not just about programme impacts or the processes of programme implementation, but also about the specific aspects of context that might affect the programme’s intended and unintended outcomes, and about the specific mechanisms that might be creating change.
Pawson and Tilley10 also argue that a realist approach has particular implications for the methods required to evaluate a programme. For example, rather than comparing changes for participants who have undertaken a programme with a group of people who have not (as is done in randomised controlled or quasi-experimental designs), a realist evaluation compares context–mechanism–outcome configurations (CMOCs) within programmes. It may ask, for example, whether a programme works more or less well, and/or through different mechanisms, in different localities (and if so, how and why) or for different subgroups of the population. Furthermore, they argue that different stakeholders will have different information and understandings about how programmes are supposed to work and whether or not they in fact do so. Data collection processes (interviews, focus groups, questionnaires and so on) should be constructed to identify and collect the particular information that those stakeholder groups will have, and thereby to confirm, refute or refine theories about how and for whom the programme ‘works’.
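To make the idea of a CMOC concrete, the following is a minimal sketch, not drawn from the report, of one way a configuration might be recorded as a simple data structure during analysis; the example content is a hypothetical rendering of the school feeding illustration earlier in this chapter.

```python
# Illustrative only: one way an analyst might record context-mechanism-outcome
# configurations (CMOCs) while building and refining a programme theory.
from dataclasses import dataclass

@dataclass
class CMOC:
    context: str    # the circumstances that enable (or constrain) the mechanism
    mechanism: str  # programme resources interacting with participants' reasoning
    outcome: str    # the intended or unintended result of that interaction

# Hypothetical configurations based on the school feeding example (outcomes assumed)
rural_cmoc = CMOC(
    context="low-income rural setting with overt nutritional deficiencies",
    mechanism="free school meals relieve hunger, so children can attend and concentrate",
    outcome="improved attendance and learning",
)
inner_city_cmoc = CMOC(
    context="troubled inner-city community with many disaffected teenagers",
    mechanism="provision of meals leads pupils to feel valued and nurtured",
    outcome="greater engagement with school",
)
```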
Realist evaluation is underpinned by a realist philosophy of science (‘realism’).11 Philosophically speaking, realism can be thought of as sitting between positivism (‘there is a real external world which we can come to know directly through experiment and observation’) and constructivism (‘given that all we can know has been interpreted through human senses and the human brain, we cannot know for sure what the nature of reality is’). This is not to suggest, however, that ‘constructivism’ and ‘positivism’ represent opposite poles on the same continuum. Realism holds that there is a real social world but that our knowledge of it is amassed and interpreted (partially and/or imperfectly) via our senses and brains, and filtered through our language, culture and past experience. In other words, realism sees the human agent as operating in a wider social reality, encountering experiences, opportunities and resources, and interpreting and responding to the world within particular personal, social, historical and cultural frames. For this reason, different people respond differently to the same experiences, opportunities and resources. Hence, a programme (or, in the language of health services research, a complex intervention) aimed at improving health outcomes is likely to have different levels of success with participants in different contexts, and even in the same context at different times.
The need for standards and training materials in realist evaluation
Postings on the RAMESES JISCMail list (www.jiscmail.ac.uk/RAMESES, an e-mail list for discussing realist approaches) suggest that enthusiasm for realist evaluation, and belief in its potential for application in many fields, have outstripped the development and application of robust quality standards in the field. Two important prior publications showed systematically that many so-called ‘realist evaluations’ were not applying the concepts appropriately and were, as a result, producing potentially misleading findings and recommendations.12,13
Pawson and Manzano-Santaella, in their paper ‘A realist diagnostic workshop’, used case examples of flawed realist evaluations to highlight three common errors in such studies.13 First, although it is possible to show associations and correlations in data from many types of evaluation, the focus of a realist evaluation should be to explore and explain why such associations occur. Second, they explain what may constitute valid data for use in realist evaluation: producing a realist explanation is likely to require a mix of data types to provide explanations and support for the relationships within and between CMOCs. Third, realist explanations require CMOCs to be produced. Pawson and Manzano-Santaella note that some realist evaluations have presented finely detailed lists of contexts, mechanisms and outcomes, but have failed to produce a coherent explanation of how these contexts, mechanisms and outcomes were linked and related, or not related, to each other. They therefore called for greater emphasis on elucidating programme theory (the theory about what a programme or intervention is expected to do and, in some cases, how it is expected to work) expressed as CMOCs.
Marchal et al.12 undertook a review of the realist evaluation literature in health systems research to quantify and analyse the field. They identified 18 realist evaluations and noted a range of challenges that arose for researchers. The absence of prior theoretical and methodological guidance appeared to have led to recurring problems in the realist evaluations they appraised. Marchal et al.12 noted that ‘[t]he philosophical principles that underlie realist evaluation are variably interpreted and applied to different degrees’. Researchers had conceptualised key realist evaluation concepts, such as ‘middle-range theory’, ‘mechanism’ and ‘context’, in different ways. This, they concluded, was often related to fundamental misunderstandings, and the rigour of the evaluations suffered as a result.
These two papers12,13 showed that, although realist evaluation had been embraced by parts of the health research community, it had also proven a challenging task for some who were unfamiliar with the practical application of realism. Both sets of authors called for methodological guidance to allay misunderstandings about the purpose, underlying philosophical assumptions, analytic concepts and methods of realist evaluation.
Chapter 2 Methods
Objectives
The project had both strategic and operational objectives, and, because it was funded through the health sector, the objectives were framed in relation to health. However, representatives from beyond the health sector were involved to ensure that the products were relevant to any realist evaluation.
Strategic objectives
- To develop quality standards, reporting guidance and resources and training materials for realist evaluation.
- To build capacity in health services research for supporting and assessing realist approaches to research.
- Acknowledging the unique potential of realist research to address the patient’s agenda (‘what will work for us in our circumstances?’), to produce resources and training materials for lay participants, and those seeking to involve them, in research.
Operational objectives
1. Recruit an interdisciplinary Delphi panel of, for example, researchers, support staff, policy-makers, patient advocates and practitioners with various types of experience relevant to realist evaluation.
2. Summarise the current literature and expert opinion on best practice in realist evaluation, to serve as a baseline/briefing document for the panel.
3. Run three rounds (and more if needed) of the online Delphi panel to generate and refine items for a set of quality standards and reporting guidance.
4. In parallel with the Delphi panel:
   - 4a. provide ongoing advice and consultancy to up to 10 realist evaluations, including any funded by the National Institute for Health Research (NIHR), thereby capturing the ‘real-world’ problems and challenges of this methodology
   - 4b. host the RAMESES JISCMail list on realist research, capturing relevant discussions about theoretical, methodological and practical issues
   - 4c. feed problems and insights from 4a and 4b into the deliberations of the Delphi panel.
5. Write up the quality standards and guidance for reporting in an open-access journal.
6. Collate examples of learning/training needs for researchers, postgraduate students and peer reviewers in relation to realist evaluation.
7. Develop, deliver and refine resources and training materials for realist evaluation. Deliver three 2-day ‘realist evaluation’ workshops and three 2-day ‘training the trainers’ workshops for a range of audiences [including interested NIHR Research Design Service (RDS) staff].
8. Develop, deliver and refine information and resources for patients and other lay participants in realist evaluation. In particular, draft template information sheets and consent forms that could be adapted for ethics and governance activity.
9. Disseminate training materials and other resources, for example via public-access websites.
Overview of methods
We first provide a brief overview of the range of methods we used to meet the objectives set out above and of how they related to each other. The methods we used in this project closely resemble those we used in another project (the RAMESES project), which developed methodological guidance, reporting standards and training materials for realist and meta-narrative reviews.14 We have previously published a protocol paper that outlined the methods we intended to use in this project.15 The following methods sections outline, in more detail, specific aspects of the methods used.
To fulfil operational objectives 1 and 2, we undertook a thematic review of the literature. Findings were supplemented by our content expertise and with feedback collated from presentations and workshops for researchers using or intending to use realist evaluation. We synthesised our findings into briefing materials on realist evaluation for the Delphi panel. We recruited members to the Delphi panel, which had wide representation from researchers, students, policy-makers, evaluators, theorists and research sponsors. We used the briefing materials to inform the Delphi panel in preparation for its task, so that panel members could contribute to developing standards (objective 3).

For the advice and consultancy to realist evaluations (objective 4a), we drew on our experience in conducting realist evaluations and in developing and delivering education materials, and also on relevant feedback from the Delphi panel, the e-mail list on realist research approaches (www.jiscmail.ac.uk/RAMESES) and the evaluation teams we had supported in the past. To help us refine our reporting standards (objective 5), we captured methodological and other challenges that arose within the realist evaluation projects to which we provided methodological support. All of these sources fed into the reporting standards, quality standards and resources and training materials (objective 7).

We did not set specific time points at which we would refine the drafts of our project outputs. Instead, we iteratively and contemporaneously fed the data we captured into our draft reporting standards, quality standards and resources and training materials, making changes gradually. Only our Delphi panel ran within a specific time frame. The final guidance and standards were, therefore, the product of continuous refinements.

To understand and develop information and resources for patients and other lay participants in realist evaluation (objective 8), we convened a group consisting of patients and the public. We addressed objective 9 through academic publications, online resources and delivery of presentations and workshops. The project was overseen by a Project Advisory Group, which comprised three independent members (see Acknowledgements). This group met with the project team on three occasions (May 2015, November 2015 and May 2016) and provided advice to the project team. Figure 1 provides a pictorial overview of how the different methods we used fed into each other.
Details of literature search methods
With input from an expert librarian, we identified reviews, scholarly commentaries, models of good practice and examples of (alleged) misapplication of realist evaluation. To identify the relevant documents we refined and developed the search used by Marchal et al.12 for a previous review on a similar topic, and also applied contemporary search methods designed to identify ‘richness’ when exploring complex interventions.16,17
A search was conducted on 3 March 2015 across 10 databases. Free-text terms were selected to describe realist methods and thesaurus terms were used where available (see Appendix 1). The following databases were searched:
- Cumulative Index to Nursing and Allied Health Literature (CINAHL; via EBSCOhost)
- The Cochrane Library (Wiley Online Library)
- Dissertations & Theses database (ProQuest)
- EMBASE (via OvidSP)
- Education Resources Information Center (ERIC; via EBSCOhost)
- Global Health (via OvidSP)
- MEDLINE and MEDLINE In-Process & Other Non-Indexed Citations (via OvidSP)
- PsycINFO (via OvidSP)
- Scopus, Science Citation Index (SCI), Social Science Citation Index (SSCI) & Conference Proceedings Citation Index – Science (CPCI-S)
- Web of Science Core Collection (Thomson Reuters Corporation, New York, NY, USA).
A forward citation search was conducted via the Web of Science Core Collection for the following key text: Pawson R, Tilley N. Realistic Evaluation. London: Sage; 1997.10
No language or study design filters were applied. We included any document that referred to or claimed to be a realist evaluation using the approach set out by Pawson and Tilley in their key publication, Realistic Evaluation.10 Documents were excluded if they were not realist evaluations, were published before 2000, or were book reviews, letters or comment pieces. We set the cut-off point at 2000 because we assumed that evaluations based on Pawson and Tilley’s work would begin appearing in the literature from this point onwards. All citation screening was undertaken by Geoff Wong. The whole searching process, from start to the retrieval of all full-text documents, took approximately 1 month.
We decided that, because of the narrow purpose of our review and the number of relevant citations retrieved, we would stop analysing data when we had reached thematic saturation. As a strategy to manage the potentially large number of realist evaluations, we started our analysis and synthesis with the most recent (i.e. 2015) realist evaluations and worked ‘backwards’. The decision on when thematic saturation had been reached was made in discussion with the whole project team. For both practical reasons (e.g. resource constraints) and academic ones (no new data), we stopped including new papers when there was agreement that saturation of themes had been reached. Thematic saturation was judged to have been reached when the team agreed that newly identified realist evaluations contained no new themes, or only subthemes, relating to the three questions listed below.
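The following is a minimal sketch of the ‘work backwards from the most recent evaluations and stop at saturation’ logic described above. It is purely illustrative: in the project, saturation was judged by team discussion and agreement rather than by a fixed numerical rule, and the stopping threshold below is an assumption.

```python
# Illustrative stopping rule: include evaluations (most recent first) until several
# consecutive evaluations contribute no new themes.
def select_until_saturation(evaluations, themes_of, stop_after_no_new=5):
    """evaluations: records sorted from most recent to oldest.
    themes_of: function returning the set of themes identified in one evaluation."""
    seen_themes, included, runs_without_new = set(), [], 0
    for evaluation in evaluations:
        new_themes = themes_of(evaluation) - seen_themes
        included.append(evaluation)
        if new_themes:
            seen_themes |= new_themes
            runs_without_new = 0
        else:
            runs_without_new += 1
        if runs_without_new >= stop_after_no_new:  # proxy for 'no new themes emerging'
            break
    return included, seen_themes
```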
The thematic analysis was led by Geoff Wong, who undertook all stages of the review and shared findings with the rest of the project team so that discussion, debate and refinement of interpretations of the data could take place. Findings were shared by e-mail and, when necessary, face-to-face meetings were conducted to discuss interpretations of the data.
In undertaking our thematic analysis, we familiarised ourselves with the included evaluations to identify patterns in the data. Aware that the purpose of the review was to produce briefing documents for the Delphi panel, we considered the following questions:
- What is considered by experts in realist evaluation to be current best practice (and what is the range and diversity of such practice)?
- What do experts in realist evaluation, and other researchers who have undertaken a realist evaluation, believe counts as high quality and necessary to report?
- What issues do researchers struggle with (based on thematic analysis of postings on the RAMESES JISCMail list archive as well as the published literature)?
Through the Delphi panel, we wanted to achieve a consensus on quality and reporting standards; what we therefore needed from our review of the literature was data to inform us about what might constitute quality in executing and reporting realist evaluations. We accepted that we might need to refine, discard or add questions and topic areas in order to better capture our analysis and understanding of the literature as these emerged from our reading of the evaluations.
Data were extracted to a Microsoft Excel® (Microsoft Corporation, Redmond, WA, USA) spreadsheet that we iteratively refined to capture the data needed to produce our briefing materials. This review was undertaken in a short time frame. The time taken from obtaining full-text documents to producing the final draft for circulation of the briefing documents was approximately 12 weeks. The output of this phase was a provisional summary that addressed the questions above and highlighted, for each question, the key areas of knowledge, ignorance, ambiguity and uncertainty. This was distributed to the Delphi panel (as our briefing document) as the starting point for its work.
Our purpose in identifying published evaluations was not to complete a census of realist evaluations. We make no claims that the review we undertook was exhaustive, and we never intended it to be published as a stand-alone piece of research. In other words, the purpose of our review was not to produce definitive summaries in response to the themes above but to prepare a baseline set of briefing materials for the Delphi panel, which would deliberate on and add to them in the next step. As such, the review we undertook is best considered a rapid, accelerated or truncated thematic review. Such an approach will predictably produce limitations, and these are discussed in Chapter 4, Limitations.
Details of online Delphi process
We recruited Delphi panel members purposefully, to ensure that we had representation from evaluators, researchers, funders, journal editors and experts in realist evaluation. Individuals were recruited through relevant organisations and targeted e-mails, and also through personal contacts and recommendations. Those interested in participating were provided with an outline of the study, and individuals who indicated the greatest commitment and potential to balance the sample were selected.
The Delphi panel was run online using SurveyMonkey (SurveyMonkey, Palo Alto, CA, USA). Participants in round 1 were provided with the briefing materials we developed from the literature review and were invited to suggest what might be included in the reporting standards. Responses were analysed and fed into the design of questionnaire items for round 2.
In round 2 of the Delphi panel, participants were asked to rate each potential item twice on a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree), once for relevance (i.e. ’Should an item on this theme/topic be included at all in the guidance?’) and once for validity (i.e. ’To what extent do you agree with this item as currently worded?’). Those who agreed that an item was relevant, but disagreed on its wording, were invited to suggest changes to the wording via a free-text comments box. In this second round, participants were again invited to suggest additional topic areas and items. We did not prespecify stop-points for establishing when consensus had been achieved, because we wanted the flexibility to return items to the Delphi panel that we judged might need further input. Although we accept that this may have enabled us to preferentially return some items and not others, we guarded against this by sending all Delphi panel members an end-of-round report detailing all the findings, the changes made to the text and the items to be returned to the next round. Panel members were invited to contact us should they have any concerns about items that were not returned for re-rating, for example if they believed that an item should be returned to the panel or disagreed with wording changes.
Participants’ responses were collated and the numerical ratings were entered onto an Excel spreadsheet. The response rate, average, mode, median and interquartile range (IQR) for each item were calculated. Items that scored low on relevance were omitted from subsequent rounds. We invited further online discussion on items that scored high on relevance but low on validity (indicating that a rephrased version of the item was needed) and on those for which there was wide disagreement about relevance or validity. The panel members’ free-text comments were also collated and analysed thematically.
Following analysis and discussion within the project team, we drew up a second list of statements, which was circulated for rating (round 3). Round 3 contained items for which consensus had not yet been reached. Items on which consensus had been reached were not returned to round 3 or beyond for re-rating, even if we had made changes to their wording. This was because, when we undertook the RAMESES project, we had received informal feedback from Delphi panel members indicating that round 2 of the online Delphi process had been very time-consuming, and we were advised that, to retain a high response rate in subsequent rounds, we should minimise the time commitment asked of panel members. We planned that the process of collating responses, further e-mail discussion and re-rating would be repeated until maximum consensus was reached (rounds 4, 5, and so on). In practice, very few Delphi panels, online or face to face, go beyond three rounds, because participants tend to ‘agree to differ’ rather than move towards further consensus. We used e-mail reminders to optimise the response rate from Delphi panel members. We considered consensus to be achieved when the median score for an item was 6 or above.
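As an illustration of the per-item statistics and consensus rule described above, here is a minimal sketch. It is not the authors’ actual analysis (which was done in an Excel spreadsheet); the quartile method is an assumption and the example ratings are hypothetical.

```python
# Illustrative per-item summary for 7-point Likert ratings from one Delphi round.
from statistics import mean, median, mode

def summarise_item(ratings, panel_size=35, consensus_threshold=6):
    """Summarise one item's relevance or validity ratings (1 = strongly disagree, 7 = strongly agree)."""
    ordered = sorted(ratings)
    n = len(ordered)
    lower_quartile = median(ordered[: n // 2])        # simple split method (an assumption)
    upper_quartile = median(ordered[(n + 1) // 2 :])
    return {
        "response rate": f"{n}/{panel_size} ({round(100 * n / panel_size)}%)",
        "mean": round(mean(ordered), 2),
        "mode": mode(ordered),
        "median": median(ordered),
        "IQR": upper_quartile - lower_quartile,
        "consensus": median(ordered) >= consensus_threshold,  # median of 6 or above
    }

# Hypothetical ratings from 26 of 35 panel members for one item
print(summarise_item([7, 7, 7, 6, 6, 6, 7, 5, 6, 7, 7, 6, 4, 7, 6, 6, 7, 5, 6, 7, 7, 6, 6, 7, 3, 7]))
```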
We planned to report any residual non-consensus as such and to describe the nature of the dissent (if any). Making such dissent explicit tends to expose inherent ambiguities, which may be philosophical or practical, and acknowledges that not everything can be resolved; such findings may be of more use to those who use realist evaluation than a firm statement implying that all tensions have been settled. We used the findings from the Delphi panel to develop the reporting standards and methodological quality standards for realist evaluations.
Developing quality standards
The quality standards were designed to support professional development, to assist evaluators in assessing the quality of various aspects of the evaluation process and to assist reviewers with meta-evaluation (i.e. assessing the quality of evaluations).
To develop the quality standards, we drew on the following sources of data:
- free-text comments from participants and findings from the Delphi panel
- personal expertise as evaluators, researchers, peer reviewers and trainers in the field
- feedback from participants at workshops and training sessions run by members of the project team
- comments made on the RAMESES JISCMail list.
The data from the sources above were collated contemporaneously and discussed within the project team. Iterative cycles of discussion and revisions for content and clarity of the drafts were needed to develop the standards. Box 1 provides an illustration of how we drew on the data sources to produce the quality standards.
As evaluators, researchers and trainers in realist evaluation, we had noted some confusion among researchers about the nature of, need for and role of realist programme theory (or theories) in realist evaluations. To develop the briefing materials and the initial drafts of the reporting standards for realist evaluations, we searched for and analysed a number of published evaluations and found that our impressions were well founded.
When providing methodological support for a realist evaluation, the importance of programme theory emerged again. One of the project team commented, ‘I felt the development of the initial “programme theory” pulled things together . . .’ In our Delphi process, we encouraged participants to provide free-text comments. These closely reflected the comments we received about the importance of programme theory.
Development of the quality criteria

We drew on our content expertise in the topic area and the published methodological literature to develop the quality criteria. In addition, some of our Delphi panel participants provided us with clear indications that supported the criteria we set. For example, we suggested that realist evaluations should develop a programme theory, and that an evaluation that did not do so was ‘inadequate’. Delphi panel participants’ free-text comments echoed our suggestion:
Really important . . .
Initial programme theories will be clearly stated . . .
Many people’s efforts at realist evaluation fall at the programme theory stage . . .
We were also able to draw on the discussions that took place on JISCMail to support some of our criteria. For example, under ‘adequate’, we suggested that: ‘initial tentative programme theory (or theories) were identified and (as far as possible) described in realist terms (that is, in terms of the causal relationship between contexts, mechanisms and outcomes). These were refined as the evaluation progressed’.
As illustration, a comment from JISCMail that we drew upon to support this criterion was:
It’s good to read that you are planning to develop a programme theory. It may be that even before you start data collection that you may wish to develop an initial ‘best guess’ programme theory of the . . . intervention. Do not worry that it may be a best guess and has no CMOCs (i.e. is not particularly realist in nature) – it is a starting point. As the evaluation progresses your job is to gradually (iteratively) ‘convert’ it into a more detailed realist programme theory that has data to support any inferences you have made.
Developing, delivering and refining resources and training materials for realist evaluation
An important part of our project was to produce publicly accessible resources to support training in realist evaluation. We anticipated that these resources would need to be adapted, and perhaps supplemented, for different groups of learners, and interactive learning activities added. We developed, and iteratively refined, draft learning objectives, example course materials and teaching and learning support methods. We drew on a range of sources, as well as on our experience as trainers and consultants on realist evaluations, to inform the content and format of our training materials.
We sought out examples of the kinds of requests that evaluators often make for support with realist evaluation, for example by using the rich archive of postings on the RAMESES JISCMail list from both novice and highly experienced practitioners, going back 3 years. We also proactively asked the list members for additional examples, and used our empirical data from the Delphi panel and our literature review to identify relevant examples. Finally, we sought input from UK RDS staff interested in realist evaluation, asking them to describe the kinds of problems people brought to them and where they felt that further guidance, support and resources were needed.
We used a thematic approach to classify the examples into a list of problems and issues, each with corresponding training needs and resources to address them. These were developed iteratively in regular discussions and meetings of the research team. Our goal was to develop a coherent and comprehensive curriculum for training realist researchers and for ‘training the trainers’.
Support and consultancy to realist evaluations
The support we offered to fellow evaluators and researchers using realist evaluations consisted of two overlapping and complementary levels:
- Online discussion and support via JISCMail for evaluators and researchers, at any level, interested in or undertaking a realist evaluation. When questions or issues were raised, either one of the project team or another list member would reply. Where necessary, members of the project team summarised discussions and provided clarification.
- Direct requests for support and training. During the course of the study, members of the project team were frequently approached to provide methodological support to realist evaluation projects. The exact content, nature and duration of the support provided were discussed between the relevant team members to ensure that what was provided met the needs of those who requested the support.
Realist evaluation and ‘training the trainers’ workshops
Throughout the 24 months of the project, members of the project team offered training workshops to other evaluators, researchers and patient organisations on an as-requested basis. When we were asked to provide a workshop, the relevant project team member discussed its logistics and content with the hosts.
For the ‘training the trainers’ workshops, we engaged with the NIHR’s RDS. We did this by e-mailing each regional service and also asking for expressions of interest via e-mail lists and personal contacts.
Developing, delivering and refining information and resources for patients and other lay participants in realist evaluation
To develop these resources, we convened a panel of lay participants with the help of the Patient and Public Involvement Co-ordinator from the Nuffield Department of Primary Care Health Sciences at the University of Oxford. We sought to invite lay participants who had been involved in research studies and came from a range of backgrounds and ages. During the panel, we sought to understand what lay participants might wish to know if they were to participate in a realist evaluation and provided examples of the potential materials for their consideration.
Chapter 3 Results
We produced four outputs related to realist evaluations for this project, namely:
- reporting standards
- methodological quality standards
- resources and training materials (for researchers, evaluators and lay participants)
- capacity building.
This chapter provides details of the results we obtained from the methods and approaches we used, and how they contributed to the content of our outputs.
Literature search
We searched 10 electronic databases from inception (where applicable) to March 2015 and, together with citation tracking, retrieved 4426 records. A total of 1498 duplicates were removed, along with a further 737 papers that did not meet our inclusion criteria, leaving 2191 papers that were screened by title and abstract for inclusion; 1503 were excluded at this stage. Figure 2 shows the disposition of the documents and Table 1 the number of citations returned for each database searched (a brief arithmetic cross-check follows the table).
Database | Number of citations returned |
---|---|
CINAHL | 215 |
The Cochrane Library | 26 |
Dissertations & Theses | 147 |
EMBASE | 484 |
ERIC | 209 |
Global Health | 94 |
MEDLINE and MEDLINE In-Process & Other Non-Indexed Citations | 455 |
PsycINFO | 533 |
Scopus, SCI, SSCI and CPCI-S | 854 |
Web of Science Core Collection | 340 |
Citation tracking | |
Pawson R, Tilley N. Realistic Evaluation. London: Sage; 1997.10 | 1069 |
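As a cross-check on the numbers reported above, the per-source counts in Table 1 sum to the 4426 records retrieved; a minimal sketch of that arithmetic follows. The stage labels are our shorthand, not taken from the report’s flow diagram.

```python
# Illustrative arithmetic check of the search and screening figures reported above.
per_source_counts = {
    "CINAHL": 215, "The Cochrane Library": 26, "Dissertations & Theses": 147,
    "EMBASE": 484, "ERIC": 209, "Global Health": 94,
    "MEDLINE and MEDLINE In-Process": 455, "PsycINFO": 533,
    "Scopus, SCI, SSCI and CPCI-S": 854, "Web of Science Core Collection": 340,
    "Citation tracking (Pawson and Tilley 1997)": 1069,
}
total_retrieved = sum(per_source_counts.values())       # 4426
after_deduplication = total_retrieved - 1498            # duplicates removed
after_eligibility_cull = after_deduplication - 737      # papers not meeting inclusion criteria
excluded_on_title_abstract = 1503
remaining_after_screening = after_eligibility_cull - excluded_on_title_abstract
print(total_retrieved, after_eligibility_cull, remaining_after_screening)  # 4426 2191 688
```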
One of the project team (GWo) screened the titles and abstracts and included documents that claimed to be realist evaluations. In total, 152 documents were judged to be realist evaluations. Because of the narrow focus of our review of the literature on realist evaluations, as discussed in Chapter 2, Details of literature search methods, we worked ‘backwards’ from 2015 to earlier years and sought to stop analysis at the point of thematic saturation. We achieved thematic saturation after analysing 37 of the 152 realist evaluations. Of these, 32 (submitted in 2014 or 2015) evaluated health-related topics and five (submitted between 2012 and 2015) evaluated non-health-related topics. We made this distinction to ensure that we analysed realist evaluations covering a range of topic areas, as the approach is used well beyond health research. Hence, Table 2 shows the characteristics (evaluation title, type of document, year submitted for publication and topic area) of only those documents we analysed and drew on to produce our briefing document for the Delphi panel.
Study title (reference and reference number) | Year submitted | Topic area |
---|---|---|
Health-related realist evaluations | ||
Grades in formative workplace-based assessment: a study of what works for whom and why (Lefroy et al.18) | 2015 | Education – medical (work-based assessment) |
What works in ‘real life’ to facilitate home deaths and fewer hospital admissions for those at end of life?: results from a realist evaluation of new palliative care services in two English counties (Wye et al.19) | 2015 | Palliative care (home death and hospital admissions) |
Faculty development for educators: a realist evaluation (Sorinola et al.20) | 2014 | Education – medical (faculty development) |
Reducing emergency bed-days for older people? Network governance lessons from the ‘Improving the Future for Older People’ programme (Sheaff et al.21) | 2014 | Emergency bed-days for older people |
Using interactive workshops to prompt knowledge exchange: a realist evaluation of a knowledge to action initiative (Rushmer et al.22) | 2014 | Interactive workshops for knowledge exchange |
Can complex health interventions be evaluated using routine clinical and administrative data? – a realist evaluation approach (Riippa et al.23) | 2014 | Use of routinely collected data for evaluating complex interventions |
Introducing Malaria Rapid Diagnostic Tests (MRDTs) at registered retail pharmacies in Ghana: practitioners’ perspective (Rauf et al.24) | 2014 | Implementation of malaria rapid diagnostic tests in retail pharmacies |
Advancing the application of systems thinking in health: a realist evaluation of a capacity building programme for district managers in Tumkur, India (Prashanth et al.25) | 2014 | Capacity building programme for district health managers |
Stroke patients’ utilisation of extrinsic feedback from computer-based technology in the home: a multiple case study realistic evaluation (Parker et al.26) | 2014 | Stroke rehabilitation using computer-based technology |
Educational system factors that engage resident physicians in an integrated quality improvement curriculum at a VA hospital: a realist evaluation (Ogrinc et al.27) | 2014 | Quality improvement in resident physician training |
Realistic nurse-led policy implementation, optimization and evaluation: novel methodological exemplar (Noyes et al.28) | 2014 | Policy implementation |
Putting context into organizational intervention design: using tailored questionnaires to measure initiatives for worker well-being (Nielsen et al.29) | 2014 | Work well-being |
Mechanisms that support the assessment of interpersonal skills: a realistic evaluation of the interpersonal skills profile in pre-registration nursing students (Meier et al.30) | 2014 | Interpersonal skills assessment |
Factors affecting the successful implementation and sustainability of the Liverpool Care Pathway for dying patients: a realist evaluation (McConnell et al.31) | 2014 | Palliative care – Liverpool Care Pathway |
Towards a programme theory for fidelity in the evaluation of complex interventions (Masterson-Algar et al.32) | 2014 | Implementation fidelity – complex rehabilitation intervention for patients with stroke |
Action learning sets in a nursing and midwifery practice learning context: a realistic evaluation (Machin and Pearson33) | 2014 | Education – action learning sets in nursing |
Advancing the application of systems thinking in health: realist evaluation of the Leadership Development Programme for district manager decision-making in Ghana (Kwamie et al.34) | 2014 | Leadership development programme |
Adolescents developing life skills for managing type 1 diabetes: a qualitative, realistic evaluation of a guided self-determination-youth intervention (Husted et al.35) | 2014 | Chronic disease management – use of guided self-determination in diabetes |
The management of long-term sickness absence in large public sector healthcare organisations: a realist evaluation using mixed methods (Higgins et al.36) | 2014 | Sickness absence – long-term sickness absence in health-care workers |
General practitioners’ management of the long-term sick role (Higgins et al.37) | 2014 | Sickness absence – GPs’ management long-term sickness absence |
More than a checklist: a realist evaluation of supervision of mid-level health workers in rural Guatemala (Hernández et al.38) | 2014 | Supervision of mid-level health workers |
Dialysis modality decision-making for older adults with chronic kidney disease (Harwood and Clark39) | 2014 | Treatment decision-making – kidney dialysis |
Housing, health and master planning: rules of engagement (Harris et al.40) | 2014 | Housing regeneration |
Public involvement in research: assessing impact through a realist evaluation (Evans et al.41) | 2014 | Public involvement in research |
Academic practice–policy partnerships for health promotion research: experiences from three research programs (Eriksson et al.42) | 2014 | Health promotion – collaboration between academics, practitioners and policymakers |
Schools’ capacity to absorb a Healthy School approach into their operations: insights from a realist evaluation (Deschesnes et al.43) | 2014 | Health in schools |
A realist evaluation of a community-based addiction program for urban aboriginal people (Davey et al.44) | 2014 | Substance use – First Nations, Inuit and Métis populations |
Community resistance to a peer education programme in Zimbabwe (Campbell et al.45) | 2014 | Health education – peer education of HIV |
The transformative power of youth grants: sparks and ripples of change affecting marginalised youth and their communities (Blanchet-Cohen and Cook46) | 2014 | Youth empowerment |
The SMART personalised self-management system for congestive heart failure: results of a realist evaluation (Bartlett et al.47) | 2014 | Chronic disease management – use of technology for self-management of health failure |
Levels of reflective thinking and patient safety: an investigation of the mechanisms that impact on student learning in a single cohort over a 5 year curriculum (Ambrose and Ker48) | 2014 | Education – teaching patient safety to medical students |
People and teams matter in organizational change: professionals’ and managers’ experiences of changing governance and incentives in primary care (Allan et al.49) | 2014 | Health services management – organisational change |
Non-health-related realist evaluations | ||
Into the void: a realist evaluation of the eGovernment for You (EGOV4U) project (Horrocks and Budd50) | 2015 | E-services designed to tackle social exclusion and disadvantage |
Evaluating Criminal Justice Interventions in the Field of Domestic Violence – A Realist Approach (Taylor51) | 2014 | Criminal justice – domestic violence interventions |
How to use programme theory to evaluate the effectiveness of schemes designed to improve the work environment in small businesses (Olsen et al.52) | 2012 | Work environment in small businesses |
Improving outcomes for a juvenile justice model court: a realist evaluation (Kazi et al.53) | 2012 | Criminal justice – juvenile justice model court |
A model for design of tailored working environment intervention programmes for small enterprises (Hasle et al.54) | 2012 | Work environment in small enterprises |
Because many evaluation reports are not published and our search strategy focused on published materials, the great majority of documents we analysed were journal articles about evaluations rather than complete evaluation reports. We acknowledge that full evaluation reports may have provided greater detail. However, journal articles usually require a description of both methods and findings, our focus was methodological and the literature review served only to identify issues to refer to the Delphi panel; we therefore remain confident that the sample was adequate for the task.
We conducted a thematic analysis guided, initially, by the three questions set out above (see Chapter 2, Details of literature search methods) to produce the briefing documents for the realist evaluation Delphi panel (see Appendix 2). All the data we extracted were either entered into an Excel spreadsheet or written up directly into a draft of our briefing document. Two of the three questions refer to what experts in realist evaluation and researchers who have undertaken a realist evaluation consider to be best practice and high quality. Much of this information was contained in the documents listed in Table 2, but we also had to supplement our understanding by drawing on more methodological documents.1,10,12,13
Our first question [what is considered by experts to be current best practice (and what is the range and diversity of such practice)?] related to perceptions of methodological rigour in the execution of realist evaluations. Addressing this question required the most immersion and analysis. With this question, we wanted to understand expert opinions about best practice to produce a high-quality realist evaluation. As a project team, we had our own ideas, but wanted to explore whether or not these were reflected in the included evaluations. We first had to decide whether or not we could agree among ourselves on which of the evaluations we analysed were of high, mixed or low quality. To do this, each evaluation was read in detail (GWo) and selected characteristics were extracted into an Excel spreadsheet. The headings on this spreadsheet were study name, type of document, year submitted, country, topic area, purpose of evaluation, understand realism?, methodological comments, lessons for methods, methods for reporting and challenges reported by reviewers’ notes.
Once completed, the spreadsheet and the full-text documents were circulated to the rest of the project team. Through e-mail discussion and debate, a consensus was achieved on which studies were deemed to be of high, mixed or low quality. The next step in the process was to re-read each of the included evaluations to determine which evaluation practices and processes were necessary for a high-quality evaluation. Later in the project, to develop the reporting standards for realist evaluations, we used these findings to inform what needed to be reported so that sufficient information was available for readers to make judgements about methodological rigour. This addressed our second question (what do experts and other researchers believe counts as high quality and necessary to report?). Again, this work was led by Geoff Wong, and each issue that needed addressing was added to a draft of the briefing documents. To further strengthen the inferences we made about issues that needed to be addressed and, hence, included in our briefing materials, we looked back through the archives of the RAMESES JISCMail list to identify whether the issues we had included had also been raised by other researchers. We also drew, in a similar way, on the methodological issues raised in methods papers on realist evaluation.12,13
The drafts of briefing materials were circulated to the project team and a consensus was achieved through discussion and debate. The briefing materials were the result of four rounds of revisions.
The contents of our briefing materials were as follows:
- terminology
- philosophical basis of realist evaluation
- classification
- title
- rationale for using realist evaluation
- methods
- data collection methods
- programme theory
- findings
- conclusion
- recommendations.
The complete briefing document circulated to the realist evaluation Delphi panel can be found in Appendix 2.
Delphi panel
We ran the Delphi panel between May 2015 and January 2016. We recruited 35 panel members from 27 organisations across six countries. The panel members comprised evaluators working in health services (23), public policy (nine), nursing (six), criminal justice (six) and international development (two); contract evaluators (three); policy- and decision-makers (two); funders of evaluations (two); and individuals working in publishing (two) (note that some individuals had more than one role).
We started round 1 in June 2015 and circulated the briefing materials document to the panel. We sent two chasing e-mails to all panel members, and within 8 weeks all panel members who indicated that they wanted to provide comments had done so. In round 1 of the Delphi panel, 33 members provided suggestions for items that should be included in the reporting standards and/or comments on the nature of the standards themselves. We used the suggestions from the panel members and the briefing document as the basis of the online survey for round 2.
Round 2 started at the end of September 2015 and ran until early November 2015. Panel members were invited to complete our online survey and asked to rate each potential item for relevance and validity. A copy of this survey can be found in Appendix 3. Where needed, up to three reminder e-mails were sent to the panel members. For round 2, the panel was presented with 22 items to rate. The overall response rate across all items for this round was 76%. Once the panel had completed its survey, we analysed its ratings for relevance and validity. Full details of the round 2 results can be found in Table 3. We also produced a post-round briefing document from round 2, which detailed for each item:
- the response rate
- mode
- median
- IQR
- the action we took for each item based on the panel’s ratings
- an anonymised list of all the free-text comments made.
Item | Relevance: response rate (%) | Relevance: mode | Relevance: median | Relevance: IQR | Validity: response rate (%) | Validity: mode | Validity: median | Validity: IQR |
---|---|---|---|---|---|---|---|---|
Title | 28/35 (80) | 7 | 6.5 | 2.25 | 28/35 (80) | 6 | 6 | 2 |
Summary or abstract | 28/35 (80) | 7 | 6 | 1 | 28/35 (80) | 6 | 5.5 | 3 |
Rationale for evaluation | 28/35 (80) | 7 | 6 | 1 | 28/35 (80) | 6 | 5 | 2.25 |
Programme theory | 27/35 (77) | 7 | 7 | 0 | 27/35 (77) | 7 | 7 | 2 |
Evaluation questions, objectives and focus | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 7 | 6 | 3 |
Ethics | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 7 | 7 | 1 |
Rationale for using realist evaluation | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 7 | 6 | 1.5 |
Protocol or evaluation design | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 7 | 6 | 2.5 |
Setting(s) of the evaluation | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 6 | 6 | 2 |
Nature of the programme being evaluated | 27/35 (77) | 7 | 7 | 1 | 27/35 (77) | 7 | 6 | 3 |
Recruitment process and sampling strategy | 26/35 (74) | 7 | 7 | 1 | 26/35 (74) | 7 | 6 | 2 |
Data-gathering approachesa | 26/35 (74) | 7 | 7 | 0.75 | 26/35 (74) | 7 | 6 | 1.75 |
Data documentationa | 26/35 (74) | 6 | 6 | 1.75 | 26/35 (74) | 5 | 5.5 | 1 |
Data analysis | 26/35 (74) | 7 | 7 | 0.75 | 26/35 (74) | 7 | 6 | 1.75 |
Processes used to ensure qualityb | 26/35 (74) | 7 | 6 | 3 | 26/35 (74) | 7 | 5 | 2.75 |
Characteristics of participants | 26/35 (74) | 7 | 6.5 | 1 | 26/35 (74) | 7 | 6 | 2 |
Main findings | 26/35 (74) | 7 | 7 | 0.75 | 26/35 (74) | 7 | 6 | 1 |
Summary of findings | 26/35 (74) | 7 | 7 | 1 | 26/35 (74) | 6 | 6 | 1 |
Strengths, limitations and future research directions | 26/35 (74) | 7 | 6.5 | 1 | 26/35 (74) | 6 | 6 | 1 |
Comparison with existing literature | 26/35 (74) | 7 | 7 | 1 | 26/35 (74) | 7 | 6.5 | 1 |
Conclusion and recommendations | 26/35 (74) | 7 | 7 | 1 | 26/35 (74) | 7 | 6 | 1.75 |
Funding | 26/35 (74) | 7 | 7 | 1 | 26/35 (74) | 7 | 7 | 1 |
Based on the rankings and free-text comments, our analysis indicated that two items needed to be merged and one item removed. Minor revisions were made to the text of the other items based on the rankings and free-text comments. After discussion within the project team, we judged that only one item (the newly created merged item) needed to be returned to round 3 of the Delphi panel. Prior to the start of round 3, the post-round briefing document from round 2 was circulated to panel members. We did not receive any communication indicating that the panel members disagreed with the actions we undertook in response to their ratings and free-text comments from round 2.
For round 3, we asked the panel to consider again only the single item for which consensus had not been reached. We produced a further online survey and, again, asked panel members to rate the item for relevance and validity. Round 3 ran from late November 2015 to early January 2016. A copy of this survey can be found in Appendix 4. Two reminder e-mails were sent to the panel members. Once the panel had completed the survey, we analysed the ratings for relevance and validity (Table 4). The response rate for the single item included in round 3 was 80%. We produced a post-round briefing document from round 3 (available on request from the authors) and circulated this to all panel members for the sake of completeness. We did not receive any communication indicating that panel members disagreed with the actions we took in response to their ratings and free-text comments from round 3. Overall, consensus was reached within three rounds on both the content and wording of a 20-item reporting standard.
Item | Relevance: response rate (%) | Relevance: mode | Relevance: median | Relevance: IQR | Validity: response rate (%) | Validity: mode | Validity: median | Validity: IQR |
---|---|---|---|---|---|---|---|---|
Data collection methods | 28/35 (80) | 7 | 7 | 1 | 28/35 (80) | 7 | 6 | 2.25 |
Using the data we gathered from the three rounds of the Delphi panel, we produced a final set of items to be included in the reporting standards for realist evaluations. These were published in June 2016 in BMC Medicine, an open-access journal.55 Within this publication, we have provided, for each item in the standards, an example of good practice drawn from published evaluations. Our reporting standards have also been accepted and listed by the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network, a resource centre for good reporting of health research studies (www.equator-network.org).
Developing quality standards
We developed quality standards for two user groups, which are set out using rubrics:
- evaluators and peer reviewers of realist evaluations
- funders or commissioners of realist evaluations.
Quality standards for evaluators and peer reviewers of realist evaluations
By peer reviewers, here, we specifically refer to individuals who have been asked to appraise the quality of completed evaluations. For each aspect of an evaluation that requires a judgement about quality, we have provided a brief description of why that aspect is important, as well as descriptors of criteria against which a decision about quality might be arrived at. The quality standards for evaluators and peer reviewers of realist evaluation reports are set out below. Each criterion is rated on a four-point scale (inadequate, adequate, good or excellent), in which a rating of ‘good’ requires the descriptors for ‘adequate’ to be met plus further features (‘adequate plus’), and a rating of ‘excellent’ requires the descriptors for ‘good’ to be met plus further features (‘good plus’).

Quality standards for realist evaluation (for evaluators and peer reviewers)

1. The evaluation purpose

Realist evaluation is a theory-driven approach, rooted in a realist philosophy of science, which emphasises an understanding of causation and how causal mechanisms are shaped and constrained by context. This makes it particularly suitable for evaluations of certain topics and questions, for example complex interventions and programmes that involve human decisions and actions. A realist evaluation question contains some or all of the elements of ‘what works, how, why, for whom, to what extent and in what circumstances, in what respect and over what duration?’ and applies a realist logic to address the question(s). Above all, realist evaluation seeks to answer ‘how?’ and ‘why?’ questions. Realist evaluation always seeks to explain. It assumes that programme effectiveness will always be conditional and is oriented towards improving understanding of the key contexts and mechanisms contributing to how and why programmes work.

Criteria:

- A realist approach is suitable for the purposes of the evaluation; that is, it seeks to improve understanding of the core questions for realist evaluation.
- The evaluation question(s) are framed to be suitable for a realist evaluation. Inadequate: the evaluation question(s) are not structured to reflect the elements of realist explanation. Adequate: the evaluation question(s) include a focus on how and why outcomes were generated in the evaluand, and contain at least some of the additional elements ‘for whom, in what contexts, in what respects, to what extent and over what durations’.

2. Understanding and applying a realist principle of generative causation in realist evaluations

Realist evaluations are underpinned by a realist principle of generative causation – underlying mechanisms that operate (or not) in certain contexts to generate outcomes: Context + Mechanism = Outcome (CMO). Realist evaluations aim to understand how different mechanisms generate different outcomes in different contexts. This intent influences everything from the type of evaluation question(s) to an evaluation’s design (e.g. the construction of a realist programme theory, the recruitment process and sampling strategy, data collection methods and data analysis, through to recommendations).

Criteria:

- A realist principle of generative causation is applied. Inadequate: significant misunderstandings of realist generative causation are evident. Adequate: some misunderstandings of realist generative causation are evident, but the overall approach is consistent enough that a recognisably realist analysis results from the process. Good: assumptions and methods used throughout the evaluation are consistent with realist generative causation.

3. Constructing and refining a realist programme theory or theories

At an early stage in the evaluation, the main ideas that went into the making of an intervention, programme or policy (the programme theory or theories, which may or may not be realist in nature) are surfaced and made explicit. An initial tentative programme theory (or theories) is constructed, which sets out how and why an intervention, programme or policy is thought to ‘work’ to generate the outcome(s) of interest. Where possible, this initial tentative theory (or theories) will be progressively refined over the course of the evaluation. Over the course of the evaluation, if needed, programme theory (or theories) are ‘re-cast’ in realist terms (describing the contexts in which, populations for which, and main mechanisms by which, particular outcomes are, or are expected to be, achieved). Ideally, the programme theory is articulated in realist terms prior to data collection in order to guide the selection of data sources about context, mechanism and outcome. However, in some cases, this will not be possible and the product of the evaluation will be an initial realist programme theory.

Criteria:

- An initial tentative programme theory (or theories) is identified and developed, and the programme theory is ‘re-cast’ and refined as a realist programme theory.

4. Evaluation design

Descriptions and justifications of what is planned in the evaluation design, in what order and why should be clearly articulated. Realist evaluations are ideally adaptive – that is, the evaluation question(s), scope and/or design may be adapted over the course of the evaluation to ‘test’ (confirm, refute or refine) aspects of the programme theory as it evolves. If changes are made to the evaluation design, these should be clearly described and justified. At the start of an evaluation, where possible, any changes that might be needed should be anticipated and contingencies planned.

Criteria:

- The evaluation design is described and justified.
- Ethical clearance is obtained if required. Excellent: specific implications of realist methodology are explained in the proposal for ethics approval [e.g. the need to link data across context, mechanism and outcome; the role of the evaluator(s) in relation to other stakeholders and the programme] and specific strategies to address those implications are included.

5. Data collection methods

In a realist evaluation, a broad range of data increases the robustness of the theory ‘testing’ process, and a range of methods is used to collect these data. Data will be required for all of context, mechanism and outcome, and to inform the relationships between them. Data collection methods should be adequate to capture not only intended, but also (as far as possible) unintended, outcomes (both positive and negative) and the context–mechanism interactions that generated them. Realist evaluation is usually multimethod (i.e. it uses more than one method to gather data). Where possible, data about outcomes should be triangulated (at least using different sources, if not different types, of information).

Criteria:

- Data collection methods are suitable for capturing the data needed in a realist evaluation.

6. Sample recruitment strategy

In a realist evaluation, data are required for contexts, mechanisms and outcomes. One key source is respondents or key informants. Data are used to develop and refine theory about how, for whom and in what circumstances programmes generate their outcomes. This implies that any processes used to invite or recruit individuals need to identify an adequate sample of individuals who are able to provide information about contexts, mechanisms, outcomes and/or programme theory.

Criteria:

- The respondents or key informants recruited are able to provide sufficient data for a realist evaluation.

7. Data analysis

Data analysis in realist evaluation is not a specific method but a way of interrogating programme theory (or theories) with data, and a way of using theory to understand patterns in data. In other words, data analysis is a way of teasing out what works, for whom, in what contexts, in what respects, over what duration and so on. In a realist evaluation, where possible, the analysis process should occur iteratively. The overall approach to data analysis is retroductive (i.e. it moves between inductive and deductive processes, includes and tests researcher ‘hunches’ and aims to provide the best possible explanation of acknowledged-to-be-incomplete data). The processes used to analyse the data and integrate them into one or more realist programme theories should be consistent with a central principle of realism – namely, generative causation. How these data are then used to further develop, confirm, refute or refine one or more programme theories should be clearly described and justified.

Criteria:

- The overall approach to analysis is retroductive.
- Data analysis processes applied to the gathered data are consistent with a realist principle of generative causation.
- A realist logic of analysis is applied to develop and refine theory. Excellent: data analysis is iterative over the course of the evaluation, with earlier stages of analysis being used to refine programme theory and/or refine the evaluation design for subsequent stages.

8. Reporting

Realist evaluations may be reported in multiple formats – detailed reports, summary reports, articles, websites and so on. Reports should be consistent with the RAMESES II reporting standards for realist evaluations (see https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-016-0643-1).

Criteria:

- The evaluation is reported using the items listed in the RAMESES II reporting standards for realist evaluations. Inadequate: key items are missing. Adequate: most items in the RAMESES II reporting standards for realist evaluations are reported.
- Findings and implications are clear and reported in formats that are consistent with realist assumptions.
As an illustrative example of how to use these quality standards, consider the standard for ‘4. Evaluation design’. This aspect of an evaluation could be judged as adequate if ‘what was planned in the evaluation design, in what order and why was described and justified in detail’. For this aspect to be judged as ‘good’, we recommend that, as well as fulfilling the criteria for adequate (hence our use of the term ‘adequate plus’), the evaluation would also need to satisfy, among other things, the criterion ‘adequate plus: the design “tested” multiple aspects of programme theory’.
Quality standards for funders or commissioners of realist evaluations
As more and more realist evaluations are undertaken, those commissioning them need to make judgements on two broad areas: the proposed evaluation design and methodological expertise. We appreciate that many funding bodies and commissioners already have systems in place to guide their decision-making processes. However, a number of agencies have sought guidance about, or training in, how to assess the methodological aspects of the realist tenders and proposals they receive. As such, we see the guidance we have produced not as a replacement for, but as a supplement to, existing organisational decision-making processes and guidance. We are also aware that funding bodies and commissioners differ in their degree of involvement with the evaluations they fund or commission. In response to these differences, the quality standards have been designed and worded in such a way that they may also be used while an evaluation is still ongoing. The quality standards for funders or commissioners of realist evaluations are set out below; each criterion is rated on the same four-point scale (inadequate, adequate, good or excellent) described above.

Quality standards for realist evaluation (for funders or commissioners of realist evaluations)

1. The evaluation purpose

Realist evaluation is a theory-driven approach, rooted in a realist philosophy of science, which emphasises an understanding of causation and how causal mechanisms are shaped and constrained by context. This makes it particularly suitable for evaluations of certain topics and questions, for example complex interventions and programmes that involve human decisions and actions. A realist evaluation question contains some or all of the elements of ‘what works, how, why, for whom, to what extent and in what circumstances, in what respect and over what duration?’ and applies a realist logic to address the question(s). Above all, realist evaluation seeks to answer ‘how?’ and ‘why?’ questions. Realist evaluation always seeks to explain. It assumes that programme effectiveness will always be conditional and is oriented towards improving understanding of the key contexts and mechanisms contributing to how and why programmes work.

Criteria:

- A realist approach is suitable for the purposes of the evaluation.
- The evaluation question(s) are framed in such a way as to be suitable for a realist evaluation. Inadequate: the evaluation question(s) are not structured to reflect the elements of realist explanation. Adequate: the evaluation question(s) include a focus on how and why outcomes are likely to be generated, and contain at least some of the additional elements ‘for whom, in what contexts, in what respects, to what extent and over what durations’.

2. Understanding and applying a realist principle of generative causation in realist evaluations

Realist evaluations are underpinned by a realist principle of generative causation. That is, underlying causal processes (called ‘mechanisms’) operate (or not) in certain contexts to generate outcomes. The explanatory framework is Context + Mechanism = Outcome (CMO). Realist evaluations aim to understand how different mechanisms generate different outcomes in different contexts. This intent influences everything from the type of evaluation question(s) to an evaluation’s design (e.g. the construction of a realist programme theory, the recruitment process and sampling strategy, data collection methods and data analysis, through to recommendations).

Criteria:

- A realist principle of generative causation is applied. Inadequate: significant misunderstandings of realist generative causation are evident. Adequate: some misunderstandings of realist generative causation exist, but the overall approach is consistent enough that a recognisably realist analysis results from the process. Good: assumptions and methods used throughout the evaluation are consistent with realist generative causation.

3. Constructing and refining a realist programme theory or theories

At an early stage in the evaluation, the main ideas that went into the making of an intervention, programme or policy (the programme theory or theories, which may or may not be realist in nature) are identified and described. An initial tentative programme theory (or theories) is constructed, which sets out how and why an intervention, programme or policy is thought to ‘work’ to generate the outcome(s) of interest. Where possible, this initial tentative theory (or theories) is progressively refined over the course of the evaluation. Over the course of the evaluation, if needed, programme theory (or theories) is ‘re-cast’ in realist terms (describing the contexts in which, populations for which and main mechanisms by which particular outcomes are expected to be achieved). Ideally, the programme theory is articulated in realist terms prior to data collection in order to guide the selection of data sources about context, mechanism and outcome. However, in some cases, this will not be possible and the product of the evaluation will be an initial realist programme theory.

Criteria:

- An initial tentative programme theory (or theories) is, or will be, identified and developed, and the programme theory is or will be ‘re-cast’ and refined as a realist programme theory.

4. Evaluation design

Descriptions and justifications of what is planned in the evaluation design, in what order and why should be clearly articulated. Realist evaluations are ideally adaptive; that is, the evaluation question(s), scope and/or design may be adapted over the course of the evaluation to ‘test’ (confirm, refute or refine) aspects of the programme theory as it evolves. If changes are made to the evaluation design, these should be clearly described and justified. At the start of an evaluation, where possible, any changes that might be needed should be anticipated and contingencies planned.

Criteria:

- The evaluation design is described and justified.
- Ethical clearance is or will be obtained if required. Inadequate: no consideration is given to whether or not the evaluation requires ethics approval. Adequate: protocols for ethics approval are considered and approval sought if required. Good: proposals for ethics approval clearly distinguish the implications of the evaluation for different groups and different contexts. Excellent: where relevant, specific implications of realist methodology are explained in the proposal for ethics approval and specific strategies to address those implications are provided.

5. Data collection methods

In a realist evaluation, a broad range of data increases the robustness of the theory ‘testing’ process, and a range of methods is used to collect these data. Data will be required for all of context, mechanism and outcome, and to inform the relationships between them. Data collection methods should be adequate to capture not only intended, but also, as far as possible, unintended, outcomes (both positive and negative), and the context–mechanism interactions that generated them. Realist evaluation is usually multimethod (i.e. it uses more than one method to gather data). Where possible, data about outcomes should be triangulated (at least using different sources, if not different types, of information).

Criteria:

- Data collection methods are suitable for capturing the data needed in a realist evaluation.

6. Sample recruitment strategy

In a realist evaluation, data are required for all of contexts, mechanisms and outcomes. One key source is respondents or key informants. Data are used to develop and refine theory about how, for whom and in what circumstances programmes generate their outcomes. This implies that any processes used to invite or recruit individuals need to identify an adequate sample of individuals who are able to provide information about contexts, mechanisms, outcomes and/or programme theory.

Criteria:

- The respondents or key informants recruited are likely to be able to provide sufficient data for a realist evaluation.

7. Data analysis

Data analysis in realist evaluation is not a specific method but a way of interrogating programme theory (or theories) with data, and a way of using theory to understand patterns in data. In other words, data analysis is a way of teasing out what works, for whom, in what contexts, in what respects, over what duration and so on. In a realist evaluation, where possible, the analysis process should occur iteratively. The overall approach to data analysis is retroductive (i.e. it moves between inductive and deductive processes, includes and tests researcher ‘hunches’ and aims to provide the best possible explanation of acknowledged-to-be-incomplete data). The processes used to analyse the data and integrate them into one or more realist programme theories should be consistent with a central principle of realism – namely, generative causation. How these data are then used to further develop, confirm, refute or refine one or more programme theories should be clearly described and justified.

Criteria:

- The overall approach to analysis is or will be retroductive.
- Data analysis processes are consistent with a realist principle of generative causation.
- A realist logic of analysis is used to develop and refine theory.

8. Reporting

Realist evaluations may be reported in multiple formats – detailed reports, summary reports, articles, websites and so on. Reports should be consistent with the RAMESES II reporting standards for realist evaluations (see https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-016-0643-1).

Criteria:

- The realist evaluation is or will be reported using the items listed in the RAMESES II reporting standards for realist evaluations.
Developing, delivering and refining resources and training materials for realist evaluation
Two types of educational materials were developed: resource materials (made freely available online) and training materials.
The resource materials focus on the topic areas that the literature review, Delphi panel and discussion list had identified as being most challenging and/or requiring further clarification. To make the materials accessible, we established a rough word limit of around 1000 words per topic, kept each topic very clearly defined and wrote in as plain English as possible. This means that the more introductory materials are accessible to those with very limited prior knowledge or experience of realist evaluation. It also means that more advanced readers can search for specific topics without having to wade through the more introductory resources, and that additional materials can easily be added in future.
Each of the resource materials provides references for those who wish to understand a topic area in greater depth. Many provide examples from previously completed evaluations to illustrate key points. Some provide direct links to more detailed articles on the same topic and/or to additional resources. For example, the ‘realist interview’ resource links to a longer journal article and to a list of questions that can be used in realist interviews or as a guide to start developing realist interview questions.
Most of the resource materials were written by one or two individuals within the project team and were then peer reviewed internally by a realist methodological expert. A couple were written by people outside the project team with interests in specific topics. These were each reviewed by at least two team members. The resource materials are open access and can be found on the RAMESES project website [http://ramesesproject.org (accessed 15 September 2017)].
An overview of the topic areas currently covered may be found in Table 7. Additional topics are also planned by members of the project team, to be added at a later date.
Topic area | Brief summary of contents |
---|---|
Realist evaluation, realist synthesis, realist research – what’s in a name? | Definition and explanations of the differences between realist evaluation, review and research |
What is a mechanism? What is a programme mechanism? | Explanation of the concept of a mechanism |
What do realists mean by context, or, why nothing works everywhere for everyone | Explanation of the concept of context |
Protocols and realist evaluation | Explains what a realist evaluation protocol consists of and why |
Philosophies and evaluation design | A short description of factors to be taken into account in designing a realist evaluation and how these may differ from some other designs |
Realist evaluation and ethical considerations | Issues in writing research ethics applications and strategies to address them |
Developing realist programme theories | Processes for developing (or ‘surfacing’) initial programme theories for realist evaluations |
The realist interview | Explanation of how realist interviews differ from other interviews, and their role in realist evaluation |
Realist evaluation interviewing – a ‘starter set’ of questions | Provides evaluators with a series of example questions and the rationale for their use |
A realist understanding of programme fidelity | Discussion of the idea of fidelity within realist evaluation |
‘Theory’ in realist evaluation | Explains the different types of theory used in realist evaluation |
Working with a librarian on a realist review | Some realist evaluations involve an initial realist review. This document provides hints about how librarians may be able to assist, and how to enable them to support researchers, in realist research and evaluation |
Realist evaluation: an introduction for commissioners | A short introduction for commissioners of evaluations, including when to commission a realist evaluation, what to include in the request for tender and how to assess tenders |
Retroduction in realist evaluation | Explains what retroduction is and how it is used in realist research |
Frequently asked questions about realist evaluation | Covers the frequently asked questions about realist evaluations and signposts readers to further resources |
Support and consultancy to realist evaluations
We were approached by a wide range of evaluators who asked us for help with their realist evaluation projects. Selection was done on a ‘first-come, first-served’ basis. An overview of the 17 realist evaluation projects we provided methodological support and consultancy to may be found in Table 8.
Evaluation title | Evaluation aim(s)/question(s)/focus | Funder/commissioner | Type of support provided |
---|---|---|---|
When cure is not likely: What do young adults with cancer and their families need and how can it best be delivered? A BRIGHTLIGHT companion study | | Marie Curie, UK | |
Is bigger better? Lessons for large-scale general practice | | Nuffield Trust, UK | |
Determinants of effectiveness of a novel community health workers programme in improving maternal and child health in Nigeria | To better understand to what extent, and under what conditions, a community health workers programme (with or without conditional cash transfers) contributes to achieving equitable access to quality services and maternal and child health outcomes in Nigeria | MRC Joint DFID/ESRC/MRC/Wellcome Trust Health Systems Research Initiative | |
Investigating the communications component of dental complaints: towards a needs-based communications resource | To explore the characteristics of dental communication between dentists who are vulnerable to receiving complaints and their patients, so as to design a needs-based communications resource | NIHR doctoral fellowship | Assistance with study design and initial programme theory development. Proposal to be submitted in 2017 |
Building Capacity to Use Research Evidence (BCURE) | To build the capacity of policy-makers in several low- and middle-income countries to use research evidence more effectively in decision-making | UK Department for International Development | Guidance on qualitative data collection techniques (topic guides for interviews and focus groups) |
Sea swimming and mental wellbeing | | Arts and Humanities Research Council | |
Developing and evaluating a collaborative care intervention for offenders with common mental health problems, near to and after release | To develop a way of organising care for men with common mental health problems as they approach being released from prison | NIHR Programme Grants for Applied Research programme | Guidance on best-practice examples of collecting primary quantitative data in realist evaluations |
Involving radiographers in mammography image interpretation and reporting in symptomatic breast clinics: a realist evaluation | In what circumstances, how and why can radiographers substitute for the work of radiologists in mammography image interpretation and reporting in symptomatic breast clinics? | NIHR doctoral training fellowship held by Anne-Marie Culpan | Acted as a doctoral supervisor to Anne-Marie Culpan and provided support |
A realist process evaluation of robotic surgery: integration into routine practice and impacts on communication, collaboration, and decision making | | NIHR HS&DR programme | |
Realist evaluation of adapted sex offender treatment interventions for people with learning disabilities | What works in Adapted Sex Offender Treatment Programmes (ASOTPs), for whom, in what contexts, why and how? | ESRC new investigator award held by Andrea Hollomotz | Mentor to Andrea Hollomotz |
Values-based recruitment: what works, for whom, why, and in what circumstances? | How have education and service providers implemented values-based recruitment approaches and what are the impacts on service delivery and care? | DH’s Policy Research programme | |
The use of Pressure Ulcer Risk Assessment Instruments in clinical practice: a realist evaluation | To understand how hospital ward teams use PURPOSE-T and another commonly used risk assessment form, and the impacts of their use | NIHR postdoctoral fellowship (started October 2016) held by Susanne Coleman | Supervisor on Susanne Coleman’s successful NIHR postdoctoral fellowship proposal |
Assessing the feasibility of implementing and evaluating a new problem-solving model for patients at risk of self-harm and suicidal behaviour in prison | Assessment of the feasibility and acceptability of the problem-solving intervention, using qualitative methods | NIHR Research for Patient Benefit programme | |
An Evaluation of the Leeds Curriculum | An evaluation of the impact of the Leeds Curriculum on the delivery of student education and student experience | Internal – University of Leeds, Leeds, UK | |
Medical Technologies Innovation – Closing the Early Stage Translation Gap in the Leeds City Region | How does sector-specific support in research translation, innovation training and development, and access to wider networks of project partners, support and embed research translation capability in Medical Technologies across five partner universities within the Leeds City Region? | Higher Education Funding Council | |
Realist evaluation and ‘training the trainers’ workshops
We provided training workshops, on a first-come, first-served basis, to organisations interested in learning more about realist evaluation. When we were contacted, we entered into discussion with the individuals concerned and arranged bespoke training to meet their needs. The sessions ranged from short 15-minute presentations to whole-day workshops. Table 9 lists the 29 realist evaluation presentations or workshops we ran nationally and internationally.
Date | Venue |
---|---|
April 2015 | University of Oxford, Oxford, UK |
May 2015 | Nuffield Trust, London, UK |
June 2015 | University of Leeds, Leeds, UK |
July 2015 | University of Waterloo, Waterloo, ON, Canada |
July 2015 | White Rose Doctoral Training Centre, University of Leeds, Leeds, UK |
August 2015 | London School of Hygiene and Tropical Medicine, London, UK |
September 2015 | Diakonhjemmet University College/Gjøvik University College, Oslo, Norway |
October 2015 | Oxford Policy Management, Oxford, UK |
October 2015 | 21st Qualitative Health Research Conference, Toronto, ON, Canada |
November 2015 | Centre for Evidence Based Intervention, Oxford, UK |
November 2015 | Researching Medical Education Conference, London, UK |
November 2015 | Realism Leeds Conference, Leeds, UK |
February 2016 | University College Cork, Cork, Ireland |
March 2016 | HM Treasury, London, UK |
April 2016 | University of Oxford, Oxford, UK |
May 2016 | Keele University, Keele, UK |
May 2016 | Health and Wellbeing Research Institute – Sheffield Hallam University, Sheffield, UK |
June 2016 | RDS East Midlands, Nottingham, UK |
June 2016 | Health Services Management Centre, University of Birmingham, Birmingham, UK |
July 2016 | ESRC Research Methods Conference, Bath, UK |
July 2016 | University of Plymouth, Plymouth, UK |
July 2016 | White Rose Doctoral Training Centre, University of Leeds, Leeds, UK |
September 2016 | DFID Joint Evaluation and Statistics Professional Development Conference, Oxford, UK |
September 2016 | European Evaluation Society Conference, Maastricht, the Netherlands |
October 2016 | RDS East Midlands, Nottingham, UK |
October 2016 | Cochrane Colloquium, Seoul, South Korea |
October 2016 | International Conference on Realist Evaluation and Synthesis, London, UK |
December 2016 | Division of Rehabilitation and Ageing, University of Nottingham, Nottingham, UK |
February 2017 | University of Leeds, Leeds, UK |
In terms of ‘training the trainers’ workshops, we wanted to build capacity within the NIHR RDS. We initially discussed what the training needs might be with colleagues in RDS London’s East London team. Their feedback was supplemented with comments we received from our project’s Advisory Group. After the publication of our project’s protocol paper,15 we were contacted by colleagues from RDS East Midlands and, with their assistance, organised two workshops for regional staff.
Develop, deliver and refine information and resources for patients and other lay participants in realist evaluation
To develop resources for patients and other lay participants in realist evaluation, we first discussed within the project team what might be required. We also sought input from our project’s Advisory Group. We then drafted a specimen document outlining what a realist evaluation is and when it might be used, which also explained what might be expected of a participant taking part in a realist evaluation. We did not develop any materials for seeking ethics approvals, as we established that organisations and institutions had such a diverse range of processes that a one-size-fits-all set of documents was unlikely to be useful. To gain feedback on the documents, we convened a 90-minute face-to-face meeting in Oxford in September 2016 with six members of the public from diverse backgrounds (five of the six invited participants attended on the day). This meeting was facilitated by Geoff Wong, who made contemporaneous notes. At the meeting, we introduced ourselves and explained the purpose of the session. The participants then spent time refining and providing feedback on the documents we had provided. We also discussed their ideas about how best to present this information. The session finished with a summary of what they had suggested and a way of taking their proposals forward. Based on their suggestions and feedback, Geoff Wong drafted new materials, which were sent to the participants for comments and feedback.
In brief, after some clarification, the participants felt that the detail of what a realist evaluation is or is not was unlikely to matter to a person being recruited into one, so much of the detail in the text of the documents we initially provided was not needed. We were advised that the text should be short: no more than half a side of A4 or one side of A5. The agenda and notes from the session may be found in Appendix 5. The only new material that the participants felt was needed was a ‘generic’ text that could be used in a patient information leaflet when recruiting to a realist evaluation; this can be found in Box 2.
[INSERT PROJECT TITLE]
Example text: Evaluation of the NHS Health Checks programme
[INSERT BRIEF DESCRIPTION OF THE PURPOSE OF THE PROJECT]
Example text: The NHS Health Checks programme is a national programme that offers a free ‘MOT’ or health check to anyone over the age of 40.
We are researchers/evaluators [DELETE AS APPROPRIATE] from [INSERT ORGANISATION]. We are trying to find out why this programme does, or does not, work for different people. For this, we need your help.
We are interested to know your reasons for taking part in this programme or, if you are not taking part, what your reasons are.
To do this we will . . . [INSERT DATA COLLECTION METHODS]
Example text . . . ask you some questions/watch what happens when you take part in the programme/ask you to join a group where we discuss the programme/ask you to write a diary about the programme, etc.
We will be using a research method called ‘realist evaluation’. If you want to find out more about this method, please . . . [INSERT PROCESS]
Example text: ask a member of our project team/visit the website, etc.
We hope you agree to take part, and thank you in advance for your time.
To take part please . . . [INSERT RECRUITMENT PROCESS]
Example text: speak to a member of our project team/e-mail . . . /call . . . /visit our website at . . .
Chapter 4 Discussion
For this project, we developed reporting standards, quality standards and teaching and learning resources for realist evaluation. In addition, we provided methodological support and advice to realist evaluation projects, gave presentations to, and ran training workshops for, fellow realist evaluators and developed information and resources for patients and other lay participants in realist evaluation. Realist evaluation has now been used for close to 20 years in health services research and other disciplines, but there are still many evaluators, researchers and commissioners who were not trained in the approach and to whom it remains ‘new’. It offers great promise in unpacking the black box of the many complex interventions or programmes that are increasingly being developed and used. We see this project as a start to the long journey of advancing the rigour of how realist evaluations are carried out and reported.
As relatively experienced users of realist evaluation, we had noted a number of common and recurrent challenges that face grant-awarding bodies, peer reviewers, evaluators and knowledge users. These centred on two closely related questions:
- How can we judge whether a realist evaluation, or a proposal for such an evaluation, is of high quality (including, for completed evaluations, how credible and robust the findings are)?
- How can we undertake such evaluations?
Our experience suggested that we could go a long way towards answering these questions by developing resources that help fellow evaluators to give due consideration to the theoretical and conceptual underpinnings of realist evaluations, outlined briefly below.
Realist evaluation is based on a realist philosophy of science as set out by Pawson and Tilley,10 which permeates and informs its underlying epistemological assumptions, methodology and quality considerations. One of the most common misapplications we have noted is that evaluators have not always appreciated the underlying philosophical basis of realist evaluation or its implications for how evaluations should be conducted. Instead, they have based their evaluations explicitly or implicitly on fundamentally different philosophical assumptions, commonly adopting either the positivist notion that generalisable truths are best generated from controlled experiments, especially randomised trials, or a constructivist position that perceptions are all important. Another common misunderstanding is that realist evaluation is no more than a set of research or evaluation methods. For example, in our review of realist evaluations, we came across many instances in which the evaluators appeared to assume that realist evaluation is a form of qualitative research, whereas in practice it more commonly uses multiple methods. The appreciation that realist evaluation is an approach, or ‘lens’, through which to understand phenomena was often missing. In other words, many evaluators did not appreciate that realist evaluation uses a realist understanding of generative causation (as captured in the heuristic: context + mechanism = outcome) to:
- develop realist explanatory theories about phenomena through the use of data
- confirm, refute or refine (‘test’) realist explanatory theories using data.
A wide range of data-gathering methods may be used; no specific set of methods is mandatory in a realist evaluation. The methods chosen should, however, enable the collection of enough relevant data for realist theory development or ‘testing’.
Even when a realist philosophy of science has been understood and adhered to in a realist evaluation, many evaluators – ourselves included – have struggled with recurring conceptual and methodological issues. Mechanisms present a particular challenge in realist evaluations: how to define them, where to locate them, how to identify them and how to confirm, refute and refine them.2,56 Realist evaluation trades on the use of realist theoretical explanations to make sense of the observed data. Realist evaluators commonly grapple with how to define a theory (e.g. what is the difference between a programme theory and a middle-range theory?) and what level of abstraction is appropriate in different circumstances. On a more pragmatic level, those who seek to produce theory-driven evaluations of heterogeneous topic areas wrestle with a broad range of ‘how to’ issues: how to define the scope of the evaluation; how, and to what extent, to refine this scope as the evaluation unfolds; what the evaluation design should be; what data are needed; which data-gathering methods should be used; whom to recruit and sample; how to collate, analyse and synthesise findings; and how to make recommendations that are academically defensible and useful to policy-makers, and so on. We believe that the resources we have produced from this project will go some way towards addressing the challenges highlighted above.
In undertaking this project, we faced one main dilemma: how best to allocate time and resources across the multiple work packages. For example, we could easily have spent more time on our literature review, but this could have come at the expense of our Delphi panels, the provision of support to evaluation teams or the development of resources and training materials. In retrospect, our project was very ambitious in its aims and, as such, we had to prioritise some aspects of the project above others. For example, we felt that it was more important to devote time to (a) getting our Delphi process right, so that we had a solid consensus on which to develop our quality and reporting standards (and, to a lesser extent, our resources and training materials), and (b) the resources and training materials themselves. This meant that our literature review had to be rapid and truncated (see Chapter 2, Details of literature search methods and Chapter 3, Literature search for more details). Another example of prioritisation was in the breadth and depth of our resources and training materials. Entire textbooks could be written on these topics, but we chose instead to focus on common challenges. Our hope is that we have started the journey towards addressing some of the issues around the realist evaluation approach as set out by Pawson and Tilley – namely, how do you judge quality, how do you report it and how do you do X, Y or Z? We do, however, fully accept that more work is needed and, therefore, we have provided recommendations in Chapter 4, Research recommendations and implications for practice.
Changes to the protocol
Near the start of this project, we published our project protocol.15 During the course of the project, we varied the following aspects of the protocol. One of the objectives of our project was to produce resources and training materials for lay participants, and those seeking to involve them, in realist evaluations. We have partially addressed this objective, in that some of the resources and training materials we have produced are accessible to those with little or no prior knowledge or experience of realist evaluation (see Chapter 3, Developing, delivering and refining resources and training materials for realist evaluation). From our discussions within the project team, with other realist evaluators (e.g. in training workshops) and with our project’s Advisory Group, we came to the judgement that these materials would be accessible and helpful to lay participants who are more closely involved in realist evaluations, for example as co-applicants or co-investigators on a project, and would help them understand more about realist evaluations.
However, for individuals who will be recruited into a realist evaluation, we had initially intended to develop draft template information sheets and consent forms that could be adapted for ethics and governance activity. On the issue of consent forms, again from discussion within the project team and with other realist evaluators and our project’s Advisory Group, we came to the judgement that the processes and requirements of the organisations that grant ethics approval were too diverse for us to be able to produce a generic template, and that it was best for those seeking such approvals to consult and adhere to their own organisation’s requirements. As such, we did not produce draft consent forms for realist evaluations. We were, however, able to develop a resource and training material entitled ‘Realist evaluation and ethical considerations’ (see Table 7) that will help to guide realist evaluators when developing information sheets and consent forms for recruiting participants into realist evaluations.
We had planned to deliver three 2-day ‘realist evaluation’ workshops and three 2-day ‘training the trainers’ workshops for a range of audiences. When we approached, or were approached by, those interested, we negotiated with them the logistics and content of each workshop. The preference from those interested was overwhelmingly for shorter workshops, so we ended up providing more workshops, but of a shorter duration, than we had planned. We were unable to find a mutually convenient time before the end of the project to organise any further ‘training the trainer’ workshops beyond the two 1-day workshops we provided to RDS East Midlands in June and October 2016.
Limitations
To develop the briefing materials for our Delphi panels, we undertook a literature review. This review has limitations that are likely to have introduced a number of biases and so, potentially at least, limit the inferences that can be made from the included evaluations and methodological pieces. For example, the search process for the review, despite being developed by an expert librarian, was not exhaustive. All the screening for inclusion and exclusion was undertaken by one screener and no quality checks were undertaken. Both processes mean that we are likely to have missed some evaluations. However, given that the intent was to reach theoretical saturation, and that we retrieved many more evaluations than were necessary to achieve this, missing some evaluations is unlikely to have caused a significant problem for the other stages of the project.
An additional challenge we faced during the literature review was that, at the time of the project, there were no quality standards against which to judge the quality of realist evaluations; it was a function of this project to develop them. This was identified as a need in a range of methodological pieces we analysed as part of the review.12,13 Therefore, we had to use the project team’s collective judgement, informed by our experience in conducting and teaching realist evaluations, and the literature, to judge the quality of the realist evaluations included in the review. This is an important limitation of our review processes.
Once evaluations had been included, data extraction was undertaken by one researcher, and omissions in data extraction are likely to have occurred. However, all the included evaluations and the data extraction spreadsheet were circulated to all project team members, and so a degree of informal quality checking did occur.
Decision-making on what should be included in the Delphi panel’s briefing materials was undertaken by the entire project team. We are aware that any item or topic included in the briefing materials was included as a result of our subjective interpretations, raising questions about reproducibility. However, the briefing documents we produced were not an end product in themselves, but the starting point for the Delphi panel to build a consensus. In addition, we deliberately asked Delphi panel members to enter into a discussion and suggest items for inclusion in the quality and reporting standards. We also provided the panel members with an end-of-round report and invited them to contact us should they have any concerns about the actions we had taken after we analysed their ratings. As such, we expected that changes would occur as we ran each round of the Delphi process, and we are therefore confident that any omissions resulting from the review’s limitations are unlikely to have had a significant impact on the final reporting and quality standards. We accept that the review of the literature could have been more thorough (e.g. all evaluations analysed and more than one reviewer involved), but we made the judgement that the findings of the review contributed only part (albeit an important part) of the data informing the Delphi panel’s briefing document. Other sources of data were the project team’s expertise, that of the Delphi panel itself and data from the RAMESES JISCMail list. We felt that, in order to ensure that we delivered as much as possible on all the objectives of this project, the review needed to be truncated and our energies spent elsewhere. To provide transparency on what we have done, we have reported, in detail, all stages of the review itself and the rest of the project.
We recognise that there is much more to cover in terms of the breadth and depth of the training materials we have produced. Because realist evaluation is still developing as an approach to evaluation, the ‘wish list’ we were able to elicit from fellow evaluators who have used this approach was quite long. Given the time and resources allocated for this project, we elected to focus on providing sufficient depth, in an accessible manner, on the issues that were most challenging, rather than on breadth. With time, we hope to use the community of practice we have developed to address more of these methodological challenges.
As experience with the use of realist evaluation grows, it is very likely that many of the resources we have produced will need to be updated. We welcome and invite methodological development in realist evaluation, and we expect that what we have produced will be gradually refined and updated as such developments take place. Thus, we view the reporting and quality standards and the resources and training materials more as a starting point than as definitive resources that must not be altered in any way.
We are aware that realist evaluation is used to evaluate a wide range of topics and by evaluators from a broad range of disciplines and affiliations. The level of expertise of the users of our resources will also vary considerably, from novice to seasoned evaluators. These two aspects mean that some latitude is needed in the use of the resources we have produced. For example, not all the items in the reporting standards will be applicable to all evaluations, and, when assessing the quality of an evaluation, there may be justifiable reasons for an evaluation not to meet some quality criteria. We have tried to anticipate the varied uses to which realist evaluation might be put by providing a degree of flexibility in our standards. For example, in our reporting standards, if adaptations are made to the evaluation design (as originally described), evaluators are invited to provide an explanation for any such adaptations.
Finally, we were not able to produce detailed generic templates of draft information and consent forms for participant recruitment into realist evaluations. We have explained why this was the case above (see Chapter 3, Develop, deliver and refine information and resources for patients and other lay participants in realist evaluation, and Chapter 4, Changes to the protocol).
Research recommendations and implications for practice
Although realist evaluation was first introduced in 1997, a great deal remains to be done in terms of capacity building and methodological development. This is because it is only in the last few years that it has grown in popularity as a theory-driven evaluation approach for making sense of complex interventions or programmes. This has created a situation in which some evaluators are using the realist evaluation approach for the first time, and some struggle with it.
Thus, capacity building is the priority for realist evaluation as an approach. Dedicated training courses, run by experienced realist evaluators, are needed. We anticipate that developing and running such courses will be easier with the key topic areas and consensus standards identified in this study, although, even with such resources, some learners may still struggle to engage with the philosophical basis of the realist evaluation approach. Practical ‘how to’ resources and training materials were previously limited; this project has developed 15 of them to help fill this need. Course developers now have a reference point from which to build their training courses, and learners a yardstick against which to judge the quality of their work. The resources and training materials we have developed for this project are designed to be accessible to the novice but also to signpost more advanced learners to further resources. As such, they may be used as part of the basic building blocks of a ‘curriculum’ for realist evaluation courses.
As experience with realist evaluation grows and more evaluations are undertaken, new methodological insights are likely to emerge. These need to be captured and analysed to determine whether the quality and reporting standards we have produced remain fit for purpose or need to be updated. At present, no formal process exists to advance this agenda. Ideally, further funding would enable a project similar to this one – that is, RAMESES III – to update the standards, although, because much of the groundwork has already been done, a shorter project may suffice.
At present, those interested in realist evaluation (and review) might need to make small and gradual methodological gains, perhaps by embedding an element of methodological development within their projects. Disseminating what they have learnt from undertaking their realist evaluations (or reviews) will be a key activity. Currently, support is ad hoc and informal: more experienced realist researchers help novices, tips and templates are shared, and contentious issues are debated and discussed. Some of this activity takes place on the RAMESES JISCMail list that we set up as part of the RAMESES Project. This list could be further developed and supported to serve as an avenue for advancing and disseminating methodological lessons in realist evaluation (and review). For example, it may offer one way for realist evaluators to address the issue of generic templates of draft information and consent forms for participant recruitment into realist evaluations, an area that we did not fully address in our project, perhaps through the sharing of particularly useful examples of such resources between evaluators and researchers.
Realist evaluators might consider learning from organisations such as the Cochrane Collaboration, in which motivated researchers have collaborated in an organised way to undertake methodological development systematically and gradually. Many who contribute to, and support, the RAMESES JISCMail list do so voluntarily, and with the end of this project all inputs to the list will be on a voluntary basis; building a more sustainable structure for the future is therefore important. A potential benefit of being more organised is that priorities can be set for which methodological issues in realist evaluation (and review) need more attention, and duplication can be avoided. For example, the resources and training materials we developed focus on the issues we identified as the most challenging for fellow evaluators to understand and/or execute; there are other issues that we have not covered, or have addressed only in passing, and further work is needed to develop resources for these and for new issues as they arise. The resources and training materials we have designed are also intentionally brief. Developing new resources, and building on ours by drawing on the methodological lessons learnt from undertaking realist evaluations (or reviews), could be a focus for a better-organised body of realist evaluators and researchers.
Finally, there is a dearth of research demonstrating that quality and reporting standards necessarily change practice and improve the quality of research.57,58 This will also be true of the standards we have produced; research to demonstrate a change in practice and an improvement in the quality of realist evaluations will therefore be needed at some point. There is also a counter-theory that such standards may constrain innovation in the development and application of realist methods, and testing this theory could form part of any evaluation of the standards.
Chapter 5 Conclusion
Although realist evaluation holds much promise for developing theory and informing policy on some of the most pressing questions facing the health and other sectors, misunderstandings and misapplications of it are common. To address these problems, we used a range of methods to gather the data needed to produce reporting and quality standards and resources and training materials: a literature review, a Delphi panel, and feedback from fellow realist evaluators, training workshop participants and an e-mail list dedicated to realist research. In addition, we provided methodological support and advice to realist evaluation projects, gave presentations, ran training workshops for fellow realist evaluators and developed some resources for patients and other lay participants in realist evaluation. Undertaking this project was not without its challenges; our ambitious objectives meant that we had to shorten some aspects of the project (e.g. the literature review) and adapt others (e.g. workshop formats) to meet the needs of those we were training. We also found that we had over-anticipated the informational requirements of patients and other lay participants who might be involved in realist evaluation, and so narrowed our range of outputs for this group. We hope that what we have developed will be the start of an iterative journey of refinement and development of better resources for realist evaluations. An important priority for the realist evaluation approach is to build capacity. Acknowledging that the science of evaluation should never be static, the RAMESES II project seeks not to produce the last word on these issues but to capture current expertise and establish an agreed state of the science that future researchers will use and, no doubt, build on.
Acknowledgements
This project was funded by the National Institute for Health Research Health Services and Delivery Research programme. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Health Services and Delivery Research programme, NIHR, NHS or Department of Health. Trisha Greenhalgh’s salary is part-funded by the Oxford Biomedical Research Centre, NIHR grant number BRC-1215-20008.
We would like to thank Nia Roberts from the Bodleian Library, University of Oxford, for her help with developing and running our literature search.
We are most grateful for the time, invaluable feedback and advice we received from our Project Advisory Group, who are all from the University of Leeds – Nick Emmel (chairperson), Jane Nixon and Rebecca Randell.
The following contributed to the patient and public panel: Maria Clark, Roger Ede, Matthew Le Croissette, Jo Lewis-Wood and Jeanne Nicholls. We wish to thank them for their help with this project.
We also wish to thank Ray Pawson and Nick Tilley for their advice, comments and suggestions when we were developing these reporting standards.
Finally, we are indebted to the Delphi Panel members, who freely and generously gave us their time and shared their wisdom:
Brad Astbury, University of Melbourne, Melbourne, VIC, Australia.
Paul Batalden, Dartmouth College, Hanover, NH, USA.
Annette Boaz, Kingston and St George’s University, London, UK.
Rick Brown, Australian Institute of Criminology, Canberra, ACT, Australia.
Richard Byng, Plymouth University, Plymouth, UK.
Margaret Cargo, University of South Australia, Adelaide, SA, Australia.
Simon Carroll, University of Victoria, Victoria, BC, Canada.
Sonia Dalkin, Northumbria University, Newcastle, UK.
Helen Dickinson, University of Melbourne, Melbourne, VIC, Australia.
Dawn Dowding, Columbia University, New York, NY, USA.
Nick Emmel, University of Leeds, Leeds, UK.
Andrew Hawkins, ARTD Consultants, Sydney, NSW, Australia.
Gloria Laycock, University College London, London, UK.
Frans Leeuw, Maastricht University, Maastricht, the Netherlands.
Mhairi Mackenzie, University of Glasgow, Glasgow, UK.
Bruno Marchal, Institute of Tropical Medicine, Antwerp, Belgium.
Roshanak Mehdipanah, University of Michigan, Ann Arbor, MI, USA.
David Naylor, King’s Fund, London, UK.
Jane Nixon, University of Leeds, Leeds, UK.
Peter O’Halloran, Queen’s University Belfast, Belfast, UK.
Ray Pawson, University of Leeds, Leeds, UK.
Mark Pearson, Exeter University, Exeter, UK.
Rebecca Randell, University of Leeds, Leeds, UK.
Jo Rycroft-Malone, Bangor University, Bangor, UK.
Robert Street, Youth Justice Board, London, UK.
Nick Tilley, University College London, London, UK.
Robin Vincent, freelance consultant, Sheffield, UK.
Kieran Walshe, University of Manchester, Manchester, UK.
Emma Williams, Charles Darwin University, Darwin, NT, Australia.
All of the authors were also members of the Delphi panel.
Contributions of authors
Geoff Wong (Clinical Research Fellow, Realist Research Methodologist) carried out the literature review, analysed the findings from the review, produced the materials for the Delphi panel, analysed the results of the Delphi panel and developed the patient and lay materials.
Gill Westhorp (Professorial Research Fellow, Evaluator and Realist Research Methodologist), Joanne Greenhalgh (Associate Professor and Realist Research Methodologist), Ana Manzano (Lecturer in Health and Social Policy, Social Research Methodologist), Justin Jagosh (Senior Research Fellow and Realist Research Methodologist) and Trisha Greenhalgh (Professor of Primary Care and Social Scientist) analysed the findings from the review, produced the materials for the Delphi panel and analysed the results of the Delphi panel.
Joanne Greenhalgh assisted in the development of the patient and lay materials.
Gill Westhorp, Joanne Greenhalgh, Ana Manzano and Justin Jagosh developed and internally peer reviewed the resources and training materials.
Trisha Greenhalgh conceived the study and all the authors participated in its design.
All the authors provided realist evaluation support and training to various organisations during this study. All authors read and contributed critically to the contents of this report and approved the final manuscript.
Publication
Wong G, Westhorp G, Manzano A, Greenhalgh J, Jagosh J, Greenhalgh T. RAMESES II reporting standards for realist evaluations. BMC Med 2016;14:96.
Data sharing statement
All non-personal data from this project can be obtained from the corresponding author.
Disclaimers
This report presents independent research funded by the National Institute for Health Research (NIHR). The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the NHS, the NIHR, NETSCC, the HS&DR programme or the Department of Health. If there are verbatim quotations included in this publication the views and opinions expressed by the interviewees are those of the interviewees and do not necessarily reflect those of the authors, those of the NHS, the NIHR, NETSCC, the HS&DR programme or the Department of Health.
References
- Pawson R. The Science of Evaluation: A Realist Manifesto. London: Sage; 2013.
- Dalkin SM, Greenhalgh J, Jones D, Cunningham B, Lhussier M. What’s in a mechanism? Development of a key concept in realist evaluation. Implement Sci 2015;10. https://doi.org/10.1186/s13012-015-0237-x.
- Greenhalgh T, Kristjansson E, Robinson V. Realist review to understand the efficacy of school feeding programmes. BMJ 2007;335:858-61. https://doi.org/10.1136/bmj.39359.525174.AD.
- Greenhalgh T, Humphrey C, Hughes J, Macfarlane F, Butler C, Pawson R. How do you modernize a health service? A realist evaluation of whole-scale transformation in London. Milbank Q 2009;87:391-416. https://doi.org/10.1111/j.1468-0009.2009.00562.x.
- Hoddinott P, Britten J, Pill R. Why do interventions work in some places and not others: a breastfeeding support group trial. Soc Sci Med 2010;70:769-78. https://doi.org/10.1016/j.socscimed.2009.10.067.
- Ranmuthugala G, Cunningham FC, Plumb JJ, Long J, Georgiou A, Westbrook JI, et al. A realist evaluation of the role of communities of practice in changing healthcare practice. Implement Sci 2011;6. https://doi.org/10.1186/1748-5908-6-49.
- Cowe A, Cowe M, Goodman C, Kendal S, Mathie E, McNeilly E, et al. RAPPORT: ReseArch With Patient and Public InvOlvement: A Realist EvaluaTion n.d.
- Randell R, Greenhalgh J, Hindmarch J, Dowding D, Jayne D, Pearman A, et al. Integration of robotic surgery into routine practice and impacts on communication, collaboration, and decision making: a realist process evaluation protocol. Implement Sci 2014;9. https://doi.org/10.1186/1748-5908-9-52.
- Manzano-Santaella A. A realistic evaluation of fines for hospital discharges: incorporating the history of programme evaluations in the analysis. Evaluation 2011;17:21-36. https://doi.org/10.1177/1356389010389913.
- Pawson R, Tilley N. Realistic Evaluation. London: Sage; 1997.
- Pawson R. Evidence-Based Policy: A Realist Perspective. London: Sage; 2006.
- Marchal B, van Belle S, van Olmen J, Hoerée T, Kegels G. Is realist evaluation keeping its promise? A review of published empirical studies in the field of health systems research. Evaluation 2012;18:192-212. https://doi.org/10.1177/1356389012442444.
- Pawson R, Manzano-Santaella A. A realist diagnostic workshop. Evaluation 2012;18:176-91. https://doi.org/10.1177/1356389012440912.
- Wong G, Greenhalgh T, Westhorp G, Pawson R. Development of methodological guidance, publication standards and training materials for realist and meta-narrative reviews: the RAMESES (Realist And Meta-narrative Evidence Syntheses – Evolving Standards) project. Health Serv Deliv Res 2014;2. https://doi.org/10.3310/hsdr02300.
- Greenhalgh T, Wong G, Jagosh J, Greenhalgh J, Manzano A, Westhorp G, et al. Protocol – the RAMESES II study: developing guidance and reporting standards for realist evaluation. BMJ Open 2015;5. https://doi.org/10.1136/bmjopen-2015-008567.
- Booth A, Harris J, Croot E, Springett J, Campbell F, Wilkins E. Towards a methodology for cluster searching to provide conceptual and contextual ‘richness’ for systematic reviews of complex interventions: case study (CLUSTER). BMC Med Res Methodol 2013;13. https://doi.org/10.1186/1471-2288-13-118.
- Greenhalgh T, Peacock R. Effectiveness and efficiency of search methods in systematic reviews of complex evidence: audit of primary sources. BMJ 2005;331:1064-5. https://doi.org/10.1136/bmj.38636.593461.68.
- Lefroy J, Hawarden A, Gay SP, McKinley RK, Cleland J. Grades in formative workplace-based assessment: a study of what works for whom and why. Med Educ 2015;49:307-20. https://doi.org/10.1111/medu.12659.
- Wye L, Lasseter G, Percival J, Duncan L, Simmonds B, Purdy S. What works in ‘real life’ to facilitate home deaths and fewer hospital admissions for those at end of life?: results from a realist evaluation of new palliative care services in two English counties. BMC Palliat Care 2014;13. https://doi.org/10.1186/1472-684X-13-37.
- Sorinola OO, Thistlethwaite J, Davies D, Peile E. Faculty development for educators: a realist evaluation. Adv Health Sci Educ Theory Pract 2015;20:385-401. https://doi.org/10.1007/s10459-014-9534-4.
- Sheaff R, Windle K, Wistow G, Ashby S, Beech R, Dickinson A, et al. Reducing emergency bed-days for older people? Network governance lessons from the ‘Improving the Future for Older People’ programme. Soc Sci Med 2014;106:59-66. https://doi.org/10.1016/j.socscimed.2014.01.033.
- Rushmer RK, Hunter DJ, Steven A. Using interactive workshops to prompt knowledge exchange: a realist evaluation of a knowledge to action initiative. Public Health 2014;128:552-60. https://doi.org/10.1016/j.puhe.2014.03.012.
- Riippa I, Kahilakoski O, Linna M, Hietala M. Can complex health interventions be evaluated using routine clinical and administrative data? A realist evaluation approach. J Eval Clin Pract 2014;20:1129-36. https://doi.org/10.1111/jep.12175.
- Rauf A, Anto B, Koffuor G, Buabeng K, Abdul-Kabir M. Introducing malaria rapid diagnostic tests (MRDTs) at registered retail pharmacies in Ghana: practitioners’ perspective. Br J Pharm Res 2014;4:943-53. https://doi.org/10.9734/BJPR/2014/8910.
- Prashanth NS, Marchal B, Devadasan N, Kegels G, Criel B. Advancing the application of systems thinking in health: a realist evaluation of a capacity building programme for district managers in Tumkur, India. Health Res Policy Syst 2014;12. https://doi.org/10.1186/1478-4505-12-42.
- Parker J, Mawson S, Mountain G, Nasr N, Zheng H. Stroke patients’ utilisation of extrinsic feedback from computer-based technology in the home: a multiple case study realistic evaluation. BMC Med Inform Decis Mak 2014;14. https://doi.org/10.1186/1472-6947-14-46.
- Ogrinc G, Ercolano E, Cohen ES, Harwood B, Baum K, van Aalst R, et al. Educational system factors that engage resident physicians in an integrated quality improvement curriculum at a VA hospital: a realist evaluation. Acad Med 2014;89:1380-5. https://doi.org/10.1097/ACM.0000000000000389.
- Noyes J, Lewis M, Bennett V, Widdas D, Brombley K. Realistic nurse-led policy implementation, optimization and evaluation: novel methodological exemplar. J Adv Nurs 2014;70:220-37. https://doi.org/10.1111/jan.12169.
- Nielsen K, Abildgaard J, Daniels K. Putting context into organizational intervention design: using tailored questionnaires to measure initiatives for worker well-being. Human Relations 2014;67:1537-60. https://doi.org/10.1177/0018726714525974.
- Meier K, Parker P, Freeth D. Mechanisms that support the assessment of interpersonal skills: a realistic evaluation of the interpersonal skills profile in pre-registration nursing students. J Pract Teach Learn 2014;12:6-24. https://doi.org/10.1921/7701240205.
- McConnell T, O’Halloran P, Donnelly M, Porter S. Factors affecting the successful implementation and sustainability of the Liverpool Care Pathway for dying patients: a realist evaluation. BMJ Support Palliat Care 2015;5:70-7. https://doi.org/10.1136/bmjspcare-2014-000723.
- Masterson-Algar P, Burton CR, Rycroft-Malone J, Sackley CM, Walker MF. Towards a programme theory for fidelity in the evaluation of complex interventions. J Eval Clin Pract 2014;20:445-52. https://doi.org/10.1111/jep.12174.
- Machin AI, Pearson P. Action learning sets in a nursing and midwifery practice learning context: a realistic evaluation. Nurse Educ Pract 2014;14:410-16. https://doi.org/10.1016/j.nepr.2014.01.007.
- Kwamie A, van Dijk H, Agyepong IA. Advancing the application of systems thinking in health: realist evaluation of the Leadership Development Programme for district manager decision-making in Ghana. Health Res Policy Syst 2014;12. https://doi.org/10.1186/1478-4505-12-29.
- Husted GR, Esbensen BA, Hommel E, Thorsteinsson B, Zoffmann V. Adolescents developing life skills for managing type 1 diabetes: a qualitative, realistic evaluation of a guided self-determination-youth intervention. J Adv Nurs 2014;70:2634-50. https://doi.org/10.1111/jan.12413.
- Higgins A, O’Halloran P, Porter S. The management of long-term sickness absence in large public sector healthcare organisations: a realist evaluation using mixed methods. J Occup Rehabil 2015;25:451-70. https://doi.org/10.1007/s10926-014-9553-2.
- Higgins A, Porter S, O’Halloran P. General practitioners’ management of the long-term sick role. Soc Sci Med 2014;107:52-60. https://doi.org/10.1016/j.socscimed.2014.01.044.
- Hernández AR, Hurtig AK, Dahlblom K, San Sebastián M. More than a checklist: a realist evaluation of supervision of mid-level health workers in rural Guatemala. BMC Health Serv Res 2014;14. https://doi.org/10.1186/1472-6963-14-112.
- Harwood L, Clark AM. Dialysis modality decision-making for older adults with chronic kidney disease. J Clin Nurs 2014;23:3378-90. https://doi.org/10.1111/jocn.12582.
- Harris P, Haigh F, Thornell M, Molloy L, Sainsbury P. Housing, health and master planning: rules of engagement. Public Health 2014;128:354-9. https://doi.org/10.1016/j.puhe.2014.01.006.
- Evans D, Coad J, Cottrell K, Dalrymple J, Davies R, Donald C, et al. Public involvement in research: assessing impact through a realist evaluation. Health Serv Deliv Res 2014;2.
- Eriksson C, Fredriksson I, Froding K, Geidne S, Pettersson C. Academic practice-policy partnerships for health promotion research: experiences from three research programs. Scand J Pub Health 2014;42:88-95. https://doi.org/10.1177/1403494814556926.
- Deschesnes M, Drouin N, Tessier C, Couturier Y. Schools’ capacity to absorb a Healthy School approach into their operations: insights from a realist evaluation. Health Educ 2014;114:208-24. https://doi.org/10.1108/HE-10-2013-0054.
- Davey C, McShane K, Pulver A, McPherson C, Firestone M. A realist evaluation of a community-based addiction program for urban aboriginal people. Alcohol Treat Q 2014;32:33-57. https://doi.org/10.1080/07347324.2013.831641.
- Campbell C, Scott K, Mupambireyi Z, Nhamo M, Nyamukapa C, Skovdal M, et al. Community resistance to a peer education programme in Zimbabwe. BMC Health Serv Res 2014;14. https://doi.org/10.1186/s12913-014-0574-5.
- Blanchet-Cohen N, Cook P. The transformative power of youth grants: sparks and ripples of change affecting marginalised youth and their communities. Child Soc 2014;28:392-403. https://doi.org/10.1111/j.1099-0860.2012.00473.x.
- Bartlett YK, Haywood A, Bentley CL, Parker J, Hawley MS, Mountain GA, et al. The SMART personalised self-management system for congestive heart failure: results of a realist evaluation. BMC Med Inform Decis Mak 2014;14. https://doi.org/10.1186/s12911-014-0109-3.
- Ambrose LJ, Ker JS. Levels of reflective thinking and patient safety: an investigation of the mechanisms that impact on student learning in a single cohort over a 5 year curriculum. Adv Health Sci Educ Theory Pract 2014;19:297-310. https://doi.org/10.1007/s10459-013-9470-8.
- Allan H, Brearley S, Byng R, Christian S, Clayton J, Mackintosh M, et al. People and teams matter in organizational change: professionals’ and managers’ experiences of changing governance and incentives in primary care. Health Serv Res 2014;49:93-112. https://doi.org/10.1111/1475-6773.12084.
- Horrocks I, Budd L. Into the void: a realist evaluation of the eGovernment for You (EGOV4U) project. Evaluation 2015;21:47-64.
- Taylor H. Evaluating Criminal Justice Interventions in the Field of Domestic Violence: A Realist Approach 2014.
- Olsen K, Legg S, Hasle P. How to use programme theory to evaluate the effectiveness of schemes designed to improve the work environment in small businesses. Work 2012;41:5999-6006. https://doi.org/10.3233/WOR-2012-0036-5999.
- Kazi M, Frounfelker S, Bartone A, Buchanan P. Improving outcomes for a juvenile justice model court: a realist evaluation. Juven Fam Court J 2012;63:37-54. https://doi.org/10.1111/j.1755-6988.2012.01079.x.
- Hasle P, Kvorning L, Rasmussen C, Smith L, Flyvholm M. A model for design of tailored working environment intervention programmes for small enterprises. Saf Health Work 2012;3:181-91. https://doi.org/10.5491/SHAW.2012.3.3.181.
- Wong G, Westhorp G, Manzano A, Greenhalgh J, Jagosh J, Greenhalgh T. RAMESES II reporting standards for realist evaluations. BMC Med 2016;14. https://doi.org/10.1186/s12916-016-0643-1.
- Astbury B, Leeuw F. Unpacking black boxes: mechanisms and theory building in evaluation. Am J Eval 2010;31:363-81. https://doi.org/10.1177/1098214010371972.
- Cobo E, Cortés J, Ribera JM, Cardellach F, Selva-O’Callaghan A, Kostov B, et al. Effect of using reporting guidelines during peer review on quality of final manuscripts submitted to a biomedical journal: masked randomised trial. BMJ 2011;343. https://doi.org/10.1136/bmj.d6783.
- Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 2009;339. https://doi.org/10.1136/bmj.b2700.
Appendix 1 Example of search terms used for MEDLINE (via OvidSP)
Search number | Search terms | References found |
---|---|---|
1 | (realist adj5 (evaluat* or analys* or asses* or intervention? or stud*)).ti,ab. | 121 |
2 | (realist adj5 (approach* or understand* or theor* or methodolog* or framework*)).ti,ab. | 188 |
3 | (realistic adj (evaluat* or analys* or asses* or intervention? or stud*)).ti. | 52 |
4 | (realistic adj (approach* or understand* or theor* or methodolog* or framework*)).ti. | 103 |
5 | Program Evaluation/ and realist.mp. | 33 |
6 | realist.ti. | 175 |
7 | 1 or 2 or 3 or 4 or 5 or 6 | 455 |
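For readers unfamiliar with Ovid syntax, the sketch below is a purely illustrative, hypothetical aid (it is not part of the search strategy above and was not used in the project). It approximates, in Python, how the truncation (‘*’), optional-character (‘?’) and adjacency (‘adjN’) operators behave when applied to the title and abstract of a single record; the record contents and function names are assumptions for the example, and subject-heading lines (e.g. ‘Program Evaluation/’) and ‘.mp.’ field searching are not modelled.

```python
import re


def ovid_term_to_regex(term: str) -> str:
    """Translate Ovid truncation ('*') and optional-character ('?') wildcards
    into a regular expression fragment for matching a single word."""
    return re.escape(term).replace(r"\*", r"\w*").replace(r"\?", r"\w?")


def adj(text: str, left: str, rights: list, n: int = 5) -> bool:
    """Rough analogue of Ovid 'adjN': true if 'left' occurs within n word
    positions (i.e. at most n-1 intervening words) of any term in 'rights',
    in either order, within the supplied field text."""
    words = re.findall(r"\w+", text.lower())
    left_pos = [i for i, w in enumerate(words)
                if re.fullmatch(ovid_term_to_regex(left), w)]
    right_pos = [i for i, w in enumerate(words)
                 if any(re.fullmatch(ovid_term_to_regex(r), w) for r in rights)]
    return any(abs(i - j) <= n for i in left_pos for j in right_pos)


# Hypothetical record with title (ti) and abstract (ab) fields; combining the
# two fields mimics the '.ti,ab.' restriction used in search line 1.
record = {"ti": "A realist evaluation of a peer support programme",
          "ab": "We report a theory-driven study of context and mechanism."}
tiab = record["ti"] + " " + record["ab"]
print(adj(tiab, "realist",
          ["evaluat*", "analys*", "asses*", "intervention?", "stud*"]))  # True
```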
Appendix 2 RAMESES II Delphi Panel Briefing Document: developing reporting standards for realist evaluations
Appendix 3 ‘Paper’ version of round 2 online Delphi panel survey
Appendix 4 ‘Paper’ version of round 3 online Delphi panel survey
Appendix 5 Agenda and notes from public participant session
List of abbreviations
- CINAHL: Cumulative Index to Nursing and Allied Health Literature
- CMOC: context–mechanism–outcome configuration
- CPCI-S: Conference Proceedings Citation Index – Science
- ERIC: Education Resources Information Center
- IQR: interquartile range
- NIHR: National Institute for Health Research
- RDS: Research Design Service
- SCI: Science Citation Index
- SSCI: Social Science Citation Index