Whither the Impact Evaluation?

It is not necessary to limit impact evaluations, but instead to make smarter decisions about when investing in them is the best way to yield valid, usable information about the value of investing in a program for a given target population.

By Heidi Reynolds, PhD, Director of Evaluation, and Sian Curtis, PhD, Senior Evaluation Advisor, MEASURE Evaluation

In 2013, an evaluation of the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR), conducted by the Institute of Medicine, underscored the need for “strategic and coordinated research” to guide program decisions and improve the effectiveness and impact of program investments (IOM, 2013). In response to the IOM report and recommendations from the U.S. Government Accountability Office (GAO, 2012), PEPFAR launched the Monitoring, Evaluation and Reporting (MER) initiative, which was designed to strengthen data collection and use to guide decision making. The MER included issuance of eleven PEPFAR evaluation standards of practice. Since then, reporting of adherence to those standards has improved (PEPFAR, 2017).

In a statement issued on March 21, 2017, PEPFAR moved away from supporting impact evaluations (IEs), citing cost, difficulties with tracking, and challenges translating results for program improvement. The statement says that previously approved, ongoing IEs that need additional funding will be reviewed on a case-by-case basis to determine continued support. It also says that the PEPFAR 2017 Country Operational Plan (COP) guidance indicates that there will be no new IE submissions.

This decision will change the course of investments made in evaluations of HIV programs, at least in the near future. The potential effects of this decision on programs deserve careful consideration.

Scrutiny of the value of IEs in a climate of constrained or reduced funding is reasonable. IEs can be difficult to implement, expensive, and time-consuming. In many contexts of global health work, we face challenges in identifying appropriate comparison groups, delays in program start-up, and natural or political events that cause delays and raise costs (Thomas, Curtis, & Smith, 2011; Skiles, Hattori, & Curtis, 2014). Moreover, IEs compete for funding with other data investments, such as routine health information systems for monitoring coverage and supplies, surveillance systems to monitor disease outbreaks, and surveys to monitor health outcomes, among others.

Impact evaluations provide scientifically rigorous evaluation to “measure the change in an outcome that is attributable to a defined intervention by comparing actual impact to what would have happened in the absence of the intervention. . . [and to] control for factors other than the intervention that might account for the observed change” (PEPFAR, 2015). Even in the context of an IE, a number of approaches exist to make information available when decision makers need it. Programs use baseline and midline data to set targets, refine program strategies, and understand prevalence of behaviors and health status (e.g., HIV prevalence). Many IEs employ mixed methods: for example, combining quantitative and qualitative data—the latter being particularly helpful for programs at different points in the evaluation cycle. Other methods, such as integrating a process evaluation into an IE, can provide timely evidence about program implementation, coverage, quality, and mechanisms of impact.

We fully support the use of evaluation methods beyond IEs, depending on information needs and the maturity of the intervention. However, a categorical limit on IEs will have consequences for PEPFAR, particularly when interventions have little or no evidence of effectiveness in a population group, or when interventions take time for effects to be realized at the population level.

Not performing IEs may be more costly in the end if inefficient interventions continue to operate or, conversely, if interventions that would have proven to be effective are stopped too soon. Prioritizing investments in other means to gather information will not necessarily be sufficient to replace the kind of evidence that IEs provide. For example, data from surveys such as population-based HIV assessments offer limited ability to attribute observed changes to interventions. Moreover, the sample sizes, particularly at subnational levels, may not be sufficient to show statistically significant differences in indicators over short periods.
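To give a rough sense of the sample-size point, the sketch below applies the standard two-proportion formula to estimate how many respondents each of two independent survey rounds would need to detect a change in an indicator. The function name, the prevalence figures, and the design effect are illustrative assumptions, not values taken from any PEPFAR survey or from the sources cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_round(p1, p2, alpha=0.05, power=0.80, design_effect=1.0):
    """Approximate respondents needed in EACH of two independent survey
    rounds to detect a change in a proportion from p1 to p2.

    Uses the standard two-sided, two-proportion z-test formula and then
    inflates the result by a design effect to allow for cluster sampling.
    All inputs here are hypothetical illustrations.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must differ and lie strictly between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    n_simple_random = numerator / (p1 - p2) ** 2

    return ceil(n_simple_random * design_effect)


# Hypothetical example: an indicator falling from 10% to 8% between rounds,
# with an assumed design effect of 2 for a clustered household survey.
print(sample_size_per_round(0.10, 0.08, design_effect=2.0))
```

Because the required sample grows with the inverse square of the difference to be detected, small changes over short periods call for very large samples, and a subnational estimate needs that sample within each area of interest rather than across the country as a whole.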

That said, designs other than IE can yield rigorous evidence of plausible associations between programs and outcomes, while stopping short of causal attribution. For example, in a study in Ukraine, the USAID- and PEPFAR-funded MEASURE Evaluation is using mixed methods (medical facility surveys, provider interviews, and patient chart extraction) to study how social support services, provided to improve tuberculosis (TB) treatment adherence and the integration of TB and HIV services, affect a range of TB and HIV treatment outcomes (MEASURE Evaluation, 2015). And in collaboration with members of Roll Back Malaria’s Monitoring and Evaluation Reference Group and with support from the President’s Malaria Initiative (PMI), MEASURE Evaluation used plausibility study designs and multiple existing data sources (such as Demographic and Health Surveys [DHS] and country-specific surveys and datasets) to evaluate the likely link between malaria interventions and child mortality (Mortality Task Force, 2014).

In our view, the decision to conduct an IE should not be determined by a blanket prohibition. It should depend instead on an assessment of the question, the value of the information generated, and its costs, compared with the costs to health and efficiency when programs of questionable effectiveness persist. At the same time, we hope that PEPFAR and other organizations supporting program evaluation will prioritize testing of new methods and secondary data sources (e.g., the synthetic control analysis that MEASURE Evaluation is testing with support from the USAID Mission in Tanzania) so that we identify less costly ways to gather timely information about program effectiveness.

What is needed is not to limit IEs, but instead to make smarter decisions about when investing in them is the best way to yield valid, usable information about the value of investing in a program for a given target population.

Republished from the Evaluate blog.


U.S. Government Accountability Office (GAO). (2012). President's Emergency Plan for AIDS Relief: Agencies can enhance evaluation quality, planning, and dissemination (GAO-12-673), May 31, 2012. Retrieved from http://www.gao.gov/products/GAO-12-673

MEASURE Evaluation. (2015). Strengthening tuberculosis control in Ukraine: Impact evaluation baseline survey, Ukraine 2014. Chapel Hill, NC: MEASURE Evaluation, University of North Carolina.

Mortality Task Force, Monitoring and Evaluation Reference Group, Roll Back Malaria. (2014). Guidance for evaluating the impact of national malaria control programs in highly endemic countries. Chapel Hill, NC: MEASURE Evaluation, University of North Carolina.

U.S. President’s Emergency Plan for AIDS Relief (PEPFAR). (2017). 2017 annual report to Congress. Washington, DC: PEPFAR. Retrieved from https://www.pepfar.gov/documents/organization/267809.pdf

U.S. President’s Emergency Plan for AIDS Relief (PEPFAR). (2015). Evaluation standards of practice, 2.0. Washington, DC: PEPFAR. Retrieved from https://www.pepfar.gov/documents/organization/247074.pdf

Skiles, M. P., Hattori, A., & Curtis, S. L. (2014). Impact evaluations of large-scale public health interventions: Experiences from the field. Chapel Hill, NC: MEASURE Evaluation, University of North Carolina. Retrieved from https://www.measureevaluation.org/resources/publications/wp-14-157

Thomas, J. C., Curtis, S., & Smith, J. (2011). The broader context of implementation science [letter]. Journal of Acquired Immune Deficiency Syndromes, 58: e19–21.
