Berk Ozler considers the following as important:
Almost exactly four years ago, I wrote a blog post, titled “Poverty Reduction: Sorting Through the Hype,” which described the paper by Banerjee et al. (2015) in Science on the impacts of the ultra-poor graduation approach, originally associated with BRAC in Bangladesh, in six countries. Now comes a new paper by Naila Kabeer, which reports the findings from a qualitative evaluation, which was conducted in two of the six study sites. The paper aims to provide a different perspective to the RCT by digging deeper into issues regarding implementation (including random assignment in the RCT), refusal of take-up, and mediation of effects (or lack thereof) through differences in environment and household characteristics.
While I always like to start with what I liked about a paper, here, for the sake of the reader’s full understanding of the comparisons between the RCT findings and the qualitative study, I need to point out an unfortunate fact. While the qualitative work took place in the same general areas as the study sites, no qualitative work (at least not by the team that included Prof. Kabeer) was conducted in places where the RCT collected data and vice versa. This, despite the fact that the areas the qualitative team worked in (described as a few hundred miles away, which does not seem close to me in absolute terms) had the same procedures of identification of households, random assignment, etc. The short Section 3.1 gives some clues about why the evaluation was conceived as two separate studies with different methodologies in different sites rather than as one mixed-methods evaluation, but the fact that a team from BRAC Development Institute and IDS (funded by MasterCard) conducted its study in different locations from the RCT is simply unfortunate. Many times while reading this otherwise nice and useful paper, I just wished to know what the larger-sample quantitative data would have said about the same respondents. Alas, they were not interviewed by the other set of researchers…
Nonetheless, I learned a fair amount from sections 2-4, which provided useful context for the TUP operations in the Sindh (Pakistan) and West Bengal (India). The settings are quite different, including in many factors that would reasonably affect project success, such as isolation vs. connectedness of the study villages; role of village elites, NGOs, men, and intra-household dynamics; ability and flexibility of the NGOs to adapt when the basic TUP template needs to be tweaked; social and cultural norms regarding women’s participation in the necessary activities, etc. In fact, by simply setting these out, the qualitative study does a service to the RCT and its readers by laying bare the most glaring shortcoming of a six-country study published in a journal like Science: the template simply does not allow for this kind of detail to be provided as background context – not only in each country, but site by site within countries. Some might say the reporting of the plain facts alone (averaged and analyzed for some basic heterogeneity) without the interpretation of data from small samples in the qualitative study (QS from here on) is a strength and not a weakness, but it would be hard to argue that we, as the readers, the researchers, and perhaps most of all future adopters would not benefit from the qualitative work. I have more to say on this below, towards the end of the post…
From the qualitative study, we learn a lot. For example, there is evidence that the random assignment procedure may not have been followed by the NGO in the Sindh, who, instead, might have chosen the beneficiaries based on their relationship to the elites. Note that different NGOs were working at different sites even within countries, so this may not have necessarily happened in the RCT sample. Furthermore, the evidence does not actually come from the qualitative work but another independent process-evaluation type study. Nonetheless, it’s not encouraging that procedures were not followed when public lotteries were envisioned.
Much more interesting, however, are discussions of refusal to take up the intervention in West Bengal, how this might have been associated with religion (Muslim villages and households might have been suspicious about the aims of the project, while Muslim women might have had a hard time trying to take advantage of the interventions provided by the program); the suitability of livestock rearing in the arid site in the Sindh; how the NGO in West Bengal was able to change course based on early feedback on which households fared better with what type of entrepreneurial activity; the categorization of households as slow or fast climbers based on a subjective-wellbeing scale and asset accumulation; discussions of the importance of certain eligibility criteria, such as the existence of able-bodied male adults in the household; chronic illnesses and availability of health care; the cooperation or lack thereof between husbands and wives; higher mobility and empowerment of certain groups of women at baseline (influential in taking advantage of what’s on offer), etc. I really enjoyed reading these sections, plus I am a sucker for quotes from study participants (even though I try to be aware of the dangers of building elaborate narratives from a quote or two).
While the issues raised by the QS might be cause for some worry regarding internal and external validity of the findings in the RCT, I found that the two studies generally agree with each other. The qualitative study is generally careful and fair in its discussion of how the RCT dealt with the issues of non-compliance, attrition analysis, and the like (nothing non-standard that we need to get into). The weakness of the RCT is, in my mind, not in bias in any obvious way, but in the lack of detail it is able to provide. For example, the issue of low take-up is dealt with by the adoption of the standard ITT estimation: households in the treatment group that did not get treated are still in the sample. Looking at India in the Science paper, the effects are huge, which is doubly surprising: the standardized effects on assets and incomes are upwards of 0.5 SD (the internal rate of return is also highest in West Bengal at 23%). But, if we assume that take-up rates were equally low (about 50%) in RCT sites, the ToT effects would have to be double these already unusually high impacts, i.e. huge. Had I known about this issue when I was writing my blog post, I would have certainly raised it as a question mark. Kabeer (2019) relates this issue to the quantile regression analysis in Banerjee et al. (2015) and suggests that it may partly explain why we see higher quintiles doing better. Other issues, such as the possibility of spillover effects (both countries had individual, rather than cluster, randomization), are also fair, and are something I raised in my blog post four years ago.
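The ITT-to-ToT arithmetic above can be made concrete with a minimal sketch. It assumes the standard rescaling of the ITT by the difference in take-up between arms, which is valid only if the offer has no effect on households that decline it and, in the simplest case, if no control households receive the program. The function name `tot_from_itt` and its parameters are hypothetical, and the inputs are the round numbers from the text (0.5 SD, about 50% take-up), not estimates from either paper.

```python
def tot_from_itt(itt_effect, take_up_treatment, take_up_control=0.0):
    """Rescale an intention-to-treat effect to a treatment-on-the-treated effect.

    Assumes (illustratively) that the offer only affects households that
    actually take up the program, so the ITT is diluted by non-takers.
    """
    compliance_gap = take_up_treatment - take_up_control
    if compliance_gap <= 0:
        raise ValueError("take-up must be higher in the treatment arm")
    return itt_effect / compliance_gap


# Round numbers from the discussion above, not estimates from the papers.
itt_sd = 0.5    # ITT effect in standard deviations
take_up = 0.5   # share of assigned households that took up the program

print(tot_from_itt(itt_sd, take_up))  # 1.0
```

With half of the assigned households untreated, an ITT of 0.5 SD implies an effect of about 1 SD on the households that actually participated, which is why the take-up rate in the RCT sites matters so much for interpreting the India results.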
It’s important to note that we are judging studies that were planned almost 15 years ago by today’s norms and standards, which is not fair. I don’t think anyone would design such a transfer program as an individually-randomized RCT today, but it was not so obvious a decade ago. The study had a spillover design in some sites, but not others.
Another interesting takeaway from the QS is the roles of the NGOs. For example, in West Bengal, the NGO responsible for project implementation in the RCT areas excluded members of self-help groups (SHGs) and microfinance clients (in line with the eligibility rules), as such households are also more likely to benefit from other anti-poverty programs. The NGO in the QS area, however, did explicitly make use of SHGs, which was one of its own strengths and might have contributed significantly to the proposed success of the project, by allowing women to bond together, learn from each other, hide their savings in that formal setup (“saving by the book”), and obtain loans on better terms than otherwise available. The NGO in the RCT might have also steered initially unsuccessful participants from livestock rearing towards vegetable growing and other activities and reallocated livestock to better-off, more experienced families. Such adaptation of a basic intervention template to local circumstances seems key, but also introduces heterogeneity of outcomes based on the quality of the chosen implementer. In contrast, the NGO in the Sindh does not come across in the best light, failing to follow instructions and making key errors in judgment (it’s not clear whether these were avoidable or only clearer in hindsight). You can already see why it is unfortunate that we don’t have all these data from a subset of the study villages…
Sure, the sample size in the QS is small (20 in each site, if I am correct) and self-admittedly so, but there is nonetheless a wealth of information here, which helps to provide context, to formulate hypotheses for further testing and for heterogeneity analysis, and, yes, even to assess bias and external validity. So, it pains me more to say that Sections 1 (introduction) and 5 (conclusion) are just discordant with the rest of the paper. Why make this a paper that sounds like it is mainly a critique of RCTs in development economics? Maybe the author thinks repetition is good and useful? If so, I disagree: the mentions of the paper on Twitter, including by the author herself emphasizing the shortcomings of RCTs, certainly did not make me want to read the paper.
These critiques, which seem to have caused the author to spend an inordinate amount of time on whether the assignment was really randomized or not, don’t add value to the paper or the literature. And they sometimes obfuscate or confuse issues by accident. Program take-up is a completely different issue of non-compliance than randomization procedures, yet they are discussed together. The former is not only a common issue, but also in no way limited to RCTs. The mention of methods employed ex post to deal with econometric issues feels dated. Sentences like “…RCTs frequently do not collect information on [relevant] variables because they do not consider them relevant to their experiment or even know what they might be” are both unnecessary and unfair to a lot of researchers in the field. What is fair is the statement that the RCT findings could have been interpreted much better with more knowledge of the local context and the trial population (compared with the target population IF they differ). It’s also fun to read the paragraph in the final section about mundane reasons (arising from the QS) as to why the program effects might have been small in absolute terms, in contrast with the higher-level informed speculations of Banerjee et al. (2015) on poverty traps. There is a dig in there somewhere...
Two things to round off the discussion. First, I was surprised that there is no discussion (or even a passing mention) of the Bandiera et al. (2017) paper in the Quarterly Journal of Economics, which spun together a much better narrative about the impacts of TUP at both the household level and the changes in these village economies, along with the pathways, such as spillovers, general equilibrium changes, occupational change, etc. I do not remember any mention of qualitative data collection when reading that study, but I would not be surprised if there was one. If not, the authors show that it is possible to provide much needed context and explanations within a well-designed RCT when researchers are given room to write up all their findings, including the background labor market context, pathways, longer-term findings, etc. Of course, they benefit from working with one, and the original, NGO (BRAC) in Bangladesh, but I have a suspicion that Bandiera et al. would not have provided as fertile ground for a critical comparison of QS vs. RCT.
This brings me to the issue of how to design studies, including plans for endline reporting of all the findings. It is not completely fair to criticize Banerjee et al. (2015) for writing a paper that fit the required template of a journal like Science. Many medical journals, but not economics ones, do the same. In the biomedical/public health field, the Lancet/Science articles are accompanied by a series of secondary publications in other journals. Perhaps, one can take issue with the authors not putting out follow-up papers that provide more in-depth context for the interpretation of findings, but maybe they are using their time better by designing follow-up studies – I don’t know. But, in economics, we have neither the culture nor the incentives to write those secondary papers in lower-tier journals. So, what could be done?
One idea might be to not only pre-register the study and get pre-trial acceptance (a la what the Journal of Development Economics is doing with their new Registered Reports) but be even more ambitious and consider a special issue. Remember the American Economic Journal: Applied Economics special issue that published six RCTs of microcredit? Like that, but even more ambitious... First, all articles rather than just the summary article would be pre-accepted. Second, the main article would be a formal meta-analysis of all country studies, but each country study would be written up separately in the same issue by a different team providing needed context. These papers would not only make use of mixed-methods analysis, but there might also be a final (or first or second) paper in the same special issue that is written by the qualitative methods lead, perhaps providing a different perspective. If you are working on a topic of outsized importance in development economics and have secured funding for a multi-country study, then there is no reason why the study cannot (a) have the best design and data collection efforts possible, and (b) be good enough for pre-registration and acceptance prior to the trial. Crazy? Maybe, but it might at least make some people think about the feasible alternatives.
For me personally, I already have mixed-methods RCTs in the field, but I will be paying even more attention to the data coming from the qualitative side. Researchers with different traditions of arriving at causal effects might experience some discomfort and conflict when trying to work closely together, but I do believe that the projects, on average, should benefit from such efforts.