The widespread availability of mobile phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. While advanced applications on smartphones allow for richer multimedia content and two-way communication between beneficiaries and health coaches, simpler text and voice messaging services can be effective in disseminating information to large communities, particularly those that are underserved with limited access to information and smartphones. ARMMAN1, one non-profit doing just this, is based in India with the mission of improving maternal and child health outcomes in underserved communities.
One of the programs run by them is mMitra, which employs automated voice messaging to deliver timely preventive care information to expecting and new mothers during pregnancy and until one year after birth. These messages are tailored according to the gestational age of the beneficiary. Regular listenership to these messages has been shown to have a high correlation with improved behavioral and health outcomes, such as a 17% increase in infants with tripled birth weight at end of year and a 36% increase in women knowing the importance of taking iron tablets.
However, a key challenge ARMMAN faced was that about 40% of women gradually stopped engaging with the program. While it’s possible to mitigate this with live service calls to women to explain the advantage of listening to the messages, it is infeasible to call all the low listeners in the program because of limited support staff — this highlights the importance of effectively prioritizing who receives such service calls.
In “Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health”, published in AAAI 2022, we describe an ML-based solution that uses historical data from the NGO to predict which beneficiaries will benefit most from service calls. We address the challenges that come with a large-scale real world deployment of such a system and show the usefulness of deploying this model in a real study involving over 23,000 participants. The model showed an increase in listenership of 30% compared to the current standard of care group.
We model this resource optimization problem using restless multi-armed bandits (RMABs), which have been well studied for application to such problems in a myriad of domains, including healthcare. An RMAB consists of n arms where each arm (representing a beneficiary) is associated with a two-state Markov decision process (MDP). Each MDP is modeled as a two-state (good or bad state, where the good state corresponds to high listenership in the previous week), two-action (corresponding to whether the beneficiary was chosen to receive a service call or not) problem. Further, each MDP has an associated reward function (i.e., the reward accumulated at a given state and action) and a transition function indicating the probability of moving from one state to the next under a given action, under the Markov condition that the next state depends only on the previous state and the action taken on that arm in that time step. The term restless indicates that all arms can change state irrespective of the action.
Finally, the RMAB problem is modeled such that at any time step, given n total arms, which k arms should be acted on (i.e., chosen to receive a service call), to maximize reward (engagement with the program).
The probability of transitioning from one state to another with (active probability) or without (passive probability) receiving a service call are therefore the underlying model parameters that are critical to solving the above optimization. To estimate these parameters, we use the demographic data of the beneficiaries collected at time of enrolment by the NGO, such as age, income, education, number of children, etc., as well as past listenership data, all in-line with the NGO’s data privacy standards (more below).
However, the limited volume of service calls limits the data corresponding to receiving a service call. To mitigate this, we use clustering techniques to learn from the collective observations of beneficiaries within a cluster and enable overcoming the challenge of limited samples per individual beneficiary.
In particular, we perform clustering on listenership behaviors, and then compute a mapping from the demographic features to each cluster.
This mapping is useful because when a new beneficiary is enrolled, we only have access to their demographic information and have no knowledge of their listenership patterns, since they haven’t had a chance to listen yet. Using the mapping, we can infer transition probabilities for any new beneficiary that enrolls into the system.
We used several qualitative and quantitative metrics to infer the optimal set of of clusters and explored different combinations of training data (demographic features only, features plus passive probabilities, features plus all probabilities, passive probabilities only) to achieve the most meaningful clusters, that are representative of the underlying data distribution and have a low variance in individual cluster sizes.
Comparison of passive transition probabilities obtained from different clustering methods with number of clusters s = 20 (red dots) and 40 (green dots), using ground truth passive transition probabilities (blue dots). Clustering based on features+passive probabilities (PPF) captures more distinct beneficiary behaviors across the probability space.
Clustering has the added advantage of reducing computational cost for resource-limited NGOs, as the optimization needs to be solved at a cluster level rather than an individual level. Finally, solving RMAB’s is known to be P-space hard, so we choose to solve the optimization using the popular Whittle index approach, which ultimately provides a ranking of beneficiaries based on their likely benefit of receiving a service call.
We evaluated the model in a real world study consisting of approximately 23,000 beneficiaries who were divided into three groups: the current standard of care (CSOC) group, the “round robin” (RR) group, and the RMAB group. The beneficiaries in the CSOC group follow the original standard of care, where there are no NGO initiated service calls. The RR group represents the scenario where the NGO often conducts service calls using some systematic set order — the idea here is to have an easily executable policy that services enough of a cross-section of beneficiaries and can be scaled up or down per week based on available resources (this is the approach used by the NGO in this particular case, but the approach may vary for different NGOs). The RMAB group receives service calls as predicted by the RMAB model. All the beneficiaries across the three groups continue to receive the automated voice messages independent of the service calls.
Distributions of clusters picked for service calls by RMAB and RR in week 1 (left) and 2 (right) are significantly different. RMAB is very strategic in picking only a few clusters with a promising probability of success (blue is high and red is low), RR displays no such strategic selection.
At the end of seven weeks, RMAB-based service calls resulted in the highest (and statistically significant) reduction in cumulative engagement drops (32%) compared to the CSOC group.
The plot shows cumulative engagement drops prevented compared to the control group. RMAB vs CSOC RR vs CSOC RMAB vs RR % reduction in cumulative engagement drops 32.0% 5.2% 28.3% p-value 0.044 0.740 0.098
An ethics board at the NGO reviewed the study. We took significant measures to ensure participant consent is understood and recorded in a language of the community’s choice at each stage of the program. Data stewardship resides in the hands of the NGO, and only the NGO is allowed to share data. The code will soon be available publicly. The pipeline only uses anonymized data and no personally identifiable information (PII) is made available to the models. Sensitive data, such as caste, religion, etc., are not collected by ARMMAN for mMitra. Therefore, in pursuit of ensuring fairness of the model, we worked with public health and field experts to ensure other indicators of socioeconomic status were measured and adequately evaluated as shown below.
The proportion of beneficiaries that received a live service call within each income bracket reasonably matches the proportion in the overall population. However, differences are observed in lower income categories, where the RMAB model favors beneficiaries with lower income and beneficiaries with no formal education. Lastly, domain experts at ARMMAN have been deeply involved in the development and testing of this system and have provided continuous input and oversight in data interpretation, data consumption, and model design.
After thorough testing, the NGO has currently deployed this system for scheduling of service calls on a weekly basis. We are hopeful that this will pave the way for more deployments of ML algorithms for social impact in partnerships with non-profits in service of populations that have so far benefited less from ML. This work was also featured in Google for India 2021.
This work is part of our AI for Social Good efforts and was led by Google Research, India. Thanks to all our collaborators at ARMMAN, Google Research India, Google.org, and University Relations: Aparna Hegde, Neha Madhiwalla, Suresh Chaudhary, Aditya Mate, Lovish Madaan, Shresth Verma, Gargi Singh, Divy Thakkar.
1ARMMAN runs multiple programs to provide preventive care information to women through pregnancy and infancy enabling them to seek care, as well as programs to train and support health workers for timely detection and management of high-risk conditions. ↩