September 2023: Predictive Data Analytics in Global Health | Center for Global Digital Health Innovation

This month’s Research Roundup is focused on the use of predictive data analytics within global health to support health outcomes. We’ve invited Diwakar Mohan, faculty at the Center for Global Digital Health Innovation in the Johns Hopkins Bloomberg School of Public Health, to provide his thoughts and considerations on how big data can be leveraged to support population health outcomes. There are many aspects to explore, including studies that demonstrate:

How the use of big data, such as Facebook ad data or geo-referenced household data, can allow us to track and monitor progress on United Nations Sustainable Development Goals (SDGs)
The need for clear governance and legislation around the use of big data as well as data privacy and security

Guest Editor's Remarks:

In the digital age, predictive analytical models have revolutionized various fields, and global health is no exception. Predictive analytics uses data, statistical algorithms, and machine learning techniques to predict future outcomes based on historical or, in some cases, live data. These models have emerged as powerful tools, enabling researchers and policymakers to detect disease outbreaks, allocate resources efficiently, and design effective interventions.

Early detection and prevention of diseases
A major contribution of predictive analytical models in global health is their ability to detect and predict disease outbreaks in time for corrective action. By connecting and analyzing disparate data sources, these models can identify spatial patterns and time trends, allowing health organizations to prepare and respond proactively. For example, such models have been instrumental in predicting the spread of infectious diseases such as Ebola, Zika, and COVID-19.

Optimizing resource allocation
Resource allocation is a critical aspect of global health management, especially when the total pool of resources is fixed. Predictive analytical models have helped healthcare organizations in high-income settings optimize their resources by forecasting patient admissions, medication needs, and equipment requirements, so that health systems have the right resources in the right place at the right time. We are working towards achieving similar gains in low-resource settings, where optimization efforts are slower due to a lack of timely, good-quality data.

Precision medicine
Predictive analytical models are revolutionizing healthcare by enabling individualized treatment plans and targeting. By analyzing a patient's medical history, lifestyle, and environmental factors, models can predict disease risks and recommend tailored management options. These capabilities enhance patient outcomes and reduce unnecessary treatments and tests, which in turn minimizes preventable side effects, lowers healthcare expenses, and optimizes the use of available resources.

Targeting public health interventions
Predictive analytical models also play a vital role in the design and rollout of targeted public health interventions. Analyzing contextual factors in the community alongside demographic data, the social determinants of health, and historical health records allows models to identify at-risk populations and tailor interventions to specific sub-populations. These models ensure that public health efforts are focused, efficient, and impactful.

Challenges and ethical considerations
While predictive analytical models offer immense potential, their use in global health will not be without challenges. Ensuring data privacy (especially amongst the most vulnerable), addressing biases in algorithms, and building trust amongst stakeholders are significant hurdles. Ethical considerations around consent, data ownership, and the responsible use of predictive models need to be handled carefully in order to maintain public trust and uphold the integrity of those responsible for public health.

Conclusion
Predictive analytical models have emerged as indispensable tools within global health. By harnessing the power of improved data and advanced algorithms, these models facilitate early disease detection, optimal resource allocation, precision medicine, and targeted public health interventions. However, stakeholders (including policymakers, researchers, and public interest organizations) need to address ethical concerns and work collaboratively to harness the full potential of predictive analytics. With continued advances in technology, predictive analytical models will continue to shape the future of global health, saving millions of lives and improving well-being, but only when implemented in an ethical and responsible manner.

The scope of big data and predictive analytics in global health

Read the CGDHI key takeaways and comments on the research articles hand-picked by our guest editor:

Benefits, risks, and opportunities for big data in LMIC healthcare

Wyber et al., Big data in global health: improving health in low- and middle-income countries, Bulletin of the World Health Organization, 2015

Summary & Takeaways:

Despite massive growth in the ability to collect and analyze big data within the last decade, progress within the health sector has been slower. With appropriate governance, improvements in the quality, quantity, storage, and analysis of health data could lead to significant improvement in various health outcomes, especially in LMICs.

Wyber et al. explore the use of big data to improve healthcare delivery in LMICs, evaluating the benefits, risks, and opportunities for big data in health. Though the use of big data in LMICs remains complex, it often provides the greatest potential rewards. For example, India’s issuance of digital identification cards (Aadhaar) offers the opportunity to generate and monitor health and social data at scale. The implementation of big data systems in LMICs, however, is also fraught with ethical, regulatory, and technological issues. To minimize potential risks, the authors propose stronger legislation supporting the privacy of health information, increased use of anonymized data, and more transparent and effective processes for data governance. However, the use of big data should not be seen as a one-size-fits-all solution for persistent challenges in low-resource settings. Rather, appropriate and scaled use of big data in LMICs requires collaboration across stakeholders, a clear governance and decision-making framework, and the development of robust interoperability standards.

Comment from the Center for Global Digital Health Innovation:

While big data can be used to enhance data collection methods and improve health information systems, thereby increasing the potential for improving health outcomes in low-resource settings, the sheer size of big data sets may also introduce risks. These risks could unintentionally exacerbate existing challenges within low-resource settings such as weak health systems and scarce resources. Without strong governance and clear data regulations, the big data approach remains vulnerable to fragmentation and misuse in LMICs. Big data may represent a major milestone in the field of healthcare delivery in LMICs, but it is no panacea for existing global health challenges. Thus, to realize the promise of big data, a broader effort is needed to establish robust interoperability standards and clear governance models, key components of a proactive, norm-forming approach.

Using big data to understand mLearning program functioning and impact

Bashingwa et al., Examining the reach and exposure of a mobile phone-based training programme for frontline health workers (ASHAs) in 13 states across India, BMJ Global Health, 2021

Summary & Takeaways:

Mobile learning (mLearning) programs have the potential to train large groups of providers on high-quality, standardized content at low cost, at the time and location of their choice. Mobile Academy is an interactive voice response training program for Accredited Social Health Activists (ASHAs) that is being scaled nationally, so far in 13 of India’s 36 states and union territories; three states have state-based Mobile Academy platforms. This article describes an analysis of a big data set (ASHA call data records) to evaluate coverage, user engagement, and completion of Mobile Academy from 2015 to 2019. The dataset included 158,596 ASHAs who initiated the national version of Mobile Academy. The study found that 41% of ASHAs registered with the government initiated Mobile Academy (national and state-based programs), with an 81% completion rate and a 99% passing rate among ASHAs who initiated the national version of the program. However, rates varied across states, with Rajasthan having the highest initiation rate at 64%. On average, ASHAs spent 5 hours over 10 calls to complete the course for the first time. ASHAs also called more often in the latter half of the day, while longer calls took place earlier in the day. Differences in program initiation rates were attributed to phone number quality, government promotion, variable oversight by supervisors of ASHAs’ adoption of and progress through the course, and mobile network quality.

Comment from the Center for Global Digital Health Innovation:

This study offers valuable insights into the differing reach and adoption of the Mobile Academy mLearning program across Indian states. It also underscores the need to evaluate scaled mLearning programs more rigorously. The authors were able to leverage a large dataset to understand the mechanics of how Mobile Academy functioned. Insights from the data analysis can be used to further improve and update how to effectively design and deliver mLearning programs for frontline health workers.

Tracking the global digital gender gap with Facebook ad data

Fatehkia et al., Using Facebook ad data to track the global digital gender gap, World Development, 2018

Summary & Takeaways:

We are in an age of increasingly high mobile phone penetration; the latest GSMA Mobile Economy 2023 report indicates there are over 8 billion mobile phone connections globally, with 5.4 billion mobile service subscribers and 4.4 billion people using mobile internet by the end of 2022. However, despite these improvements in mobile phone access and the elevation of gender inequities on the global agenda, persistent gender-based inequities remain. The availability of gender-disaggregated data remains limited, especially in LMICs—a limitation that undermines opportunities to both identify and address these inequities.

Fatehkia et al.’s study leverages the big data in Facebook’s user database to track and predict mobile and internet gender gaps. For its targeted advertising efforts, Facebook collects data on user demographics and the device(s) used to access the platform. To assess how effective these Facebook data would be for their analysis, Fatehkia et al. compared them with ‘offline’ data sources, i.e., official country-level data on development and gender gaps. The authors found that Facebook data were highly correlated with official statistics for both mobile and internet gender gaps. For internet gender gaps, models that used only Facebook data performed better than models that used only ‘offline’ data. The authors concluded that the strongest models were those that combined Facebook and ‘offline’ data, as they provided the highest predictive power.

Comment from the Center for Global Digital Health Innovation:

Although the SDGs ask countries to track progress in narrowing the global digital gender divide, it can be difficult to identify and measure the gaps because data are often not current or available. This study employed a highly innovative approach for measuring gender disparities in mobile phone and internet coverage. Leveraging Facebook data enables an understanding of the digital gender gap across wider geographies. Moreover, since Facebook data are continuously updated, using these big datasets allows for more routine monitoring of progress in addressing gender-based inequities in mobile phone and internet access.

Importance of subnational data analysis in big datasets on child survival rates

Burstein et al., Mapping 123 million neonatal, infant, and child deaths between 2000 and 2017, Nature, 2019

Summary & Takeaways:

Child survival rates are often used to understand and assess improvements in a country’s overall progress in population health and development. However, the focus on country-to-country estimates does not account for regional variations in child survival rates within a country. This gap in the data is important because the subnational level is often where we want to intervene with specific and relevant interventions that improve child survival.

Burstein et al.’s study used a geostatistical survival model to look at subnational data on child mortality rates and the number of deaths of neonates, infants, and children under 5 between 2000 and 2017 across 99 low- and middle-income countries (which account for 93% of all child deaths). The authors used big data for their analysis, focused on household surveys and census data. They found that an estimated 71.8 million child deaths, or 58% of the total recorded between 2000 and 2017, could have been averted if geographical inequities had been addressed. The authors also noted that 32% of subnational units (representing almost 12% of the under-5 population across the 99 countries included in the study) met the Sustainable Development Goal 3.2 (Ending Preventable Child Deaths by 2030) target of no more than 25 under-5 deaths per 1,000 live births. Instrumental to these findings was the availability and quantity of high-quality and timely data.
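
As a quick arithmetic check on the figures above, the short sketch below relates the 71.8 million avertable deaths to the 123 million total deaths named in the article's title, and shows how a subnational unit's under-5 mortality could be compared with the SDG 3.2 threshold of 25 deaths per 1,000 live births. The function names and the example unit values are hypothetical and chosen only for illustration. They are not taken from the study, and the rate calculation is a deliberate simplification of how under-5 mortality is formally estimated.

```python
# Figures quoted in the summary above (millions of deaths, 2000 to 2017).
TOTAL_CHILD_DEATHS_M = 123.0   # from the title of the Burstein et al. article
AVERTABLE_DEATHS_M = 71.8      # deaths linked to geographical inequities

# SDG 3.2 threshold: under-5 deaths per 1,000 live births.
SDG_3_2_TARGET = 25.0


def avertable_share(avertable: float, total: float) -> float:
    """Return avertable deaths as a percentage of total deaths."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * avertable / total


def meets_sdg_3_2(under5_deaths: int, live_births: int) -> bool:
    """Return True if the unit's rate is at or below the SDG 3.2 threshold.

    Simplification (an assumption of this sketch): the rate is computed as
    deaths divided by live births, scaled to 1,000 live births.
    """
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    rate_per_1000 = 1000.0 * under5_deaths / live_births
    return rate_per_1000 <= SDG_3_2_TARGET


if __name__ == "__main__":
    share = avertable_share(AVERTABLE_DEATHS_M, TOTAL_CHILD_DEATHS_M)
    print(f"Avertable share: {share:.0f}%")  # prints "Avertable share: 58%"

    # Two hypothetical subnational units (illustrative numbers only).
    print(meets_sdg_3_2(under5_deaths=180, live_births=10_000))  # True (18 per 1,000)
    print(meets_sdg_3_2(under5_deaths=420, live_births=10_000))  # False (42 per 1,000)
```

The share works out to roughly 58%, which matches the percentage reported in the summary.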

Comment from the Center for Global Digital Health Innovation:

This study demonstrates the importance of understanding subnational variations in health outcomes for informing policy and planning that addresses disparities through the provision of specific interventions, financial resources, infrastructure, or other needs at the subnational level. Having access to subnational data proved critical for the researchers' work. However, such data might not always be available, depending on the context of the health system and its health information systems (HIS). Such a limitation would impact the ability of health planners and policymakers to leverage data for evidence-based decision-making. The analyses shared by the authors in this paper do provide a method for estimating child mortality at subnational levels in the absence of robust HIS. Even so, it is essential to invest in civil registration and vital statistics (CRVS) systems, which provide routine and continuous data on births, deaths, and causes of death at the appropriate geographical unit of measurement.

Meet the Guest Editor

Dr. Diwakar Mohan is a public health physician and health systems epidemiologist who has worked in LMIC settings since 2003. He completed his MPH and DrPH in the Department of International Health at the Johns Hopkins Bloomberg School of Public Health. As an expert in health systems epidemiology and evaluation methods, he specializes in measurement across various public health programs, including developing frameworks and metrics, utilizing new tools and modalities to collect data, and focusing on the intersection of digital technologies and data for strategic decision-making. His interests lie in the application of epidemiological and statistical methods, including machine learning, to support the evaluation and implementation of health program research in LMIC settings.