2024
The role of spatial epidemiology to support public health policies : case studies applied to health promotion, noncommunicable and infectious diseases in the canton of Vaud, Switzerland
In a context of escalating public health challenges, including the rise of chronic diseases, the impact of climate change, and the COVID-19 pandemic, certain populations bear a disproportionate burden. As a result, there is an urgent need to develop public health strategies that not only promote overall well-being but also mitigate these health inequities. Addressing these issues through a geographic lens is essential because health status is strongly influenced by the social determinants of health, i.e., the conditions under which individuals are born, grow, live, work, and age. Such spatial epidemiology studies could facilitate the prioritization of public health interventions, and the design of initiatives tailored to the characteristics of populations and their environments. In recent decades, the research field of spatial epidemiology has been boosted by the increasing availability of high-resolution spatial data and advances in computational techniques. However, academic findings are rarely translated into population health interventions. This thesis aims to bridge this gap by exploring the potential of spatial epidemiology in supporting public health policies. To this end, the research was structured around case studies aligned with the challenges faced by the Public Health Department of the canton of Vaud. First, indicators related to social determinants of health were developed and mapped at a fine spatial scale (hectare level) to address the challenges of a national health promotion program engaged with municipalities. These indicators were then associated with individual health data to investigate the influence of the physical and social environments on the spatial distribution of cardiovascular risk factors. By identifying a pronounced geographic pattern of hypertension, obesity, and diabetes in the adult population of the city of Lausanne, the results provided insights for prioritizing and adapting future prevention campaigns. 
The role of spatial epidemiology in infectious disease surveillance was then explored in the context of the COVID-19 pandemic. Spatio-temporal approaches were applied to individual RT-PCR test data to identify emerging clusters of COVID-19 cases. Subsequent genomic analysis of these clusters demonstrated that incorporating geographic approaches could improve the effectiveness of current surveillance systems by guiding prioritization strategies for contact tracing and virus tracking. In the final case study, spatial approaches were used to design the COVID-19 mobile vaccination campaign in the canton of Vaud, illustrating the translation of research into practice. This thesis demonstrates that fine-scale spatial epidemiology can inform strategic decision-making for various health challenges, and concludes with practical recommendations for adopting a geographic lens within public health departments.
Lausanne, EPFL, 2024.
Towards Language Learning From Passive Exposures To In-Context Examples
This thesis explores innovative methods for language learning through passive exposure to in-context examples. We aim to reduce the discipline, willpower, and time investment required to learn a new language by integrating language learning into existing daily habits. The fundamental idea is to translate some parts of the text that the user is reading. In a first work we present the software components necessary to implement this idea, discuss efficient learning strategies for smart word selection and evaluate the learning paradigm in a comprehensive user study. To improve the learning experience and intuitively guide the reader towards a correct understanding of the foreign language demonstrations, we design a novel system that can translate text into a series of semantic images. In a cloze-test-based user study, we find that our visual semantic cues significantly increase the chance of correctly guessing a masked word. We hope that in practice, our work will optimize the process of learning from passive exposures by reducing ambiguity. Finally, we discuss a shortcoming of translating words in place: We are limited to teaching vocabulary to our users that occurs naturally in the text they are reading. To overcome this problem, we design an NLP pipeline that uses generative AI to rewrite and extend the text read by our users.
Lausanne, EPFL, 2024.
Generalization and Personalization of Machine Learning for Multimodal Mobile Sensing in Everyday Life
A range of behavioral and contextual factors, including eating and drinking behavior, mood, social context, and other daily activities, can significantly impact an individual’s quality of life and overall well-being. Therefore, inferring everyday life aspects with the use of smartphone and wearable sensors, also broadly known as mobile sensing, is gaining traction across both clinical and non-clinical populations due to the widespread use of smartphones around the world. Such inferences are of use in mobile health apps, mobile food diaries, and generic mobile apps. However, despite the long-standing promise in the domain, realizing the full potential of models, in the wild, is still far from reality due to two primary deployment challenges: the generalization and personalization of models. In addition, there are understudied domains, such as eating and drinking behavior modeling with multimodal mobile sensing and machine learning. Hence, this thesis delves into the realm of multimodal mobile sensing with an eye for the generalization and personalization of models, exploring a range of novel inferences at the intersection of eating and drinking behavior, mood, daily activities, and context. After introducing the topic in the first chapter and discussing data collection in the second, we expand on passive sensing for drink behavior modeling using multimodal sensor data in the third chapter. The fourth chapter demonstrates how smartphone sensors can infer self-perceived food consumption levels with personalized models. The fifth chapter showcases how phone sensors could be used to infer eating events with personalized models. The sixth chapter highlights the challenge of generic mood inference models struggling to adapt to specific contexts like eating. To tackle this, we propose a personalization technique to enhance model performance even with limited data. 
In the next three chapters, we delve further into the realm of model generalization within the context of multimodal mobile sensing. We also investigate the impact of personalization on generalization performance. Specifically, we investigate model generalization across countries—a problem that has been scarcely addressed in prior research. To this end, in the seventh chapter, we examine the generalization capabilities of mood inference models, while the eighth chapter focuses on the generalization of models for complex daily activity recognition. Upon highlighting the limitations of model generalization in the aforementioned chapters, we introduce a novel technique to enhance model generalization in the context of multimodal sensor data in the ninth chapter. In summary, this thesis offers an extensive exploration of novel inferences and deployment challenges in multimodal mobile sensing. First, the thesis explores eating and drinking behavior and its interplay with mood, social context, and daily activities, viewed through the lens of both model personalization and generalization. Additionally, the thesis delves into the challenge of cross-country generalization for mobile sensing-based models and presents a novel deep learning architecture for unsupervised domain adaptation, yielding enhanced performance in unfamiliar domains. As a result, this thesis contributes both empirically and methodologically to the fields of ubiquitous and mobile computing and digital health.
Lausanne, EPFL, 2024.
Infectious disease spread in connected communities
Ecohydrology and epidemiology share a deep bond in infectious disease modelling. The former focuses on the interaction among species and their water-controlled environment (i.e., the study of water controls on the biota). The latter revolves around specific host-pathogen relationships and the pathogen’s propagation in space and time. Combining these disciplines is crucial when considering some disease-causing water-based pathogens, such as Vibrio cholerae and Opisthorchis viverrini. The importance of a spatially explicit framework to better understand the spread of airborne diseases was also asserted during the COVID-19 pandemic. While many countries still struggle to fight disease transmission, comprehensive knowledge regarding the spread of these diseases in space is still lacking. To this end, this Thesis bridges ecohydrological and epidemiological concepts to advance towards a more thorough understanding of the mechanisms regulating the spread of the pathogens mentioned above and diseases in space and time. Throughout the development of this Thesis, these mechanisms have been pinpointed to epidemiological indicators such as reproduction numbers and epidemicity indices derived from the concept of reactivity in ecological dynamics. These metrics often result from algebraic analyses based on the eco-epidemiological models, which are hereby showcased both in continuous and discrete time. Specifically, models in continuous time, here adapted to water-based diseases, are built on sets of coupled ordinary differential equations that consider any relevant hydrological forcing. A model in discrete time, derived from a suitable discretization of an integro-differential model in continuous time, is applied to an airborne disease, COVID-19. 
Appropriate calibration algorithms, based on either a Markov Chain Monte Carlo Bayesian framework or on particle filtering techniques, are implemented to calibrate the models on data on the 2010s Haitian cholera outbreak, the endemic transmission of O. viverrini along the Mekong River, and the COVID-19 pandemic in Italy. Where relevant, human actions, such as vaccinations or non-pharmaceutical interventions, are embedded in the model to account for the reduced transmission. The experiments on the two considered water-based pathogens show that including human mobility and riverine transmission into any relevant model is essential to correctly capture the pathogen’s spread and, therefore, design appropriate containment measures that, among other things, also target spatial transmission. In addition, a new framework for the computation of effective reproduction numbers based on epidemiological data and mobility fluxes indicates that including the latter into renewal equations may often produce different values of the epidemiological indicators. This suggests that failing to include spatial transmission may misrepresent the actual epidemiological situation. The implications of these results are diverse. On the one hand, they suggest that embedding spatial connectivity into the epidemiological models substantially helps design containment measures to curb the spread of the disease. On the other hand, owing to the more precise nature of the epidemiological metrics computed within a spatially explicit framework, so-computed reproduction numbers and epidemicity indices can improve our surveillance systems and function as early-warning indicators that may anticipate future outbreaks.
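The renewal-equation idea mentioned above can be illustrated with a minimal, non-spatial sketch. This is not the thesis's mobility-aware framework: it is the plain single-community estimate R_t = I_t / Σ_s w_s I_{t-s}, and the function name, the `generation_weights` parameter, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def effective_reproduction_number(
    incidence: Sequence[float],
    generation_weights: Sequence[float],
) -> List[Optional[float]]:
    """Estimate R_t = I_t / sum_{s>=1} w_s * I_{t-s} for each time step t.

    generation_weights[s - 1] is the assumed probability that a secondary
    infection occurs s time steps after the primary one (hypothetical input).
    """
    estimates: List[Optional[float]] = []
    for t in range(len(incidence)):
        # Infection pressure: past cases weighted by the generation interval.
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(generation_weights, start=1)
            if t - s >= 0
        )
        # R_t is undefined while there is no past incidence to weigh.
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Toy series: cases double every step and all transmission happens one step later.
toy_estimates = effective_reproduction_number([1, 2, 4, 8], [1.0])
```

A spatially explicit variant would add to the denominator the cases imported from connected communities, which is why the two estimates can differ.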
Lausanne, EPFL, 2024.
2023
Privacy-preserving contact tracing curbed COVID
Despite controversies over decentralized contact-tracing apps, the data now show that they saved thousands of lives during the pandemic. National and international authorities must heed the lessons.
Nature. 2023. Vol. 619, num. 7968, p. 31 – 33. DOI : 10.1038/d41586-023-02130-6.
Hits and Misses: Digital Contact Tracing in a Pandemic
Traditional contact tracing is one of the most powerful weapons people have in the battle against a pandemic, especially when vaccines do not yet exist or do not afford complete protection from infection. But the effectiveness of contact tracing hinges on its ability to find infected people quickly and obtain accurate information from them. Therefore, contact tracing inherits the challenges associated with the fallibilities of memory. Against this backdrop, digital contact tracing is the “dream scenario”: an unobtrusive, vigilant, and accurate recorder of danger that should outperform manual contact tracing on every dimension. There is reason to celebrate the success of digital contact tracing. Indeed, epidemiologists report that digital contact tracing probably reduced the incidence of COVID-19 cases by at least 25% in many countries, a feat that would have been hard to match with its manual counterpart. Yet there is also reason to speculate that digital contact tracing delivered on only a fraction of its potential because it almost completely ignored the relevant psychological science. We discuss the strengths and weaknesses of digital contact tracing, its hits and misses in the COVID-19 pandemic, and its need to be integrated with the science of human behavior.
Perspectives On Psychological Science. 2023. DOI : 10.1177/17456916231179365.
COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter
Introduction: This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model that is pre-trained on a large corpus of COVID-19 related Twitter messages. CT-BERT is specifically designed to be used on COVID-19 content, particularly from social media, and can be utilized for various natural language processing tasks such as classification, question-answering, and chatbots. This paper aims to evaluate the performance of CT-BERT on different classification datasets and compare it with BERT-LARGE, its base model. Methods: The study utilizes CT-BERT, which is pre-trained on a large corpus of COVID-19 related Twitter messages. The authors evaluated the performance of CT-BERT on five different classification datasets, including one in the target domain. The model’s performance is compared to its base model, BERT-LARGE, to measure the marginal improvement. The authors also provide detailed information on the training process and the technical specifications of the model. Results: The results indicate that CT-BERT outperforms BERT-LARGE with a marginal improvement of 10-30% on all five classification datasets. The largest improvements are observed in the target domain. The authors provide detailed performance metrics and discuss the significance of these results. Discussion: The study demonstrates the potential of pre-trained transformer models, such as CT-BERT, for COVID-19 related natural language processing tasks. The results indicate that CT-BERT can improve the classification performance on COVID-19 related content, especially on social media. These findings have important implications for various applications, such as monitoring public sentiment and developing chatbots to provide COVID-19 related information. The study also highlights the importance of using domain-specific pre-trained models for specific natural language processing tasks. Overall, this work provides a valuable contribution to the development of COVID-19 related NLP models.
Frontiers In Artificial Intelligence. 2023. Vol. 6, p. 1023281. DOI : 10.3389/frai.2023.1023281.
Dynamics of social media behavior before and after SARS-CoV-2 infection
Introduction: Online social media have been both a field of research and a source of data for research since the beginning of the COVID-19 pandemic. In this study, we aimed to determine how and whether the content of tweets by Twitter users reporting SARS-CoV-2 infections changed over time. Methods: We built a regular expression to detect users reporting being infected, and we applied several Natural Language Processing methods to assess the emotions, topics, and self-reports of symptoms present in the timelines of the users. Results: Twelve thousand one hundred and twenty-one Twitter users matched the regular expression and were considered in the study. We found that the proportions of health-related, symptom-containing, and emotionally non-neutral tweets increased after users had reported their SARS-CoV-2 infection on Twitter. Our results also show that the number of weeks accounting for the increased proportion of symptoms was consistent with the duration of the symptoms in clinically confirmed COVID-19 cases. Furthermore, we observed a high temporal correlation between self-reports of SARS-CoV-2 infection and officially reported cases of the disease in the largest English-speaking countries. Discussion: This study confirms that automated methods can be used to find digital users publicly sharing information about their health status on social media, and that the associated data analysis may supplement clinical assessments made in the early phases of the spread of emerging diseases. Such automated methods may prove particularly useful for newly emerging health conditions that are not rapidly captured in the traditional health systems, such as the long-term sequelae of SARS-CoV-2 infections.
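The regular-expression step described in the Methods can be sketched as follows. The pattern here is hypothetical and far simpler than anything a real study would need; it is not the expression used by the authors, and the names `SELF_REPORT` and `is_self_report` are assumptions for illustration only.

```python
import re

# Hypothetical pattern (NOT the one used in the study): first-person
# statements of a positive test for COVID-19 / coronavirus / SARS-CoV-2.
SELF_REPORT = re.compile(
    r"\b(?:i|i've|i have)\s+(?:just\s+)?tested\s+positive\s+for\s+"
    r"(?:covid(?:-?19)?|coronavirus|sars-cov-2)\b",
    re.IGNORECASE,
)


def is_self_report(tweet: str) -> bool:
    """Return True if the tweet looks like a first-person infection report."""
    return SELF_REPORT.search(tweet) is not None


examples = [
    "I tested positive for COVID-19 yesterday",
    "I've just tested positive for coronavirus",
    "My brother tested positive for covid",
]
flags = [is_self_report(text) for text in examples]
```

Requiring a first-person subject is one simple way to avoid counting reports about other people, at the cost of missing differently phrased self-reports.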
Frontiers In Public Health. 2023. Vol. 10, p. 1069931. DOI : 10.3389/fpubh.2022.1069931.
Timeliness of online COVID-19 reports from official sources
Introduction: Making epidemiological indicators for COVID-19 publicly available through websites and social media can support public health experts in the near-real-time monitoring of the situation worldwide, and in the establishment of rapid response and public health measures to reduce the consequences of the pandemic. Little is known, however, about the timeliness of such sources. Here, we assess the timeliness of official public COVID-19 sources for the WHO regions of Europe and Africa. Methods: We monitored official websites and social media accounts for updates and calculated the time difference between daily updates on COVID-19 cases. We covered a time period of 52 days and a geographic range of 62 countries, 28 from the WHO African region and 34 from the WHO European region. Results: The most prevalent categories were social media updates only (no website reporting) in the WHO African region (32.7% of the 1,092 entries), and updates in both social media and websites in the WHO European region (51.9% of the 884 entries for EU/EEA countries, and 73.3% of the 884 entries for non-EU/EEA countries), showing an overall clear tendency to use social media as an official source to report on COVID-19 indicators. We further show that the time differences for each source group and geographical region were statistically significant in all WHO regions, indicating a tendency to focus on one of the two sources instead of using both as complementary sources. Discussion: Public health communication via social media platforms has numerous benefits, but it is worthwhile to do it in combination with other, more traditional means of communication, such as websites or offline communication.
Frontiers In Public Health. 2023. Vol. 10, p. 1027812. DOI : 10.3389/fpubh.2022.1027812.
Empathetic Conversational Agents for Distress Support
Due to the increasing demands of today’s fast-paced world, mental health concerns are on the rise, which necessitates innovative approaches to provide support to those in need. Open-domain conversational agents known as chatbots, offer a unique opportunity to provide empathetic support to individuals struggling with psychological distress. By combining the advancements in natural language processing, such as the advent of large language models and machine learning techniques that can understand human emotions, empathetic chatbots can establish meaningful connections, provide support in distress, and promote mental well-being. This thesis aims to develop empathetic conversational agents that are capable of providing emotional support to people undergoing distress. They are designed in a way such that they offer a reliable space for individuals to express their feelings and motivate them to navigate their emotional challenges and cope with them, ultimately leading to enhanced mental well-being. However, developing such chatbots poses significant challenges such as understanding subtle variations in human emotion, overcoming limitations in training data, ensuring interpretability and reliability of responses, and adhering to established psychological norms and professional tone when responding to distressing situations. In this thesis, we develop resources and methods to address the above challenges and attempt to pave the way for a more compassionate and accessible approach to emotional well-being. To achieve this goal, first, we look at subtle emotional variations present in human conversations and communication strategies humans use to convey empathy, which form the foundation for developing more controllable and interpretable chatbot models that can respond to a wide range of emotions. Then we narrow our attention toward the more challenging task of responding empathetically to extremely negative emotions in psychologically distressing situations. 
Analyzing dialogues from online peer support forums, we build a knowledge graph that identifies a multitude of distress-related topics and emotionally relieving responses associated with them, facilitating the development of more reliable and topically appropriate chatbot models for distress support. Moving a step further, we analyze the differences in language used by laypersons and professionals when responding to distress and guided by these observations, develop methods to enhance chatbots’ professional tone and adherence to therapeutic norms. Overall, this thesis contributes to the advancement of empathetic chatbots that can provide safe, dependable, and professional assistance to users in need.
Lausanne, EPFL, 2023.
2022
Deploying Decentralized, Privacy-Preserving Proximity Tracing
Communications Of The ACM. 2022. Vol. 65, num. 9, p. 48 – 57. DOI : 10.1145/3524107.
The Food Recognition Benchmark: Using Deep Learning to Recognize Food in Images
The automatic recognition of food in images has numerous interesting applications, including nutritional tracking in medical cohorts. The problem has received significant research attention, but an ongoing public benchmark on non-biased (i.e., not scraped from web) data to develop open and reproducible algorithms has been missing. Here, we report on the setup of such a benchmark using publicly available food images sourced through the mobile MyFoodRepo app used in research cohorts. Through four rounds, the benchmark released the MyFoodRepo-273 dataset constituting 24,119 images and a total of 39,325 segmented polygons categorized in 273 different classes. Models were evaluated on private test sets from the same platform with 5,000 images and 7,865 annotations in the final round. Top-performing models on the 273 food categories reached a mean average precision of 0.568 (round 4) and a mean average recall of 0.885 (round 3), and were deployed in production use of the MyFoodRepo app. We present experimental validation of round 4 results, and discuss implications of the benchmark setup designed to increase the size and diversity of the dataset for future rounds.
Frontiers In Nutrition. 2022. Vol. 9, p. 875143. DOI : 10.3389/fnut.2022.875143.
Associations Between Device-Measured Physical Activity and Glycemic Control and Variability Indices Under Free-Living Conditions
Background: Disturbances of glycemic control and large glycemic variability have been associated with increased risk of type 2 diabetes in the general population as well as complications in people with diabetes. Long-term health benefits of physical activity are well documented but less is known about the timing of potential short-term effects on glycemic control and variability in free-living conditions. Materials and Methods: We analyzed data from 85 participants without diabetes from the Food & You digital cohort. During a 2-week follow-up, device-based daily step count was studied in relationship to glycemic control and variability indices using generalized estimating equations. Glycemic indices, evaluated using flash glucose monitoring devices (FreeStyle Libre), included minimum, maximum, mean, standard deviation, and coefficient of variation of daily glucose values, the glucose management indicator, and the approximate area under the sensor glucose curve. Results: We observed that every 1000 steps/day increase in daily step count was associated with a 0.3588 mg/dL (95% confidence interval [CI]: -0.6931 to -0.0245), a 0.0917 mg/dL (95% CI: -0.1793 to -0.0042), and a 0.0022% (95% CI: -0.0043 to -0.0001) decrease in the maximum glucose values, mean glucose, and in the glucose management indicator of the following day, respectively. We did not find any association between daily step count and glycemic indices from the same day. Conclusions: Increasing physical activity level was linked to blunted glycemic excursions during the next day. Because health-related benefits of physical activity can be long to observe, such short-term physiological benefits could serve as personalized feedback to motivate individuals to engage in healthy behaviors.
Diabetes Technology & Therapeutics. 2022. Vol. 24, num. 3, p. 167 – 177. DOI : 10.1089/dia.2021.0294.
Visible-Light-Driven Water Oxidation on Self-Assembled Metal-Free Organic@Carbon Junctions at Neutral pH
Sustainable water oxidation requires low-cost, stable, and efficient redox couples, photosensitizers, and catalysts. Here, we introduce the in situ self-assembly of metal-atom-free organic-based semiconductive structures on the surface of carbon supports. The resulting TTF/TTF•+@carbon junction (TTF = tetrathiafulvalene) acts as an all-in-one highly stable redox-shuttle/photosensitizer/molecular-catalyst triad for the visible-light-driven water oxidation reaction (WOR) at neutral pH, eliminating the need for metallic or organometallic catalysts and sacrificial electron acceptors. A water/butyronitrile emulsion was used to physically separate the photoproducts of the WOR, H+ and TTF, allowing the extraction and subsequent reduction of protons in water, and the in situ electrochemical oxidation of TTF to TTF•+ on carbon in butyronitrile by constant anode potential electrolysis. During 100 h, no decomposition of TTF was observed and O2 was [...] from the emulsion while H2 was [...] in the [...]. This work [...] new [...] for a new generation of metal-atom-free, low-cost, redox-driven water-splitting strategies.
JACS Au. 2022. Vol. 1, num. 12, p. 2294 – 2302. DOI : 10.1021/jacsau.1c00408.
2021
Clusters of science and health related Twitter users become more isolated during the COVID-19 pandemic
COVID-19 represents the most severe global crisis to date whose public conversation can be studied in real time. To do so, we use a data set of over 350 million tweets and retweets posted by over 26 million English-speaking Twitter users from January 13 to June 7, 2020. We characterize the retweet network to identify spontaneous clustering of users and the evolution of their interaction over time in relation to the pandemic’s emergence. We identify several stable clusters (super-communities), and are able to link them to international groups mainly involved in science and health topics, national elites, and political actors. The science- and health-related super-community received disproportionate attention early on during the pandemic, and was leading the discussion at the time. However, as the pandemic unfolded, the attention shifted towards both national elites and political actors, paralleled by the introduction of country-specific containment measures and the growing politicization of the debate. The scientific super-community remained present in the discussion, but experienced less reach and became more isolated within the network. Overall, the emerging network communities are characterized by increased self-amplification and polarization. This makes it generally harder for information from international health organizations or scientific authorities to directly reach a broad audience through Twitter for a prolonged time. These results may have implications for information dissemination along the unfolding of long-term events like epidemic diseases on a world-wide scale.
Scientific Reports. 2021. Vol. 11, num. 1, p. 19655. DOI : 10.1038/s41598-021-99301-0.
Digital proximity tracing on empirical contact networks for pandemic control
Digital contact tracing is a relevant tool to control infectious disease outbreaks, including the COVID-19 epidemic. Early work evaluating digital contact tracing omitted important features and heterogeneities of real-world contact patterns influencing contagion dynamics. We fill this gap with a modeling framework informed by empirical high-resolution contact data to analyze the impact of digital contact tracing in the COVID-19 pandemic. We investigate how well contact tracing apps, coupled with the quarantine of identified contacts, can mitigate the spread in real environments. We find that restrictive policies are more effective in containing the epidemic but come at the cost of unnecessary large-scale quarantines. Policy evaluation through their efficiency and cost results in optimized solutions which only consider contacts longer than 15-20 minutes and closer than 2-3 meters to be at risk. Our results show that isolation and tracing can help control re-emerging outbreaks when some conditions are met: (i) a reduction of the reproductive number through masks and physical distance; (ii) a low-delay isolation of infected individuals; (iii) a high compliance. Finally, we observe the inefficacy of a less privacy-preserving tracing involving second order contacts. Our results may inform digital contact tracing efforts currently being implemented across several countries worldwide. Digital contact tracing is increasingly considered as one of the tools to control infectious disease outbreaks, in particular the COVID-19 epidemic. Here, the authors present a modeling framework informed by empirical high-resolution contact data to analyze the impact of digital contact tracing apps.
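The duration and distance rule quoted above (contacts longer than 15-20 minutes and closer than 2-3 meters) can be written as a small filter. This is only an illustrative sketch: the `Contact` type, the function name, and the default thresholds (the lower end of each reported range) are assumptions, not the paper's modeling code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contact:
    """One recorded proximity event (hypothetical data structure)."""
    duration_min: float
    distance_m: float


def is_at_risk(
    contact: Contact,
    min_duration_min: float = 15.0,  # assumed default, lower end of 15-20 min
    max_distance_m: float = 2.0,     # assumed default, lower end of 2-3 m
) -> bool:
    """Flag contacts that are both long enough and close enough."""
    return (
        contact.duration_min > min_duration_min
        and contact.distance_m < max_distance_m
    )


contacts: List[Contact] = [
    Contact(duration_min=30, distance_m=1.0),  # long and close
    Contact(duration_min=5, distance_m=1.0),   # too short
    Contact(duration_min=30, distance_m=5.0),  # too far
]
to_quarantine = [c for c in contacts if is_at_risk(c)]
```

Loosening either threshold flags more contacts, which mirrors the trade-off in the abstract between containment and unnecessary quarantines.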
Nature Communications. 2021. Vol. 12, num. 1, p. 1655. DOI : 10.1038/s41467-021-21809-w.
Supervised Learning Computer Vision Benchmark for Snake Species Identification From Photographs: Implications for Herpetology and Global Health
We trained a computer vision algorithm to identify 45 species of snakes from photos and compared its performance to that of humans. Both human and algorithm performance is substantially better than randomly guessing (null probability of guessing correctly given 45 classes = 2.2%). Some species (e.g., Boa constrictor) are routinely identified with ease by both algorithm and humans, whereas other groups of species (e.g., uniform green snakes, blotched brown snakes) are routinely confused. A species complex with largely molecular species delimitation (North American ratsnakes) was the most challenging for computer vision. Humans had an edge at identifying images of poor quality or with visual artifacts. With future improvement, computer vision could play a larger role in snakebite epidemiology, particularly when combined with information about geographic location and input from human experts.
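The 2.2% chance level quoted above follows from guessing uniformly among 45 classes. A minimal check, with a function name chosen here purely for illustration:

```python
def chance_accuracy(n_classes: int) -> float:
    """Probability of a correct guess when picking uniformly among n_classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return 1.0 / n_classes


# 45 snake species, as in the abstract: 1 / 45 is about 0.022, i.e. 2.2%.
baseline_percent = round(100 * chance_accuracy(45), 1)
```

This baseline assumes every species is equally likely to be guessed; a guesser that always picks the most common species in an imbalanced test set would do better.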
Frontiers In Artificial Intelligence. 2021. Vol. 4, p. 582110. DOI : 10.3389/frai.2021.582110.
On the use of applied machine learning and digital infrastructure to leverage social media data in health and epidemiology
The quantification of population-level health behaviors is crucial for guiding public health policy. However, traditional methods for measuring such health behaviors have several shortcomings. In recent years social media data has been successfully used to measure health behaviors and may be used as a low-cost and real-time addition to traditional data sources. Methods from the field of natural language processing are increasingly used to automatically process, filter and categorize the rapidly growing amount of publicly available social media data. However, a number of methodological challenges limit the rate at which we can generate insight from such data. In this work I will argue that long-term investment into digital infrastructure and open source tooling is required in order to overcome these challenges. In chapter 2 we introduce the Crowdbreaks platform which is the basis of this thesis. Crowdbreaks is an open source framework for real-time data collection, continuous crowdsourced annotation, and continuous re-training of machine learning classifiers. In contrast to traditional research workflows, projects on Crowdbreaks run over an extended period of time, allowing for the observation of health trends over multiple years while keeping algorithms up-to-date. In chapter 3 we quantify the occurrence of concept drift in vaccine-related Twitter data, which further validates the need for the Crowdbreaks platform. In chapter 4 we use the Crowdbreaks platform to trace sentiment towards the novel gene-editing technology CRISPR/Cas9 back to its first application in 2013 and investigate how public opinion may have been affected in context of recent scandals surrounding the technology. In chapter 5 we turn our attention to the COVID-19 pandemic and analyze who was speaking and who was heard in the early months of the pandemic. Chapter 6 builds on this work and explores the dynamics of Twitter communities during the COVID-19 pandemic. 
Lastly, in chapter 7 we introduce COVID-Twitter-BERT, a domain-specific language model which has been used in various downstream natural language processing applications on COVID-19-related Twitter data.
Lausanne, EPFL, 2021.
Learning Self-Exciting Temporal Point Processes Under Noisy Observations
Understanding the diffusion patterns of sequences of interdependent events is a central question for a variety of disciplines. Temporal point processes are a class of elegant and powerful models of such sequences; these processes have become popular across multiple fields of research due to the increasing availability of data that captures the occurrence of events over time. A notable example is the Hawkes process. It was originally introduced by Alan Hawkes in 1971 to model the diffusion of earthquakes and was subsequently applied across fields such as epidemiology, neuroscience, criminology, finance, genomics, and social-network analysis. A central question in these fields is the inverse problem of uncovering the diffusion patterns of the events from the observed data. The methods for solving this inverse problem assume that, in general, the data is noiseless. However, real-world observations are frequently tainted by noise in a number of ways. Most existing methods are not robust against noise and, in the presence of even a small amount of noise in the data, they might completely fail to recover the underlying dynamics. In this thesis, we remedy this shortcoming and address this problem for several types of observational noise. First, we study the effects of small event-streams that are known to make the learning task challenging by amplifying the risk of overfitting. Using recent advances in variational inference, we introduce a new algorithm that leads to better regularization schemes and provides a measure of uncertainty on the estimated parameters. Second, we consider events corrupted by unknown synchronized time delays. We show that the so-called synchronization noise introduces a bias in the existing estimation methods, which must be handled with care. We provide an algorithm to robustly learn the diffusion dynamics of the underlying process under this class of synchronized delays.
Third, we introduce a wider class of random and unknown time shifts, referred to as random translations, of which synchronization noise is a special case. We derive the statistical properties of Hawkes processes subject to random translations. In particular, we prove that the cumulants of Hawkes processes are invariant to random translations and we show that cumulant-based algorithms can be used to learn their underlying causal structure even when unknown time shifts distort the observations. Finally, we consider another class of temporal point processes, the so-called Wold process that solves a computational limitation of the Bayesian treatment of Hawkes processes while retaining similar properties. We address the problem of learning the parameters of a Wold process by relaxing some of the restrictive assumptions made in the state of the art and by introducing a Bayesian approach for inferring its parameters. In summary, the results presented in this dissertation highlight the shortcomings of standard inference methods used to fit temporal point processes. Consequently, these results deepen our ability to extract reliable insights from networks of interdependent event streams.
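To make the self-exciting mechanism concrete, the following is a minimal Python sketch of the conditional intensity of a univariate Hawkes process. The exponential kernel, the parameter values (mu, alpha, beta) and the function name are illustrative assumptions; the abstract does not specify a kernel, and this sketch does not reproduce any of the thesis's inference algorithms.

```python
import math

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu is the baseline rate; each past event adds a jump of size alpha
    that decays exponentially at rate beta (assumed kernel).
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

# Hypothetical event timestamps, for illustration only.
events = [1.0, 1.3, 4.0]

baseline = hawkes_intensity(0.5, events)  # before any event: only the baseline rate
excited = hawkes_intensity(1.4, events)   # shortly after two events: elevated rate
relaxed = hawkes_intensity(3.9, events)   # long after them: decayed back towards mu
```

With this kernel, the ratio alpha / beta is the expected number of events directly triggered by one event; keeping it below 1, as in the values assumed here, keeps the process stable.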
Lausanne, EPFL, 2021.
Toward a Common Performance and Effectiveness Terminology for Digital Proximity Tracing Applications
Frontiers in Digital Health. 2021. Vol. 3. DOI : 10.3389/fdgth.2021.677929.
CrowdNotifier: Decentralized Privacy-Preserving Presence Tracing
There is growing evidence that SARS-CoV-2 can be transmitted beyond close proximity contacts, in particular in closed and crowded environments with insufficient ventilation. To help mitigation efforts, contact tracers need a way to notify those who were present in such environments at the same time as infected individuals. Neither traditional human-based contact tracing powered by handwritten or electronic lists, nor Bluetooth-enabled proximity tracing can handle this problem efficiently. In this paper, we propose CrowdNotifier, a protocol that can complement manual contact tracing by efficiently notifying visitors of venues and events with SARS-CoV-2-positive attendees. We prove that CrowdNotifier provides strong privacy and abuse resistance, and show that it can scale to handle notification at a national scale.
2021. p. 350 – 368. DOI : 10.2478/popets-2021-0074.
Modeling infectious disease dynamics towards informed public health interventions, with applications on COVID-19 and cholera
Emerging and existing infectious diseases pose a constant threat to individuals and communities across the world. In many cases, the burden of these diseases is preventable through public health interventions. However, taking the right decisions and designing effective policies is an intricate task: epidemics are complex phenomena resulting from the interaction between the environment, pathogens, individuals, and societies. Modeling offers a principled way to reason about infectious disease dynamics from scarce and biased information and to guide decision-makers towards effective policies. This thesis tackles selected topics in cholera and COVID-19 modeling towards informed public-health decisions. These two contrasting diseases were associated by a twist of fate, but also through the lens of a common modeling approach: compartmental, SIR-based, models are conditioned on the available evidence using computer-age statistical inference frameworks. A set of five models is developed, each tackling a different facet of the spread and control of these two infectious diseases. Each model aims at answering questions related to either the understanding of the mechanisms behind disease transmission, the projection of the future dynamics under different scenarios, or the assessment of the effectiveness of past interventions. Moreover, a novel application of epidemiological models to the formal design of control policies is proposed. Optimal control provides a rigorous framework to identify the most effective control measures under a set of operational constraints, providing a benchmark on what it is possible to achieve with the available resources. The results presented in this thesis range from scientific insight on the relationship between cholera and rainfall in Juba, South Sudan to the COVID Scenario Pipeline which produces reports used to inform the response to the COVID-19 pandemic of different governmental entities. 
Furthermore, the effectiveness of the non-pharmaceutical interventions against COVID-19 in Switzerland is evaluated; and so is the probability of eliminating cholera from Haiti under different scenarios of mass vaccination campaigns. Finally, the development of an optimal control framework towards the effective spatial allocation of vaccines against SARS-CoV-2 in Italy closes this conversation of models. The present thesis demonstrates how infectious disease modeling enables informed decision-making by projecting the uncertainties under the light of the available evidence. It also highlights the effort needed to tailor the models and inference methods to the specificities of the transmission setting and the research question considered. From insights on transmission pathways to weekly reports aimed at decision-makers, it explores different applications of infectious disease modeling. Methods developed along the way enrich the toolbox available to modelers, to guide policy decisions further towards a reduction of the burden of infectious diseases on communities.
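As a rough illustration of the compartmental, SIR-based modeling approach mentioned in this abstract, here is a minimal Python sketch of a deterministic SIR model integrated with forward Euler steps. The function name, the parameter values and the integration scheme are assumptions chosen for illustration; the thesis's models are more elaborate and are fitted to data with statistical inference, which is not shown here.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this model
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory

# Illustrative values only: transmission rate 0.3/day, recovery rate 0.1/day.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=160)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
```

Lowering beta in such a sketch is the simplest way to mimic an intervention that reduces transmission; the optimal control work described in the abstract goes further by searching for the best allocation under operational constraints.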
Lausanne, EPFL, 2021.
2020
Early evidence of effectiveness of digital contact tracing for SARS-CoV-2 in Switzerland
In the wake of the pandemic of coronavirus disease 2019 (COVID-19), contact tracing has become a key element of strategies to control the spread of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Given the rapid and intense spread of SARS-CoV-2, digital contact tracing has emerged as a potential complementary tool to support containment and mitigation efforts. Early modelling studies highlighted the potential of digital contact tracing to break transmission chains, and Google and Apple subsequently developed the Exposure Notification (EN) framework, making it available to the vast majority of smartphones. A growing number of governments have launched or announced EN-based contact tracing apps, but their effectiveness remains unknown. Here, we report early findings of the digital contact tracing app deployment in Switzerland. We demonstrate proof-of-principle that digital contact tracing reaches exposed contacts, who then test positive for SARS-CoV-2. This indicates that digital contact tracing is an effective complementary tool for controlling the spread of SARS-CoV-2. Continued technical improvement and international compatibility can further increase the efficacy, particularly also across country borders.
Swiss Medical Weekly. 2020. Vol. 150, p. w20457. DOI : 10.4414/smw.20457.
A digital reconstruction of the 1630–1631 large plague outbreak in Venice
The plague, an infectious disease caused by the bacterium Yersinia pestis, is widely considered to be responsible for the most devastating and deadly pandemics in human history. Starting with the infamous Black Death, plague outbreaks are estimated to have killed around 100 million people over multiple centuries, with local mortality rates as high as 60%. However, detailed pictures of the disease dynamics of these outbreaks centuries ago remain scarce, mainly due to the lack of high-quality historical data in digital form. Here, we present an analysis of the 1630–1631 plague outbreak in the city of Venice, using newly collected daily death records. We identify the presence of a two-peak pattern, for which we present two possible explanations based on computational models of disease dynamics. Systematically digitized historical records like the ones presented here promise to enrich our understanding of historical phenomena of enduring importance. This work contributes to the recently renewed interdisciplinary foray into the epidemiological and societal impact of pre-modern epidemics.
Scientific Reports. 2020. Vol. 10, num. 1, p. 17849. DOI : 10.1038/s41598-020-74775-6.
Assessing Public Opinion on CRISPR-Cas9: Combining Crowdsourcing and Deep Learning
Conclusions: Overall, this type of analysis can provide valuable and complementary insights into ongoing public debates, extending the traditional empirical bioethics toolset.
Journal Of Medical Internet Research. 2020. Vol. 22, num. 8, p. e17830. DOI : 10.2196/17830.
Self-Supervised Prototypical Transfer Learning for Few-Shot Classification
Recent advances in transfer learning and few-shot learning largely rely on annotated data related to the goal task during (pre-)training. However, collecting sufficiently similar and annotated data is often infeasible. Building on advances in self-supervised and few-shot learning, we propose to learn a metric embedding that clusters unlabeled samples and their augmentations closely together. This pre-trained embedding serves as a starting point for classification with limited labeled goal task data by summarizing class clusters and fine-tuning. Experiments show that our approach significantly outperforms state-of-the-art unsupervised meta-learning approaches, and is on par with supervised performance. In a cross-domain setting, our approach is competitive with its classical fully supervised counterpart.
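The idea of "summarizing class clusters" can be sketched as nearest-prototype classification: each class is represented by the mean of its few labeled embeddings, and a query is assigned to the closest mean. The Python sketch below assumes that reading; the embeddings are made-up 2-D vectors, the function names are hypothetical, and the self-supervised pre-training and fine-tuning steps of the paper are not shown.

```python
import math

def class_prototypes(support):
    """Average the embeddings of each class's few labeled examples."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical embeddings: two classes with two labeled examples each.
support = {
    "a": [[0.0, 0.0], [0.0, 2.0]],
    "b": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = class_prototypes(support)
prediction = classify([1.0, 1.0], prototypes)
```

Euclidean distance is used here for simplicity; the choice of distance is part of the assumption, not something stated in the abstract.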
2020. 7th ICML Workshop on Automated Machine Learning (AutoML 2020), Vienna, Austria, Jul 12, 2020 – Jul 18, 2020.
A research agenda for digital proximity tracing apps
Swiss Medical Weekly. 2020. Vol. 150, p. w20324. DOI : 10.4414/smw.2020.20234.
Keep calm and carry on vaccinating: Is anti-vaccination sentiment contributing to declining vaccine coverage in England?
Background: In England, coverage for childhood vaccines has decreased since 2012/13 in the context of an increasingly visible anti-vaccination discourse. We determined whether anti-vaccination sentiment is the likely cause of this decline in coverage. Methods: Descriptive study triangulating a range of data sources (vaccine coverage, cross-sectional survey of attitudes towards vaccination, UK-specific Twitter social media) and assessing them against the following Bradford Hill criteria: strength of association, consistency, specificity, temporality, biological gradient and coherence. Results: Strength of association: compared with well-documented vaccine scares, the decline in childhood vaccination seen since 2012/13 is 4-20 times smaller. Consistency: while coverage for completed courses of the hexavalent and meningococcal vaccines decreased by 0.5-1.2 percentage points (pp) between 2017 and 2019, coverage for the first dose of these vaccines increased by 0.5-0.7 pp. Specificity: since 2012/13, coverage decreased for some vaccines (hexavalent, MMR, HPV, shingles) and increased for others (MenACWY, Td/IPV, antenatal pertussis, influenza in 2 years of children), with no age-specific patterns. Temporality and biological gradient: the decline in vaccine coverage was preceded by an increase in vaccine confidence and a decrease in the proportion of parents encountering anti-vaccination materials. Coherence: attitudes towards vaccination expressed on Twitter in the UK became increasingly positive between 2017 and 2019 as vaccine coverage for childhood vaccines decreased. Conclusions: In England, trends in vaccine coverage between 2012/13 and 2018/19 were not homogeneous and varied in magnitude and direction according to vaccine, dose and region. In addition, confidence in vaccines increased during the same period. These findings are not compatible with anti-vaccination sentiment causing a decline in vaccine coverage in England.
Vaccine. 2020. Vol. 38, num. 33, p. 5297 – 5304. DOI : 10.1016/j.vaccine.2020.05.082.
Covid-19 contact tracing: efficacy and privacy
This article assesses the risk-risk trade-off between privacy and efficacy that the use of contact tracing apps entails. It argues for the use of privacy-preserving apps, but highlights potential weaknesses and cautions against allowing digital tracing to detract from other pillars in the pandemic response.
2020
COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation
Switzerland is among the countries with the highest number of coronavirus disease-2019 (COVID-19) cases per capita in the world. There are likely many people with undetected SARS-CoV-2 infection because testing efforts are currently not detecting all infected people, including some with clinical disease compatible with COVID-19. Testing on its own will not stop the spread of SARS-CoV-2. Testing is part of a strategy. The World Health Organization recommends a combination of measures: rapid diagnosis and immediate isolation of cases, rigorous tracking and precautionary self-isolation of close contacts. In this article, we explain why the testing strategy in Switzerland should be strengthened urgently, as a core component of a combination approach to control COVID-19.
Swiss Medical Weekly. 2020. Vol. 150, p. w202205. DOI : 10.4414/smw.2020.20225.
Contact networks across ages: computational analysis of epidemiological dynamics
For decades, mathematical modeling in epidemiology has helped us understand the dynamics of infectious diseases and describe possible intervention scenarios to prevent and control them. However, such models relied on several assumptions, such as those on the structure of the underlying contact networks. The robustness of their predictions was therefore limited by this lack of knowledge. About a decade ago, with the advent of digital epidemiology, scientists finally started to test those assumptions, for instance by using wearable sensors to measure contact networks directly. In this thesis, together with collaborators, I try to combine the digital collection of public health data with computational tools in order to gain a more realistic understanding of the phenomena under consideration. In two projects it was possible to complete this combination, thanks also to fruitful collaborations with other researchers who provided the data. This is the case for the two chapters on the modeling of influenza and plague outbreaks, respectively. Although they involve different data collection technologies, historical epochs and data types, traditional epidemiological modeling allowed us to derive interpretable conclusions that can, for instance, inform public health interventions. In other projects, either the relevant data collection is still ongoing in the lab (as for the FoodRepo project), or the data collection has not started yet (as for the project on measles), although our work provides insights on the importance of such data collection for future studies. In the first chapter, we explore different mechanistic interpretations compatible with our data on the 1630 plague outbreak in Venice, collected through the digitisation of parish books from the historical State Archives of Venice. The data show a non-trivial temporal structure, which led us to propose a few different epidemiological explanations.
Further data collection will be needed to better constrain such interpretations. In the second chapter, we use previously recorded contact data in a high school to assess the effect of ventilation on influenza spread relative to vaccination strategies. Our results suggest the usefulness of non-pharmacological interventions such as improved ventilation, which become even more meaningful in the context of vaccination hesitancy and low vaccine efficacy, due for instance to the high mutation rate of viruses like the influenza virus. In the third chapter, we propose a simple network generation model to try to explain differences in the incidence of highly infectious diseases (such as measles) across countries with similar vaccination coverage. Such differences are one of the main open questions in public health and are not yet fully understood, even considering social phenomena such as recent anti-vax movements. In the last chapter, we present our open database of barcoded food products, FoodRepo. On the one hand, this database represents the first piece of a large study ongoing in our lab, in the field of nutritional epidemiology, that aims to assess the variability of glycemic response in a healthy cohort. On the other hand, important features such as its openness and programmatic accessibility make it an important digital tool at the service of any private or public actors in the field of nutrition.
Lausanne, EPFL, 2020.
2019
On the Design of a Youth-Led, Issue-Based, Crowdsourced Global Monitoring Framework for the SDGs
In this paper, we propose a novel methodology and design to contribute towards the achievement of the 17 Sustainable Development Goals (SDGs) adopted by member states of the United Nations for a better and more sustainable future for all. We particularly focus on achieving SDG 4.7—using education to ensure all learners acquire the knowledge and skills needed to promote sustainable development. We describe the design of a crowdsourced approach to monitor issues at a local level, and then use the insights gained to indicate how learning can be achieved by the entire community. We begin by encouraging local communities to identify issues that they are concerned about, with an assumption that any issue identified will fall within the purview of the 17 SDGs. Each issue is then tagged with a plurality of actions taken to address it. Finally, we tag the positive or negative changes in the issue as perceived by members of the local community. This data is used to broadly indicate quantitative measures of community learning when solving a societal problem, in turn telling us how SDG 4.7 is being achieved. The paper describes the design of a unique, youth-led, technology-based, bottom-up approach, applicable to communities across the globe, which can potentially ensure transgressive learning through participation of and monitoring by the local community leading to sustainable development
Sustainability. 2019. Vol. 11, num. 23, p. 6839. DOI : 10.3390/su11236839.
Snakebite and snake identification: empowering neglected communities and health-care providers with AI
Lancet Digital Health. 2019. Vol. 1, num. 5, p. E202 – E203. DOI : 10.1016/S2589-7500(19)30086-X.
Assessment of menstrual health status and evolution through mobile apps for fertility awareness
For most women of reproductive age, assessing menstrual health and fertility typically involves regular visits to a gynecologist or another clinician. While these evaluations provide critical information on an individual’s reproductive health status, they typically rely on memory-based self-reports, and the results are rarely, if ever, assessed at the population level. In recent years, mobile apps for menstrual tracking have become very popular, allowing us to evaluate the reliability and tracking frequency of millions of self-observations, thereby providing an unparalleled view, both in detail and scale, on menstrual health and its evolution for large populations. In particular, the primary aim of this study was to describe the tracking behavior of the app users and their overall observation patterns in an effort to understand if they were consistent with previous small-scale medical studies. The secondary aim was to investigate whether their precision allowed the detection and estimation of ovulation timing, which is critical for reproductive and menstrual health. Retrospective self-observation data were acquired from two mobile apps dedicated to the application of the sympto-thermal fertility awareness method, resulting in a dataset of more than 30 million days of observations from over 2.7 million cycles for two hundred thousand users. The analysis of the data showed that up to 40% of the cycles in which users were seeking pregnancy had recordings every single day. With a modeling approach using Hidden Markov Models to describe the collected data and estimate ovulation timing, it was found that the average duration and range of the follicular phase were larger than previously reported, with only 24% of ovulations occurring at cycle days 14 to 15, while the luteal phase duration and range were in line with previous reports, although short luteal phases (10 days or less) were more frequently observed (in up to 20% of cycles).
The digital epidemiology approach presented here can help to lead to a better understanding of menstrual health and its connection to women’s health overall, which has historically been severely understudied.
npj Digital Medicine. 2019. Vol. 2, p. 64. DOI : 10.1038/s41746-019-0139-4.
WHO and ITU establish benchmarking process for artificial intelligence in health
Lancet. 2019. Vol. 394, num. 10192, p. 9 – 11. DOI : 10.1016/S0140-6736(19)30762-7.
Crowdbreaks: Tracking Health Trends Using Public Social Media Data and Crowdsourcing
In the past decade, tracking health trends using social media data has shown great promise, due to a powerful combination of massive adoption of social media around the world, and increasingly potent hardware and software that enables us to work with these new big data streams. At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change, and how rapidly algorithms are updated, which means that there is limited reusability for algorithms trained on past data as their performance decreases over time. Second, much of the work is focusing on specific issues during a specific past period in time, even though public health institutions would need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform which allows tracking of health trends by making use of continuous crowdsourced labeling of public social media content. The system is built in a way which automatizes the typical workflow from data collection, filtering, labeling and training of machine learning classifiers and therefore can greatly accelerate the research process in the public health domain. This work describes the technical aspects of the platform, thereby covering the functionalities at its current state and exploring its future use cases and extensions.
Frontiers In Public Health. 2019. Vol. 7, p. 81. DOI : 10.3389/fpubh.2019.00081.
Wet Markets and Food Safety: TripAdvisor for Improved Global Digital Surveillance
Background: Wet markets are markets selling fresh meat and produce. Wet markets are critical for food security and sustainable development in their respective regions. Due to their cultural significance, they attract numerous visitors and consequently generate tourist-geared information on the Web (ie, on social networks such as TripAdvisor). These data can be used to create a novel, international wet market inventory to support epidemiological surveillance and control in such settings, which are often associated with negative health outcomes. Objective: Using social network data, we aimed to assess the level of wet markets’ touristic importance on the Web, produce the first distribution map of wet markets of touristic interest, and identify common diseases facing visitors in these settings. Methods: A Google search was performed on 31 food market-related keywords, with the first 150 results for each keyword evaluated based on their relevance to tourism. Of all these queries, wet market had the highest number of tourism-related Google Search results; among these, TripAdvisor was the most frequently-occurring travel information aggregator, prompting its selection as the data source for this study. A Web scraping tool (ParseHub) was used to extract wet market names, locations, and reviews from TripAdvisor. The latter were searched for disease-related content, which enabled assignment of GeoSentinel diagnosis codes to each. This syndromic categorization was overlaid onto a mapping of wet market locations. Regional prevalence of the most commonly occurring symptom group – food poisoning – was then determined (ie, by dividing the number of wet markets per continent with more than or equal to 1 review containing this syndrome by the total number of wet markets on that continent with syndromic information). 
Results: Of the 1090 hits on TripAdvisor for wet market, 36.06% (393/1090) conformed to the query’s definition; wet markets were heterogeneously distributed: Asia concentrated 62.6% (246/393) of them, Europe 19.3% (76/393), North America 7.9% (31/393), Oceania 5.1% (20/393), Africa 3.1% (12/393), and South America 2.0% (8/393). Syndromic information was available for 14.5% (57/393) of wet markets. The most frequently occurring syndrome among visitors to these wet markets was food poisoning, accounting for 54% (51/95) of diagnoses. Cases of this syndrome were identified in 56% (22/39) of wet markets with syndromic information in Asia, 71% (5/7) in Europe, and 71% (5/7) in North America. All wet markets in South America and Oceania reported food poisoning cases, but the number of reviews with syndromic information was very limited in these regions (n=2). Conclusions: The map produced illustrates the potential role of touristically relevant social network data to support global epidemiological surveillance. This includes the possibility to approximate the global distribution of wet markets and to identify diseases (ie, food poisoning) that are most prevalent in such settings.
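The regional prevalence defined in the Methods (wet markets with at least one review mentioning food poisoning, divided by wet markets with any syndromic information on that continent) can be reproduced from the counts reported in the Results. The short Python sketch below does this for the three continents whose counts are given explicitly; the function and variable names are illustrative.

```python
# (markets with >= 1 food-poisoning review, markets with syndromic information),
# taken from the counts quoted in the Results above.
counts = {
    "Asia": (22, 39),
    "Europe": (5, 7),
    "North America": (5, 7),
}

def regional_prevalence(with_syndrome, with_info):
    """Percentage of markets with syndromic information that report the syndrome."""
    return 100.0 * with_syndrome / with_info

prevalence = {
    region: round(regional_prevalence(*pair)) for region, pair in counts.items()
}
print(prevalence)  # {'Asia': 56, 'Europe': 71, 'North America': 71}
```

The small denominators (7 markets each for Europe and North America) explain why the abstract cautions against over-interpreting regions with little syndromic information.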
Jmir Public Health And Surveillance. 2019. Vol. 5, num. 2, p. e11477286 – 291. DOI : 10.2196/11477.
Chemokine profiling in serum from patients with ovarian cancer reveals candidate biomarkers for recurrence and immune infiltration
The management of advanced ovarian cancer is challenging due to the high frequency of recurrence, often associated with the development of resistance to platinum-based chemotherapy. Molecular analyses revealed the complexity of ovarian cancer with particular emphasis on the immune system, which may contribute to disease progression and response to treatment. Cytokines and chemokines mediate the cross-talk between cancer and immune cells, and therefore present as potential biomarkers, reflecting the tumor microenvironment. A panel of circulating C-C motif chemokine ligand (CCL) and C-X-C motif chemokine ligand (CXCL) chemokines was examined in the serum of 40 patients with high-grade ovarian cancer prior to primary surgery. The level of immune infiltration in tumors was also analyzed. The preoperative levels of chemokines differed between patients. Elevated levels of the circulating CXCL4 + CCL20 + CXCL1 combination can discriminate patients with shorter recurrence-free survival and overall survival. The presence of tumor-infiltrating T lymphocytes was detected in half of the patients. The mRNA expression analysis suggests the presence of antitumoral and immunosuppressive elements in the tumor microenvironment. The combination of circulating CXCL9 + CXCL10 can distinguish immune-infiltrated tumors that will lead to shorter recurrence-free survival. The results suggest that preoperative profiling of circulating chemokines in patients with ovarian cancer may provide valuable information regarding tumor recurrence and immune infiltration. The findings demonstrate that combinations have better prognostic utility than single chemokines, and may serve as patient stratification tools.
Oncology Reports. 2019. Vol. 41, num. 2, p. 1238 – 1252. DOI : 10.3892/or.2018.6886.
Assessing the Dynamics and Control of Droplet- and Aerosol-Transmitted Influenza Using an Indoor Positioning System
There is increasing evidence that aerosol transmission is a major contributor to the spread of influenza. Despite this, virtually all studies assessing the dynamics and control of influenza assume that it is transmitted solely through direct contact and large droplets, requiring close physical proximity. Here, we use wireless sensors to measure simultaneously both the location and close proximity contacts in the population of a US high school. This dataset, highly resolved in space and time, allows us to model both droplet and aerosol transmission either in isolation or in combination. In particular, it allows us to computationally quantify the potential effectiveness of overlooked mitigation strategies such as improved ventilation that are available in the case of aerosol transmission. Our model suggests that recommendation-abiding ventilation could be as effective in mitigating outbreaks as vaccinating approximately half of the population. In simulations using empirical transmission levels observed in households, we find that bringing ventilation to recommended levels had the same mitigating effect as a vaccination coverage of 50% to 60%. Ventilation is an easy-to-implement strategy that has the potential to support vaccination efforts for effective control of influenza spread.
Scientific Reports. 2019. Vol. 9, num. 1, p. 2185. DOI : 10.1038/s41598-019-38825-y.
An Interactive Gameplay to Crowdsource Multiple Sequence Alignment of Genome Sequences: Genenigma
Comparative genomics is a field of research that compares genomes of different organisms to identify common patterns. It is a powerful method used to identify the mutations that cause genetic diseases. Multiple Sequence Alignment (MSA) is an intermediate step in comparative genomics analysis that aligns three or more biological sequences of similar length. MSA is an NP-hard problem, for which no known algorithm finds an exact solution in a reasonable amount of time. However, humans have developed, over the course of evolution, a special intuition for identifying visual patterns in short periods of time. Hence, a citizen science approach can be devised to solve the MSA problem by transforming it into a human computing game on creating visually similar patterns. In this paper, we introduce the mobile game “Genenigma”, which harnesses the human computing capability to align multiple sequences of genomes and use the results to help geneticists understand the genetic code. The usability and performance scores of “Genenigma” predict a larger user base than existing mobile games built for this purpose.
2019. 9th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), Singapore, Singapore, Jan 07-09, 2019. p. 28 – 35. DOI : 10.1145/3314367.3314374.
2018
Translating Science Into Business Innovation: The Case of Open Food and Nutrition Data Hackathons
In this article, we explore the use of hackathons and open data in corporations’ open innovation portfolios, addressing a new way for companies to tap into the creativity and innovation of early-stage startup culture, in this case applied to the food and nutrition sector. We study the first Open Food Data Hackdays, held on 10–11 February 2017 in Lausanne and Zurich. The overall project of which the Hackdays event was a part aimed to use open food and nutrition data as a driver for business innovation. We see hackathons as a new tool in the innovation manager’s toolkit, a kind of live crowdsourcing exercise that goes beyond traditional ideation and develops a variety of prototypes and new ideas for business innovation. Companies then have the option of working with entrepreneurs and taking some of the ideas forward.
Frontiers in Nutrition. 2018. Vol. 5, num. 96, p. 1 – 6. DOI : 10.3389/fnut.2018.00096.
Digital epidemiology: what is it, and where is it going?
Digital Epidemiology is a new field that has been growing rapidly in the past few years, fueled by the increasing availability of data and computing power, as well as by breakthroughs in data analytics methods. In this short piece, I provide an outlook of where I see the field heading, and offer a broad and a narrow definition of the term.
Life Sciences Society And Policy. 2018. Vol. 14, p. 1. DOI : 10.1186/s40504-017-0065-7.
Augmenting Research, Education, and Outreach with Client-Side Web Programming
Trends in Biotechnology. 2018. Vol. 36, num. 5, p. 473 – 476. DOI : 10.1016/j.tibtech.2017.11.009.
FoodRepo: An Open Food Repository of Barcoded Food Products
Frontiers in Nutrition. 2018. Vol. 5, p. 57. DOI : 10.3389/fnut.2018.00057.
Localizing the Source of an Epidemic Using Few Observations
Localizing the source of an epidemic is a crucial task in many contexts, including the detection of malicious users in social networks and the identification of patient zeros of disease outbreaks. The difficulty of this task lies in the strict limitations on the data available: In most cases, when an epidemic spreads, only few individuals, who we will call sensors, provide information about their state. Furthermore, as the spread of an epidemic usually depends on a large number of variables, accounting for all the possible spreading patterns that could explain the available data can easily result in prohibitive computational costs. Therefore, in the field of source localization, there are two central research directions: The design of practical and reliable algorithms for localizing the source despite the limited data, and the optimization of data collection, i.e., the identification of the most informative sensors. In this dissertation we contribute to both these directions. We consider network epidemics starting from an unknown source. The only information available is provided by a set of sensor nodes that reveal if and when they become infected. We study how many sensors are needed to guarantee the identification of the source. A set of sensors that guarantees the identification of the source is called a double resolving set (DRS); the minimum size of a DRS is called the double metric dimension (DMD). Computing the DMD is, in general, hard, hence estimating it with bounds is desirable. We focus on G(N,p) random networks for which we derive tight bounds for the DMD. We show that the DMD is a non-monotonic function of the parameter p, hence there are critical parameter ranges in which source localization is particularly difficult. Again building on the relationship between source localization and DRSs, we move to optimizing the choice of a fixed number K of sensors. First, we look at the case of trees where the uniqueness of paths makes the problem simpler. 
For this case, we design polynomial time algorithms for selecting K sensors that optimize certain metrics of interest. Next, turning to general networks, we show that the optimal sensor set depends on the distribution of the time it takes for an infected node u to infect a non-infected neighbor v, which we call the transmission delay from u to v. We consider both a low- and a high-variance regime for the transmission delays. We design algorithms for sensor placement in both cases, and we show that they yield an improvement of up to 50% over state-of-the-art methods. Finally, we propose a framework for source localization where some sensors (called dynamic sensors) can be added while the epidemic spreads and the localization progresses. We design an algorithm for joint source localization and dynamic sensor placement. This algorithm can handle two regimes: offline localization, where we localize the source after the epidemic spread, and online localization, where we localize the source while the epidemic is ongoing. We conduct an empirical study of offline and online localization and show that, by using dynamic sensors, the number of sensors we need to localize the source is up to 10 times smaller than with a strategy where all sensors are deployed a priori. We also study the resistance of our methods to high-variance transmission delays and show that, even in this setting, using dynamic sensors, the source can be localized with fewer than 5% of the nodes being sensors.
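The double resolving set condition described in this abstract can be illustrated with a minimal Python sketch. It assumes an unweighted, connected network and deterministic unit transmission delays, so that infection times differ from hop distances only by the unknown start time; the function names and the adjacency-dictionary representation are hypothetical choices for illustration, not taken from the thesis.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_double_resolving(adj, sensors):
    """True if `sensors` tell apart every pair of candidate sources
    when the start time of the epidemic is unknown.

    Assumes a connected graph given as {node: iterable of neighbours}.
    """
    sensors = list(sensors)
    dist = {s: bfs_distances(adj, s) for s in sensors}
    ref = sensors[0]
    # Only differences between sensor infection times are observable,
    # so each candidate source is summarised by its distances to the
    # sensors relative to a reference sensor.
    signature = {
        u: tuple(dist[s][u] - dist[ref][u] for s in sensors)
        for u in adj
    }
    return all(signature[u] != signature[v] for u, v in combinations(adj, 2))


# Path graph 0 - 1 - 2 - 3: the two end nodes form a double resolving set.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_double_resolving(path, [0, 3]))
print(is_double_resolving(path, [0, 1]))
```

On the path graph, sensors at both ends separate all four nodes, whereas sensors at nodes 0 and 1 cannot distinguish nodes 1, 2 and 3 from one another.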
Lausanne, EPFL, 2018.
Likelihood-free Inference of Population Genetic Parameters from Time-Sampled Genetic Data
This thesis consists of five papers published in peer-reviewed journals, two of which focus on method development of time-serial inference, represented by publications in methodology journals, and three others that describe collaborative data applications based on experimental evolution projects, which are published in biological journals. The first methodology publication (Chapter 2) is a comparison study of recently introduced time-serial methods, performed using a systematic approach to evaluate, through simulation, the strengths and weaknesses of each method in quantifying selection coefficients from time-sampled data sets. The second methodology chapter (Chapter 3) introduces a novel method extending Wright-Fisher Approximate Bayesian Computation (WFABC) to detect and quantify changes in selection strength from temporal allele trajectories. This method is particularly novel in its use of simulation-based inference to detect and quantify ‘changes’ in evolutionary trajectories, which involves the application of Change-Point Analysis. In the first data application (Chapter 4), a time-sampled experiment of echovirus 11 evolved under a disinfectant (chlorine dioxide) was assessed for the associated genotypic and phenotypic traits in collaboration with the Environmental Chemistry Laboratory (LCE) at EPFL. This study contributes to a better understanding of disinfection resistance in waterborne viruses by identifying the mutations associated with enhanced replicative fitness. In the next application (Chapter 5), a comprehensive time-serial analysis was performed on experimental evolution data of echovirus 11 adapting to UVC in collaboration with the Environmental Chemistry Laboratory (LCE). 
This study shows that the UVC adaptation of echovirus 11 is associated with a decrease in the virus mutation rate, which is evidence of the ability of echovirus 11 to adapt to UVC radiation, a disinfection procedure commonly used in clinical settings and water treatment plants. Additionally, a paper implementing Change-Point WFABC (CP-WFABC) to assess the experimental evolution of influenza A virus is included in the Appendix; specifically, the effects of a novel mutagenic drug (favipiravir) on adaptive allele trajectories were evaluated, with results indicating mutation meltdown under a high drug dosage.
Lausanne, EPFL, 2018.
A field-based modelling framework of the ecohydrology of schistosomiasis
Successful control of schistosomiasis, a water-borne parasitic disease, is challenged by the intricacy of the worm's lifecycle, which depends on aquatic snail intermediate hosts, and involves environmental, ecologic, and socio-economic factors. Current strategies rely on deworming through mass drug administration, which however does not protect against reinfection and the persistence of hotspots. It is recognized that multifaceted approaches will be necessary to reach elimination, whose development will require a renewed focus on the disease's social-ecological drivers. Taking cue from the hydrological underpinning of these drivers, this Thesis aims at developing an ecohydrological approach to schistosomiasis with a view to identifying and exploiting the points in which its cycle can be broken. Schistosomiasis is a poverty-reinforcing disease affecting more than 150 million people in sub-Saharan Africa, being the parasitic disease causing the largest health burden after malaria. However, the impairing morbidity it causes has been undervalued in the past, qualifying it as a neglected tropical disease. Moreover, water resources development often exacerbates transmission, posing scientific and ethical challenges in addressing the ensuing trade-off between economic development and public health. The relevance of this Thesis' work lies in furthering tools to offset this trade-off by unlocking the predictive appraisal of the social-ecological drivers of transmission. An integration of fieldwork in Burkina Faso (West Africa) and theoretical methods is employed to address this aim. This Thesis establishes the use of spatially explicit mathematical models of schistosomiasis at the national scale, making it possible to study the effect of human mobility and spatial heterogeneity of transmission parameters. 
Weekly ecological samplings of snail abundance and continuous environmental monitoring were performed at three sites along the country's climatic gradient, leveraged through ecological modelling. A novel methodology for the large-scale prediction of river network ephemerality allowed for refined snail species distribution models, and the analysis of the disease's geography in link with socio-economic covariates. Finally, surveys and participatory workshops shed light on local-scale water contact patterns. The obtained results substantiate the stance that hydrology is a first-order control of disease transmission. Stability analysis of the spatially explicit model generated additional insight into the impact of the expansion of suitable snail habitat due to water resources development, highlighting the interplay between local and country-wide effects driven by human mobility. Models of snail ecology revealed key hydrological drivers, and disputed density feedbacks. Uncovered phase shifts between permanent and ephemeral habitats were adequately reproduced at the national scale through model regionalization. Characterization and predictions of hydrological ephemerality improved the estimation of the snails' ecological range, mirroring the disease's geography. Finally, a national-scale association between ephemerality and disease risk was observed, possibly due to the aggregation of human-water contacts, as supported by preliminary results at village level. The future incorporation of these ecohydrological findings into spatially explicit models of schistosomiasis is considered promising for optimizing control strategies and attaining disease elimination.
Lausanne, EPFL, 2018.
2017
An ecological and digital epidemiology analysis on the role of human behavior on the 2014 Chikungunya outbreak in Martinique
Understanding the spatio-temporal dynamics of endemic infections is of critical importance for a deeper understanding of pathogen transmission, and for the design of more efficient public health strategies. However, very few studies in this domain have focused on emerging infections, generating a gap of knowledge that hampers epidemiological response planning. Here, we analyze the case of a Chikungunya outbreak that occurred in Martinique in 2014. Using time series estimates from a network of sentinel practitioners covering the entire island, we first analyze the spatio-temporal dynamics and show that the largest city has served as the epicenter of this epidemic. We further show that the epidemic spread from there through two different propagation waves moving northwards and southwards, probably by individuals moving along the road network. We then develop a mathematical model to explore the drivers of the temporal dynamics of this mosquito-borne virus. Finally, we show that human behavior, inferred by a textual analysis of messages published on the social network Twitter, is required to explain the epidemiological dynamics over time. Overall, our results suggest that human behavior has been a key component of the outbreak propagation, and we argue that such results can lead to more efficient public health strategies specifically targeting the propagation process.
Scientific Reports. 2017. Vol. 7, p. 5967. DOI : 10.1038/s41598-017-05957-y.
Critical dynamics in population vaccinating behavior
Vaccine refusal can lead to renewed outbreaks of previously eliminated diseases and even delay global eradication. Vaccinating decisions exemplify a complex, coupled system where vaccinating behavior and disease dynamics influence one another. Such systems often exhibit critical phenomena, that is, special dynamics close to a tipping point leading to a new dynamical regime. For instance, critical slowing down (declining rate of recovery from small perturbations) may emerge as a tipping point is approached. Here, we collected and geocoded tweets about measles-mumps-rubella vaccine and classified their sentiment using machine-learning algorithms. We also extracted data on measles-related Google searches. We find critical slowing down in the data at the level of California and the United States in the years before and after the 2014-2015 Disneyland, California measles outbreak. Critical slowing down starts growing appreciably several years before the Disneyland outbreak as vaccine uptake declines and the population approaches the tipping point. However, due to the adaptive nature of coupled behavior-disease systems, the population responds to the outbreak by moving away from the tipping point, causing “critical speeding up” whereby resilience to perturbations increases. A mathematical model of measles transmission and vaccine sentiment predicts the same qualitative patterns in the neighborhood of a tipping point to greatly reduced vaccine uptake and large epidemics. These results support the hypothesis that population vaccinating behavior near the disease elimination threshold is a critical phenomenon. Developing new analytical tools to detect these patterns in digital social data might help us identify populations at heightened risk of widespread vaccine refusal.
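Critical slowing down, as defined in this abstract, is often tracked through a rising lag-1 autocorrelation in a sliding window over a time series. The following Python sketch illustrates that generic indicator only; whether it matches the estimator used in the paper is an assumption, and the function names and the synthetic AR(1) data are hypothetical.

```python
import random


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a sequence (0.0 if the sequence is constant)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def rolling_lag1(series, window):
    """Lag-1 autocorrelation in every sliding window of the given length."""
    return [
        lag1_autocorrelation(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


def simulate_ar1(phi, n, seed=0):
    """Synthetic AR(1) series x_t = phi * x_{t-1} + noise (illustration only).

    A larger phi means slower recovery from perturbations.
    """
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


fast_recovery = simulate_ar1(phi=0.1, n=5000)
slow_recovery = simulate_ar1(phi=0.9, n=5000)
print(lag1_autocorrelation(fast_recovery) < lag1_autocorrelation(slow_recovery))
```

In this toy setting the series that recovers slowly from perturbations shows the higher autocorrelation, which is the signature that a rolling-window analysis would look for in sentiment or search data.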
Proceedings Of The National Academy Of Sciences Of The United States Of America. 2017. Vol. 114, num. 52, p. 13762 – 13767. DOI : 10.1073/pnas.1704093114.
Precision global health in the digital age
Precision global health is an approach similar to precision medicine, which facilitates, through innovation and technology, better targeting of public health interventions on a global scale, for the purpose of maximising their effectiveness and relevance. Illustrative examples include: the use of remote sensing data to fight vector-borne diseases; large databases of genomic sequences of foodborne pathogens helping to identify origins of outbreaks; social networks and internet search engines for tracking communicable diseases; cell phone data in humanitarian actions; drones to deliver healthcare services in remote and secluded areas. Open science and data sharing platforms are proposed for fostering international research programmes under fair, ethical and respectful conditions. Innovative education, such as massive open online courses or serious games, can promote wider access to training in public health and improve health literacy. The world is moving towards learning healthcare systems. Professionals are equipped with data collection and decision support devices. They share information, which is complemented by external sources and analysed in real time using machine learning techniques. This allows for the early detection of anomalies and eventually guides appropriate public health interventions. This article shows how information-driven approaches, enabled by digital technologies, can help improve global health with greater equity.
Swiss Medical Weekly. 2017. Vol. 147, p. w14423. DOI : 10.4414/smw.2017.14423.
Spatially explicit modeling of cholera epidemics
Understanding the epidemiology of cholera, when and where it occurs and how it spreads, is key to its prevention and control. Models can help to apprehend cholera outbreaks by providing insight into critical epidemiological processes, and may be used to evaluate alternative intervention strategies or to predict the future course of epidemics. This thesis aims at advancing the evolution of spatially explicit epidemiological models of cholera outbreaks through methodological developments and practical applications. Over 160 years after John Snow first analyzed the spatial pattern of cholera cases in London and identified water as its pathway of contagion, the disease remains a major public health threat in many regions around the globe. It causes an estimated number of 2.86 (1.30 — 4.00) million cases and 95 000 (21 000 — 143 000) deaths in 69 endemic countries every year. A set of metapopulation and individual-based, mechanistic and semi-mechanistic epidemiological models has been developed to tackle epidemiological questions at the country, subnational and city scale. The models explicitly take into account the spatial variability of epidemiological processes such as the spread of the disease through hydrological connectivity and human mobility, or the high resolution spatiotemporal clustering of cases. A method to extract large-scale mobility fluxes from mobile phone call records and directly incorporate them into a model has also been established. Different environmental drivers of cholera epidemics have been taken into account. The models have been applied to recent cholera outbreaks in Haiti, Senegal, Chad and the Democratic Republic of the Congo. Results highlight the important part played by human mobility in the spreading of the disease and the influence of rainfall and other climatic variables as drivers of disease dynamics in several settings. 
Applications demonstrate how models can inform epidemiological policy and show the effect of alternative intervention strategies on the course of an epidemic. The evaluation of the preventive allocation of oral cholera vaccine, antibiotics and/or water, sanitation and hygiene interventions within a given radius around reported cases in densely populated areas shows that such interventions are effective and efficient alternatives to mass intervention campaigns. Moreover, an alternative type of oral rehydration solution proves to have a significant effect on the course of a simulated epidemic. This thesis concludes that the explicit treatment of spatial heterogeneity at an appropriate scale is crucial to reproduce real-world dynamics of cholera outbreaks. It highlights how suitable models can address relevant questions about the dynamics of the disease, provide insights into ongoing epidemics, aid emergency management and complement current epidemiological practice.
Lausanne, EPFL, 2017.
2016
Digital Pharmacovigilance and Disease Surveillance: Combining Traditional and Big-Data Systems for Better Public Health
The digital revolution has contributed to very large data sets (ie, big data) relevant for public health. The two major data sources are electronic health records from traditional health systems and patient-generated data. As the two data sources have complementary strengths (high veracity in the data from traditional sources and high velocity and variety in patient-generated data), they can be combined to build more-robust public health systems. However, they also have unique challenges. Patient-generated data in particular are often completely unstructured and highly context dependent, posing essentially a machine-learning challenge. Some recent examples from infectious disease surveillance and adverse drug event monitoring demonstrate that the technical challenges can be solved. Despite these advances, the problem of verification remains, and unless traditional and digital epidemiologic approaches are combined, these data sources will be constrained by their intrinsic limits.
Journal of Infectious Diseases. 2016. Vol. 214, p. S399 – S403. DOI : 10.1093/infdis/jiw281.
Infectious Disease Containment Based on a Wireless Sensor System
Infectious diseases pose a serious threat to public health due to their high infectivity and potentially high mortality. One of the most effective ways to protect people from being infected by these diseases is through vaccination. However, due to various resource constraints, vaccinating all the people in a community is not practical. Therefore, targeted vaccination, which vaccinates a small group of people, is an alternative approach to contain infectious diseases. Since many infectious diseases spread among people by droplet transmission within a certain range, we deploy a wireless sensor system in a high school to collect the contacts that occur within the disease transmission distance. Based on the collected traces, a graph is constructed to model the disease propagation, and a new metric (called connectivity centrality) is presented to find the important nodes in the constructed graph for disease containment. Connectivity centrality considers both a node’s local and global effect to measure its importance in disease propagation. Centrality-based algorithms are presented and further enhanced by exploiting the information of the known infected nodes, which can be detected during targeted vaccination. Simulation results show that our algorithms can effectively contain infectious diseases and outperform other schemes under various conditions.
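The general idea of centrality-based targeted vaccination can be sketched in a few lines of Python. The abstract does not define connectivity centrality, so this sketch swaps in plain degree centrality as a stand-in; the contact-pair input format and all function names are hypothetical, and the size of the largest connected component is used only as a crude proxy for outbreak potential.

```python
from collections import defaultdict


def build_contact_graph(contacts):
    """Undirected contact graph from (person_a, person_b) pairs."""
    adj = defaultdict(set)
    for a, b in contacts:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def pick_targets(adj, k):
    """Top-k nodes by degree (a stand-in for connectivity centrality).

    Ties are broken by node id so the result is reproducible.
    """
    return sorted(adj, key=lambda node: (-len(adj[node]), node))[:k]


def largest_component(adj, removed=()):
    """Size of the largest connected component once `removed` nodes
    (the vaccinated individuals) are taken out of the graph."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy trace: one highly connected student ("h") and four classmates.
contacts = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d")]
graph = build_contact_graph(contacts)
targets = pick_targets(graph, k=1)
print(targets, largest_component(graph), largest_component(graph, targets))
```

In the toy trace, vaccinating the single hub breaks the contact graph into isolated individuals, which is the effect a targeted strategy aims for on real contact traces.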
IEEE Access. 2016. Vol. 4, p. 1558 – 1569. DOI : 10.1109/Access.2016.2551199.
Using Deep Learning for Image-Based Plant Disease Detection
Crop diseases are a major threat to food security, but their rapid identification remains difficult in many parts of the world due to the lack of the necessary infrastructure. The combination of increasing global smartphone penetration and recent advances in computer vision made possible by deep learning has paved the way for smartphone-assisted disease diagnosis. Using a public dataset of 54,306 images of diseased and healthy plant leaves collected under controlled conditions, we train a deep convolutional neural network to identify 14 crop species and 26 diseases (or absence thereof). The trained model achieves an accuracy of 99.35% on a held-out test set, demonstrating the feasibility of this approach. Overall, the approach of training deep learning models on increasingly large and publicly available image datasets presents a clear path toward smartphone-assisted crop disease diagnosis on a massive global scale.
Frontiers In Plant Science. 2016. Vol. 7, p. 1419. DOI : 10.3389/fpls.2016.01419.
Statistical physics of vaccination
Historically, infectious diseases caused considerable damage to human societies, and they continue to do so today. To help reduce their impact, mathematical models of disease transmission have been studied to help understand disease dynamics and inform prevention strategies. Vaccination, one of the most important preventive measures of modern times, is of great interest both theoretically and empirically. In contrast to traditional approaches, recent research increasingly explores the pivotal implications of individual behavior and heterogeneous contact patterns in populations. Our report reviews the developmental arc of theoretical epidemiology with emphasis on vaccination, as it led from classical models assuming homogeneously mixing (mean-field) populations and ignoring human behavior, to recent models that account for behavioral feedback and/or population spatial/social structure. Many of the methods used originated in statistical physics, such as lattice and network models, and their associated analytical frameworks. Similarly, the feedback loop between vaccinating behavior and disease propagation forms a coupled nonlinear system with analogs in physics. We also review the new paradigm of digital epidemiology, wherein sources of digital data such as online social media are mined for high-resolution information on epidemiologically relevant individual behavior. Armed with the tools and concepts of statistical physics, and further assisted by new sources of digital data, models that capture nonlinear interactions between behavior and disease dynamics offer a novel way of modeling real-world phenomena, and can help improve health outcomes. We conclude the review by discussing open problems in the field and promising directions for future research.
Physics Reports-Review Section Of Physics Letters. 2016. Vol. 664, p. 1 – 113. DOI : 10.1016/j.physrep.2016.10.006.
2015
Identifying Adverse Effects of HIV Drug Treatment and Associated Sentiments Using Twitter
Background: Social media platforms are increasingly seen as a source of data on a wide range of health issues. Twitter is of particular interest for public health surveillance because of its public nature. However, the very public nature of social media platforms such as Twitter may act as a barrier to public health surveillance, as people may be reluctant to publicly disclose information about their health. This is of particular concern in the context of diseases that are associated with a certain degree of stigma, such as HIV/AIDS. Objective: The objective of the study is to assess whether adverse effects of HIV drug treatment and associated sentiments can be determined using publicly available data from social media. Methods: We describe a combined approach of machine learning and crowdsourced human assessment to identify adverse effects of HIV drug treatment solely on individual reports posted publicly on Twitter. Starting from a large dataset of 40 million tweets collected over three years, we identify a very small subset (1642; 0.004%) of individual reports describing personal experiences with HIV drug treatment. Results: Despite the small size of the extracted final dataset, the summary representation of adverse effects attributed to specific drugs, or drug combinations, accurately captures well-recognized toxicities. In addition, the data allowed us to discriminate across specific drug compounds, to identify preferred drugs over time, and to capture novel events such as the availability of preexposure prophylaxis. Conclusions: The effect of limited data sharing due to the public nature of the data can be partially offset by the large number of people sharing data in the first place, an observation that may play a key role in digital epidemiology in general.
JMIR Public Health and Surveillance. 2015. Vol. 1, num. 2, p. e7. DOI : 10.2196/publichealth.4488.
Ethical challenges of big data in public health
PLoS computational biology. 2015. Vol. 11, num. 2, p. e1003904. DOI : 10.1371/journal.pcbi.1003904.
Modeling Individual-Level Infection Dynamics Using Social Network Information
Epidemic monitoring systems engaged in accurate discovery of infected individuals enable better understanding of the dynamics of epidemics and thus may promote effective disease mitigation or prevention. Currently, infection discovery systems require either physical participation of potential patients or provision of information from hospitals and health-care services. While social media has emerged as an increasingly important knowledge source that reflects multiple real-world events, there is only a small literature examining how social media information can be incorporated into computational epidemic models. In this paper, we demonstrate how social media information can be incorporated into and improve upon traditional techniques used to model the dynamics of infectious diseases. Using flu infection histories and social network data collected from 264 students in a college community, we identify social network signals that can aid identification of infected individuals. Extending the traditional SIRS model, we introduce and illustrate the efficacy of an Online-Interaction-Aware Susceptible-Infected-Recovered-Susceptible (OIA-SIRS) model based on four social network signals for modeling infection dynamics. Empirical evaluations of our case study, flu infection within a college community, reveal that the OIA-SIRS model is more accurate than the traditional model, and also closely tracks the real-world infection rates as reported by CDC ILINet and Google Flu Trends.
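For readers unfamiliar with the traditional SIRS model that this paper extends, a minimal Python sketch of the baseline follows. It is a deterministic, forward-Euler version of the standard compartmental equations; the parameter values are arbitrary illustrations, the function name is hypothetical, and the four social network signals of the OIA-SIRS extension are not reproduced here.

```python
def simulate_sirs(beta, gamma, xi, s0, i0, r0, steps, dt=0.1):
    """Forward-Euler integration of the standard SIRS equations.

    beta  : transmission rate
    gamma : recovery rate (I -> R)
    xi    : rate of immunity loss (R -> S)
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        new_susceptibles = xi * r * dt
        s += new_susceptibles - new_infections
        i += new_infections - new_recoveries
        r += new_recoveries - new_susceptibles
        trajectory.append((s, i, r))
    return trajectory


# Illustrative run: a community of 264 people with one initial infection.
# The rates below are assumptions chosen for illustration only.
run = simulate_sirs(beta=0.5, gamma=0.2, xi=0.05, s0=263, i0=1, r0=0, steps=1000)
peak_infected = max(i for _, i, _ in run)
print(round(peak_infected, 1))
```

Because recovered individuals return to the susceptible pool at rate xi, the infection does not simply burn out as in an SIR model but can settle at an endemic level.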
2015. The 24th ACM International Conference on Information and Knowledge Management, Melbourne, Australia, October 19-23, 2015. p. 1501 – 1510. DOI : 10.1145/2806416.2806575.
Targeted vaccination based on a wireless sensor system
Vaccination is one of the most effective ways to protect people from being infected by infectious diseases. However, it is often impractical to vaccinate all people in a community due to various resource constraints. Therefore, targeted vaccination, which vaccinates a small group of people, is an alternative approach to contain the spread of infectious diseases. To achieve better performance in targeted vaccination, we collect student contact traces in a high school based on wireless sensors carried by students. With our wireless sensor system, we can record student contacts within the disease propagation distance, and then construct a disease propagation graph to model the infectious disease propagation. Based on this graph, we propose a metric called connectivity centrality to measure a node’s importance during disease propagation and design centrality-based algorithms for targeted vaccination. The proposed algorithms are evaluated and compared with other schemes based on our collected traces. Trace-driven simulation results show that our algorithms can help to effectively contain infectious diseases.
2015. Thirteenth IEEE International Conference on Pervasive Computing and Communications, St. Louis, Missouri, USA, March 23-27, 2015. p. 215 – 220. DOI : 10.1109/PERCOM.2015.7146531.
Measles Vaccination Coverage and Cases among Vaccinated Persons
Emerging infectious diseases. 2015. Vol. 21, num. 8, p. 1480 – 1. DOI : 10.3201/eid2108.150284.
2014
How should social mixing be measured: comparing web-based survey and sensor-based methods
Contact surveys and diaries have conventionally been used to measure contact networks in different settings for elucidating infectious disease transmission dynamics of respiratory infections. More recently, technological advances have permitted the use of wireless sensor devices, which can be worn by individuals interacting in a particular social context to record high resolution mixing patterns. To date, a direct comparison of these two different methods for collecting contact data has not been performed.
BMC infectious diseases. 2014. Vol. 14, p. 136. DOI : 10.1186/1471-2334-14-136.
An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages
The role of social media as a source of timely and massive information has become more apparent since the era of Web 2.0. Multiple studies have illustrated the use of information in social media to discover biomedical and health-related knowledge. Most methods proposed in the literature employ traditional document classification techniques that represent a document as a bag of words. These techniques work well when documents are rich in text and conform to standard English; however, they are not optimal for social media data, where sparsity and noise are the norm. This paper aims to address the limitations posed by traditional bag-of-words-based methods and proposes to use heterogeneous features in combination with ensemble machine learning techniques to discover health-related information, which could prove useful to multiple biomedical applications, especially those needing to discover health-related knowledge in large-scale social media data. Furthermore, the proposed methodology could be generalized to discover different types of information in various kinds of textual data.
Journal of biomedical informatics. 2014. Vol. 49, p. 255 – 68. DOI : 10.1016/j.jbi.2014.03.005.Targeting HIV-related Medication Side Effects and Sentiment Using Twitter Data
We present a descriptive analysis of Twitter data. Our study focuses on extracting the main side effects associated with HIV treatments. The crux of our work was the identification of personal tweets referring to HIV. We summarize our results in an infographic aimed at the general public. In addition, we present a measure of user sentiment based on hand-rated tweets.
arXiv. 2014. Vol. 1404.3610.
Two Sides of a Coin: Separating Personal Communication and Public Dissemination Accounts in Twitter
There are millions of accounts on Twitter. In this paper, we categorize Twitter accounts into two types, namely Personal Communication Accounts (PCAs) and Public Dissemination Accounts (PDAs). PCAs are accounts operated by individuals and are used to express that individual’s thoughts and feelings. PDAs, on the other hand, are accounts owned by non-individuals such as companies, governments, etc. Generally, tweets from PDAs (i) disseminate a specific type of information (e.g., job openings, shopping deals, car accidents) rather than sharing an individual’s personal life; and (ii) may be produced by non-human entities (e.g., bots). We aim to develop techniques for identifying PDAs so as to (i) help social scientists reduce “noise” in their study of human behaviors, and (ii) index them for potential recommendation to users looking for specific types of information. Through analysis, we find that these two types of accounts follow different temporal, spatial and textual patterns. Accordingly, we develop probabilistic models based on these features to identify PDAs. We also conduct a series of experiments to evaluate those algorithms for cleaning the Twitter data stream.
2014. 18th Pacific-Asia Conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014. p. 163 – 175. DOI : 10.1007/978-3-319-06608-0_14.
Positive network assortativity of influenza vaccination at a high school: implications for outbreak risk and herd immunity
Schools are known to play a significant role in the spread of influenza. High vaccination coverage can reduce infectious disease spread within schools and the wider community through vaccine-induced immunity in vaccinated individuals and through the indirect effects afforded by herd immunity. In general, herd immunity is greatest when vaccination coverage is highest, but clusters of unvaccinated individuals can reduce herd immunity. Here, we empirically assess the extent of such clustering by measuring whether vaccinated individuals are randomly distributed or demonstrate positive assortativity across a United States high school contact network. Using computational models based on these empirical measurements, we further assess the impact of assortativity on influenza disease dynamics. We found that the contact network was positively assortative with respect to influenza vaccination: unvaccinated individuals tended to be in contact more often with other unvaccinated individuals than with vaccinated individuals, and these effects were most pronounced when we analyzed contact data collected over multiple days. Of note, unvaccinated males contributed substantially more than unvaccinated females towards the measured positive vaccination assortativity. Influenza simulation models using a positively assortative network resulted in larger average outbreak size, and outbreaks were more likely, compared to an otherwise identical network where vaccinated individuals were not clustered. These findings highlight the importance of understanding and addressing heterogeneities in seasonal influenza vaccine uptake for prevention of large, protracted school-based outbreaks of influenza, in addition to continued efforts to increase overall vaccine coverage.
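As an illustration of the quantity assessed above, the following minimal sketch computes Newman's assortativity coefficient for a categorical node attribute (here, vaccination status) on an undirected contact network. The toy edge list, the `status` labels, and the function name `attribute_assortativity` are assumptions made for this example and are not taken from the study, which may have used a different estimator.

```python
from collections import Counter


def attribute_assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute
    on an undirected graph given as a list of (u, v) edges."""
    # Count every undirected edge in both directions so the mixing
    # matrix is symmetric.
    mix = Counter()
    for u, v in edges:
        mix[(label[u], label[v])] += 1
        mix[(label[v], label[u])] += 1
    total = sum(mix.values())
    cats = {c for pair in mix for c in pair}
    # Fraction of edge ends that join two nodes of the same category.
    same = sum(mix[(c, c)] for c in cats) / total
    # Fraction of edge ends attached to each category.
    share = {c: sum(mix[(c, d)] for d in cats) / total for c in cats}
    # Same-category fraction expected under random mixing.
    expected = sum(s * s for s in share.values())
    return (same - expected) / (1 - expected)


# Hypothetical toy network: a vaccinated triangle (V) and an unvaccinated
# triangle (U) joined by a single contact.
status = {0: "V", 1: "V", 2: "V", 3: "U", 4: "U", 5: "U"}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
r = attribute_assortativity(edges, status)
print(round(r, 3))
```

Values near 1 indicate that individuals are mostly in contact with others of the same status, and values near 0 indicate random mixing. The coefficient is undefined when every node carries the same label, because the denominator is then zero.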
PloS One. 2014. Vol. 9, num. 2, p. e87042. DOI : 10.1371/journal.pone.0087042.
On the ground validation of online diagnosis with Twitter and medical records
Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether or not specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media-based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual’s publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample that were sick explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health.
2014. 23rd International World Wide Web Conference, Seoul, Korea, April 7-11, 2014. p. 651 – 656. DOI : 10.1145/2567948.2579272.
2013
Synthesizing Social Proximity Networks by Combining Subjective Surveys with Digital Traces
Synthetic social contact networks play a central role in the study of epidemics and methods to control them. In this paper we propose a new methodology that combines subjective surveys and data obtained using digital devices to synthesize detailed social networks for high schools in the United States. The two data sources are diverse and have their relative merits. The proposed methodology yields high quality dynamic social proximity networks. We evaluate our methodology by carrying out a detailed structural analysis of the resulting networks. Epidemic simulations and intervention analysis using these networks provide further insights into the role of network structure on epidemics. Our results indicate that the in-class networks have a highly clustered structure with contact duration following a heavy tail distribution. SEIR-based epidemic simulations demonstrate that we may use existing theoretic graph models to fit digital trace in-class networks, but only after critical structure metrics including degree and edge weight are tuned to the real data. For practical use, the detailed model for in-class contacts using digital trace data therefore seems to add important and valuable structure needed when developing public health policies. Our methodology is quite general and can be combined with subjective assessments such as surveys and other available information. The technique is also applicable to other micro-networks such as conferences with multiple sessions, and office campuses. It is efficient and applicable in settings where data is hard or relatively expensive to obtain.
2013. IEEE 16th International Conference on Computational Science and Engineering (CSE), Sydney, Australia, December 03-05, 2013. p. 188 – 195. DOI : 10.1109/CSE.2013.38.
A low-cost method to assess the epidemiological importance of individuals in controlling infectious disease outbreaks
Infectious disease outbreaks in communities can be controlled by early detection and effective prevention measures. Assessing the relative importance of each individual community member with respect to these two processes requires detailed knowledge about the underlying social contact network on which the disease can spread. However, mapping social contact networks is typically too resource-intensive to be a practical possibility for most communities and institutions.
BMC medicine. 2013. Vol. 11, p. 35. DOI : 10.1186/1741-7015-11-35.
Discovering health-related knowledge in social media using ensembles of heterogeneous features
Social media is emerging as a powerful source of communication, information dissemination and mining. Being colloquial and ubiquitous in nature makes it easier for users to express their opinions and preferences in a seamless, dynamic manner. Epidemic surveillance systems that utilize social media to detect the emergence of diseases have been proposed in the literature. These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams. However, such techniques are not optimal for social media where sparsity and noise are norms. The authors address the limitations posed by the traditional N-gram-based methods and propose to use features that represent different semantic aspects of the data in combination with ensemble machine learning techniques to identify health-related messages in a heterogeneous pool of social media data. Furthermore, the results reveal significant improvement in identifying health-related social media content, which can be critical in the emergence of a novel, unknown disease epidemic.
2013. 22nd ACM international conference on Information & Knowledge Management (CIKM 2013), San Francisco, CA, USA, October 27 – November 01, 2013. p. 1685 – 1690. DOI : 10.1145/2505515.2505629.
Validating models for disease detection using twitter
Data mining social media has become a valuable resource for infectious disease surveillance. However, there are considerable risks associated with incorrectly predicting an epidemic. The large amount of social media data combined with the small amount of ground truth data and the general dynamics of infectious diseases present unique challenges when evaluating model performance. In this paper, we look at several methods that have been used to assess influenza prevalence using Twitter. We then validate them with tests that are designed to avoid and illustrate issues with the standard k-fold cross validation method. We also find that small modifications to the way that data are partitioned can have major effects on a model’s reported performance.
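To make the partitioning issue concrete, the sketch below contrasts two ways of splitting a time-ordered series into k folds: randomly shuffled folds and contiguous time blocks. It is only an illustration under stated assumptions: the function names, the series length of 20 time steps, and the "gap to nearest training point" diagnostic are hypothetical and are not the tests used in the paper.

```python
import random


def shuffled_folds(n, k, seed=0):
    """Standard k-fold: time steps are assigned to folds at random."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def blocked_folds(n, k):
    """Each fold is one contiguous block of time steps."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def mean_gap_to_training(fold, n):
    """Average distance in time steps from each held-out point to the
    nearest point that remains in the training set."""
    held_out = set(fold)
    train = [t for t in range(n) if t not in held_out]
    return sum(min(abs(t - s) for s in train) for t in fold) / len(fold)


n, k = 20, 4  # hypothetical: 20 weekly observations, 4 folds
random_split = shuffled_folds(n, k)
block_split = blocked_folds(n, k)
print(mean_gap_to_training(random_split[0], n))
print(mean_gap_to_training(block_split[0], n))
```

With shuffled folds, a held-out week usually has training weeks immediately next to it, so a model can score well by interpolating between neighbours in an autocorrelated series. Contiguous blocks increase the distance to the training data, which is one reason the same model can report very different performance under the two partitions.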
2013. 22nd International World Wide Web Conference (WWW 2013), Rio de Janeiro, Brazil, May 13 – 17, 2013. p. 699 – 702. DOI : 10.1145/2487788.2488027.
The Social Maintenance of Cooperation through Hypocrisy
Cooperation is widespread in human societies, but its maintenance at the group level remains puzzling if individuals benefit from not cooperating. Explanations of the maintenance of cooperation generally assume that cooperative and non-cooperative behavior in others can be assessed and copied accurately. However, humans have a well-known capacity to deceive and thus to manipulate how others assess their behavior. Here, we show that hypocrisy – claiming to be acting cooperatively while acting selfishly – can maintain social cooperation because it prevents the spread of selfish behavior. We demonstrate this effect both theoretically and experimentally. Hypocrisy allows the cooperative strategy to spread by taking credit for the success of the non-cooperative strategy.
arXiv. 2013. Vol. 1304.3747.
Understanding population displacements on location-based call records using road data
Large population displacements are usually observed after natural disasters. The best approximations of real-world population movement at such a short temporal scale are the user movement patterns derived from cell phone usage data. However, due to political, economic and privacy constraints, these sensitive data are not always available. On the other hand, population movements are usually observed on the underlying road network. The correlation between cell phone users’ movement and the road network has yet to be examined. The aim of this research is to compare the topological structure and the network metrics of the road network to the cell phone users’ movement network in Abidjan, Côte d’Ivoire, and to inspect the correlations between movement volume and road connectivity. A flooding scenario was assumed to inspect the responses of both the movement network and the road network. Our analysis shows that the cell phone users’ movement network and the road network present significant similarities in terms of network partition. Our research also indicates that the road topology could be used as a proxy to approximate the population movement volume on it. We present an initial step to help data-scarce areas understand population movement patterns from more readily available road network data. Furthermore, our results suggest that traditional evacuation planning should consider the social perspective of population connections and periodic movements.
2013. 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2013), Orlando, Florida, USA, November 5-8, 2013. p. 17 – 21. DOI : 10.1145/2534190.2534199.
The dynamics of health behavior sentiments on a large online social network
Modifiable health behaviors, a leading cause of illness and death in many countries, are often driven by individual beliefs and sentiments about health and disease. Individual behaviors affecting health outcomes are increasingly modulated by social networks, for example through the associations of like-minded individuals – homophily – or through peer influence effects. Using a statistical approach to measure the individual temporal effects of a large number of variables pertaining to social network statistics, we investigate the spread of a health sentiment towards a new vaccine on Twitter, a large online social network. We find that the effects of neighborhood size and exposure intensity are qualitatively very different depending on the type of sentiment. Generally, we find that larger numbers of opinionated neighbors inhibit the expression of sentiments. We also find that exposure to negative sentiment is contagious – by which we merely mean predictive of future negative sentiment expression – while exposure to positive sentiments is generally not. In fact, exposure to positive sentiments can even predict increased negative sentiment expression. Our results suggest that the effects of peer influence and social contagion on the dynamics of behavioral spread on social networks are strongly content-dependent.
EPJ Data Science. 2013. Vol. 2, num. 1, p. 4. DOI : 10.1140/epjds16.
Influenza A (H7N9) and the importance of digital epidemiology
The New England journal of medicine. 2013. Vol. 369, num. 5, p. 401 – 4. DOI : 10.1056/NEJMp1307752.
Complex social contagion makes networks more vulnerable to disease outbreaks
Social network analysis is now widely used to investigate the dynamics of infectious disease spread. Vaccination dramatically disrupts disease transmission on a contact network, and indeed, high vaccination rates can potentially halt disease transmission altogether. Here, we build on mounting evidence that health behaviors – such as vaccination, and refusal thereof – can spread across social networks through a process of complex contagion that requires social reinforcement. Using network simulations that model health behavior and infectious disease spread, we find that under otherwise identical conditions, the process by which the health behavior spreads has a very strong effect on disease outbreak dynamics. This dynamic variability results from differences in the topology within susceptible communities that arise during the health behavior spreading process, which in turn depends on the topology of the overall social network. Our findings point to the importance of health behavior spread in predicting and controlling disease outbreaks.
Scientific Reports. 2013. Vol. 3, p. 1905. DOI : 10.1038/srep01905.
3D Optical Imaging of Living Cells in Microgravity : Application to Study Dynamic Changes of Cytoskeleton
Lausanne, EPFL, 2013.
2012
Non-genetic inheritance and the patterns of antagonistic coevolution
Antagonistic species interactions can lead to coevolutionary genotype or phenotype frequency oscillations, with important implications for ecological and evolutionary processes. However, direct empirical evidence of such oscillations is rare. The rarity of observations is generally attributed to inherent difficulties of ecological and evolutionary long-term studies, to weak or absent interaction between species, or to the absence of negative frequency-dependence.
BMC evolutionary biology. 2012. Vol. 12, p. 93. DOI : 10.1186/1471-2148-12-93.
Digital epidemiology
Mobile, social, real-time: the ongoing revolution in the way people communicate has given rise to a new kind of epidemiology. Digital data sources, when harnessed appropriately, can provide local and timely information about disease and health dynamics in populations around the world. The rapid, unprecedented increase in the availability of relevant data from various digital sources creates considerable technical and computational challenges.
PLoS computational biology. 2012. Vol. 8, num. 7, p. e1002616. DOI : 10.1371/journal.pcbi.1002616.
Governing the global commons with local institutions
Most problems faced by modern human society have two characteristics in common – they are tragedy-of-the-commons type of problems, and they are global problems. Tragedy-of-the-commons type of problems are those where a commonly shared resource is overexploited by free riders at the expense of everyone sharing the resource. The exploitation of global resources such as clean air and water, political stability and peace, etc. underlies many of the most pressing human problems. Punishment of free riding behavior is one of the most frequently used strategies to combat the problem, but the spatial reach of sanctioning institutions is often more limited than the spatial effects of overexploitation. Here, we analyze a general game theoretical model to assess under what circumstances sanctioning institutions with limited reach can maintain the larger commons. We find that the spatial reach has a strong effect on whether and how the commons can be maintained, and that the transitions between those outcomes are characterized by phase transitions. The latter indicates that a small change in the reach of sanctioning systems can profoundly change the way the global commons can be managed.
PloS One. 2012. Vol. 7, num. 4, p. e34051. DOI : 10.1371/journal.pone.0034051.
2011
Measuring school contact networks using wireless sensor technology
Interaction networks shaped by social processes constitute the substrate on which various phenomena of interest to human biology occur, for example, epidemics, diffusion of health information, and the exertion of social influence related to health. Understanding the structure of network formation is thus crucial to our understanding of how relational human interactions mediate key biosocial outcomes. Research in this area has been hampered by a lack of high-quality data on the formation and structure of contact networks. Using Wireless Sensor Network (WSN) technology, we measured the temporal dynamics of close-proximity interaction networks during a typical school day in a high school in the San Francisco Bay Area. Participants wore small wireless sensors which send and receive radio signals to and from other sensors nearby. This technology allowed us to collect dynamic contact network data with unparalleled precision. At 94% coverage, we collected 762,868 CPIs at a maximal distance of 3 meters among 788 individuals. The data revealed a high-density network with typical small-world properties and a relatively homogeneous distribution of both interaction time and interaction partners among subjects. Computer simulations of the spread of an influenza-like disease on the weighted contact graph are in good agreement with absentee data during the most recent influenza season. Analysis of targeted immunization strategies suggested that contact network data are required to design strategies that are significantly more effective than random immunization. Immunization strategies based on contact network data were most effective at high vaccination coverage.
2011. 36th Annual Meeting: Human Biology Association, Minneapolis, Minnesota, USA, April 13-14, 2011. DOI : 10.1002/ajhb.21153.
Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control
There is great interest in the dynamics of health behaviors in social networks and how they affect collective public health outcomes, but measuring population health behaviors over time and space requires substantial resources. Here, we use publicly available data from 101,853 users of online social media collected over a time period of almost six months to measure the spatio-temporal sentiment towards a new vaccine. We validated our approach by identifying a strong correlation between sentiments expressed online and CDC-estimated vaccination rates by region. Analysis of the network of opinionated users showed that information flows more often between users who share the same sentiments – and less often between users who do not share the same sentiments – than expected by chance alone. We also found that most communities are dominated by either positive or negative sentiments towards the novel vaccine. Simulations of infectious disease transmission show that if clusters of negative vaccine sentiments lead to clusters of unprotected individuals, the likelihood of disease outbreaks is greatly increased. Online social media provide unprecedented access to data allowing for inexpensive and efficient tools to identify target areas for intervention efforts and to evaluate their effectiveness.
PLoS computational biology. 2011. Vol. 7, num. 10, p. e1002199. DOI : 10.1371/journal.pcbi.1002199.
2010
A high-resolution human contact network for infectious disease transmission
The most frequent infectious diseases in humans–and those with the highest potential for rapid pandemic spread–are usually transmitted via droplets during close proximity interactions (CPIs). Despite the importance of this transmission route, very little is known about the dynamic patterns of CPIs. Using wireless sensor network technology, we obtained high-resolution data of CPIs during a typical day at an American high school, permitting the reconstruction of the social network relevant for infectious disease transmission. At 94% coverage, we collected 762,868 CPIs at a maximal distance of 3 m among 788 individuals. The data revealed a high-density network with typical small-world properties and a relatively homogeneous distribution of both interaction time and interaction partners among subjects. Computer simulations of the spread of an influenza-like disease on the weighted contact graph are in good agreement with absentee data during the most recent influenza season. Analysis of targeted immunization strategies suggested that contact network data are required to design strategies that are significantly more effective than random immunization. Immunization strategies based on contact network data were most effective at high vaccination coverage.
Proceedings of the National Academy of Sciences of the United States of America. 2010. Vol. 107, num. 51, p. 22020 – 5. DOI : 10.1073/pnas.1009094108.
Plague outbreaks in prairie dog populations explained by percolation thresholds of alternate host abundance
Highly lethal pathogens (e.g., hantaviruses, hendra virus, anthrax, or plague) pose unique public-health problems, because they seem to periodically flare into outbreaks before disappearing into long quiescent phases. A key element to their possible control and eradication is being able to understand where they persist in the latent phase and how to identify the conditions that result in sporadic epidemics or epizootics. In American grasslands, plague, caused by Yersinia pestis, exemplifies this quiescent-outbreak pattern, because it sporadically erupts in epizootics that decimate prairie dog (Cynomys ludovicianus) colonies, yet the causes of outbreaks and mechanisms for interepizootic persistence of this disease are poorly understood. Using field data on prairie community ecology, flea behavior, and plague-transmission biology, we find that plague can persist in prairie-dog colonies for prolonged periods, because host movement is highly spatially constrained. The abundance of an alternate host for disease vectors, the grasshopper mouse (Onychomys leucogaster), drives plague outbreaks by increasing the connectivity of the prairie dog hosts and therefore, permitting percolation of the disease throughout the primary host population. These results offer an alternative perspective on plague’s ecology (i.e., disease transmission exacerbated by alternative hosts) and may have ramifications for plague dynamics in Asia and Africa, where a single main host has traditionally been considered to drive Yersinia ecology. Furthermore, abundance thresholds of alternate hosts may be a key phenomenon determining outbreaks of disease in many multihost-disease systems.
Proceedings of the National Academy of Sciences of the United States of America. 2010. Vol. 107, num. 32, p. 14247 – 50. DOI : 10.1073/pnas.1002826107.
Experiences in measuring a human contact network for epidemiology research
This paper discusses our experience in designing and deploying a 994-node sensor network to measure the social contact network of a high school over one typical day. The system aims to capture interactions of human subjects for the study of infectious disease spread. We describe unique challenges posed by a large-scale network that is heavily affected by humans. We present techniques to address challenges such as frequent node reboots and global timestamps. The end result of the deployment is a dataset of 792 traces which can be used to calculate the school population’s contact network and the rough location where interactions occurred.
2010. 6th Workshop on Hot Topics in Embedded Networked Sensors (HotEMNETS’10), Killarney, Ireland, June 28-29, 2010. DOI : 10.1145/1978642.1978651.
On the evolution of sexual reproduction in hosts coevolving with multiple parasites
Host-parasite coevolution has been studied extensively in the context of the evolution of sex. Although hosts typically coevolve with several parasites, most studies considered one-host/one-parasite interactions. Here, we study population-genetic models in which hosts interact with two parasites. We find that host/multiple-parasite models differ nontrivially from host/single-parasite models. Selection for sex resulting from interactions with a single parasite is often outweighed by detrimental effects due to the interaction between parasites if coinfection affects the host more severely than expected based on single infections, and/or if double infections are more common than expected based on single infections. The resulting selection against sex is caused by strong linkage-disequilibria of constant sign that arise between host loci interacting with different parasites. In contrast, if coinfection affects hosts less severely than expected and double infections are less common than expected, selection for sex due to interactions with individual parasites can now be reinforced by additional rapid linkage-disequilibrium oscillations with changing sign. Thus, our findings indicate that the presence of an additional parasite can strongly affect the evolution of sex in ways that cannot be predicted from single-parasite models, and that thus host/multiparasite models are an important extension of the Red Queen Hypothesis.
Evolution; international journal of organic evolution. 2010. Vol. 64, num. 6, p. 1644 – 56. DOI : 10.1111/j.1558-5646.2010.00951.x.
Dynamics and control of diseases in networks with community structure
The dynamics of infectious diseases spread via direct person-to-person transmission (such as influenza, smallpox, HIV/AIDS, etc.) depends on the underlying host contact network. Human contact networks exhibit strong community structure. Understanding how such community structure affects epidemics may provide insights for preventing the spread of disease between communities by changing the structure of the contact network through pharmaceutical or non-pharmaceutical interventions. We use empirical and simulated networks to investigate the spread of disease in networks with community structure. We find that community structure has a major impact on disease dynamics, and we show that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals. Because the structure of relevant contact networks is generally not known, and vaccine supply is often limited, there is great need for efficient vaccination algorithms that do not require full knowledge of the network. We developed an algorithm that acts only on locally available network information and is able to quickly identify targets for successful immunization intervention. The algorithm generally outperforms existing algorithms when vaccine supply is limited, particularly in networks with strong community structure. Understanding the spread of infectious diseases and designing optimal control strategies is a major goal of public health. Social networks show marked patterns of community structure, and our results, based on empirical and simulated data, demonstrate that community structure strongly affects disease dynamics. These results have implications for the design of control strategies.
PLoS computational biology. 2010. Vol. 6, num. 4, p. e1000736. DOI : 10.1371/journal.pcbi.1000736.
Modelling the influence of human behaviour on the spread of infectious diseases: a review
Human behaviour plays an important role in the spread of infectious diseases, and understanding the influence of behaviour on the spread of diseases can be key to improving control efforts. While behavioural responses to the spread of a disease have often been reported anecdotally, there has been relatively little systematic investigation into how behavioural changes can affect disease dynamics. Mathematical models for the spread of infectious diseases are an important tool for investigating and quantifying such effects, not least because the spread of a disease among humans is not amenable to direct experimental study. Here, we review recent efforts to incorporate human behaviour into disease models, and propose that such models can be broadly classified according to the type and source of information which individuals are assumed to base their behaviour on, and according to the assumed effects of such behaviour. We highlight recent advances as well as gaps in our understanding of the interplay between infectious disease dynamics and human behaviour, and suggest what kind of data taking efforts would be helpful in filling these gaps.
Journal of the Royal Society, Interface / the Royal Society. 2010. Vol. 7, num. 50, p. 1247 – 56. DOI : 10.1098/rsif.2010.0142.
2009
The role of epistasis on the evolution of recombination in host-parasite coevolution
Antagonistic coevolution between hosts and parasites is known to affect selection on recombination in hosts. The Red Queen Hypothesis (RQH) posits that genetic shuffling is beneficial for hosts because it quickly creates resistant genotypes. Indeed, a large body of theoretical studies have shown that for many models of the genetic interaction between host and parasite, the coevolutionary dynamics of hosts and parasites generate selection for recombination or sexual reproduction. Here we investigate models in which the effect of the host on the parasite (and vice versa) depend approximately multiplicatively on the number of matched alleles. Contrary to expectation, these models generate a dynamical behavior that strongly selects against recombination/sex. We investigate this atypical behavior analytically and numerically. Specifically we show that two complementary equilibria are responsible for generating strong linkage disequilibria of opposite sign, which in turn causes strong selection against sex. The biological relevance of this finding stems from the fact that these phenomena can also be observed if hosts are attacked by two parasites that affect host fitness independently. Hence the role of the Red Queen Hypothesis in natural host parasite systems where infection by multiple parasites is the rule rather than the exception needs to be reevaluated.
Theoretical population biology. 2009. Vol. 75, num. 1, p. 1 – 13. DOI : 10.1016/j.tpb.2008.09.007.
Evolution of stochastic switching rates in asymmetric fitness landscapes
Uncertain environments pose a tremendous challenge to populations: the selective pressures imposed by the environment can change so rapidly that adaptation by mutation alone would be too slow. One solution to this problem is given by the phenomenon of stochastic phenotype switching, which causes genetically uniform populations to be phenotypically heterogeneous. Stochastic phenotype switching has been observed in numerous microbial species and is generally assumed to be an adaptive bet-hedging strategy to anticipate future environmental change. We use an explicit population genetic model to investigate the evolutionary dynamics of phenotypic switching rates. We find that whether or not stochastic switching is an adaptive strategy is highly contingent upon the fitness landscape given by the changing environment. Unless selection is very strong, asymmetric fitness landscapes (where the cost of being maladapted is not identical in all environments) strongly select against stochastic switching. We further observe a threshold phenomenon that causes switching rates to be either relatively high or completely absent, but rarely intermediate. Our finding that marginal changes in selection pressures can cause fundamentally different evolutionary outcomes is important in a wide range of fields concerned with microbial bet hedging.
Genetics. 2009. Vol. 182, num. 4, p. 1159 – 64. DOI : 10.1534/genetics.109.103333.
On the causes of selection for recombination underlying the red queen hypothesis
The vast majority of plant and animal species reproduce sexually despite the costs associated with sexual reproduction. Genetic recombination might outweigh these costs if it helps the species escape parasite pressure by creating rare or novel genotypes, an idea known as the Red Queen hypothesis. Selection for recombination can be driven by short- and long-term effects, but the relative importance of these effects and their dependency on the parameters of an antagonistic species interaction remain unclear. We use computer simulations of a mathematical model of host-parasite coevolution to measure those effects under a wide range of parameters. We find that the real driving force underlying the Red Queen hypothesis is neither the immediate, next-generation, short-term effect nor the long-term effect but in fact a delayed short-term effect. Our results highlight the importance of differentiating clearly between immediate and delayed short-term effects when attempting to elucidate the mechanism underlying selection for recombination in the Red Queen hypothesis.
The American naturalist. 2009. Vol. 174, num. Suppl 1, p. S31 – 42. DOI : 10.1086/599085.
Early assessment of anxiety and behavioral response to novel swine-origin influenza A(H1N1)
Since late April, 2009, a novel influenza virus A (H1N1), generally referred to as the “swine flu,” has spread around the globe and infected hundreds of thousands of people. During the first few days after the initial outbreak in Mexico, extensive media coverage together with a high degree of uncertainty about the transmissibility and mortality rate associated with the virus caused widespread concern in the population. The spread of an infectious disease can be strongly influenced by behavioral changes (e.g., social distancing) during the early phase of an epidemic, but data on risk perception and behavioral response to a novel virus is usually collected with a substantial delay or after an epidemic has run its course.
PloS One. 2009. Vol. 4, num. 12, p. e8032. DOI : 10.1371/journal.pone.0008032.
2008
The state of affairs in the kingdom of the Red Queen
One of the most prominent hypotheses to explain the ubiquity of sex and recombination is based on host-parasite interactions. Under the name of the Red Queen hypothesis (RQH), it has had theoretical and empirical support since its conception, but recent theoretical work has shown that the circumstances under which the RQH works remain unclear. Here we review the current status of the theory of the RQH. We argue that recent theoretical work calls for new experimental data and an increased theoretical effort to reveal the driving force of the RQH.
Trends in ecology & evolution. 2008. Vol. 23, num. 8, p. 439 – 45. DOI : 10.1016/j.tree.2008.04.010.
The effect of opinion clustering on disease outbreaks
Many high-income countries currently experience large outbreaks of vaccine-preventable diseases such as measles despite the availability of highly effective vaccines. This phenomenon lacks an explanation in countries where vaccination rates are rising on an already high level. Here, we build on the growing evidence that belief systems, rather than access to vaccines, are the primary barrier to vaccination in high-income countries, and show how a simple opinion formation process can lead to clusters of unvaccinated individuals, leading to a dramatic increase in disease outbreak probability. In particular, the effect of clustering on outbreak probabilities is strongest when the vaccination coverage is close to the level required to provide herd immunity under the assumption of random mixing. Our results based on computer simulations suggest that the current estimates of vaccination coverage necessary to avoid outbreaks of vaccine-preventable diseases might be too low.
Journal of the Royal Society, Interface / the Royal Society. 2008. Vol. 5, num. 29, p. 1505 – 8. DOI : 10.1098/rsif.2008.0271.
Rapid parasite adaptation drives selection for high recombination rates
The Red Queen hypothesis proposes that sex is maintained through selection pressure imposed by coevolving parasites: susceptible hosts are able to escape parasite pressure by recombining their genome to create resistant offspring. However, previous theoretical studies have shown that the Red Queen typically selects against sex unless selection is strong, arguing that high rates of recombination cannot evolve when parasites are of low virulence. Here we show that under the biologically plausible assumption of a severe fitness cost for parasites that fail to infect, the Red Queen can cause selection for high recombination rates, and that the strength of virulence is largely irrelevant to the direction of selection for increased recombination rates. Strong selection on parasites and short generation times make parasites usually better adapted to their hosts than vice versa and can thus favor higher recombination rates in hosts. By demonstrating the importance of host-imposed selection on parasites, our findings resolve previously reported conflicting results.
Evolution; international journal of organic evolution. 2008. Vol. 62, num. 2, p. 295 – 300. DOI : 10.1111/j.1558-5646.2007.00265.x.
Parasites lead to evolution of robustness against gene loss in host signaling networks
Many biological networks can maintain their function against single gene loss. However, the evolutionary mechanisms responsible for such robustness remain unclear. Here, we demonstrate that antagonistic host-parasite interactions can act as a selective pressure driving the emergence of robustness against gene loss. Using a model of host signaling networks and simulating their coevolution with parasites that interfere with network function, we find that networks evolve both redundancy and specific architectures that allow them to maintain their response despite removal of proteins. We show that when the parasite pressure is removed, subsequent evolution can lead to loss of redundancy while architecture-based robustness is retained. Contrary to intuition, increased parasite virulence hampers evolution of robustness by limiting the generation of population level diversity in the host. However, when robustness emerges under high virulence, it tends to be stronger. These findings predict an increased presence of robustness mechanisms in biological networks operating under parasite interference. Conversely, the presence of such mechanisms could indicate current or past parasite interference.
Molecular systems biology. 2008. Vol. 4, p. 202. DOI : 10.1038/msb.2008.44.
2007
The Red Queen and the persistence of linkage-disequilibrium oscillations in finite and infinite populations
The Red Queen Hypothesis (RQH) suggests that the coevolutionary dynamics of host-parasite systems can generate selection for increased host recombination. Since host-parasite interactions often have a strong genetic basis, recombination between different hosts can increase the fraction of novel and potentially resistant offspring genotypes. A prerequisite for this mechanism is that host-parasite interactions generate persistent oscillations of linkage disequilibria (LD).
BMC evolutionary biology. 2007. Vol. 7, p. 211. DOI : 10.1186/1471-2148-7-211.
The evolution of complexity on the level of genes, individuals and populations
Ecole Polytechnique Fédérale de Zürich (ETHZ), 2007.
2006
High epitope expression levels increase competition between T cells
Both theoretical predictions and experimental findings suggest that T cell populations can compete with each other. There is some debate on whether T cells compete for aspecific stimuli, such as access to the surface on antigen-presenting cells (APCs) or for specific stimuli, such as their cognate epitope ligand. We have developed an individual-based computer simulation model to study T cell competition. Our model shows that the expression level of foreign epitopes per APC determines whether T cell competition is mainly for specific or aspecific stimuli. Under low epitope expression, competition is mainly for the specific epitope stimuli, and, hence, different epitope-specific T cell populations coexist readily. However, if epitope expression levels are high, aspecific competition becomes more important. Such between-specificity competition can lead to competitive exclusion between different epitope-specific T cell populations. Our model allows us to delineate the circumstances that facilitate coexistence of T cells of different epitope specificity. Understanding mechanisms of T cell coexistence has important practical implications for immune therapies that require a broad immune response.
PLoS computational biology. 2006. Vol. 2, num. 8, p. e109. DOI : 10.1371/journal.pcbi.0020109.
Mutation accumulation in space and the maintenance of sexual reproduction
The maintenance of sexual reproduction remains one of the major puzzles of evolutionary biology, since, all else being equal, an asexual mutant should have a twofold fitness advantage over the sexual wildtype. Most theories suggest that sex helps either to purge deleterious mutations, or to adapt to changing environments. Both mechanisms have their limitations if they act in isolation because they require either high genomic mutation rates or very virulent pathogens, and it is therefore often thought that they must act together to maintain sex. Typically, however, these theories have in common that they are not based on spatial processes. Here, we show that local dispersal and local competition can explain the maintenance of sexual reproduction as a means of purging deleterious mutations. Using a spatially explicit individual-based model, we find that even with reasonably low genomic mutation rates and large total population sizes, asexual clones cannot invade a sexual population. Our results demonstrate how spatial processes affect mutation accumulation such that it can fully erode the twofold benefit of asexuality faster than an asexual clone can take over a sexual population. Thus, the cost of sex is generally overestimated in models that ignore the effects of space on mutation accumulation.
Ecology letters. 2006. Vol. 9, num. 8, p. 941 – 6. DOI : 10.1111/j.1461-0248.2006.00942.x.
The effect of multifunctionality on the rate of evolution in yeast
Multifunctional genes are expected to evolve at lower rates because mutations in such genes that improve one function might often have deleterious effects on other functions. Here we tested for an association between multifunctionality and evolutionary rates in genes of Saccharomyces cerevisiae, and found a highly significant negative correlation between the number of biological processes in which a gene is involved and its rate of evolution. However, the magnitude of this effect is small, and the results do not support the notion that multifunctionality limits a gene’s rate of evolution.
Molecular biology and evolution. 2006. Vol. 23, num. 4, p. 721 – 2. DOI : 10.1093/molbev/msj086.