Privacy. The information you provide will help us meet the research goals presented below. No immediately identifiable information - such as your name or IP address - will be used. Only researchers involved in this project will have access to your Twitter data. If you have questions about data storage and security, please contact [elaine@digitalpophealth.org]. For more information about Twitter’s privacy policy, click here.
Background. A number of systems use data from the Internet (such as news, social media, and crowd-sourced reports) and other digital sources (e.g., cell phones, wearable devices) to monitor disease spread, assess population attitudes towards vaccines, and improve understanding of the interaction between population behavioral changes and health. In addition to the challenge of extracting public health signals from the noise inherent in these data sources, there are significant biases due to differences in the representation of individuals across locations, age groups, and racial/ethnic backgrounds. Although several publications have discussed the limitations of these data sources, no project has developed a rigorous and comprehensive approach to systematically investigate these limitations and explore mitigation strategies.
Specific Aims. We seek to improve methods for automated inference of key demographic traits – including age, race/ethnicity and gender – of Twitter users. We will then use these tools to assess the quality and representativeness of health information provided by users on Twitter, as well as examine how the public discusses personal health using these data. Through this research we seek to improve the way researchers use Twitter as a means of learning about – and eventually improving – the health of the American public.
Funding. This project is funded by the Robert Wood Johnson Foundation.
Researchers involved in this collaboration are at the Institute for Health Metrics and Evaluation at the University of Washington and at the University of Oklahoma.