Data analysis using Stata
National Institute of Hygiene and Epidemiology, Hanoi
Monday 27th of March to Saturday 1st of April 2023
Epidemiology of Emerging Diseases
Tuan Anh Nguyen & Thang Hong Pham
National Institute of Hygiene and Epidemiology
This training will be held at the National Institute of Hygiene and Epidemiology in Hanoi (Vietnam) from Monday 27th of March to Saturday 1st of April 2023.
Goals of the training:
The course aims to build local capacity in epidemiology and to give participants a strong working knowledge of the main statistical techniques used to analyse epidemiological data with Stata, one of the most widely used statistical packages in scientific research.
Although participants are instructed in Stata, the basic analytic concepts and techniques covered in the course are applicable to the analysis of data using any statistical package.
At the end of the training, participants will have acquired the following capacities:
- Use Stata for basic data management,
- Apply basic Stata commands to the proper handling of data, including importing and merging of files, generation of variables, and data-management (check for duplicates, missing data, outliers…),
- Import and export data,
- Describe the data and perform graphical analysis,
- Perform comparison tests for categorical and continuous variables, using parametric or non-parametric tests,
- Properly report and interpret p-values,
- Perform univariate and multivariate analysis,
- Construct and interpret logistic regression models,
- Assess confounding and interaction using regression models.
The training is intended for people working in medical research (e.g. clinicians, pharmacists, veterinarians, engineers, data managers…) in Vietnam and in other Southeast Asian countries.
Participants should already have some knowledge of epidemiology and basic statistics (e.g. be familiar with normal distributions, p-values and hypothesis testing). No prior knowledge of Stata is required, as an introduction to the software is part of the training.
The training will be provided in English, and only applications in English will be considered.
Each participant is requested to bring his/her own laptop.
Duration and organisation of the training
The training lasts six days. The first three days are dedicated to an introduction to Stata (opening and merging datasets, creating programs, graphics, cleaning datasets…) and an introduction to statistical tests; the last three days are dedicated to logistic regression analysis.
The training is based on introductory lectures complemented by individual practical exercises, with one computer per student. Teaching assistants are available full-time to help students use Stata commands and complete the exercises.
Every participant is requested to bring their own laptop.
The training will be in English.
Stata licences will be purchased for this training, and every participant will be given a licence at the end of the course.
Day 1 to day 3: Stata commands and univariate analysis
- Introduction to Stata, including basic command syntax, importing and saving data,
- Merging datasets,
- Creating and running programs,
- Command syntax for generation of numerical and string variables, formatting of variables, dates, functions, ordering variables, and assigning labels,
- Quality control and cleaning of databases, including identification of duplicate observations, discrepancies and missing and extreme values,
- Summarizing and describing data using the summarize, tabulate, browse, and list commands,
- Introduction to graphics,
- Comparing continuous variables between two or more groups (Student's t-test, Mann-Whitney test, ANOVA, Kruskal-Wallis test, paired t-test, Wilcoxon test),
- Comparing categorical variables between groups (chi-squared and Fisher's exact tests).
Day 4 to day 6: Logistic regression
- Introduction to logistic regression, including model assumptions, the logistic function, and interpretation of odds ratios and beta coefficients
- Epidemiological concepts of confounding, interaction, and mediation
- Evaluating and interpreting confounding in multivariate regression models
- Evaluating and interpreting interactions
- Strategies for constructing multivariate models, including categorization of variables, variable selection, and model interpretation
- Assessing model fit
- Assessing missingness and using multiple imputation methods
- Calculating and interpreting the attributable fraction
|Arnaud Fontanet|Institut Pasteur (Paris, France)|Epidemiology|
|Bich-Tram Huynh|Institut Pasteur (Paris, France)|Epidemiology|
|Yoann Madec|Institut Pasteur (Paris, France)|Biostatistics|
There is no registration fee. Support for transport and accommodation will be provided to participants from outside Hanoi.
Your application must include:
- an application form (download here: Application_form.doc),
- a cover letter in which you explain your expectations of the course and how it will contribute to your current research/project,
- a support letter from your manager,
- a detailed curriculum vitae (3 pages maximum).
Your application must be sent to firstname.lastname@example.org before December 31st 2022.
Applicants will be notified of the decision by January 15, 2023.