Announcement of the training course "Data analysis using Stata"



Data analysis using Stata

National Institute of Hygiene and Epidemiology, Hanoi

Monday 27th of March to Saturday 1st of April 2023

Organisation:

Yoann Madec

Epidemiology of Emerging Diseases

Institut Pasteur

Paris, France


Tuan Anh Nguyen & Thang Hong Pham

National Institute of Hygiene and Epidemiology

Hanoi, Vietnam

This training will be held at the National Institute of Hygiene and Epidemiology in Hanoi (Vietnam) from Monday 27th of March to Saturday 1st of April 2023.

Goals of the training: 

The course aims to build local capacity in epidemiology and to give participants a strong working knowledge of the main statistical techniques used to analyse epidemiological data with Stata, one of the most widely used statistical packages in scientific research.

Although participants are instructed in Stata, the basic analytic concepts and techniques covered in the course are applicable to the analysis of data using any statistical package.

At the end of the training, participants will be able to:

  • Use Stata for data management,
  • Apply basic Stata commands to the proper handling of data, including importing and merging files, generating variables, and cleaning data (checking for duplicates, missing data, outliers…),
  • Import and export data,
  • Describe the data and perform graphical analysis,
  • Perform comparison tests for categorical and continuous variables, using parametric or non-parametric tests,
  • Properly report and interpret p-values,
  • Perform univariate and multivariate analysis,
  • Construct and interpret logistic regression models,
  • Assess confounding and interaction using regression models.


The training is intended for people working in medical research (e.g. clinicians, pharmacists, veterinarians, engineers, data managers…) in Vietnam and in other countries in South-East Asia.

Participants are expected to already have some knowledge of epidemiology and basic statistics (e.g. to be familiar with normal distributions, p-values and hypothesis testing). No knowledge of the Stata software is required, as an introduction to Stata is part of the training.

The training will be provided in English, and only applications in English will be considered.

Each participant is requested to bring his/her own laptop. 

Duration and organisation of the training

The training lasts six days. The first three days are dedicated to an introduction to Stata (opening and merging datasets, creating programs, graphics, cleaning datasets…) and to statistical tests; the last three days are dedicated to logistic regression analysis.

The training will be based on introductory lectures complemented by individual practical exercises on a computer, with one computer per student. Teaching assistants will be available full-time to help students use Stata commands and complete the exercises.

Stata licences will be purchased for this training, at the end of which all participants will be given a licence.

Day 1 to day 3: Stata commands and univariate analysis

  • Introduction to Stata, including basic command syntax, importing and saving data,
  • Merging datasets,
  • Creating and running programs,
  • Command syntax for generation of numerical and string variables, formatting of variables, dates, functions, ordering variables, and assigning labels,
  • Quality control and cleaning of databases, including identification of duplicate observations, discrepancies and missing and extreme values,
  • Summarizing and describing data using the summarize, tabulate, browse, and list commands,
  • Introduction to graphics,
  • Comparing continuous variables between two or more groups (Student's t-test, Mann-Whitney test, ANOVA, Kruskal-Wallis test, paired t-test, Wilcoxon test),
  • Comparing categorical variables between groups (chi-squared and Fisher's exact tests).

Day 4 to day 6: Logistic regression

  • Introduction to logistic regression, including model assumptions, the logistic function, and interpretation of odds ratios and beta coefficients,
  • Epidemiological concepts of confounding, interaction, and mediation,
  • Evaluating and interpreting confounding in multivariate regression models,
  • Evaluating and interpreting interactions,
  • Strategies for constructing multivariate models, including categorization of variables, variable selection, and model interpretation,
  • Assessing model fit,
  • Assessing missingness and using multiple imputation methods,
  • Calculating and interpreting the attributable fraction.

Teaching team

Arnaud Fontanet, Institut Pasteur (Paris, France), Epidemiology
Bich-Tram Huynh, Institut Pasteur (Paris, France), Epidemiology
Yoann Madec, Institut Pasteur (Paris, France), Biostatistics


There is no registration fee. Support for transport and accommodation will be provided to participants not from Hanoi.


Your application must include:

  • an application form (download here: Application_form.doc),
  • a cover letter in which you explain your expectations of the course and how it will contribute to your current research/project,
  • a support letter from your manager,
  • a detailed curriculum vitae (3 pages maximum).

Your application must be sent before December 31st 2022.

Applicants will be notified of the decision by January 15, 2023.

