Advanced and Specialised Research Methods

Upcoming courses

Workshop: Basics of structural equation modeling

September 18th, 19th, 21st, 22nd, 2023, at 9:30-16:00

Credits: 2

Grading: Pass/fail

Prerequisite: Students should have (some) basic knowledge of the R software.

Description: This workshop will treat three basic elements of structural equation modelling: (1) path analysis, (2) latent variables and factor analysis, and (3) causality. The basic principles of path analysis (including Wright’s tracing rules to find the relations between variables in the model) and drawing path diagrams are introduced, together with the R-software package lavaan, to estimate path models. The use of latent variables in path models and the consequences of using latent variables for estimation (identification) of the models are discussed, as well as the psychometric quality (reliability, validity) of the measurement models for the latent variables. Other important topics are model testing, model comparison and goodness-of-fit measures.

These topics are all treated against the background of causal models (causal graphs), where path models are used to inspect causal relationships between concepts (variables). Examples of mediation analyses and Simpson’s paradox will be discussed. To make students more familiar with the procedures and the lavaan-software, exercises are provided. Students will work on these exercises during the workshop, and important findings/answers will be discussed.
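As a rough illustration of the tracing rules and of mediation mentioned above, the following Python sketch computes the correlations implied by a small standardized path model. The model (X -> M -> Y plus a direct path X -> Y), the function names and the coefficient values are assumptions chosen for illustration; they are not taken from the workshop materials, and the sketch does not use lavaan.

```python
def implied_correlations(a, b, c):
    """Model-implied correlations for a standardized path model with
    paths X -> M (a), M -> Y (b) and X -> Y (c), using tracing rules."""
    r_xm = a            # only route between X and M: X -> M
    r_xy = c + a * b    # direct route plus the route through M
    r_my = b + a * c    # direct route plus the route M <- X -> Y
    return {"r_xm": r_xm, "r_xy": r_xy, "r_my": r_my}


def mediation_effects(a, b, c):
    """Direct, indirect and total effect of X on Y in the same model."""
    return {"direct": c, "indirect": a * b, "total": c + a * b}


# Hypothetical standardized coefficients, for illustration only.
print(implied_correlations(a=0.5, b=0.4, c=0.2))
print(mediation_effects(a=0.5, b=0.4, c=0.2))
```

Each implied correlation is the sum, over all permitted routes between two variables, of the product of the coefficients along the route.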


Mark Huisman is an assistant professor of statistics and statistical methods in the Department of Sociology at the University of Groningen. After studying econometrics, he started working at the Faculty of Social Sciences, specializing in missing data and multiple imputation, statistical methods for social network analysis, and structural equation modeling. He teaches basic and advanced statistics courses, and one of his main tasks in the department is helping other researchers with statistical issues.






Wendy Post is an associate professor of statistics and methodology in the Department of Child and Family Welfare. Her expertise lies in advanced statistical models (structural equation models including causality, mixed effects models in longitudinal studies) and nonparametric item response theory models, especially unfolding models. Her teaching focuses on applied statistics at basic and advanced levels.






Past courses

Workshop on Interactions in Statistical Models

November 3rd, 10th and 24th, 2022, 12–4 pm

Elina Kilpi-Jakonen, INVEST Flagship Centre, University of Turku

The course introduces students to quantitative research using interactions. The course covers how interactions work in statistical models and what types of research questions require interactions. The focus of the course is on implementing interactions in Stata, producing graphs to visualize the results, and learning to interpret the results.
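To give a sense of what an interaction means, here is a minimal Python sketch for the simplest case of two binary predictors x and z, where the coefficients of y = b0 + b1*x + b2*z + b3*x*z can be read directly from the four cell means. The function names and the numbers are hypothetical and only illustrate the idea; the course itself works in Stata.

```python
def interaction_from_cell_means(m00, m10, m01, m11):
    """Coefficients of y = b0 + b1*x + b2*z + b3*x*z for binary x and z.

    m_xz is the mean of y in the cell with the given x and z,
    e.g. m10 is the mean for x = 1, z = 0.
    """
    b0 = m00                          # baseline cell (x = 0, z = 0)
    b1 = m10 - m00                    # effect of x when z = 0
    b2 = m01 - m00                    # effect of z when x = 0
    b3 = (m11 - m01) - (m10 - m00)    # how the effect of x changes with z
    return b0, b1, b2, b3


def effect_of_x(b1, b3, z):
    """Effect of x at a given value of z: it is no longer one number."""
    return b1 + b3 * z


# Hypothetical cell means, for illustration only.
b0, b1, b2, b3 = interaction_from_cell_means(10.0, 12.0, 11.0, 16.0)
print(effect_of_x(b1, b3, z=0), effect_of_x(b1, b3, z=1))
```

The interaction coefficient is a difference of differences, which is why the effect of x has to be reported separately for each value of z.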

The course is taught in the IT class on November 3rd, 10th and 24th at 12–4 pm.

In order to take the course, students should have completed the multivariate methods course or its equivalent. Good knowledge of Stata is also required. Students can obtain 1 ECTS through participation and completing coursework.

Elina Kilpi-Jakonen is an Assistant Professor of Sociology and Academy Research Fellow at the INVEST Research Flagship Centre. Her research interests focus on social inequalities related to social origin, education and migration background. She completed her doctorate at the University of Oxford in 2010 and has worked as a postdoc at the universities of Oxford, Bamberg and Turku as well as the European University Institute in Florence. She was previously the scientific programme co-ordinator of the NORFACE-funded Dynamics of Inequality Across the Life-course (DIAL) research programme.




Introduction to latent (and related) variables

May 18, 2022, 9:15-15:00

Samuli Helle, Department of Social Sciences, University of Turku

The aim of the workshop is to provide an introduction to unobserved latent variables and related variables, such as composites or emergent variables, and to show how researchers can use them to represent the scientific concepts they are interested in. Because they can account for measurement error in scientific constructs, latent variables can provide more reliable causal inference and thus possibly more generalizable scientific answers. Basic knowledge of statistics (e.g. regression analysis) and RStudio is assumed (all the code needed, e.g. for the assignment, is provided in the course).

The workshop will be held online. Zoom link to follow closer to date for those who have enrolled in Moodle. For those who want credits (1 cr for a small assignment), more details are provided in the workshop.

Samuli Helle is a senior researcher in the NetResilience and INVEST Joint Research Center at the University of Turku. He obtained his PhD in evolutionary biology from the University of Turku in 2004.

Current research interests include human evolutionary demography and intergenerational relations.

His work has been published for example in The Lancet, Science, Nature, Nature Communications and PNAS.




Introduction to Sequence Analysis in the Social Sciences

March 16-18, 2022

Emanuela Struffolino, University of Milan
Marcel Raab, ifb – Staatsinstitut für Familienforschung an der Universität Bamberg


Day 1
09.00 – 10.00 Session 1. Introduction: Welcome | motivation | exemplary applications
10.00 – 10.15 Break
10.15 – 11.15 Session 2. Longitudinal data management & creating sequence data: Introduction to pairfam data | data management in Stata & R
11.15 – 11.30 Break
11.30 – 12.30 Session 3. Defining & describing sequence data: Basic concepts & terminology | Defining sequence data | Basic description (state distribution, durations, episodes, transitions) | Modal and representative sequences
12.30 – 13.30 Lunch
13.30 – 14.30 Session 4. Hands-on
14.30 – 14.45 Break
14.45 – 15.45 Session 5. Visualization of sequences: Choosing colors | Data summarization graphs | Data representation graphs
15.45 – 16.00 Break
16.00 – 17.00 Session 6. Hands-on

Day 2
09.00 – 10.00 Session 7. Assessing sequence complexity and quality: Unidimensional measures | composite indices
10.00 – 10.15 Break
10.15 – 11.15 Session 8. Hands-on
11.15 – 11.30 Break
11.30 – 12.30 Session 9. Dissimilarity measures I (basics): Optimal matching | Cost specification
12.30 – 13.30 Lunch
13.30 – 14.30 Session 10. Dissimilarity measures II: Criticism of optimal matching | Variants of optimal matching
14.30 – 14.45 Break
14.45 – 15.45 Session 11. Hands-on
15.45 – 16.00 Break
16.00 – 17.00 Session 12. Cluster analysis: Introduction to cluster analysis | cut-off criteria | cluster quality

Day 3
09.00 – 10.00 Session 13. Hands-on
10.00 – 10.15 Break
10.15 – 11.15 Session 14. Regression analysis with clusters: Predicting cluster membership | Using clusters as predictors | Hands-on
11.15 – 11.30 Break
11.30 – 12.30 Session 15. Multichannel sequence analysis: Theory | Hands-on
12.30 – 13.30 Lunch
13.30 – 14.30 Session 16. Selection of more recent advances: Comparing groups (discrepancy analysis, BIC, LRT, regression trees) | Implicative statistic | Combining sequence and event history analysis | Combining causal analysis (matching) and sequence analysis
14.30 – 14.45 Break
14.45 – 15.45 Session 17. Hands-on
15.45 – 16.00 Break
16.00 – 17.00 Session 18. Wrap-up and Q&A

Workshop on Sibling Fixed Effects Models

October 12 and 14, 2021

This workshop provides an introduction to fixed effects models with a focus on sibling fixed effects. In these models, one sibling serves as the control for another sibling. By these means, sibling fixed effects models control for unobserved variables that are shared among siblings.
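The idea that one sibling serves as the control for another can be sketched with a small simulation. In the Python sketch below, each family has an unobserved characteristic that affects both the predictor and the outcome; subtracting family means from both variables removes everything siblings share. The data-generating process, the function names and the parameter values are all assumptions made for illustration (the workshop itself uses Stata).

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_sibling_pairs(n_families=2000, true_effect=2.0, seed=1):
    """Hypothetical data: two siblings per family, shared unobserved u."""
    rng = random.Random(seed)
    rows = []
    for family in range(n_families):
        u = rng.gauss(0, 1)  # unobserved family characteristic
        for _ in range(2):
            x = u + rng.gauss(0, 1)  # predictor depends on u
            y = true_effect * x + 3.0 * u + rng.gauss(0, 1)
            rows.append((family, x, y))
    return rows


def demean_within_family(rows):
    """Subtract family means, which removes everything siblings share."""
    families = defaultdict(list)
    for family, x, y in rows:
        families[family].append((x, y))
    xs, ys = [], []
    for members in families.values():
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for x, y in members:
            xs.append(x - mean_x)
            ys.append(y - mean_y)
    return xs, ys


rows = simulate_sibling_pairs()
naive = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
fixed_effects = ols_slope(*demean_within_family(rows))
print(f"naive slope: {naive:.2f}, sibling fixed effects slope: {fixed_effects:.2f}")
```

In this simulated setting the naive slope is biased by the shared family characteristic, whereas the within-family slope should land close to the true effect of 2.0. Unobserved factors that differ between siblings are not removed by this approach.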

The workshop is structured into three parts. The first part discusses the central ideas motivating sibling fixed effects models and how they relate to individual fixed effects models. This introduction requires no prior knowledge of fixed effects models.

The second part of the workshop provides practical exercises using Stata, allowing participants to implement and interpret these (sibling and other) fixed effects models themselves.

The third and final part of the workshop uses some published studies using sibling fixed effects models as examples of how the approach can actually be implemented. This part also discusses the assumptions underlying sibling fixed effects models, their limitations, and the critiques which have emerged against them.


Tuesday, October 12, 11.30–13.00
Part I: An Introduction to Fixed Effects and Sibling Fixed Effects Models

Tuesday, October 12, 15.30–17.00
Part II: Practical Exercises using Stata

Thursday, October 14, 10.00–13.00
Part III: Examples of Sibling Fixed Effects Studies, their Assumptions, Limitations, and Critiques

SIBLING-FIXED-EFFECTS programme (.pdf)

Some preliminary readings (examples of sibling fixed effects models):

Barclay, Kieron J., and M. Myrskylä. 2016. “Advanced Maternal Age and Offspring Outcomes: Reproductive Aging and Counterbalancing Period Trends.” Population and Development Review 42:69–94.

Duncan, Greg J., W. Jean Yeung, Jeanne Brooks-Gunn, and Judith R. Smith. 1998. “How Much Does Childhood Poverty Affect the Life Chances of Children?” American Sociological Review 63:406–23.

Elstad, Jon I., and Anders Bakken. 2015. “The Effects of Parental Income on Norwegian Adolescents’ School Grades: A Sibling Analysis.” Acta Sociologica 58:265–82.

Grätz, Michael. 2018. “Competition in the Family: Inequality between Siblings and the Intergenerational Transmission of Educational Advantage.” Sociological Science 5:246–69.


Michael Grätz is a lecturer (Maître assistant Ambizione) in sociology at the University of Lausanne, where he conducts a research project financed by an Ambizione grant of the Swiss National Science Foundation. The project estimates the effects of demographic behavior on the intergenerational transmission of advantage. He is also an associate professor (docent) at the Swedish Institute for Social Research (SOFI), Stockholm University. He received his PhD in Political and Social Sciences from the European University Institute (EUI) in 2015. His research interests are in the fields of child development, social stratification, and social demography. His research aims to contribute to our understanding of the intergenerational transmission of advantage.

Workshop on Finnish Registry data

19 May 2021, 10.15 – 13.45

By Sanna Kailaheimo-Lönnqvist

The aim of this workshop is to provide an introduction to Finnish registry data. The workshop introduces different kinds of Finnish registers, how to obtain them for scientific use, the basic structure of the data, and most importantly, what kinds of research questions can be answered using registry data. The workshop includes both discussion and lectures. After the workshop, students know some benefits and limitations of registry data and know what kinds of studies can be conducted using registry data.

1 credit: Short essays and participation in the discussions.

Sanna Kailaheimo-Lönnqvist is a researcher at the Finnish National Rescue Association and a visiting researcher at various institutions, such as the Institute of Criminology and Legal Policy at the University of Helsinki. She obtained a PhD in sociology from the University of Turku in 2021. She has published articles in international peer-reviewed journals such as Demography, Research in Social Stratification and Mobility and European Societies.

Her main research interests are in the areas of social inequality and intergenerational relations. In her doctoral thesis, she examined how resources and different life events are linked with children’s adulthood outcomes. She has conducted all her research using Finnish and Swedish register data.

Causal Inference for nonexperimental data

17-19 March 2021 (12 hours)

By Bruno Arpino, Department of Statistics, Computer Science, Applications, University of Florence, Italy

What is the effect of smoking on health? Does having an additional child increase the risk of poverty? Are development policies targeted on small firms effective in increasing investments?

Most studies in the social sciences are motivated by questions that are causal in nature.

However, in these areas experiments are not always possible for ethical or practical reasons, and the estimation of causal effects often has to rely on observational studies. The validity of inference then depends strictly on the plausibility of the assumptions underlying the statistical techniques employed.

This course will cover some of the most popular techniques for estimating causal effects with observational data: propensity score matching, instrumental variable regression, regression discontinuity designs and fixed effects models. Special emphasis will be placed during the course on discussing the plausibility of the identifying assumptions, the data requirements and other practical and theoretical challenges for the implementation of each method.

This short course will offer participants theoretical and applied perspectives on the covered topics. Examples will be drawn from political science, sociology, economics, public health and policy evaluation. Lab sessions will demonstrate the implementation of the covered techniques using Stata.

More information and the programme of the course

Bruno Arpino is an associate professor at the Department of Statistics, Computer Science, Applications, University of Florence (Italy). Previously he was an Associate Professor at the Department of Political and Social Sciences, Universitat Pompeu Fabra (UPF) and co-director of the Research and Expertise Centre on Survey Methodology (RECSM, UPF). He obtained a PhD in Applied Statistics from the University of Florence in 2008.

His main research interests are in the areas of causal inference, applied statistics and social demography. From a substantive point of view, he has been studying intergenerational relationships, ageing and health, fertility and immigrants’ assimilation.

He has published articles in international peer-reviewed journals such as The Annals of Applied Statistics, Demography, European Sociological Review, The Journals of Gerontology: Series B, Journal of Marriage and Family, Journal of the Royal Statistical Society – A and C, Proceedings of the National Academy of Sciences (PNAS), and Statistics in Medicine. Since October 2017 he has been a member of the Editorial Board of Statistical Methods and Applications.


Introduction to R

by Simon N. Chapman, INVEST Flagship, Department of Social Research, University of Turku, Finland

6-7 April 2021

Many researchers now work almost exclusively in the R programming environment, but what is R and how does one even get started? Why should I use R over (preferred statistics program)? What is a function and what is a package? What do those square brackets mean? 

Learning R can seem like a daunting task, like trying to climb Everest with no tools and no mountaineering experience. There is no need to fear though: as the famous idiom goes, the best way to eat an elephant is one bite at a time. In this course, we will break the basics of R into bite-sized chunks.

This course aims to give participants a starting point for working within the R environment, and is suitable for absolute beginners and more experienced programmers alike. Learn to import and export objects, view and summarise datasets, create variables, and much more.

Simon Chapman is a senior researcher in the INVEST Flagship at the University of Turku. He obtained his PhD in evolutionary biology from the University of Turku in 2020. His current interests are intergenerational relations, cooperation and conflict within families, life history evolution, and the impacts of parental leave on the life-course. His work has been published, amongst others, in Current Biology, Nature Communications, Evolution & Human Behavior, and Biology Letters.




Longitudinal social network analysis with RSiena 27.-29.10.2020

by: Tom Snijders

The course will be online. It will consist of an alternation of lectures, Q&A sessions, and practical work in breakout groups of 2-4 participants.

It is assumed that the participants have a good basic understanding of statistical methods, including regression and logistic regression; a good understanding of the basics of social network analysis (e.g., the textbook by Borgatti, Everett, and Johnson); and a good working knowledge of R.

The program is tentative, especially for the later days, and will be adapted to the interests of the participants.

Tom A.B. Snijders is professor of Statistics and Methodology in the Social Sciences at the University of Groningen and emeritus fellow of Nuffield College, University of Oxford. He studied mathematics and obtained a PhD in 1979 from the University of Groningen with a dissertation in mathematical statistics. His research concentrates on social network analysis and multilevel analysis. His work on developing statistical methodology for network dynamics is implemented in the software package RSiena (Simulation Inference for Empirical Network Analysis) in the statistical system R. With Roel J. Bosker he wrote Multilevel Analysis; An Introduction to Basic and Advanced Multilevel Modeling (Sage, 2nd ed., 2012). Combining these two research strands, together with Emmanuel Lazega he edited Multilevel Network Analysis for the Social Sciences; Theory, Methods and Applications (Springer, 2016). Together with Patrick Doreian he was co-editor of Social Networks from 2006 to 2011. He supervised and co-supervised more than 60 PhD dissertations. From 2002 to 2006 he was scientific director of the graduate school ICS (Inter-university Center for Social Science Theory and Methodology). He received two awards from INSNA (International Network for Social Network Analysis): the Georg Simmel Award in 2010 and the Bill Richards software award in 2017; and honorary doctorates from the University of Stockholm (2005) and the Université Paris-Dauphine (2005).

Introduction to Social Science Genetics 10.-12.11.2020

by: Felix Tropf

A growing number of social science data sources are providing molecular genetic data, and researchers all over the world are interested in utilizing this information in order to better understand various social phenomena. In this course, we will learn about the history of social science and behaviour genetics as well as about state-of-the-art research and cutting-edge methods. After attending this workshop, participants should have a basic understanding of the fundamental advantages of integrating genetics into social science. They should understand the basic technical terms from the quantitative genetics literature and be able to read and interpret studies concerning social science genetics. They should be able to conduct basic quantitative genetics analyses and interpret their findings. Participants need an interest in and a basic understanding of quantitative social science research, and some experience with the software R and Stata.

We will start with a general introduction to genetics in the social sciences, discussing potential research questions we can answer using genetic data. We subsequently learn about the theory behind twin and family models and how to estimate heritability as the proportion of observed variance in an outcome that is explained by genetic effects. We move on to see how heritability is measured using molecular genetic data and discuss various challenges and applications. We use PLINK software to prepare and analyze genetic data and GCTA software to estimate quantitative genetic models.
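As a small illustration of heritability as a share of variance, the classical twin design compares the correlation of an outcome among identical (MZ) twins with that among fraternal (DZ) twins. The Python sketch below uses Falconer's approximation under the standard ACE assumptions; this is a textbook shortcut chosen for illustration, not necessarily the estimation approach used in the course, and the function name and input correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate variance shares from twin correlations (ACE model).

    r_mz: outcome correlation among identical (MZ) twin pairs
    r_dz: outcome correlation among fraternal (DZ) twin pairs
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared environment share
    e2 = 1.0 - r_mz            # non-shared environment and measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, for illustration only.
print(falconer_ace(r_mz=0.8, r_dz=0.5))
```

The three shares sum to one by construction. The approximation relies on assumptions such as equal environments for MZ and DZ twins and no assortative mating.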

We will discuss how genetic variants associated with social science outcomes of interest are discovered, and how we can utilize these results in social science research in terms of controlling for confounding effects, dealing with genetic heterogeneity in social science models, estimating gene-environment interaction models and using genes as instrumental variables. Substantively, we will rely on recently published genetic discovery studies on educational attainment, subjective well-being and fertility.

Felix is a sociologist and his current interests focus on social demography, genetics, and the life course. He is an Assistant Professor in Social Science Genetics at CREST/ENSAE, an Associate member of Nuffield College in Oxford and a Visiting Scientist at the Queensland Institute for Medical Research (QIMR) in Australia. He received the European Demography Award for best PhD Thesis. Felix’ research has been published, amongst others, in Demography, Nature Genetics, Nature Human Behaviour, JAMA Psychiatry, Proceedings of the National Academy of Sciences and Population Studies.

Advanced Causal Inference with Observational Data, 2-4 ECTS

by Moris Triventi

The aim of this course is to provide an introduction to the identification and estimation of causal effects using observational data typical of the social sciences. Each theoretical lesson is complemented by a laboratory/computer session in which the Stata software is used to analyze real-world data. Requirements: students are expected to have basic knowledge of statistics (descriptive, inferential) and linear regression. Basic knowledge of Stata (file management, data preparation) is also required.

Moris Triventi, PhD, is Associate Professor in the Department of Sociology and Social Research at the University of Trento (Italy), where he teaches Quantitative Research Methods and Sociology of Education. From 2013 to 2016 he was Research Fellow at the European University Institute (Fiesole, Italy). His research interests comprise social inequalities, education, crime, migration and policy evaluation. His works have been published, among others, in Annual Review of Sociology, Policy Sciences, International Migration Review, and European Sociological Review.





Experimental Social Science – Lab and Field Experiments, 5 ECTS

by Lauri Sääksvuori

This course is about experimental social science. Students will learn to understand how to gather data using experimental methods and how various experimental designs relate to different statistical methods. After the course, students know how to design meaningful experiments and draft implementation and analysis plans to run the experiments in practice.

Behavioral Genetic Modeling using Twin data

by Tina Baier

The aim of this course is to introduce social scientists to twin studies and the related quantitative methods of behavioral genetic analysis. The first part of the course provides the relevant background and introduces the main concepts used in quantitative genetics. The second, applied part uses the statistical software Stata and the “acelong” package developed for behavioral genetic modeling. Prerequisites: Participants should have basic knowledge of Stata 14 and regression analysis. A basic understanding of multilevel modeling is an advantage.