Frequently asked questions

INVEST Full Population Data

The Utilisation of population register data and genetic data in research on intergenerational welfare and health inequalities (INVEST Full Population Data) project studies socio-economic and health inequalities in childhood, youth and early adulthood. It studies mechanisms that connect the socio-economic positions and health factors of parents and children and how the institutions and services of a welfare state can influence these mechanisms. The project utilises population register and genetic data, which will be combined into unique data sets.

The aim of the research is to produce new, internationally unique information with a view to reducing intergenerational welfare and health inequalities. The research is carried out within the INVEST Research Flagship Centre. Operated by the University of Turku (UTU) and the Finnish Institute for Health and Welfare (THL), the centre studies inequality, intervention and the new welfare society.

The research project utilises genetic data from THL’s health surveys for the same purpose for which it was originally collected. The genetic data is summarised as a risk score describing the genetic propensity for a particular disease or other health-related factor. Individuals cannot be identified based on their risk score.

1. Ethical questions related to the use of the data

Why is the data being used? Why is this socially important?

The data provides new information on the intergenerational importance of social and biological factors over the research subjects’ life cycle and in changing institutional and historical circumstances. It supports research that aims to improve the socio-economic and health-related well-being of Finnish children, adolescents and young adults, and to increase their equality not only as citizens but also as users of the services provided by society. The study, which covers the full population of Finland, will benefit Finnish society as a whole.

The research will be carried out according to good scientific practice. The research follows the principles endorsed by the scientific community: honesty, general care and accuracy in the research work, in recording and presenting the results, and in assessing the studies and their results. The research uses ethically sustainable data collection, research and assessment methods that meet the criteria for scientific research. The research results will be published in line with the openness that is an inherent part of scientific knowledge, and they will be released according to responsible scientific communication practices.

Where does the data come from?

Most of the data is collected from different registers (THL, Statistics Finland, Kela, etc.). The registers are described in detail in the data catalogue and privacy notice on the INVEST website. The genetic data is derived from blood samples collected by THL and its predecessor, the National Public Health Institute of Finland (KTL), in connection with population-representative surveys:

  • The National FINRISK Study, years 1992, 1997, 2002, 2007 and 2012
  • Health 2000 and Health 2011 Surveys
  • FinHealth 2017 Study

THL/KTL has collected data and samples in its health surveys and studies as part of its statutory duties. These duties are related to monitoring and studying the health and well-being of the Finnish population.

Data for the National FINRISK Study were collected once every five years in 1992–2012 in the following areas in Finland: the regions of North Karelia, North Savonia, Kainuu and North Ostrobothnia (former Oulu Province), Turku and Loimaa, five municipalities in Southwest Finland (Aura, Oripää, Punkalaidun, Pöytyä and Ypäjä), and the cities of Helsinki and Vantaa.

Data collection for the Health 2000 Survey and its follow-up study, Health 2011 Survey, took place in 2000–2001 and 2011–2012 in a total of 80 municipalities across Finland.

The National FinHealth 2017 Study was conducted in 2017 in 50 municipalities across Finland.

In the aforementioned studies, data was collected through interviews, questionnaires and comprehensive health examinations that also involved blood sampling. The genetic data has been derived from the DNA extracted from these blood samples.

I have participated in a survey that is used as a source of data in the project. Can I be identified?

No. Genetic data is summarised as a risk score that describes the genetic propensity for a certain disease or other health-related factor. Individuals cannot be identified based on the risk score.

What does risk score mean? How is it calculated and interpreted?

The individual risk score summarises the combined effects of millions of gene variants on the development of a disease or trait. The risk score is also known as a polygenic score (PGS). In order to calculate the risk score, a genome-wide association study (GWAS) is needed for the disease or health-related factor in question. The risk scores are calculated with software specifically designed for this purpose, such as PRS-CS.
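
As a rough illustration of the idea, a risk score can be thought of as a weighted sum over a person's gene variants: each variant contributes its weight multiplied by the number of copies the person carries. The sketch below is not the project's actual calculation. The variant names, weights and dosages are invented for the example, and tools such as PRS-CS additionally adjust the GWAS-based weights before a sum of this kind is taken.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages.

    dosages: number of copies (0, 1 or 2) of the effect allele per variant.
    weights: per-variant effect weights, in practice derived from a GWAS.
    Variants missing from `dosages` are counted as 0 copies (a simplification).
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical weights and one hypothetical person (illustration only).
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(person, weights)
print(score)
```

A real score sums over a very large number of common variants in the same way, which is why it says nothing about rare variants that are not included in the weights.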

What can be deduced on the basis of the risk score? What cannot be deduced on the basis of the risk score?

The risk score makes it possible to deduce whether a person has a higher, lower or average genetic risk of developing a certain disease, relative to the population at large. Similarly, it makes it possible to estimate whether a person is, based on their genotype, more likely to take big risks in life or to obtain a higher level of education. In addition to the factors captured by the risk score, diseases and traits are affected by the environment and by genetic factors that the risk score does not cover, such as rare risk variants or copy number changes. The risk score is calculated on the basis of common gene variants. The studied trait must be heritable to some degree in order to be studied with the risk score.

According to previous studies, a harmful lifestyle and genetic propensity jointly affect the emergence of various common diseases.

The risk score does not enable determining whether a person will fall ill or not. A high risk score is only one of several risk factors that increase the likelihood of developing a disease.

Do the risk scores predict, for example, education level or mental health problems?

The risk score does not describe the likelihood of falling ill or the absolute risk. Instead, it describes the relative risk: how likely a person is to develop a certain disease compared to other people.
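
One way to picture a relative measure of this kind is to compare a single score with the scores of a reference population. The sketch below is only an assumption about how such a comparison could look, not a description of the project's methods. The population values are invented, and the function name `relative_position` is hypothetical.

```python
from statistics import mean, pstdev


def relative_position(score, population_scores):
    """Place one score relative to a reference population.

    Returns the standardised score (distance from the population mean in
    standard deviations) and the share of the population with a lower score.
    Neither value is a probability of falling ill.
    """
    mu = mean(population_scores)
    sigma = pstdev(population_scores)
    z = (score - mu) / sigma
    share_below = sum(s < score for s in population_scores) / len(population_scores)
    return z, share_below


# Invented reference scores (illustration only).
population = [0.0, 1.0, 2.0, 3.0, 4.0]
z, share_below = relative_position(3.0, population)
print(round(z, 2), share_below)
```

Both numbers only describe where the person stands compared to others. They do not state how likely the disease is in absolute terms.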

Where can I find more information on the data?

You can find more information about the data on the project website, the privacy notice and the public notice by THL.

2. Questions related to privacy

How is the privacy of the data ensured?

Confidentiality

All the data used in the study is confidential. The data is analysed in the secure FIONA remote access environment that is maintained by Statistics Finland. Accessing the data requires an access permit, a data security analysis (also covering the working premises), and two-factor authentication when logging into the remote access environment.

Statistics Finland is responsible for combining the data sets. Statistics Finland’s Research Services removes personal identification numbers from the data during the combination process and replaces them with random research codes. The genetic risk score calculated by THL does not enable identifying individuals. In accordance with the terms of use of Statistics Finland, no individual or identifying data may be reported from the data. With regard to this information, Statistics Finland carries out a preliminary inspection before reporting the results.
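
The replacement of personal identification numbers with random research codes can be pictured with the following minimal sketch. It is not Statistics Finland's actual procedure. The field names, the example records and the function `pseudonymise` are hypothetical and only illustrate the principle that the same person receives the same random code in every combined data set while the original identifier is dropped.

```python
import secrets


def pseudonymise(records, id_field="personal_id"):
    """Replace the identifier in each record with a random research code.

    The same identifier always maps to the same code, so records from
    different registers can still be linked after the identifier is removed.
    """
    codes = {}
    used = set()
    result = []
    for record in records:
        pid = record[id_field]
        if pid not in codes:
            code = secrets.token_hex(8)
            while code in used:  # regenerate in the unlikely case of a repeat
                code = secrets.token_hex(8)
            used.add(code)
            codes[pid] = code
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["research_code"] = codes[pid]
        result.append(cleaned)
    return result


# Invented records from two hypothetical registers (illustration only).
records = [
    {"personal_id": "ID-1", "register": "income", "value": 10},
    {"personal_id": "ID-2", "register": "income", "value": 20},
    {"personal_id": "ID-1", "register": "health", "value": 30},
]
for row in pseudonymise(records):
    print(row)
```

Because the codes are random, they cannot be converted back into identification numbers without the mapping, which in this sketch exists only inside the function.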

Storing the data

The data is stored on the secure services of Statistics Finland’s FIONA operating environment. The genetic risk scores are also stored in THL in a secure manner for possible future research. The combined data stored at Statistics Finland will be deleted at the end of the research.

Why did you not tell me personally that my data will be used in research?

No new data is collected in this study. The data includes register data that has been collected for administrative purposes and genetic data collected in connection with previous studies examining population health.

Finnish authorities collect register data for statistical and administrative purposes. Although the data in these registers was not originally collected for research use, the law permits the controller to grant access to the data for research use (the Act on the Secondary Use of Health and Social Data and the Act on the National Institute for Health and Welfare). Using the register data for scientific research that serves the public interest does not require the data subjects’ consent.

The genetic data has been derived from the blood samples collected by THL in its population surveys as part of its statutory duties of monitoring and studying population welfare and health (668/2008). The data subjects have given their informed consent to participate in the population surveys, and they have been told that their samples will be used for research purposes. On its website, THL describes the current uses of the data and the rights of data subjects in the privacy notices for the population surveys.

In the INVEST project, the research approach is more strongly social in its focus, which is why the project was also announced to the data subjects by a public notice in March 2022.

Where can I find information about my rights and the use of my data?

More information on the rights of data subjects in THL’s population surveys is available in the privacy notices:

Participants may also ask for more information by e-mail at finterveys@thl.fi.

Who is the controller and the processor of the data?

THL and the University of Turku are the controllers and processors. The data has no other processors.

Who may access the data?

Only authorised researchers at the INVEST Research Flagship Centre may access the data.

Can the data be leaked to outsiders?

All the data used in the study is confidential. The data is analysed in the secure FIONA remote access environment that is maintained by Statistics Finland. Accessing the data requires an access permit, a data security analysis (also covering the working premises), and two-factor authentication when logging into the remote access environment.

Statistics Finland removes the personal identification numbers that enable identifying individuals from the data. In accordance with the terms of use of Statistics Finland, no individual or identifying data may be reported from the data. With regard to this information, Statistics Finland carries out a preliminary inspection before reporting the results.

How does disclosing data for research use differ from, for example, a social media platform sharing my data?

The INVEST Research Flagship Centre does not share your data with outsiders. The data will not be used, for example, for decision-making concerning individuals or for marketing purposes.

How long will my data be used by the researchers?

The data will be used until further notice.

3. Questions about the research project

What does the project study?

The Utilisation of population register data and genetic data in research on intergenerational welfare and health inequalities (INVEST Full Population Data) project studies socio-economic and health inequalities in childhood, youth and early adulthood. It studies mechanisms that connect the socio-economic positions and health factors of parents and children and how the institutions and services of a welfare state can influence these mechanisms.

Where can I find more information on the research?

More information is available on the website of the research project.