Measurement equivalence in comparative survey research


Diana Zavala-Rojas, a member of the European Social Survey Core Scientific Team based at Universitat Pompeu Fabra, Barcelona, has written about measurement equivalence in surveys.

I used to be a very skinny girl... at least that is what my great auntie used to tell me, and each time I went to visit her in the countryside, she would try to feed me the largest possible amount of the most delicious homemade food.

In the rural areas of the country where I was brought up, people generally have the idea that slim children are somewhat unhealthy. My grandmother - who migrated to the capital city at a young age - shared my great auntie's ideas to some extent but had also adopted new ones. She thought I was very slim without any health problems but kept an eye on my eating, just in case. My mother, born and brought up in the capital, thought I was just a kid of average weight for my age - something the paediatrician told her at each visit.

If my great auntie, grandmother and mother had answered a survey question about my health, they would have given different answers, because each had arrived at her conclusion from a different perspective. Does that mean that we cannot aggregate the data resulting from such a survey? Quite the opposite. Sampling theory tells us that by randomly selecting people who may have very varied opinions on a topic, the distribution of those diverse opinions gives the full picture. Computing the mean cancels out individual differences and gives us the average opinion.

However, when we want to compare those opinions across countries or cultures, we do not have a random sample of groups. We have a substantive interest in comparing European countries with each other, collectivist and individualist cultures, autochthonous and migrant populations, age groups within or across countries, gender groups, or any other predetermined group composition that we want to compare. Therefore, we need to test whether the measurement process of a concept is the same regardless of group membership.

In order to make meaningful comparisons across groups that have an inherent uniqueness, we have to test statistically whether those comparisons make sense, by ensuring that the measurement structure of the data is the same. An analogous example is the measurement of temperature. If we want to compare temperatures in South Africa and Norway, the same measurement conditions must apply. Even if the understanding of what counts as a cold or a hot temperature is the same, we need to know whether the same measurement instrument was used - say, a mercury or a digital thermometer - and, if not, we need a way to convert between their scales. Moreover, we need to know whether the thermometers operate equally well in each place. Mercury thermometers, for instance, can be less precise, so their level of precision should be the same in both countries.
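The scale-conversion step of the analogy can be made concrete. Fahrenheit readings are a linear function of Celsius readings, with a known slope and intercept, so readings from the two instruments become comparable once both are put on a common scale. A minimal sketch (the function names and example readings are illustrative, not from the text):

```python
def celsius_to_fahrenheit(c):
    """Fahrenheit is a linear function of Celsius: slope 9/5, intercept 32."""
    return c * 9 / 5 + 32

def fahrenheit_to_celsius(f):
    """Invert the known linear relationship to put both readings on one scale."""
    return (f - 32) * 5 / 9

# Hypothetical readings from two instruments in two places.
oslo_c = -5.0                               # digital thermometer, Celsius
cape_town_f = 77.0                          # mercury thermometer, Fahrenheit

# Only after converting to a common scale is the comparison meaningful.
cape_town_c = fahrenheit_to_celsius(cape_town_f)
print(cape_town_c - oslo_c)                 # difference on a common scale: 30.0
```

Measurement equivalence testing asks the analogous question about survey items: whether the relationship between the instrument and the underlying quantity has the same form in every group.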

Depending on the nature of the data and the survey questions - e.g. whether the data can be modelled by continuous or categorical distributions - there are several statistical techniques that we can apply to conduct such a test1–3.

The test is a requirement for drawing meaningful comparisons from observed data. If it fails to establish measurement equivalence, does that mean the data are useless and we cannot compare countries? No. The test indicates whether we can use the observed data directly, as gathered, to estimate and compare indexes. When it fails, we may still use other statistical techniques to make meaningful comparisons3,4,5(chap16),6.
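Why a failed test rules out direct comparison of observed data can be shown with a small simulation. This is a generic illustration, not one of the techniques in the references: two groups share exactly the same latent opinion distribution, but the survey item has a shifted intercept in one group (its respondents use the response scale differently), so the observed means differ even though the underlying opinions do not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Both groups have the SAME latent opinion distribution...
latent_a = rng.normal(loc=0.0, scale=1.0, size=n)
latent_b = rng.normal(loc=0.0, scale=1.0, size=n)

# ...but the item is non-equivalent: group B's intercept is shifted by 0.5.
item_a = 3.0 + 1.0 * latent_a + rng.normal(scale=0.5, size=n)
item_b = 3.5 + 1.0 * latent_b + rng.normal(scale=0.5, size=n)

# A direct comparison of observed means suggests a substantive difference
# that is purely an artefact of the measurement, not of the opinions.
print(round(item_b.mean() - item_a.mean(), 2))
```

The printed gap is close to the 0.5 intercept shift, which is exactly the kind of artefact that equivalence testing is designed to detect before observed scores are compared across groups.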

Sometimes the test itself gives very valuable information on the problem under study. Bart Meuleman and Jaak Billiet, from the University of Leuven, tested whether the European Social Survey's measures of religious involvement were equivalent across 20 countries. They found that religious involvement was "conceived" very differently in Turkey and that the data cannot be directly compared with those of other ESS countries7(p188). Attending services is not a good predictor of religiosity there, because observant Muslim women are not required to participate in services, although they may attend; praying behaviour, in turn, works differently for observant Muslim men. A measure of how frequently a respondent prayed or attended religious services was therefore not a good way to assess religiosity in that country.

In another example, Eldad Davidov and his colleagues have found that measures of human values are strongly comparable8,9. This research has shown that humans are very diverse in their moral conceptions but that the questions used to measure those conceptions are highly equivalent across European cultures.

By adding together data from thousands of individuals like we do in the European Social Survey, researchers can explore empirical relationships and extract the mean opinion for a country on a certain topic, enabling a greater understanding of that society. If, in addition, they test and establish measurement equivalence, statistics are then comparable across countries and can help us all have a better understanding of what different societies really think and how they compare to each other.


1. Davidov E, Meuleman B, Cieciuch J, Schmidt P, Billiet J. Measurement equivalence in cross-national research. Annu Rev Sociol. 2014;40:55-75.
2. Meredith W. Measurement invariance, factor analysis and factorial invariance. Psychometrika. 1993;58(4):525-543. doi:10.1007/BF02294825.
3. Muthén B, Asparouhov T. New methods for the study of measurement invariance with many groups. Technical report, statmodel.com; 2013.
4. Davidov E, Dülmer H, Cieciuch J, Kuntz A, Seddig D, Schmidt P. Explaining Measurement Nonequivalence Using Multilevel Structural Equation Modeling: The Case of Attitudes Toward Citizenship Rights. Sociol Methods Res. 2017:1-32. doi:10.1177/0049124116672678.
5. Saris WE, Gallhofer I. Design, Evaluation, and Analysis of Questionnaires for Survey Research. 2nd ed. John Wiley & Sons; 2014.
6. Steinmetz H. Estimation and Comparison of Latent Means Across Cultures. In: Davidov E, Schmidt P, Billiet J, eds. Cross-Cultural Analysis: Methods and Applications. New York: Routledge Academic; 2011:85-116.
7. Meuleman B, Billiet J. Religious involvement: its relation to values and social attitudes. In: Davidov E, Schmidt P, Billiet J, eds. Cross-Cultural Analysis: Methods and Applications. Taylor and Francis Group New York; 2011:173-206.
8. Davidov E. A Cross-Country and Cross-Time Comparison of the Human Values Measurements with the Second Round of the European Social Survey. Surv Res Methods. 2008;2(1):33-46. Accessed March 10, 2014.
9. Cieciuch J, Davidov E, Algesheimer R, Schmidt P. Testing for Approximate Measurement Invariance of Human Values in the European Social Survey. Sociol Methods Res. April 2017:1-22. doi:10.1177/0049124117701478.