# Should samples be weighted to reduce selection bias in online surveys during the COVID-19 pandemic? Data from seven data sets | BMC Medical Research Methodology

### Description of age and sex using simple unweighted and weighted methods

Table 2 shows the distribution of age and sex in the seven datasets using simple unweighted and weighted methods. The proportions differed by age and sex. For example, in the first data set, a high relative difference was mainly observed in participants over 45 (250%); a similar result was found in the third data set for age

### Description of variables using weighted and unweighted methods

Table 3 summarizes the description of the dependent variables (DV) and independent variables (IV) using simple (unweighted) and weighted methods. The weighting applied to the demographic characteristics shows small relative differences, and the values are very similar between the two groups, whether the variables are continuous or categorical. The bivariate analysis between the independent variables and the dependent variables is shown in Supplementary Table 1.
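To make the comparison concrete, the following minimal sketch shows one common way to weight a sample on a demographic variable (post-stratification) and to express the change as a relative difference. The source does not spell out its formulas in this section, so the definition of relative difference used here (change from the unweighted to the weighted value, as a percentage of the unweighted value), the function names, and all numbers are assumptions for illustration only.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per stratum: population share / sample share."""
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }


def relative_difference(unweighted, weighted):
    """Percentage change from the unweighted to the weighted value (assumed definition)."""
    return (weighted - unweighted) / unweighted * 100.0


# Hypothetical sample of 200 respondents in two age strata (not the paper's data).
sample_counts = {"under_45": 160, "45_plus": 40}
# Hypothetical population shares for the same strata.
population_shares = {"under_45": 0.5, "45_plus": 0.5}

weights = poststratification_weights(sample_counts, population_shares)

# After weighting, each stratum's share matches its population share by construction.
total_weight = sum(weights[s] * c for s, c in sample_counts.items())
weighted_shares = {s: weights[s] * c / total_weight for s, c in sample_counts.items()}

unweighted_share_45_plus = sample_counts["45_plus"] / sum(sample_counts.values())
rel_diff_45_plus = relative_difference(unweighted_share_45_plus, weighted_shares["45_plus"])
print(f"Relative difference for the 45+ stratum: {rel_diff_45_plus:.1f}%")
```

In this hypothetical sample the older stratum is under-represented (20% of respondents against 50% of the population), so weighting raises its share and produces a large positive relative difference.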

### Correlation between unweighted and weighted values

A strong positive correlation was found between the values of the weighted and unweighted data when taking into account the values of sex, age, dependent variables and independent variables (*r* = 0.918, *p* […] *r* = 1.000, *p* […] *r* = 0.824, *p* […] *r* = 0.780, *p* = 0.001), and independent variables (*r* = 1.000, *p*

### Correlation between the relative differences of the variables and the measure of association

The relative difference in the adjusted OR correlated strongly with the relative age difference (*r* = 0.863, *p* = 0.012) and with the sample size (*r* = -0.891, *p* = 0.007) (Table 4). No significant association was found between the relative difference in the adjusted OR and either the relative sex difference or the relative differences in the independent variables.
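As an illustration of the kind of statistic reported here, the sketch below computes a Pearson correlation coefficient in plain Python. Pearson's *r* is assumed; this section does not name the coefficient used. The function name and the input values are hypothetical and are not taken from Table 4.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative differences (in %), one pair per dataset.
age_rel_diff = [10, 20, 30, 40, 50]
or_rel_diff = [2, 4, 5, 4, 5]

r = pearson_r(age_rel_diff, or_rel_diff)
print(f"r = {r:.3f}")
```

A positive *r* means that datasets with a larger relative age difference also tend to show a larger relative difference in the measure of association. A negative *r*, as reported for sample size, means the opposite.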

### Multivariate analysis comparing weighted and unweighted samples

Table 5 presents the results of the weighted and unweighted multivariate models (linear or logistic regressions), showing the differences between the models.
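The comparison between weighted and unweighted models can be sketched with a single-predictor linear regression. This is a simplified illustration: the models in Table 5 are multivariable linear or logistic regressions with significance tests, whereas the code below only estimates one slope with and without weights. The function name, data, and weights are hypothetical.

```python
def weighted_slope(xs, ys, weights=None):
    """Slope of y on x by least squares; weights=None means equal weights."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    # Weighted means of the predictor and the outcome.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted cross-product and weighted sum of squares.
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    return sxy / sxx


# Hypothetical data: predictor, outcome, and survey weights per respondent.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
survey_weights = [1, 1, 1, 3]  # the last respondent represents an under-sampled group

beta_unweighted = weighted_slope(xs, ys)
beta_weighted = weighted_slope(xs, ys, survey_weights)

# Relative difference of the weighted estimate with respect to the unweighted one
# (assumed definition, expressed as a percentage).
rel_diff = (beta_weighted - beta_unweighted) / beta_unweighted * 100.0
print(beta_unweighted, beta_weighted, round(rel_diff, 2))
```

Giving more weight to an under-represented respondent shifts the estimated slope, and the relative difference summarizes the size of that shift. Whether the association stays significant depends on the standard errors, which this sketch does not compute.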

In the first dataset (*N* = 310), the association of the independent variable (attitude towards COVID-19) with the dependent variable (practice towards COVID-19) remained non-significant (*p*-value > 0.05) with both methods. However, the relative difference between the unweighted and weighted values was 133.33%.

In the second dataset (*N* = 509), the association of the independent variables (fear of COVID-19 and financial well-being) with the three dependent variables (stress, anxiety and insomnia) remained significant in both methods, except in the model where the dependent variable was anxiety (LAS-10). In that model, the financial well-being scale (IV) produced a significant association in the unweighted regression (*p* = 0.02) but a non-significant result in the weighted regression (*p* = 0.38). The weighted beta value was 98% lower than the unweighted beta value.

In the third dataset (*N* = 202), the association of the independent variable (fear of COVID-19) with the dependent variables (knowledge and practice) was not significant in the unweighted sample but was statistically significant in the weighted sample. The beta value for gender showed a relative increase in the weighted method, while the beta for the independent variable decreased by 150%. With the attitude scale as the dependent variable, no significant association was found between the IV and the DV with either method.

In the fourth dataset (*N* = 2336), the association of the independent variable (preventive measures scale) with the dependent variable (having been diagnosed with COVID-19 or not) was not significant in the unweighted sample but was statistically significant in the weighted sample. The relative differences in OR varied between -1% and 1% after weighting.

In the fifth dataset (*N* = 324), the association of the independent variables (soft skills and emotional intelligence) with the dependent variable (burnout scale) differed by variable. It was significant for soft skills in both methods, while emotional intelligence remained non-significant in both methods, although its *p*-value approached significance in the weighted sample. A negative relative difference was found for the independent variable after weighting.

In the sixth dataset (*N* = 405), the association of the independent variable (knowledge scale) with the dependent variable (stigma discrimination scale) was significant in both methods. Fear of COVID-19 and anxiety remained non-significant in both methods. Depending on the variable, the relative difference either decreased or increased after weighting.

In the seventh dataset (*N* = 410), the association of the major independent variables (fear of COVID-19 and anxiety) with the dependent variable (eating behaviors) was significant in both methods. The boredom scale remained non-significant in both methods. Relative differences varied after weighting.

### Analysis of secondary data: factors affecting the relative change of the main measures of association

Table 6 shows the association of the age, gender, and independent-variable differences (between sample and population), the significance of associations, and the sample size with the relative change in the main association. A larger sample size (Beta = -0.001, *p* = 0.001), a larger gender gap (Beta = -0.007, *p* = 0.003), and the presence of a significant association between weighted age and the DV (Beta = -0.221, *p* = 0.013) were each associated with a significantly lower relative change in the main association. In contrast, a larger age gap (Beta = 0.010, *p* = 0.005) was significantly associated with a higher relative change in the main association. In terms of absolute impact, sample size had the highest impact on the measure of association, followed by the relative age difference, the relative sex difference, and finally the significance of the association between the weighted age and the dependent variable.