How reliable are self-report-based measurements of skills?

Organizations increasingly recognize that skills and learning go beyond knowledge and test scores. Life skills, social-emotional skills, financial literacy, and various other skills all play a critical role in people’s lives – but they are also much harder to measure.

One of the most common approaches is self-reporting: asking respondents to rate their own skills or perceptions. For example, respondents might be asked to rate how strongly they agree with a statement such as:

“If I try hard, I can improve my situation in life.”

  • Strongly disagree
  • Disagree
  • Agree
  • Strongly agree

This approach is simple, easy to administer, and widely used in both NGO monitoring and evaluation and academic research. Understandably, however, there is sometimes strong skepticism about the reliability of skills measured in this way.

  • Do respondents answer honestly?
  • Do they report what they truly believe, or what they think they should say?
  • Can respondents assess their own abilities in the first place?

In short, self-report measurements can reliably capture skills if the measures are designed carefully and implemented well. In this article, I explore the following four topics.

  1. Self-assessment can be reliable (or it is not as bad as some people think)
  2. Men might be more likely to overestimate or over-report their skills
  3. Answering what society wants to see – social desirability bias
  4. What should we (always) do when measuring skills?

 

1.     Self-assessment can be reliable (or it is not as bad as some people think)

One reason to believe that self-report-based assessments of skills (such as social-emotional skills or financial skills) can be reliable is that skills measured this way often correlate well with, or predict, skills measured through other approaches, such as direct assessment, game-based assessment, or assessment based on observations. Some, but not all, tools used by NGOs and researchers are validated by checking how closely the self-reported measures are associated with other relevant measures (this is called convergent validity, one of the psychometric properties assessed when validating a measurement). For example, a study in France measured students’ skills (conscientiousness, self-control, and grit) using three different approaches: self-reports, task-based assessments, and teacher observations, and compared the results.[i] While teacher observations were overall the most predictive of behavioral outcomes (such as school administrative records of absenteeism, tardiness, sanctions, and disciplinary actions), self-reports performed better than task-based assessments. This challenges the common assumption that “objective” methods are always superior. It is therefore wrong to assume that all self-reported skills are inaccurate, or that behavioral or task-based measures are always more accurate than self-reports.
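As a minimal sketch of what a convergent validity check involves (with invented scores, not data from the French study), one can correlate self-reported scores with scores from another method for the same respondents:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same 8 students, measured two ways
self_report = [3.2, 2.8, 3.9, 2.1, 3.5, 2.6, 3.8, 3.0]          # 1-4 Likert average
task_based  = [0.61, 0.55, 0.78, 0.40, 0.70, 0.52, 0.74, 0.60]  # task score, 0-1

r = pearson_r(self_report, task_based)
print(f"convergent validity (Pearson r) = {r:.2f}")
```

A high correlation would be evidence that the two methods capture the same underlying construct; in practice this is one of several psychometric checks, not a substitute for full validation.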

It is important to note that a tool or scale validated in one context might not be reliable in another, so it is necessary to check whether the questions are appropriate for the audience with which you intend to use the assessment. For example, one study shows that the Big Five inventory of personality traits, one of the most widely used assessments, might not be as reliable when used with non-WEIRD (western, educated, industrialized, rich, and democratic) populations.[ii]

 

2.     Men might be more likely to overestimate or over-report their skills

I am not sure if this is a universal phenomenon, or whether it occurs for many types of skills, but I am aware of two cases indicating that men may have a higher tendency than women to overestimate or over-report their skills. The first example comes from research on financial literacy: in questionnaire-based assessments, women are significantly more likely to choose “do not know” as an answer. For example, in the Netherlands, 42% of women chose “do not know” on a question about risk diversification, compared to 26% of men. Similar patterns appear in countries such as the USA, Germany, and Japan.[iii] A more recent study in the Netherlands found that when the “do not know” option is removed, women are more likely to choose the correct answer. In that analysis, women’s lack of confidence in their financial literacy accounts for one-third of the gender gap in financial literacy.[iv]

The second case is a study that analyzed self-reported and behavior-based assessments of the social-emotional skills of NEET youth in Tanzania.[v] In this study, men reported higher levels of skills across multiple domains in self-reported measures. However, when skills were assessed using behavioral tasks, the gender gap largely disappeared.

The researchers also found that over-reporting (the difference between self-report and behavioral tasks) was more common among:

  • Individuals with lower cognitive ability
  • Individuals with more regressive gender beliefs
  • Men who believed that men are generally better at problem-solving and decision-making than women

Based on these results, the researchers suggest that the gender gap in self-reported social-emotional skills is due to men’s overestimation of their skills rather than women’s underestimation or lack of confidence.

3.     Answering what society wants to see – social desirability bias

Another well-known concern with self-reported data is social desirability bias: the tendency for people to give answers they believe are socially acceptable or desirable, rather than what they truly think.

For example, if we ask “Do you support gender equality?”, many respondents may provide the “right” answer, even if their personal beliefs are different. To address this, well-designed surveys often ask questions indirectly, use multiple questions to capture the same concept, and present scenarios rather than abstract statements.

When does this bias matter most? The impact of social desirability bias depends on the purpose of the measurement. First, if your purpose is to understand the skills or preferences of a population without a specific comparison (e.g., a needs assessment), social desirability bias can be a major issue. For example, if we want to estimate how many people believe intimate partner violence is justified, social desirability bias could lead to serious underreporting.

However, if we measure skills for evaluation purposes, or for making any comparison, the issue is more nuanced. In many evaluations, we compare before and after a program, or participants and non-participants. In these cases, social desirability bias distorts the estimated program effect only if the bias differs between the groups being compared. For example, if participants are more likely to give socially desirable answers after the program, the before–after difference in skills might be driven by this bias rather than by the program. If both groups are equally likely to give socially desirable answers, the estimated program effect is at least not biased in a particular direction, even though the general reliability of the measurement remains more questionable.

Nevertheless, this assumption does not always hold.

For example, participants in a life skills program may become more aware of what the “correct” answers are. As a result, they may report improved attitudes, not because their beliefs changed, but because they learned what the “right” answers are (or they might be “parroting” the message taught in the program).
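The two situations above can be made concrete with a toy before/after comparison. All numbers below are invented for illustration:

```python
# Illustrative arithmetic: how social desirability bias does (or does not)
# distort a treatment-vs-comparison estimate of a program effect.

true_effect = 0.30  # real skill gain from the program

# Scenario A: both groups inflate their answers by the same amount
bias_treat, bias_control = 0.20, 0.20
observed_change_treat = true_effect + bias_treat
observed_change_control = 0.0 + bias_control
estimate_a = observed_change_treat - observed_change_control
print(f"equal bias        -> estimate = {estimate_a:.2f} (unbiased)")

# Scenario B: participants learn the "right" answers and inflate more
bias_treat, bias_control = 0.35, 0.20
estimate_b = (true_effect + bias_treat) - (0.0 + bias_control)
print(f"differential bias -> estimate = {estimate_b:.2f} (overstates effect)")
```

In the second scenario, the extra inflation among participants is indistinguishable from a real gain, which is why the equal-bias assumption matters.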

Some researchers have gone further and assessed whether the estimated effect of a life skills program is driven by social desirability bias. Their approach was creative: they measured each participant’s tendency to provide socially desirable answers and checked whether the impact they found was driven by the answers of participants with a higher social desirability tendency. Social desirability is usually scored based on answers to questions that are “too good to be true”, for example, “I am always a good listener.” Two studies using this approach, both conducted in India, indicate that the impact they found is not primarily driven by social desirability.[vi] [vii]
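A minimal sketch of this robustness check, with invented data: score each respondent on a few “too good to be true” items, then see whether the estimated effect survives when high scorers are excluded.

```python
# Hypothetical sketch of the robustness check described above. All data
# are invented for illustration.

respondents = [
    # (treated?, outcome score, answers to 3 social-desirability items: 1 = agreed)
    (True,  3.6, [1, 0, 0]),
    (True,  3.4, [0, 0, 1]),
    (True,  3.9, [1, 1, 1]),   # high social-desirability tendency
    (False, 3.1, [0, 0, 0]),
    (False, 3.0, [1, 0, 0]),
    (False, 3.3, [1, 1, 0]),   # high social-desirability tendency
]

def mean(xs):
    return sum(xs) / len(xs)

def effect(rows):
    """Naive treated-vs-control difference in mean outcome scores."""
    treated = [y for t, y, _ in rows if t]
    control = [y for t, y, _ in rows if not t]
    return mean(treated) - mean(control)

full_sample = effect(respondents)
low_sd_only = effect([r for r in respondents if sum(r[2]) < 2])

print(f"effect, full sample:             {full_sample:.2f}")
print(f"effect, low-SD respondents only: {low_sd_only:.2f}")
```

If the estimate barely moves after excluding high scorers, as in this toy example, the measured impact is unlikely to be driven mainly by social desirability.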

 

4.     What should we (always) do when measuring skills?

What these cases indicate is that self-reported skills can still be a valuable source of information, but we should be mindful of their limitations.

It is important to remember that any quantitative assessment of these difficult-to-measure skills is only a proxy, designed to capture the big picture of skills for certain groups or populations, not the full depth of individual abilities. That being said, when collecting data using self-report tools, there are several practices that can significantly improve the reliability of the data collected:

  • Combine multiple sources of information or approaches, such as self-assessment, direct assessment, and observations
  • For 5-point scale questions, include statements that are not easy to agree with
  • Test the tool with the exact audience you work with
  • Consider whether “do not know” options should be included, and if they are, check whether the rate of “do not know” answers differs by gender
  • If you conduct a survey or assessment for program evaluation, conduct the endline assessment some time after program completion (a few months to a year) rather than immediately. This reduces the risk of strong social desirability bias and parroting.

 

Self-reported measures of skills are not without limitations. They can be influenced by overconfidence, social norms, and survey design. When carefully designed and interpreted, self-reported data can provide valuable, practical, and scalable insights, especially in contexts where alternative methods are costly or infeasible.

 

Tomohisa Miyamoto[viii]

MERL Specialist, Aflatoun International

 

[i] Boon-Falleur, M., Bouguen, A., Charpentier, A., Algan, Y., Huillery, É., & Chevallier, C. (2022). Simple questionnaires outperform behavioral tasks to measure socio-emotional skills in students. Scientific Reports, 12(1), 442. https://doi.org/10.1038/s41598-021-04046-5

[ii] Laajaj, R., Macours, K., Pinzon Hernandez, D. A., Arias, O., Gosling, S. D., Potter, J., Rubio-Codina, M., & Vakis, R. (2019). Challenges to capture the big five personality traits in non-WEIRD populations. Science Advances, 5(7), eaaw5226. https://doi.org/10.1126/sciadv.aaw5226

[iii] Lusardi, A., & Mitchell, O. S. (2014). The economic importance of financial literacy: Theory and evidence. Journal of Economic Literature, 52(1), 5–44.

[iv] Bucher-Koenen, T., Alessie, R., Lusardi, A., & van Rooij, M. (2021). Fearless Women: Financial Literacy and Stock Market Participation. NBER Working Paper Series, (Working Paper 28723).

[v] Cassidy, R., Das, S., Delavallade, C., Kipchumba, E., & Komba, J. (2026). Do men really have greater socio-emotional skills than women? Evidence from Tanzanian youth. Journal of Behavioral and Experimental Economics, 121, 102530. https://doi.org/10.1016/j.socec.2026.102530

[vi] Dhar, D., Jain, T., & Jayachandran, S. (2022). Reshaping adolescents’ gender attitudes: Evidence from a school-based experiment in India. American Economic Review, 112(3), 899–927.

[vii] Edmonds, E., Feigenberg, B., & Leight, J. (2021). Advancing the agency of adolescent girls. The Review of Economics and Statistics, 1–46.

[viii] I appreciate the review and suggestions by Klára Opršalová.