TheMindReport

A German Psychological Society task force proposes a two-phase, quality-focused alternative to metric-driven evaluation.

New guidance argues that impact factors and the h-index are poor tools for judging individual researchers and can intensify “publish or perish” incentives. A task force from the German Psychological Society proposes four principles for responsible research assessment in psychology, plus a two-phase process that blends indicators with qualitative review. The proposal broadens what counts as a research contribution and pushes evaluators to weigh quality and methodological rigor more heavily than raw output.

Quick summary

  • What the study found: A task force report proposes four principles and a two-phase hiring/promotion procedure that uses indicators for efficiency, then a qualitative, discursive assessment for shortlisted candidates; it also expands recognized contributions to include datasets and research software.
  • Why it matters: It offers a practical path away from over-reliance on impact factors and productivity metrics toward evaluations that better reflect rigor and research quality.
  • What to be careful about: This is a proposal, not evidence that the procedure improves decisions; “quality” judgments can still be inconsistent without clear criteria and trained evaluators.

What was found

The report challenges the use of journal impact factors and other productivity indicators, including the h-index, as tools for assessing individual researchers. It argues these metrics are invalid for that purpose and can help sustain a damaging “publish or perish” culture. In response, it proposes an alternative model for hiring and promotion in psychology built around responsible research assessment.
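For readers unfamiliar with the metric in question, the h-index is the largest number h such that a researcher has h publications with at least h citations each. Below is a minimal sketch of that standard calculation; the citation counts are made up purely for illustration.

```python
def h_index(citations):
    """Largest h such that the researcher has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)          # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank                                  # still h papers with >= h citations
        else:
            break
    return h

# Made-up citation counts, purely for illustration.
careful_methodologist = [120, 95, 60, 12, 9]          # five papers  -> h = 5
prolific_publisher = [15, 14, 12, 11, 10, 9, 9, 8]    # eight papers -> h = 8

print(h_index(careful_methodologist))  # 5
print(h_index(prolific_publisher))     # 8
```

The resulting number mostly tracks how many reasonably cited papers someone has accumulated; it says nothing about the rigor, transparency, or reuse value of any individual piece of work, which is the gap the report is pointing at.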

The report sets out four principles for responsible research assessment (the abstract does not list them individually). It also suggests a two-phase assessment procedure designed to keep some benefits of indicators—namely objectivity and efficiency—while avoiding metric dominance. In this model, indicators help structure early stages, while shortlisted candidates receive a qualitative, discursive evaluation.

Two emphasis shifts are central. First, the range of relevant research contributions is broadened beyond research papers to include published datasets and research software. Second, evaluation should place greater weight on research quality and methodological rigor than on the sheer quantity of outputs.

What it means

For busy committees, the practical message is straightforward: stop letting a single number stand in for scientific value. The report does not argue against all indicators; instead, it recommends using them in a limited, structured way and then moving to deeper review when stakes and uncertainty are highest. This approach aims to prevent the common shortcut where “high-impact journal” becomes a proxy for “high-quality scientist.”

Broadening recognized outputs changes incentives. When datasets and research software are treated as first-class contributions, researchers who invest in infrastructure, transparency, and reuse can be rewarded rather than penalized for producing fewer traditional papers. In psychology, where reproducibility and analytic choices can strongly shape results, elevating rigor and methods also aligns assessment with what actually supports trustworthy knowledge.

The emphasis on qualitative, discursive evaluation is an attempt to make “quality” something committees actively articulate. Discursive review forces evaluators to name what they value—clarity of methods, robustness of evidence, openness of materials, theoretical contribution, and coherence of a research program—rather than outsourcing judgment to rankings. It also makes disagreements visible, which can be a feature rather than a bug when handled with clear standards.

Where it fits

This proposal sits inside a broader shift often described as “responsible research assessment,” which pushes institutions to evaluate research on its content and contributions rather than venue prestige. The report explicitly frames itself as a response to criticism of impact factors and productivity metrics and as aligned with initiatives calling for alternatives that better reflect quality. It is presented as field-specific guidance for psychology rather than a generic policy statement.

It also reflects a well-known tension in selection systems: committees want efficiency, but efficiency tools can become the decision. Indicators feel objective and fast, and they can be useful for organizing information across many applications. The risk is that they quietly become the criteria, narrowing the definition of excellence and reinforcing behaviors that optimize for counts, not craft.

Recognizing datasets and software as contributions connects to how modern research is produced. Many major advances depend on reusable code, curated data, and tool-building that rarely maps neatly onto “paper count.” When assessment catches up to this reality, it can reduce hidden labor and make collaboration and sharing more attractive career moves.

Within psychology specifically, the focus on methodological rigor echoes widely accepted priorities: careful design, transparent reporting, and analytic discipline. None of that guarantees a “right” answer, but it raises the odds that results are interpretable and cumulative. The report’s logic is that career evaluation should reward those properties directly.

How to use it

If you sit on a hiring or promotion committee, use the two-phase idea as a workflow redesign. In the first phase, use indicators only to support efficiency—organize the pool, identify broad patterns, and ensure no candidate is ignored because their work is distributed across formats. Then, commit to a second phase where judgment is explicit and evidence-based: a structured qualitative review of shortlisted candidates.
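To make that concrete, here is a minimal Python sketch of one way such a workflow could be structured. It is an illustration rather than the report’s own procedure: the indicator fields, the portfolio flag, and the review prompts are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    papers: int
    datasets: int
    software: int
    notes: dict = field(default_factory=dict)  # filled in by each phase

# Phase 2: the same prompts are put to every shortlisted candidate, and each
# answer must point to concrete evidence from the candidate's work.
QUALITATIVE_PROMPTS = [
    "How clear and appropriate are the methods for the research questions?",
    "How robust and transparent is the evidence (data, materials, code)?",
    "What do the candidate's datasets or software enable for others?",
    "How coherent is the research program as a whole?",
]

def phase_one_organize(pool):
    """Use indicators only to structure information about the pool.

    Each candidate gets an indicator summary for the committee to scan, and
    candidates whose contributions are spread across formats are flagged so
    they are not screened out by paper counts alone. Nothing here ranks or
    rejects anyone.
    """
    for c in pool:
        c.notes["indicators"] = {
            "papers": c.papers,
            "datasets": c.datasets,
            "software": c.software,
        }
        c.notes["broad_portfolio"] = (c.datasets + c.software) > 0
    return pool

def phase_two_review(shortlist):
    """Attach an identical, structured qualitative review to each shortlisted candidate."""
    for c in shortlist:
        c.notes["review"] = {prompt: None for prompt in QUALITATIVE_PROMPTS}
    return shortlist

pool = [
    Candidate("Candidate A", papers=25, datasets=0, software=0),
    Candidate("Candidate B", papers=9, datasets=3, software=2),
]
organized = phase_one_organize(pool)
# Shortlisting itself is a committee judgment, not an automated cutoff.
shortlist = phase_two_review(organized)
```

The point of the sketch is the separation of roles: indicators only summarize and flag in phase one, while every substantive judgment lives in the phase-two prompts, which are identical for each candidate.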

Make “broader contributions” real by requiring candidates to present them clearly. Ask applicants to include selected datasets and software in their research portfolio alongside papers, with short statements of what each contribution enabled. Encourage candidates to describe reuse, maintenance, documentation, and how these outputs supported robust research—without assuming that any one signal automatically equals quality.

To emphasize methodological rigor, decide in advance what your committee considers strong practice. You can evaluate clarity of methods, appropriateness of design to the research question, and how carefully limitations are handled. You can also look for coherent reasoning across a research program rather than isolated “hits.” The key is to agree on criteria before you start comparing people, so the process does not drift into post-hoc justifications.

To keep discursive evaluation fair, standardize parts of it. Use the same prompts for each candidate’s qualitative review and ensure every evaluator addresses the same core topics. In meetings, require claims to be tied to concrete evidence from the candidate’s work rather than vibes, lab reputation, or journal name. This does not eliminate subjectivity, but it makes it accountable.

Finally, communicate the assessment philosophy publicly. When departments state that they value datasets, software, and rigor—and show how those will be assessed—they change what applicants choose to spend time on. That transparency can reduce anxiety-driven output chasing and improve alignment between institutional goals and researcher behavior.

Limits & what we still don’t know

The abstract describes a proposal, not an evaluation of outcomes. It does not report whether the two-phase procedure leads to better hiring decisions, more equitable outcomes, or reduced “publish or perish” pressure in practice. It also does not specify how indicators should be selected, weighted, or constrained in phase one.

The four principles are mentioned but not detailed in the abstract, so readers cannot assess their specificity or how they resolve common tradeoffs. For example, any “quality” framework needs operational definitions to prevent inconsistent application across committees. Without shared rubrics and training, qualitative review can drift into idiosyncratic preferences or privilege candidates who are more fluent in self-presentation.

Broadening contributions also raises implementation questions. Committees will need ways to judge the value of software and datasets without defaulting to simplistic proxies. They will need to decide what counts as “published,” what documentation is expected, and how to compare very different kinds of outputs fairly. None of those details are provided in the abstract.

Finally, efficiency pressures are real. Qualitative, discursive assessment takes time, and time costs are not evenly distributed across institutions. The proposal’s success will depend on whether departments invest in the process—clear criteria, reviewer preparation, and disciplined meetings—rather than treating qualitative review as an unstructured conversation.

Closing takeaway

The message from “Responsible Research Assessment I: Implementing DORA and CoARA for hiring and promotion in psychology” is that psychology should evaluate researchers less by prestige metrics and more by the substance and rigor of their contributions. The report recommends a two-phase procedure that uses indicators for efficiency but reserves final judgment for a qualitative review of shortlisted candidates. If institutions adopt it responsibly, the biggest shift is cultural: rewarding careful, reusable, methodologically strong work, including datasets and software, rather than just more papers in “better” venues.

Data in this article is provided by Semantic Scholar.
