Inside Grant Review Panels: What New Reviewers Learn and What Every Applicant Should Know
March 19, 2026 · 10 min read
Jared Klein
A 2018 study in the Proceedings of the National Academy of Sciences asked 43 oncology reviewers to evaluate 25 NIH R01 applications. The intraclass correlation coefficient for their scores was effectively zero: two reviewers evaluating the same application agreed with each other no more than two reviewers evaluating entirely different applications did. The finding shook parts of the scientific community, but it would not have surprised anyone who has sat on a federal grant review panel. The process is more structured, more human, and more consequential than most applicants realize.
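To make "effectively zero" concrete: the intraclass correlation asks how much of the total score variance is explained by which application is being scored, rather than by which reviewer happens to be scoring it. Here is a minimal sketch of the one-way ICC on made-up scores (not the study's actual data), where reviewer noise swamps any true quality signal:

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC(1): between-application variance as a
    share of total variance. `scores` has shape (n_applications, k_raters)."""
    n, k = scores.shape
    app_means = scores.mean(axis=1)
    ms_between = k * ((app_means - scores.mean()) ** 2).sum() / (n - 1)
    ms_within = ((scores - app_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(0)
quality = rng.normal(5.0, 0.1, size=(25, 1))   # weak true differences between applications
noise = rng.normal(0.0, 1.5, size=(25, 3))     # strong reviewer disagreement
scores = np.clip(np.rint(quality + noise), 1, 9)
print(f"ICC = {icc_oneway(scores):.3f}")       # close to zero
```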
Every year, tens of thousands of scientists, engineers, clinicians, and program experts serve as peer reviewers for NIH, NSF, DOD, HRSA, SAMHSA, and dozens of other federal agencies. Many of them arrive at their first panel meeting with strong publication records but no formal training in how to evaluate someone else's work. What they learn in those first meetings -- about scoring mechanics, discussion dynamics, time constraints, and the surprising degree to which panel conversation alters outcomes -- shapes how proposals live or die. Understanding the reviewer's experience is one of the most underused advantages in grant writing.
How Reviewers Get Recruited and (Barely) Trained
The Scientific Review Officer at NIH, or the cognate role at NSF and other agencies, is the architect of the review panel. SROs analyze the scientific content of incoming applications, identify the expertise required, and recruit a panel diverse in subdiscipline, career stage, geography, and demographics. A typical NIH study section has 15 to 25 members on rotating four-year terms, supplemented by ad hoc reviewers brought in for specialized topics. NSF panels are often smaller and assembled fresh for each review cycle.
What happens after recruitment is, by most accounts, surprisingly thin. A 2020 study published in Research Integrity and Peer Review identified 21 competencies deemed essential for effective grant reviewers -- including subject matter expertise, impartiality, analytical thinking, and communication skills -- and then found that formal training was nearly absent. Reviewers described learning primarily through observation: watching experienced panelists frame critiques, listening to how they weigh strengths against weaknesses, absorbing the culture of the study section through side conversations. One respondent put it bluntly: "there is no formal training process."
NIH does require reviewers to complete orientation modules covering scoring mechanics, confidentiality rules, and conflict-of-interest policies. Reviewers must certify that they understand the integrity and confidentiality requirements and the consequences for violating them. But the substantive skill -- the ability to read a 12-page research plan and render a judgment that holds up against other expert opinions -- is learned almost entirely on the job.
This has practical implications for applicants. Your proposal is not being evaluated by a calibrated instrument. It is being read by a working scientist who may be reviewing grants for the first time, who learned what counts by watching colleagues, and who is applying a scoring rubric that she interpreted independently. Writing for that reader -- clearly, accessibly, with the key points impossible to miss -- is not a stylistic preference. It is a strategic necessity.
The Scoring Machinery: From Five Criteria to Three Factors
For decades, NIH reviewers scored five individual criteria -- Significance, Investigators, Innovation, Approach, and Environment -- each on a 1-to-9 scale (1 being exceptional, 9 being poor). For applications with due dates starting January 25, 2025, NIH consolidated those five criteria into three factors under its Simplified Peer Review Framework.
Factor 1, Importance of the Research, merges Significance and Innovation into a single scored dimension. Factor 2, Rigor and Feasibility, covers the Approach -- the experimental design, methods, statistical plans, and reproducibility safeguards -- and remains scored on the 1-to-9 scale. Factor 3, Expertise and Resources, combines Investigators and Environment but receives no numerical score. Instead, reviewers make a binary determination: sufficient or insufficient. If insufficient, they must explain the gap.
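One way to internalize the new structure is to model it. The sketch below is a toy representation of the framework; the field names and validation rules are my own shorthand, not anything from NIH:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FactorScores:
    """Toy model of NIH's Simplified Peer Review Framework.
    Field names are illustrative shorthand, not official NIH terms."""
    importance: int             # Factor 1: Significance + Innovation, scored 1-9
    rigor_feasibility: int      # Factor 2: Approach, scored 1-9
    expertise_sufficient: bool  # Factor 3: binary, no numeric score
    insufficiency_note: Optional[str] = None  # required when Factor 3 is insufficient

    def __post_init__(self):
        for s in (self.importance, self.rigor_feasibility):
            if not 1 <= s <= 9:
                raise ValueError("scored factors use the 1 (exceptional) to 9 (poor) scale")
        if not self.expertise_sufficient and not self.insufficiency_note:
            raise ValueError("an 'insufficient' Factor 3 rating must explain the gap")
```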
The binary treatment of Factor 3 was an explicit response to concerns about reputational bias -- the worry that well-known investigators at elite institutions received inflated scores regardless of the proposed work. By removing the 1-to-9 scale from investigator qualifications, NIH aims to refocus the review on the science itself.
NSF uses a different framework. Proposals are evaluated on two criteria -- Intellectual Merit and Broader Impacts -- with reviewers considering the originality of the proposed activities, the qualifications of the team, the adequacy of resources, and the project's potential to benefit society. DOD's Congressionally Directed Medical Research Programs (CDMRP) adds another layer entirely: a two-tier system where peer review panels score scientific merit, and a separate programmatic review panel evaluates mission relevance and portfolio balance. High peer review scores do not guarantee a CDMRP recommendation. The agency does not use a pay line.
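To see why a strong merit score alone does not carry a CDMRP application, consider a deliberately simplified two-tier sketch. The data fields, numbers, and selection rule here are invented for illustration; actual programmatic review weighs mission relevance and portfolio balance through deliberation, not a formula:

```python
# Toy two-tier selection, loosely modeled on CDMRP's structure. All values
# and the filtering rule are invented for illustration only.
proposals = [
    {"id": "A", "merit_score": 1.4, "mission_relevance": "low"},
    {"id": "B", "merit_score": 2.1, "mission_relevance": "high"},
    {"id": "C", "merit_score": 1.8, "mission_relevance": "high"},
]
# Tier 1: peer review ranks by scientific merit (lower is better).
tier1 = sorted(proposals, key=lambda p: p["merit_score"])
# Tier 2: programmatic review selects for mission fit and portfolio balance,
# so the top-merit proposal (A) can still go unrecommended.
recommended = [p["id"] for p in tier1 if p["mission_relevance"] == "high"]
print(recommended)  # ['C', 'B'] -- no pay line, no guarantee for A
```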
For applicants, the takeaway is that scoring criteria are not a checklist to satisfy -- they are the literal framework reviewers use to structure their reading. If your proposal forces a reviewer to hunt for your innovation statement or leaves the feasibility of your timeline ambiguous, you are not just making their job harder. You are making it structurally likely that the score for that factor will drift downward.
Triage: Half Your Competition Never Gets Discussed
Before any panel discussion begins, NIH study sections run a triage process (formally called streamlining) that removes roughly half of all applications from discussion. Here is how it works: after all assigned reviewers submit their preliminary scores and written critiques, the SRO compiles the results. Applications unanimously judged to fall in the bottom half are flagged as "not discussed." They receive no overall impact score, no percentile ranking, and no panel discussion. They are effectively dead -- though applicants receive their written critiques and can revise and resubmit.
The triage cut is not a formal percentile threshold. It is based on the preliminary scores of the assigned reviewers relative to the rest of the applications in that study section's round. An application that might survive triage in a weak round can be streamlined in a strong one. Any single assigned reviewer can rescue an application from triage by requesting discussion, but all assigned reviewers must agree for an application to be streamlined.
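As described, the streamlining rule is mechanical enough to sketch in a few lines. The data shapes and field names below are assumptions for illustration, not the SRO's actual tooling:

```python
from statistics import mean

def streamline(applications):
    """Split a round into 'discussed' and 'not discussed', following the rule
    described above: bottom-half preliminary averages are streamlined unless
    any assigned reviewer requests discussion. Field names are illustrative."""
    ranked = sorted(applications, key=lambda a: mean(a["prelim_scores"]))
    cutoff = len(ranked) // 2  # roughly the top half survives by default
    discussed, not_discussed = [], []
    for rank, app in enumerate(ranked):
        if rank < cutoff or app["discussion_requested"]:
            discussed.append(app)      # gets a panel slot and an impact score
        else:
            not_discussed.append(app)  # written critiques only
    return discussed, not_discussed
```

Note that the cutoff floats with the quality of the round, exactly as the paragraph above describes: the same preliminary average can land on either side of the line.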
NSF does not use a formal triage system in the same way, but program officers exercise discretion in determining which proposals receive full panel discussion versus ad hoc review only.
The practical lesson is stark. Your first audience is not the full panel. It is the two or three assigned reviewers who read your proposal in its entirety, and the bar they are applying is not "Is this fundable?" but "Is this competitive enough to deserve 10 minutes of panel time?" A proposal with a muddled specific aims page, an unclear research question, or a methodology section that raises more questions than it answers will not make it past this gate. The full panel will never see it.
Primary, Secondary, Tertiary: The Division of Labor
Each NIH application is assigned to at least three reviewers, ranked by the relevance of their expertise to the proposed work. The primary reviewer is the closest match to the applicant's field. The secondary brings adjacent expertise. The tertiary (or reader) often covers a methodological specialty -- biostatistics, clinical trial design, a particular technology platform.
Each assigned reviewer reads the full application, writes an independent critique, and assigns preliminary criterion scores and an overall impact score. At the meeting, the primary reviewer presents the application first, summarizing key strengths and weaknesses. The secondary follows with additional observations, often from a different angle. The tertiary adds methodological or technical commentary. Then the floor opens to all non-conflicted panelists.
The typical reviewer workload is substantial. An NIH study section member is usually assigned 8 to 12 applications as a primary, secondary, or tertiary reviewer per meeting cycle. Each R01 application includes a 12-page research strategy, plus specific aims, biosketches, budgets, facilities descriptions, and supplementary materials. Reviewers report that the pre-meeting preparation alone consumes dozens of hours. Panels meet for roughly two days, three times a year.
This workload creates a dynamic that applicants rarely consider: reviewer fatigue. By application number eight, a reviewer's patience for ambiguity, jargon, and poorly organized prose has worn thin. The applications that score best are not necessarily the most intellectually ambitious -- they are the ones that communicate their ambition in a way that a tired expert can absorb quickly and completely.
The Discussion That Reshapes Everything
The panel discussion is where preliminary scores converge, diverge, or shift dramatically. Research on NIH study sections has found that discussion does not reliably improve agreement among reviewers -- in some cases, it worsens it. But discussion does systematically change scores, and the direction of that change is not random.
The typical dynamic works like this. The primary reviewer presents and states a preliminary overall impact score. If the secondary and tertiary scores are similar, the discussion is often brief: the panel confirms the assessment, and non-assigned reviewers ask a few clarifying questions before voting. But when the assigned reviewers disagree -- one scores a 2 and another scores a 5 -- the discussion opens up. Other panelists weigh in. The reviewer who scored more favorably may defend the application or concede points. The reviewer who scored more harshly may soften or dig in. The SRO and panel chair manage the time and ensure the review criteria are properly addressed.
What applicants need to understand is that the discussion almost always moves scores toward the middle. An enthusiastic primary reviewer who gave a 1 may shift to a 2 after hearing the secondary raise a concern about feasibility. A skeptical tertiary who scored a 6 may move to a 4 after the primary explains the methodological approach. The panel's collective score is the average of all non-conflicted members' votes, not just the assigned reviewers' scores. A single strong advocate on the panel can pull an application from the margin into the fundable range. A single well-articulated concern can push it out.
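The arithmetic behind that averaging is worth seeing once: the final overall impact score is the mean of all eligible members' 1-to-9 votes multiplied by 10, so a single outlying vote on a roughly 20-person panel moves the result by only a point or two. The votes below are invented to illustrate the effect:

```python
def overall_impact(votes):
    """Final overall impact score: mean of all eligible (non-conflicted)
    members' 1-9 votes, multiplied by 10 and rounded."""
    return round(10 * sum(votes) / len(votes))

panel = [2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3]  # invented votes
print(overall_impact(panel))        # 27 -- hovering near the fundable margin
print(overall_impact(panel + [1]))  # 26 -- one strong advocate helps (lower is better)
print(overall_impact(panel + [6]))  # 29 -- one well-argued concern hurts
```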
This is why your application needs to arm its champions. The reviewer who loves your science needs specific, quotable language to defend it during discussion. An innovation claim that reads "This approach is novel" gives your advocate nothing to work with. An innovation claim that names the specific gap, identifies the methodological advance, and explains why prior attempts failed gives them a talking point that can withstand scrutiny from 20 other experts.
Conflicts of Interest and the Choreography of Recusal
Conflict-of-interest management is one of the most visible rituals of the review panel. Before the meeting, reviewers complete pre-review conflict-of-interest forms disclosing any personal, financial, or professional relationships with applicants. Conflicts include collaborations within the past three years, mentor-trainee relationships, financial interests, and institutional affiliations.
When an application from a conflicted party comes up for discussion, the conflicted reviewer leaves the room (or is moved to a breakout room in virtual meetings). They do not participate in the discussion, do not see the scores, and do not vote. The SRO tracks conflicts meticulously, and the process is taken seriously -- a reviewer who fails to disclose a conflict and is later discovered faces potential removal from the panel and investigation.
For applicants, the conflict-of-interest system means that your closest collaborators cannot advocate for your application. If you have worked with one of the strongest experts in your subfield, that person is likely conflicted out of reviewing your proposal. Your application needs to stand on its own merits with reviewers who may be expert-adjacent but not deeply embedded in your specific niche. Writing for a knowledgeable but not specialist audience is not dumbing down your science -- it is adapting to the structural reality of how panels are composed.
What Experienced Reviewers Wish Applicants Understood
Dozens of conversations with scientists who have served on federal review panels converge on a consistent set of themes.
Clarity is not optional. Reviewers read hundreds of pages of dense science in compressed time. NIMH's guidance on common application mistakes notes that seemingly simple errors -- misused terminology, unrecognizable jargon, unclear prose -- can make an application incomprehensible to a busy reviewer with a stack of proposals to get through. Reviewers do not have time to decode your intentions. They evaluate what is on the page.
Alignment with the funder is table stakes. The single most common reason proposals are rejected, across agencies and programs, is failure to match the funder's stated priorities. A proposal that addresses an important problem but does not connect that problem to the specific goals of the funding opportunity announcement is asking reviewers to make inferences that the scoring rubric does not reward.
Budget credibility matters more than applicants think. An unrealistic budget -- whether inflated or suspiciously lean -- signals that the applicant does not understand the scope of the proposed work. Reviewers notice. It erodes confidence in the overall feasibility assessment.
The specific aims page is the proposal. For NIH applications, many reviewers report forming a preliminary impression from the specific aims page alone. If the aims are clear, logically connected, and matched to a significant problem, the reviewer enters the research strategy already inclined toward a favorable score. If the aims are confusing or disconnected, every subsequent section has to overcome that first impression.
The grant review process is not a black box. It is a structured, time-pressured conversation among scientists who are doing their best to identify the strongest work with imperfect tools and limited hours. Applicants who understand the mechanics of that conversation -- the scoring framework, the triage threshold, the discussion dynamics, the reviewer's cognitive load -- write proposals that survive the process at higher rates. For researchers looking to stress-test their applications against these realities before submission, tools like Granted can simulate structured, criteria-based review and surface the weaknesses a panel is most likely to find.