Intermediate

Hiring Rubric Generator

From a job description, produce a structured candidate rubric with skill weights, evaluation prompts per skill, and red flags to listen for during interviews.

When to use this prompt

Before you open a hiring loop. Most hiring loops fail because the team never defined what “good” looks like, so each interviewer evaluates against a different mental rubric. The result: candidates who pass three rounds and fail the fourth, debriefs that turn into vibes-based debates, and offers the team doesn’t agree on.

A rubric forces the team to align before candidates start interviewing. Then every interviewer scores against the same dimensions and the debrief becomes about evidence, not preference.

The prompt

<role>Hiring operations specialist who builds candidate rubrics that produce defensible decisions and reduce interviewer disagreement.</role>

<task>Turn the job description below into a structured hiring rubric. Identify 5 to 7 skills, weight them, write 2 to 3 evaluation prompts per skill, and name the red flags that should sink a candidate.</task>

<inputs>
<role_title>[ROLE TITLE]</role_title>
<level>[IC LEVEL or LEADER LEVEL, e.g., "Senior", "Staff", "Director"]</level>
<job_description>
[FULL JD]
</job_description>
<must_haves>
[3-5 non-negotiables from the hiring manager. If a candidate fails any of these, the offer doesn't go out.]
</must_haves>
<team_context>[1-3 sentences on the team's current strengths and gaps, so the rubric is weighted toward what's actually needed]</team_context>
</inputs>

<instructions>
1. Identify 5 to 7 skills the role genuinely requires. Skills must be concrete and observable in interviews. "Strong communication" is too vague; "Can explain a technical decision to a non-technical executive in under 90 seconds" is observable.
2. Weight each skill 10 to 30 points. Total must equal 100. Weighting reflects what actually matters for this role on this team: not what the JD emphasizes, but what <team_context> says is actually needed.
3. For each skill, write 2 to 3 specific evaluation prompts. Each prompt is something an interviewer can ask or a scenario they can present. Not "Ask about communication" but "Walk me through a time you had to communicate a missed deadline to a stakeholder. Listen for: ownership, specificity, what they did differently next time."
4. For each skill, define the scoring rubric: what do a 1, a 3, and a 5 look like? Concrete behaviors only.
5. List 3 to 5 red flags that should sink a candidate even if total score is high. These are non-negotiables that override scores. (E.g., "Blames team members for past project failures with no self-reflection.")
6. Map each interviewer in a typical 4-round loop to which skills they should evaluate. No interviewer should evaluate more than 3 skills; spreading reduces depth.
7. Constraints:
   - Do not invent skills not implied by the JD or <must_haves>.
   - Do not produce a rubric that could apply to any role in the same job family. The output must be specific to this role.
   - Mark any skill where the JD doesn't supply enough detail to write evaluation prompts as [NEED MORE INFO from hiring manager: what specifically].
</instructions>

<output_format>
**Skills + weights (total 100):**

| # | Skill | Weight | Why this weight |
|---|-------|--------|-----------------|
| 1 | ... | XX | one sentence |
| 2 | ... | XX | one sentence |

**Per-skill rubrics:**

### Skill 1: [name]
- **Evaluation prompts:**
  - [specific prompt or scenario] — Listen for: [behaviors]
  - [prompt 2]
- **Scoring:**
  - 1 (poor): [specific behavior]
  - 3 (acceptable): [specific behavior]
  - 5 (strong): [specific behavior]

### Skill 2: ...
(repeat for each skill)

**Red flags (any one disqualifies):**
1. [Specific behavior or response]
2. ...

**Interviewer assignment for a 4-round loop:**
- Round 1 (recruiter screen): Skills [#, #]
- Round 2 (hiring manager): Skills [#, #, #]
- Round 3 (peer): Skills [#, #]
- Round 4 (cross-functional): Skills [#, #]

**[NEED MORE INFO from hiring manager:**
- Skill #N: [what's missing]]
</output_format>

How it works

The forced weighting (10 to 30 points each, totaling 100) is the highest-leverage discipline in this prompt. Most rubrics are flat — every skill listed as if equally important. That’s never true on a real team. Forcing the team to weight reveals what they actually care about and surfaces disagreement before candidates show up.
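If you keep the rubric in a script or spreadsheet, the constraints are cheap to enforce mechanically. A minimal sketch in Python; the skill names and weights are illustrative, borrowed from the example output below:

```python
# Minimal check of the weighting constraints: 5-7 skills, each weighted
# 10-30, totaling exactly 100. Skill names here are illustrative.
weights = {
    "Owns a metric end-to-end": 25,
    "Translates technical to executive": 20,
    "Designs measurement systems": 20,
    "Manages contractors": 15,
    "Ships in 6-week cycles": 10,
    "Writes for senior audiences": 10,
}

def validate_weights(weights: dict[str, int]) -> list[str]:
    """Return constraint violations; an empty list means the rubric is valid."""
    errors = []
    if not 5 <= len(weights) <= 7:
        errors.append(f"expected 5 to 7 skills, got {len(weights)}")
    for skill, w in weights.items():
        if not 10 <= w <= 30:
            errors.append(f"{skill!r}: weight {w} is outside 10 to 30")
    total = sum(weights.values())
    if total != 100:
        errors.append(f"weights total {total}, must equal 100")
    return errors

assert validate_weights(weights) == []  # this example rubric passes
```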

The “Listen for” prompts attached to each evaluation question turn vague evaluation into specific evaluation. “Ask about communication” produces five different interviews; “Listen for: ownership, specificity, what they did differently next time” produces five aligned ones.

The red-flag override mechanism is the safety valve. Sometimes a candidate scores 78/100 but does or says something that should sink the offer regardless. Naming those red flags up front prevents the team from rationalizing past them in the debrief.
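The override is simple to encode at debrief time: compute the weighted total, but let any confirmed red flag short-circuit it. A sketch under the same assumptions as above; the 70-point hire bar is a hypothetical placeholder, not something the prompt prescribes:

```python
# Debrief sketch: interviewers score each skill 1-5, weights total 100,
# and any confirmed red flag disqualifies regardless of the number.
weights = {"Owns a metric end-to-end": 25, "Translates technical to executive": 20,
           "Designs measurement systems": 20, "Manages contractors": 15,
           "Ships in 6-week cycles": 10, "Writes for senior audiences": 10}

def debrief_decision(scores: dict[str, int], red_flags: list[str]) -> tuple[str, float]:
    """Weighted score normalized to 0-100; any red flag disqualifies outright."""
    weighted = sum(scores[skill] * w for skill, w in weights.items()) / 5
    if red_flags:
        return "no-hire", weighted  # no score rationalizes past a red flag
    # 70 is a hypothetical hire bar, chosen here for illustration only.
    return ("hire" if weighted >= 70 else "no-hire"), weighted

scores = {skill: 4 for skill in weights}   # hypothetical candidate: solid 4s
scores["Ships in 6-week cycles"] = 3
print(debrief_decision(scores, ["Blamed team for past failures"]))
# -> ('no-hire', 78.0): a 78/100 candidate, out on a red flag anyway
```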

The interviewer assignment is a coordination move. Without it, every interviewer asks the same three questions and the loop wastes 6 hours producing redundant signal. The assignment ensures each round contributes unique evaluation data.
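That coverage property is also checkable: every skill should be evaluated somewhere in the loop, and no round should spread past three. A sketch with a hypothetical round-to-skill mapping; skill numbers refer to the rubric table:

```python
# Coverage check for the 4-round loop. The mapping below is hypothetical;
# skills are numbered 1-6 as in the example weights table.
loop = {
    "recruiter screen": [2, 6],
    "hiring manager":   [1, 3, 4],
    "peer":             [3, 5],
    "cross-functional": [1, 2],
}

def check_loop(loop: dict[str, list[int]], all_skills: set[int]) -> list[str]:
    """Flag skills nobody evaluates and rounds that cover more than three."""
    problems = []
    covered = {s for skills in loop.values() for s in skills}
    for skill in sorted(all_skills - covered):
        problems.append(f"skill {skill} is evaluated by no one")
    for round_name, skills in loop.items():
        if len(skills) > 3:
            problems.append(f"{round_name} covers {len(skills)} skills; depth suffers")
    return problems

print(check_loop(loop, set(range(1, 7))))  # -> [] for this assignment
```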

The [NEED MORE INFO] markers are the honesty mechanism. If the JD doesn’t give the model enough to write specific evaluation prompts, the model surfaces that gap rather than fabricating questions. The hiring manager fills the gap; the rubric improves.

Example output

Skills + weights:

| # | Skill | Weight | Why |
|---|-------|--------|-----|
| 1 | Owns a metric end-to-end | 25 | Team currently lacks anyone with this; biggest gap. |
| 2 | Translates technical to executive | 20 | Role reports into product; must brief CEO weekly. |
| 3 | Designs measurement systems | 20 | Core IC craft for the role. |
| 4 | Manages contractors | 15 | Will own a $200K/year contractor budget. |
| 5 | Ships in 6-week cycles | 10 | Aligns to existing cadence; learnable on the job. |
| 6 | Writes for senior audiences | 10 | Required for board memos. |

[Per-skill rubrics, red flags, and interviewer assignments follow]

Variations

- Reverse-engineering mode: Paste a JD plus a candidate you already know is strong. Ask the model to identify which 3 skills make this candidate strong and weight the rubric accordingly. Useful for cloning a top performer.
- Calibration mode: Run the rubric on a panel of 3 mock candidates (have the model compose them) and ask the model to score each. This surfaces whether the rubric actually discriminates.
- Diversity-of-thought check: Add a constraint that asks the model to identify which skills favor one personality type and propose alternative evaluation prompts that test the same skill through a different lens. Useful for teams trying to reduce hiring monoculture.