AI Research & Engineering Roles | Recruiter Reference Guide

Understanding the AI Engineering Org

Frontier AI companies like OpenAI, Anthropic, Google DeepMind, and Meta AI organize their technical teams into distinct clusters. Here's how to navigate them.

🧭 How to Use This Guide

AI research companies hire differently from typical tech companies. Many roles blur the line between research and engineering. This guide is organized by role cluster. For each cluster you'll find plain-English job descriptions, required skills, keyword search operators, and screening questions.

🧪 Research Scientists & Engineers

The core scientists who run experiments and publish findings that push the frontier of what AI can do.

Roles covered: 8

🎯 RL, Post-Training & Pretraining

Engineers who shape how models learn — from initial training on massive datasets to fine-tuning behavior with reinforcement learning.

Roles covered: 7

🛡️ Safety, Alignment & Interpretability

Specialists who ensure AI behaves safely, honestly, and in alignment with human values — one of the fastest-growing areas in AI.

Roles covered: 8

⚡ Infrastructure, GPU & Performance

Engineers who build the high-performance computing systems that make training and serving multi-billion parameter models possible.

Roles covered: 8

🤖 Agents, Products & Applied AI

Engineers building AI products that users interact with — autonomous agents, tools, and developer-facing systems.

Roles covered: 7

🧭 Engineering Management

Technical leaders who run research and engineering teams — they need both deep technical credibility and strong people skills.

Roles covered: 5

The "Research Engineer" Explained

This is the most confusing title for non-technical recruiters. Here's the breakdown.

💡 The Most Important Thing to Understand

In frontier AI companies, a Research Engineer is not the same as a Software Engineer. They are a hybrid — they write high-quality production code and they design and run scientific experiments. Think of them as "scientist + engineer." They sit between a pure researcher (who theorizes) and a pure engineer (who builds). Most senior AI hires at frontier companies fall into this category.

Research Scientist

Primarily a theorist and experimentalist. Generates hypotheses, designs experiments, analyzes results. Often has a PhD. Publishes papers. Code is a tool, not the output.

Research Engineer

The hybrid. Implements and runs experiments at scale. Writes production-grade ML code. Partners with scientists to turn ideas into results. Deep coding + scientific thinking.

ML Systems Engineer

Primarily an engineer. Builds the infrastructure and tools that make research possible — training pipelines, distributed systems, tooling. Strong SWE background + ML knowledge.

Common Technology Stack

Technologies you'll see across virtually all AI research & engineering roles.

Python

PyTorch

JAX

CUDA

AWS

Google Cloud / TPUs

GitHub

Docker / K8s

C++ / Systems

Hugging Face

Research Scientists & Core Researchers

These are the scientists who define what AI is capable of. They run experiments, publish papers, and push the frontier.

📖 In Plain English

Research Scientists at AI companies are like academic researchers, but instead of working in university labs, they work inside companies and actually build the systems they study. A PhD is common but not always required. Strong candidates will have a publication record, a GitHub with ML code, or a Kaggle/Hugging Face profile.

Core Research

Research Scientist

Research PhD Preferred

What They Do

Designs and runs experiments to discover new AI capabilities or improve existing models. Generates novel ideas, tests hypotheses, analyzes results, and often co-authors papers. Works closely with Research Engineers who implement their ideas at scale.

Key Skills to Look For

Python PyTorch / JAX Statistics & Math ML Theory Paper Publication Experimental Design

What "Good" Looks Like on a Resume

Published papers at top conferences (NeurIPS, ICML, ICLR, ACL, EMNLP). First-author publications are a strong signal. GitHub with reproducible ML experiments. Citations on Google Scholar or arXiv. PhD in CS, Statistics, Math, or a quantitative field from a top program.

📋 Screening Questions

Walk me through a research project you led from hypothesis to results. What were the most important decisions you made?
What is your most cited or impactful paper, and what problem did it solve?
How do you decide when an experimental result is meaningful vs. noise?
How do you balance exploring new ideas versus improving on known approaches?
Describe a time a research direction failed. What did you learn?

Boolean Search Keywords

"Research Scientist" "NeurIPS" OR "ICML" OR "ICLR" "large language model" "transformer architecture" "first author"

Source On

arXiv Google Scholar ResearchGate GitHub

Specialized Research

Research Scientist, Interpretability

Research Safety PhD Preferred

What They Do

Studies what is happening inside AI models — which parts of the neural network activate for which concepts, and why. This is sometimes called "mechanistic interpretability." The goal is to make AI more understandable and verifiably safe.

Key Skills

Mechanistic Interpretability Neuroscience (conceptual) PyTorch Internals Statistical Analysis Probing Classifiers Superposition / Circuits

📋 Screening Questions

What do you find most interesting or surprising about what we currently know about how transformer models store information internally?
Describe a technique you've used to understand what a specific layer or attention head is doing in a model.
How would you explain the concept of "polysemanticity" to a non-technical executive?
What do you think are the biggest open problems in mechanistic interpretability right now?

Boolean Search Keywords

"mechanistic interpretability" "circuits" AI "sparse autoencoder" "probing classifiers"

Applied Social Science Research

Research Scientist, Societal Impacts

Research Policy Adjacent

What They Do

Studies how AI models affect society — labor markets, misinformation, access to information, fairness, and systemic risks. Publishes empirical research and informs internal policy decisions about model deployment.

Key Skills

Social Science Methods Causal Inference Survey Design Economics / Labor Research Python / R Stats Policy Communication

📋 Screening Questions

Describe a study you designed to measure how an AI system affected a real-world behavior or outcome.
How do you approach studying phenomena that are difficult to measure (e.g., AI's effect on employment decisions)?
How have you communicated complex research findings to non-academic audiences or policymakers?

Boolean Search Keywords

"AI societal impact" "algorithmic fairness" "computational social science" "AI labor economics"

Multimodal Research

Research Engineer / Scientist — Vision & Audio

Research Engineering

What They Do

Extends language models to understand and generate images, video, and audio. Works on training multimodal models — those that can process and produce multiple types of content simultaneously. This is a rapidly growing frontier area.

Key Skills

Computer Vision Audio Processing / Speech Multimodal Transformers CLIP / ViT architectures Diffusion Models PyTorch

📋 Screening Questions

What approaches have you used to align visual and text representations in a shared embedding space?
What are the main challenges of training a model that processes both images and text together?
Describe a multimodal project you've worked on — what was the hardest technical challenge?

Boolean Search Keywords

"multimodal" LLM "vision language model" "CLIP" OR "ViT" "audio transformer"

Reinforcement Learning, Post-Training & Pretraining

These engineers and scientists shape how AI models learn — from initial training on raw data to fine-tuning behavior with human feedback.

🎯 In Plain English

"Pretraining" is the initial phase where a model learns from massive amounts of text. "Post-training" is the follow-up phase that shapes how the model behaves, making it helpful, honest, and safe. "RL" (Reinforcement Learning) is a technique widely used in post-training, where the model learns from feedback scores rather than direct examples.

Core Training Role

Research Engineer — Reinforcement Learning (RL)

Engineering Research

What They Do

Implements and runs reinforcement learning experiments to improve model behavior. This includes RLHF (RL from Human Feedback), PPO, and newer RL variants used to teach models to be more helpful, honest, and safe. They run experiments, debug training instabilities, and scale RL pipelines.

Key Skills

RLHF / PPO Reward Modeling PyTorch Distributed Training Python Experiment Tracking RL Theory

📋 Screening Questions

Explain RLHF in plain terms — what problem does it solve and what are its main limitations?
What does reward hacking mean in RL, and how have you dealt with it in practice?
Walk me through how you debugged a training run that was diverging or producing unexpected behavior.
How do you decide what reward signal to use for a new RL task?
What's the difference between on-policy and off-policy RL, and when does each matter for LLM training?

Boolean Search Keywords

"RLHF" "reinforcement learning" LLM "PPO" language model "reward model" "Constitutional AI"

Source On

arXiv GitHub Hugging Face

Systems + ML

ML Systems Engineer — RL Engineering

Engineering Systems

What They Do

Builds the infrastructure and tooling that makes RL training possible at scale. While a Research Engineer runs experiments, this person builds the "engine room" — efficient RL training loops, rollout management, infrastructure that handles thousands of GPUs, and logging systems.

Key Skills

Distributed Systems GPU Optimization Python / C++ CUDA Cluster Orchestration RL Infrastructure

📋 Screening Questions

How have you optimized a training pipeline to reduce wall-clock time or GPU cost?
Describe how you'd design a distributed rollout system for a large RL training run.
What bottlenecks have you identified and resolved in ML training infrastructure?
How do you approach debugging a training pipeline that's silently producing incorrect results?

Boolean Search Keywords

"ML systems" RL "distributed RL training" "GPU cluster" ML "training infrastructure"

Foundation Model Training

Research Engineer / Scientist — Pretraining

Engineering Research Senior Focus

What They Do

Works on training the base language model — the initial massive training run that happens before any fine-tuning. This involves choosing data, designing training curricula, scaling experiments, and ensuring training runs don't fail or diverge when running on thousands of GPUs for weeks.

Key Skills

Large-Scale Training Data Curation Scaling Laws Transformer Architecture Distributed Training (Megatron, DeepSpeed) TPU / GPU Clusters

📋 Screening Questions

What are scaling laws, and how would you use them to plan a new pretraining run?
What strategies do you use to curate training data, and what quality signals do you rely on?
Walk me through a training run you've managed. What went wrong, and how did you recover?
How do you decide on the architecture choices (depth, width, attention heads) for a new model?

Boolean Search Keywords

"pretraining" LLM "scaling laws" "Megatron-LM" "data curation" language model "foundation model training"

RLHF Specialist

Senior Research Scientist — Reward Models

Research Senior PhD Preferred

What They Do

Designs and trains the "reward models" used in RLHF. A reward model is a separate AI that scores another AI's outputs — essentially teaching the main AI what "good" looks like. This is a highly specialized, high-impact role that directly determines how helpful and safe the final model is.
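
To make "scoring another AI's outputs" concrete, here is a minimal sketch of one common way reward models are trained on human preference pairs: a pairwise (Bradley-Terry style) loss. The guide does not say which objective any particular company uses, so treat the formulation and the example scores below as illustrative assumptions.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the annotator-preferred answer wins."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human preference; lower is better."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Hypothetical scores a reward model gave to two answers for the same prompt.
agrees = pairwise_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagrees = pairwise_loss(reward_chosen=-1.0, reward_rejected=2.0)

print(f"loss when the model agrees with the annotator:    {agrees:.3f}")
print(f"loss when the model disagrees with the annotator: {disagrees:.3f}")
```

The loss is small when the reward model ranks the two answers the same way the human did and large when it ranks them the other way. A candidate who can explain this idea in their own words likely has hands-on reward modeling experience.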

Key Skills

Reward Modeling RLHF Pipeline Human Preference Data Annotation Design Model Evaluation Constitutional AI

📋 Screening Questions

What are the failure modes of a reward model trained on human preference data?
How would you design a human annotation process to collect high-quality preference data?
Explain how you would evaluate whether a reward model is generalizing well vs. overfitting to annotator quirks.

Boolean Search Keywords

"reward modeling" "human preference learning" "RLHF" reward "preference optimization"

Safety, Alignment & Interpretability

One of the most important and fastest-growing areas in AI — ensuring models are safe, honest, and aligned with human values.

🛡️ In Plain English

"AI Safety" at frontier labs means ensuring models don't do harmful things — whether intentionally or accidentally. "Alignment" means making sure the model pursues goals that humans actually want. "Interpretability" means understanding why the model does what it does. These are deeply philosophical AND deeply technical roles. Candidates often come from academia, philosophy of mind, or security research backgrounds.

Core Safety Research

Research Engineer / Scientist — Alignment Science

Safety Research PhD Common

What They Do

Works on the fundamental problem of making AI systems reliably pursue intended goals — and nothing else. Designs experiments to test whether models "understand" instructions, have hidden goals, or behave differently when unobserved. Papers in this area often combine philosophy, game theory, and ML.

Key Skills

AI Alignment Theory Behavioral Evaluation Robustness Testing Game Theory Formal Verification (some) Python / PyTorch

📋 Screening Questions

What does it mean for an AI system to be "aligned"? What would misalignment look like in practice?
How would you design an experiment to test whether a model behaves consistently when it believes it is or isn't being evaluated?
What do you think are the most important unsolved problems in AI alignment today?
Describe a piece of alignment research you've found compelling — what was the key insight?

Boolean Search Keywords

"AI alignment" "Constitutional AI" "RLHF safety" "AI honesty" "deceptive alignment" "scalable oversight"

Frontier Risk Assessment

Research Engineer / Scientist — Frontier Red Team (Cyber)

Safety Engineering

What They Do

Proactively attempts to find ways that AI models could be used for harmful cyber operations — like helping write malware, compromising systems, or enabling novel attacks. Their job is to find these risks before bad actors do, so the company can add safeguards.

Key Skills

Offensive Security / Pentesting Cybersecurity Research Vulnerability Analysis Python Threat Modeling AI Model Evaluation

📋 Screening Questions

Describe a creative attack vector you've identified in a system — how did you find it and what made it non-obvious?
How would you structure an evaluation to measure whether an AI model provides meaningful uplift to a malicious cyber actor?
How do you think about the distinction between information that is "dangerous to provide" vs. "freely available online"?

Boolean Search Keywords

"red team" AI "AI cyber evaluation" "offensive security" ML "adversarial AI"

Source On

GitHub DEF CON / Black Hat talks USENIX Security papers

Privacy & Safety

Privacy Research Engineer — Safeguards

Safety Engineering

What They Do

Studies and mitigates privacy risks in AI models — such as models memorizing and reproducing private training data, or being tricked into revealing personal information. Combines ML research with privacy law and data governance knowledge.

Key Skills

Differential Privacy Federated Learning Data Memorization Research GDPR / CCPA Awareness Python / PyTorch Membership Inference Attacks

📋 Screening Questions

What is a membership inference attack and how would you test for it in an LLM?
How does differential privacy work conceptually, and what are its trade-offs when applied to LLM training?
How would you evaluate whether a model is memorizing specific training examples it shouldn't reproduce?

Boolean Search Keywords

"differential privacy" LLM "data memorization" language model "federated learning" "AI privacy"

Monitoring & Evaluation

Research Engineer — AI Observability

Safety Engineering

What They Do

Builds systems to monitor AI model behavior in production — detecting when models behave unexpectedly, drift from intended behavior, or exhibit patterns that could indicate emerging risks. Think of them as building the "vital signs monitoring" for deployed AI.
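
As a rough illustration of "vital signs monitoring," the sketch below flags drift when a recent average of some tracked metric moves far from its baseline. The metric (a daily refusal rate), the numbers, and the threshold of 3 standard deviations are all hypothetical. Production systems typically use more careful statistics than this.

```python
from statistics import mean, stdev


def drift_alert(baseline: list[float], recent: list[float], z_threshold: float = 3.0) -> bool:
    """Return True when the recent mean sits far outside the baseline spread."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline points
    if sigma == 0:
        return mean(recent) != mu
    z_score = abs(mean(recent) - mu) / sigma
    return z_score > z_threshold


# Hypothetical daily refusal rates observed during evaluation.
baseline_rates = [0.020, 0.022, 0.019, 0.021, 0.018, 0.020, 0.021]

normal_week = [0.021, 0.019, 0.020]
suspicious_week = [0.045, 0.050, 0.048]

print("normal week alert:", drift_alert(baseline_rates, normal_week))
print("suspicious week alert:", drift_alert(baseline_rates, suspicious_week))
```

Strong candidates will usually go beyond a single threshold like this one and talk about which signals to track, how to avoid false alarms, and who gets paged when an alert fires.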

Key Skills

ML Monitoring Anomaly Detection Logging & Observability Systems Statistical Methods Python Data Pipelines

📋 Screening Questions

How would you detect if a deployed model started behaving differently than during evaluation? What signals would you monitor?
What's the difference between monitoring model performance and monitoring model safety? Do they require different approaches?
Describe a monitoring or alerting system you've built for a production ML system.

Boolean Search Keywords

"ML observability" "model monitoring" "AI drift detection" "LLM evaluation"

Infrastructure, GPU & Performance Engineering

The engineers who build and optimize the massive computing systems that make frontier AI possible.

⚡ In Plain English

Training a frontier AI model requires thousands of specialized chips (GPUs and TPUs) running in perfect coordination for weeks or months. A single training run can cost tens of millions of dollars. The engineers in this section build the systems that make this scale possible — and squeeze every bit of performance out of the hardware.

Hardware Optimization

Performance Engineer (GPU)

Infrastructure Engineering

What They Do

Optimizes how AI training and inference code runs on GPU hardware. Writes low-level CUDA kernels, profiles where compute is being wasted, and implements optimizations that can save millions of dollars per training run. This is one of the most specialized and highest-paid roles in AI infrastructure.

Key Skills

CUDA Programming GPU Architecture (NVIDIA) Triton Profiling Tools (Nsight, nvprof) C++ / C PyTorch Custom Ops Memory Optimization

📋 Screening Questions

What does "memory bandwidth bound" vs "compute bound" mean, and how do you identify which one a kernel is hitting?
Walk me through how you would optimize a slow attention kernel. What tools would you use and where would you start?
Explain FlashAttention — what problem does it solve and how?
What profiling tools do you use for GPU performance work, and what do you look for first?
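
For recruiters who want intuition for the "memory bandwidth bound" vs "compute bound" question above, the sketch below applies the standard roofline idea: compare how much arithmetic a kernel does per byte it moves against what the hardware can sustain. The hardware peaks are made-up round numbers, not the specs of any real GPU, and the byte counts assume each matrix or vector crosses memory exactly once.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte read or written."""
    return flops / bytes_moved


def classify_kernel(flops: float, bytes_moved: float,
                    peak_flops_per_s: float, peak_bytes_per_s: float) -> str:
    """Label a kernel using the roofline 'ridge point' of the hardware."""
    ridge_point = peak_flops_per_s / peak_bytes_per_s
    if arithmetic_intensity(flops, bytes_moved) >= ridge_point:
        return "compute bound"
    return "memory bandwidth bound"


# Hypothetical accelerator: 100 TFLOP/s of compute, 1 TB/s of memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BYTES = 1e12

# Element-wise add of two float32 vectors: 1 FLOP per element,
# 12 bytes moved per element (two 4-byte reads, one 4-byte write).
n = 1_000_000
vector_add = classify_kernel(n, 12 * n, PEAK_FLOPS, PEAK_BYTES)

# Square float32 matrix multiply: 2*N^3 FLOPs, three N x N matrices moved once.
N = 4096
matmul = classify_kernel(2 * N**3, 3 * N * N * 4, PEAK_FLOPS, PEAK_BYTES)

print("vector add:", vector_add)
print("matmul:", matmul)
```

A good answer from a candidate will sound like this calculation in words, followed by how they confirm it with a profiler.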

Boolean Search Keywords

"CUDA kernel" "GPU performance" ML "Triton" NVIDIA "FlashAttention" "compute-bound optimization"

Source On

GitHub NVIDIA GTC talks arXiv (systems papers)

Specialized Hardware

TPU Kernel Engineer

Infrastructure Rare Skill

What They Do

Writes optimized code specifically for Google's TPU (Tensor Processing Unit) chips. TPUs are Google's custom-built AI hardware, used for training and inference. This role requires deep knowledge of TPU architecture, the XLA compiler, and JAX, which is a niche and highly sought-after skill set.

Key Skills

TPU Architecture JAX / XLA Google Cloud TPU Compiler Optimization Profiling (TensorBoard) Python / C++

📋 Screening Questions

What are the key architectural differences between TPU and GPU that affect how you write optimized code?
How does XLA compilation work, and what are common pitfalls when writing JAX code intended for TPUs?
Describe a performance optimization you've done specifically for TPU hardware.

Boolean Search Keywords

"TPU" JAX "XLA compiler" "Google TPU" ML "JAX" LLM training

Training Infrastructure

Infrastructure Engineer — Pre-training

Infrastructure Engineering

What They Do

Builds and maintains the compute infrastructure that runs pretraining jobs — cluster management, job scheduling, fault tolerance (so a training run doesn't fail when one GPU out of thousands crashes), data loading pipelines, and storage systems for petabytes of training data.

Key Skills

Kubernetes / Slurm Distributed Systems Python / Go Cloud Infrastructure (AWS/GCP) Fault Tolerance Storage Systems (S3, HDFS)

📋 Screening Questions

How would you design a fault-tolerant system for a training job running on 4,000 GPUs where individual GPUs fail periodically?
What strategies do you use to make data loading fast enough to not bottleneck GPU utilization?
Describe the most complex infrastructure system you've designed or maintained. What made it hard?

Boolean Search Keywords

"ML training infrastructure" "distributed training" Kubernetes "GPU cluster management" "Slurm" ML

Distributed Systems

Software Engineer — ML Networking

Infrastructure Engineering

What They Do

Optimizes how data moves between GPUs during training. When thousands of GPUs train a single model together, they need to share gradients and parameters constantly — this communication is often the bottleneck. This engineer designs the networking stack (InfiniBand, RDMA, collective operations) to make it faster.
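
To show what "sharing gradients" means, here is a small single-process simulation of ring AllReduce, one widely used collective pattern. Each simulated worker passes one chunk per step to its neighbor until every worker holds the full sum. This is a teaching sketch under simplifying assumptions (one number per chunk, no real network); it is not how NCCL or any specific library is implemented.

```python
def ring_allreduce(worker_grads: list[list[float]]) -> list[list[float]]:
    """Simulate ring AllReduce (sum) where each worker holds one value per chunk."""
    p = len(worker_grads)
    if any(len(g) != p for g in worker_grads):
        raise ValueError("this sketch expects exactly one chunk per worker")
    data = [list(g) for g in worker_grads]

    # Phase 1, reduce-scatter: after p - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % p.
    for step in range(p - 1):
        # Snapshot outgoing values first so all sends happen "simultaneously".
        sends = [(r, (r - step) % p, data[r][(r - step) % p]) for r in range(p)]
        for r, chunk, value in sends:
            data[(r + 1) % p][chunk] += value

    # Phase 2, all-gather: completed chunks travel around the ring
    # until every worker has every summed chunk.
    for step in range(p - 1):
        sends = [(r, (r + 1 - step) % p, data[r][(r + 1 - step) % p]) for r in range(p)]
        for r, chunk, value in sends:
            data[(r + 1) % p][chunk] = value

    return data


# Four hypothetical workers, each with its own local gradients.
grads = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [1000, 2000, 3000, 4000],
]
result = ring_allreduce(grads)
for rank, values in enumerate(result):
    print(f"worker {rank}: {values}")
```

Every worker ends with the same summed gradients, and each only ever talks to its neighbor in the ring. That neighbor-to-neighbor traffic is what the networking engineer works to make fast.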

Key Skills

NCCL / RDMA InfiniBand Networking Collective Communication (AllReduce) C++ / Python Network Profiling Low-Level Networking

📋 Screening Questions

What is AllReduce and why is it critical for distributed ML training? What are the common performance bottlenecks?
How would you diagnose a distributed training run that's slower than expected due to communication overhead?
What's the difference between ring-AllReduce and tree-AllReduce, and when would you choose one over the other?

Boolean Search Keywords

"NCCL" distributed training "AllReduce" ML "InfiniBand" AI cluster "ML networking"

Agents, Products & Applied AI Engineering

Engineers building the AI products that users and developers actually interact with — autonomous agents, tools, and developer platforms.

🤖 In Plain English

An "AI Agent" is an AI that doesn't just answer questions — it takes actions: browsing the web, writing and running code, managing files, or completing multi-step tasks autonomously. These engineers build the systems that make agents work reliably and safely in the real world.

Agentic AI

Research Engineer — Agents

Engineering Research

What They Do

Builds systems that allow AI models to autonomously use tools, take actions, and complete multi-step tasks. Designs architectures for how agents plan, use memory, call external APIs, and recover from errors. Evaluates agent performance on real-world tasks.

Key Skills

Agent Architectures Tool Use / Function Calling Long-Context Reasoning Prompt Engineering Python Evaluation / Evals Design API Integration

📋 Screening Questions

What are the main failure modes you've seen in autonomous AI agents? How have you addressed them?
How would you design an evaluation suite for a coding agent? What tasks would you include and how would you score them?
Explain the trade-offs between giving an agent more autonomy vs. more human checkpoints.
How do you handle cases where an agent encounters an ambiguous situation mid-task?

Boolean Search Keywords

"AI agents" LLM "tool use" language model "ReAct" OR "chain-of-thought" agent "agentic AI" "function calling"

Source On

GitHub (LangChain, AutoGPT contributors) arXiv Hugging Face

Agent Safety Infrastructure

Software Engineer — Sandboxing

Engineering Safety

What They Do

Builds secure "sandboxes" — isolated computing environments where AI agents can run code safely without causing harm to production systems. When an AI agent writes and executes code, it needs to do so in a container that limits what it can access or break. This is critical security infrastructure for agentic AI.

Key Skills

Container Security (Docker, gVisor) Linux Security (seccomp, namespaces) Virtualization Go / Rust / C++ Systems Programming Security Engineering

📋 Screening Questions

How would you design a sandboxed environment for running untrusted AI-generated code? What isolation mechanisms would you use?
What are the differences between container-based isolation and VM-based isolation? When would you choose each?
How have you approached the trade-off between security and performance in sandboxing systems?

Boolean Search Keywords

"sandboxing" code execution "gVisor" OR "seccomp" "container security" "isolation" AI agent

Applied AI

Prompt Engineer

Applied AI

What They Do

Crafts, tests, and refines the instructions given to AI models to make them perform specific tasks reliably. At frontier AI companies, this is a technical and experimental role — they run systematic evaluations, develop frameworks for how to best communicate complex requirements to models, and work closely with product teams.

Key Skills

Prompt Engineering Model Evaluation / Evals Python (for automation) A/B Testing Technical Writing LLM API Proficiency

📋 Screening Questions

Walk me through how you would systematically improve a prompt that's producing inconsistent results. What's your process?
How do you evaluate whether one prompt is better than another at scale?
Give an example of a complex task you've successfully broken down into a prompt or chain of prompts. What made it hard?
How do you think about the difference between "prompting" and "fine-tuning"? When would you use each?

Boolean Search Keywords

"prompt engineering" "LLM evaluation" "chain-of-thought" "few-shot prompting" "model evals"

Interdisciplinary Research

Research Engineer — Economic Research

Research Economics

What They Do

Studies the economic implications of advanced AI — impacts on labor markets, productivity, inequality, and business models. Builds tools and datasets to measure AI's economic footprint. A rare hybrid of economics research + ML engineering.

Key Skills

Applied Economics Causal Inference Python / R Labor Economics Large Dataset Analysis ML for Social Science

📋 Screening Questions

How would you design a study to measure whether AI tools increase or decrease productivity in a specific occupation?
What data sources would you use to study AI's effect on labor markets, and what are their limitations?
Describe a piece of economic research you've done — what was the most challenging aspect of establishing causal claims?

Boolean Search Keywords

"AI economics" "labor economics" technology "AI productivity" "computational economics"

Engineering Management Roles

Technical leaders who manage AI research and engineering teams — they need both deep technical credibility AND strong people leadership skills.

🧭 In Plain English

AI engineering managers are almost always former engineers or researchers first. They cannot just manage — they need to understand what their teams are doing technically, make resourcing decisions, and maintain enough technical knowledge to evaluate work quality. Look for a track record of both personal technical contributions AND team leadership.

Management — Hardware

Engineering Manager — GPU / ML Accelerator

Management Infrastructure Senior

What They Do

Leads a team of GPU and ML accelerator engineers. Responsible for hiring, mentoring, project prioritization, and technical direction for the team that makes AI training hardware run as efficiently as possible. Reports to a VP or Director of Engineering.

Key Skills

GPU / CUDA Background Team Management (5–15 engineers) Technical Roadmapping Cross-functional Collaboration Hiring & Leveling Distributed Systems Experience

📋 Screening Questions

How do you balance giving your team technical autonomy while ensuring project delivery on time?
Describe how you've hired for a highly specialized technical role. What signals did you look for beyond technical skills?
Tell me about a situation where you had to make a difficult technical trade-off decision for your team. How did you approach it?
How do you stay technically current enough to evaluate the work of GPU performance engineers while also handling management responsibilities?
How have you handled a high-performing engineer who was struggling with their behavior or communication on the team?

Boolean Search Keywords

"Engineering Manager" GPU "ML infrastructure" manager "Director" ML accelerator "CUDA team lead"

Management — Inference

Engineering Manager — Inference & Inference Routing

Management Infrastructure

What They Do

"Inference" is what happens when a trained model responds to a user — it's running the model in real-time at scale. This manager leads teams that make inference fast, cheap, and reliable for millions of API calls per day. "Routing" is about intelligently directing requests to the right model or hardware to balance cost, speed, and quality.

Key Skills

ML Inference Systems Latency Optimization Distributed Systems Background Cost Optimization Team Leadership API / Platform Engineering
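
The routing idea described under "What They Do" can be sketched as a simple rule: among the models that are good enough and fast enough for a request, pick the cheapest. The tiers, prices, latencies, and quality scores below are invented for illustration, and real routers weigh many more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float   # hypothetical price
    typical_latency_ms: int     # hypothetical response time
    quality_score: float        # hypothetical 0-1 benchmark score


# Invented tiers; not real products or prices.
TIERS = [
    ModelTier("small", 0.10, 200, 0.70),
    ModelTier("medium", 0.50, 600, 0.85),
    ModelTier("large", 2.00, 1500, 0.95),
]


def route(min_quality: float, max_latency_ms: int, tiers=TIERS):
    """Pick the cheapest tier that meets the quality and latency requirements."""
    eligible = [
        t for t in tiers
        if t.quality_score >= min_quality and t.typical_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None  # caller must relax a requirement or reject the request
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)


choice = route(min_quality=0.80, max_latency_ms=1000)
print("routed to:", choice.name if choice else "no eligible tier")
```

Asking a candidate what should happen when no tier qualifies is a good way to probe how they think about cost, speed, and quality trade-offs.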

📋 Screening Questions

What does "time-to-first-token" mean and why is it critical for user experience in LLM serving? How would you approach optimizing it?
How would you think about the architecture of a system routing requests across multiple model sizes or data centers?
Describe how you've managed a team delivering a latency-critical production system. What did you prioritize?

Boolean Search Keywords

"ML inference" manager "LLM serving" lead "model serving" engineer manager "inference optimization"

Research Management

Research Manager — Interpretability

Management Research Safety PhD Expected

What They Do

Leads a team of interpretability researchers. Sets the research agenda, manages individual contributors, coordinates with other safety teams, and represents the team's work externally. Unlike pure engineering management, this role requires deep research expertise and the ability to evaluate the quality of novel research ideas.

Key Skills

Research Background (Publications) Interpretability Research Team Mentoring Research Strategy External Communication (papers, talks)

📋 Screening Questions

How do you evaluate the quality of a research idea before committing a team to pursue it for months?
How have you mentored a researcher who was technically strong but struggling to produce impactful results?
What is your own research vision for interpretability? Where do you think the field should be in 3 years?

Boolean Search Keywords

"Research Manager" AI safety "interpretability" research lead "AI research director"

Candidate Sourcing Guide

Where to find these candidates beyond LinkedIn — and how to use each platform effectively.

🔍 The Reality of Sourcing AI Talent

The best AI researchers and engineers are not actively applying for jobs. They are publishing papers, posting code on GitHub, sharing models on Hugging Face, or presenting at conferences. Your job is to find them where they work publicly. This guide will show you how.

Platform	What to Search	Best For	How to Engage
arXiv.org	Search by topic: `"mechanistic interpretability"`, `"RLHF"`, `"constitutional AI"`, `"LLM scaling"`. Look at first-author names on recent papers.	Research Scientists, Alignment researchers, Pretraining engineers	Find their email in the paper. Send a short, personalized outreach mentioning their specific paper. Never use a template.
GitHub	Search repositories: `language:Python topic:RLHF`, `topic:mechanistic-interpretability`, `topic:llm-training`. Look at contributors to major ML repos (Transformers, vLLM, TRL).	Research Engineers, ML Systems Engineers, Infrastructure Engineers, GPU engineers	Check their profile for email or Twitter/X. Reference their specific project in outreach. Look at star count of their repos as a signal of impact.
Hugging Face	Browse the Models hub. Search for `models by task: text-generation, safety`. Look at who has published fine-tuned models, datasets, or popular "Spaces" (demos). Check profile pages for LinkedIn links.	Research Scientists, Alignment researchers, NLP specialists, Applied AI engineers	Hugging Face profiles often link to LinkedIn or GitHub. Reference a specific model or Space they created. Community forum discussions are another source.
Google Scholar	Search paper titles or topics. Look at author profiles — check h-index, citation count, and institution. Sort by "recent" to find people actively publishing. Search `"large language model" safety site:scholar.google.com`.	Research Scientists (especially senior/PhD), Research Managers, Societal Impact researchers	Find their institutional email on the paper or university page. Reference their research specifically. For academic candidates, emphasize the research freedom and resources at the company.
ResearchGate	Search by topic, institution, or paper title. Profiles show full publication history and co-author networks — useful for finding connected clusters of researchers.	Research Scientists, Social Science researchers, Privacy researchers	Message feature available on ResearchGate. Lead with genuine interest in their research. Useful for finding researchers at international institutions.
Kaggle	Browse competition leaderboards in NLP and ML categories. Look for Grandmasters and Masters. Filter profiles by "notebook" activity. Competition winners in LLM-related tasks are strong ML engineering candidates.	ML Engineers, Applied Research Engineers, Data Scientists with ML depth	Many Kaggle profiles link to LinkedIn or GitHub. Reference their competition placement or a specific notebook they've published.
Conference Proceedings	NeurIPS, ICML, ICLR, ACL, EMNLP, USENIX Security, IEEE S&P. Look at accepted paper lists — especially papers from non-obvious institutions (startups, independent researchers).	All research roles, especially Research Scientists and Research Managers	Conference talks are often on YouTube — you can reference their talk. Workshops (not just main track) are where up-and-coming researchers present early-stage work.
Twitter / X	Search hashtags: `#mechanisticinterpretability`, `#AIalignment`, `#RLHF`, `#LLM`. Follow major AI researchers, since their networks are full of candidates. Repost (retweet) activity shows who is engaged with which topics.	All AI research and engineering roles	Twitter DMs are often more effective than email for AI researchers. Keep it brief, genuine, and specific. Many researchers have public email in their bio.
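
For the arXiv and GitHub rows, a search can also be saved as a link and re-run each week. The sketch below is one possible way to build such links, not a required workflow. The function names are hypothetical; the arXiv endpoint and its `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters follow the public arXiv API, and the GitHub link uses the site's ordinary search page.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
GITHUB_SEARCH = "https://github.com/search"


def arxiv_query_url(phrase: str, max_results: int = 25) -> str:
    """Build an arXiv API link for an exact phrase, newest submissions first."""
    params = {
        "search_query": f'all:"{phrase}"',  # search all fields for the phrase
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def github_topic_search_url(topic: str, language: str = "Python") -> str:
    """Build a GitHub repository-search link for a topic and language."""
    query = f"topic:{topic} language:{language}"
    return f"{GITHUB_SEARCH}?{urlencode({'q': query, 'type': 'repositories'})}"


if __name__ == "__main__":
    print(arxiv_query_url("mechanistic interpretability"))
    print(github_topic_search_url("rlhf"))
```

The arXiv link returns an Atom feed of recent papers (authors are listed per entry), and the GitHub link opens a normal search results page in the browser.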

Boolean Search Templates

Copy-paste these into LinkedIn Recruiter or other sourcing tools.

Research Scientist (Core)


("Research Scientist" OR "Research Engineer") AND ("large language model" OR "LLM" OR "transformer") AND ("NeurIPS" OR "ICML" OR "ICLR" OR "publications")

Safety / Alignment


("AI safety" OR "AI alignment" OR "interpretability" OR "RLHF") AND ("language model" OR "Claude" OR "GPT" OR "LLM")

RL / Post-Training


("reinforcement learning from human feedback" OR "RLHF" OR "reward model" OR "PPO") AND ("language model" OR "LLM" OR "post-training")

GPU / Performance


("CUDA" OR "GPU kernel" OR "Triton" OR "GPU optimization") AND ("machine learning" OR "deep learning" OR "training")
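
All four templates share one pattern: each parenthesized group is a list of quoted synonyms joined with OR, and the groups are joined with AND. If you maintain many variations, a small script can assemble them from keyword lists. This is a minimal sketch with hypothetical helper names (`or_group`, `boolean_query`); it only builds the text and does not talk to LinkedIn or any other tool.

```python
from typing import Iterable, List


def or_group(terms: Iterable[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def boolean_query(*groups: List[str]) -> str:
    """Join several OR-groups with AND to form one Boolean search string."""
    return " AND ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    # Rebuilds the GPU / Performance template from its two keyword lists.
    print(boolean_query(
        ["CUDA", "GPU kernel", "Triton", "GPU optimization"],
        ["machine learning", "deep learning", "training"],
    ))
```

Adding or removing a synonym then means editing one list rather than re-typing the whole string, which helps avoid unbalanced parentheses and quotes.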

General Screening Tips for AI Roles

Cross-cutting advice that applies to all AI research and engineering roles.

🎯 Key Principle

AI researchers and engineers are often skeptical of recruiters who don't understand their work. The more specific and knowledgeable your outreach and screening conversations are, the better your response rates will be. Use this guide to sound informed — even if you don't understand every detail.

✓ Strong Candidate Signals — All AI Roles

Has published papers at top venues (NeurIPS, ICML, ICLR, ACL, USENIX Security)
Active GitHub with well-maintained ML repositories that others star or fork
Can clearly explain the business/mission impact of their past work
Speaks specifically about failures and what they learned
Mentions collaboration with cross-functional teams, not just working solo
Shows progression in scope — has gone from individual contributor to leading projects
Can explain deeply technical concepts clearly without jargon overload
Expresses genuine intellectual curiosity beyond just their current role
Has presented research externally (conferences, blog posts, talks)
Can describe trade-offs in technical decisions, not just "I used X technology"

⚠ Red Flags — Warning Signs Across AI Roles

Claims expertise in every hot AI topic (GPT-4, Llama, RLHF, etc.) without depth in any
Resume lists cutting-edge frameworks but can't explain why they chose them
Can't point to a specific system they built that is in production or was published
Vague about their own role: says they "worked on a team" without clarifying the team's size or their personal contribution
For senior research roles: no publications, citations, or open-source contributions
Can't describe a research failure or a wrong hypothesis they pursued
Claims full credit for large team accomplishments with no specific personal ownership
Dismissive of safety or ethics concerns ("that's someone else's job")
Primarily uses commercial AutoML tools or ChatGPT outputs without understanding the underlying models

Universal Screening Questions

These apply across all AI research and engineering roles.

Opening Questions

What are you working on right now that you're most excited about?
What drew you to AI research or engineering specifically, rather than other fields of software?
Tell me about the project you're most proud of. What was your specific contribution?

Mission & Values Questions

How do you think about the responsibility that comes with working on very powerful AI systems?
What do you think is the most important problem in AI to be working on right now, and why?
How do you stay current with the pace of change in this field?

Depth Probe Questions

Explain [a specific term from their resume] in simple terms, as if talking to a smart non-technical person.
What's a technical decision you made that you'd make differently today? What changed?
What's the most important paper or piece of research from the last year that influenced your work?

Collaboration Questions

How do you work with people who have very different technical backgrounds from yours?
Describe a time you disagreed with a technical direction your team was taking. How did you handle it?
How have you communicated a complex technical result to a non-technical stakeholder?

Salary & Compensation Benchmarks

Approximate ranges for frontier AI companies (US-based). These vary significantly by level, location, and company stage.

Role Level	Total Comp Range (US)	Key Compensation Notes
New Grad Research Engineer / Scientist	$200K – $350K	Significant equity component at frontier labs. Base ~$150–200K at top companies.
Mid-Level (3–5 yrs) Research Engineer	$300K – $500K	Equity refreshes add significantly. Equity vesting usually 4 years with 1-year cliff.
Senior Research Scientist / Engineer	$400K – $700K+	Performance-based bonuses common. Level definitions vary widely by company.
Staff / Principal Research Engineer	$600K – $1M+	Equity grants can dominate at this level, especially at pre-IPO companies.
Engineering Manager (Research Teams)	$450K – $800K+	Similar to senior IC, sometimes slightly lower base; management premium varies.
GPU / Performance Engineer (Senior)	$500K – $900K+	Extreme scarcity of talent drives premium. CUDA expertise is especially compensated.

* Ranges are illustrative based on publicly available data and industry reports as of 2024–2025. Actual compensation varies by company, location, negotiation, and the value of equity at the time it vests or is exercised.
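
To make the "4 years with 1-year cliff" note in the table concrete, the sketch below works through the arithmetic. It assumes nothing vests before the cliff and that vesting is then linear by month, which is a common arrangement but not one this guide confirms for any particular company. The function name and the grant value are hypothetical.

```python
def vested_fraction(months_elapsed: int,
                    total_months: int = 48,
                    cliff_months: int = 12) -> float:
    """Fraction of an equity grant vested after a number of months.

    Assumption: nothing vests before the cliff; at the cliff the accrued
    share vests at once, then vesting continues linearly by month.
    """
    if months_elapsed < cliff_months:
        return 0.0
    return min(months_elapsed, total_months) / total_months


if __name__ == "__main__":
    grant_value = 400_000  # hypothetical grant value in dollars
    for months in (6, 12, 24, 48):
        vested = grant_value * vested_fraction(months)
        print(f"After {months} months: ${vested:,.0f} vested")
```

Under these assumptions, a candidate who leaves before month 12 walks away with no equity, which is why the cliff date often matters in timing conversations.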
