About Kog

Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).

We co-design the model architecture and the execution engine together. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so inter-GPU communication overlaps with computation rather than blocking it. We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.

We are a team of 11 people, including 10 engineers and 4 PhDs.

Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.

What you will work on

You will own the model architecture roadmap at Kog. You will conceive, design, and run experiments to understand how architectural decisions propagate through inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn the findings into measurable gains in generation speed and model quality. You move with equal rigor between theory, implementation, and measured outcomes.

Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and identifying which gains compound at scale.
Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.
Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
Write up findings as research papers, submit them to top venues, and present them at conferences.
Set research direction for a small team, make calls on what to pursue and what to cut, and connect ambitious research targets to work that can be shipped and measured quickly.
Contribute to building AI agents that will perform architecture research and training experiments autonomously, starting from the research foundations we are building now.

What we look for

You have owned research directions that produced measurable results and can explain the mechanism behind them clearly. You have made architectural or post-training decisions that worked, and you have a public trace of serious work with implementation depth — a paper, a repository, or open-source contributions that others have built on.

You have operated with real ownership over a research agenda and raised the standard of the people around you through research direction, code, and decision-making. You are comfortable carrying both individual research depth and team-level responsibility in the same role.

Strong signals include experience adapting or modifying existing model architectures, understanding of how communication structure and layer dependencies affect inference behavior, fluency in Transformers and MoE with enough depth to reason across trade-offs, and production-grade research code in PyTorch or JAX.

Top 0.1% for this role

The strongest candidates have already developed original architectural judgment. When they encounter an idea like DTP, they immediately understand the downstream consequences for routing, layer structure, optimization, and convergence. They know how to turn that understanding into a research program and a sequence of experiments that compounds.

They bring both authorship and taste. They know when a result is fundamental, when it is local, and when a model change is worth the system cost it introduces. And they make the researchers around them better.

What we offer

Direct access to AMD and NVIDIA datacenter GPUs from day one
A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions
Problems that sit on the critical path of model execution speed and that directly influence what the system can become
Compensation aligned with top technical profiles in the Paris AI market, including meaningful equity

Lead Research Engineer
