
AI Infrastructure Architect

Huawei · UK · Remote · posted 2 days ago
Tags: Rust, Cloud & Infrastructure, Python, SolidJS, Node.js, Kubernetes

Job Description

About Huawei Research and Development UK Limited 

Founded in 1987, Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. We have 207,000 employees and operate in over 170 countries and regions, serving more than three billion people around the world.

Our vision and mission are to bring digital to every person, home and organization for a fully connected, intelligent world. To this end, we will drive ubiquitous connectivity and promote equal access to networks; bring cloud and artificial intelligence to all four corners of the earth, providing superior computing power where and when it is needed; build digital platforms to help all industries and organizations become more agile, efficient, and dynamic; and redefine user experience with AI, making it more personalized for people in all aspects of their lives, whether at home, in the office, or on the go.

This spirit of innovation has led Huawei to work in close partnership with leading academic institutions in the UK to develop and refine the latest technologies. With a shared commitment to innovation and progress, both parties have worked together to achieve common goals and establish a strong partnership. The partnership between the UK and Huawei helps to develop the technologies of the future that will transform the way we all communicate, work and live.

For the past 30 years we have maintained an unwavering focus, rejecting shortcuts and easy opportunities that don't align with our core business. With a practical approach to everything we do, we concentrate our efforts and invest patiently to drive technological breakthroughs.

This strategic focus is a reflection of our core values:

  • staying customer-centric,

  • inspiring dedication,

  • persevering,

  • growing by reflection.

Huawei Research and Development UK Limited Overview

Huawei’s vision is a fully connected, intelligent world. To achieve this, we work to inspire passion for basic research around the world. Our combined passion drives development across the global innovation value chain. Huawei has the largest Research and Development organization in the world with 96,000+ employees in research centers around the globe. In the UK, we already have design centers in Cambridge, London, Edinburgh and Ipswich. We continue to explore and define new research directions and new services. We have expanded our collaborations with academic researchers; researched new network architectures, integration of communications and key enabling technologies; and developed the fundamental theories of these technologies. We invite you to join us on this exciting journey and drive your career forward.


Job Vision

In an era where computing power is evolving toward full intelligence, AI training and inference are driving a comprehensive reconstruction of the foundational software stack. With the launch of Huawei CloudMatrix super-node clusters, the architectures for AI model training, inference, and agent systems are undergoing unprecedented transformation. Enabled by multi-chip heterogeneous acceleration, ultra-high-speed interconnect, high-bandwidth memory, and multi-tier compute pooling, the industry is reaching a critical turning point: AI can no longer rely on traditional cloud resource scheduling and service frameworks. It requires a newly rebuilt, model and intelligence-centric infrastructure — AI Infra.

Against this backdrop, we have initiated development of a new AI Infra & Agentic Serving architecture. Our goal is to build a unified AI foundation that allows LLM inference, multimodal processing, and agent workflows to operate on CloudMatrix super-nodes with extreme performance, scalability, and stability. This will become a core enabling capability for future Huawei Cloud, on-device intelligence, and industry-scale model platforms.

In this role, you will help define and build:

  1. AI-native Runtime and Serving framework: redesign inference execution paths, orchestration logic, model lifecycles and data flows to fully exploit super-node hardware features such as memory pooling and fully interconnected high-speed fabrics;

  2. Agentic AI core infrastructure: provide base-level state management, a KV-Cache system, and efficient retrieval capabilities for agent memory, tool calling, workflow execution, and multi-agent collaboration;

  3. Next-generation serverless AI: implement Function-as-a-Service (FaaS) inference, near-instantaneous scaling, multi-model hybrid loading, cold-start optimization, and a peak-throughput mode on the unified Runtime;

  4. Performance and cost engineering at ultra scale: deliver industry-leading performance, latency, and cost efficiency in scenarios such as billion-level QPS, multi-model sharing/co-hosting, and multi-tenant isolation.
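To make item 2 above concrete, here is a deliberately toy sketch of agent memory with retrieval — a hypothetical illustration written for this description, not Huawei code; a production system would use embeddings and a vector store rather than keyword overlap:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    tokens: set = field(default_factory=set)

    def __post_init__(self):
        # Tokenize once at write time so retrieval is a cheap set intersection.
        self.tokens = set(self.text.lower().split())


class AgentMemory:
    """Toy agent memory: append-only store with keyword-overlap retrieval."""

    def __init__(self):
        self.entries: list[MemoryEntry] = []

    def remember(self, text: str) -> None:
        self.entries.append(MemoryEntry(text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Rank stored entries by how many words they share with the query;
        # entries with no overlap are dropped from the result.
        q = set(query.lower().split())
        scored = sorted(self.entries, key=lambda e: len(e.tokens & q), reverse=True)
        return [e.text for e in scored[:k] if e.tokens & q]
```

The design point the sketch captures is the split between write-time indexing and read-time ranking — the same division of labor a real KV-Cache or vector-retrieval subsystem makes at far larger scale.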

Here, you will face the most critical challenges for AI Infra in the next decade:

  1. How do we design a truly AI-native execution framework on super-node hardware?

  2. How can we make agent memory more efficient and support continuously improving reasoning?

  3. How do we achieve optimal trade-offs under hardware constraints, power budgets, bandwidth bottlenecks, and scheduling limitations?

  4. How do we build the world’s leading AI serving platform?

This position offers end-to-end design participation across the full stack, from hardware capabilities through the AI Runtime to agent systems. Your work will influence Huawei's future LLM platform, the capability boundary between industry-level knowledge workers and agents, and the design of the next-generation AI Runtime.

Join us, and you will not only be building a system — you will be shaping the foundational standards of the intelligent era. This is one of the few positions where technical insight, systems architecture, and engineering execution converge at the highest level, offering a uniquely creative and impactful stage for your career.


Key Responsibilities:

  1. Architecture Planning: Design a unified AI Infra & Serving architecture platform for composite AI workloads such as LLM Training & Inference, RLHF, Agent, and Multimodal processing. This platform will integrate inference, orchestration, and state management, defining the technical evolution path for Serverless AI + Agentic Serving within the company;

  2. Agent Serving Framework: Design a heterogeneous execution framework across CPU/GPU/NPU for agent memory, tool invocation, and long-running multi-turn conversations and tasks. Build an efficient memory/KV-cache/vector store/logging and state-management subsystem to support agent retrieval, planning, and persistent memory.

  3. Serverless AI Foundation Design: Build a high-performance Runtime/Framework that defines the next-generation Serverless AI foundation through elastic scaling, cold start optimization, batch processing, function-based inference, request orchestration, dynamic decoupled deployment, and other features to support performance scenarios such as multiple models, multi-tenancy, and high concurrency.

  4. Performance and Cost Optimization: Leverage Huawei’s self-developed hardware stack and End–Edge–Cloud co-optimization to deliver AI infrastructure with industry-leading performance and throughput, ultra-low latency, and best-in-class observability.

  5. Frontier Technology Insights: Continuously track cutting-edge developments in Serverless AI, LLM Serving, and Agentic AI, generate structured insights, and feed them back into architecture evolution and product roadmaps.

  6. Cross-Team Collaboration: Serve as a team leader, working closely with accelerator, operating system, cloud platform, and AI application teams to drive successful deployment of the architecture solutions in real-world business scenarios.
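The serverless foundation in responsibility 3 hinges on cold-start versus warm-reuse economics. The following is a minimal, hypothetical sketch (illustrative only; names like `ModelPool` are invented for this description) of the core mechanism — lazy loading with LRU eviction under a fixed residency budget:

```python
from collections import OrderedDict
from typing import Callable


class ModelPool:
    """Toy serverless model pool: lazy (cold) loads, warm reuse, LRU eviction."""

    def __init__(self, loader: Callable[[str], object], capacity: int = 2):
        self.loader = loader          # stand-in for an expensive model load
        self.capacity = capacity      # max models resident at once
        self.warm: OrderedDict[str, object] = OrderedDict()
        self.cold_starts = 0

    def get(self, name: str) -> object:
        if name in self.warm:
            # Warm path: reuse the resident model and refresh its LRU position.
            self.warm.move_to_end(name)
            return self.warm[name]
        # Cold path: count the miss, evict the least-recently-used model
        # if the pool is full, then load and admit the requested one.
        self.cold_starts += 1
        if len(self.warm) >= self.capacity:
            self.warm.popitem(last=False)
        model = self.loader(name)
        self.warm[name] = model
        return model
```

Real cold-start optimization layers much more on top (snapshotting, weight streaming, pre-warming from traffic forecasts), but the cache-or-load decision above is the primitive everything else tunes.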

This job description is only an outline of the tasks, responsibilities and outcomes required of the role. The jobholder will carry out any other duties as may be reasonably required by his/her line manager. The job description and personal specification may be reviewed on an ongoing basis in accordance with the changing needs of Huawei Research and Development UK Limited.


Person Specification:

Required:

  1. Strong foundational knowledge in system architecture: Proficient in computer architecture, operating systems, and runtime environments; familiar with large-scale distributed service architectures and the fundamental principles of storage and networking;

  2. Serverless & Cloud-Native experience: Hands-on experience with FaaS/Serverless architectures; familiar with cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling;

  3. Expertise in AI Serving: Deeply familiar with the core mechanisms of mainstream LLM Serving technologies (e.g., vLLM, SGLang, Ray Serve); solid understanding of common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation;

  4. Experience in Agentic AI domains: Solid understanding of the basic architecture and typical components of Agentic AI/AI Agents (Memory, Tool/Function Calling, Planner, Executor, Multi-Agent Collaboration, etc.); clear understanding of memory organization, retrieval, and state management;

  5. Performance analysis and optimization skills: Proficient in using Profiling/Tracing tools; experienced in analyzing and optimizing system-level bottlenecks regarding GPU utilization, memory/bandwidth, Interconnect Fabric, and network/storage paths;

  6. Programming and engineering capabilities: Proficient in at least one system-level language (e.g., C/C++, Go, Rust) and one scripting language (e.g., Python); able to maintain high code quality standards and follow engineering best practices;

  7. Communication and architectural articulation skills: Able to clearly articulate complex architectural solutions and collaborate across multiple teams and regions to shape future technology choices and roadmap evolution.
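Of the serving optimizations named in requirement 3, continuous batching is the one most easily shown in miniature. The sketch below is a hypothetical, abstracted scheduler loop (it models decode steps as counters, nothing more) contrasting it with static batching: finished sequences free their batch slots mid-flight, and waiting requests are admitted immediately instead of after the whole batch drains:

```python
import collections


def continuous_batching(requests, max_batch: int):
    """Toy continuous-batching loop.

    `requests` is a list of (request_id, tokens_to_generate) pairs. Each
    'step' decodes one token per active sequence; a finished sequence frees
    its slot the same step, so a queued request can join the next step.
    Returns (total_decode_steps, completion_order).
    """
    queue = collections.deque(requests)
    active: dict[str, int] = {}   # request_id -> tokens still to generate
    steps = 0
    order = []                    # completion order, for inspection
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            rid, length = queue.popleft()
            active[rid] = length
        steps += 1                # one decode step for the whole batch
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:  # done: slot is free for the next admission
                del active[rid]
                order.append(rid)
    return steps, order
```

With requests of lengths 3, 1 and 2 and a batch size of 2, this finishes in 3 decode steps; a static batcher that waits for each full batch to drain would need 5. That gap — idle slots reclaimed per step rather than per batch — is where the throughput gains in systems like vLLM come from.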

Desired:

  1. Experience in production deployment of LLM Serving/Agent/Serverless platforms (e.g., supporting Cloud FaaS platforms, search/recommendation systems, or Agent-based products), with practical experience in inference acceleration and system co-optimization for GPUs/heterogeneous hardware.

  2. Significant technical achievements or peer-reviewed publications (e.g., at top-tier conferences) in fields related to distributed systems and AI infrastructure. Active contribution or a Maintainer role in open-source communities such as LangChain, LangGraph, Kubernetes, Ray, vLLM, SGLang, or TensorRT-LLM.

  3. Experience in leading teams to deliver architectural implementations and drive cross-functional technical projects.

What we offer 

  • 33 days annual leave entitlement per year (including UK public holidays)

  • Group Personal Pension

  • Life insurance

  • Private medical insurance

  • Medical expense claim scheme

  • Employee Assistance Program

  • Cycle to work scheme

  • Company sports club and social events

  • Additional time off for learning and development

Source: TeamTailor