San Jose, CA, United States
About The Company
This company pioneers short-form video creation and social engagement, with a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects, and its diverse content ecosystem makes it a hub of creativity and expression. A proprietary recommendation algorithm delivers personalized content feeds, enhancing user engagement and satisfaction. The company wields significant influence in digital media, making it a valuable partner for innovative collaborations and marketing endeavors.

📍 San Jose, CA | Onsite | Full-Time | Visa Support Possible

About the Team
We are an AI platform engineering group focused on large-scale model training systems and performance acceleration. The team builds distributed training infrastructure and optimization technologies for next-generation generative AI and computer vision models, supporting high-scale production AI systems and cutting-edge model training pipelines.

Role Overview
We are seeking an engineer specializing in large model training acceleration and distributed optimization. The role focuses on improving training efficiency, scalability, and performance for large generative and multimodal models across distributed compute environments.
Responsibilities
• Optimize large model training pipelines for performance and scalability
• Design and improve distributed training systems
• Implement and tune data, model, and pipeline parallelism strategies
• Benchmark and profile training workloads to identify bottlenecks
• Improve GPU utilization and training throughput
• Collaborate with infrastructure and research teams on large-scale training systems
• Build performance tooling and optimization frameworks for training acceleration

Required Qualifications
• Bachelor’s, Master’s, or PhD in Computer Science, AI, Electrical Engineering, or a related field
• 3–10 years of experience in deep learning systems or large model training
• Strong experience with distributed training optimization
• Hands-on experience with parallel training methods:
  • Data parallelism
  • Model parallelism
  • Pipeline parallelism
• Strong software engineering skills in Python and C++
• CUDA and GPU performance optimization experience
• Experience with deep learning frameworks such as PyTorch
• Experience with large model toolchains such as Megatron or DeepSpeed
• Familiarity with transformer and diffusion models
• Experience with benchmarking and profiling tools

Preferred Background
• Experience in generative AI or computer vision training systems
• Experience building large-scale training infrastructure
• Experience with high-performance distributed compute environments

Language
Mandarin Chinese proficiency preferred

Compensation (Estimated Range)
Base salary range: $136,800 – $359,720, depending on level and experience
Equity and additional benefits may be included

Work Authorization
Visa sponsorship may be considered under certain conditions
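For candidates unfamiliar with the parallel training methods listed above, the core idea of data parallelism can be sketched in a few lines of plain Python (purely illustrative; real systems use frameworks like PyTorch DDP, Megatron, or DeepSpeed). Each worker holds a full model replica, computes gradients on its own data shard, and an all-reduce averages those gradients so every replica applies the identical update. The model, data, and function names here are hypothetical, chosen only for the sketch.

```python
def grad_mse(w, shard):
    """Gradient of mean squared error for the toy model y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    """Average gradients across workers (the collective step a DDP framework performs)."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.01):
    local_grads = [grad_mse(w, s) for s in shards]  # each worker, its own shard
    g = allreduce_mean(local_grads)                 # synchronize all replicas
    return w - lr * g                               # identical update everywhere

# Toy dataset following y = 2 * x, split across two simulated workers.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))  # converges toward w = 2.0
```

With equal-sized shards, the averaged per-shard gradients equal the full-batch gradient, which is why data-parallel training matches single-worker training step for step.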
Job Features
| Job Category | AI Research |
| Seniority | Senior IC / Tech Lead |
| Base Salary | $136,800 – $359,720 |
| Recruiter | nina.li@ocbridge.ai |
