San Jose, United States
Posted 13 hours ago
About the Company

This company pioneers short-form video creation and social engagement, with a large and highly engaged user base. Its platform gives users creative tools, filters, and effects, and its diverse content ecosystem makes it a hub of creativity and expression. A proprietary recommendation algorithm delivers personalized content feeds, driving user engagement and satisfaction. The company's influence on digital media makes it a valuable partner for innovative collaborations and marketing initiatives.

About the Team

We are an applied research team focused on Generative AI and multimodal understanding. The group works on advanced generative technologies across image, video, and multimodal systems, enabling scalable and practical AI creation tools. Research areas include generative modeling, image and video synthesis, intelligent editing, and virtual human technologies. The team emphasizes translating cutting-edge research into production-ready, efficient model systems.

We are also an AI platform engineering team building large-scale, end-to-end AI production pipelines covering model training, optimization, deployment, and real-world applications. The team focuses on delivering scalable AI infrastructure and efficiency technologies that support high-volume generative AI and multimodal systems in production environments.

Role Overview

We are seeking an experienced AI model optimization engineer specializing in large-model inference acceleration. This role focuses on optimizing inference performance, scalability, and deployment efficiency for large-scale generative and foundation models across heterogeneous hardware environments.

Responsibilities

• Design and optimize large-model inference pipelines for low-latency, high-throughput production deployments
• Apply high-performance optimization techniques across diverse hardware architectures
• Benchmark and profile deep learning models to identify performance bottlenecks
• Optimize compute, memory, and kernel performance for large-model inference
• Develop distributed inference and acceleration strategies
• Collaborate with infrastructure and production engineering teams to integrate optimized models into production systems

Minimum Qualifications

• Master's or PhD in Computer Science, Electrical Engineering, AI, or a related field
• Strong software engineering skills in Python and C++
• Strong CUDA programming experience
• 5+ years of experience in AI model inference optimization or acceleration
• Experience with ML compilers and performance optimization techniques
• Experience with parallel computing, graph fusion, and kernel optimization
• Hands-on experience with inference acceleration frameworks such as TensorRT, Triton, or CUTLASS
• Solid understanding of transformer and diffusion model architectures
• Strong system-level performance debugging skills

Language Requirement

• Professional working proficiency in Mandarin and English, required for cross-regional technical collaboration

Preferred Qualifications

• Experience optimizing large generative or multimodal models in production
• Experience with distributed inference systems
• Experience with hardware-aware model optimization
• Experience working closely with AI infrastructure or ML systems teams

Equal Opportunity Statement

We are an equal opportunity employer and consider qualified applicants in accordance with applicable laws. Reasonable accommodations are available during the recruitment process upon request.
Job Features
| Job Category | AI Research |
| Seniority | Senior IC / Tech Lead |
| Recruiter | nina.li@ocbridge.ai |
