We're seeking an exceptional AI Systems Engineer to build and optimize the infrastructure that powers our large-scale AI models. You will work on model deployment, inference optimization, and reliability engineering for our mission-critical systems.
This role sits at the intersection of machine learning and infrastructure engineering. You'll be responsible for designing scalable systems that enable our research team to train state-of-the-art models, and for ensuring that production deployments are fast, reliable, and cost-effective.
Key Responsibilities
Infrastructure Design: Design and maintain scalable infrastructure for training and deploying large language models across cloud platforms (AWS, Azure, GCP).
Performance Optimization: Optimize inference latency and throughput using techniques like quantization, model distillation, and efficient serving architectures.
Container Orchestration: Manage Kubernetes clusters and orchestrate containerized workloads for distributed training and inference at scale.
MLOps Pipelines: Build robust CI/CD pipelines for machine learning workflows, including automated testing, model versioning, and deployment.
Monitoring & Reliability: Implement comprehensive monitoring, logging, and alerting systems to ensure 99.9%+ uptime for production AI services.
Cost Optimization: Architect solutions that balance performance with cost efficiency, leveraging spot instances, auto-scaling, and right-sized resource allocation.
Collaboration: Work closely with research scientists to understand model requirements and translate them into production-ready infrastructure.
Required Qualifications
B.S. or M.S. in Computer Science or a related engineering field, or equivalent practical experience.
Strong proficiency in Python for scripting, automation, and ML pipelines.
Extensive experience with containerization technologies (Docker, Kubernetes).
Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
Familiarity with ML frameworks (PyTorch, TensorFlow) and model serving tools (Triton Inference Server, TorchServe, or TensorFlow Serving).
Understanding of distributed systems, networking, and storage solutions.
Experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi).
Preferred Qualifications
Experience deploying and serving large language models (LLMs) in production environments.
Knowledge of GPU optimization techniques (CUDA, TensorRT, mixed-precision training).
Familiarity with distributed training frameworks (Ray, Horovod, DeepSpeed).
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack).
Understanding of model compression techniques (quantization, pruning, knowledge distillation).
Contributions to open-source ML infrastructure projects.
Experience with Bayesian deep learning or uncertainty quantification systems.
At TeraSystemsAI, we cultivate a culture of radical curiosity and psychological safety. We believe in deep work over shallow productivity, flexible schedules over rigid hours, and impact over optics.
We're a distributed team of world-class engineers and researchers committed to building AI that is safe, transparent, and transformative. Every voice matters, from interns to founders.
Ready to Build the Future?
Join us in creating the infrastructure that powers next-generation AI systems.