ML Infrastructure Engineer
(Multiple states)
Full Time
Experienced
Scaled Cognition is developing a new generation of rational, controllable AI models deployable as domain experts for grounded, real-world applications.
As an ML Infrastructure Engineer at Scaled Cognition, you will:
- Design and develop the GPU infrastructure that powers our AI models.
- Ensure that our infrastructure can scale.
- Collaborate with research scientists and product engineers to streamline the end-to-end process from model development to production deployment.
Example projects could include:
- Automated GPU Provisioning: Design an automated provisioning system to dynamically allocate GPUs based on workload demands.
- Benchmarking Pipeline: Develop a pipeline for running benchmarking experiments, enabling comparative analysis of model performance.
- Experiment Analysis: Build monitoring and analysis tools to track experiment progress, monitor performance metrics, and visualize results.
You might be the right person for the job if you:
- Are a continuous learner and are eager to explore new tools and technologies.
- Thrive in a dynamic environment and are comfortable adapting to changing priorities while maintaining a focus on delivering high-quality solutions.
- Have successfully navigated projects with significant product and technical ambiguity, and excel at the intersection of complex technical challenges and user-focused solutions.
Preferred Qualifications:
- Prior experience designing and implementing GPU infrastructure.
- Experience with MLOps tools such as MLflow, TensorBoard, or Weights & Biases.
- A strong sense for scalability and for developing secure, highly reliable environments.