ML Infrastructure Engineer

(Multiple states)
Full Time

Scaled Cognition is developing a new generation of rational, controllable AI models deployable as domain experts for grounded, real-world applications.


As an ML Infrastructure Engineer at Scaled Cognition, you will:

  • Design and develop the GPU infrastructure that powers our AI models.
  • Ensure that our infrastructure can scale.
  • Collaborate with research scientists and product engineers to streamline the end-to-end process from model development to production deployment.

Example projects could include:

  • Automated GPU Provisioning: Design an automated provisioning system to dynamically allocate GPUs based on workload demands.
  • Benchmarking Pipeline: Develop a pipeline for running benchmarking experiments, enabling comparative analysis of model performance.
  • Experiment Analysis: Build monitoring and analysis tools to track experiment progress, monitor performance metrics, and visualize results.

You might be the right person for the job if you:

  • Are a continuous learner, eager to explore new tools and technologies.
  • Thrive in a dynamic environment and are comfortable adapting to changing priorities while maintaining a focus on delivering high-quality solutions.
  • Have successfully navigated projects with significant product and technical ambiguity, and excel at the intersection of complex technical challenges and user-focused solutions.

Preferred Qualifications:

  • Prior experience designing and implementing GPU infrastructure.
  • Experience with MLOps tools such as MLflow, TensorBoard, or Weights & Biases.
  • A strong sense for scalability and for building secure, highly reliable environments.
