Careers

Cloud and Cluster Administrator

brainlife.io runs Apps on a variety of HPC clusters and public and private cloud computing platforms. You will be responsible for maintaining these clusters and optimizing our auto-scaling clusters so that they meet current user demand as efficiently as possible.

You are a solid fit for this position if...

  • 3+ years of experience working with OpenStack and other cloud computing platforms such as Azure, GCP, and AWS.
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related technical discipline.
  • Familiarity with cloud-specific deployment mechanisms and experience deploying auto-scaling Slurm clusters.
  • Strong network engineering / system administration skills on Linux-based systems.
  • Strong background in HPC and various batch scheduling systems.
  • Experience with Docker/Singularity.
  • Excellent troubleshooting skills. You can solve complex issues independently and seek input from the team when stuck.

Our idea of a perfect candidate is someone who...

  • Is eager to learn new technologies and open to receiving feedback from other team members.
  • Is familiar with Kubernetes and microservice architecture.
  • Has a demonstrable ability to write clean, concise, and maintainable code.
  • Knows Python and Bash, and is familiar with Node.js, npm, and the JavaScript ecosystem.
  • Is familiar with common OWASP vulnerabilities.
  • Has great feedback from past work colleagues and other research / software engineers.
  • Has used Git, Ansible, and monitoring systems (Sensu, Munin, etc.).
  • Has experience with 24x7 operations.
  • Can work without a lot of supervision while keeping team members informed of progress.

This role will be responsible for...

  • Working with App Engineers to troubleshoot Apps that are failing.
  • Building and administering Slurm clusters on public and private cloud infrastructures.
  • Monitoring and operating clusters, troubleshooting problems, and reducing inefficiencies.
  • Assisting other engineers in identifying security issues and ways to resolve them.
  • Monitoring and identifying computing bottlenecks, and adjusting existing clusters or creating new ones to alleviate them.
