Staff Site Reliability Engineer

Grid.ai

€ 110,000 - 160,000 gross/year

United Kingdom

Information Technology, Engineering

English

hybrid, remote

your profile

We are looking for a software engineer with experience building and scaling complex distributed systems on the cloud using Python, Go, and/or Kubernetes. You thrive in a fast-paced, start-up environment with a focus on speed, quality, and iteration from proof of concept to release.

about the company

Grid.ai enables companies of all sizes to train state-of-the-art AI models on hundreds of cloud GPUs and TPUs from their laptops. From the creator of the popular framework PyTorch Lightning, Grid is a platform for training models that enables rapid research iteration. Grid aims to simplify scalable AI research so that when a network becomes complex, code doesn’t.

diversity statement

"Check out our website and our values!"

Grid.ai is looking for you!

your area of responsibility

You will join the Platform team and report to Noha, our Director of Engineering. This team is responsible for delivering Grid's computing infrastructure at scale, targeting computing resources on multiple cloud providers, and optimizing datastores. Cross-team collaboration is very important to us, so you will also work on data models and microservices with the Application team.


What you’ll do


  • Operate our Kubernetes and cloud infrastructure, mastering our technology stack to support the delivery of new features, improve system stability, and increase overall performance.

  • Partner with engineering and product leaders to develop the platform architecture, and use your experience to inform the technical direction of large-scale projects.

  • Evaluate, strengthen, and document technical architecture, tools, and processes.

  • Champion software quality, implement automation, drive continuous delivery, and reduce time to production while proactively improving our product.

  • Mentor and coach engineers on system design, operating in high uncertainty, and problem-solving to create a supportive, inclusive environment in which each engineer can grow.

the benefits

Lots, too many to list, but here are some!


  • Medical, dental, vision, life, and AD&D insurance

  • 5 weeks of paid time off per year, plus 3 additional weeks each year when the company shuts down for a mandatory vacation

  • 12 weeks of paid family leave for employees who have worked with the company for 6-12 months and 16 weeks of paid family leave for employees who have worked with the company for 12+ months

  • $500 monthly meal reimbursement, including groceries & food delivery services

  • $1,000 one-time home office build-out for a desk, chair, keyboard, pair of headphones, and/or other home office supplies

  • $1,000 annual learning & development stipend for classes, conferences, certifications, and continuing education

  • 100% covered bike membership

  • $45/month to ClassPass, a platform providing access to over 30,000 gyms and studios, as well as 5,000 beauty and spa partners, in 28 countries

  • 🌎 full remote / work from anywhere

  • 🛂 visa sponsorship

  • ✈️ relocation support
