Site Reliability Engineer (SRE)


  • Experience managing production Kubernetes clusters.
  • Experience running applications in production on Kubernetes.
  • Must be fluent in at least one programming language such as Python, Go or Ruby.
  • 3+ years in a combination of DevOps, SRE, Software Development or Systems Operations roles.
  • 1+ years of experience managing production workloads on a major cloud provider such as AWS, GCP, Azure or DigitalOcean.
  • Demonstrated understanding of containers and container orchestration.
  • Troubleshooting skills that span systems, network (TCP/IP), and code.
  • Must have experience building or managing large-scale systems and application architectures.
  • Solid understanding of system performance and monitoring.
  • Working knowledge of cloud computing, including virtualization, hosted services, multi-tenant cloud infrastructures, distributed storage systems and content delivery networks.
  • Experience working with source control management tools; GitHub experience is a huge plus.
  • Excellent verbal and written communication skills.

Nice to Haves

  • Experience building extensions to the Kubernetes API such as Custom Resource Definitions using tools such as Kubebuilder, Operator SDK or Aggregated API Server.
  • A demonstrated history of working on and contributing to open source projects.
  • Experience working on remotely distributed teams.
  • Experience with load balancers such as Elastic Load Balancer, NGINX, Envoy, HAProxy or Google Cloud Load Balancer.
  • Experience with Cloud Native ecosystem projects such as Cluster Autoscaler, CoreDNS, Pod Autoscaler, etc.
  • Experience with infrastructure configuration and automation processes and tools: Ansible, Fabric, Terraform, Puppet, Chef.
  • Experience with hosting Content Management applications such as Gatsby, Drupal, TYPO3, etc.
  • Experience with monitoring solutions: Prometheus, ELK, Splunk, Sumo Logic, Nagios or Fluentd.
  • Experience with various data technologies including relational and nonrelational databases and message queues.
  • Experience with distributed storage systems: S3, Ceph, GlusterFS, EFS, EBS or Rook.