Site Reliability Engineer (SRE)

Tel Aviv, Israel
Full Time
Engineering
Mid Level

We are looking for a highly experienced Site Reliability Engineer (SRE) or DevOps Engineer to join our passionate Engineering team. In this critical role, you will be instrumental in ensuring the reliability, performance, and scalability of our core infrastructure. You will work with the latest cloud technologies, focusing on automating and optimizing our continuous integration and continuous deployment pipelines, and managing our Kubernetes environment.

Qualifications

  • 5+ years of experience in a DevOps or SRE role
  • Experience designing and operating large-scale distributed systems.
  • Deep understanding of SRE principles and practices (SLOs/SLIs, error budgets, toil reduction).
  • Working knowledge of Kubernetes cluster administration (preferably EKS), including Helm and GitOps.
  • Scripting and automation skills (Shell, Python, etc.)
  • Experience using a broad range of AWS technologies (EC2, S3, VPC, Lambda, IAM, CloudWatch, etc.)
  • Proven record with build automation and CI/CD pipelines (including GitHub Actions, ArgoCD, FluxCD)
  • Experience working with monitoring frameworks like Grafana, Datadog, Prometheus, ELK
  • Experience with cloud-managed database services (e.g., AWS RDS, Redis, DynamoDB).
  • Knowledge of DNS, load balancing, SSL/TLS, TCP/IP, networking, and security
  • Experience provisioning infrastructure with IaC tools such as Terraform, Serverless Framework, CloudFormation, Pulumi.
  • Experience with DB administration and maintenance
  • Outstanding interpersonal communication skills

What You’ll Do

  • Analyze and determine integration needs.
  • Automate infrastructure and application deployment on AWS.
  • Identify manual processes that can be automated
  • Maintain and improve our cloud infrastructure
  • Continuously maintain and improve our CI/CD
  • Design, implement, and maintain scalable and highly-available infrastructure systems, focusing on reliability and performance.
  • Develop and implement robust monitoring, alerting, and logging solutions to proactively identify and resolve potential system issues.
  • Conduct blameless post-mortems for critical incidents, driving continuous improvement in system resilience.
  • Participate in capacity planning and performance tuning to ensure the platform can handle current and future load.
  • Be available on-call (required) to respond to and resolve critical infrastructure issues outside of regular business hours, including weekends.
 

LinearB Values:

  • Put the Customer First
  • Take Ownership
  • One Team
  • Show Product Expertise
  • Be Data Driven
  • Reach for the Next Level
  • Listen Curiously & Speak Courageously

LinearB is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.

#LI-hybrid.

 