Site Reliability Engineer (SRE) (3+ Years), Noida

Full-Time @AES Technologies Pvt. Ltd
  • Noida, Uttar Pradesh, India
  • Post Date : December 29, 2025
  • Salary: ₹300,000.00 – ₹500,000.00 per year

Job Detail

  • Job ID 63075

Job Description

Site Reliability Engineer (SRE)

Experience: 3+ Years

Location: Noida

Employment Type: Full Time

Role Overview

We are looking for a skilled Site Reliability Engineer (SRE) with strong DevOps fundamentals to ensure the reliability, performance, and scalability of our infrastructure and applications. The ideal candidate has hands-on experience with Linux, Kubernetes, monitoring tools, automation, and production support in cloud-native environments.

Key Responsibilities

Linux Administration

  • Deploy, manage, and maintain Linux-based systems
  • Troubleshoot OS-level issues to ensure high availability
  • Perform performance tuning, security hardening, and system optimization

Kubernetes (Must Have)

  • Deploy, configure, and manage Kubernetes clusters
  • Debug container orchestration, pod, service, and deployment issues
  • Ensure scalable and reliable containerized application deployments

Monitoring & Observability (Must Have)

  • Implement and manage Prometheus and Grafana
  • Build and maintain real-time dashboards for system and application metrics
  • Analyze monitoring data to identify and resolve performance issues

Automation & Scheduling

  • Configure and manage CronJobs for scheduled automation tasks
  • Troubleshoot job failures and improve automation workflows
  • Write scripts and playbooks (Shell, Python, Ansible) to improve operational efficiency

Cloud & Platform Operations

  • Work with cloud-based infrastructure and services
  • Assist in troubleshooting cloud-related incidents
  • Understand and operate cloud-native environments

Argo (Mandatory)

  • Work with Argo / Argo Workflows for deployment and workflow orchestration
  • Troubleshoot pipeline and workflow execution issues

Horizon Portal (Good to Have)

  • Use the Horizon portal for infrastructure monitoring and incident tracking
  • Support operational visibility and cloud resource management

Required Skills & Experience

  • 3–5 years of experience as a DevOps / SRE / Production Support Engineer
  • Strong expertise in Linux system administration
  • Hands-on experience with Kubernetes deployment and debugging
  • Proficiency with Prometheus & Grafana
  • Exposure to Argo / Argo Workflows (Mandatory)
  • Strong troubleshooting and incident-handling skills
  • Good understanding of networking fundamentals in cloud-native environments
  • Ability to work independently in production environments

Good to Have

  • Experience with AWS, Azure, or OpenStack
  • CI/CD pipeline exposure
  • Kubernetes certifications (CKA / CKAD)
  • Knowledge of Ansible, Python, or Shell scripting
