Site Reliability Engineer (SRE) (3+ Years) Noida

Job Detail

Job ID 63075

Job Description

Site Reliability Engineer (SRE)

Experience: 3+ Years

Location: Noida

Employment Type: Full Time

Role Overview

We are looking for a skilled Site Reliability Engineer (SRE) with strong DevOps fundamentals to ensure the reliability, performance, and scalability of our infrastructure and applications. The ideal candidate has hands-on experience with Linux, Kubernetes, monitoring tools, automation, and production support in cloud-native environments.

Key Responsibilities

Linux Administration

Deploy, manage, and maintain Linux-based systems
Troubleshoot OS-level issues to ensure high availability
Perform performance tuning, security hardening, and system optimization

Kubernetes (Must Have)

Deploy, configure, and manage Kubernetes clusters
Debug container orchestration, pod, service, and deployment issues
Ensure scalable and reliable containerized application deployments

Monitoring & Observability (Must Have)

Implement and manage Prometheus and Grafana
Build and maintain real-time dashboards for system and application metrics
Analyze monitoring data to identify and resolve performance issues

Automation & Scheduling

Configure and manage CronJobs for scheduled automation tasks
Troubleshoot job failures and improve automation workflows
Write scripts for operational efficiency (Shell / Python / Ansible)

Cloud & Platform Operations

Work with cloud-based infrastructure and services
Assist in troubleshooting cloud-related incidents
Understand and operate cloud-native environments

Argo (Mandatory)

Work with Argo / Argo Workflows for deployment and workflow orchestration
Troubleshoot pipeline and workflow execution issues

Horizon Portal (Good to Have)

Use the Horizon portal for infrastructure monitoring and incident tracking
Support operational visibility and cloud resource management

Required Skills & Experience

3–5 years of experience as a DevOps / SRE / Production Support Engineer
Strong expertise in Linux system administration
Hands-on experience with Kubernetes deployment and debugging
Proficiency with Prometheus & Grafana
Exposure to Argo / Argo Workflows (Mandatory)
Strong troubleshooting and incident-handling skills
Good understanding of networking fundamentals in cloud-native environments
Ability to work independently in production environments

Good to Have

Experience with AWS, Azure, or OpenStack
CI/CD pipeline exposure
Kubernetes certifications (CKA / CKAD)
Knowledge of Ansible, Python, or Shell scripting
