Site Reliability Engineer (SRE) (3+ Years) Noida
Full-Time @AES Technologies Pvt. Ltd posted 3 days ago
Job ID 63075
Job Description
Site Reliability Engineer (SRE)
Experience: 3+ Years
Location: Noida
Employment Type: Full Time
Role Overview
We are looking for a skilled Site Reliability Engineer (SRE) with strong DevOps fundamentals to ensure the reliability, performance, and scalability of our infrastructure and applications. The ideal candidate has hands-on experience with Linux, Kubernetes, monitoring tools, automation, and production support in cloud-native environments.
Key Responsibilities
Linux Administration
- Deploy, manage, and maintain Linux-based systems
- Troubleshoot OS-level issues to ensure high availability
- Perform performance tuning, security hardening, and system optimization
Kubernetes (Must Have)
- Deploy, configure, and manage Kubernetes clusters
- Debug container orchestration, pod, service, and deployment issues
- Ensure scalable and reliable containerized application deployments
Monitoring & Observability (Must Have)
- Implement and manage Prometheus and Grafana
- Build and maintain real-time dashboards for system and application metrics
- Analyze monitoring data to identify and resolve performance issues
Automation & Scheduling
- Configure and manage CronJobs for scheduled automation tasks
- Troubleshoot job failures and improve automation workflows
- Write scripts for operational efficiency (Shell / Python / Ansible)
Cloud & Platform Operations
- Work with cloud-based infrastructure and services
- Assist in troubleshooting cloud-related incidents
- Understand and operate cloud-native environments
Argo (Mandatory)
- Work with Argo / Argo Workflows for deployment and workflow orchestration
- Troubleshoot pipeline and workflow execution issues
Horizon Portal (Good to Have)
- Use the Horizon portal for infrastructure monitoring and incident tracking
- Support operational visibility and cloud resource management
Required Skills & Experience
- 3–5 years of experience as a DevOps / SRE / Production Support Engineer
- Strong expertise in Linux system administration
- Hands-on experience with Kubernetes deployment and debugging
- Proficiency with Prometheus & Grafana
- Exposure to Argo / Argo Workflows (Mandatory)
- Strong troubleshooting and incident-handling skills
- Good understanding of networking fundamentals in cloud-native environments
- Ability to work independently in production environments
Good to Have
- Experience with AWS, Azure, or OpenStack
- CI/CD pipeline exposure
- Kubernetes certifications (CKA / CKAD)
- Knowledge of Ansible, Python, or Shell scripting