SRE Lead (8+ Years) Hyderabad
Full-Time @TechMojo Solutions posted 1 day ago
Job ID 63367
Job Description
SRE Lead
Location: Hyderabad, Telangana
Work Type: Full-Time
Experience
- Minimum 8 years of overall experience
- 5+ years of experience as a Software Developer
- 3+ years of hands-on experience working within an SRE team
Role Overview
We are seeking an experienced Site Reliability Engineering (SRE) Lead to design, build, and operate highly scalable, reliable, and cost-efficient systems. This role requires strong hands-on coding skills, deep expertise in cloud infrastructure, automation, and monitoring, and the ability to provide technical leadership while collaborating with development and operations teams.
Key Responsibilities
- Lead and drive reliability engineering practices across the organization
- Design, develop, and manage large-scale, real-time systems
- Write and maintain production-grade code using Java and scripting languages such as Shell or Python
- Build and enhance automation frameworks to improve system reliability and operational efficiency
- Manage availability, latency, scalability, and performance of production systems
- Implement fault-tolerant and resilient architectures throughout the development lifecycle
- Leverage AWS cloud services to deliver robust, scalable, and cost-effective solutions
- Define, implement, and improve monitoring, alerting, and observability practices
- Ensure robust CI/CD pipelines and reliable releases
- Provide regular health, performance, and risk assessments of production systems to senior leadership
- Recommend improvements to enhance system stability and operational excellence
Required Skills & Qualifications
- Strong hands-on coding experience in Java
- Proficiency in scripting languages such as Shell or Python
- Strong command of automation tools and technologies
- Extensive experience working in AWS environments
- Deep understanding of monitoring and observability tools
- Excellent knowledge of CI/CD processes and pipelines
- Proven experience in designing and managing high-availability, real-time distributed systems
- Strong problem-solving, debugging, and incident management skills