Role Overview:
As a Site Reliability Engineer (SRE) Technical Lead, you will oversee the reliability, availability, and performance of our production environments. You will lead proactive monitoring and incident management, fostering a culture of rapid resolution and minimal service disruption. Your extensive troubleshooting, log analysis, and debugging skills will support close collaboration with DevOps, engineering, and internal support teams, helping us achieve the highest levels of customer satisfaction.
This is a remote position in India. We will only consider candidates currently in India and are not offering relocation assistance at this time.
Job Description:
About the Role:
• Working with AWS services such as EC2, RDS, VPC, and CloudWatch, with expertise in log query analysis and monitoring optimization.
• Driving APM monitoring solutions through hands-on experience with Prometheus, Grafana, and scripting in PromQL to enhance automation capabilities.
• Troubleshooting and debugging issues by analyzing CloudWatch logs and providing detailed insights through log metrics and trend analysis.
• Optimizing service performance by leveraging AWS tools, analyzing cost utilization, and implementing efficient scaling strategies.
• Managing Kubernetes-based setups, including deployment, configuration changes, and service lifecycle management, while collaborating with DevOps teams.
• Overseeing seamless code rollouts, maintaining production integrity, and spearheading root cause analysis for persistent issues.
• Fostering collaboration across global teams to enhance service availability, reliability, and scalability.
About you:
• Over 8 years of experience in the web and e-commerce domain, with a strong focus on cloud hosting, primarily AWS.
• Skilled in log analysis and troubleshooting, with expertise in leveraging observability platforms to trace issues thoroughly.
• Proficient in Prometheus, Grafana, or similar APM tools, with hands-on ability to optimize and enhance monitoring capabilities.
• Passionate about digging deep into technical issues and providing actionable insights for resolution.
• Adept at driving troubleshooting calls and ensuring end-to-end traceability for complex problems.
• Strong knowledge of cloud environments and monitoring frameworks to support robust and scalable solutions.
Accommodations:
McAfee recognizes and supports its obligation to reasonably accommodate applicants and employees with disabilities. We are here to help. Please let us know if you need a reasonable accommodation for any part of the application, interviewing, hiring, or at any other time during the employment process. Please do not include personal medical information in the email.
Diversity is foundational for our business success. We want to be a workplace of choice for all people, and we value the unique perspectives offered by a diverse workforce. McAfee does not unlawfully discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, citizenship, disability, protected veteran status, age, ancestry, medical condition, genetic information, marital status, pregnancy, or any other legally protected status. This principle applies to all areas of employment: recruitment and hiring, training, performance evaluations, promotions and transfers, compensation and benefits, and social and recreational programs.
McAfee desires to be an employer of choice with an inclusive environment for all individuals. As part of this goal and in compliance with various laws and regulations, McAfee provides reasonable accommodation to applicants and employees. Requests for reasonable accommodation for applicants and employees are evaluated on a case-by-case basis.
Posting Statement:
McAfee prohibits discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.