Site Reliability Engineering Manager IRC164250

Job: IRC164250
Location: United States - San Jose CA
Designation: Associate Manager
Experience: 10-15 years
Function: Engineering
Skills: Azure, eCommerce and Retail, Incident Management, Linux

Description:

About this Opportunity

The Digital Commerce Site Operations team at our client ensures the health and performance of multiple B2B eCommerce capabilities that enable customers to manage their business. The team strives to automate, perform deep-level analysis, and improve the observability of its systems, minimizing customer disruptions in multi-billion-dollar value streams.

Requirements:

    How You Contribute to Our Vision: Key Responsibilities

    The SRE Manager will lead engineering and support teams to ensure the health and performance of multiple B2B eCommerce capabilities that enable our customers to manage their business. You will drive teams to automate, perform deep-level analysis, and improve the observability of our systems, minimizing customer disruptions. In addition, you will use the Incident Management process to identify recurring issues for which the team can develop permanent solutions. From Kubernetes to the kernel and everything in between, you’ll be working with the latest technology in a fast-paced engineering environment. As the SRE Manager, you will be responsible for the operations engineers in your assigned time zone. You will also lead customer service management and managed services operations, and provide consistent product improvement engineering. Your collaboration with internal customers, product engineering, and development groups is critical to success.

    You will make a high business impact by:

    • Leading your team in daily agile SRE practices
    • Optimizing the quality and velocity of both support and operational teams
    • Mentoring engineers and support specialists to improve their skills
    • Identifying and measuring team health indicators
    • Implementing structured engineering and operations processes
    • Ensuring proper team focus on priorities, milestones, and deliverables
    • Working to meet service level agreements with customer deployments around the globe
    • Delivering quality managed services in a consistent, timely manner
    • Representing the Site Operations team to stakeholders, customers, and internal teams

    Experience, Skills, Education & Licenses/Certifications:

    Required:

    • 5+ years of experience delivering high-SLA production outcomes (ideally in a public cloud environment), leveraging cloud-native architectures to build and manage resilient, highly available infrastructure
    • 5+ years of management experience leading teams of engineers
    • At least 2 years of experience leading an SRE team
    • Proven experience managing engineering and support teams in eCommerce
    • Experience with Linux/Windows Server troubleshooting
    • Experience with Microservice architecture and container technology troubleshooting
    • Experience with agile software development methodologies
    • Experience working in and managing distributed teams
    • Experience with cloud topologies and technologies (Microsoft Azure preferred) 
    • Experience driving automation and recoverability processes with a continuous improvement mindset
    • Excellent problem-solving, critical-thinking, and interpersonal skills; you lead by example to empower and challenge the team to deliver their best
    • Knowledge of observability stacks (Dynatrace, Application Insights, Graylog)
    • Technical aptitude for understanding complex distributed systems
    • Experience working with teams in different countries
    • Track record of building and managing high-performance SRE and operational teams
    • Extensive experience leading teams responsible for customer-facing systems in a high-uptime, 24/7 environment

Job Responsibilities:

    Job Duties/Essential Functions:

    • Continuously improve eCommerce system service reliability and efficiency.
    • Manage production incidents, driving resolution and stakeholder communication during incidents.
    • Ensure compliance with, and improvement of, operational SLOs, SLIs, and error budgets.
    • Prioritize multiple issues in a high-velocity environment.
    • Design and implement processes to improve response capabilities.
    • Build and manage observability and recovery systems on top of modern cloud services.
    • Lead multi-functional initiatives and provide thought leadership.
    • Understand sophisticated architectures and be comfortable working with multiple teams.
    • Establish and foster partnerships between SRE, Dev, and QE teams toward delivering excellent customer outcomes.
    • Manage above the level of infrastructure to provide great customer outcomes.

    If you are a California resident, more details on how we process your personal information can be found in the CCPA Recruitment Privacy Notice (https://www.globallogic.com/privacy/ccpa-recruitment-privacy-notice/).


What We Offer

Exciting Projects: Come take your place at the forefront of digital transformation! With clients across all industries and sectors, we offer an opportunity to work on market-defining products using the latest technologies.

Collaborative Environment: You can expand your skills by collaborating with a diverse team of highly talented people in an open, laid-back environment, or even abroad in one of our global centers or client facilities!

Work-Life Balance: GlobalLogic prioritizes work-life balance, which is why we offer flexible work schedules and opportunities to work from home.

Professional Development: We provide continuing education classes, professional certification, and training (technical, soft skills, language, and communication skills) to help you realize your professional goals. Because we are a global organization, there are additional learning opportunities through international knowledge exchanges.

Excellent Benefits: We provide our employees with competitive salaries, health and life insurance, short-term and long-term disability insurance, a matched-contribution 401(k) plan, flexible spending accounts, and PTO and holidays.

About GlobalLogic

GlobalLogic is a leader in digital engineering. We help brands across the globe design and build innovative products, platforms, and digital experiences for the modern world. By integrating experience design, complex engineering, and data expertise, we help our clients imagine what’s possible and accelerate their transition into tomorrow’s digital businesses. Headquartered in Silicon Valley, GlobalLogic operates design studios and engineering centers around the world, extending our deep expertise to customers in the automotive, communications, financial services, healthcare and life sciences, manufacturing, media and entertainment, semiconductor, and technology industries. GlobalLogic is a Hitachi Group Company operating under Hitachi, Ltd. (TSE: 6501), which contributes to a sustainable society with a higher quality of life by driving innovation through data and technology as the Social Innovation Business.
