Senior DevOps Engineer
Description
You'll work at the intersection of platform engineering and client-facing presales, translating business goals into delivery-ready architectures and then executing them. This role demands both technical depth — EKS operations, DevSecOps controls, SRE fundamentals — and the communication skills to run discovery workshops and build compelling solution narratives.
You'll produce client-ready artifacts (reference architectures, roadmaps, runbooks, backlog decomposition) and own the platform foundations that accelerate releases while improving reliability and governance.
- Lead technical discovery workshops; capture non-functional requirements, risks, and dependencies
- Define target-state platform architecture, delivery sequencing, and acceptance criteria
- Build demos and solution narratives for platform modernization, DevSecOps enablement, and Kubernetes adoption
- Translate business goals into delivery-ready epics, user stories, and measurable success criteria
- Implement CI/CD and GitOps operating models — promotion, approvals, release governance, rollback patterns
- Build EKS foundations: cluster lifecycle, node pools, autoscaling, ingress, workload placement, and operations
- Enforce DevSecOps controls: secrets management, IAM least-privilege, policy-as-code, image scanning, SBOM workflows, and auditability
- Establish observability and reliability baselines: metrics, logs, tracing, alerting, incident playbooks, capacity planning, and DR patterns
- Troubleshoot Linux systems and tune performance for containerized workloads and cloud runtimes
Required Qualifications
- Strong experience with Kubernetes cluster administration and operations in production environments
- Advanced knowledge of Kubernetes workload deployment and management best practices
- Experience implementing Kubernetes security practices, including hardening, RBAC, and policy management
- Solid experience with AWS multi-account architecture design
- Advanced knowledge of IAM, VPC networking, security controls, and EKS operations
- Hands-on experience with cloud-native tools and practices, including Helm, Kustomize, and GitOps
- Knowledge of container security and software supply-chain security practices
- Familiarity with SRE (Site Reliability Engineering) principles, observability, monitoring, and incident management
- Experience supporting AI infrastructure operations
- Knowledge of GPU-capacity operations and inference platform readiness
- Experience with performance analysis, troubleshooting, and monitoring of highly available environments