Senior DevOps Engineer
Intelex Technologies
Role Overview
We are looking for a Senior DevOps Engineer with deep expertise in building and operating large-scale, distributed, cloud-native systems. This role requires strong engineering fundamentals, deep hands-on knowledge of AWS, and extensive experience implementing CI/CD automation, observability frameworks, container orchestration, infrastructure-as-code, and platform tooling.
You will design, build, and maintain the core infrastructure and developer platforms powering mission-critical SaaS services. You will work closely with engineering, SRE, security, and architecture teams to drive operational excellence, improve deployment velocity, and ensure our systems are reliable, observable, secure, and resilient.
This role is ideal for engineers who are passionate about automation, cloud-native architectures, robust release engineering, AI/MCP platform delivery, and building internal platforms that enable high-frequency deployments across a multi-tenant, multi-region SaaS environment with zero downtime.
Key Responsibilities
Platform Engineering
- Design, build, and evolve internal developer platforms (IDP), paved paths, and self-service capabilities enabling efficient and secure service delivery.
- Build abstractions and tooling that reduce cognitive load and accelerate engineering teams.
- Implement golden pipelines, environment provisioning automation, and standard deployment patterns.
- Build and operate MCP (Model Context Protocol) platform services that integrate AI capabilities into internal and external workflows.
CI/CD Engineering
- Architect and maintain enterprise-grade CI/CD pipelines using GitHub Actions and Jenkins.
- Implement secure, scalable, and reproducible build and release workflows with automated testing, policy enforcement, and progressive delivery.
- Troubleshoot complex CI/CD issues including agent failures, deployment rollbacks, runtime environment conflicts, secrets, and artifact management.
Cloud Infrastructure
- Architect and operate cloud infrastructure on AWS with production-grade reliability, security, and scalability.
- Work hands-on with core AWS services:
  - Compute & Networking: EC2, ECS, EKS, Lambda, VPC, ALB/NLB
  - Storage & Data: S3, RDS (Postgres/MySQL/SQL Server), CodeArtifact
  - Security & Identity: IAM roles/policies, Secrets Manager, Certificate Manager
  - Operations: CloudFront, Route 53, Systems Manager, EC2 Image Builder
- Optimize cloud cost, performance, resiliency, and availability across multi-region deployments.
Containerization & Orchestration
- Design, deploy, and operate containerized workloads using Docker and ECS as the primary orchestration layer.
- Implement autoscaling strategies, rollout policies (blue/green, canary), and multi-AZ resilience patterns.
- Contribute to future platform evolution toward EKS/Kubernetes; prior exposure is an asset.
Infrastructure as Code
- Build, version, and maintain infrastructure using Terraform (modular design, remote state, single-repo registry pattern, PR-based review pipelines).
- Enforce IaC best practices through policy-as-code, automated testing, and drift detection.
Observability & Monitoring
- Implement end-to-end observability: metrics, logs, traces, synthetic monitors, and SLO dashboards.
- Manage and optimize monitoring platforms such as New Relic (primary) or similar APM/infrastructure tools.
- Define alerting strategy and operational playbooks; integrate with MS Teams and incident management workflows for on-call rotations.
Automation & Scripting
- Develop tooling, automation, and orchestration using:
  - Python (primary language for infrastructure automation, cost reporting, and monitoring pipelines)
  - PowerShell (Windows server and IIS automation)
  - Pipeline definitions (GitHub Actions YAML workflows, Jenkins pipelines)
Operations & Reliability
- Support production workloads across Windows and Linux ecosystems, including IIS-hosted services.
- Drive operational readiness reviews, performance tuning, root cause analysis, and continuous improvement.
- Participate in and improve on-call processes with a focus on reducing MTTR and increasing system resilience.
Disaster Recovery & Business Continuity
- Design and validate DR strategies, backup/restore workflows, failover automation, and chaos testing scenarios.
- Ensure multi-region readiness, RPO/RTO adherence, and recovery orchestration.
Process & Collaboration
- Work with JIRA and ServiceNow for sprint planning, incident management, and change control.
- Partner with security, development, data, and SRE teams to align on architecture, compliance, and operational goals.
Required Technical Skills
- Deep hands-on experience with AWS cloud operations and architecture.
- Strong expertise with CI/CD pipelines: GitHub Actions and Jenkins.
- Proficiency in Terraform, infrastructure modularization, and environment provisioning.
- Strong containerization knowledge: Docker and ECS (primary orchestration platform).
- Monitoring & APM expertise: New Relic or equivalent (Datadog, AppDynamics, etc.).
- Proficient scripting in Python and PowerShell for infrastructure automation.
- Experience with operational tooling: JIRA, ServiceNow, MS Teams integrations.
- Hands-on experience managing Windows and Linux systems, including IIS-hosted applications.
- Database familiarity: MS SQL Server, Oracle, and PostgreSQL, including managed deployments on Amazon RDS.
- Strong understanding of identity, access, secrets, certificates, and platform security.
Preferred Experience
- Azure cloud exposure (for future hybrid-cloud initiatives).
- EKS / Kubernetes experience for future container platform evolution.
- Octopus Deploy or equivalent release orchestration tooling.
- Familiarity with GitOps workflows.
- Experience with service mesh, distributed tracing, and A/B testing platforms.
- Strong knowledge of networking: DNS, TLS, load balancing, WAF, CDN tuning.
- Previous involvement in DR automation, chaos engineering, and reliability programs (SRE-style).
- Experience modernizing legacy workloads to containers or serverless patterns.
- Exposure to MCP (Model Context Protocol) or AI/LLM platform infrastructure.
Minimum Qualifications
- 10+ years of experience in DevOps, SRE, Platform Engineering, Cloud Engineering, or a similar role.
- Demonstrated experience running production systems with 24/7 availability requirements.
- Strong problem-solving abilities and ownership mindset.
Company Overview
With more than 1,300 clients and 1.6 million users, Intelex Technologies, ULC is a global leader in environmental, health, safety and quality (EHSQ) management software. Since 1992, its scalable, web-based platform and applications have helped clients across all industries improve business performance, mitigate organization-wide risk, and ensure sustained compliance with internationally accepted standards (e.g., ISO 9001, ISO 14001, ISO 45001 and OHSAS 18001) and regulatory requirements. Intelex is one of North America's fastest-growing tech companies. It has been recognized as a Great Place to Work for over 7 years, as a Best Workplace in Technology, and as a Best Workplace for Millennials, and it has received Waterstone's Most Admired Corporate Cultures award and Deloitte's Best Managed Companies award.
For more information, visit https://www.intelex.com/careers



