
Senior Solutions Architect - Toronto, Canada

Full Time
Toronto, Canada

About the Role

We are seeking a Senior Solutions Architect to help customers successfully deploy, operate, and scale AI/ML workloads on our GPU Platform-as-a-Service (PaaS) offering. In this customer-facing role, you will work closely with platform engineering, MLOps, data science, and infrastructure teams to design and implement production-ready AI infrastructure solutions built on Kubernetes and GPU-accelerated environments.

As a senior technical leader, you will serve as a trusted advisor to enterprise customers, lead complex deployments, drive architecture best practices, and help customers maximize the value of their AI infrastructure investments. You will help customers onboard to the platform, optimize workload performance, automate infrastructure, and ensure reliable operations while serving as a strategic technical partner throughout the customer lifecycle.

Responsibilities

  • Partner with customer platform, MLOps, data science, and executive stakeholders to understand AI/ML workload requirements and translate them into scalable platform architectures.
  • Lead the design and deployment of Kubernetes-based solutions for model training, fine-tuning, and inference workloads.
  • Guide customers through onboarding and implementation of the GPU PaaS platform across cloud and hybrid environments.
  • Architect networking, identity management, observability, and security integrations with enterprise systems.
  • Build and maintain automation assets including Terraform modules, Helm charts, GitOps workflows, and CI/CD pipelines.
  • Monitor and troubleshoot production environments, including GPU utilization, workload performance, cluster health, and cost efficiency.
  • Lead root cause analysis and remediation efforts for complex customer issues.
  • Serve as the primary technical advisor and escalation point for assigned customers.
  • Provide technical leadership during customer engagements and drive adoption of platform best practices.
  • Document reference architectures, implementation guides, and best practices.
  • Provide feedback to Product and Engineering teams to improve platform capabilities and influence product roadmap decisions.
  • Collaborate with internal teams to ensure successful customer adoption, expansion, and long-term success.
  • Mentor junior team members and contribute to the growth of the Solutions Architecture organization.

Required Qualifications

  • 7+ years of experience in Solutions Architecture, DevOps, Platform Engineering, Site Reliability Engineering (SRE), Cloud Engineering, or related fields.
  • Strong hands-on experience with Kubernetes in large-scale production environments.
  • Experience with at least one programming language such as Python or Go.
  • Experience with AWS, Azure, or GCP, including networking, IAM, and managed Kubernetes services.
  • Deep knowledge of Infrastructure as Code and automation tools and practices such as Terraform, Helm, GitOps, and CI/CD platforms.
  • Familiarity with monitoring and observability technologies including Prometheus, Grafana, OpenTelemetry, or similar.
  • Strong understanding of AI/ML infrastructure concepts including GPU-based workloads, model serving, training pipelines, and resource optimization.
  • Proven ability to troubleshoot and resolve complex infrastructure and platform issues.
  • Excellent communication, presentation, and customer-facing skills.
  • Experience leading technical discussions with both engineering teams and executive stakeholders.

Preferred Qualifications

  • Experience supporting enterprise customers in cloud-native environments.
  • Familiarity with AI/ML frameworks such as PyTorch and TensorFlow.
  • Experience with GPU scheduling, autoscaling, and workload optimization.
  • Understanding of multi-tenant Kubernetes environments and platform operations.
  • Experience working with MLOps or AI infrastructure platforms.
  • Experience developing reference architectures and leading technical workshops.
  • Relevant certifications such as CKA, CKAD, AWS Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect.

Why Join Rafay?

Rafay is at the forefront of GPU PaaS and Kubernetes technologies, and we offer a unique opportunity to join a winning team working on foundational technology for cloud and AI/ML services and enterprises. We work in a collaborative environment that rewards creative thinking and provides opportunities to advance professional careers in advanced technology development. On top of this, we offer a fun and dynamic work environment, a competitive salary, robust benefits, and attractive stock options. As the first of our kind, we are truly in a class of our own.
