[Remote] Engineering Leader – AI & Machine Learning Operations (AIOps)

Remote Full-time
Note: This is a remote position open to candidates in the USA.

CloudBees enables enterprises to deliver scalable, compliant, and secure software, empowering developers to do their best work. The Engineering Leader will drive the AIOps strategy, lead the development of the CloudBees AI platform, and manage a team focused on building reliable AI & ML infrastructure.

Responsibilities
• Lead and scale a team responsible for AIOps, including model deployment, monitoring, and lifecycle management
• Architect and implement AI/ML pipelines that are scalable, observable, and reproducible
• Collaborate with cross-functional teams (data science, DevOps, product) to integrate AI/ML systems into our SaaS platform
• Establish best practices for AI/ML experimentation, CI/CD for models, data versioning, and model governance
• Own the full stack of AIOps infrastructure, from data ingestion to real-time inference systems
• Drive technical vision and roadmap for ML platform development
• Act as a mentor and coach, helping engineers grow in a fast-paced, startup environment
• Manage a team of 5+
• Launch new platforms from 0 to 1 and drive adoption internally and externally with partner teams

Skills
• 7+ years of engineering experience, including platform engineering, system development, or related roles, with at least 3 years in leadership roles
• 3 years of experience with large-scale systems, with a focus on reliability, scalability, and maintainability, and 1 year of experience with AI/ML systems
• Strong hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow)
• Proven track record building ML pipelines in production environments
• Experience with cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes)
• Deep knowledge of CI/CD practices as they relate to the ML lifecycle
• Prior experience in a startup or fast-paced SaaS environment
• Strong collaboration and communication skills
• Experience deploying and managing LLM services such as Amazon Bedrock or Vertex AI
• Experience integrating ML capabilities into developer-centric tools or platforms
• Familiarity with data observability and ML monitoring tools (e.g., EvidentlyAI, Prometheus/Grafana for models)
• Knowledge of data privacy, compliance, and security in ML systems

Benefits
• Health Insurance
• Dental Insurance
• Vision Insurance
• Short & Long Term Disability
• Life Insurance
• HSA/FSA
• Remote Work Environment
• Flexible Time Off
• Paid Company Holidays
• Parental Leave
• Variable Bonus Plan dependent on your role
• Stock grant opportunities dependent on your role
• 401(k) with Company Match

Company Overview
• CloudBees was founded in 2010 and is headquartered in San Jose, California, USA, with a workforce of 501-1000 employees.

Company H1B Sponsorship
• CloudBees has a track record of offering H1B sponsorships, with 2 in 2025, 1 in 2023, and 3 in 2022. Please note that this does not guarantee sponsorship for this specific role.