Principal Engineer – Applied AI

Remote · Full-time
United States · Permanent

Job Description

Drive the end-to-end technical strategy and architecture of machine learning systems, large language model (LLM) capabilities, and AI infrastructure. Own how models, evaluation pipelines, data workflows, and observability components are designed, deployed, monitored, and continuously improved to meet reliability, quality, safety, and cost goals. Provide deep AI/ML expertise and leadership across engineering teams, guiding model integration, ML platform decisions, and scalable distributed systems that support enterprise-grade GenAI workloads.

Job requirements

• 10+ years of software engineering experience with significant recent hands-on AI/ML development.
• Bachelor's degree in Computer Science or a related field.
• Deep technical expertise in machine learning, LLMs, transformers, and modern AI frameworks (PyTorch, TensorFlow, JAX, scikit-learn).
• Proven experience deploying production ML or LLM systems at scale (not prototypes).
• Strong programming expertise in Python; additional experience in Java, C++, or JavaScript is a plus.
• Experience with data engineering workflows, feature stores, and scalable data pipelines.
• Expertise with cloud platforms (AWS/GCP/Azure), containerization, orchestration (Kubernetes), and distributed systems.
• Hands-on MLOps: model deployment, monitoring, CI/CD for ML, experiment tracking, and evaluation frameworks.
• Demonstrated technical leadership managing teams of 10+ engineers and influencing cross-functional architecture.
• Strong ability to translate ambiguous business needs into clear technical requirements and production outcomes.
• Expertise with LLM productionization, including fine-tuning, retrieval-augmented generation (RAG), safety/guardrails, and evaluation.
• Experience with MLflow, Kubeflow, Vertex AI, SageMaker, or similar platforms.
• Background in model governance, drift detection, fairness/bias evaluation, and compliance.
• Domain specialization in NLP, computer vision, recommender systems, or agentic systems.

Nice to Have:

• Master's or PhD in Computer Science, Machine Learning, or a related discipline.
• Cloud platform expertise (AWS, GCP, Azure) with experience deploying ML workloads at scale.
• Strong product mindset with the ability to translate business requirements into technical solutions.
• Contributions to MLOps platforms (MLflow, Kubeflow, Vertex AI) and CI/CD for ML workflows.
• Domain expertise in specific AI application areas such as computer vision, NLP, or recommendation systems.
• Experience with model monitoring, drift detection, and model governance in production environments.
• Previous experience with AI observability and troubleshooting.

Job responsibilities

• Define and own the architecture for scalable ML systems, including training, fine-tuning, inference, evaluation, and monitoring pipelines.
• Translate ambiguous business and product requirements into robust ML system designs and staged delivery plans.
• Make strategic decisions on model selection, LLM integrations, evaluation frameworks, model gateways, guardrails, and safety mechanisms.
• Lead design reviews, architecture forums, and technical decision-making across teams.
• Build and deploy production-grade ML and LLM models, transformers, and generative AI features, from initial concept through production rollout.
• Establish standards for model readiness, evaluation gates, rollout/rollback, drift detection, observability, and ongoing performance management.
• Partner with engineering teams to integrate models into distributed systems with clear SLOs, telemetry, and error-budget mechanisms.
• Design and improve data pipelines, feature stores, and data quality/lineage workflows supporting model training and inference.
• Develop scalable MLOps/AIOps practices to automate training, testing, deployment, and monitoring.
• Evaluate and implement ML workflow orchestration platforms (e.g., MLflow, Kubeflow, Vertex AI) and CI/CD for ML.
• Own evaluation pipelines: latency, accuracy, cost, hallucination metrics, prompt versioning, and model performance insights.
• Instrument tracing and model observability using best-practice frameworks and telemetry standards.
• Implement guardrails and safety systems to ensure consistent, controlled behavior of LLM-powered features.
• Partner closely with product, engineering, and leadership to shape platform strategy and the AI feature roadmap.
• Provide trade-off analyses that incorporate model performance, security, compliance, scalability, and long-term maintainability.
• Write clear technical documents, proposals, and mechanism-based recommendations to guide executive decision-making.
• Mentor senior and junior engineers in AI/ML best practices, distributed systems, experimentation, and model governance.
• Support hiring, leveling, performance feedback, and the growth of a high-caliber engineering team.

Job benefits

• Fully remote, work-from-home environment
• Employee Share Option Plan
• Flexible working hours
• Paid time off
• Periodic in-person offsites globally (travel permitting)
• Continued education support
• Advancement opportunity

Pay: $98,380.96 - $118,480.30 per year

Benefits:
• 401(k)
• Dental insurance
• Flexible schedule
• Health insurance
• Paid time off
• Tuition reimbursement
• Vision insurance

Work Location: Remote