Intermediate or Senior MLOps Engineer - US Federal
About the position
At Workday, we are looking for a Software Engineer to join our growth team focused on MLOps. This role involves building machine learning capabilities into our products, specifically within our HR & Talent product portfolio. You will work closely with ML engineers and other software teams to deliver critical infrastructure and software frameworks that enable machine learning across Workday's product ecosystem. Your responsibilities will include applying modern MLOps, CloudOps, and data engineering stacks to support the development, training, deployment, and lifecycle management of a range of ML capabilities. You will also design and develop new APIs and microservices and deploy them at scale using Python, Terraform, and Kubernetes. This position offers the opportunity to make a significant impact on thousands of enterprises and millions of people by transforming the way our end users experience Workday.

Responsibilities
• Work with cross-functional teams to deliver scalable, secure, and reliable solutions.
• Build an MLOps platform, primarily using Kubeflow and other ML ecosystem tools and services, to provide a unified ML development experience.
• Work with data scientists, ML engineers, PMs, and architects to elaborate requirements, and drive the resulting technical solutions.
• Own and develop cloud-based services end to end, including infrastructure as code.
• Design and build software solutions for the efficient organization, storage, and retrieval of data at substantial scale.
• Build systems and dashboards to monitor service and ML health.
• Lead architecture reviews, code reviews, and technology evaluations.
• Research, evaluate, prototype, and drive adoption of new ML tools with reliability and scale in mind.

Requirements
• 5 or more years of proven industry experience.
• Bachelor's and/or Master's degree in Computer Science or Computer Engineering.
• Experience building web applications and microservices, and with API design.
• Professional experience in cloud programming, preferably in Python or Go.
• Experience running and maintaining Databricks, SageMaker, and Apache Spark as a service.
• Experience supporting large Kubernetes networks in production.
• Design, implement, and maintain robust MLOps services for deploying, monitoring, and scaling machine learning development and data engineering, primarily with Kubeflow.
• Troubleshoot and resolve performance bottlenecks, system outages, and other operational issues in collaboration with the ML engineering teams.
• Optimize public cloud infrastructure (AWS, GCP) to meet the computational requirements of machine learning workloads.
• Implement and manage CI/CD workflows that automate the testing, integration, and delivery of machine learning components.
• Ensure the security and compliance of machine learning platforms by implementing best practices for encryption, data protection, and access control.

Nice-to-haves
• 8 or more years of validated industry experience for the Senior Software Engineer role.
• Experience managing tools such as Databricks and SageMaker for efficient computation over, and management of, large-scale data lakes.
• Experience with data and/or ML systems, with the ability to think across layers of the stack.
• Experience leading or mentoring other team members.

Benefits
• Workday Bonus Plan or role-specific commission/bonus.
• Annual refresh stock grants.
• Flexible work schedule with at least 50% in-office time each quarter.
• Comprehensive benefits package.