AI Scientist, Vision AI
Job Description:
• Research, design, and implement cutting-edge computer vision models for tasks such as image classification, object detection, semantic/instance segmentation, and video understanding.
• Develop and optimize generative vision models, including text-to-image, text-to-video, and image-to-video approaches.
• Train, fine-tune, and evaluate large-scale vision foundation models, adapting them to healthcare-specific applications.
• Collaborate with AI scientists, engineers, and product teams to integrate vision AI capabilities into Artisight’s platform.
• Stay at the forefront of vision AI and multimodal learning research, bringing innovations from the research community into production applications.
• Document and share research outcomes through technical reports, internal presentations, and where appropriate, external publications.
• Work at the intersection of research and application: designing novel vision models and deploying these technologies into real-world healthcare environments.

Requirements:
• M.S. or Ph.D. in computer science, electrical engineering, applied AI, machine learning, or related discipline.
• Demonstrated expertise in computer vision research, evidenced by open-source contributions or peer-reviewed publications (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR).
• Hands-on experience with one or more of: Image classification and object detection; Image segmentation (semantic, instance, or panoptic); Video classification and temporal modeling; Text-to-Image / Text-to-Video generation; Image-to-Video or video synthesis.
• Strong knowledge of deep learning methods (transformers, diffusion models, CNNs, self-supervised learning, multimodal architectures).
• Proficiency in frameworks such as PyTorch or TensorFlow, with experience in large-scale vision model training.
• Familiarity with deployment tools such as ONNX, NVIDIA Triton, or similar inference platforms.
• Strong problem-solving skills and the ability to clearly communicate research insights across disciplines.

Nice to haves:
• Experience with multimodal learning (vision + audio + text).
• Familiarity with 3D vision, medical imaging, or spatiotemporal models.
• Experience with real-time video analysis and low-latency deployment.
• Contributions to open-source vision projects (e.g., Detectron2, MMDetection, Segment Anything, Stable Diffusion, OpenMMLab).