Senior Software Engineer – LLM Evaluation
Turing
Location
🇺🇸 Vancouver, United States
Type
Contractor
Salary
Undisclosed
Posted
22h ago
Job Description
About Us: Turing is a premier research accelerator at the forefront of AI development, located in San Francisco, California. We are dedicated to helping both AI labs and global enterprises implement advanced AI systems. Our services include accelerating pioneering research through access to high-quality data, advanced training pipelines, and exceptional AI researchers specializing in various domains, alongside transforming AI concepts into reliable, scalable proprietary intelligence that drives impactful results on the P&L.
Ideal Background: This position is suited for engineers with experience working at the cutting edge of AI technology, for example at companies such as OpenAI, NVIDIA, Databricks, Palantir, or Snowflake. We value strong educational backgrounds from reputable computer science programs, such as the University of Washington, University of Illinois Urbana-Champaign, UT Austin, University of Michigan, and Purdue, but we prioritize proven experience and skill above all.
Project Overview: As a Software Engineering evaluator, you will play a key role in creating innovative datasets aimed at training and benchmarking large language models. Your responsibilities will include curating code examples, providing effective solutions, and debugging code, focusing mainly on Python along with contributions in JavaScript (including ReactJS), C/C++, Java, Rust, and Go. You will also evaluate AI-generated code to enhance efficiency and reliability, and work alongside cross-functional teams to improve enterprise AI coding solutions.
What Does a Typical Day Look Like?
• Curate code examples, build solutions, and debug code primarily using Python, while also working in JavaScript, C/C++, Java, Rust, and Go.
• Analyze and improve AI-generated code to ensure its efficiency and scalability.
• Collaborate with diverse teams to elevate AI-driven coding solutions through rigorous performance standards.
• Create Python agents and automated tools that verify code quality and identify potential errors.
• Engage in the software engineering cycle, hypothesizing and evaluating model capabilities across prototyping, architecture design, implementation, and maintenance.
• Devise automatic verification mechanisms to evaluate solutions and software engineering tasks.