AI Test Engineer (Generative AI & LLM Focus)
Mercedes-Benz Research and Development India Private Limited
Location
🇮🇳 India
Type
Full-time
Salary
Undisclosed
Posted
3w ago
Job Description
Tasks
About MBRDI
Mercedes-Benz Research and Development India (MBRDI), headquartered in Bengaluru with a satellite office in Pune, is the largest R&D center for Mercedes-Benz Group AG outside of Germany. Our mission is to drive innovation and excellence in automotive engineering, digitalization, and sustainable mobility solutions, shaping the future of mobility.
Job Summary:
We are seeking an AI Test Engineer specializing in Generative AI and Large Language Models (LLMs) to lead the quality assurance of our next-generation intelligent applications. This role goes beyond traditional software testing: you will validate non-deterministic AI behavior, detect hallucinations, assess ethical compliance, and ensure the reliability and safety of our LLM-powered platforms. You will develop advanced evaluation frameworks to measure semantic accuracy, factuality, and safety, ensuring robust model performance across diverse scenarios.
Key Responsibilities:
• Model Behavioral Testing: Design and execute evaluation tests to detect hallucinations, assess factual correctness, evaluate tone, and verify safety guardrails.
• Evaluation Framework Development: Build and maintain automated evaluation systems using frameworks like RAGAS, DeepEval, or Giskard to score LLM responses.
• Adversarial Red Teaming: Conduct proactive "red teaming" to identify vulnerabilities like prompt injection, jailbreaks, and data leakage.
• Data Quality Engineering: Validate the quality and diversity of training and validation datasets to prevent "garbage in, garbage out" scenarios.
• Drift & Performance Monitoring: Implement observability systems (using tools like Weights & Biases or Arize) to detect model and feature drift in production.
• Synthetic Data Generation: Use LLMs to generate high-fidelity synthetic test datasets to cover complex edge cases.
• Ethical & Bias Auditing: Run automated and manual checks to mitigate algorithmic bias and ensure compliance with the EU AI Act and other global regulations.
Required Skills:
• AI/ML Knowledge: Strong understanding of machine learning concepts such as supervised vs. unsupervised learning, model evaluation metrics, neural networks, and NLP. Solid understanding of RAG (Retrieval-Augmented Generation) pipelines, vector databases (Pinecone, Weaviate), and transformer architectures.
• Statistical Competency: Ability to apply statistical validation (distributions, variance, confidence intervals) to evaluate non-deterministic system results.
• Automation & Testing Tools: Practical experience with Playwright, Selenium, PyTest, or equivalent automation frameworks.
• Programming: High proficiency in Python (essential for test automation and data handling).
• Collaboration & Workflow Tools: Familiarity with tools such as Confluence, Jira, Xray, and ServiceNow.