Swarm Bench Task Engineer | Python & AI Code Benchmark (Remote Contract)

Job type: Full-time
Experience: 3-5 years
Salary: $15 per hour
Location: Egypt

Job Details

Position: Swarm Bench Task Engineer (SWE / Code)
Type: Short-term contract (4 weeks)
Compensation: $15 per hour
Location: Remote
Commitment: 20-40 hours per week, with 4 hours of overlap with PST

Role Responsibilities
- Build multi-agent benchmark tasks based on real-world open-source code changes such as bug fixes, migrations, and refactors
- Work with the Harbor evaluation framework to run and validate tasks inside Docker environments
- Write clear, precise task instructions that specify file paths, function signatures, expected behavior, and constraints
- Design and implement Python-based verification scripts that validate the correctness of agent-generated code changes
- Create decomposition strategies that split complex code changes across multiple independent sub-agents
- Run, debug, and refine tasks within containerized environments to ensure reproducibility and determinism
- Evaluate task performance signals and improve task quality, clarity, and difficulty
- Contribute to benchmark development for advanced AI coding agents

Requirements
- Several years of strong experience in Python and JavaScript development
- Experience with AI coding benchmarks (e.g., SWE-bench, Terminal-Bench)
- Strong experience reading and navigating large open-source codebases (e.g., Django, Flask, FastAPI, Node.js, or similar)
- Familiarity with Git workflows, including pull requests, diffs, cherry-picking, and working with specific commits
- Comfortable with Docker, including writing Dockerfiles, building images, and debugging container issues
- Experience writing test scripts using pytest, unittest, or custom assertion-based testing
- Ability to write clear, precise, and unambiguous technical specifications
- Ability to work independently in a remote environment

Application Process
1. Apply (or use Easy Apply), then check your email for the application form.
2. Fill out the Google Form.
3. Complete the assessment link, which is sent after shortlisting and must be completed within 24 hours.

Skills: JavaScript, Engineering, Python
