Position: Staff Software Engineer - Autonomous Agents
Job Description
What is the opportunity?
Intelligent Operations (iO) is an AI-powered safety net for RBC technology operations that keeps services running and reduces outages. We are the first to collect tech operations data from different domains, and mining that data for information and insights allows us to predict problems before they happen. This is foundational work for the next generation of AI capabilities. We're building the infrastructure that enables AI agents to act autonomously: analyzing data, generating code, executing experiments, and reasoning about operational data.
The iO team designs and operates the execution environments, state management systems, and agent orchestration that make autonomous AI safe and reliable. You'll work at the intersection of distributed systems, ML research, and product—building systems that don't exist anywhere else in the industry.
Join RBC's T&O Int Ops team as a Generative AI Data Scientist / Researcher and help build AI agents that will lead us to a future where RBC employees can leverage autonomous systems to dramatically increase the scope and quality of what we can accomplish. The ideal candidate will have a passion for developer tools, extensive knowledge of diverse programming environments, and experience with advanced LLM features like tool use, chaining and orchestration patterns, and prompt engineering.
What will you do?
Build Autonomous Research Agents: Design and develop AI agents that autonomously execute complex ML workflows, from exploratory data analysis and statistical modeling to visualization and report generation. Agents generate complete Python codebases with modular structure rather than isolated scripts.
Agent Architecture & Orchestration: Architect multi-agent systems (Developer/Supervisor patterns) with agentic search trees where each node represents a complete solution attempt. Build state management for long-running tasks—handling checkpoints, recovery, and resumption across failures.
Sandboxed Execution Environments: Design and build secure compute environments where agents can safely execute code, access tools, and interact with external services with process isolation and resource limits.
Vector Search & Retrieval: Implement advanced vector search techniques and RAG (Retrieval-Augmented Generation) systems to enhance agent capabilities across diverse data sources.
Evaluation & Benchmarking: Develop evaluation frameworks to measure agent performance, reliability, and task completion. Build observability and debugging tools for agent execution—understanding what the agent did, why, and how to improve it.
Data Engineering: Build data registry patterns for live access to data sources (PostgreSQL, MySQL, S3) with automated schema discovery and LLM-enhanced metadata.
Deployment & Integration: Deploy AI agents into production OpenShift/Kubernetes environments with async task processing (Celery/Redis), ensuring scalability and robustness.
Developer Tools & Experience: Engage directly with developers to gather insights and improve user experience; contribute across the stack, from front-end UI to back-end infrastructure. Dive into Python notebooks, VS Code plug-ins, and containerized environments.
Collaborative Research: Partner with researchers to improve model capabilities through shared tools and evals. Stay ahead of advancements in AI-assisted coding by experimenting with new tools and evaluating emerging techniques.
What do you need to succeed?
Must-Have:
• PhD or Master's degree in Physics, Computer Science, Data Science, or a related field
• 5+ years of research experience with a focus on machine learning applications
• Hands-on experience working with large language models and prompt engineering
• Proficient in Python and libraries: NumPy, Pandas, TensorFlow, Keras, PyTorch, FastAPI
• Strong understanding of ML techniques: supervised/unsupervised learning, deep neural networks, Transformers, and agentic AI patterns
• Experience with async processing frameworks (Celery, Redis) and cloud platforms (Azure, AWS, OpenShift/Kubernetes)
• Familiarity with MongoDB, PostgreSQL, and distributed systems architecture
• Excellent communication skills with…