Scaling Engineering with Agentic Workflows

 

TL;DR: With a good prompt in a tool like Cursor, an individual Duo can move faster on our quest to create the best education in the world and make it accessible to everyone. But what if that Duo wants to share the prompt with a whole team, or with all of Duolingo? Agentic workflows are meant to do exactly that. Duos are already building a growing fleet of coding agents tailored to Duolingo’s most tedious tasks. These custom agents take repetitive work off our engineers’ plates so they can concentrate on core logic and product thinking. Even better, for workflows that follow common patterns, Duos can now create an agent in less than five minutes.

The agents we have

We have already deployed agents for many routine purposes. Some existing agent capabilities:

  1. Remove Deprecated Feature Flags
  2. Launch / Shut Down an Experiment
  3. Modify Terraform and Create PR

The pattern

As we developed coding agents, we found the same pattern emerged again and again as the simple way to make code changes.

  • Clone the repo
  • The AI agent makes a code change
  • Commit the code and optionally open a PR

This pattern covers a large number of scenarios: any case where a single agentic pass is enough for the agent to make its change.
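The three steps above can be sketched roughly as follows. This is a minimal illustration and not Duolingo's actual implementation: the function name `run_single_pass`, the `agent` callable, the default branch name, and the choice of the `git` and `gh` command-line tools are all assumptions made for the example.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical types: a runner executes one command in a working directory,
# and an agent edits files in a checkout according to a prompt.
Runner = Callable[[Sequence[str], str], None]
Agent = Callable[[str, str], None]


def shell_runner(cmd: Sequence[str], cwd: str) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), cwd=cwd, check=True)


def run_single_pass(
    repo_url: str,
    prompt: str,
    agent: Agent,
    workdir: str,
    branch: str = "agent/change",  # hypothetical default branch name
    open_pr: bool = False,
    run: Runner = shell_runner,
) -> None:
    checkout = f"{workdir}/repo"
    # 1. Clone the repo and create a working branch.
    run(["git", "clone", repo_url, checkout], workdir)
    run(["git", "checkout", "-b", branch], checkout)
    # 2. The AI agent makes a code change (a single agentic pass).
    agent(prompt, checkout)
    # 3. Commit the code and optionally open a PR.
    run(["git", "add", "--all"], checkout)
    run(["git", "commit", "-m", f"Agent change: {prompt[:50]}"], checkout)
    if open_pr:
        run(["git", "push", "--set-upstream", "origin", branch], checkout)
        # Assumes the GitHub CLI is installed and authenticated.
        run(["gh", "pr", "create", "--fill"], checkout)
```

Passing the command runner in as a parameter keeps the sketch easy to dry-run: a fake `run` can record the commands instead of executing them. Note that `git commit` exits with an error when the agent changed nothing, so a real workflow would need to decide how to treat that case.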

Make your first agentic workflow in less than 5 minutes!

We’ve made it simple for Duos to develop a new workflow and distribute it internally for common patterns like this, all without requiring any custom code. Engineers are not the only ones who can set up an agent; PMs and researchers can as well.

How to create

Our Duos start by filling out a simple JSON form about the workflow. This form allows them to provide a prompt, a code repo for it to run against, and 0 or more parameters (useful for sharing and reusing the workflow).
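To make the idea of a parameterized form concrete, here is a small sketch of what such a form and its prompt rendering might look like. The real schema is not shown in this post, so every field name (`name`, `repo`, `prompt`, `parameters`), the `$placeholder` syntax, and the `render_prompt` helper are hypothetical.

```python
import json
from string import Template

# Hypothetical form; the field names and values are illustrative only.
EXAMPLE_FORM = """
{
  "name": "remove-feature-flag",
  "repo": "example-org/example-repo",
  "prompt": "Remove the deprecated feature flag $flag_name and any dead code it guards.",
  "parameters": ["flag_name"]
}
"""


def render_prompt(form_json: str, values: dict[str, str]) -> str:
    """Fill the form's prompt template with the values supplied at run time."""
    form = json.loads(form_json)
    declared = form.get("parameters", [])
    missing = [name for name in declared if name not in values]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # substitute() raises KeyError if the prompt uses an undeclared placeholder.
    return Template(form["prompt"]).substitute(
        {name: values[name] for name in declared}
    )
```

With a form like this, the same workflow can be reused for any flag by calling, for example, `render_prompt(EXAMPLE_FORM, {"flag_name": "new_onboarding"})`.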

Testing

The prompt is the core of a workflow, so Duos iterate on it until they are satisfied with the results. Generally, we take some time to craft it in Codex or Claude and make sure it works as intended in a variety of situations. Once the prompt is ready and the form is filled out, the workflow can be staged for end-to-end testing. Faster iteration means we can test more ideas to improve learning efficacy for our learners.

How to run

Once a workflow is ready, Duos simply merge its form, and the workflow automatically shows up in a list of internal tools that any Duo can run. We also added Slack notifications to keep users informed about a run’s progress.

Build your own (on Temporal)

The simple JSON flow covers many cases, but others may need to make several agentic passes, run extra tools, decide at runtime which activities to run, or do other interesting things.

Making custom workflows is still quite simple! We often create new workflows in 1-2 days by starting from the BootstrapTemporalWorkflow and copying from existing workflows. Temporal is flexible, and the infrastructure around it is evolving quickly: in most cases it either already supports what we need or can easily be made to.


GitHub library

We have set up a util directory for common repo interactions (cloning a repo, opening a PR, etc.). This shared code package is used in all of our agents to avoid repeated code, make it easier to follow our key patterns, and generally support development velocity. The library can use a shared GitHub App token so that all of our PRs correctly come from a bot account with centrally controlled permissions.

Multi-step workflows

While the simple pattern lets us complete interesting tasks in a single activity, more complex tasks require multi-step workflows. These workflows can take on longer-running, more complex tasks without compromising durability. Each stage of a multi-step pattern runs as a separate retryable activity with its own timeouts and retry policy. As a result, a failure caused by AI non-determinism only retries the affected step instead of restarting the entire process, which lets us make several LLM calls within a single agent.
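The benefit of per-step retries can be illustrated with a small standard-library sketch. It deliberately does not use the Temporal SDK, and it omits timeouts for brevity; `RetryPolicy`, `Step`, and `run_workflow` are hypothetical names introduced only for this example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    backoff_seconds: float = 0.0


@dataclass
class Step:
    name: str
    action: Callable[[Any], Any]  # receives the previous step's result
    policy: RetryPolicy


def run_workflow(steps: list[Step], initial: Any = None) -> Any:
    """Run steps in order, retrying only the step that failed."""
    result = initial
    for step in steps:
        for attempt in range(1, step.policy.max_attempts + 1):
            try:
                result = step.action(result)
                break  # step succeeded; move on to the next one
            except Exception:
                if attempt == step.policy.max_attempts:
                    raise  # retries exhausted; surface the failure
                time.sleep(step.policy.backoff_seconds)
    return result
```

If the second of two steps fails twice on bad LLM output and then succeeds, the first step still runs exactly once: its result is kept while only the failing step is retried.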

Next steps

There are a few key features coming down the pipeline that will make the agents our Duos build even more impactful. A large set of features is currently blocked by problems running Docker in Docker on Temporal; this is being actively addressed and is expected to be solved within the next month.

MCP

MCP access is well-established as a way to grant more abilities to any given agent. Prototype agents with access to the GitHub MCP are able to make reference to other codebases while upgrading their own, significantly improving their ability to solve problems. Other MCP servers, in particular Atlassian, should allow agentic workflows to connect to other portions of Duolingo’s business besides the codebase.

Expanding agent.json

The JSON-created agents run a common workflow under the hood. We can expand its functionality to more flexibly accommodate other common patterns (e.g. multi-step workflows).

Final thoughts

Agentic workflows are still in their early days at Duolingo, and the pattern captured by this blog post is only one of many. More broadly, both the capabilities of agentic workflows and the best ways to support them with infrastructure remain very much open questions.

With that in mind, this is an early, and by no means definitive, attempt to answer these questions. We expect the space to evolve rapidly.

If you want to work at a place that creates innovative internal tools that make complex engineering tasks easier and is innovating in service of a clear mission, we’re hiring!
