Multi-Agent Orchestration in Go

I spent a whole day debugging why my agent kept hallucinating tool calls. The model would output "Action: web_search" but forget the "Action Input" part, or mix up the format entirely. LangChainGo's ConversationalAgent expects a specific pattern: Thought, Action, Action Input. Local models struggle with it.

The fix was not tuning the prompt. It was abandoning the framework entirely.

This post shows the pipeline architecture I ended up with: three specialized agents (research, write, edit) that pass structured data between each other using direct LLM calls. No agent framework parsing. No format gymnastics. It just works.

Stop Overcomplicating This

Single agents are like hiring one person to be your lawyer, accountant, and chef. They can do all three, but badly. Splitting into specialists gives you clearer prompts, better error isolation, and the ability to parallelize. A fact-checker agent, for example, works across content types without prompt rewrites.

I have tried supervisor patterns and handoff mechanisms. They add complexity you rarely need. The pipeline pattern (Researcher → Writer → Editor → Output) is the easiest to implement correctly and maps cleanly to Go's concurrency model. Supervisors make sense when you need runtime judgment about which specialist to invoke. Handoffs work for conversational bots that pivot mid-stream. For document generation and ETL workflows, use a pipeline.

The problem with LangChainGo's agent framework is it requires models to output this exact format:

Thought: I need to search for this
Action: web_search
Action Input: AI impact on software engineering

Local models mangle this constantly. They skip the Thought section, hallucinate nonexistent actions, or merge fields into gibberish. Direct LLM calls eliminate this parsing overhead. You call GenerateContent(), manually invoke tools when needed, and pass results back. The code is simpler, debugging is easier, and local models behave predictably.

The Code

Three agents using direct LLM calls with streaming support. Here's how it actually works:

2-multi-agent-pipeline/
├── main.go
├── internal/
│   ├── agents/
│   │   ├── simple_researcher.go
│   │   ├── writer.go
│   │   └── editor.go
│   └── tools/
│       └── search.go
└── go.mod

The researcher needs search. I use Brave Search API because it returns clean results without the SEO noise of Google:

package tools

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

type Search struct {
	client *http.Client
	apiKey string
}

func NewSearch() *Search {
	return &Search{
		client: &http.Client{Timeout: 10 * time.Second},
		apiKey: os.Getenv("BRAVE_API_KEY"),
	}
}

func (s *Search) Name() string { return "web_search" }

func (s *Search) Description() string {
	return "Search the web using Brave Search API"
}

type BraveResponse struct {
	Web struct {
		Results []struct {
			Title       string `json:"title"`
			URL         string `json:"url"`
			Description string `json:"description"`
		} `json:"results"`
	} `json:"web"`
}

func (s *Search) Call(ctx context.Context, input string) (string, error) {
	if s.apiKey == "" {
		return s.mockResults(input), nil
	}
	return s.braveSearch(ctx, input)
}

func (s *Search) braveSearch(ctx context.Context, input string) (string, error) {
	params := url.Values{"q": {input}, "count": {"10"}}
	apiURL := "https://api.search.brave.com/res/v1/web/search?" + params.Encode()

	req, err := http.NewRequestWithContext(ctx, "GET", apiURL, nil)
	if err != nil {
		return "", fmt.Errorf("create request: %w", err)
	}

	req.Header.Set("X-Subscription-Token", s.apiKey)
	req.Header.Set("Accept", "application/json")

	resp, err := s.client.Do(req)
	if err != nil {
		return "", fmt.Errorf("search request: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read body: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("API status %d", resp.StatusCode)
	}

	var braveResp BraveResponse
	if err := json.Unmarshal(body, &braveResp); err != nil {
		return "", fmt.Errorf("parse results: %w", err)
	}

	return s.formatResults(input, braveResp.Web.Results), nil
}

func (s *Search) formatResults(query string, results []struct {
	Title       string `json:"title"`
	URL         string `json:"url"`
	Description string `json:"description"`
}) string {
	formatted := make([]map[string]string, len(results))
	for i, r := range results {
		formatted[i] = map[string]string{
			"title":   r.Title,
			"snippet": r.Description,
			"source":  r.URL,
		}
	}

	output := map[string]interface{}{
		"query":   query,
		"results": formatted,
	}

	data, _ := json.Marshal(output)
	return string(data)
}

func (s *Search) mockResults(input string) string {
	// json.Marshal escapes the input, so quotes in the query cannot break the JSON.
	data, _ := json.Marshal(map[string]interface{}{
		"query": input,
		"results": []map[string]string{
			{"title": "Guide to " + input, "snippet": "Key facts", "source": "example.com"},
		},
	})
	return string(data)
}

Each agent uses GenerateContent directly. No framework wrapping:

package agents

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/tmc/langchaingo/llms"
	langchaintools "github.com/tmc/langchaingo/tools"
)

type StreamHandler func(chunk string)

type SimpleResearcherAgent struct {
	llm   llms.Model
	tools []langchaintools.Tool
}

func NewSimpleResearcher(llm llms.Model, tools []langchaintools.Tool) *SimpleResearcherAgent {
	return &SimpleResearcherAgent{llm: llm, tools: tools}
}

func (r *SimpleResearcherAgent) ExecuteWithStream(
	ctx context.Context,
	topic string,
	handler StreamHandler,
) (string, error) {
	searchResults := r.getSearchResults(ctx, topic)

	prompt := fmt.Sprintf(`Research "%s" based on these search results:

%s

Provide comprehensive notes with facts, statistics, trends, and sources.`,
		topic, searchResults)

	return r.callLLM(ctx, prompt, "You are a research specialist.", handler)
}

func (r *SimpleResearcherAgent) getSearchResults(ctx context.Context, topic string) string {
	for _, tool := range r.tools {
		if tool.Name() == "web_search" {
			results, err := tool.Call(ctx, topic)
			if err != nil {
				log.Printf("[Search] Failed: %v", err)
				return ""
			}
			return results
		}
	}
	return ""
}

func (r *SimpleResearcherAgent) callLLM(
	ctx context.Context,
	prompt, system string,
	handler StreamHandler,
) (string, error) {
	content := []llms.MessageContent{
		llms.TextParts(llms.ChatMessageTypeSystem, system),
		llms.TextParts(llms.ChatMessageTypeHuman, prompt),
	}

	if handler != nil {
		result, err := r.streamLLM(ctx, content, handler)
		if err == nil {
			return result, nil
		}
		log.Printf("[LLM] Streaming failed, falling back to non-streaming: %v", err)
	}
	return r.simpleLLM(ctx, content)
}

func (r *SimpleResearcherAgent) streamLLM(
	ctx context.Context,
	content []llms.MessageContent,
	handler StreamHandler,
) (string, error) {
	var response strings.Builder
	chunkCount := 0

	_, err := r.llm.GenerateContent(ctx, content,
		llms.WithTemperature(0.5),
		llms.WithMaxTokens(2000),
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			str := string(chunk)
			response.WriteString(str)
			handler(str)
			chunkCount++
			return nil
		}),
	)

	if err != nil {
		return "", err
	}

	log.Printf("[LLM] Total chunks: %d, Total bytes: %d", chunkCount, response.Len())
	return response.String(), nil
}

func (r *SimpleResearcherAgent) simpleLLM(
	ctx context.Context,
	content []llms.MessageContent,
) (string, error) {
	resp, err := r.llm.GenerateContent(ctx, content,
		llms.WithTemperature(0.5),
		llms.WithMaxTokens(2000),
	)
	if err != nil {
		return "", err
	}
	if len(resp.Choices) == 0 {
		return "", fmt.Errorf("no response")
	}
	return resp.Choices[0].Content, nil
}

Writer and Editor follow the same pattern; only the prompts and temperatures change. Here is the complete Writer implementation:

type WriterAgent struct {
	llm llms.Model
}

func NewWriter(llm llms.Model) *WriterAgent {
	return &WriterAgent{llm: llm}
}

func (w *WriterAgent) ExecuteWithStream(
	ctx context.Context,
	research string,
	handler StreamHandler,
) (string, error) {
	prompt := fmt.Sprintf(`Transform these research notes into an engaging article:

%s

Write with compelling introduction, clear section headings (Markdown ##),
factual accuracy, strong conclusion, and professional tone.`, research)

	return w.callLLM(ctx, prompt, "You are a professional writer.", 0.7, handler)
}

func (w *WriterAgent) callLLM(
	ctx context.Context,
	prompt, system string,
	temp float64,
	handler StreamHandler,
) (string, error) {
	content := []llms.MessageContent{
		llms.TextParts(llms.ChatMessageTypeSystem, system),
		llms.TextParts(llms.ChatMessageTypeHuman, prompt),
	}

	if handler != nil {
		result, err := w.streamLLM(ctx, content, temp, handler)
		if err == nil {
			return result, nil
		}
		log.Printf("[Writer] Streaming failed, falling back to non-streaming: %v", err)
	}
	return w.simpleLLM(ctx, content, temp)
}

func (w *WriterAgent) streamLLM(
	ctx context.Context,
	content []llms.MessageContent,
	temp float64,
	handler StreamHandler,
) (string, error) {
	var response strings.Builder
	chunkCount := 0

	_, err := w.llm.GenerateContent(ctx, content,
		llms.WithTemperature(temp),
		llms.WithMaxTokens(2000),
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			str := string(chunk)
			response.WriteString(str)
			handler(str)
			chunkCount++
			return nil
		}),
	)

	if err != nil {
		return "", err
	}

	log.Printf("[Writer] Chunks: %d, Bytes: %d", chunkCount, response.Len())
	return response.String(), nil
}

func (w *WriterAgent) simpleLLM(
	ctx context.Context,
	content []llms.MessageContent,
	temp float64,
) (string, error) {
	resp, err := w.llm.GenerateContent(ctx, content,
		llms.WithTemperature(temp),
		llms.WithMaxTokens(2000),
	)
	if err != nil {
		return "", err
	}
	if len(resp.Choices) == 0 {
		return "", fmt.Errorf("no response")
	}
	return resp.Choices[0].Content, nil
}

Editor is the same structure with a different prompt and temperature (0.3 instead of 0.7).

The pipeline wires everything together with streaming:

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/k1ng440/go-llm-demo/2-multi-agent-pipeline/internal/agents"
	"github.com/k1ng440/go-llm-demo/2-multi-agent-pipeline/internal/tools"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
	langchaintools "github.com/tmc/langchaingo/tools"
)

type Pipeline struct {
	researcher *agents.SimpleResearcherAgent
	writer     *agents.WriterAgent
	editor     *agents.EditorAgent
}

func NewPipeline() (*Pipeline, error) {
	model := "minimax-m2.7:cloud" // Ollama Cloud; or "qwen3.5:9b" / "llama3.1:8b" for local

	llm, err := ollama.New(
		ollama.WithModel(model),
		ollama.WithPredictMirostat(0),
	)
	if err != nil {
		return nil, err
	}

	testCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_, err = llm.Call(testCtx, "Hi", llms.WithTemperature(0.1))
	if err != nil {
		return nil, fmt.Errorf("model %s not available: %w", model, err)
	}

	searchTools := []langchaintools.Tool{tools.NewSearch()}

	return &Pipeline{
		researcher: agents.NewSimpleResearcher(llm, searchTools),
		writer:     agents.NewWriter(llm),
		editor:     agents.NewEditor(llm),
	}, nil
}

func (p *Pipeline) Run(ctx context.Context, topic string) (string, error) {
	log.Println("[1/3] Research...")
	log.Println("[Streaming] Starting...")
	research, err := p.researcher.ExecuteWithStream(ctx, topic, func(chunk string) {
		fmt.Print(chunk)
		os.Stdout.Sync()
	})
	fmt.Println()
	if err != nil {
		return "", fmt.Errorf("research: %w", err)
	}
	log.Printf("      -> %d chars", len(research))

	log.Println("")
	log.Println("[2/3] Writing...")
	log.Println("[Streaming] Starting...")
	draft, err := p.writer.ExecuteWithStream(ctx, research, func(chunk string) {
		fmt.Print(chunk)
		os.Stdout.Sync()
	})
	fmt.Println()
	if err != nil {
		return "", fmt.Errorf("writing: %w", err)
	}
	log.Printf("      -> %d chars", len(draft))

	log.Println("")
	log.Println("[3/3] Editing...")
	log.Println("[Streaming] Starting...")
	final, err := p.editor.ExecuteWithStream(ctx, draft, func(chunk string) {
		fmt.Print(chunk)
		os.Stdout.Sync()
	})
	fmt.Println()
	if err != nil {
		return "", fmt.Errorf("editing: %w", err)
	}
	log.Printf("      -> %d chars", len(final))

	return final, nil
}

func main() {
	log.Println("Multi-Agent Pipeline Demo")
	log.Println("=========================")

	pipeline, err := NewPipeline()
	if err != nil {
		log.Fatal(err)
	}

	topic := "The impact of AI on software engineering workflows"

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	result, err := pipeline.Run(ctx, topic)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("")
	fmt.Println("=========================")
	fmt.Println("FINAL OUTPUT")
	fmt.Println("=========================")
	fmt.Println(result)
}

To run this:

# Get a Brave Search API key from https://brave.com/search/api/
export BRAVE_API_KEY="your-api-key-here"

# Pull a local model (skip if using Ollama Cloud)
ollama pull qwen3.5:9b

# Run the pipeline
go run main.go

For Ollama Cloud models like minimax-m2.7:cloud, you need a Pro or Max subscription. Local models like qwen3.5:9b or llama3.1:8b run on your own hardware.

Without BRAVE_API_KEY, the search tool returns mock data.

Handling Failures

The naive pipeline fails on any error. Production needs fallback logic.

Retry the research step a few times before giving up:

func (p *Pipeline) RunWithFallback(ctx context.Context, topic string) (string, error) {
	var research string
	var err error

	for i := 0; i < 3; i++ {
		research, err = p.researcher.ExecuteWithStream(ctx, topic, nil)
		if err == nil {
			break
		}
		log.Printf("[Pipeline] Research attempt %d failed: %v", i+1, err)
		time.Sleep(time.Second * 2)
	}

	if err != nil {
		log.Println("[Pipeline] Using fallback research")
		research = fmt.Sprintf("Basic information about: %s", topic)
	}

	draft, err := p.writer.ExecuteWithStream(ctx, research, nil)
	if err != nil {
		return "", fmt.Errorf("writing: %w", err)
	}

	return p.editor.ExecuteWithStream(ctx, draft, nil)
}

Sometimes you want intermediate results even if later stages fail:

type PipelineResult struct {
	Research  string
	Draft     string
	Final     string
	Errors    []error
	Completed bool
}

func (p *Pipeline) RunPartial(ctx context.Context, topic string) PipelineResult {
	result := PipelineResult{}

	research, err := p.researcher.ExecuteWithStream(ctx, topic, nil)
	if err != nil {
		result.Errors = append(result.Errors, fmt.Errorf("research: %w", err))
		return result
	}
	result.Research = research

	draft, err := p.writer.ExecuteWithStream(ctx, research, nil)
	if err != nil {
		result.Errors = append(result.Errors, fmt.Errorf("writer: %w", err))
		result.Draft = "[Failed to generate draft]"
		result.Final = research // Return research as fallback
		return result
	}
	result.Draft = draft

	final, err := p.editor.ExecuteWithStream(ctx, draft, nil)
	if err != nil {
		result.Errors = append(result.Errors, fmt.Errorf("editor: %w", err))
		result.Final = draft // Return draft if editor fails
		return result
	}
	result.Final = final
	result.Completed = true

	return result
}

Parallel Execution

When agents can work independently, use goroutines. You will need to import sync:

import "sync"

func (p *Pipeline) ResearchParallel(ctx context.Context, topics []string) ([]string, error) {
	results := make([]string, len(topics))
	errList := make([]error, len(topics))

	var wg sync.WaitGroup
	for i, topic := range topics {
		wg.Add(1)
		go func(idx int, t string) {
			defer wg.Done()
			results[idx], errList[idx] = p.researcher.ExecuteWithStream(ctx, t, nil)
		}(i, topic)
	}
	wg.Wait()

	var combined []string
	for i, err := range errList {
		if err != nil {
			log.Printf("[Parallel] Topic %d failed: %v", i, err)
			continue
		}
		combined = append(combined, results[i])
	}

	if len(combined) == 0 {
		return nil, fmt.Errorf("all parallel research tasks failed")
	}

	return combined, nil
}

Don't Let Context Bleed

The biggest trap in multi-agent systems is reasoning from earlier agents confusing the ones downstream. When agents share memory, you get contamination.

Use explicit handoffs with structured data:

type ResearchOutput struct {
	Topic       string            `json:"topic"`
	KeyFacts    []string          `json:"key_facts"`
	Sources     []string          `json:"sources"`
	Statistics  map[string]string `json:"statistics"`
	RawNotes    string            `json:"raw_notes"`
}

func ParseResearchOutput(raw, topic string) *ResearchOutput {
	output := &ResearchOutput{
		Topic:    topic,
		RawNotes: raw,
	}
	// Attempt JSON parsing here...
	return output
}

func (w *WriterAgent) ExecuteFromResearch(
	ctx context.Context,
	research *ResearchOutput,
	handler StreamHandler,
) (string, error) {
	prompt := fmt.Sprintf(`
Write an article about "%s" using these research findings:

Key facts: %v
Sources: %v
Statistics: %v
Notes: %s`,
		research.Topic,
		research.KeyFacts,
		research.Sources,
		research.Statistics,
		research.RawNotes,
	)

	return w.callLLM(ctx, prompt, "You are a professional writer.", 0.7, handler)
}

Debugging

When things break, you need to know which agent failed and why:

package observability

import (
	"context"
	"log"
	"time"
)

type AgentTracer struct {
	AgentName string
}

func (t *AgentTracer) Trace(
	ctx context.Context,
	input string,
	fn func() (string, error),
) (string, error) {
	start := time.Now()
	log.Printf("[Trace] %s started | input: %d chars", t.AgentName, len(input))

	output, err := fn()

	duration := time.Since(start)
	if err != nil {
		log.Printf("[Trace] %s FAILED after %v | error: %v", t.AgentName, duration, err)
	} else {
		log.Printf("[Trace] %s completed in %v | output: %d chars", t.AgentName, duration, len(output))
	}

	return output, err
}

Picking Models (And Knowing When to Stop)

Not every agent needs the same model. Different models excel at different tasks.

Research is mostly factual lookup. Fast models (Minimax, Qwen3.5, Llama) do this well. Writing needs creativity and flow. Premium models (Claude Opus 4, GPT-4o) shine here. Editing is about following rules. Mid-tier models (Claude Sonnet, GPT-4o-mini) are sufficient.

The quality gap is task-dependent. A fast model extracting facts from Brave Search performs nearly as well as Opus 4. But ask it to write engaging prose and the gap is obvious.

Start with fast models everywhere. Upgrade individual agents to premium only when you see specific quality gaps. Usually this means keeping research on fast models, upgrading writing to premium if the output needs to be engaging, and using mid-tier for editing, or skipping the editor entirely if the writer is good enough.

More agents are not always better. Agent overhead is real: each one adds latency, and debugging complexity grows quadratically with the number of handoffs. I follow one rule: one agent per distinct kind of work. Research, writing, and editing are different. "Research part A" and "Research part B" are the same task. Use parallel execution, not separate agents.

Things That Will Bite You

The working code is on GitHub.

Before you productionize, watch out for these:

Context Window Limits

Each agent passes its full output to the next. A 4K research output becomes a 6K draft, then an editor prompt that includes both. The growth compounds fast. Most local models have 32K-128K context windows, so you will not hit the hard limit immediately, but you pay for every token in latency and cost. Track your per-stage token counts or your pipeline will balloon unnecessarily.

Go Concurrency Traps

The parallel research example above shares the same llms.Model across goroutines. Most LLM providers rate-limit by API key, not by connection. If you spawn 10 parallel researchers against Ollama Cloud, you will hit quota errors. Add a semaphore or pool your model instances.

Streaming and Timeouts

The ExecuteWithStream pattern looks clean but complicates error handling. If the LLM connection drops mid-stream, you get a partial response that looks like success. Check your response length against expected ranges, or validate the output has a proper ending marker before declaring victory.

Start with a 2-agent pipeline. Add the editor only when you see specific failure modes it would fix. Measure latency at each step. Fix the slowest agent first.

Asaduzzaman Pavel

About the Author

Asaduzzaman Pavel is a Software Engineer who actually enjoys the friction of a well-architected system. He has over 15 years of experience building high-performance backends and infrastructure that can handle the real-world chaos of scale.

Currently looking for new opportunities to build something amazing.