Welcome back to the AI Agents in .NET series. In the introduction, I built a basic conversational agent. In Part 1, I added database persistence, embeddings, and semantic search capabilities.

Today I'm breaking free from the built-in DevUI. Time to escape the developer dungeon.

Don't get me wrong - DevUI is fantastic for development and debugging.

Enter LibreChat - and my slightly chaotic journey to expose agents as OpenAI-compatible endpoints.

Spoiler alert: This post includes a fun debugging adventure where I thought I was done, deployed to LibreChat, patted myself on the back... and then watched everything break spectacularly. Tool-calling agents and downstream clients? Yeah, they don't play nice. The fix? A reusable delegating-agent pattern that took me some time to figure out. Learn from my pain - stay tuned for the ToolCallFilterAgent.

The Problem with DevUI

DevUI served me well. It let me test agents, inspect tool calls, and debug issues. But it has limitations:

  • Single-purpose: It's a development tool, not a production interface
  • No multi-user support: One conversation at a time
  • Limited customization: You get what you get

For real-world use, I need something more flexible. And rather than building my own chat UI (been there, done that in early 2025, no desire to repeat it), let's leverage existing open-source solutions.

Why LibreChat?

I mentioned LibreChat in my DGX Spark post, and I really like it. Like, really like it.

It's:

  • OpenAI API compatible: Speaks the same language as ChatGPT
  • Self-hosted: Your data stays yours (take that, cloud overlords)
  • Feature-rich: File uploads, conversations, presets, the whole shebang
  • Beautiful: Actually looks like something you'd want to use - not like a developer accidentally vomited JSON onto a webpage

The key insight? LibreChat doesn't care what's behind the API. It just needs something that speaks OpenAI's protocol. And guess what the Microsoft Agent Framework already supports?

The Plan

Here's what I'm building:

%%{init: {"theme": "dark"}}%%
flowchart LR
    LC[LibreChat] -->|OpenAI API| K[Knowledge API]
    K -->|Agent Framework| A[Agents]
    A --> KA[Knowledge]
    A --> KSA[KnowledgeSearch]
    A --> KTA[KnowledgeTitle]
    KSA -->|Vector Search| PG[(PostgreSQL)]

LibreChat talks to our Knowledge API using the standard OpenAI chat completions format. Our API routes requests to our agents based on the model parameter. From LibreChat's perspective, it's just another OpenAI-compatible endpoint with multiple models to choose from.

Removing DevUI, Embracing Swagger

First things first - I'm removing the DevUI dependency and redirecting to Swagger for API exploration. DevUI is great for debugging, but for an API-first approach, Swagger makes more sense. Plus, Swagger has that nice "I'm a real API" energy.

The home page now redirects to Swagger:

app.MapGet("/", () => Results.Redirect("/swagger"));

Clean and simple. The API is now focused on being a backend service. Goodbye, training wheels.
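
For completeness, here's what the standard Swashbuckle wiring looks like - a sketch only, since in this project the ConfigureKnowledge* extensions presumably register Swagger already:

// Typical Swashbuckle setup - shown for reference; assume the
// ConfigureKnowledge* extensions already handle this in the project.
builder.Services.AddEndpointsApiExplorer(); // expose minimal API metadata
builder.Services.AddSwaggerGen();           // generate the OpenAPI document

// After builder.Build():
app.UseSwagger();   // serves /swagger/v1/swagger.json
app.UseSwaggerUI(); // serves the interactive UI at /swagger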

Exposing OpenAI-Compatible Endpoints

The Agent Framework provides a beautiful MapOpenAIChatCompletions method that exposes agents as OpenAI-compatible endpoints. Each agent gets its own endpoint path. It's almost too easy (foreshadowing...).

I collect agent builders during registration and map them after building the app:

// Collect agent builders for endpoint mapping
var agentBuilders = new List<IHostedAgentBuilder>();

builder.ConfigureKnowledgeDefaults((settings, logger) =>
{
    // ... OpenAI client setup ...

    // Register agents and collect builders for endpoint mapping
    builder.Services.AddSingleton<KnowledgeSearchAgent>();
    agentBuilders.Add(builder.AddAIAgent("Knowledge", (services, key) => 
        AgentFactory.CreateKnowledgeAgent(chatClient, services, key)));
    agentBuilders.Add(builder.AddAIAgent("KnowledgeSearch", (services, key) => 
        AgentFactory.CreateKnowledgeSearchAgent(chatClient, services, key)));
});

var app = builder.Build();

// ... middleware setup ...

// Map OpenAI chat completions endpoint for each registered agent
foreach (var agentBuilder in agentBuilders)
{
    app.MapOpenAIChatCompletions(agentBuilder);
}

Now each agent has its own endpoint:

  • /Knowledge/v1/chat/completions
  • /KnowledgeSearch/v1/chat/completions

Pretty slick, right?
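
Each endpoint is directly testable, too. Here's a minimal console sketch that POSTs a standard OpenAI-style request to the Knowledge endpoint - assumptions: the API listens on http://localhost:5000 and honors the standard stream flag for non-streaming responses:

using System.Net.Http.Json; // PostAsJsonAsync lives here

using var http = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };

// Standard OpenAI chat completions payload, sent straight to the per-agent path.
var payload = new
{
    model = "Knowledge",
    stream = false, // assumption: non-streaming works for quick smoke tests
    messages = new[] { new { role = "user", content = "Hello, who are you?" } }
};

var response = await http.PostAsJsonAsync("/Knowledge/v1/chat/completions", payload);
response.EnsureSuccessStatusCode();
Console.WriteLine(await response.Content.ReadAsStringAsync());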

Setting Up LibreChat

Before I go further, let me get LibreChat running so I can test this integration. If you don't have it yet, here's the quick setup. (If you do, feel free to skip ahead and judge my configuration choices.)

Docker Compose

mkdir -p ~/librechat
cd ~/librechat
git clone https://github.com/danny-avila/LibreChat.git .
cp .env.example .env

The LibreChat Configuration

Create librechat.yaml - this is where we point LibreChat at our API:

version: 1.2.8
cache: true

endpoints:
  custom:
    - name: "Agent Framework"
      apiKey: "not-used-but-required"
      baseURL: "http://host.docker.internal:5000/v1"
      models:
        default: ["Knowledge", "KnowledgeSearch"]
        fetch: false

      titleConvo: false
      summarize: false
      forcePrompt: false
      modelDisplayLabel: "Agent Framework"
      iconURL: "https://svnscha.de/svnscha.webp"

Docker Compose Override

Create docker-compose.override.yml to mount the config:

services:
  api:
    volumes:
      - type: bind
        source: ./librechat.yaml
        target: /app/librechat.yaml
    extra_hosts:
      - "host.docker.internal:host-gateway"

The extra_hosts line is crucial - it lets containers reach the host machine where the Knowledge API runs.

Fire It Up

docker compose up -d

Navigate to http://localhost:3080, create an account (it's local, don't worry), and you should see "Agent Framework" in the endpoint list.

But wait - if you select "Knowledge" and send a message, LibreChat hits /v1/chat/completions with "model": "Knowledge". My per-agent endpoints live at /Knowledge/v1/chat/completions instead. Houston, we have a routing problem.

The Model Routing Middleware

LibreChat - like most OpenAI clients - expects a single endpoint at /v1/chat/completions, where the model parameter selects the underlying model. In this scenario, though, it should determine the routing:

{
  "model": "Knowledge",
  "messages": [{"role": "user", "content": "What's on your mind?"}]
}

I need middleware that reads the model from the request body and rewrites the path to the agent-specific endpoint. Nothing fancy, just some good old path mangling:

// Rewrite /v1/chat/completions to /{model}/v1/chat/completions based on request body
app.Use(async (context, next) =>
{
    var path = context.Request.Path.Value;
    if (path?.Equals("/v1/chat/completions", StringComparison.OrdinalIgnoreCase) == true)
    {
        context.Request.EnableBuffering();

        using var reader = new StreamReader(context.Request.Body, leaveOpen: true);
        var body = await reader.ReadToEndAsync();
        context.Request.Body.Position = 0;

        string? model = null;
        if (!string.IsNullOrEmpty(body))
        {
            try
            {
                var json = System.Text.Json.JsonDocument.Parse(body);
                if (json.RootElement.TryGetProperty("model", out var modelElement))
                {
                    model = modelElement.GetString();
                }
            }
            catch (System.Text.Json.JsonException)
            {
            }
        }

        if (string.IsNullOrWhiteSpace(model))
        {
            var logger = context.RequestServices.GetRequiredService<ILogger<Program>>();
            logger.LogWarning("Chat completions request missing required 'model' field");
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
            context.Response.ContentType = "application/json";
            await context.Response.WriteAsJsonAsync(new { 
                error = new { 
                    message = "The 'model' field is required", 
                    type = "invalid_request_error", 
                    param = "model", 
                    code = "missing_required_parameter" 
                } 
            });
            return;
        }

        context.Request.Path = $"/{model}/v1/chat/completions";
    }
    await next();
});

The key points:

  • Enable buffering: I need to read the body to extract the model, then rewind the stream for the actual handler
  • Path rewriting: Transform /v1/chat/completions → /{model}/v1/chat/completions
  • Error handling: Return a proper OpenAI-style error if model is missing

Now when LibreChat sends a request, it gets routed to the right agent automatically. Back to LibreChat - select "Knowledge" from the dropdown, send a message, and... it works! Streaming responses, proper formatting, everything.
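
You can verify the routing without LibreChat, too. The same kind of smoke test, this time against the generic endpoint with streaming enabled - the middleware rewrites the path, and the response comes back as server-sent events (again assuming http://localhost:5000):

using System.Net.Http.Json;

using var http = new HttpClient();

var payload = new
{
    model = "Knowledge", // this field decides which agent handles the request
    stream = true,
    messages = new[] { new { role = "user", content = "Say hi." } }
};

using var request = new HttpRequestMessage(HttpMethod.Post, "http://localhost:5000/v1/chat/completions")
{
    Content = JsonContent.Create(payload)
};

using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

// OpenAI-style streaming is SSE: "data: {...}" chunks, ending with "data: [DONE]".
using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());
while (await reader.ReadLineAsync() is { } line)
{
    if (line.StartsWith("data: ")) Console.WriteLine(line);
}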

At this point, I thought I was done. Celebrated with a coffee. Time to test KnowledgeSearch...

The Tool Call Disaster (And How I Fixed It)

I switched to the KnowledgeSearch model in LibreChat, asked it to search for something, and... nothing. The request just hung. Then cancelled. No error message. No response. Just the cold, judgmental silence of broken software.

I dug through LibreChat's source code, traced the request flow, added logging everywhere. Console.WriteLine debugging like it's 2005.

Note: We should get to Observability very soon...

The Problem

My KnowledgeSearch agent uses tools - specifically the SearchConversationHistory function. When the agent executes a tool, the response stream includes FunctionCallContent and FunctionResultContent alongside the text. This is how the Agent Framework communicates "I'm calling a tool" and "here's what the tool returned."

But here's the thing - LibreChat receives this stream. It sees a function call. It thinks: "Oh, the model wants me to execute a function!" So it tries to find and execute that function. It can't (because the function lives in my backend, not LibreChat). So it gives up and cancels the request. Rude.

[Our Agent] → "I'll search for that" → FunctionCallContent{SearchConversationHistory}
[Our Agent] → [executes tool internally]
[Our Agent] → FunctionResultContent{results...}
[Our Agent] → "Based on the search, here's what I found..."
          ↓
[LibreChat] → Sees FunctionCallContent → "I need to call this function!"
[LibreChat] → Can't find function → Cancels request
          ↓
[User] → "Why isn't anything happening?" 😤

The frustrating part? You can't disable this behavior in LibreChat. It's doing the right thing for its use case - if it sends a function call to an upstream model, it expects to handle the response. But I'm the upstream model, and I've already handled my own tool calls. We're both right, and that's the most annoying kind of bug.
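
If you want to see the culprit with your own eyes, dump the content types coming out of the stream. A diagnostic sketch - assumptions: agent holds the unwrapped KnowledgeSearch AIAgent, and Microsoft.Extensions.AI is imported for ChatMessage and ChatRole:

// Enumerate the raw streaming updates and print what kind of content each carries.
var messages = new[] { new ChatMessage(ChatRole.User, "Search my notes for pgvector") };

await foreach (var update in agent.RunStreamingAsync(messages, thread: null, options: null, CancellationToken.None))
{
    foreach (var content in update.Contents)
    {
        // Before filtering: TextContent interleaved with FunctionCallContent and
        // FunctionResultContent. After wrapping with the ToolCallFilterAgent below,
        // only TextContent remains.
        Console.WriteLine(content.GetType().Name);
    }
}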

The Solution

Don't send tool call content downstream. Filter it out before it leaves the API. If LibreChat never sees the tool calls, it can't get confused by them. *taps forehead*

Meet ToolCallFilterAgent - a delegating agent that wraps any agent and strips FunctionCallContent and FunctionResultContent from responses. It's like a bouncer for your API responses:

using System.Runtime.CompilerServices; // required for [EnumeratorCancellation]
using Microsoft.Agents.AI;             // AIAgent, DelegatingAIAgent, AgentThread, AgentRunOptions
using Microsoft.Extensions.AI;         // ChatMessage, FunctionCallContent, FunctionResultContent

/// <summary>
/// A delegating agent that filters out tool call content from responses.
/// This prevents downstream consumers from seeing FunctionCallContent and FunctionResultContent
/// that they cannot execute.
/// </summary>
public sealed class ToolCallFilterAgent : DelegatingAIAgent
{
    public ToolCallFilterAgent(AIAgent innerAgent) : base(innerAgent) { }

    public override async Task<AgentRunResponse> RunAsync(
        IEnumerable<ChatMessage> messages,
        AgentThread? thread,
        AgentRunOptions? options,
        CancellationToken cancellationToken)
    {
        var response = await InnerAgent.RunAsync(messages, thread, options, cancellationToken);
        response.Messages = FilterToolCalls(response.Messages);
        return response;
    }

    public override async IAsyncEnumerable<AgentRunResponseUpdate> RunStreamingAsync(
        IEnumerable<ChatMessage> messages,
        AgentThread? thread,
        AgentRunOptions? options,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var update in InnerAgent.RunStreamingAsync(messages, thread, options, cancellationToken))
        {
            yield return FilterToolCalls(update);
        }
    }

    private static IList<ChatMessage> FilterToolCalls(IEnumerable<ChatMessage> messages) =>
        messages.Select(m => new ChatMessage(m.Role,
            m.Contents.Where(c => c is not FunctionCallContent && c is not FunctionResultContent).ToList()
        )).ToList();

    private static AgentRunResponseUpdate FilterToolCalls(AgentRunResponseUpdate update) =>
        new(update.Role, update.Contents
            .Where(c => c is not FunctionCallContent && c is not FunctionResultContent).ToList());
}

Now in the AgentFactory, I wrap KnowledgeSearchAgent:

public static AIAgent CreateKnowledgeSearchAgent(IChatClient chatClient, IServiceProvider services, string key)
{
    var searchAgent = services.GetRequiredService<KnowledgeSearchAgent>();

    var agent = chatClient.CreateAIAgent(new ChatClientAgentOptions
    {
        Id = key,
        Name = key,
        ChatOptions = new ChatOptions
        {
            ConversationId = "global",
            Instructions = KnowledgeSearchSystemPrompt,
            Tools = [AIFunctionFactory.Create(searchAgent.SearchConversationHistoryAsync, "SearchConversationHistory")],
            ToolMode = ChatToolMode.Auto
        }
    });

    // Wrap with filter to prevent downstream consumers from seeing tool calls they can't execute
    return new ToolCallFilterAgent(agent);
}

The agent still uses tools internally, but clients only see the final text response. Clean and transparent. What happens in the backend stays in the backend.

Back to LibreChat - KnowledgeSearch now works! The agent searches, finds results, and responds - all without LibreChat ever knowing tools were involved. It's like magic, except it's just careful filtering.

Title Generation with KnowledgeTitleAgent

LibreChat has a feature called titleConvo - it automatically generates titles for conversations using the AI. But my main agents have tools and complex system prompts that are overkill for simple title generation. It's like using a flamethrower to light a candle.

The solution: a dedicated title agent. It's intentionally simple - no tools, no embeddings, just a focused system prompt:

/// <summary>
/// Simple agent for generating conversation titles.
/// No tools, no embeddings - just a basic helpful assistant.
/// Designed for use with LibreChat's title generation feature.
/// </summary>
public static class KnowledgeTitleAgent
{
    private const string TitleSystemPrompt = @"You are a helpful assistant that generates concise, descriptive titles for conversations.
When given conversation content, create a brief title (3-7 words) that captures the main topic or purpose.
Be specific and informative. Avoid generic titles like 'Chat' or 'Conversation'.";

    public static ChatClientAgent Create(IChatClient chatClient, IServiceProvider services, string key)
    {
        return chatClient.CreateAIAgent(new ChatClientAgentOptions
        {
            Id = key,
            Name = key,
            ChatOptions = new ChatOptions
            {
                ConversationId = "global",
                Instructions = TitleSystemPrompt,
            }
        });
    }
}

Register it alongside the other agents:

agentBuilders.Add(builder.AddAIAgent("KnowledgeTitle", (services, key) => 
    AgentFactory.CreateKnowledgeTitleAgent(chatClient, services, key)));

Now update librechat.yaml to use the title agent:

version: 1.2.8
cache: true

endpoints:
  custom:
    - name: "Agent Framework"
      apiKey: "not-used-but-required"
      baseURL: "http://host.docker.internal:5000/v1"
      models:
        default: ["Knowledge", "KnowledgeSearch", "KnowledgeTitle"]
        fetch: false

      titleConvo: true
      titleModel: "KnowledgeTitle"
      summarize: false
      forcePrompt: false
      modelDisplayLabel: "Agent Framework"
      iconURL: "https://svnscha.de/svnscha.webp"

Now LibreChat uses KnowledgeTitle specifically for generating conversation titles, while Knowledge and KnowledgeSearch handle the actual conversations.

The Complete Program.cs

Here's what Program.cs looks like after all these changes:

using System.ClientModel;
using Knowledge.Services;
using Knowledge.Shared.Agents;
using Knowledge.Shared.Extensions;
using Microsoft.Agents.AI.Hosting;
using Microsoft.Extensions.AI;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

// Collect agent builders for endpoint mapping
var agentBuilders = new List<IHostedAgentBuilder>();

builder.ConfigureKnowledgeDefaults((settings, logger) =>
{
    if (string.IsNullOrWhiteSpace(settings.ApiKey))
    {
        logger.LogWarning("No API key configured. Set Knowledge:ApiKey in user secrets or environment.");
    }

    var options = new OpenAIClientOptions();
    if (!string.IsNullOrEmpty(settings.ApiEndpoint))
    {
        options.Endpoint = new Uri(settings.ApiEndpoint);
    }

    var client = new OpenAIClient(new ApiKeyCredential(settings.ApiKey), options);
    var chatClient = client.GetChatClient("gpt-4.1").AsIChatClient();

    // Configure embedding service
    var embeddingClient = client.GetEmbeddingClient("text-embedding-3-small").AsIEmbeddingGenerator();
    builder.Services.AddEmbeddingService(embeddingClient);

    // Register agents and collect builders for endpoint mapping
    builder.Services.AddSingleton<KnowledgeSearchAgent>();
    agentBuilders.Add(builder.AddAIAgent("Knowledge", (services, key) => 
        AgentFactory.CreateKnowledgeAgent(chatClient, services, key)));
    agentBuilders.Add(builder.AddAIAgent("KnowledgeSearch", (services, key) => 
        AgentFactory.CreateKnowledgeSearchAgent(chatClient, services, key)));
    agentBuilders.Add(builder.AddAIAgent("KnowledgeTitle", (services, key) => 
        AgentFactory.CreateKnowledgeTitleAgent(chatClient, services, key)));
});

// Register background service for embedding processing
builder.Services.AddHostedService<EmbeddingBackgroundService>();

builder.Services.AddOpenAIResponses();
builder.Services.AddOpenAIConversations();

var app = builder.Build();

// Rewrite /v1/chat/completions to /{model}/v1/chat/completions based on request body
app.Use(async (context, next) =>
{
    var path = context.Request.Path.Value;
    if (path?.Equals("/v1/chat/completions", StringComparison.OrdinalIgnoreCase) == true)
    {
        context.Request.EnableBuffering();

        using var reader = new StreamReader(context.Request.Body, leaveOpen: true);
        var body = await reader.ReadToEndAsync();
        context.Request.Body.Position = 0;

        string? model = null;
        if (!string.IsNullOrEmpty(body))
        {
            try
            {
                var json = System.Text.Json.JsonDocument.Parse(body);
                if (json.RootElement.TryGetProperty("model", out var modelElement))
                {
                    model = modelElement.GetString();
                }
            }
            catch (System.Text.Json.JsonException) { }
        }

        if (string.IsNullOrWhiteSpace(model))
        {
            var logger = context.RequestServices.GetRequiredService<ILogger<Program>>();
            logger.LogWarning("Chat completions request missing required 'model' field");
            context.Response.StatusCode = StatusCodes.Status400BadRequest;
            context.Response.ContentType = "application/json";
            await context.Response.WriteAsJsonAsync(new { 
                error = new { 
                    message = "The 'model' field is required", 
                    type = "invalid_request_error", 
                    param = "model", 
                    code = "missing_required_parameter" 
                } 
            });
            return;
        }

        context.Request.Path = $"/{model}/v1/chat/completions";
    }
    await next();
});

app.UseRouting();

app.ConfigureKnowledgePipeline();

// Map OpenAI chat completions endpoint for each registered agent
foreach (var agentBuilder in agentBuilders)
{
    app.MapOpenAIChatCompletions(agentBuilder);
}

app.MapGet("/", () => Results.Redirect("/swagger"));

app.LogStartupComplete();

app.Run();

That's it. No CORS configuration needed (LibreChat makes server-side requests). No custom request/response models. No manual streaming code. The framework does the heavy lifting - I just wire it up. Sometimes the best code is the code you don't have to write.

The key insights:

  • Each agent gets its own endpoint via MapOpenAIChatCompletions
  • Inline middleware handles routing from /v1/chat/completions based on model
  • ToolCallFilterAgent hides internal tool execution from downstream clients
  • KnowledgeTitleAgent handles title generation without tool complexity

LibreChat Test

Open LibreChat, select "Agent Framework" as the endpoint, choose a model from the dropdown (Knowledge, KnowledgeSearch, or KnowledgeTitle), and start chatting. You should see:

  • Streaming responses (that satisfying typing effect)
  • Tool calls working transparently (KnowledgeSearch uses tools, but you only see the results)
  • Automatic title generation (thanks to KnowledgeTitle)

Architecture Overview

Let me step back and look at what I've built (and maybe pat myself on the back a little):

%%{init: {"theme": "dark"}}%%
flowchart TB
    subgraph Clients
        LC[LibreChat]
        API[Direct API Calls]
    end

    subgraph "Knowledge API"
        MR[Model Router Middleware]
        E1["/Knowledge/v1/chat/completions"]
        E2["/KnowledgeSearch/v1/chat/completions"]
        E3["/KnowledgeTitle/v1/chat/completions"]
    end

    subgraph Agents
        KA[Knowledge Agent]
        KSA[KnowledgeSearch Agent]
        KTA[KnowledgeTitle Agent]
        TCF[ToolCallFilterAgent]
    end

    subgraph Storage
        PG[(PostgreSQL)]
    end

    LC -->|"/v1/chat/completions"| MR
    API --> MR
    MR -->|"model: Knowledge"| E1
    MR -->|"model: KnowledgeSearch"| E2
    MR -->|"model: KnowledgeTitle"| E3
    E1 --> KA
    E2 --> TCF
    TCF --> KSA
    E3 --> KTA
    KSA -->|Vector Search| PG

The flow is clean:

  1. Request comes in to /v1/chat/completions with "model": "KnowledgeSearch"
  2. Middleware rewrites path to /KnowledgeSearch/v1/chat/completions
  3. Framework routes to the agent, which is wrapped in ToolCallFilterAgent
  4. Agent executes tools internally, filter strips tool content from response
  5. Client sees clean text output

Multiple clients, multiple agents, one simple routing pattern. It's almost... elegant?

Swagger is still available at /swagger for API exploration and testing.

Wrapping Up

I've taken the agents from a development-only DevUI to something that can serve real users through a proper chat interface. And I did it with surprisingly little code - plus one hard-won debugging lesson that I'm still a bit salty about.

The key insights from this journey:

  1. MapOpenAIChatCompletions does the heavy lifting - one line per agent, full OpenAI compatibility
  2. Each agent gets its own endpoint - clean separation, easy to test directly
  3. Simple inline middleware handles routing - LibreChat sends to /v1/chat/completions, I rewrite to /{model}/v1/chat/completions
  4. Tool-calling agents need filtering - the ToolCallFilterAgent pattern is essential when exposing agents to downstream clients that don't understand your internal tool calls
  5. Dedicated agents for specific tasks - KnowledgeTitleAgent for titles, KnowledgeSearch for search, Knowledge for general chat

What's Next?

I've got a solid foundation now - persistence, embeddings, semantic search, a proper UI, and automatic title generation. But there's more to explore:

  • Observability: OpenTelemetry, Jaeger, understanding what's happening at scale
  • Agent Loop: Exploring the magic behind automated AI Agents
  • Authentication: Proper API key validation for production deployments (because "meh, whatever" isn't a security strategy)

But first, I'm letting this settle. Play with LibreChat, see how your agents perform with real conversations, and notice what's missing. The best features come from actual use - not from staring at code and imagining what users might want.

See you in the next post. 🚀


The companion repository has been updated with the part/02-connect-librechat branch containing all the code from this post. Check it out on GitHub.