Calling Azure OpenAI from .NET: Setup, Authentication, and Deployments

By Ivan Gechev

Azure OpenAI and the standard OpenAI API share the same underlying models, but they're not the same service. The endpoints are different, the authentication works differently, and — the one that trips up most developers coming from the OpenAI SDK — you don't refer to models by name. You refer to them as deployments that you configure yourself in the Azure Portal.

In this article, we'll install the right NuGet packages, authenticate with an API key, create a client targeting a specific deployment, and make both standard and streaming completions.

Let's get started.

Installing the Azure OpenAI .NET SDK

We start with a new ASP.NET Core Web API project and clean up the default boilerplate until we have a minimal Program.cs:

Program.cs
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenApi();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
	app.MapOpenApi();
}

app.UseHttpsRedirection();

app.Run();

Then we install the Azure OpenAI package:

dotnet add package Azure.AI.OpenAI

Azure.AI.OpenAI is Microsoft's official SDK for Azure OpenAI. It provides AzureOpenAIClient, which is our entry point for everything we'll do. The underlying OpenAI package gets installed transitively — its types, like ChatClient and ChatMessage, are what we work with day-to-day.

Deployments vs. Model Names

Before we write any code, it's worth understanding the most common source of confusion when moving from the standard OpenAI API to Azure.

With the standard OpenAI SDK, we specify a model name directly:

Program.cs
var client = new OpenAIClient("some-api-key");
var chatClient = client.GetChatClient("gpt-5.4-mini");

Azure doesn't work this way. We can't request gpt-5.4-mini by name. Instead, we go to the Azure AI Foundry portal, create a deployment, and give it a name — say, general-purpose-gpt-5.4-deployment. That deployment name is what we use in our code:

Program.cs
var azureClient = new AzureOpenAIClient(
	new Uri("deployment-base-address"),
	new ApiKeyCredential("some-api-key"));

var chatClient = azureClient.GetChatClient("general-purpose-gpt-5.4-deployment"); // deployment name, not model name

The deployment name can be anything we choose — named after the model, the feature it serves, or the environment. The important thing is that our code matches what we configured in Azure.

Setting Up the AzureOpenAIClient

Let's build a small service that generates descriptions for cats. We'll start with the interface:

ICatDescriptionService.cs
public interface ICatDescriptionService
{
	Task<string> DescribeCatAsync(
		string name,
		string breed,
		CancellationToken cancellationToken = default);
}

Before we implement it, we need a settings class and a corresponding section in appsettings.json following the options pattern:

AzureOpenAiSettings.cs
public sealed class AzureOpenAiSettings
{
	[Required]
	public required string BaseAddress { get; init; }

	[Required]
	public required string Deployment { get; init; }
}
appsettings.json
{
  "AzureOpenAi": {
    "BaseAddress": "https://my-resource.openai.azure.com/",
    "Deployment": "general-purpose-gpt-5.4-deployment"
  }
}

Now let's implement the service:

CatDescriptionService.cs
public sealed class CatDescriptionService : ICatDescriptionService
{
	private readonly ChatClient _client;
	private readonly ILogger<CatDescriptionService> _logger;

	public CatDescriptionService(
		ILogger<CatDescriptionService> logger,
		IOptions<AzureOpenAiSettings> options)
	{
		_logger = logger;

		var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")
			?? throw new InvalidOperationException("AZURE_OPENAI_API_KEY is not set.");

		var azureClient = new AzureOpenAIClient(
			new Uri(options.Value.BaseAddress),
			new ApiKeyCredential(apiKey));

		_client = azureClient.GetChatClient(options.Value.Deployment);
	}
}

We retrieve the API key from an environment variable and throw if it's missing. We then create an AzureOpenAIClient using the configured endpoint and credential, and from that we get a ChatClient tied to our specific deployment.

Notice that BaseAddress and Deployment live in configuration while the API key comes from an environment variable. This is intentional — configuration is fine in appsettings.json, but secrets shouldn't be committed to source control, not even in development.

Finally, let's register everything in Program.cs:

Program.cs
builder.Services.AddSingleton<ICatDescriptionService, CatDescriptionService>();

builder.Services.AddOptions<AzureOpenAiSettings>()
	.BindConfiguration("AzureOpenAi")
	.ValidateDataAnnotations()
	.ValidateOnStart();

We register CatDescriptionService as a singleton — ChatClient is thread-safe and designed to be reused, so there's no need to create a new instance per request. We also bind and validate our settings at startup so any misconfiguration is caught immediately rather than on the first request. Note that the section name passed to BindConfiguration() must match the key in appsettings.json exactly.

Making a Chat Completion Request

Let's now implement the DescribeCatAsync() method:

CatDescriptionService.cs
public async Task<string> DescribeCatAsync(
	string name,
	string breed,
	CancellationToken cancellationToken = default)
{
	ChatMessage[] messages =
	[
		new SystemChatMessage(
			"You are a helpful assistant " +
			"that writes charming descriptions" +
			" of cats for a pet adoption website."),

		new UserChatMessage(
			$"Write a short, warm description " +
			$"for a cat named {name} who is a {breed}."),
	];

	try
	{
		var completion = await _client.CompleteChatAsync(
			messages,
			new ChatCompletionOptions(),
			cancellationToken);

		var result = completion.Value;

		return result.Content[0].Text;
	}
	catch (Exception ex)
	{
		_logger.LogError(
			ex,
			"Azure OpenAI completion failed for {Name}.",
			name);

		throw;
	}
}

We build an array of ChatMessage objects. A SystemChatMessage sets the behavior and tone of the model for this conversation. A UserChatMessage is the actual request. We pass both to CompleteChatAsync() and read back the first content item's text.

Let's wire this up to an endpoint and test it:

Program.cs
app.MapGet("/cat-descriptions", async (
	[FromQuery] string name,
	[FromQuery] string breed,
	[FromServices] ICatDescriptionService service,
	CancellationToken cancellationToken) =>
{
	var description = await service
		.DescribeCatAsync(name, breed, cancellationToken);

	return TypedResults.Ok(
		new
		{
			Name = name,
			Breed = breed,
			Description = description
		});
});

Let's send a GET request to https://localhost:7195/cat-descriptions?name=Cody&breed=American%20Shorthair and inspect the result:

{
    "name": "Cody",
    "breed": "American Shorthair",
    "description": "Cody is a sweet American Shorthair with a gentle spirit and an easygoing charm. He’s the kind of cat who can brighten a room with his calm presence and make every day feel a little cozier. Cody is ready to find a loving home where he can share his quiet affection and companionship."
}

If you're interested in a deeper dive into Minimal APIs, check out my Minimal APIs in ASP.NET Core course.

Streaming Chat Completions in ASP.NET Core

CompleteChatAsync() waits for the entire response before returning. For longer responses that can feel slow. Streaming sends back content as it's generated so we can start displaying it immediately.

Let's add another method to our interface:

ICatDescriptionService.cs
public interface ICatDescriptionService
{
	Task<string> DescribeCatAsync(
		string name,
		string breed,
		CancellationToken cancellationToken = default);

	IAsyncEnumerable<string> StreamCatDescriptionAsync(
		string name,
		string breed,
		CancellationToken cancellationToken = default);
}

We add the StreamCatDescriptionAsync() method. It takes the same arguments but returns an IAsyncEnumerable<string>.

Let's move on to its implementation:

CatDescriptionService.cs
public async IAsyncEnumerable<string> StreamCatDescriptionAsync(
	string name, 
	string breed,
	[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
	ChatMessage[] messages =
	[
		new SystemChatMessage(
			"You are a helpful assistant " +
			"that writes charming descriptions" +
			" of cats for a pet adoption website."),

		new UserChatMessage(
			$"Write a short, warm description " +
			$"for a cat named {name} who is a {breed}."),
	];

	AsyncCollectionResult<StreamingChatCompletionUpdate> stream;

	try
	{
		stream = _client.CompleteChatStreamingAsync(
			messages,
			new ChatCompletionOptions(),
			cancellationToken);
	}
	catch (Exception ex)
	{
		_logger.LogError(
			ex, 
			"Failed to initiate streaming for {Name}.",
			name);

		throw;
	}

	await foreach (var update in stream.WithCancellation(cancellationToken))
	{
		foreach (var part in update.ContentUpdate)
		{
			if (!string.IsNullOrEmpty(part.Text))
			{
				yield return part.Text;
			}
		}
	}
}

We start by using the [EnumeratorCancellation] attribute on the CancellationToken parameter. It's easy to overlook but important. Callers can pass a cancellation token either as a direct argument or via WithCancellation() on the returned enumerable. Without the attribute, the second approach silently does nothing — the iterator never sees that token and runs to completion regardless. With it, the compiler merges both tokens automatically.
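To see the attribute at work outside Azure, here's a minimal, self-contained sketch (CancellationDemo and CountAsync are illustrative names, not SDK types). The token is supplied only through WithCancellation(), and the iterator still observes it because [EnumeratorCancellation] forwards it to the parameter:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
var received = new List<int>();

try
{
	// The token is passed via WithCancellation(), not as a direct argument.
	await foreach (var n in CancellationDemo.CountAsync(100).WithCancellation(cts.Token))
	{
		received.Add(n);
		if (n == 3) cts.Cancel();
	}
}
catch (OperationCanceledException)
{
	// Expected: thanks to [EnumeratorCancellation], the iterator saw the token.
}

Console.WriteLine(string.Join(", ", received)); // 1, 2, 3

public static class CancellationDemo
{
	public static async IAsyncEnumerable<int> CountAsync(
		int max,
		[EnumeratorCancellation] CancellationToken cancellationToken = default)
	{
		for (var i = 1; i <= max; i++)
		{
			cancellationToken.ThrowIfCancellationRequested();
			yield return i;
			await Task.Yield();
		}
	}
}
```

Remove the attribute and the loop runs all the way to 100 — the cancellation request never reaches the iterator.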

Next, we build the message array the same way we did in the DescribeCatAsync() method.

The most important structural decision is the split between initiation and enumeration, forced by a hard C# language restriction: the compiler doesn't allow yield return inside a try block that has a catch clause. So instead of one big try/catch wrapping everything, we call CompleteChatStreamingAsync() inside a try/catch, assign the result to a variable declared outside it, and exit the try/catch entirely before the first yield. Errors during stream setup — authentication failures, invalid deployment names, network issues — are caught and logged there. Once we have a live stream object in hand, we yield chunks freely with no try/catch in the way.

Each streaming update contains a ContentUpdate collection rather than a single text value, because the model can produce multiple content fragments in one event. We also skip null or empty parts, since some updates — such as the ones that open and close the stream — carry metadata but no text. This way users receive only actual content, and they receive it immediately as it's generated rather than waiting for the full description to complete.

Now, let's add our streaming endpoint:

Program.cs
app.MapGet("/stream-cat-descriptions", async (
	[FromQuery] string name,
	[FromQuery] string breed,
	[FromServices] ICatDescriptionService service,
	HttpResponse response,
	CancellationToken cancellationToken) =>
{
	response.Headers.ContentType = "text/event-stream";
	response.Headers.CacheControl = "no-cache";
	response.Headers.Append("X-Accel-Buffering", "no");

	await foreach (var chunk in service
		.StreamCatDescriptionAsync(name, breed, cancellationToken))
	{
		await response.WriteAsync($"data: {chunk}\n\n", cancellationToken);
		await response.Body.FlushAsync(cancellationToken);
	}
});

Unlike the completion endpoint, we can't return a value and let the framework serialize it — we need to write chunks directly to the response as they arrive. We inject HttpResponse directly to get that control.

We set Content-Type to text/event-stream, which is the Server-Sent Events content type. It tells the client this is a long-lived stream of discrete events rather than a single response body. Cache-Control: no-cache tells intermediaries not to cache or reuse the response. X-Accel-Buffering: no handles buffering specifically for Nginx, which buffers upstream responses by default — without this header, an Nginx reverse proxy would hold all chunks and flush them at once, making streaming invisible to the client.

Each chunk is written in SSE format: data: followed by the content, followed by two newlines. The double newline is what signals the end of a single event to the client — without it the browser's EventSource never fires. Finally, FlushAsync() after every write ensures ASP.NET Core doesn't buffer the output on its side. Without it, chunks pile up in the response buffer and the client receives nothing until the stream completes.
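To make the framing concrete, here's a small sketch of how a client could split such a payload back into chunks. SseParser is a hypothetical helper for illustration, not part of the service above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two SSE events, each terminated by the blank line that ends an event.
var payload = "data: Cody is a sweet\n\ndata:  American Shorthair\n\n";

foreach (var chunk in SseParser.ParseData(payload))
	Console.WriteLine(chunk);

public static class SseParser
{
	// Splits a raw SSE payload into its data values, roughly the way a
	// browser's EventSource would.
	public static IEnumerable<string> ParseData(string payload) =>
		payload
			.Split("\n\n", StringSplitOptions.RemoveEmptyEntries) // blank line ends an event
			.SelectMany(evt => evt.Split('\n'))
			.Where(line => line.StartsWith("data: ", StringComparison.Ordinal))
			.Select(line => line["data: ".Length..]); // strip the "data: " prefix
}
```

Concatenating the parsed chunks reassembles the original description, which is exactly what a streaming UI does as events arrive.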

Let's send a GET request to https://localhost:7195/stream-cat-descriptions?name=Cody&breed=American%20Shorthair in Postman:

Postman keeps the connection open and displays events as they arrive in the response pane. We'll see a live elapsed-time counter in the bottom right while the stream is active, and each data: event appears as a new row:

Azure OpenAI streaming chat completions response in Postman

Conclusion

Calling Azure OpenAI from .NET comes down to three things: installing Azure.AI.OpenAI, creating an AzureOpenAIClient pointed at our resource endpoint, and getting a ChatClient for a specific deployment name — not a model name. Once that client is in place, completions and streaming follow naturally.

Frequently Asked Questions

What is the difference between Azure OpenAI and OpenAI?

Azure OpenAI and the standard OpenAI API expose the same underlying models, but they run on different infrastructure. Azure OpenAI is hosted on Microsoft Azure, which means it inherits Azure's compliance certifications, regional data residency options, and private networking support. Authentication also works differently — Azure supports both API keys and Microsoft Entra ID managed identities, while the standard API uses only API keys. The other key difference is that Azure requires you to create a named deployment for each model you want to use, rather than referencing models directly by name in your code.

What is a deployment in Azure OpenAI?

A deployment is a named configuration you create in the Azure AI Foundry portal that maps a specific model — such as GPT-4o — to an endpoint in your Azure OpenAI resource. When you call the API from .NET code, you pass the deployment name to GetChatClient(), not the model name. This gives you control over which model version is active, lets you set quota limits per deployment, and means you can swap the underlying model without changing your application code.

How do I stream Azure OpenAI responses in ASP.NET Core?

Use CompleteChatStreamingAsync() instead of CompleteChatAsync(). It returns an AsyncCollectionResult<StreamingChatCompletionUpdate> that you iterate with await foreach. In your ASP.NET Core endpoint, inject HttpResponse directly, set Content-Type to text/event-stream, and write each chunk using response.WriteAsync() followed by response.Body.FlushAsync() to push content to the client as it arrives. Add Cache-Control: no-cache and X-Accel-Buffering: no to prevent proxies — including Nginx — from buffering the stream.