Application Insights & Distributed Tracing

How the App Insights SDK instruments your app: sampling, the telemetry pipeline, and reading a distributed trace.

How It Works

Application Insights is Azure Monitor's APM solution. The SDK auto-instruments your app at startup, intercepting HTTP requests, SQL queries, and outgoing calls, and correlates them into distributed traces using W3C TraceContext headers. All telemetry flows through an in-process pipeline of initializers and processors before being batched and sent to a Log Analytics Workspace.

1. SDK instruments your app at startup

AddApplicationInsightsTelemetry() registers the telemetry modules that capture every incoming HTTP request and every outgoing dependency, such as HttpClient calls and SQL commands. It reads the W3C traceparent header to continue a distributed trace started upstream.

2. Telemetry flows through the processing pipeline

Before data leaves the process, it passes through TelemetryInitializers (enrichers) and TelemetryProcessors (filters/samplers). This is where you add custom dimensions, filter health-check noise, or implement fixed-rate sampling.
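As a rough illustration of the filtering stage described above, the sketch below shows a telemetry processor that discards health-check requests. The class name `HealthCheckFilter` and the `/health` path prefix are assumptions made for this example, not something the SDK defines; it relies only on the SDK's `ITelemetryProcessor` interface and `RequestTelemetry.Url`.

```csharp
using System;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical filter: drops request telemetry for a health-check route
// before it reaches later processors or the channel.
public sealed class HealthCheckFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    // The SDK passes in the next processor when it builds the chain.
    public HealthCheckFilter(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // "/health" is an assumed probe prefix; substitute your own route.
        if (item is RequestTelemetry request &&
            request.Url?.AbsolutePath.StartsWith("/health", StringComparison.OrdinalIgnoreCase) == true)
        {
            return; // not calling _next discards the item
        }

        _next.Process(item);
    }
}
```

In an ASP.NET Core app a processor like this would typically be registered with `builder.Services.AddApplicationInsightsTelemetryProcessor<HealthCheckFilter>()`. Note that a prefix match also catches any other route that happens to begin with `/health`.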

3. Adaptive sampling reduces volume automatically

By default, App Insights uses adaptive sampling: it targets ~5 telemetry items/second and drops telemetry to stay under that limit. Critically, it samples entire operations (all spans for one trace) together to preserve trace completeness.
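To make the "whole operation together" idea concrete, here is a minimal sketch of a deterministic per-operation decision: the operation ID is hashed to a stable score, and every item sharing that ID gets the same keep-or-drop answer. The `OperationSampler` class and its use of SHA-256 are illustrative assumptions; this is not the hash the SDK actually uses.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative only: shows why items with the same operation ID
// are kept or dropped together. Not the SDK's real algorithm.
public static class OperationSampler
{
    // Maps an operation ID to a stable score in [0, 100).
    public static double Score(string operationId)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(operationId));
        uint bucket = BitConverter.ToUInt32(hash, 0);
        return bucket / (uint.MaxValue + 1.0) * 100.0;
    }

    // The decision depends only on the operation ID and the current rate,
    // so requests, dependencies, and traces of one operation agree.
    public static bool ShouldKeep(string operationId, double samplingPercentage)
        => Score(operationId) < samplingPercentage;
}
```

Because the score is a pure function of the ID, lowering the percentage shrinks the set of kept operations without splitting any single trace.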

4. W3C TraceContext propagates operation IDs across services

The SDK automatically injects traceparent: 00-{traceId}-{spanId}-01 into outgoing HTTP requests. Downstream services that also use App Insights (or OpenTelemetry) pick this up, creating the distributed trace that links all spans under one operation_Id.
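The header layout above can be inspected with the `ActivityContext` type that ships in `System.Diagnostics`. The sketch below is a small helper (the `TraceParent` class name is made up for this example) that pulls out the trace ID, the parent span ID, and the sampled flag; the sample value used in the usage note is the one from the W3C specification.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for reading a W3C traceparent value.
public static class TraceParent
{
    // Returns the trace ID, parent span ID, and sampled flag,
    // or null when the header cannot be parsed.
    public static (string TraceId, string SpanId, bool Sampled)? Read(string header)
    {
        if (!ActivityContext.TryParse(header, null, out ActivityContext context))
        {
            return null;
        }

        return (context.TraceId.ToHexString(),
                context.SpanId.ToHexString(),
                context.TraceFlags.HasFlag(ActivityTraceFlags.Recorded));
    }
}
```

For example, `TraceParent.Read("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")` yields the 32-character trace ID that App Insights surfaces as operation_Id, plus the 16-character span ID of the caller.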

5. Data lands in Log Analytics Workspace

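All telemetry is stored in a Log Analytics Workspace as typed tables: requests, dependencies, exceptions, traces, customEvents, customMetrics. (Queried directly in the workspace rather than through the App Insights resource, the same tables appear under names such as AppRequests and AppDependencies.) You query with KQL. Cross-workspace queries let you join telemetry from multiple services.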

Key Concepts

📡 TelemetryClient

The main SDK class. Inject as singleton. Tracks requests, exceptions, events, metrics, dependencies. In Azure Functions, call FlushAsync(CancellationToken) before the function exits.

🔌 Connection String

Replaces the deprecated instrumentation key. Contains the ingestion endpoint URL, which allows telemetry to be sent to sovereign clouds or custom endpoints. Format: InstrumentationKey=...;IngestionEndpoint=...
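The format is a semicolon-separated list of key=value pairs, which the sketch below splits into a dictionary. `ConnectionStringReader` is a hypothetical helper written for this example (the SDK does its own parsing internally), and any value you pass it in testing should be a placeholder rather than a real key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits "Key=Value;Key=Value" into a lookup table.
public static class ConnectionStringReader
{
    public static IReadOnlyDictionary<string, string> Parse(string connectionString)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (string pair in connectionString.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            int separator = pair.IndexOf('=');
            if (separator <= 0)
            {
                continue; // skip segments that have no key
            }

            // Split on the first '=' only, so values may contain '=' themselves.
            result[pair[..separator].Trim()] = pair[(separator + 1)..].Trim();
        }

        return result;
    }
}
```

Looking up `IngestionEndpoint` in the result shows where telemetry will be sent, which is the part an instrumentation key on its own cannot express.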

🎯Adaptive Sampling

Default sampling strategy that adjusts the rate dynamically to target ~5 telemetry items/sec. Samples entire operations together (preserves trace completeness). Can be disabled or replaced with fixed-rate sampling.

🔗 Dependency Tracking

Auto-instruments outgoing HTTP, SQL, Azure SDK calls, and Redis. Each call becomes a 'dependency' span with duration, result code, and target. Links back to the parent request via operation_Id.

🌐 W3C TraceContext

Standard distributed tracing header (traceparent). App Insights injects it into outgoing calls and reads it from incoming requests to stitch spans across service boundaries into one trace.

⚡ Live Metrics Stream

Near-real-time telemetry preview (requests/sec, failure rate, server metrics) with about one second of latency, without waiting for ingestion. Useful for deployments and incident response. Does NOT sample.

🔍 Availability Tests

Synthetic monitoring: URL ping, multi-step, or custom TrackAvailability() calls from Azure regions worldwide. Alerts when availability drops below threshold. Appears in Application Map.

.NET 8: App Insights with distributed tracing, custom events, and Function flush
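csharp
// .NET 8: App Insights with distributed tracing + custom telemetry
// The snippets below belong in separate files; usings are gathered here for brevity.
// Assumes implicit usings (System, System.Collections.Generic, System.Threading, ...).
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.WebJobs; // in-process Functions model ([FunctionName])

// Program.cs (builder is the app's WebApplicationBuilder)
builder.Services.AddApplicationInsightsTelemetry(options =>
{
    // Use connection string, NOT instrumentation key (deprecated)
    options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
});

// Telemetry initializer: enriches every telemetry item with user context
builder.Services.AddHttpContextAccessor();
builder.Services.AddSingleton<ITelemetryInitializer, UserTelemetryInitializer>();

public class UserTelemetryInitializer : ITelemetryInitializer
{
    private readonly IHttpContextAccessor _httpContextAccessor;

    public UserTelemetryInitializer(IHttpContextAccessor httpContextAccessor)
        => _httpContextAccessor = httpContextAccessor;

    public void Initialize(ITelemetry telemetry)
    {
        // GetUserId() is the app's own ClaimsPrincipal extension (not shown).
        var userId = _httpContextAccessor.HttpContext?.User.GetUserId();

        // Adds a custom dimension to EVERY piece of telemetry.
        // UserId is high cardinality: weigh this against the data-cap pitfall below.
        if (userId is not null && telemetry is ISupportProperties item)
            item.Properties["UserId"] = userId;
    }
}

// OrderService.cs: track a custom event and an exception manually
public class OrderService
{
    private readonly TelemetryClient _telemetry;
    private readonly IOrderRepository _db; // app-specific persistence type (not shown)

    public OrderService(TelemetryClient telemetry, IOrderRepository db)
    {
        _telemetry = telemetry;
        _db = db;
    }

    public async Task PlaceOrder(Order order)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            await _db.SaveAsync(order);
            sw.Stop();

            // Custom event with properties (avoid high cardinality keys!)
            _telemetry.TrackEvent("OrderPlaced", new Dictionary<string, string>
            {
                ["ProductCategory"] = order.Category, // LOW cardinality: good
                // ["OrderId"] = order.Id,            // HIGH cardinality: BAD, hits data cap fast
            }, new Dictionary<string, double>
            {
                ["OrderValueGBP"] = order.TotalGbp,
                ["SaveDurationMs"] = sw.Elapsed.TotalMilliseconds, // uses the stopwatch above
            });
        }
        catch (Exception ex)
        {
            _telemetry.TrackException(ex, new Dictionary<string, string>
            {
                ["OrderId"] = order.Id, // OK in exceptions: low volume
            });
            throw;
        }
    }
}

// ProcessOrderFunction.cs: Azure Functions MUST flush before the function exits!
public class ProcessOrderFunction
{
    private readonly TelemetryClient _telemetry;

    // The Functions host registers TelemetryConfiguration; build the client from it.
    public ProcessOrderFunction(TelemetryConfiguration configuration)
        => _telemetry = new TelemetryClient(configuration);

    [FunctionName("ProcessOrder")]
    public async Task Run([ServiceBusTrigger("orders")] string message)
    {
        try { /* process */ }
        finally
        {
            // Without this, buffered telemetry is lost when the function host recycles
            await _telemetry.FlushAsync(CancellationToken.None);
        }
    }
}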
💡 Why This Matters

Without distributed tracing, a slow API response could be caused by a SQL query, a downstream HTTP call, a message queue delay, or the app's own logic, and you'd have no way to know which without instrumenting every layer manually. App Insights gives you a unified view across all services in one trace, with zero code changes for standard operations.

Common Pitfalls

⚠Adaptive sampling does not guarantee any specific request is retained. Rare failures (0.1% error rate) may be dropped along with everything else. Exempt failed operations from sampling with a custom telemetry initializer or processor that marks them as always kept.
⚠Azure Functions, console apps, and short-lived processes must call await telemetryClient.FlushAsync(CancellationToken.None) before exit. The SDK batches in memory, so process exit drops the buffer silently.
⚠High-cardinality customDimensions (OrderId, UserId, GUIDs) on high-volume events rapidly exhaust the daily data cap. Reserve unique IDs for exceptions and low-volume events only.
⚠Instrumentation key is deprecated; use a connection string. The key alone can't target sovereign clouds and may stop working in future SDK versions.
Real-World Use Cases

1. Sampling Drops the Exact Failing Request During Incident

Scenario

A payments API had a 0.3% error rate on a specific payment provider. During a P1 incident, engineers searched App Insights for the failing requests but found no traces, only aggregate error counts in metrics.

Problem

Adaptive sampling was running at 8% (high traffic volume), so roughly 92% of operations were discarded. The failing requests, already rare at 0.3%, were mostly dropped by the sampler, and the specific operations the engineers needed were not preserved.

Solution

Added a TelemetryProcessor that exempts from sampling any operation with a response code of 400 or higher, or with a specific custom dimension present. Also added a KQL alert on the exceptions table (which is sampled differently) rather than the requests table.
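One way to sketch the keep-failures rule is shown below. It swaps the processor described above for a telemetry initializer that sets `SamplingPercentage` to 100 on failed items, which the SDK's sampling processors treat as already sampled and pass through. The class name `KeepFailuresInitializer` and the definition of "failed" are assumptions for this example, and it protects only the failed item itself, not necessarily every sibling item in the same operation.

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical initializer: marks failed telemetry so sampling leaves it alone.
public sealed class KeepFailuresInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        // "Failed" here means an unsuccessful request or dependency, or any exception.
        bool failed = telemetry switch
        {
            RequestTelemetry request => request.Success == false,
            DependencyTelemetry dependency => dependency.Success == false,
            ExceptionTelemetry => true,
            _ => false,
        };

        if (failed && telemetry is ISupportSampling item)
        {
            // An item that already carries a sampling percentage is not re-sampled.
            item.SamplingPercentage = 100;
        }
    }
}
```

It would be registered like any other initializer, for example with `builder.Services.AddSingleton<ITelemetryInitializer, KeepFailuresInitializer>()`. Since initializers can run before a request's outcome is known, it is worth verifying in your own app that failed requests actually arrive with the flag set.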

💡 Takeaway: Adaptive sampling does not guarantee that any specific request is kept. For error investigation, use fixed-rate sampling or a custom processor that always keeps failed operations. Never rely on sampling to preserve your debugging evidence.

2. Azure Function Telemetry Lost on Cold Start Recycle

Scenario

A processing function appeared healthy in App Insights: low exception rates, normal request counts. But downstream database records showed gaps: messages were being processed, yet telemetry for ~5% of invocations was missing.

Problem

The Azure Functions consumption plan recycles idle instances. Without await _telemetry.FlushAsync(), in-process telemetry buffers are dropped when the host shuts down. The SDK batches telemetry for efficiency, so a quick function that exits before the flush interval loses its data.

Solution

Added FlushAsync(CancellationToken.None) in a finally block in every function. Also configured the App Insights host option flushOnDispose: true and reduced MaxTelemetryBufferCapacity to trigger more frequent flushes.

💡 Takeaway: App Insights SDK buffers telemetry in memory and flushes on a timer. In short-lived processes (Azure Functions, console apps, Lambda), you must call FlushAsync() explicitly or telemetry is silently dropped on exit.

3. High-Cardinality Custom Dimensions Exhausting Daily Data Cap

Scenario

A team added TrackEvent('ApiCall') with a custom dimension OrderId on every API call. Within a week, their App Insights resource hit the 100GB/day data cap and began dropping telemetry across all services sharing the workspace.

Problem

Storing a unique value (OrderId, UserId, CorrelationId) in customDimensions creates millions of distinct dimension values. These inflate metric cardinality and storage volume dramatically. 500k orders/day * 10 telemetry items each = 5M rows with a unique dimension that can't be aggregated usefully.

Solution

Moved high-cardinality values (OrderId) to custom properties on exceptions only (low volume). On events, replaced them with low-cardinality dimensions (ProductCategory, Region, PaymentMethod). Set up sampling rules and data cap alerts at a 70% threshold.

💡 Takeaway: customDimensions on high-volume events should hold low-cardinality values (< 1000 unique values per key). High-cardinality dimensions (IDs, timestamps, free text) should only appear on low-volume telemetry like exceptions or custom events that fire rarely.