
How to Set Up the LGTM Stack with Prometheus and OpenTelemetry
Introduction
Modern applications generate vast amounts of logs, metrics, and traces. To monitor and troubleshoot them effectively, we need an observability stack that handles all three signals. The LGTM stack (Loki, Grafana, Tempo, and Mimir), combined with Prometheus and OpenTelemetry, covers logs, metrics, and traces. This guide uses Prometheus for metrics; Mimir, which adds long-term, horizontally scalable storage for Prometheus metrics, is not covered further.
In this guide, we will explore each component of the stack, its use cases, and how the components work together.
Understanding the Components
1. Loki (Log Aggregation)
Loki is a log aggregation system designed for efficiency. Unlike full-text systems such as the ELK stack, Loki indexes only the labels (metadata) attached to each log stream, not the log content itself, which keeps the index small and storage cost low.
Use Cases:
- Centralized log management
- Debugging application failures
- Searching logs efficiently with LogQL
Example LogQL Queries:

    {job="nginx"} |= "error"
    {app="myapp"} | json | request_time > 500
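To make the split between indexed labels and unindexed log content concrete, here is a minimal sketch of the JSON body accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`). The helper names, the label values, the sample log line, and the `http://loki:3100` address are assumptions for illustration; in practice an agent or the OpenTelemetry Collector usually ships logs for you.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serializable body for Loki's /loki/api/v1/push endpoint.

    `labels` become the indexed stream labels; `lines` are stored as
    unindexed log content.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(payload, url="http://loki:3100/loki/api/v1/push"):
    """POST the payload to Loki. The URL is a placeholder, not a real host."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_push_payload(
    {"job": "nginx", "app": "myapp"},
    ["GET /checkout 500 error upstream timed out"],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
# push(payload)  # uncomment once a Loki instance is reachable
```

Only `job` and `app` end up in the index here, which is why a query starts with a label selector and then filters the content with `|=`.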
2. Prometheus (Metrics Collection)
Prometheus is a monitoring system and time-series database that scrapes metrics from systems and applications over HTTP and stores them for querying and alerting.
Use Cases:
- Monitor system performance (CPU, memory, disk, network)
- Track API latency and error rates
- Alert on abnormal conditions
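Example PromQL Queries:

    # CPU usage: fraction of CPU time spent in user mode over the last 5 minutes
    sum(rate(node_cpu_seconds_total{mode="user"}[5m])) / sum(rate(node_cpu_seconds_total[5m]))

    # Memory usage: fraction of total memory in use
    (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes

    # HTTP request latency: 95th percentile over the last 5 minutes
    histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))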
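Latency in Prometheus is usually recorded as cumulative histogram buckets (the `_bucket` series in the query above), and a percentile is estimated from those buckets rather than read directly. The following is a simplified sketch of that estimation using linear interpolation inside the matching bucket. The function name mirrors PromQL's `histogram_quantile`, but this is an illustration with made-up bucket counts, not Prometheus's actual implementation, and it works on raw counts rather than per-second rates.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket, like the `le` series of a
    Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request durations in seconds: 100 observations in total.
example_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p90 = histogram_quantile(0.9, example_buckets)
print(f"estimated p90 latency: {p90:.3f}s")
```

Because the estimate is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.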
3. Tempo (Distributed Tracing)
Tempo is a distributed tracing backend. It stores traces so you can follow a single request as it flows across microservices and see where time is spent or where it fails.
Use Cases:
- Find slow API requests
- Debug request failures across services
- Monitor service dependencies
Example in Grafana:
- Trace ID lookup for debugging slow requests
- Service dependency visualization
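A trace ID can follow a request across services because each service forwards it to the next one, commonly in the W3C Trace Context `traceparent` HTTP header. The sketch below shows the header layout and how a downstream call keeps the trace ID while getting a new span ID. The helper names are hypothetical, and OpenTelemetry SDKs normally handle this propagation for you.

```python
import re
import secrets

# version "00" - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace: random trace ID and span ID."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent_header):
    """Keep the trace ID and flags, mint a new span ID for the next hop."""
    match = TRACEPARENT_RE.match(parent_header)
    if match is None:
        raise ValueError(f"not a valid traceparent header: {parent_header!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print("incoming:", incoming)
print("outgoing:", outgoing)
```

The shared trace ID (the second field) is the value you would paste into Grafana's trace ID lookup.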
4. Grafana (Visualization & Alerting)
Grafana is a visualization and alerting tool that queries Loki, Prometheus, and Tempo as data sources.
Use Cases:
- Create interactive dashboards
- Set up alerts based on logs and metrics
- Correlate logs, metrics, and traces
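Correlation generally depends on a shared identifier. One common approach, sketched below with only the standard library, is to write the trace ID into every log line as JSON so that a LogQL `| json` stage can extract it and Grafana can link the log line to the matching trace in Tempo. The `trace_id` field name, the `checkout` logger, and the sample ID are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via `extra={"trace_id": ...}`; None when not provided.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

logger.error(
    "payment failed",
    extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"},
)
```

In a real service the trace ID would come from the active span rather than a hard-coded string.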
5. OpenTelemetry (Instrumentation & Telemetry)
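OpenTelemetry provides vendor-neutral APIs, SDKs, and a Collector for generating, processing, and exporting logs, metrics, and traces.
Use Cases:
- Instrument applications with little or no code change by using auto-instrumentation
- Process and export data to Loki, Prometheus, and Tempo through the OpenTelemetry Collector
- Reduce trace storage volume via sampling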
Example: Sending Traces with Python

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    # Register a global tracer provider and get a tracer for this module.
    trace.set_tracer_provider(TracerProvider())
    tracer = trace.get_tracer(__name__)

    # Export finished spans in batches over OTLP/gRPC to the collector.
    exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317")
    span_processor = BatchSpanProcessor(exporter)
    trace.get_tracer_provider().add_span_processor(span_processor)

    # Everything inside the "with" block is recorded as one span.
    with tracer.start_as_current_span("test-span"):
        print("Hello from OpenTelemetry!")
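Sampling, mentioned in the use cases above, can be illustrated without the SDK. The sketch below keeps a fixed fraction of traces by comparing the low 64 bits of the trace ID against a threshold, so every service that sees the same trace ID makes the same decision. It is loosely modeled on ratio-based samplers such as the OpenTelemetry Python SDK's `TraceIdRatioBased`; the function name and the 10% ratio are assumptions for illustration.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a trace ID


def should_sample(trace_id, ratio):
    """Deterministically keep roughly `ratio` of traces based on the trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Simulate 10,000 random 128-bit trace IDs and keep about 10% of them.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.1) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either kept in full or dropped in full, rather than losing individual spans.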
How LGTM + Prometheus + OpenTelemetry Work Together
Service | Type | Purpose | Use Case |
---|---|---|---|
Loki | Logs | Aggregation & Search | Centralized logging, debugging |
Prometheus | Metrics | Time-series database | Monitor system health, alerting |
Tempo | Tracing | Distributed tracing | Debugging service failures |
Grafana | Visualization | Dashboards & Alerts | Correlation of data |
OpenTelemetry | Instrumentation | Telemetry pipeline | Unifying logs, metrics, traces |
Real-World Use Case Example
Scenario: Monitoring a Microservices-Based E-commerce Platform
API Debugging:
- Loki: Search for errors in checkout logs.
- Prometheus: Analyze API response times.
- Tempo: Trace slow transactions.
Server Health Monitoring:
- Prometheus: Check CPU/memory usage.
- Grafana: Create alert rules for high usage.
- Loki: Investigate logs for crashes.
User Transactions Tracking:
- OpenTelemetry: Capture request flow.
- Tempo: Identify bottlenecks.
- Loki: Retrieve error logs.
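The checks in this scenario are usually done from Grafana, but each backend also exposes an HTTP API that can be scripted. The sketch below only builds the request URLs for Prometheus (`/api/v1/query`), Loki (`/loki/api/v1/query_range`), and Tempo (`/api/traces/<traceID>`). The host names, ports, the `checkout` label value, and the sample trace ID are assumptions for illustration and will differ in a real deployment.

```python
from urllib.parse import urlencode

# Placeholder addresses; replace them with your own services.
PROMETHEUS_URL = "http://prometheus:9090"
LOKI_URL = "http://loki:3100"
TEMPO_URL = "http://tempo:3200"


def prometheus_query_url(promql):
    """URL for an instant PromQL query."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"


def loki_query_range_url(logql, start_ns, end_ns, limit=100):
    """URL for a LogQL query over a time range (nanosecond Unix timestamps)."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"


def tempo_trace_url(trace_id):
    """URL for fetching a single trace by ID."""
    return f"{TEMPO_URL}/api/traces/{trace_id}"


end_ns = 1_700_000_000_000_000_000
start_ns = end_ns - 15 * 60 * 1_000_000_000  # the preceding 15 minutes

print(loki_query_range_url('{app="checkout"} |= "error"', start_ns, end_ns))
print(prometheus_query_url("rate(http_request_duration_seconds_count[5m])"))
print(tempo_trace_url("4bf92f3577b34da6a3ce929d0e0e4736"))
```

Each URL can be fetched with any HTTP client once the services are reachable.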
Deployment & Next Steps
You can deploy LGTM, Prometheus, and OpenTelemetry using Docker, Kubernetes, or NixOS. Here are some next steps:
✅ Deploy with Kubernetes using Helm.
✅ Enable alerting in Prometheus/Grafana.
✅ Use Fluent Bit for log collection.
Would you like a step-by-step setup guide? Let me know in the comments! 🚀