Real-Time ALB Log Analysis for Proactive Integration Recovery via Datadog Monitors, Workflows and AWS Lambda
In a world of constant change and unexpected events, the ability to adapt to new conditions is crucial. In Brazil, there’s a term “gambiarra” that signifies the art of improvisation and creative problem-solving. This approach proved useful in our work with AWS Application Load Balancer (ALB) and Elastic Load Balancer (ELB) logs. Let’s examine how we were able to improve log monitoring and analysis, and how this contributed to proactive integration recovery.
The Problem: Lack of Visibility
One of the primary challenges we faced was poor visibility into Google Calendar events. Users encountered issues, and we struggled to pinpoint who was experiencing difficulties. The main errors revolved around channels "ignoring" requests, or events not reaching the API. The worst part was the lack of access logs on our ELB/ALB, leaving us “in the dark.”
The Solution: Adding Access Logs
As a solution, we decided to enable access logs on our ALB/ELB. Since we were already using Datadog for monitoring, sending the logs there was the natural choice. We built the project with Pulumi, which lets us manage infrastructure as code (IaC). If you haven’t worked with Pulumi before, we highly recommend checking out their documentation.
What are Access Logs?
Access logs are akin to a security guard’s logbook at a crossroads, recording information about every request passing through the Load Balancer. Each log entry contains crucial data: when the request was made, who initiated it, and so on. An example entry looks like this:
127.0.0.1 - - [12/Dec/2023:11:16:03 +0000] "GET /api/event HTTP/1.1" 200 1024
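Here we see the client’s IP address, the request time, the HTTP method and path, the response code, and the response size in bytes. This entry is simplified for readability; real ALB entries carry more fields, such as the target address and processing times.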
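To make the fields concrete, here is a minimal TypeScript sketch that parses the example entry above into a typed object. The names AccessLogEntry, LOG_PATTERN and parseAccessLogLine are ours for illustration, and the pattern only covers the simplified format shown here, not the full ALB log format.

```typescript
// Fields of the simplified access log entry shown above.
interface AccessLogEntry {
  clientIp: string;
  timestamp: string;
  method: string;
  path: string;
  protocol: string;
  status: number;
  bytes: number;
}

// One capture group per field; the two "-" placeholders after the IP are skipped.
const LOG_PATTERN =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$/;

// Returns the parsed entry, or null when the line does not match the format.
function parseAccessLogLine(line: string): AccessLogEntry | null {
  const match = LOG_PATTERN.exec(line.trim());
  if (match === null) {
    return null;
  }
  const [, clientIp, timestamp, method, path, protocol, status, bytes] = match;
  return {
    clientIp,
    timestamp,
    method,
    path,
    protocol,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes),
  };
}

const entry = parseAccessLogLine(
  '127.0.0.1 - - [12/Dec/2023:11:16:03 +0000] "GET /api/event HTTP/1.1" 200 1024',
);
console.log(entry);
```

Returning null for lines that do not match keeps malformed entries from silently turning into half-filled records.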
Approach to Adding Access Logs
Step 1: Configuring Pulumi
The first step in our project was adding access logs in Pulumi. Below is an example code snippet in TypeScript that performs the configuration.
import * as aws from "@pulumi/aws";

// mySecurityGroup, mySubnetIds and myBucket are assumed to be defined elsewhere in the program.
const loadBalancer = new aws.lb.LoadBalancer("example-lb", {
    internal: false,
    loadBalancerType: "application",
    securityGroups: [mySecurityGroup.id],
    subnets: mySubnetIds,
    enableDeletionProtection: false,
    // Access logs are configured on the load balancer itself, not as a separate resource
    accessLogs: {
        bucket: myBucket.id,
        prefix: "access-logs",
        enabled: true,
    },
});
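Once logging is enabled, the load balancer writes compressed log files into the bucket under a fixed directory layout: the configured prefix, then AWSLogs, the account ID, elasticloadbalancing, the region and the date. The sketch below builds the prefix for one day, which is handy when listing or downloading logs. The function name albLogPrefixForDay and the account ID and region in the example call are placeholders, and the date is assumed to be in UTC.

```typescript
// Builds the S3 key prefix under which ALB access logs for one day are stored.
// Layout: [prefix/]AWSLogs/<account-id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/
function albLogPrefixForDay(
  prefix: string,
  accountId: string,
  region: string,
  day: Date,
): string {
  const yyyy = day.getUTCFullYear().toString();
  const mm = String(day.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(day.getUTCDate()).padStart(2, "0");
  // Strip leading and trailing slashes so the parts join cleanly
  const base = prefix.replace(/^\/+|\/+$/g, "");
  const parts = [base, "AWSLogs", accountId, "elasticloadbalancing", region, yyyy, mm, dd]
    .filter((part) => part.length > 0); // drops the prefix when it is empty
  return parts.join("/") + "/";
}

// Placeholder account ID and region, for illustration only
console.log(
  albLogPrefixForDay("access-logs", "123456789012", "us-east-1", new Date(Date.UTC(2023, 11, 12))),
);
```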
Step 2: Integrating with Datadog
After configuring the access logs, we integrated them with Datadog for effective monitoring. Datadog allows you to configure alerts and visualizations based on these logs.
Example of Configuring a Datadog Monitor:
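import * as datadog from "@pulumi/datadog";

const monitor = new datadog.Monitor("example-monitor", {
    type: "log alert",
    name: "HTTP 404 Errors",
    message: "High number of 404 errors detected!",
    // A log monitor needs a full query: search, index, rollup, time window and threshold.
    // The 5-minute window is an assumed value. The HTTP code is matched via @http.status_code,
    // because "status" in Datadog refers to the log level, not the HTTP response code.
    query: 'logs("service:my-service @http.status_code:404").index("*").rollup("count").last("5m") > 100',
    monitorThresholds: {
        critical: "100",
    },
});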
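Conceptually, a monitor like this counts matching log events in a sliding time window and compares the count with the critical threshold. The following sketch reproduces that logic locally so the alerting rule can be reasoned about or unit-tested. It is not Datadog's implementation; LogEvent, countStatus and isCritical are illustrative names, and the 5-minute window and the strict greater-than comparison are assumptions.

```typescript
interface LogEvent {
  timestampMs: number; // when the request was logged, epoch milliseconds
  status: number;      // HTTP response code
}

// Counts events with the given status inside the window that ends at nowMs.
function countStatus(
  events: LogEvent[],
  status: number,
  windowMs: number,
  nowMs: number,
): number {
  const windowStart = nowMs - windowMs;
  return events.filter(
    (e) => e.status === status && e.timestampMs > windowStart && e.timestampMs <= nowMs,
  ).length;
}

// Alert when the count exceeds the critical threshold.
function isCritical(count: number, critical: number): boolean {
  return count > critical;
}

const now = Date.UTC(2023, 11, 12, 11, 20, 0);
const events: LogEvent[] = [];
for (let i = 0; i < 120; i++) {
  events.push({ timestampMs: now - i * 1000, status: 404 }); // 120 errors in the last 2 minutes
}
events.push({ timestampMs: now - 10 * 60 * 1000, status: 404 }); // too old, outside the window
events.push({ timestampMs: now - 1000, status: 200 });           // not an error

const count = countStatus(events, 404, 5 * 60 * 1000, now);
console.log(count, isCritical(count, 100));
```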
Step 3: Automation with AWS Lambda
To make the process even more automated, we integrated AWS Lambda functions to process the logs in real time. When new logs arrive, Lambda automatically handles them and sends notifications or performs other actions.
Example implementation of a Lambda function:
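// Assumes the function is triggered through an SNS topic
exports.handler = async (event) => {
    // Each SNS record carries one message; process all of them, not only the first
    for (const record of event.Records) {
        const logs = record.Sns.Message; // message body as a string
        console.log(logs);
        // Send notifications or perform actions
        if (logs.includes("error")) {
            await sendAlert(logs); // sendAlert is assumed to be defined elsewhere
        }
    }
};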
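For easier testing, the filtering step can be separated from the alerting step. The sketch below is one way to do that in TypeScript, under a few assumptions: the event arrives through SNS, the interfaces list only the fields used here, and the alert sender is a hypothetical function passed in by the caller. Unlike the handler above, the match on "error" is case-insensitive.

```typescript
// Minimal shape of an SNS-triggered Lambda event (only the fields used here).
interface SnsRecord {
  Sns: { Message: string };
}
interface SnsEvent {
  Records: SnsRecord[];
}

// Returns every message that mentions "error", case-insensitively.
function findErrorMessages(event: SnsEvent): string[] {
  return event.Records
    .map((record) => record.Sns.Message)
    .filter((message) => message.toLowerCase().includes("error"));
}

// Hypothetical alert sender, injected so the handler can run without a network.
type AlertSender = (message: string) => Promise<void>;

// Sends one alert per matching message and returns how many were sent.
async function handleEvent(event: SnsEvent, sendAlert: AlertSender): Promise<number> {
  const errors = findErrorMessages(event);
  for (const message of errors) {
    await sendAlert(message);
  }
  return errors.length;
}

const sampleEvent: SnsEvent = {
  Records: [
    { Sns: { Message: "GET /api/event 200" } },
    { Sns: { Message: "Error: event did not reach the API" } },
  ],
};
console.log(findErrorMessages(sampleEvent));
```

Passing the sender as a parameter means a test can supply a stub that simply records the messages it receives.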
Practical Tips
- Plan Ahead: Before implementing logging, clarify which metrics and data are truly important to you.
- Automation: Use AWS Lambda to process logs in real time. This will help reduce time spent on manual operations and avoid errors.
- Monitoring and Analytics: Utilize tools like Datadog to visualize logs and identify anomalies. Configure alerts to be notified of issues.
- Testing and Debugging: Don't forget to test your solutions before relying on them. Keeping a constant eye on graphs and logs will give you a more complete picture of what’s happening.
Conclusion
Adding ALB/ELB access logs to Datadog was an important step towards improving integration monitoring and analysis. Using tools like Pulumi and AWS Lambda streamlines the team’s work and achieves greater transparency in systems. The interaction between various services enabled proactive problem solving, which is so important in today’s environment.