<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://gusdecool.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://gusdecool.github.io/" rel="alternate" type="text/html" /><updated>2025-05-01T08:42:57+00:00</updated><id>https://gusdecool.github.io/feed.xml</id><title type="html">Budi Arsana Blog</title><subtitle>Software engineer specialized in Backend API. e.g: REST API.</subtitle><author><name>Budi Arsana</name></author><entry><title type="html">Data Science Tips</title><link href="https://gusdecool.github.io/2025/05/01/data-science-tips.html" rel="alternate" type="text/html" title="Data Science Tips" /><published>2025-05-01T00:00:00+00:00</published><updated>2025-05-01T00:00:00+00:00</updated><id>https://gusdecool.github.io/2025/05/01/data-science-tips</id><content type="html" xml:base="https://gusdecool.github.io/2025/05/01/data-science-tips.html"><![CDATA[<blockquote>
  <p>A log of data science tips I picked up while learning data science.</p>
</blockquote>

<h2 id="panda-category">Pandas category</h2>
<p>To reduce memory usage, use the <code>category</code> dtype for columns whose values are repeated or categorical.
For example, the table below:</p>

<pre><code class="language-table">name | status
andrew | married
james | single
barbara | married
</code></pre>

<p>The <code>status</code> column can be converted to a category, so the memory usage is reduced.</p>
<pre><code class="language-python">import pandas as pd

df = pd.DataFrame([
    {'name': 'andrew', 'status': 'married'},
    {'name': 'james', 'status': 'single'},
    {'name': 'barbara', 'status': 'married'}
])

# note: DataFrame's dtype argument accepts a single dtype, not a per-column
# mapping, so convert the column explicitly with astype instead
df['status'] = df['status'].astype('category')

# check the byte size of the column
print(df['status'].nbytes)

# compare with the same column kept as plain object dtype
print(df['status'].astype('object').nbytes)
</code></pre>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[log of list data science tips I learned when learning data science.]]></summary></entry><entry><title type="html">Grafana Loki for Debug Local Log</title><link href="https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log.html" rel="alternate" type="text/html" title="Grafana Loki for Debug Local Log" /><published>2025-02-18T00:00:00+00:00</published><updated>2025-02-18T00:00:00+00:00</updated><id>https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log</id><content type="html" xml:base="https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log.html"><![CDATA[<blockquote>
  <p>My experience setting up and using Grafana Loki for debugging local logs.</p>
</blockquote>

<h2 id="the-issue">The issue</h2>
<p>For years, I’ve been using AWS CloudWatch &amp; Logs Insights to debug my applications on AWS infrastructure, and it works great.
But when I’m developing locally, I still read the logs manually in my JetBrains IDE and inspect them in the editor.</p>

<p>This is not ideal, because after several minutes of development the log file keeps growing and it becomes hard to find the entries I need.
It gets especially slow when the editor runs syntax processing on the log file. This forced me to delete the log file regularly to keep things fast.</p>

<h2 id="what-i-want-to-achieve">What I want to achieve</h2>
<p>I want a tool that can inspect logs like CloudWatch does, but locally.
I could send the local log file to AWS CloudWatch using the external-provider method (like in my older post), but that doesn’t feel efficient.</p>

<p>Thus began the research to find suitable tools.
Ideally I want the tool to be easy to set up, replicate, and remove.
With this specification I had already narrowed my options: it must support or run on Docker (even if there is no official Docker image, I can create one myself).</p>

<p>It should support reading logs from multiple applications.
I have multiple apps in development, and I didn’t want each app to have its own log-inspection tool, as that would make development heavier and harder to manage.</p>

<h2 id="grafana-loki">Grafana Loki</h2>
<p>After searching for a while, I found Grafana Loki intriguing, as it’s open source and supports multiple log sources.
It took me a while to understand which services to use, since Grafana has a lot of services and you need to understand their terminology.
Luckily, Grafana already provides Docker images, so all I needed to do was configure the docker-compose file and run it.</p>

<p>Below is my docker-compose setup for Grafana Loki, with comments explaining what the important lines do and describing each Grafana service used in simpler terminology:</p>

<pre><code class="language-yaml"># docker-compose.yml
networks:
  loki:

services:
  # UI to view log
  grafana:
    # This image is just the web interface used to access all the Grafana services; the services themselves are not here.
    # We need other images to run the Grafana backends, e.g. Loki.
    image: grafana/grafana:11.5.1 
    ports:
      # the port where I will access the Grafana UI
      - "3000:3000"
    networks:
      - loki
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true # since this is a local usage, I don't need authentication, so I can quickly open Grafana.
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_FEATURE_TOGGLES_ENABLE=alertingSimplifiedRouting,alertingQueryAndExpressionsStepMode
    entrypoint:
      - sh
      - -euc
      # config the Grafana data source for Loki
      # notice, it's using http://loki:3100, this is because the service name is loki, and it will be resolved to the IP address of the loki service.
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat &lt;&lt;EOF &gt; /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy 
          orgId: 1
          url: http://loki:3100
          basicAuth: false
          isDefault: true
          version: 1
          editable: false
        EOF
        /run.sh

  # the agent to collect the logs
  promtail:
    # Promtail is the agent that reads the raw log files and then sends them to Loki.
    # Note: Promtail is deprecated, but since it still works for me, that's fine.
    # It's good practice to pin the exact image version, in case a newer version has breaking changes. Avoid surprises!
    image: grafana/promtail:3.4 
    volumes:
      # the config file for promtail, I will provide below
      - ./promtail-config.yaml:/etc/promtail/config.yml 
        
      # I have multiple apps, which means multiple log files, but only one log-inspector instance to manage them all.
      # The trick to make it work: mount the log directories from all the apps into the same Promtail container.
      - ~/app-1-laravel/storage/logs:/var/log/app-1-laravel
      - ~/app-2-symfony/var/log:/var/log/app-2-symfony
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki
  
  # the service that stores and queries the logs
  loki:
    # Loki is a service that will store the log and provide the ability to query the log (searching, filtering, etc).
    image: grafana/loki:3.4
    ports:
      # the port I open, matching the datasource URL in the grafana service above.
      - "3100:3100"
    # the config file for Loki, I will provide below
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki
</code></pre>

<h2 id="promtail-config">Promtail config</h2>
<p>Below is the <code>promtail-config.yaml</code> I mentioned above that I use to configure the promtail agent.</p>

<pre><code class="language-yaml">server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push # note this is the Loki service URL, it will be resolved to the IP address of the loki service.

scrape_configs:
  - job_name: app-1-laravel
    static_configs:
      - targets:
          - localhost
        labels:
          job: app-1-laravel # label to use to filter later in UI
          __path__: /var/log/app-1-laravel/*log # get all the file with extension log in this directory
    # by default, the stack only manages to detect the datetime &amp; log level automatically,
    # so I define the remaining log fields manually using pipeline_stages below. I give an example of the log line and explain the regex after the config.
    pipeline_stages:
      - regex:
          # regex expression to parse the log line into a captured named group.
          expression: '\[(?P&lt;log_date&gt;[^]]+)\] (?P&lt;log_app&gt;[^:]+): (?P&lt;log_ip&gt;[^ ]+) "(?P&lt;log_group&gt;[^"]*)" "(?P&lt;log_supplier&gt;[^"]*)" "(?P&lt;log_message&gt;[^"]*)" (?P&lt;log_context&gt;{.*})'
      - labels:
          log_date: log_date
          log_app: log_app
          log_ip: log_ip
          log_group: log_group
          log_supplier: log_supplier
          log_message: log_message
          log_context: log_context
  - job_name: app-2-symfony
    static_configs:
      - targets:
          - localhost
        labels:
          job: app-2-symfony
          __path__: /var/log/app-2-symfony/*log
</code></pre>
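
<p>With the <code>job</code> label and the extracted fields in place, the logs can then be filtered in Grafana’s Explore view with LogQL queries along these lines (the label values here just match the config above):</p>

<pre><code class="language-logql">{job="app-1-laravel"} |= "payment intent"
{job="app-1-laravel", log_group="StripePaymentIntentCreateHttpRequest"}
</code></pre>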

<p>An example of a single log line from <code>app-1-laravel</code> looks like this:</p>

<blockquote>
  <p>[2025-02-18 02:24:14] local.INFO: 192.168.65.1 "StripePaymentIntentCreateHttpRequest" "EC" "http request to create payment intent"
{"sessionId":"f72f438b-494d-4a0d-9d74-f93cec2e492c","payload":{"public_key":"pk_test_redacted","params":{"amount":1000,"currency":"aud"}}} []</p>
</blockquote>

<p>From it we get the following info:</p>
<ol>
  <li>the datetime when the log entry was created</li>
  <li>local.INFO is the app environment and the log level</li>
  <li>192.168.65.1 is the user’s IP address. Obviously this is a local IP address here, but in production it will be the real user IP.</li>
  <li>“StripePaymentIntentCreateHttpRequest” is the group of the log, i.e. what it is doing.</li>
  <li>“EC” is the supplier ID. Sometimes it is empty, since not every process has supplier info.</li>
  <li>“http request to create payment intent” is the descriptive message of the log</li>
  <li>{“sessionId”:”f72f438b-494d-4a0d-9d74-f93cec2e492c”, …} values are the log context. The sessionId in there is a trick I use to trace what happens within a single HTTP request,
which in production, with millions of HTTP requests, would otherwise be hard to do.</li>
</ol>
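
<p>As a sanity check for the regex, the same pattern can be tried in plain Python (here with numbered groups instead of Promtail’s named groups, and straight quotes instead of the typographic quotes rendered above):</p>

<pre><code class="language-python">import re

# same structure as the Promtail regex above, using numbered groups
pattern = r'\[([^]]+)\] ([^:]+): ([^ ]+) "([^"]*)" "([^"]*)" "([^"]*)" ({.*})'

line = ('[2025-02-18 02:24:14] local.INFO: 192.168.65.1 '
        '"StripePaymentIntentCreateHttpRequest" "EC" '
        '"http request to create payment intent" '
        '{"sessionId":"f72f438b-494d-4a0d-9d74-f93cec2e492c"}')

m = re.search(pattern, line)
print(m.group(1))  # the log date
print(m.group(6))  # the log message
</code></pre>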

<p>If you’re interested in how I built the sessionId mechanism, let me know. I will create another post to explain how I built it.</p>

<h2 id="loki-config">Loki config</h2>
<p>Below is the <code>local-config.yaml</code> I mentioned above that I use to configure the Loki service.</p>

<pre><code class="language-yaml">server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
</code></pre>

<p>There is nothing special in the config; I just use the default config that Grafana provides.
(I forgot where I copied it from, but it’s from the official documentation; I will update the post if I find it.)</p>

<h2 id="conclusion">Conclusion</h2>
<p>Once this is done, I just run <code>docker-compose up -d</code>, access the Grafana UI at <code>http://localhost:3000</code>, and start inspecting the logs.
This setup can also be replicated to my team members without any hassle, as they just need to clone the repo and start Docker Compose.</p>

<hr />
<p>If you found my blog post insightful and valuable, you can support my work with a voluntary contribution. 
Your support helps sustain independent writing, research, and the continued sharing of high-quality content.</p>

<p><strong>Why Donate?</strong></p>
<ul>
  <li>Encourages the creation of more in-depth, well-researched content.</li>
  <li>Helps cover costs like hosting, tools, and time spent on writing.</li>
  <li>Supports independent writing without paywalls or intrusive ads.</li>
</ul>

<p><strong>How It Works:</strong></p>
<ul>
  <li>This is a voluntary contribution with a minimum of $3—you can choose any amount.</li>
  <li>100% of your support goes toward improving and expanding my content.</li>
  <li>Your contribution is greatly appreciated.</li>
</ul>

<p><strong>Have a Topic in Mind?</strong><br />
If there’s a specific topic you’d like me to cover, feel free to reach out! You can email me at <a href="mailto:budi.arsana@bungamata.com">budi.arsana@bungamata.com</a>, and I’ll consider it for future content.</p>

<p>Support me at <a href="https://budiarsana.gumroad.com/coffee">https://budiarsana.gumroad.com/coffee</a></p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[My experience setup and using Grafana Loki for debugging local log.]]></summary></entry><entry><title type="html">I Developed FeedbackApp.id (BETA), Here is What, Why</title><link href="https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why.html" rel="alternate" type="text/html" title="I Developed FeedbackApp.id (BETA), Here is What, Why" /><published>2024-09-05T00:00:00+00:00</published><updated>2024-09-05T00:00:00+00:00</updated><id>https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why</id><content type="html" xml:base="https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why.html"><![CDATA[<h2 id="tldr">TLDR</h2>
<ol>
  <li>I developed FeedbackApp <a href="https://feedbackapp.id">https://feedbackapp.id</a>. It is currently in beta.</li>
  <li>Cost-efficient, to be accessible for individuals and small businesses.</li>
</ol>

<h2 id="what-is-feedbackappid">What is FeedbackApp.id?</h2>
<p>FeedbackApp.id is a super simple and straightforward app for building feedback forms and asking
your respondents for feedback.</p>

<p>It is designed to be so simple that you can design your form in less than <strong>10 minutes</strong> and start collecting
feedback from your respondents.</p>

<p><strong>Focus on collecting the feedback, not designing the form.</strong></p>

<h2 id="why-do-i-develop-feedbackappid">Why do I develop FeedbackApp.id?</h2>
<p>Personally, I do it to challenge myself to build a product that I own myself.</p>

<p>I have been working as a software engineer since 2010. Over that span of years, I feel my soft skills have
improved a lot, and I did it by
collecting feedback from my peers, my boss, and my team. <strong>“What can I do better?”</strong> is the question that I always ask.</p>

<p>After collecting the feedback, I compiled it to find out what was mentioned most often
and made a plan &amp; goal to improve myself.
I feel that this is a very cumbersome process. <strong>Why not create software to automate it?</strong> I’m a software engineer
after all.</p>

<h2 id="why-re-invent-the-wheel">Why “re-invent the wheel”?</h2>
<p>There is already open source software for building feedback forms.
There are already ready-to-use feedback form services.
There are already business map/places services that provide reviews.</p>

<p>Why reinvent the wheel?</p>

<p>This is the question I asked myself before I decided to build FeedbackApp.id.</p>

<h3 id="history">History</h3>
<p>Around 2014, one of my clients asked me to build a feedback form for their business processes. I solved it
by using existing open source software. It worked, even though setting up the form was a long process, and
I personally felt the client asked the respondents too many questions, which made them reluctant to fill in the form.</p>

<p>Around 2023, I was eating at a restaurant and saw that they were asking for feedback via a QR code.
I thought this was a good idea, and it’s great that they are always looking for ways to improve their service via feedback.
I took a photo and planned to give my feedback once back home.</p>

<p>ALAS! How surprised I was when I found out that the first question asked was “In which restaurant did you eat?”</p>

<p>And there are like 20 options in there. I closed the form. Sorry.</p>

<p>At that moment, I convinced myself that I want to build FeedbackApp.id with these principles:</p>

<h3 id="1-simplicity">1. Simplicity</h3>
<p>The app must be as simple as possible both for the form designer and the respondent.</p>

<p>If it is not simple for the form designer, they will not use it.
If it is not simple for the respondent, they will not fill it.</p>

<h3 id="2-anonymity">2. Anonymity</h3>
<p><strong>The best feedback is honest feedback</strong>. Respondents might feel uneasy giving feedback if
we ask who they are. That is why I designed it to be anonymous by default; if respondents want to give their
identity, they can always do so via the form message, but it’s not mandatory.</p>

<h3 id="3-cost-efficient">3. Cost-efficient</h3>
<p>I want to make it accessible for individuals and small businesses, so I designed it to be cost-efficient. As long
as it covers the server’s operational cost and helps you improve yourself or your business, I’m happy.</p>

<h3 id="4-continuity">4. Continuity</h3>
<p>Self-improvement is a continuous process: we ask for feedback, compile it, and make a plan to improve. Then in the
next iteration we ask for feedback again, see if the plan worked, and repeat the process.</p>

<h2 id="whats-next">What’s next?</h2>
<p>I plan to write about how I architected and developed FeedbackApp.id. Stay tuned!</p>

<hr />
<p>If you have any feedback for me on this writing, or there is a topic
that you would like me to write about,
please help me by giving your feedback here: <a href="https://feedbackapp.id/s/11">https://feedbackapp.id/s/11</a></p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[TLDR I developed FeedbackApp https://feedbackapp.id. It currently is beta. Cost-efficient to be accessible for individual and small business.]]></summary></entry><entry><title type="html">My Experience Created AWS SAM to build Pipeline to Deploy AppRunner</title><link href="https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks.html" rel="alternate" type="text/html" title="My Experience Created AWS SAM to build Pipeline to Deploy AppRunner" /><published>2024-09-01T00:00:00+00:00</published><updated>2024-09-01T00:00:00+00:00</updated><id>https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks</id><content type="html" xml:base="https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks.html"><![CDATA[<p><strong>Date: 1 September 2024.</strong>. 
<img src="../image/aws-sam-nested-stack-change-my-mind.jpg" alt="Change My Mind" /></p>

<h2 id="objective">Objective</h2>
<p>I have multiple AppRunner services in AWS. Currently, I deploy them manually. I plan to build an AWS CodePipeline to do it
automatically.</p>

<p>Of course, since I have multiple app stacks, I prefer not to manage each pipeline manually, and for that AWS SAM is
the best solution.</p>

<p>My conclusions from the experience with AWS so far:</p>

<h3 id="tried-using-aws-nested-stacks">Tried using AWS nested stacks</h3>
<p>The reason I tried nested stacks is that the config for the SAM pipeline alone is so long that it took <strong>54 lines</strong>.</p>

<p>So I figured that if I split this into multiple nested stacks, it would be more manageable.</p>

<p>It turns out nested stacks make things harder, for two reasons:</p>
<ol>
  <li>When an error happens, it’s not clear what caused it, because it happens in a child stack.</li>
  <li>Updating stacks is slower, as it even triggers updates of child stacks that weren’t changed.</li>
</ol>

<p><strong>Conclusion:</strong> <br />
I went back to using a single SAM stack. Fast and efficient. <br />
I’m willing to trade long lines for speed and simplicity.</p>

<h3 id="i-learnt-how-to-codedeploy-apprunner">I Learnt how to CodeDeploy AppRunner</h3>
<p>CodeDeploy by default does not support deploying to AppRunner. To work around this, I ended up using the Invoke Lambda action.</p>

<p>In the build stage, the pipeline pushes a new Docker image tag to AWS ECR and saves the image name as an artifact.</p>

<p>The CodeDeploy Lambda then reads the artifact to learn the image name; inside the Lambda, I use the AWS SDK’s AppRunner
client to update the config to use this image tag.</p>
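
<p>As a rough sketch of what that Lambda update looks like with boto3 (the service ARN and image name below are placeholders, and the <code>client</code> argument is injectable purely for illustration and testing):</p>

<pre><code class="language-python">def deploy_image(service_arn, image_identifier, client=None):
    # client is injectable for testing; defaults to the real AppRunner client
    if client is None:
        import boto3  # deferred so the sketch imports cleanly without AWS
        client = boto3.client('apprunner')
    # point the existing service at the freshly pushed ECR image tag
    return client.update_service(
        ServiceArn=service_arn,
        SourceConfiguration={
            'ImageRepository': {
                'ImageIdentifier': image_identifier,
                'ImageRepositoryType': 'ECR',
            }
        },
    )
</code></pre>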

<p>Deployment runs quite fast for me, less than 5 minutes in total:</p>

<ol>
  <li>AWS CodeBuild, which builds the Docker image, takes two minutes.</li>
  <li>AWS CodeDeploy, which invokes the Lambda, takes two minutes.</li>
</ol>

<h3 id="watch-out-lambda-must-report-to-codepipeline">Watch out, Lambda must report to CodePipeline</h3>
<p>I’m using CodePipeline V2, where we pay based on how long the pipeline runs.</p>

<p>If I use a Lambda, I found the Lambda <strong>MUST report back</strong> to CodePipeline whether the job succeeded or failed, using the job context. Just
throwing an exception won’t do: the action keeps waiting, and if I miss this it could leave me over-billed.</p>
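
<p>A minimal sketch of that reporting (the <code>codepipeline</code> argument is injectable only for illustration and testing; in the real Lambda it is just the plain boto3 client):</p>

<pre><code class="language-python">def lambda_handler(event, context, codepipeline=None):
    if codepipeline is None:
        import boto3  # deferred so the sketch imports cleanly without AWS
        codepipeline = boto3.client('codepipeline')
    # CodePipeline passes the job id in the invocation event
    job_id = event['CodePipeline.job']['id']
    try:
        # ... do the actual AppRunner deployment here ...
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as exc:
        # report failure explicitly; an unhandled exception alone
        # leaves the pipeline action waiting until it times out
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={'type': 'JobFailed', 'message': str(exc)},
        )
</code></pre>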

<h3 id="setup-aws-ecr-image-rule">Setup AWS ECR image rule</h3>
<p>Since I build a Docker image for every PR merged to the main branch, I will have a lot of Docker images in ECR, which could
cause over-billing.</p>

<p>Because of that, I set up a rule to keep only the latest 10 Docker images for deployment, which should be more than enough
in case I need to roll back a deployment.</p>
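
<p>The rule itself is a standard ECR lifecycle policy; a policy along these lines keeps only the 10 newest images (the description is free text, and the count is just the value from above):</p>

<pre><code class="language-json">{
  "rules": [
    {
      "rulePriority": 1,
      "description": "keep only the latest 10 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
</code></pre>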

<h3 id="next-step">Next step</h3>
<ol>
  <li>I have a DB migration script that needs to run after AppRunner is updated; I need to find a way to trigger it.</li>
  <li>I remember AppRunner may dispatch an event to EventBridge.</li>
</ol>

<h3 id="closing">Closing</h3>
<p>If you are interested in knowing how I configured it, with source code and explanation, please let me know in the comments. I will
find time to share it and make a video for it.</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Date: 1 September 2024..]]></summary></entry><entry><title type="html">Setup AWS CloudWatch Agent On-Premise Server — Part 2 [END]</title><link href="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html" rel="alternate" type="text/html" title="Setup AWS CloudWatch Agent On-Premise Server — Part 2 [END]" /><published>2022-06-25T00:00:00+00:00</published><updated>2022-06-25T00:00:00+00:00</updated><id>https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end</id><content type="html" xml:base="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html"><![CDATA[<p>Tutorial how to add AWS CloudWatch agent in Kubernetes server.</p>

<p>Previous post at <a href="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html">PART 1</a></p>

<p>In part 1 we successfully tested the AWS CloudWatch agent (I will abbreviate it as CWA from here on) as a container.
Now in this post I will share how I successfully installed it in my K8s cluster.</p>

<p>I will share my full K8s declarative config first, then explain each line that is related and important.
In this example I use my Symfony application as the app that writes the log; you can change it to any of your apps.
As long as your app writes a log file, we can use it and make CWA send log events to AWS.</p>

<h2 id="my-k8s-full-config">My K8s full config</h2>

<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-prod
  template:
    metadata:
      labels:
        app: api-prod
    spec:
      containers:
        - image: your-application-image
          name: web
          envFrom:
            - configMapRef:
                name: api-prod-env
          volumeMounts:
            - mountPath: /app/var/log
              name: app-log
        - image: amazon/cloudwatch-agent:1.247350.0b251814
          name: agent
          volumeMounts:
            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
            - mountPath: /log
              readOnly: true
              name: app-log
            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
      volumes:
        - name: app-log
          emptyDir: { }
        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
        - name: aws-cred
          configMap:
            name: aws-cred
      terminationGracePeriodSeconds: 60
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-prod-cwagent
data:
  cwagentconfig: |
    {
      "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
      },
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/log/app.log",
                "log_group_name": "api-prod",
                "log_stream_name": "api-prod-{hostname}",
                "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
                "retention_in_days": 365
              }
            ]
          }
        }
      }
    }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-cred
data:
  config: |
    [profile AmazonCloudWatchAgent]
    output = text
    region = ap-southeast-1
  credentials: |
    [AmazonCloudWatchAgent]
    aws_access_key_id = XXX-XXX
    aws_secret_access_key = XXX-XXX
</code></pre>

<h2 id="deployment-config-explanation">Deployment config explanation</h2>
<h3 id="container-config">Container config</h3>
<p>If you followed my previous post, I decided to install CWA on each pod. This may not be optimal, as we could end up
having too many CWA agents, but it works for now and I prefer to keep it simple as a starter proof of concept.</p>

<p>I will explain each relevant config excerpt by excerpt.</p>

<pre><code class="language-yaml">        - image: your-application-image
          name: web
          envFrom:
            - configMapRef:
                name: api-prod-env
          volumeMounts:
            - mountPath: /app/var/log
              name: app-log
</code></pre>

<p>This is my core app container; this image contains my full application. It reads env from a K8s configMapRef; I will not
explain each variable in detail, as every app is different and they are not relevant to the CWA installation.</p>

<p>The interesting part is “volumeMounts”: my core app container writes its log to the file “/app/var/log/app.log”, and this
file is then synced with the CWA container, so CWA is able to read the log file from a different
container (decoupling concept).</p>

<pre><code class="language-yaml">        - image: amazon/cloudwatch-agent:1.247350.0b251814
          name: agent
          volumeMounts:
            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
            - mountPath: /log
              readOnly: true
              name: app-log
            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
</code></pre>

<p>Above is the CWA container; there are only “volumeMounts”, which act as the container config. I will explain each in detail below.</p>

<pre><code class="language-yaml">            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
</code></pre>

<p>This is the CWA core config. It configures how CWA should behave, e.g. which file to read. If you read my previous post,
this is the “cwagentconfig.cnf” file.</p>

<pre><code class="language-yaml">            - mountPath: /log
              readOnly: true
              name: app-log
</code></pre>

<p>Above is the volume of the log file, which is synced with the core app container. One thing to note here: if you configure CWA
to delete log lines once they are sent to AWS CloudWatch, you may want to set “readOnly: false”.</p>

<pre><code class="language-yaml">            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
</code></pre>

<p>Above is your AWS CWA agent credentials volume, from which the container reads the credentials file. It contains the “aws_access_key_id”
&amp; “aws_secret_access_key”.</p>

<h3 id="deployment-volume-config">Deployment volume config</h3>
<pre><code class="language-yaml">        volumes:
        - name: app-log
          emptyDir: { }
        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
        - name: aws-cred
          configMap:
            name: aws-cred
</code></pre>

<p>Above is the deployment volume config. I will explain each entry below.</p>

<pre><code class="language-yaml">- name: app-log
  emptyDir: { }
</code></pre>

<p>Above is “app-log”; I decided to go with an “emptyDir” volume because the content of this volume is not important to
keep once the logs are sent to AWS CloudWatch.</p>

<pre><code class="language-yaml">        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
</code></pre>

<p>Above is the CWA config. Since K8s supports using a “configMap” as a “file”, I decided to use a configMap to keep all configs in
a single file rather than referencing an external file. Just to keep things simple and have everything in a single
deployment file.</p>

<pre><code class="language-yaml">        - name: aws-cred
          configMap:
            name: aws-cred
</code></pre>

<p>Above is the AWS credentials file. Like “agent-config”, this is a “configMap” used as a “file”.</p>

<h3 id="pod-termination-timing">Pod termination timing</h3>
<pre><code class="language-yaml">terminationGracePeriodSeconds: 60
</code></pre>

<p>I set “terminationGracePeriodSeconds: 60” to match the CWA “metrics_collection_interval: 60” interval, so when the
pod is replaced by a new deployment, it has a 60-second grace period to allow CWA to send the remaining logs to AWS CloudWatch.
You may want to increase this value a little bit, e.g. by 10 seconds, to be safe.</p>

<h3 id="configmap-api-prod-cwagent">ConfigMap “api-prod-cwagent”</h3>
<pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: api-prod-cwagent
data:
  cwagentconfig: |
    {
      "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
      },
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/log/app.log",
                "log_group_name": "api-prod",
                "log_stream_name": "api-prod-{hostname}",
                "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
                "retention_in_days": 365
              }
            ]
          }
        }
      }
    }
</code></pre>

<p>Above is the ConfigMap of the CWA. It basically uses the K8s capability to create a file from a config map. The config for CWA
was explained in part 1, so I will not explain it again here.</p>

<h3 id="configmap-aws-cred">ConfigMap “aws-cred”</h3>
<pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-cred
data:
  config: |
    [profile AmazonCloudWatchAgent]
    output = text
    region = ap-southeast-1
  credentials: |
    [AmazonCloudWatchAgent]
    aws_access_key_id = XXX-XXX
    aws_secret_access_key = XXX-XXX
</code></pre>

<p>Above are the AWS credentials; you need to replace the access key id and secret access key values with yours.
K8s will then create a file from this ConfigMap.</p>

<hr />

<p>That’s pretty much what I did to successfully set up CWA in K8s. If you have any questions or other strategies for
setting this up in K8s, please let me know in the comments.</p>

<p>And lastly, I apologize for the delay in writing this final part. I actually resolved the problem a week after the first
post, but only recently got the time to write it up.</p>

<p>Thank you.</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Tutorial how to add AWS CloudWatch agent in Kubernetes server.]]></summary></entry><entry><title type="html">Setup AWS CloudWatch Agent On-Premise Server — Part 1</title><link href="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html" rel="alternate" type="text/html" title="Setup AWS CloudWatch Agent On-Premise Server — Part 1" /><published>2022-05-12T00:00:00+00:00</published><updated>2022-05-12T00:00:00+00:00</updated><id>https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1</id><content type="html" xml:base="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html"><![CDATA[<p>Tutorial how to set up AWS CloudWatch agent with on-premise server so we can send logs from server outside AWS.</p>

<p>The second and final part is available here <a href="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html">PART 2</a></p>

<hr />

<h2 id="background-story-why-i-need-it">Background story why I need it</h2>
<p>I have a Kubernetes (K8s) cluster on DigitalOcean. I currently have two difficulties: how to persist logs from K8s pods,
since as we know container storage is ephemeral, and how to examine the logs, as that requires another tool to read them easily.</p>

<p>As a temporary solution, I have a micro web service acting as central log storage, called “Tracing”, and my applications
send each log to Tracing instantly via an HTTP request. I soon realized this method is too expensive
and slow, as each HTTP call costs time.</p>

<p>That is why I need something faster with minimum maintenance. Since I am quite familiar with AWS CloudWatch, I would like to
use it as my central log store, but since my K8s cluster is outside of AWS, I needed to find out how to do that. Luckily AWS
provides the AWS CloudWatch agent, which acts as an agent (hence its name) to send the logs to AWS.</p>

<hr />

<h2 id="options-to-install-cloudwatch-agent">Options to install CloudWatch Agent</h2>
<p>Given my familiarity with Docker &amp; K8s, there are two options I know of to install the CloudWatch agent:</p>

<ol>
  <li>Install it manually as a binary/service in the operating system of each container. I tried this and found how
complicated it is, with no luck making it work. I then realized that if I did it this way, I would need to manage the
CloudWatch agent for each of my containers, which would cost me extra overhead. I dropped this idea.</li>
  <li>Install it as a container using the Docker image AWS provides at https://hub.docker.com/r/amazon/cloudwatch-agent.</li>
</ol>

<p>The problem with this image is its minimal documentation focused on using the Docker image 😢
So in this part 1, I want to describe how I successfully did it with a container, since I got it working through
trial and error, reading each container error message and searching Google to make sense of it. I hope this post will
help you and save you time.</p>

<hr />

<h2 id="prerequisites">Prerequisites</h2>
<p>Before you can start following this tutorial, below are some prerequisites that you must have:</p>

<ol>
  <li>AWS account (obviously)</li>
  <li>AWS IAM account with access to create log groups and streams. You will need its access key and secret. Let me
know if you need my help describing the IAM access specs.</li>
  <li>Knowledge how to use Docker.</li>
</ol>

<hr />

<h2 id="docker-compose">Docker Compose</h2>
<p>In this tutorial, I will use Docker Compose instead of the pure Docker CLI, since I feel it’s easier to record the changes I make and git commit them for history.</p>

<p>This is the config of my <code>docker-compose.yml</code></p>

<pre><code class="language-yml">version: "3.8"
services:
  agent:
    image: amazon/cloudwatch-agent
    volumes:
      - ./config/log-collect.json:/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json
      - ./aws:/root/.aws
      - ./log:/log
      - ./etc:/opt/aws/amazon-cloudwatch-agent/etc
</code></pre>

<p>Do not run <code>docker-compose up</code> yet, as we still don’t have the synced volume files, which are the important files that I will explain below.</p>

<hr />

<h2 id="cloudwatch-agent-config">CloudWatch agent config</h2>
<p>The CloudWatch agent config describes how the agent will collect logs from the container and
send them to AWS. In my setup it is represented as the file <code>./config/log-collect.json</code>, then synced into the container.</p>

<p>Below is the content of that file</p>

<pre><code class="language-json">{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/log/app.log",
            "log_group_name": "container-aws-logs-test",
            "log_stream_name": "{hostname}",
            "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
            "retention_in_days": 30
          }
        ]
      }
    }
  }
}
</code></pre>

<p>You can read the config structure docs at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html</p>

<p>I will explain the key points of my config in detail:</p>

<ol>
  <li>“<code>metrics_collection_interval</code>” describes how often, in seconds, the agent will check the log file and send it to AWS.
60 seconds is the default. Do not set it lower than needed, as it will increase your AWS charges.</li>
  <li>“<code>run_as_user</code>” I set as root. Feel free to set it to any user other than root if you want it extra secure.</li>
  <li>I did not configure metrics collection, as this runs in a standalone container, so it didn’t make sense to capture
the metrics of the AWS CloudWatch agent container itself. And since I’m using K8s auto scaling, this is not an important
thing for me to monitor for now.</li>
  <li>“<code>collect_list</code>” describes which files the agent will watch and send to AWS. The interesting part here is that it is an
array of objects, which means we can describe multiple files to watch, so only 1 agent or 1
container is needed to watch all my pods, saving computing resources.</li>
  <li>“<code>file_path</code>” describes which file the agent will read.</li>
  <li>“<code>log_group_name</code>” &amp; “<code>log_stream_name</code>” describe to which group and stream this log will belong in AWS CloudWatch.</li>
  <li>“<code>timestamp_format</code>” describes your logs’ timestamp format. If it matches, CloudWatch will use your log timestamp; if not,
it will use the current timestamp. I found this difficult to configure and test, and I describe how I solved it below.</li>
  <li>“<code>retention_in_days</code>” describes how long the log will persist. 30 days is enough for testing.</li>
</ol>
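<p>The “timestamp_format” value uses strftime-style codes, so you can preview what a matching timestamp looks like with the same format string in Python. A small sketch (the datetime value is just the sample entry from my log file):</p>

<pre><code class="language-python">from datetime import datetime, timezone

# Same format string as "timestamp_format" in the agent config above
fmt = "[%Y-%m-%dT%H:%M:%S.%f%z]"

sample = datetime(2022, 5, 14, 17, 32, 19, 826204, tzinfo=timezone.utc)
print(sample.strftime(fmt))  # [2022-05-14T17:32:19.826204+0000]
</code></pre>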

<hr />

<h2 id="aws-credentials">AWS Credentials</h2>
<p>AWS credentials will be used to authenticate to be able to send log to your aws account. It represented as folder 
<code>./aws</code>, inside that folder i have 2 AWS standard credentials files like below:</p>

<p><code>./aws/config</code></p>

<pre><code class="language-ini">[profile AmazonCloudWatchAgent]
output = text
region = ap-southeast-1
</code></pre>

<p>It’s important to name it “AmazonCloudWatchAgent”, as the container is designed to use this profile. Change the region
to your AWS region, or any AWS region where you want to record the logs.</p>

<p><code>./aws/credentials</code></p>

<pre><code class="language-ini">[AmazonCloudWatchAgent]
aws_access_key_id = IAM_ID
aws_secret_access_key = IAM_KEY
</code></pre>

<p>Again, it’s important to name the profile <code>AmazonCloudWatchAgent</code>.</p>

<p>And that’s it for the AWS credentials.</p>
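<p>Since the profile names matter, a small sketch to verify both files parse and contain the expected sections; Python’s configparser reads this INI format (the key values here are the same placeholders as above):</p>

<pre><code class="language-python">import configparser

config_text = """
[profile AmazonCloudWatchAgent]
output = text
region = ap-southeast-1
"""

credentials_text = """
[AmazonCloudWatchAgent]
aws_access_key_id = IAM_ID
aws_secret_access_key = IAM_KEY
"""

cfg = configparser.ConfigParser()
cfg.read_string(config_text)
cred = configparser.ConfigParser()
cred.read_string(credentials_text)

# Both files must expose the profile the container is designed to use
print("profile AmazonCloudWatchAgent" in cfg.sections())  # True
print("AmazonCloudWatchAgent" in cred.sections())         # True
</code></pre>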

<hr />

<h2 id="log-file">Log file</h2>
<p>This is not required in production, but it is good for testing that the container runs in development before we go to production,
and it helps you understand the concept of how the agent monitors the file.</p>

<p><code>./log/app.log</code> contains a log file from a Symfony application with a modified timestamp (as I mentioned above,
I had difficulty with the timestamp). Note the filename is “app.log”, which matches the JSON config from the file
<code>./config/log-collect.json</code> that I configured above.</p>

<p>The content of my log file is as below</p>

<pre><code class="language-txt">[2022-05-14T17:32:19.826204+0000] app.INFO: budi test {"budi":"foo"}
</code></pre>

<p>The content is pretty simple.</p>

<hr />

<h2 id="time-to-test">Time to test</h2>
<p>With all that configured, you’re now ready to start Docker with the command “docker compose up”. After it runs, examine
the output in the terminal, then check the CloudWatch group “container-aws-logs-test” in your AWS account to see if the log was
successfully delivered.</p>

<p>If you’re facing any error, it is most likely because the IAM account used doesn’t have access to create logs.</p>

<p>Now try to add a new line to the file <code>./log/app.log</code>, wait 60 seconds, and see if it is delivered.</p>
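<p>To generate such a test line with a current timestamp in the exact format the agent expects, a quick Python sketch (the message text is made up, and the path assumes you run it from the Docker Compose directory):</p>

<pre><code class="language-python">from datetime import datetime, timezone
from pathlib import Path

log_file = Path("log/app.log")  # the file watched per the docker-compose volume
log_file.parent.mkdir(exist_ok=True)

# Timestamp in the same format as "timestamp_format" in log-collect.json
ts = datetime.now(timezone.utc).strftime("[%Y-%m-%dT%H:%M:%S.%f%z]")
with log_file.open("a") as f:
    f.write(ts + ' app.INFO: delivery test {"check":"ok"}\n')
</code></pre>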

<p>Now you understand the concept of how the agent works. <strong>CONGRATULATIONS!</strong></p>

<hr />

<h2 id="bonus-debugging">Bonus: Debugging</h2>
<p>As I mentioned earlier, I had difficulty setting up the timestamp, and there was no way for me to get a shell inside the
container, as it has neither “bash” nor “sh”, so I was unable to “docker compose exec sh” into it. There was also no
way for me to install “bash”, as the container doesn’t even have an “apt-get” command; I guess it’s because the image is
created using container_linux:go. I don’t know how AWS did it or built the image; it’s something I need to learn
in the future.</p>

<p>After some Google-fu and examining the terminal output, I learned that the agent config is located at
“/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml” inside the container. Since I was unable to get a shell
inside the container to view the file, I got the idea to sync that file to my local computer. Hence you see the config sync
volume of</p>

<pre><code class="language-yml">volumes:
- ....
- ./etc:/opt/aws/amazon-cloudwatch-agent/etc
</code></pre>

<p>Now if you read the file “./etc/amazon-cloudwatch-agent.toml” on your local machine, you will find these interesting lines</p>

<pre><code class="language-toml">timestamp_layout = "[2006-01-02T15:04:05..000-0700]"
timestamp_regex = "(\\[\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.(\\d{1,9})[\\+-]\\d{4}\\])"
</code></pre>

<p>Aha! This is the timestamp config used by the agent. But my problem did not end there, as I then found out the
“timestamp_layout” example is not accurate 😢. Notice that in the timestamp, between the seconds and the microseconds/milliseconds,
“05..000” has 2 dots (..). This is inaccurate, as the regex only checks for 1 dot.</p>

<p>So my suggestion is to stick with testing against the regex; you can test your log file input with a regex validation tool
at https://regex101.com/.</p>
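<p>You can also check the generated regex against a sample log line directly; a quick sketch with Python’s re module (the regex is the one from the generated TOML above, and the log line is my sample entry):</p>

<pre><code class="language-python">import re

# timestamp_regex from the generated amazon-cloudwatch-agent.toml
timestamp_regex = r"(\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.(\d{1,9})[\+-]\d{4}\])"

line = '[2022-05-14T17:32:19.826204+0000] app.INFO: budi test {"budi":"foo"}'
match = re.search(timestamp_regex, line)
print(match.group(1))  # [2022-05-14T17:32:19.826204+0000]
</code></pre>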

<hr />

<h2 id="next-part">Next part</h2>
<p>As this proof of concept succeeded, my next step is to install this into K8s. Since this runs as a container, I
have two options to do it.</p>

<ol>
  <li>Have the CloudWatch Agent as a single independent pod, with all my apps sharing storage with this agent pod. The
agent pod would read all the storage files provided by all app pods. The pros: saving computing resources. The cons: I
would need to update the config of this agent pod whenever I add a new app, which could potentially mean a lot of configuration.</li>
  <li>Have the CloudWatch Agent installed as a container in the same pod as the app, so each app has its own agent. The pros:
each agent has a small and simpler config. The cons: every app has its own agent, and when
horizontal scaling happens, the agent is also replicated; my concern is with computing resources. But I guess it will be
quite small for a starter.</li>
</ol>

<p>At the moment I lean towards option 2. I’m still not sure, but I will try that first and let you know how it goes in the
next post. Once I’ve done it, I will update this post to link to the next post.</p>

<p>Stay tuned…</p>

<p>Thank you for reading ✌️</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Tutorial how to set up AWS CloudWatch agent with on-premise server so we can send logs from server outside AWS.]]></summary></entry></feed>