<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://gusdecool.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://gusdecool.github.io/" rel="alternate" type="text/html" /><updated>2025-05-01T08:42:57+00:00</updated><id>https://gusdecool.github.io/feed.xml</id><title type="html">Budi Arsana Blog</title><subtitle>Software engineer specialized in Backend API. e.g: REST API.</subtitle><author><name>Budi Arsana</name></author><entry><title type="html">Data Science Tips</title><link href="https://gusdecool.github.io/2025/05/01/data-science-tips.html" rel="alternate" type="text/html" title="Data Science Tips" /><published>2025-05-01T00:00:00+00:00</published><updated>2025-05-01T00:00:00+00:00</updated><id>https://gusdecool.github.io/2025/05/01/data-science-tips</id><content type="html" xml:base="https://gusdecool.github.io/2025/05/01/data-science-tips.html"><![CDATA[<blockquote>
  <p>A log of data science tips I picked up while learning data science.</p>
</blockquote>

<h2 id="panda-category">Pandas category</h2>
<p>To reduce memory usage, use the <code>category</code> dtype for columns whose values are repeated or categorical.
For example, the table below:</p>

<pre><code class="language-table">name | status
andrew | married
james | single
barbara | married
</code></pre>

<p>The <code>status</code> column can be converted to a category, so the memory usage is reduced.</p>
<pre><code class="language-python">import pandas as pd

df = pd.DataFrame([
    {'name': 'andrew', 'status': 'married'},
    {'name': 'james', 'status': 'single'},
    {'name': 'barbara', 'status': 'married'}
])

# note: DataFrame's dtype argument accepts a single dtype, not a per-column
# mapping, so convert the column explicitly with astype instead
df['status'] = df['status'].astype('category')

# check the byte size of the column
print(df['status'].nbytes)

# compare with the same column kept as plain object dtype
print(df['status'].astype('object').nbytes)
</code></pre>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[log of list data science tips I learned when learning data science.]]></summary></entry><entry><title type="html">Grafana Loki for Debug Local Log</title><link href="https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log.html" rel="alternate" type="text/html" title="Grafana Loki for Debug Local Log" /><published>2025-02-18T00:00:00+00:00</published><updated>2025-02-18T00:00:00+00:00</updated><id>https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log</id><content type="html" xml:base="https://gusdecool.github.io/2025/02/18/grafana-loki-for-debug-local-log.html"><![CDATA[<blockquote>
  <p>My experience setting up and using Grafana Loki for debugging local logs.</p>
</blockquote>

<h2 id="the-issue">The issue</h2>
<p>For years, I’ve been using AWS CloudWatch &amp; Logs Insights to debug my applications on AWS infrastructure, and it works great.
But when I’m developing locally, I still read the logs manually in my JetBrains IDE and inspect them in the editor.</p>

<p>This is not ideal, because after several minutes of development the log file keeps growing and it becomes hard to find the entries I need.
It gets especially slow when the editor runs syntax processing on the log file. This forced me to delete the log file regularly to keep things fast.</p>

<h2 id="what-i-want-to-achieve">What I want to achieve</h2>
<p>I want a tool that can inspect logs like CloudWatch does, but locally.
I could send the local log file to AWS CloudWatch using the external-provider method (like in my older post), but that doesn’t feel efficient.</p>

<p>Thus began the research to find suitable tools.
Ideally I want the tool to be easy to set up, replicate, and remove.
With this specification I had already narrowed my options: it must support or run on Docker (even if there is no official Docker image, I can create one myself).</p>

<p>It should support reading logs from multiple applications.
I have multiple apps in development, and I didn’t want each app to have its own log-inspection tool, as that would make development heavier and harder to manage.</p>

<h2 id="grafana-loki">Grafana Loki</h2>
<p>After searching for a while, I found Grafana Loki intriguing, as it’s open source and supports multiple log sources.
It took me a while to understand which services to use, since Grafana has a lot of services and you need to understand their terminology.
Luckily, Grafana already provides Docker images, so all I needed to do was configure the docker-compose file and run it.</p>

<p>Below is my docker-compose setup for Grafana Loki, with comments explaining what the important lines do and describing each Grafana service used in simpler terminology:</p>

<pre><code class="language-yaml"># docker-compose.yml
networks:
  loki:

services:
  # UI to view log
  grafana:
    # This image is just the web interface used to access all the Grafana services; the services themselves are not here.
    # We need other images to run the Grafana backends, e.g. Loki.
    image: grafana/grafana:11.5.1 
    ports:
      # the port where I will access the Grafana UI
      - "3000:3000"
    networks:
      - loki
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true # since this is a local usage, I don't need authentication, so I can quickly open Grafana.
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_FEATURE_TOGGLES_ENABLE=alertingSimplifiedRouting,alertingQueryAndExpressionsStepMode
    entrypoint:
      - sh
      - -euc
      # config the Grafana data source for Loki
      # notice, it's using http://loki:3100, this is because the service name is loki, and it will be resolved to the IP address of the loki service.
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat &lt;&lt;EOF &gt; /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy 
          orgId: 1
          url: http://loki:3100
          basicAuth: false
          isDefault: true
          version: 1
          editable: false
        EOF
        /run.sh

  # the agent to collect the logs
  promtail:
    # Promtail is the agent that reads the raw log files and then sends them to Loki.
    # Note: Promtail is deprecated, but since it still works for me, that's fine.
    # It's good practice to pin the exact image version, in case a newer version has breaking changes. Avoid surprises!
    image: grafana/promtail:3.4 
    volumes:
      # the config file for promtail, I will provide below
      - ./promtail-config.yaml:/etc/promtail/config.yml 
        
      # I have multiple apps, which means multiple log files, but only one log-inspector instance to manage them all.
      # The trick to make it work: mount the log directories from all the apps into the same Promtail container.
      - ~/app-1-laravel/storage/logs:/var/log/app-1-laravel
      - ~/app-2-symfony/var/log:/var/log/app-2-symfony
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki
  
  # the service that stores and queries the logs
  loki:
    # Loki is a service that will store the log and provide the ability to query the log (searching, filtering, etc).
    image: grafana/loki:3.4
    ports:
      # the port I open, matching the datasource URL in the grafana service above.
      - "3100:3100"
    # the config file for Loki, I will provide below
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki
</code></pre>

<h2 id="promtail-config">Promtail config</h2>
<p>Below is the <code>promtail-config.yaml</code> I mentioned above that I use to configure the promtail agent.</p>

<pre><code class="language-yaml">server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push # note this is the Loki service URL, it will be resolved to the IP address of the loki service.

scrape_configs:
  - job_name: app-1-laravel
    static_configs:
      - targets:
          - localhost
        labels:
          job: app-1-laravel # label to use to filter later in UI
          __path__: /var/log/app-1-laravel/*log # get all the file with extension log in this directory
    # by default, the stack only manages to detect the datetime &amp; log level automatically,
    # so I define the remaining log fields manually using pipeline_stages below. I give an example of the log line and explain the regex after the config.
    pipeline_stages:
      - regex:
          # regex expression to parse the log line into a captured named group.
          expression: '\[(?P&lt;log_date&gt;[^]]+)\] (?P&lt;log_app&gt;[^:]+): (?P&lt;log_ip&gt;[^ ]+) "(?P&lt;log_group&gt;[^"]*)" "(?P&lt;log_supplier&gt;[^"]*)" "(?P&lt;log_message&gt;[^"]*)" (?P&lt;log_context&gt;{.*})'
      - labels:
          log_date: log_date
          log_app: log_app
          log_ip: log_ip
          log_group: log_group
          log_supplier: log_supplier
          log_message: log_message
          log_context: log_context
  - job_name: app-2-symfony
    static_configs:
      - targets:
          - localhost
        labels:
          job: app-2-symfony
          __path__: /var/log/app-2-symfony/*log
</code></pre>
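
<p>With the <code>job</code> label and the extracted fields in place, the logs can then be filtered in Grafana’s Explore view with LogQL queries along these lines (the label values here just match the config above):</p>

<pre><code class="language-logql">{job="app-1-laravel"} |= "payment intent"
{job="app-1-laravel", log_group="StripePaymentIntentCreateHttpRequest"}
</code></pre>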

<p>An example of a single log line from <code>app-1-laravel</code> looks like this:</p>

<blockquote>
  <p>[2025-02-18 02:24:14] local.INFO: 192.168.65.1 "StripePaymentIntentCreateHttpRequest" "EC" "http request to create payment intent"
{"sessionId":"f72f438b-494d-4a0d-9d74-f93cec2e492c","payload":{"public_key":"pk_test_redacted","params":{"amount":1000,"currency":"aud"}}} []</p>
</blockquote>

<p>From it we get the following info:</p>
<ol>
  <li>the datetime when the log entry was created</li>
  <li>local.INFO is the app environment and the log level</li>
  <li>192.168.65.1 is the user’s IP address. Obviously this is a local IP address here, but in production it will be the real user IP.</li>
  <li>“StripePaymentIntentCreateHttpRequest” is the group of the log, i.e. what it is doing.</li>
  <li>“EC” is the supplier ID. Sometimes it is empty, since not every process has supplier info.</li>
  <li>“http request to create payment intent” is the descriptive message of the log</li>
  <li>{“sessionId”:”f72f438b-494d-4a0d-9d74-f93cec2e492c”, …} values are the log context. The sessionId in there is a trick I use to trace what happens within a single HTTP request,
which in production, with millions of HTTP requests, would otherwise be hard to do.</li>
</ol>
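
<p>As a sanity check for the regex, the same pattern can be tried in plain Python (here with numbered groups instead of Promtail’s named groups, and straight quotes instead of the typographic quotes rendered above):</p>

<pre><code class="language-python">import re

# same structure as the Promtail regex above, using numbered groups
pattern = r'\[([^]]+)\] ([^:]+): ([^ ]+) "([^"]*)" "([^"]*)" "([^"]*)" ({.*})'

line = ('[2025-02-18 02:24:14] local.INFO: 192.168.65.1 '
        '"StripePaymentIntentCreateHttpRequest" "EC" '
        '"http request to create payment intent" '
        '{"sessionId":"f72f438b-494d-4a0d-9d74-f93cec2e492c"}')

m = re.search(pattern, line)
print(m.group(1))  # the log date
print(m.group(6))  # the log message
</code></pre>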

<p>If you’re interested in how I built the sessionId mechanism, let me know. I will create another post to explain how I built it.</p>

<h2 id="loki-config">Loki config</h2>
<p>Below is the <code>local-config.yaml</code> I mentioned above that I use to configure the Loki service.</p>

<pre><code class="language-yaml">server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
</code></pre>

<p>There is nothing special in the config; I just use the default config that Grafana provides.
(I forgot where I copied it from, but it’s from the official documentation; I will update the post if I find it.)</p>

<h2 id="conclusion">Conclusion</h2>
<p>Once this is done, I just run <code>docker-compose up -d</code>, access the Grafana UI at <code>http://localhost:3000</code>, and start inspecting the logs.
This setup can also be replicated to my team members without any hassle, as they just need to clone the repo and start Docker Compose.</p>

<hr />
<p>If you found my blog post insightful and valuable, you can support my work with a voluntary contribution. 
Your support helps sustain independent writing, research, and the continued sharing of high-quality content.</p>

<p><strong>Why Donate?</strong></p>
<ul>
  <li>Encourages the creation of more in-depth, well-researched content.</li>
  <li>Helps cover costs like hosting, tools, and time spent on writing.</li>
  <li>Supports independent writing without paywalls or intrusive ads.</li>
</ul>

<p><strong>How It Works:</strong></p>
<ul>
  <li>This is a voluntary contribution with a minimum of $3—you can choose any amount.</li>
  <li>100% of your support goes toward improving and expanding my content.</li>
  <li>Your contribution is greatly appreciated.</li>
</ul>

<p><strong>Have a Topic in Mind?</strong><br />
If there’s a specific topic you’d like me to cover, feel free to reach out! You can email me at <a href="mailto:budi.arsana@bungamata.com">budi.arsana@bungamata.com</a>, and I’ll consider it for future content.</p>

<p>Support me at <a href="https://budiarsana.gumroad.com/coffee">https://budiarsana.gumroad.com/coffee</a></p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[My experience setup and using Grafana Loki for debugging local log.]]></summary></entry><entry><title type="html">I Developed FeedbackApp.id (BETA), Here is What, Why</title><link href="https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why.html" rel="alternate" type="text/html" title="I Developed FeedbackApp.id (BETA), Here is What, Why" /><published>2024-09-05T00:00:00+00:00</published><updated>2024-09-05T00:00:00+00:00</updated><id>https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why</id><content type="html" xml:base="https://gusdecool.github.io/2024/09/05/I-developed-feedback-app-what-why.html"><![CDATA[<h2 id="tldr">TLDR</h2>
<ol>
  <li>I developed FeedbackApp <a href="https://feedbackapp.id">https://feedbackapp.id</a>. It is currently in beta.</li>
  <li>Cost-efficient, to be accessible for individuals and small businesses.</li>
</ol>

<h2 id="what-is-feedbackappid">What is FeedbackApp.id?</h2>
<p>FeedbackApp.id is a super simple and straightforward app for building feedback forms and asking
your respondents for feedback.</p>

<p>It is designed to be so simple that you can design your form in less than <strong>10 minutes</strong> and start collecting
feedback from your respondents.</p>

<p><strong>Focus on collecting the feedback, not designing the form.</strong></p>

<h2 id="why-do-i-develop-feedbackappid">Why do I develop FeedbackApp.id?</h2>
<p>Personally, I do it to challenge myself to build a product that I own myself.</p>

<p>I have been working as a software engineer since 2010. Over that span of years, I feel my soft skills have
improved a lot, and I did it by
collecting feedback from my peers, my boss, and my team. <strong>“What can I do better?”</strong> is the question that I always ask.</p>

<p>After collecting the feedback, I compiled it to find out what was mentioned most often
and made a plan &amp; goal to improve myself.
I feel that this is a very cumbersome process. <strong>Why not create software to automate it?</strong> I’m a software engineer
after all.</p>

<h2 id="why-re-invent-the-wheel">Why “re-invent the wheel”?</h2>
<p>There is already open source software for building feedback forms.
There are already ready-to-use feedback form services.
There are already business map/places services that provide reviews.</p>

<p>Why reinvent the wheel?</p>

<p>This is the question I asked myself before I decided to build FeedbackApp.id.</p>

<h3 id="history">History</h3>
<p>Around 2014, one of my clients asked me to build a feedback form for their business processes. I solved it
by using existing open source software. It worked, even though setting up the form was a long process, and
I personally felt the client asked the respondents too many questions, which made them reluctant to fill in the form.</p>

<p>Around 2023, I was eating at a restaurant and saw that they were asking for feedback via a QR code.
I thought this was a good idea, and it’s great that they are always looking for ways to improve their service via feedback.
I took a photo and planned to give my feedback once back home.</p>

<p>ALAS! How surprised I was when I found out that the first question asked was “In which restaurant did you eat?”</p>

<p>And there are like 20 options in there. I closed the form. Sorry.</p>

<p>At that moment, I convinced myself that I want to build FeedbackApp.id with these principles:</p>

<h3 id="1-simplicity">1. Simplicity</h3>
<p>The app must be as simple as possible both for the form designer and the respondent.</p>

<p>If it is not simple for the form designer, they will not use it.
If it is not simple for the respondent, they will not fill it.</p>

<h3 id="2-anonymity">2. Anonymity</h3>
<p><strong>The best feedback is honest feedback</strong>. Respondents might feel uneasy giving feedback if
we ask who they are. That is why I designed it to be anonymous by default; if respondents want to give their
identity, they can always do so via the form message, but it’s not mandatory.</p>

<h3 id="3-cost-efficient">3. Cost-efficient</h3>
<p>I want to make it accessible for individuals and small businesses, so I designed it to be cost-efficient. As long
as it covers the server’s operational cost and helps you improve yourself or your business, I’m happy.</p>

<h3 id="4-continuity">4. Continuity</h3>
<p>Self-improvement is a continuous process: we ask for feedback, compile it, and make a plan to improve. Then in the
next iteration we ask for feedback again, see if the plan worked, and repeat the process.</p>

<h2 id="whats-next">What’s next?</h2>
<p>I plan to write about how I architected and developed FeedbackApp.id. Stay tuned!</p>

<hr />
<p>If you have any feedback for me on this writing, or there is a topic
that you would like me to write about,
please help me by giving your feedback here: <a href="https://feedbackapp.id/s/11">https://feedbackapp.id/s/11</a></p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[TLDR I developed FeedbackApp https://feedbackapp.id. It currently is beta. Cost-efficient to be accessible for individual and small business.]]></summary></entry><entry><title type="html">My Experience Created AWS SAM to build Pipeline to Deploy AppRunner</title><link href="https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks.html" rel="alternate" type="text/html" title="My Experience Created AWS SAM to build Pipeline to Deploy AppRunner" /><published>2024-09-01T00:00:00+00:00</published><updated>2024-09-01T00:00:00+00:00</updated><id>https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks</id><content type="html" xml:base="https://gusdecool.github.io/2024/09/01/my-experience-aws-sam-pipeline-nested-stacks.html"><![CDATA[<p><strong>Date: 1 September 2024.</strong>. 
<img src="../image/aws-sam-nested-stack-change-my-mind.jpg" alt="Change My Mind" /></p>

<h2 id="objective">Objective</h2>
<p>I have multiple AppRunner services in AWS. Currently, I deploy them manually. I plan to build an AWS CodePipeline to do it
automatically.</p>

<p>Of course, since I have multiple app stacks, I prefer not to manage each pipeline manually, and for that AWS SAM is
the best solution.</p>

<p>My conclusions from the experience with AWS so far:</p>

<h3 id="tried-using-aws-nested-stacks">Tried using AWS nested stacks</h3>
<p>The reason I tried nested stacks is that the config for the SAM pipeline alone is so long that it took <strong>54 lines</strong>.</p>

<p>So I figured that if I split this into multiple nested stacks, it would be more manageable.</p>

<p>It turns out nested stacks make things harder, for two reasons:</p>
<ol>
  <li>When an error happens, it’s not clear what caused it, because it happens in a child stack.</li>
  <li>Updating stacks is slower, as it even triggers updates of child stacks that weren’t changed.</li>
</ol>

<p><strong>Conclusion:</strong> <br />
I went back to using a single SAM stack. Fast and efficient. <br />
I’m willing to trade long lines for speed and simplicity.</p>

<h3 id="i-learnt-how-to-codedeploy-apprunner">I Learnt how to CodeDeploy AppRunner</h3>
<p>CodeDeploy by default does not support deploying to AppRunner. To work around this, I ended up using the Invoke Lambda action.</p>

<p>In the build stage, the pipeline pushes a new Docker image tag to AWS ECR and saves the image name as an artifact.</p>

<p>The CodeDeploy Lambda then reads the artifact to learn the image name; inside the Lambda, I use the AWS SDK’s AppRunner
client to update the config to use this image tag.</p>
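
<p>As a rough sketch of what that Lambda update looks like with boto3 (the service ARN and image name below are placeholders, and the <code>client</code> argument is injectable purely for illustration and testing):</p>

<pre><code class="language-python">def deploy_image(service_arn, image_identifier, client=None):
    # client is injectable for testing; defaults to the real AppRunner client
    if client is None:
        import boto3  # deferred so the sketch imports cleanly without AWS
        client = boto3.client('apprunner')
    # point the existing service at the freshly pushed ECR image tag
    return client.update_service(
        ServiceArn=service_arn,
        SourceConfiguration={
            'ImageRepository': {
                'ImageIdentifier': image_identifier,
                'ImageRepositoryType': 'ECR',
            }
        },
    )
</code></pre>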

<p>Deployment runs quite fast for me, less than 5 minutes in total:</p>

<ol>
  <li>AWS CodeBuild, which builds the Docker image, takes two minutes.</li>
  <li>AWS CodeDeploy, which invokes the Lambda, takes two minutes.</li>
</ol>

<h3 id="watch-out-lambda-must-report-to-codepipeline">Watch out, Lambda must report to CodePipeline</h3>
<p>I’m using CodePipeline V2, where we pay based on how long the pipeline runs.</p>

<p>If I use a Lambda, I found the Lambda <strong>MUST report back</strong> to CodePipeline whether the job succeeded or failed, using the job context. Just
throwing an exception won’t do: the action keeps waiting, and if I miss this it could leave me over-billed.</p>
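
<p>A minimal sketch of that reporting (the <code>codepipeline</code> argument is injectable only for illustration and testing; in the real Lambda it is just the plain boto3 client):</p>

<pre><code class="language-python">def lambda_handler(event, context, codepipeline=None):
    if codepipeline is None:
        import boto3  # deferred so the sketch imports cleanly without AWS
        codepipeline = boto3.client('codepipeline')
    # CodePipeline passes the job id in the invocation event
    job_id = event['CodePipeline.job']['id']
    try:
        # ... do the actual AppRunner deployment here ...
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as exc:
        # report failure explicitly; an unhandled exception alone
        # leaves the pipeline action waiting until it times out
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={'type': 'JobFailed', 'message': str(exc)},
        )
</code></pre>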

<h3 id="setup-aws-ecr-image-rule">Setup AWS ECR image rule</h3>
<p>Since I build a Docker image for every PR merged to the main branch, I will have a lot of Docker images in ECR, which could
cause over-billing.</p>

<p>Because of that, I set up a rule to keep only the latest 10 Docker images for deployment, which should be more than enough
in case I need to roll back a deployment.</p>
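
<p>The rule itself is a standard ECR lifecycle policy; a policy along these lines keeps only the 10 newest images (the description is free text, and the count is just the value from above):</p>

<pre><code class="language-json">{
  "rules": [
    {
      "rulePriority": 1,
      "description": "keep only the latest 10 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
</code></pre>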

<h3 id="next-step">Next step</h3>
<ol>
  <li>I have a DB migration script that needs to run after AppRunner is updated; I need to find a way to trigger it.</li>
  <li>I remember AppRunner may dispatch an event to EventBridge.</li>
</ol>

<h3 id="closing">Closing</h3>
<p>If you are interested in knowing how I configured it, with source code and explanation, please let me know in the comments. I will
find time to share it and make a video for it.</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Date: 1 September 2024..]]></summary></entry><entry><title type="html">Setup AWS CloudWatch Agent On-Premise Server — Part 2 [END]</title><link href="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html" rel="alternate" type="text/html" title="Setup AWS CloudWatch Agent On-Premise Server — Part 2 [END]" /><published>2022-06-25T00:00:00+00:00</published><updated>2022-06-25T00:00:00+00:00</updated><id>https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end</id><content type="html" xml:base="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html"><![CDATA[<p>Tutorial how to add AWS CloudWatch agent in Kubernetes server.</p>

<p>Previous post at <a href="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html">PART 1</a></p>

<p>In part 1 we successfully tested the AWS CloudWatch agent (I will abbreviate it as CWA from here on) as a container.
Now in this post I will share how I successfully installed it in my K8s cluster.</p>

<p>I will share my full K8s declarative config first, then explain each line that is related and important.
In this example I use my Symfony application as the app that writes the log; you can change it to any of your apps.
As long as your app writes a log file, we can use it and make CWA send log events to AWS.</p>

<h2 id="my-k8s-full-config">My K8s full config</h2>

<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-prod
  template:
    metadata:
      labels:
        app: api-prod
    spec:
      containers:
        - image: your-application-image
          name: web
          envFrom:
            - configMapRef:
                name: api-prod-env
          volumeMounts:
            - mountPath: /app/var/log
              name: app-log
        - image: amazon/cloudwatch-agent:1.247350.0b251814
          name: agent
          volumeMounts:
            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
            - mountPath: /log
              readOnly: true
              name: app-log
            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
      volumes:
        - name: app-log
          emptyDir: { }
        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
        - name: aws-cred
          configMap:
            name: aws-cred
      terminationGracePeriodSeconds: 60
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-prod-cwagent
data:
  cwagentconfig: |
    {
      "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
      },
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/log/app.log",
                "log_group_name": "api-prod",
                "log_stream_name": "api-prod-{hostname}",
                "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
                "retention_in_days": 365
              }
            ]
          }
        }
      }
    }
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-cred
data:
  config: |
    [profile AmazonCloudWatchAgent]
    output = text
    region = ap-southeast-1
  credentials: |
    [AmazonCloudWatchAgent]
    aws_access_key_id = XXX-XXX
    aws_secret_access_key = XXX-XXX
</code></pre>

<h2 id="deployment-config-explanation">Deployment config explanation</h2>
<h3 id="container-config">Container config</h3>
<p>If you followed my previous post, I decided to install CWA on each pod. This may not be optimal, as we could end up
having too many CWA agents, but it works for now and I prefer to keep it simple as a starter proof of concept.</p>

<p>I will explain each relevant config excerpt by excerpt.</p>

<pre><code class="language-yaml">        - image: your-application-image
          name: web
          envFrom:
            - configMapRef:
                name: api-prod-env
          volumeMounts:
            - mountPath: /app/var/log
              name: app-log
</code></pre>

<p>This is my core app container; this image contains my full application. It reads env from a K8s configMapRef; I will not
explain each variable in detail, as every app is different and they are not relevant to the CWA installation.</p>

<p>The interesting part is “volumeMounts”: my core app container writes its log to the file “/app/var/log/app.log”, and this
file is then synced with the CWA container, so CWA is able to read the log file from a different
container (decoupling concept).</p>

<pre><code class="language-yaml">        - image: amazon/cloudwatch-agent:1.247350.0b251814
          name: agent
          volumeMounts:
            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
            - mountPath: /log
              readOnly: true
              name: app-log
            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
</code></pre>

<p>Above is the CWA container; there are only “volumeMounts”, which act as the container config. I will explain each in detail below.</p>

<pre><code class="language-yaml">            - mountPath: /etc/cwagentconfig
              name: agent-config
              readOnly: true
</code></pre>

<p>This is the CWA core config. It configures how CWA should behave, e.g. which file to read. If you read my previous post,
this is the “cwagentconfig.cnf” file.</p>

<pre><code class="language-yaml">            - mountPath: /log
              readOnly: true
              name: app-log
</code></pre>

<p>Above is the volume of the log file, which is synced with the core app container. One thing to note here: if you configure CWA
to delete log lines once they are sent to AWS CloudWatch, you may want to set “readOnly: false”.</p>

<pre><code class="language-yaml">            - mountPath: /root/.aws
              name: aws-cred
              readOnly: true
</code></pre>

<p>Above is your AWS CWA agent credentials volume, from which the container reads the credentials file. It contains the “aws_access_key_id”
&amp; “aws_secret_access_key”.</p>

<h3 id="deployment-volume-config">Deployment volume config</h3>
<pre><code class="language-yaml">        volumes:
        - name: app-log
          emptyDir: { }
        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
        - name: aws-cred
          configMap:
            name: aws-cred
</code></pre>

<p>Above is the deployment volume config. I will explain each entry below.</p>

<pre><code class="language-yaml">- name: app-log
  emptyDir: { }
</code></pre>

<p>Above is “app-log”; I decided to go with an “emptyDir” volume because the content of this volume is not important to
keep once the logs are sent to AWS CloudWatch.</p>

<pre><code class="language-yaml">        - name: agent-config
          configMap:
            name: api-prod-cwagent
            items:
              - key: cwagentconfig
                path: cwagentconfig
</code></pre>

<p>Above is the CWA config. Since K8s supports using a “configMap” as a “file”, I decided to use a configMap to keep all configs in
a single file rather than referencing an external file. Just to keep things simple and have everything in a single
deployment file.</p>

<pre><code class="language-yaml">        - name: aws-cred
          configMap:
            name: aws-cred
</code></pre>

<p>Above is the AWS credentials file. Like “agent-config”, this is a “configMap” used as a “file”.</p>

<h3 id="pod-termination-timing">Pod termination timing</h3>
<pre><code class="language-yaml">terminationGracePeriodSeconds: 60
</code></pre>

<p>I set “terminationGracePeriodSeconds: 60” to match the CWA “metrics_collection_interval: 60” interval, so when the
pod is replaced by a new deployment, it has a 60-second grace period to allow CWA to send the remaining logs to AWS CloudWatch.
You may want to increase this value a little bit, e.g. by 10 seconds, to be safe.</p>

<h3 id="configmap-api-prod-cwagent">ConfigMap “api-prod-cwagent”</h3>
<pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: api-prod-cwagent
data:
  cwagentconfig: |
    {
      "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
      },
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/log/app.log",
                "log_group_name": "api-prod",
                "log_stream_name": "api-prod-{hostname}",
                "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
                "retention_in_days": 365
              }
            ]
          }
        }
      }
    }
</code></pre>

<p>Above is the ConfigMap of the CWA. It basically uses the K8s capability to create a file from a config map. The config for CWA
was explained in part 1, so I will not explain it again here.</p>

<h3 id="configmap-aws-cred">ConfigMap “aws-cred”</h3>
<pre><code class="language-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-cred
data:
  config: |
    [profile AmazonCloudWatchAgent]
    output = text
    region = ap-southeast-1
  credentials: |
    [AmazonCloudWatchAgent]
    aws_access_key_id = XXX-XXX
    aws_secret_access_key = XXX-XXX
</code></pre>

<p>Above are the AWS credentials; you need to replace the access key id and secret access key values with yours.
K8s will then create a file from this ConfigMap.</p>

<hr />

<p>That’s pretty much what I did to successfully set up CWA in K8s. If you have any questions or other strategies for
setting this up in K8s, please let me know in the comments.</p>

<p>And lastly, I apologize for the delay in writing this final part. I actually resolved the problem a week after the first
post, but only recently got the time to write it up.</p>

<p>Thank you.</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Tutorial how to add AWS CloudWatch agent in Kubernetes server.]]></summary></entry><entry><title type="html">Setup AWS CloudWatch Agent On-Premise Server — Part 1</title><link href="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html" rel="alternate" type="text/html" title="Setup AWS CloudWatch Agent On-Premise Server — Part 1" /><published>2022-05-12T00:00:00+00:00</published><updated>2022-05-12T00:00:00+00:00</updated><id>https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1</id><content type="html" xml:base="https://gusdecool.github.io/2022/05/12/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-1.html"><![CDATA[<p>Tutorial how to set up AWS CloudWatch agent with on-premise server so we can send logs from server outside AWS.</p>

<p>The second and final part is available here <a href="https://gusdecool.github.io/2022/06/25/Setup-AWS-CloudWatch-Agent-On-Premise-Server-Part-2-end.html">PART 2</a></p>

<hr />

<h2 id="background-story-why-i-need-it">Background story why I need it</h2>
<p>I have a Kubernetes (K8s) cluster on DigitalOcean. I currently have two difficulties: how to persist logs from K8s pods,
since as we know container storage is ephemeral, and how to examine the logs, as that requires another tool to read them easily.</p>

<p>As a temporary solution, I have a micro web service acting as central log storage, called “Tracing”, and my applications
send each log to Tracing instantly via an HTTP request. I soon realized this method is too expensive
and slow, as each HTTP call costs time.</p>

<p>That is why I need something faster with minimum maintenance. Since I am quite familiar with AWS CloudWatch, I would like to
use it as my central log store, but since my K8s cluster is outside of AWS, I needed to find out how to do that. Luckily AWS
provides the AWS CloudWatch agent, which acts as an agent (hence its name) to send the logs to AWS.</p>

<hr />

<h2 id="options-to-install-cloudwatch-agent">Options to install CloudWatch Agent</h2>
<p>Given my familiarity with Docker &amp; K8s, there are two options I know of to install the CloudWatch agent:</p>

<ol>
  <li>Install it manually as a binary/service in the operating system of each container. I tried this and found how
complicated it is, with no luck making it work. I then realized that if I did it this way, I would need to manage the
CloudWatch agent for each of my containers, which would cost me extra overhead. I dropped this idea.</li>
  <li>Install it as a container using the Docker image AWS provides at https://hub.docker.com/r/amazon/cloudwatch-agent.</li>
</ol>

<p>The problem with this image is its minimal documentation focused on using the Docker image 😢
So in this part 1, I want to describe how I successfully did it with a container, since I got it working through
trial and error, reading each container error message and searching Google to make sense of it. I hope this post will
help you and save you time.</p>

<hr />

<h2 id="prerequisites">Prerequisites</h2>
<p>Before you can start following this tutorial, below are some prerequisites that you must have:</p>

<ol>
  <li>AWS account (obviously)</li>
  <li>AWS IAM account with access to create log groups and streams. You will need its access key and secret. Let me
know if you need my help describing the IAM access specs.</li>
  <li>Knowledge how to use Docker.</li>
</ol>

<hr />

<h2 id="docker-compose">Docker Compose</h2>
<p>In this tutorial, I will use Docker Compose instead of the pure Docker CLI, since I feel it’s easier to record the changes I make and git commit them for history.</p>

<p>This is the config of my <code>docker-compose.yml</code></p>

<pre><code class="language-yml">version: "3.8"
services:
  agent:
    image: amazon/cloudwatch-agent
    volumes:
      - ./config/log-collect.json:/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json
      - ./aws:/root/.aws
      - ./log:/log
      - ./etc:/opt/aws/amazon-cloudwatch-agent/etc
</code></pre>

<p>Do not run <code>docker-compose up</code> yet, as we still don’t have the synced volume files, which are the important files that I will explain below.</p>

<hr />

<h2 id="cloudwatch-agent-config">CloudWatch agent config</h2>
<p>The CloudWatch agent config describes how the agent will collect logs from the container and
send them to AWS. In my setup it is represented as the file <code>./config/log-collect.json</code>, then synced into the container.</p>

<p>Below is the content of that file</p>

<pre><code class="language-json">{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/log/app.log",
            "log_group_name": "container-aws-logs-test",
            "log_stream_name": "{hostname}",
            "timestamp_format" :"[%Y-%m-%dT%H:%M:%S.%f%z]",
            "retention_in_days": 30
          }
        ]
      }
    }
  }
}
</code></pre>

<p>You can read the config structure docs at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html</p>

<p>I will explain the key points of my config in detail:</p>

<ol>
  <li>“<code>metrics_collection_interval</code>” describes how often, in seconds, the agent will check the log file and send it to AWS.
60 seconds is the default. Do not set it lower than needed, as it will increase your AWS charges.</li>
  <li>“<code>run_as_user</code>” I set as root. Feel free to set it to any user other than root if you want it extra secure.</li>
  <li>I did not configure metrics collection, as this runs in a standalone container, so it didn’t make sense to capture
the metrics of the AWS CloudWatch agent container itself. And since I’m using K8s auto scaling, this is not an important
thing for me to monitor for now.</li>
  <li>“<code>collect_list</code>” describes which files the agent will watch and send to AWS. The interesting part here is that it is an
array of objects, which means we can describe multiple files to watch, so only 1 agent or 1
container is needed to watch all my pods, saving computing resources.</li>
  <li>“<code>file_path</code>” describes which file the agent will read.</li>
  <li>“<code>log_group_name</code>” &amp; “<code>log_stream_name</code>” describe to which group and stream this log will belong in AWS CloudWatch.</li>
  <li>“<code>timestamp_format</code>” describes your logs’ timestamp format. If it matches, CloudWatch will use your log timestamp; if not,
it will use the current timestamp. I found this difficult to configure and test, and I describe how I solved it below.</li>
  <li>“<code>retention_in_days</code>” describes how long the log will persist. 30 days is enough for testing.</li>
</ol>
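<p>The “timestamp_format” value uses strftime-style codes, so you can preview what a matching timestamp looks like with the same format string in Python. A small sketch (the datetime value is just the sample entry from my log file):</p>

<pre><code class="language-python">from datetime import datetime, timezone

# Same format string as "timestamp_format" in the agent config above
fmt = "[%Y-%m-%dT%H:%M:%S.%f%z]"

sample = datetime(2022, 5, 14, 17, 32, 19, 826204, tzinfo=timezone.utc)
print(sample.strftime(fmt))  # [2022-05-14T17:32:19.826204+0000]
</code></pre>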

<hr />

<h2 id="aws-credentials">AWS Credentials</h2>
<p>AWS credentials will be used to authenticate to be able to send log to your aws account. It represented as folder 
<code>./aws</code>, inside that folder i have 2 AWS standard credentials files like below:</p>

<p><code>./aws/config</code></p>

<pre><code class="language-ini">[profile AmazonCloudWatchAgent]
output = text
region = ap-southeast-1
</code></pre>

<p>It’s important to name it “AmazonCloudWatchAgent”, as the container is designed to use this profile. Change the region
to your AWS region, or any AWS region where you want to record the logs.</p>

<p><code>./aws/credentials</code></p>

<pre><code class="language-ini">[AmazonCloudWatchAgent]
aws_access_key_id = IAM_ID
aws_secret_access_key = IAM_KEY
</code></pre>

<p>Again, it’s important to name the profile <code>AmazonCloudWatchAgent</code>.</p>

<p>And that’s it for the AWS credentials.</p>
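<p>Since the profile names matter, a small sketch to verify both files parse and contain the expected sections; Python’s configparser reads this INI format (the key values here are the same placeholders as above):</p>

<pre><code class="language-python">import configparser

config_text = """
[profile AmazonCloudWatchAgent]
output = text
region = ap-southeast-1
"""

credentials_text = """
[AmazonCloudWatchAgent]
aws_access_key_id = IAM_ID
aws_secret_access_key = IAM_KEY
"""

cfg = configparser.ConfigParser()
cfg.read_string(config_text)
cred = configparser.ConfigParser()
cred.read_string(credentials_text)

# Both files must expose the profile the container is designed to use
print("profile AmazonCloudWatchAgent" in cfg.sections())  # True
print("AmazonCloudWatchAgent" in cred.sections())         # True
</code></pre>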

<hr />

<h2 id="log-file">Log file</h2>
<p>This is not required in production, but it is good for testing that the container runs in development before we go to production,
and it helps you understand the concept of how the agent monitors the file.</p>

<p><code>./log/app.log</code> contains a log file from a Symfony application with a modified timestamp (as I mentioned above,
I had difficulty with the timestamp). Note the filename is “app.log”, which matches the JSON config from the file
<code>./config/log-collect.json</code> that I configured above.</p>

<p>The content of my log file is as below</p>

<pre><code class="language-txt">[2022-05-14T17:32:19.826204+0000] app.INFO: budi test {"budi":"foo"}
</code></pre>

<p>The content is pretty simple.</p>

<hr />

<h2 id="time-to-test">Time to test</h2>
<p>With all that configured, you’re now ready to start Docker with the command “docker compose up”. After it runs, examine
the output in the terminal, then check the CloudWatch group “container-aws-logs-test” in your AWS account to see if the log was
successfully delivered.</p>

<p>If you’re facing any error, it is most likely because the IAM account used doesn’t have access to create logs.</p>

<p>Now try to add a new line to the file <code>./log/app.log</code>, wait 60 seconds, and see if it is delivered.</p>
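<p>To generate such a test line with a current timestamp in the exact format the agent expects, a quick Python sketch (the message text is made up, and the path assumes you run it from the Docker Compose directory):</p>

<pre><code class="language-python">from datetime import datetime, timezone
from pathlib import Path

log_file = Path("log/app.log")  # the file watched per the docker-compose volume
log_file.parent.mkdir(exist_ok=True)

# Timestamp in the same format as "timestamp_format" in log-collect.json
ts = datetime.now(timezone.utc).strftime("[%Y-%m-%dT%H:%M:%S.%f%z]")
with log_file.open("a") as f:
    f.write(ts + ' app.INFO: delivery test {"check":"ok"}\n')
</code></pre>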

<p>Now you understand the concept of how the agent works. <strong>CONGRATULATIONS!</strong></p>

<hr />

<h2 id="bonus-debugging">Bonus: Debugging</h2>
<p>As I mentioned earlier, I had difficulty setting up the timestamp, and there was no way for me to get a shell inside the
container, as it has neither “bash” nor “sh”, so I was unable to “docker compose exec sh” into it. There was also no
way for me to install “bash”, as the container doesn’t even have an “apt-get” command; I guess it’s because the image is
created using container_linux:go. I don’t know how AWS did it or built the image; it’s something I need to learn
in the future.</p>

<p>After some Google-fu and examining the terminal output, I learned that the agent config is located at
“/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml” inside the container. Since I was unable to get a shell
inside the container to view the file, I got the idea to sync that file to my local computer. Hence you see the config sync
volume of</p>

<pre><code class="language-yml">volumes:
- ....
- ./etc:/opt/aws/amazon-cloudwatch-agent/etc
</code></pre>

<p>Now if you read the file “./etc/amazon-cloudwatch-agent.toml” on your local machine, you will find these interesting lines</p>

<pre><code class="language-toml">timestamp_layout = "[2006-01-02T15:04:05..000-0700]"
timestamp_regex = "(\\[\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.(\\d{1,9})[\\+-]\\d{4}\\])"
</code></pre>

<p>Aha! This is the timestamp config used by the agent. But my problem did not end there, as I then found out the
“timestamp_layout” example is not accurate 😢. Notice that in the timestamp, between the seconds and the microseconds/milliseconds,
“05..000” has 2 dots (..). This is inaccurate, as the regex only checks for 1 dot.</p>

<p>So my suggestion is to stick with testing against the regex; you can test your log file input with a regex validation tool
at https://regex101.com/.</p>
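<p>You can also check the generated regex against a sample log line directly; a quick sketch with Python’s re module (the regex is the one from the generated TOML above, and the log line is my sample entry):</p>

<pre><code class="language-python">import re

# timestamp_regex from the generated amazon-cloudwatch-agent.toml
timestamp_regex = r"(\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.(\d{1,9})[\+-]\d{4}\])"

line = '[2022-05-14T17:32:19.826204+0000] app.INFO: budi test {"budi":"foo"}'
match = re.search(timestamp_regex, line)
print(match.group(1))  # [2022-05-14T17:32:19.826204+0000]
</code></pre>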

<hr />

<h2 id="next-part">Next part</h2>
<p>As this proof of concept succeeded, my next step is to install this into K8s. Since this runs as a container, I
have two options to do it.</p>

<ol>
  <li>Have the CloudWatch Agent as a single independent pod, with all my apps sharing storage with this agent pod. The
agent pod would read all the storage files provided by all app pods. The pros: saving computing resources. The cons: I
would need to update the config of this agent pod whenever I add a new app, which could potentially mean a lot of configuration.</li>
  <li>Have the CloudWatch Agent installed as a container in the same pod as the app, so each app has its own agent. The pros:
each agent has a small and simpler config. The cons: every app has its own agent, and when
horizontal scaling happens, the agent is also replicated; my concern is with computing resources. But I guess it will be
quite small for a starter.</li>
</ol>

<p>At the moment I lean towards option 2. I’m still not sure, but I will try that first and let you know how it goes in the
next post. Once I’ve done it, I will update this post to link to the next post.</p>

<p>Stay tuned…</p>

<p>Thank you for reading ✌️</p>]]></content><author><name>Budi Arsana</name></author><category term="Other" /><summary type="html"><![CDATA[Tutorial how to set up AWS CloudWatch agent with on-premise server so we can send logs from server outside AWS.]]></summary></entry></feed>