NashTech Blog

Networking: BGP – The Invisible Protocol That Decides Where Your Traffic Goes

As engineers, when we talk about the Internet, we often imagine something simple:

  • A request leaves our server
  • It travels across the network
  • It reaches the destination
  • A response comes back

But in reality, your packet may cross several independent networks, operated by different ISPs, carriers, and cloud providers in different countries.

And there is one protocol quietly deciding that journey.

That protocol is BGP.


A small incident that explains everything

Imagine this scenario.

You deploy a new version of your service to production.
Everything looks fine:

  • Health checks are green
  • CPU and memory are stable
  • No errors in logs

But suddenly:

  • Users in Europe cannot access your API
  • Latency from Singapore jumps from 40 ms to 300 ms
  • Some requests start timing out

You check:

  • Application logs → nothing unusual
  • Database → healthy
  • Load balancer → normal
  • DNS → correct

So what’s wrong?

After some digging, you discover:

Traffic to your cloud region is being routed through a completely different path across the Internet.

No code change caused it.
No infrastructure change caused it.

It was a BGP routing change somewhere on the Internet.


So what exactly is BGP?

BGP (Border Gateway Protocol) is the protocol that decides:

“Which path should data take across the Internet?”

The Internet is not one big network.
It is made up of tens of thousands of independent networks.

Each of these is called an Autonomous System (AS).


The Internet as a group of networks

            +----------------+
            |   Cloudflare   |
            |     AS13335    |
            +--------+-------+
                     |
                     |
+-----------+   +----+----+   +-----------+
|  ISP A    +---+  BGP    +---+  ISP B    |
|  AS64501  |   | Peering |   |  AS64502  |
+-----+-----+   +----+----+   +-----+-----+
      |              |              |
      |              |              |
+-----+-----+   +----+----+   +-----+-----+
|  Company  |   | Google  |   |    AWS    |
|  AS65010  |   | AS15169 |   |  AS16509  |
+-----------+   +---------+   +-----------+



Each box is an independent network with:

  • Its own policies
  • Its own infrastructure
  • Its own business decisions

BGP is what connects them all.


How traffic moves across multiple networks

When a user opens your website, the packet may travel like this:

   User
     |
     | 1. Request
     v
+---------+
| Home ISP|  AS64512
+----+----+
     |
     | BGP path decision
     v
+----+----+
| Transit |  AS3356
| Provider|
+----+----+
     |
     v
+----+----+
|   AWS   |  AS16509
|  Region |
+----+----+
     |
     v
Your Server

Every hop between these networks is decided by BGP.
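
To make the idea of a network-level path concrete, here is a minimal Python sketch that collapses a hop-by-hop trace into the sequence of Autonomous Systems it crosses. The ASNs match the diagram above; the router labels and function names are invented for illustration, and a real trace would need an IP-to-ASN lookup that is not shown here.

```python
from typing import List, Tuple

# Hypothetical hop list: (router label, ASN that owns the router).
# The ASNs mirror the diagram above; the router labels are made up.
HOPS: List[Tuple[str, int]] = [
    ("home-gw", 64512),
    ("isp-edge", 64512),
    ("transit-core-1", 3356),
    ("transit-core-2", 3356),
    ("aws-border", 16509),
    ("aws-region", 16509),
]


def as_path_from_hops(hops: List[Tuple[str, int]]) -> List[int]:
    """Collapse consecutive hops inside the same AS into one AS-level step."""
    path: List[int] = []
    for _label, asn in hops:
        if not path or path[-1] != asn:
            path.append(asn)
    return path


def inter_as_handoffs(hops: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return (from_asn, to_asn) pairs: the points where traffic changes network."""
    path = as_path_from_hops(hops)
    return list(zip(path, path[1:]))


print(as_path_from_hops(HOPS))   # AS-level path
print(inter_as_handoffs(HOPS))   # handoffs between networks
```

Six router hops shrink to three networks and two handoffs. Those handoffs are the places where BGP, rather than your own infrastructure, picks the next step.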


How networks announce routes

Each Autonomous System tells its neighbors which address ranges (prefixes) it can reach:

        AS100
   (Cloud Provider)
     10.0.0.0/8
         |
         | "I can reach 10.0.0.0/8"
         v
        AS200
       (Transit)
         |
         | "To reach 10.0.0.0/8,
         |  go through AS100"
         v
        AS300
        (ISP)



So when a user in AS300 needs to reach 10.1.2.3:

  • AS300 sends the traffic to AS200
  • AS200 sends it to AS100
  • AS100 delivers it
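
The announcement chain above can be sketched as a toy model in Python. This is not a real BGP implementation: the AutonomousSystem class, its method names, and the "a shorter path replaces a longer one" rule are assumptions made for illustration. Only the ASNs, the prefix, and the destination address come from the example.

```python
import ipaddress
from typing import Dict, List, Optional


class AutonomousSystem:
    """Toy AS that learns and re-announces routes (illustrative only)."""

    def __init__(self, asn: int) -> None:
        self.asn = asn
        self.neighbors: List["AutonomousSystem"] = []
        # prefix -> AS path to follow; the first ASN is the next network
        self.routes: Dict[str, List[int]] = {}

    def originate(self, prefix: str) -> None:
        """Announce a prefix that this AS owns."""
        self.routes[prefix] = []
        self._advertise(prefix)

    def receive(self, prefix: str, as_path: List[int]) -> None:
        # Loop prevention: ignore any path that already contains our ASN.
        if self.asn in as_path:
            return
        current = self.routes.get(prefix)
        # Simplification: accept a route only if it is new or strictly shorter.
        if current is not None and len(current) <= len(as_path):
            return
        self.routes[prefix] = as_path
        self._advertise(prefix)

    def _advertise(self, prefix: str) -> None:
        # Prepend our own ASN before passing the route on.
        path = [self.asn] + self.routes[prefix]
        for neighbor in self.neighbors:
            neighbor.receive(prefix, path)

    def lookup(self, address: str) -> Optional[List[int]]:
        """Return the AS path for the most specific matching prefix, if any."""
        ip = ipaddress.ip_address(address)
        matches = [
            net for net in map(ipaddress.ip_network, self.routes) if ip in net
        ]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[str(best)]


def connect(a: AutonomousSystem, b: AutonomousSystem) -> None:
    """Create a two-way neighbor relationship."""
    a.neighbors.append(b)
    b.neighbors.append(a)


as100, as200, as300 = (AutonomousSystem(n) for n in (100, 200, 300))
connect(as100, as200)
connect(as200, as300)
as100.originate("10.0.0.0/8")

print(as300.lookup("10.1.2.3"))  # [200, 100]: via AS200, then AS100
```

Each AS prepends its own number before passing the route on, so the path AS300 stores is exactly the list of networks the traffic will cross.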

When the path is not what you expect

You may expect traffic to follow the shortest geographic path:

Vietnam → Singapore → AWS Singapore

But the path BGP selects may look like this:

Vietnam → Hong Kong → Japan → Singapore → AWS

Expected path:

[User VN] ---> [Singapore ISP] ---> [AWS Singapore]

Actual BGP path:

[User VN]
|
v
[ISP VN]
|
v
[Transit HK]
|
v
[Transit JP]
|
v
[AWS Singapore]

Why?

Because:

  • Transit HK is cheaper
  • Or there is a policy preference
  • Or a route change happened

BGP cares about policy, not geography.
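
A rough sketch of why policy can beat the shorter route: in BGP's decision process, a higher local preference is compared before AS path length. The Route class, the neighbor names, and the non-AWS ASNs below are hypothetical, and real routers apply further tie-breakers after these two.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    via: str                   # neighbor the route was learned from
    as_path: Tuple[int, ...]   # networks the traffic would cross
    local_pref: int = 100      # operator policy knob; higher wins


def best_route(candidates: List[Route]) -> Route:
    """Pick by highest local preference first, then by shortest AS path."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


# Hypothetical routes to the same AWS prefix (AS16509 is from the article;
# the other ASNs are documentation-range placeholders).
direct = Route(via="Singapore ISP", as_path=(64500, 16509))
via_hk = Route(via="Transit HK", as_path=(64501, 64502, 16509), local_pref=200)

print(best_route([direct, via_hk]).via)  # Transit HK: policy beats the shorter path
```

If both routes had the same local preference, the two-network path through the Singapore ISP would win. The longer route is chosen only because the operator told the router to prefer it, for example because that transit link is cheaper.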


BGP in a hybrid cloud setup

This is a common DevOps scenario.

     On-prem Data Center
           AS65010
        +-------------+
        | Core Router |
        +------+------+
               |
               | BGP session
               |
        +------+------+
        | Cloud Edge  |
        |   AS16509   |
        +------+------+
               |
               v
        +-------------+
        |  VPC / VNet |
        | Application |
        +-------------+



When the BGP session is up:

  • Routes are exchanged
  • On-prem can reach cloud subnets
  • Cloud can reach on-prem networks

If BGP goes down:

  • Routes disappear
  • Connectivity breaks
  • Your app may suddenly lose database access
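
The effect of the session state on reachability can be modelled in a few lines. This is a simplified sketch: the Router and BgpSession classes and both prefixes are invented for illustration, and a real session involves timers, keepalives, and route filters that are ignored here.

```python
from typing import Dict, Iterable, Set


class Router:
    """Toy router that knows its own prefixes plus whatever a peer taught it."""

    def __init__(self, name: str, local_prefixes: Iterable[str]) -> None:
        self.name = name
        self.local_prefixes: Set[str] = set(local_prefixes)
        self.learned: Dict[str, str] = {}  # prefix -> name of the peer it came from

    def reachable(self) -> Set[str]:
        return self.local_prefixes | set(self.learned)


class BgpSession:
    """Toy session: routes exist only while the session is up."""

    def __init__(self, a: Router, b: Router) -> None:
        self.a, self.b = a, b
        self.up = False

    def establish(self) -> None:
        self.up = True
        for router, peer in ((self.a, self.b), (self.b, self.a)):
            for prefix in peer.local_prefixes:
                router.learned[prefix] = peer.name

    def tear_down(self) -> None:
        self.up = False
        # Routes learned over this session are withdrawn on both sides.
        for router, peer in ((self.a, self.b), (self.b, self.a)):
            router.learned = {
                p: via for p, via in router.learned.items() if via != peer.name
            }


# Hypothetical prefixes for the on-prem network and the cloud VPC.
on_prem = Router("core-router", ["192.168.10.0/24"])
cloud = Router("cloud-edge", ["10.20.0.0/16"])
session = BgpSession(on_prem, cloud)

session.establish()
print(sorted(on_prem.reachable()))  # both prefixes are reachable

session.tear_down()
print(sorted(on_prem.reachable()))  # only the local prefix remains
```

Nothing about the application changed between the two print statements. The database in the other network simply stopped being reachable once the learned route was withdrawn.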

What a BGP problem looks like

Normal routing

User
|
v
ISP A
|
v
Transit
|
v
Your Cloud

After a route leak or misconfiguration

User
|
v
ISP A
|
v
Wrong Network
|
v
Black hole (traffic dropped)

From your monitoring, this appears as:

  • Sudden regional outage
  • Increased latency
  • Timeouts from specific countries

Even though:

  • Your servers are healthy
  • Your deployments are fine
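
One way to turn that pattern into a triage rule is sketched below. It assumes you already collect per-region probe results and a latency baseline; the function name, the data shape, and the 3x threshold are all hypothetical, and a flagged region only means the network path deserves a look, not that BGP is proven to be the cause.

```python
from typing import Dict, List, Optional


def suspect_routing(
    origin_healthy: bool,
    probes: Dict[str, Dict[str, Optional[float]]],
    baseline_ms: Dict[str, float],
    latency_factor: float = 3.0,
) -> List[str]:
    """Return regions whose symptoms point at the path rather than the app."""
    if not origin_healthy:
        # If the servers themselves are failing, start there instead.
        return []
    suspects: List[str] = []
    for region, result in probes.items():
        latency = result["latency_ms"]
        if not result["ok"] or latency is None:
            suspects.append(region)  # unreachable from this region
        elif latency > latency_factor * baseline_ms[region]:
            suspects.append(region)  # reachable, but far slower than usual
    return suspects


# Singapore's 40 ms and 300 ms come from the incident above;
# every other number is a made-up placeholder.
probes = {
    "europe": {"ok": False, "latency_ms": None},
    "singapore": {"ok": True, "latency_ms": 300.0},
    "us-east": {"ok": True, "latency_ms": 85.0},
}
baseline = {"europe": 120.0, "singapore": 40.0, "us-east": 80.0}

print(suspect_routing(True, probes, baseline))  # ['europe', 'singapore']
```

A result that is limited to certain regions while the origin stays healthy is the signature described above, and it is a good moment to compare traceroutes or check a public BGP looking glass.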

Key takeaway diagram

This is the simplest mental model of BGP:

Your App (Cloud)
|
v
Cloud Network (AS1)
|
v
Transit Network (AS2)
|
v
ISP (AS3)
|
v
User

BGP decides:

  • Which AS to use
  • In which order
  • Based on policies and agreements

Practical takeaway for DevOps engineers

You don’t need to configure BGP daily.

But you should understand this chain:

User → ISP → Transit → Cloud → Your App

Because problems can happen at any link in that chain.

And often:

If the app is healthy but unreachable, BGP is part of the story.

Duong Dao Viet
