CASE STUDY

On-Prem to AWS Cloud Connectivity Auto-Failover

Whenever a Direct Connect location has an outage, traffic does not fail over automatically to the other Direct Connect location, which causes a major outage.

About the Company

The company is an American multinational financial services company headquartered in Denver, Colorado. It serves millions of retail and digital customers every day, with franchises in over 200 countries and territories.

Problem Statement

Whenever a Direct Connect location has an outage, traffic does not fail over automatically to the other Direct Connect location, which causes a major outage.

The Solution

Based on multiple thresholds, a script running in AWS Lambda can automatically bring the impacted BGP peers (Direct Connect locations) down or up from the AWS cloud side.
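The threshold check described above could be sketched as follows. This is only an illustration: the metric names, the threshold values, and the `decide_action` helper are assumptions made for this example and are not taken from the deployed script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; real limits would be tuned per environment.
    max_packet_loss_pct: float = 1.0
    max_latency_ms: float = 50.0
    min_breaching_samples: int = 3


def decide_action(samples, thresholds=Thresholds()):
    """Decide whether a BGP peer should be brought down or kept up.

    samples: list of (packet_loss_pct, latency_ms) tuples, oldest first.
    Returns "bring_down" only when each of the most recent
    min_breaching_samples readings breaches at least one threshold.
    """
    recent = samples[-thresholds.min_breaching_samples:]
    if len(recent) < thresholds.min_breaching_samples:
        # Not enough data yet to justify a failover.
        return "keep_up"
    breached = all(
        loss > thresholds.max_packet_loss_pct
        or latency > thresholds.max_latency_ms
        for loss, latency in recent
    )
    return "bring_down" if breached else "keep_up"
```

Requiring several consecutive breaches is one way to avoid flapping the peer on a single bad reading.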

Summary

Border Gateway Protocol (BGP) is the gateway protocol that lets the internet exchange routing information between autonomous systems (AS). Networks that interconnect need a way to communicate with one another, and peering provides it. BGP makes peering feasible; without it, networks could not exchange the routes they need to send and receive data.

With an SLA of up to 99.99%, multiple Direct Connect links at different locations offer the reliability needed for on-premises connectivity. Border Gateway Protocol (BGP), which is used to exchange routes between your network and AWS, is simple: its routing decisions frequently ignore transient changes in end-to-end network conditions that could impact application performance.

The solution uses an Amazon EventBridge rule to match the desired alarms and invoke an AWS Lambda function that executes the response. The sample Lambda function uses the AWS Direct Connect Resiliency Toolkit API to put the impaired virtual interface in “Failover Test” mode for up to 10 minutes.
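A minimal sketch of such a Lambda function is shown below. It is not the function used in this engagement. It assumes the EventBridge rule forwards CloudWatch "Alarm State Change" events and that each alarm name contains the ID of the virtual interface it monitors (a naming convention invented for this example). The `dx_client` parameter is likewise an addition that allows the handler to be exercised without AWS credentials. The boto3 call `start_bgp_failover_test` is the Direct Connect operation that starts a failover test.

```python
import re

# Assumption: alarm names embed the virtual interface ID,
# for example "dx-packet-loss-dxvif-abc123".
VIF_ID_PATTERN = re.compile(r"dxvif-[a-z0-9]+")
TEST_DURATION_MINUTES = 10  # duration used in this case study


def lambda_handler(event, context, dx_client=None):
    """Start a BGP failover test on the virtual interface named in the alarm."""
    detail = event.get("detail", {})
    # Act only when the alarm has entered the ALARM state.
    if detail.get("state", {}).get("value") != "ALARM":
        return {"action": "none"}

    match = VIF_ID_PATTERN.search(detail.get("alarmName", ""))
    if match is None:
        return {"action": "none"}
    vif_id = match.group(0)

    if dx_client is None:
        import boto3  # available by default in the Lambda Python runtime
        dx_client = boto3.client("directconnect")

    # Brings down the BGP peering sessions on this virtual interface so
    # traffic shifts to the remaining Direct Connect location.
    dx_client.start_bgp_failover_test(
        virtualInterfaceId=vif_id,
        testDurationInMinutes=TEST_DURATION_MINUTES,
    )
    return {"action": "failover_test_started", "virtualInterfaceId": vif_id}
```

When the test duration elapses, AWS restores the BGP sessions; calling `stop_bgp_failover_test` with the same virtual interface ID would end the test earlier.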

The Benefits

  1. Lambda executes the code on infrastructure that is highly available, fault-tolerant, and distributed across several Availability Zones (AZs) in a single Region. It also handles all infrastructure administration, maintenance, and patching.
  2. AWS Lambda automatically scales your application by running code in response to each trigger. The code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload.
  3. AWS Lambda has built-in availability and fault tolerance.