Saturday, September 19, 2020

AWS Disaster Management

 RPO: Recovery Point Objective.


RTO: Recovery Time Objective.


Disaster Recovery Strategies:

  1. Backup & Restore.

  2. Pilot Light

  3. Warm Standby

  4. Hot Site/Multi Site approach

  1. Backup & Restore:

    1. Very easy 

    2. Only cost of storing the data

    3. Can take long to Restore.

    4. High RPO & High RTO.

  2. Pilot Light

    1. A small version of the app is always running in the cloud

    2. Useful for the critical core

    3. Very similar to backup & restore

    4. Faster than backup & Restore as critical systems are already up.

  3. Warm Standby:

    1. Full System is up but at minimum size

    2. Upon disaster we can scale up to prod load


  4. Hot Site / Multi Site approach:

    1. Very low RTO (minutes or seconds) → very expensive

    2. Full production scale is running on cloud or on prem

    3. Active Active Status


All AWS Multi Region:

Disaster Recovery Tips:

Backups:


Saturday, September 12, 2020

Amazon VPC & Networking [Part 1]

 Amazon VPC & Networking:

  • CIDR (Classless Inter Domain Routing) IPv4:

    • CIDR is used for AWS networking.

    • Security group rules (allow 0.0.0.0/0).

    • CIDR Example: 0.0.0.0/0 & 192.0.10.0/8.

    • They help defining the IP address Range:

      • 192.0.0.1/32 → This means only one IP. 

      • 0.0.0.0/0 → This means all the IP’s.

      • 192.0.0.0/26 →192.0.0.0 to 192.0.0.63 (total 64 Ip’s). 

    • How CIDR is calculated:

    • There are two components in a CIDR: the base IP & the subnet mask.

      • 0.0.0.0/0 ⇒ 0.0.0.0 is the base IP & /0 is the subnet mask.

      • The base IP represents an IP contained in the range.

      • The subnet mask (/n) fixes the first n bits of the IP; the remaining 32-n bits can change.

    • The subnet mask therefore decides how many addresses follow the base IP: 2^(32-n).

      • 0.0.0.0/32 = 2^(32-32) = 2^0 = 1 IP

      • 0.0.0.0/31 = 2^(32-31) = 2^1 = 2 IPs

      • 0.0.0.0/28 = 2^(32-28) = 2^4 = 16 IPs

      • 0.0.0.0/24 = 2^(32-24) = 2^8 = 256 IPs: Range [0.0.0.0 to 0.0.0.255], the last octet can change

      • 0.0.0.0/16 = 2^(32-16) = 2^16 = 65536 IPs: Range [0.0.0.0 to 0.0.255.255], the last two octets can change

      • 0.0.0.0/8 = 2^(32-8) = 2^24 = 16777216 IPs: Range [0.0.0.0 to 0.255.255.255], the last three octets can change

      • 0.0.0.0/0 = 2^(32-0) = 2^32 = 4294967296 IPs: Range [0.0.0.0 to 255.255.255.255], all four octets can change

    • Private IP’s vs Public IP’s (IPv4)

      • The IANA (Internet Assigned Numbers Authority) established certain blocks of IPv4 addresses for the use of private and public addresses.

      • Meaning certain private IP’s are reserved and rest are made public 

      • Allowed private IP ranges:

        • 10.0.0.0 - 10.255.255.255 (10.0.0.0/8) 

        • 172.16.0.0 - 172.31.255.255 (172.16.0.0/12) => default AWS VPC

        • 192.168.0.0 - 192.168.255.255 (192.168.0.0/16) => home network

    • All the remaining IPs are public.
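The CIDR arithmetic and the private ranges above can be checked with Python's standard `ipaddress` module. This is only a small sketch; the helper names `cidr_summary` and `is_private` are made up for this note and are not part of any AWS tooling.

```python
import ipaddress

# The three private IPv4 blocks listed above.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def cidr_summary(cidr):
    """Return (address count, first address, last address) for a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses, str(network[0]), str(network[-1])


def is_private(ip):
    """True if the address falls inside one of the three private blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in block for block in PRIVATE_BLOCKS)


for cidr in ["192.0.0.1/32", "192.0.0.0/26", "0.0.0.0/24", "0.0.0.0/16", "0.0.0.0/0"]:
    count, first, last = cidr_summary(cidr)
    print(f"{cidr}: {count} addresses, {first} to {last}")

print(is_private("192.168.1.10"))  # inside 192.168.0.0/16
print(is_private("8.8.8.8"))       # outside all three blocks
```

The address count printed for each block is the same 2^(32-n) value worked out by hand in the list above.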


Default VPC Virtual Private Cloud:

  • All New accounts have default VPCs

  • The new instances are created in default VPC if no subnet is provided.

  • The default VPC has internet connectivity and all instances have public IP.

  • We also get a public & a private DNS.

  • The CIDR of the default VPC is 172.31.0.0/16

  • The VPC is associated with the default subnets.

    • Eg: If Region A has 3 AZs then there are 3 default subnets, one in each AZ.

    • Each subnet has non overlapping CIDRs

  • Default VPC comes with following four set of services:

    • Subnet

    • Route Table

    • Internet Gateway

    • Network ACL

  • You can have multiple VPC in a region (Max 5 VPC → Soft limit).

  • Raise a support ticket to increase the limit

  • Max CIDR per VPC = 5 

    • Min size /28 = 16 IP’s 

    • Max Size /16 = 65536 IP’s

  • Since VPC is private only private IP range is allowed (172.16.0.0, 192.168.0.0, 10.0.0.0).

  • VPC CIDR should not overlap with your other network

    • Ex: if the corporate network is in 10.0.0.0/8 then the VPC CIDR should not be in that range.

  • IMPORTANT: The only range we can choose in VPC is between /16 to /28
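A rough sketch of the sizing and overlap rules above, using Python's `ipaddress` module. The VPC CIDR, the AZ names and the `carve_subnets` helper are all hypothetical values picked for illustration; nothing here calls AWS.

```python
import ipaddress

# Hypothetical VPC CIDR and AZ names, chosen only for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")
availability_zones = ["az-a", "az-b", "az-c"]


def carve_subnets(vpc_network, zones, new_prefix=24):
    """Assign one non-overlapping block of the VPC CIDR to each zone."""
    if not 16 <= vpc_network.prefixlen <= 28:
        raise ValueError("a VPC CIDR must be between /16 and /28")
    blocks = vpc_network.subnets(new_prefix=new_prefix)  # generator of child blocks
    return {zone: next(blocks) for zone in zones}


subnets = carve_subnets(vpc, availability_zones)
for zone, block in subnets.items():
    print(zone, block)

# A corporate network in 10.0.0.0/8 would clash with this VPC CIDR.
corporate = ipaddress.ip_network("10.0.0.0/8")
print("overlaps corporate network:", vpc.overlaps(corporate))
```

If the overlap check prints True, as it does here, a different private range such as 172.16.0.0/12 or 192.168.0.0/16 would be the safer choice for the VPC.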


Subnet

  • AWS reserves 5 IP address every time you create a subnet

  • First 4 & last 1 are reserved every time.

  • Ex: 10.0.0.0/24 

    • 10.0.0.0 is reserved for Network Address

    • 10.0.0.1 is reserved by AWS for VPC router

    • 10.0.0.2 is reserved by AWS for mapping to Amazon-provided DNS

    • 10.0.0.3 is reserved by AWS for future use.

    • 10.0.0.255 Network broadcast address. AWS doesn’t support broadcast in VPC, therefore it's reserved.

  • Exam Tip: 

    • If you need 29 IP addresses for your EC2 instances, you can’t choose a subnet of size /27 (32 IPs) [32-5=27 < 29]

    • You will need to choose /26 = 64 IPs [64-5=59 > 29]
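The exam-tip arithmetic can be written as a tiny helper. This is a sketch with made-up function names (`usable_ips`, `smallest_prefix_for`); it only encodes the "5 reserved addresses per subnet" rule and the /16 to /28 size limits from these notes.

```python
AWS_RESERVED_PER_SUBNET = 5  # first 4 addresses and the last 1


def usable_ips(prefix_length):
    """Addresses left for instances in a subnet of the given prefix length."""
    return 2 ** (32 - prefix_length) - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(required_ips):
    """Smallest subnet (largest prefix length) that still fits the requirement."""
    # Walk from the smallest allowed subnet (/28) towards the largest (/16).
    for prefix_length in range(28, 15, -1):
        if usable_ips(prefix_length) >= required_ips:
            return prefix_length
    raise ValueError("requirement does not fit in a /16 subnet")


print(usable_ips(27))           # 32 - 5
print(usable_ips(26))           # 64 - 5
print(smallest_prefix_for(29))  # prefix length needed for 29 instances
```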


Internet Gateway:

  • Internet Gateway helps our VPC instances to get connected to the internet

  • It scales horizontally and is HA and redundant 

  • Must be created separately from VPC 

  • The Internet Gateway also acts as NAT for the instances that have a public IPv4 address.

  • An Internet Gateway does not provide internet access on its own; the route table must also be edited.

  • One Internet Gateway per VPC 


Route Table:

  • The route table controls where the network traffic from the subnet is directed.

  • 0.0.0.0/0  with an internet gateway helps to connect to internet.


NAT Instance (Network Address Translation):

(Outdated but in exam topic):

  • Allow instances in private subnet to connect to the internet

  • The NAT Instance must be launched in a public subnet.

  • Must disable the EC2 Flag: Source/Destination check.

  • Must have Elastic IP attached to it.

  • Route table must be configured to route traffic from private to NAT instance.

Note: Search for NAT OS while selecting OS.



NAT Gateway:

  • AWS Managed NAT, higher bandwidth, better quality and no admin.

  • Pay by the hour for usage & bandwidth.

  • NAT is created in a specific AZ, uses an EIP.

  • Cannot be used by the instance in the same subnet (Only from other subnets).

  • Requires an Internet Gateway (Private Subnet ⇒ NAT ⇒IGW).

  • 5 Gbps of bandwidth, which automatically scales up to 45 Gbps.

  • No Security Groups needs to be managed/required.


NAT Gateway HA:


NAT gateway vs NAT instance, by attribute:

  • Availability:

    • NAT gateway: Highly available. NAT gateways in each Availability Zone are implemented with redundancy. Create a NAT gateway in each Availability Zone to ensure a zone-independent architecture.

    • NAT instance: Use a script to manage failover between instances.

  • Bandwidth:

    • NAT gateway: Scales up to 45 Gbps.

    • NAT instance: Depends on the bandwidth of the instance type.

  • Performance:

    • NAT gateway: Software is optimized to handle the NAT traffic.

    • NAT instance: A generic Amazon Linux AMI that's configured to perform NAT.

  • Cost:

    • NAT gateway: Charged depending on the number of NAT gateways you use, duration of usage, and amount of data that you send through the NAT gateways.

    • NAT instance: Charged depending on the number of NAT instances that you use, duration of usage, and instance type and size.

//TODO 
















DNS Resolution in VPC:

  • enableDnsSupport (=DNS Resolution Setting)

  • Default is true 

  • Helps decide if DNS resolution is supported for the VPC.

  • If enableDnsSupport=true then instances can query the AWS DNS server at the 169.254.169.253 IP.

  • enableDnsHostnames (=DNS Hostname Setting).

  • By default it's false when you create a new VPC (for the default VPC it is set to true).

  • Won’t do anything if enableDnsSupport=false; it requires enableDnsSupport=true.

  • If it is set to true then a public hostname is assigned to an EC2 instance if it has a public IP.

  • If you use custom DNS hostnames in a private hosted zone in Route 53 then both these flags need to be set to true.


Network ACL & Security Group:

  • The Network ACL is at subnet level, so the allowing/denying is done at subnet level.

  • Unlike the Security group it has deny rule as well.

  • Security Group is Stateful & NACL is Stateless.

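  • If an inbound rule in a Security Group allows a request, the response traffic is automatically allowed out regardless of the outbound rules (and vice versa). With a NACL both directions must be allowed explicitly.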

  • NACL are like firewalls which control the traffic from & to subnet.

  • The Default NACL allows everything outbound and everything inbound. (Does not restrict anything).

  • One NACL per subnet and new subnets are assigned to default NACL by default.


Defining NACL:

  • Rules have a number from 1-32766 and higher precedence with lower numbers.

  • Eg: #100 allow <IP> & #200 deny <IP>, the IP will be allowed since precedence is given to a lower number.

  • Last rule is an asterisk (*) and denies the request if no match is found.

  • AWS recommends adding rules by increment of 100.

  • Newly created NACL will deny everything.

  • NACL is a great way of blocking any specific IP at the subnet level.

  • https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Security.html
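The rule-number precedence described above can be sketched as a small evaluator. This is a simplified model written for these notes, not AWS code: it matches on the source IP only and ignores protocol and port, and the `Rule` and `evaluate` names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    number: int   # 1-32766, lower number is evaluated first
    cidr: str     # source range the rule applies to
    action: str   # "allow" or "deny"


def evaluate(rules, source_ip):
    """Return the action of the lowest-numbered matching rule."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if address in ipaddress.ip_network(rule.cidr):
            return rule.action  # first match wins, later rules are ignored
    return "deny"  # the final * rule: nothing matched


# The #100 allow / #200 deny example from the notes (documentation IP used as a placeholder).
conflicting = [Rule(200, "203.0.113.7/32", "deny"), Rule(100, "203.0.113.7/32", "allow")]
print(evaluate(conflicting, "203.0.113.7"))

# Blocking one specific IP at the subnet level while allowing everyone else.
block_one = [Rule(90, "198.51.100.9/32", "deny"), Rule(100, "0.0.0.0/0", "allow")]
print(evaluate(block_one, "198.51.100.9"))
print(evaluate(block_one, "198.51.100.10"))

# A newly created NACL has no rules besides *, so everything is denied.
print(evaluate([], "198.51.100.10"))
```

Numbering the rules in steps of 100 leaves room to slot a rule such as #90 in front of an existing one later.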



Friday, September 11, 2020

Key Services Focus for AWS Solution Architect Exam (Associate level)

AWS Solution Architect - Associate : 

IAM ------------------------------------------------------------> 4 to 7 questions
S3    ------------------------------------------------------------> 4 to 6 questions
Non S3 (CloudFront, Snowball, Storage Gateway) -----> 3 to 4 questions
Route 53  ------------------------------------------------------> 2 to 3 questions
DB Service  ---------------------------------------------------> 3 to 4 questions
VPC ------------------------------------------------------------> 8 to 10 questions
Managed Services (SQS, SWF, SNS) ---------------------> 4 to 7 questions
Directory Services and Federation -------------------------> 2 to 3 questions
All other Services --------------------------------------------> 2 to 3 questions
White Paper ---------------------------------------------------> 6 to 8 questions

Exam Preparation: (3-6 weeks):

1. First phase: [30 hours]
Focus on 40% Videos & 60% hands on

2. Second Phase: [20 hours ]
Focus on 30% Videos, 30% hands on & 40% going through FAQ's 

3. Third Phase: [20 hours]
Focus on 25% Videos, 25% hands on & 50% going through FAQ's & whitepapers

4. Fourth Phase: [4-5 hours]
Mock Exams 

5. Fifth Phase: [130 mins]
Final Exam.

Tuesday, September 1, 2020

AWS - Different Messaging Services [SQS/SNS/Kinesis]

 Messaging Service:

  • Direct communication (Synchronous)

  • Communication through queue in between service (Asynchronous)



  • Synchronous can be problematic in case of sudden spike in traffic.

  • Ex: a video uploading service accepts 10 videos at a time; if 1000 videos are suddenly uploaded at once, the service will crash.

  • In this case decoupling can solve the problem:

    • Using SQS → queue model

    • Using SNS → pub/sub model

    • Kinesis → real time streaming model

  • These services can scale independently from our application.


SQS: Simple Queue Service


  • Here producer will send/push the messages to SQS Queue and Consumer will poll the messages from the SQS Queue.

SQS Standard Queue:

  • One of the oldest Services in AWS (10 years old).

  • Fully managed service, to decouple the application.

  • Can have duplicate messages (at least once delivery, occasionally).

  • Can have out of order messages (best effort ordering).

Attributes:

  • Unlimited throughput, unlimited number of messages in queue.

  • Default retention of messages is 4 days, and it can be raised up to 14 days.

  • Low latency (<10ms on publish and receive).

  • Limitation of 256 KB per message sent.


SQS Producer Messages:

  • The producer sends messages to SQS using the SDK (SendMessage API).

  • The message is persisted in the queue until the consumer reads it & deletes it.

  • Default retention of messages is 4 days, and it can be raised up to 14 days.

  • Example: e-commerce website (send an order to be processed)


SQS Consumers Messages:

  • Consumer (running ec2 instance or lambda function or even on prem instance).

  • Consumer polls for messages to SQS Queue.

  • Consumers can receive up to 10 messages at a time.

  • Process the message (ex: insert the message into RDS).

  • Delete the messages from the queue using DeleteMessage API


SQS Message Visibility Timeout:

  • When one consumer consumes this message it will be unavailable for another consumer till message visibility timeout. 

  • The first consumer can also extend the visibility timeout by calling ChangeMessageVisibility API.

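  • If the visibility timeout is high (hours) and a consumer crashes, reprocessing will take a long time because the message stays hidden until the timeout expires.

  • If the visibility timeout is low (seconds), the chances of duplicate processing are high because the message can reappear before the first consumer finishes.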


SQS Dead letter Queue:

  • If the message fails to process within message visibility timeout then the message is sent back to the queue. 

  • We can set the threshold on how many times this message can be sent back to the queue.

  • After the maximum receives threshold (maxReceiveCount) is exceeded the message goes into the dead letter queue.

  • The Dead letter queue is useful for debugging purpose

  • Make sure to process the dead letter queue before it expires.

  • We can set the retention period to this dead letter queue.
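To make the interplay of visibility timeout and dead letter queue concrete, here is a toy in-memory model. It is not the real SQS service or SDK: `ToyQueue`, its parameters and the logical `now` clock are all invented for this sketch, and the receive threshold is simply passed in as a number.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # logical time at which the message is visible again


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout and a DLQ."""

    def __init__(self, visibility_timeout, max_receive_count, dead_letter_queue=None):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.dead_letter_queue = dead_letter_queue
        self.messages = []

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        """Return the first visible message and hide it for the visibility timeout."""
        for message in list(self.messages):
            if message.visible_at > now:
                continue  # still in flight with another consumer
            if message.receive_count >= self.max_receive_count:
                # Threshold exceeded: move it to the DLQ instead of redelivering.
                self.messages.remove(message)
                if self.dead_letter_queue is not None:
                    self.dead_letter_queue.append(message.body)
                continue
            message.receive_count += 1
            message.visible_at = now + self.visibility_timeout
            return message
        return None

    def delete(self, message):
        """What a consumer does after processing a message successfully."""
        self.messages.remove(message)


dead_letters = []
queue = ToyQueue(visibility_timeout=30, max_receive_count=2, dead_letter_queue=dead_letters)
queue.send("order-42")

first = queue.receive(now=0)    # delivered, hidden until t=30
hidden = queue.receive(now=10)  # nothing visible: still inside the timeout
second = queue.receive(now=31)  # never deleted, so it is delivered again
third = queue.receive(now=62)   # threshold exceeded, moved to dead_letters
print(first.body, hidden, second.body, third, dead_letters)
```

In this run the consumer never calls `delete`, which stands in for a consumer that keeps crashing; a healthy consumer would delete the message after the first receive and the dead letter list would stay empty.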

SQS Delay Queue:

  • Delay a message up to 15mins (Consumers don't see it immediately).

  • Default is set to 0 (Delivery Delay)

  • Can set default to queue level

  • Can override the default value on send using DelaySeconds parameter.


SQS FIFO 

  • First In First Out (Ordering of message in a queue).

  • Ordering guarantee. 

  • Exactly-once processing (by removing duplicates).

  • Messages are processed in order by a consumer.

  • Limited throughput: 300 msg/s without batching & 3000 msg/s with batching


SQS with Auto Scaling Group:


SNS (Simple Notification Service):

  • Pub/Sub Service: meaning publish and subscriber service.

  • The event producer only sends message to one SNS topic.

  • As many event receivers(subscribers) we want to listen to SNS topic notifications.

  • Each subscriber to the topic will receive all the messages. (new feature to filter out the messages but by default every one will receive it)

  • Up to 10 million subscribers per topic (high scale)

  • Up to 100,000 topics.

  • Subscribers can be (Protocol):

    • SQS

    • HTTP/HTTPS

    • Lambda functions

    • Email notifications

    • SMS messages

    • Mobile notifications

  • SNS Integrates with lot of AWS services:

    • SQS

    • Cloud Watch for alarms

    • Amazon s3 (on bucket events)

    • Auto Scaling group event.

    • CloudFormation (upon stack changes) etc…

  • How to publish SNS message:

    • Topic Publish:(using SDK).

      • Topic Creation.

      • Create a Subscription 

      • Publish to the topic

    • Direct Publish: (For mobile apps SDK)

      • Create a platform application

      • Create a platform endpoint

      • Publish to the endpoint

      • Works with Google GCM, Apple APNS, Amazon ADM etc.

  • SNS Security:

    • In-flight encryption

    • At rest encryption using KMS

    • Client side encryption

    • Access control using IAM policies to regulate access to the SNS API.

    • You can define SNS policies similar to S3 bucket policies

      • Useful for cross account access to SNS topic

      • Useful for other services to write to SNS topics (S3).


SQS + SNS Fan Out:

  • Push once into SNS and receive all in SQS queues that are SNS subscribers 

  • Fully decoupled no data loss

  • SQS Allows for: data persistence, delayed processing and retries of work.

  • For this to work, the SQS queue needs an access policy that allows SNS to write to it.
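The fan-out pattern can be pictured with a toy in-memory model: one publish, one copy per subscribed queue. This is only an illustration of the idea; `ToyTopic` and the queue names are hypothetical, and no AWS API is involved.

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for a topic that fans messages out to queues."""

    def __init__(self):
        self.subscribed_queues = []

    def subscribe(self, queue):
        self.subscribed_queues.append(queue)

    def publish(self, message):
        # Every subscribed queue receives its own copy of the message.
        for queue in self.subscribed_queues:
            queue.append(message)


topic = ToyTopic()
fraud_queue = deque()     # hypothetical consumer: fraud checks
shipping_queue = deque()  # hypothetical consumer: shipping

topic.subscribe(fraud_queue)
topic.subscribe(shipping_queue)
topic.publish("order-42 created")  # the producer pushes once

# Each consumer drains its own queue at its own pace.
print(fraud_queue.popleft())
print(len(shipping_queue))  # the shipping copy is still waiting
```

Because each queue holds its own copy, a slow or failed consumer does not affect the others, which is the persistence and retry benefit listed above.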

* Important Note: SNS cannot send messages to SQS FIFO Queues (AWS Limitation).

If this comes up in the exam, simply rule that option out, since it’s not possible as of today.


AWS Kinesis:

  • Kinesis is managed alternative to Apache Kafka

  • Real time Big data collection tool.

  • Great for streaming processing frameworks like Spark, NiFi etc.

  • Data is automatically replicated to 3 AZ

  • Three sub products:

    • Kinesis Streams: low latency streaming ingest at scale.

    • Kinesis Analytics: This is to perform real time analytics on streams using SQL.

    • Kinesis Firehose: Load streams into s3, RedShift, ElasticSearch etc.


  • At a high level, Kinesis sits in the middle: Streams ingests the data, Analytics runs real time analytics over it, and Firehose stores it somewhere for the long term.

  • #TODO need to add remaining features of Kinesis





Terraform Cheat Sheet [WIP]

Installing Terraform