Techno Share: September 2020

Saturday, September 19, 2020

AWS Disaster Management

RPO: Recovery Point Objective.

RTO: Recovery Time Objective.

Disaster Recovery Strategies:

Backup & Restore.
Pilot Light
Warm Standby
Hot Site/Multi Site approach

Backup & Restore:

Very easy
Only cost of storing the data
Can take long to Restore.
High RPO & High RTO.

Pilot Light

A small version of the app is always running in the cloud
Useful for the critical core
Very similar to backup & restore
Faster than backup & Restore as critical systems are already up.

Warm Standby:

Full System is up but at minimum size
Upon disaster we can scale up to prod load

Hot Site / Multi Site approach

Very low RTO (minutes or seconds) → very expensive
Full production scale is running on cloud or on prem
Active Active Status

All AWS Multi Region:

Disaster Recovery Tips:

Backups:

Saturday, September 12, 2020

Amazon VPC & Networking [Part 1]

Amazon VPC & Networking:

CIDR (Classless Inter Domain Routing) IPv4:

CIDR is used for AWS networking.
Security group rules (allow 0.0.0.0/0).
CIDR examples: 0.0.0.0/0 and 192.0.10.0/24.
A CIDR block defines an IP address range:

192.0.0.1/32 → This means only one IP.
0.0.0.0/0 → This means all IPs.
192.0.0.0/26 → 192.0.0.0 to 192.0.0.63 (64 IPs in total).

How CIDR is calculated:
A CIDR has two components: the base IP and the subnet mask.

0.0.0.0/0 ⇒ 0.0.0.0 is the base IP and /0 is the subnet mask.
The base IP represents an IP contained in the range.
The subnet mask (/n) fixes the first n bits of the IP, so the remaining 32-n bits can change.

The bits that are free to change determine how many further addresses follow the base IP.

0.0.0.0/32 = 2^(32-32) = 2^0 = 1 IP
0.0.0.0/31 = 2^(32-31) = 2^1 = 2 IPs
0.0.0.0/28 = 2^(32-28) = 2^4 = 16 IPs
0.0.0.0/24 = 2^(32-24) = 2^8 = 256 IPs: Range [0.0.0.0 to 0.0.0.255], the last octet can change
0.0.0.0/16 = 2^(32-16) = 2^16 = 65536 IPs: Range [0.0.0.0 to 0.0.255.255], the last two octets can change
0.0.0.0/8 = 2^(32-8) = 2^24 = 16777216 IPs: Range [0.0.0.0 to 0.255.255.255], the last three octets can change
0.0.0.0/0 = 2^(32-0) = 2^32 = 4294967296 IPs: Range [0.0.0.0 to 255.255.255.255], all four octets can change
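The arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `describe` is only illustrative and is not part of the original notes.

```python
import ipaddress
from typing import Tuple


def describe(cidr: str) -> Tuple[int, str, str]:
    """Return (address count, first address, last address) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses, str(net[0]), str(net[-1])


for cidr in ["192.0.0.1/32", "192.0.0.0/26", "0.0.0.0/24", "0.0.0.0/16"]:
    count, first, last = describe(cidr)
    # For IPv4 the count equals 2 ** (32 - prefix length).
    print(f"{cidr}: {count} addresses, {first} to {last}")
```

Note that `ip_network` rejects a base IP with host bits set (for example 192.0.0.5/26) unless `strict=False` is passed.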

Private IP’s vs Public IP’s (IPv4)

The IANA (Internet Assigned Numbers Authority) established certain blocks of IPv4 addresses for the use of private and public addresses.
This means certain IP ranges are reserved as private and the rest are public.
Allowed private IP ranges:

10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12) => default AWS VPC
192.168.0.0 - 192.168.255.255 (192.168.0.0/16) => home network

All other IPs are public.
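A small sketch of how an address can be tested against the three private blocks listed above, using the standard `ipaddress` module. The function name `is_private_ipv4` is an illustrative assumption; the built-in `is_private` attribute is deliberately not used because it also covers other reserved ranges such as loopback.

```python
import ipaddress

# The three private IPv4 blocks from the list above.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(address: str) -> bool:
    """True if the address falls inside one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


print(is_private_ipv4("172.31.5.9"))  # inside 172.16.0.0/12
print(is_private_ipv4("172.32.0.1"))  # just outside 172.16.0.0/12
```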

Default VPC Virtual Private Cloud:

All new accounts have a default VPC in each region.
New instances are launched in the default VPC if no subnet is specified.
The default VPC has internet connectivity, and all instances in it get a public IP.
We also get a public and a private DNS name.
The CIDR of the default VPC is 172.31.0.0/16.
The default VPC comes with default subnets.

Eg: If Region A has 3 AZs, the default VPC has 3 default subnets, one in each AZ.
Each subnet has non overlapping CIDRs

The default VPC comes with the following four components:

Subnet
Route Table
Internet Gateway
Network ACL

You can have multiple VPCs in a region (max 5 VPCs → soft limit).
Raise a support ticket to increase the limit.
Max CIDRs per VPC = 5

Min size /28 = 16 IPs
Max size /16 = 65536 IPs

Since a VPC is private, only the private IP ranges should be used (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
The VPC CIDR should not overlap with your other networks.

Ex: If the corporate network is in 10.0.0.0/8, then the VPC CIDR should not be in that range.

IMPORTANT: The only range we can choose in VPC is between /16 to /28
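The two constraints above (prefix between /16 and /28, no overlap with other networks) can be sketched with the standard `ipaddress` module. The function name `check_vpc_cidr` and the sample networks are illustrative assumptions, not an AWS API.

```python
import ipaddress
from typing import List


def check_vpc_cidr(vpc_cidr: str, other_cidrs: List[str]) -> List[str]:
    """Return the problems found with a proposed VPC CIDR (empty list if none)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    problems = []
    # A VPC CIDR must be between /16 (largest) and /28 (smallest).
    if not 16 <= vpc.prefixlen <= 28:
        problems.append(f"prefix /{vpc.prefixlen} is outside /16 to /28")
    # The VPC should not overlap with networks it will be connected to.
    for other in other_cidrs:
        if vpc.overlaps(ipaddress.ip_network(other)):
            problems.append(f"overlaps {other}")
    return problems


# Corporate network in 10.0.0.0/8: a 10.x VPC overlaps, a 192.168.x VPC does not.
print(check_vpc_cidr("10.1.0.0/16", ["10.0.0.0/8"]))
print(check_vpc_cidr("192.168.0.0/16", ["10.0.0.0/8"]))
```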

Subnet

AWS reserves 5 IP addresses every time you create a subnet.
The first 4 and the last 1 are reserved every time.
Ex: 10.0.0.0/24

10.0.0.0 is reserved for Network Address
10.0.0.1 is reserved by AWS for VPC router
10.0.0.2 is reserved by AWS for mapping to Amazon-provided DNS
10.0.0.3 is reserved by AWS for future use.
10.0.0.255 Network broadcast address. AWS doesn’t support broadcast in VPC, therefore it's reserved.

Exam Tip:

If you need 29 IP addresses for your EC2 instances, you can’t choose a subnet of size /27 (32 IPs) [32-5=27 < 29].
You will need to choose /26 = 64 IPs [64-5=59 > 29].
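The exam-tip arithmetic can be written as a short calculation. This is a sketch under the rules stated above (5 reserved addresses per subnet, smallest subnet /28); the function names are illustrative.

```python
AWS_RESERVED = 5        # first four addresses plus the last one in every subnet
SMALLEST_SUBNET = 28    # AWS does not allow subnets smaller than /28


def usable_ips(prefix: int) -> int:
    """Usable addresses in an AWS subnet with the given prefix length."""
    return 2 ** (32 - prefix) - AWS_RESERVED


def smallest_prefix_for(hosts_needed: int) -> int:
    """Longest prefix (smallest subnet) that still leaves enough usable IPs."""
    total_needed = hosts_needed + AWS_RESERVED
    # Number of host bits required to hold total_needed addresses.
    host_bits = (total_needed - 1).bit_length()
    return min(32 - host_bits, SMALLEST_SUBNET)


print(usable_ips(27), usable_ips(26))  # usable addresses in /27 and /26
print(smallest_prefix_for(29))         # prefix needed for 29 instances
```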

Internet Gateway:

An Internet Gateway lets the instances in our VPC connect to the internet.
It scales horizontally and is highly available (HA) and redundant.
Must be created separately from the VPC.
The Internet Gateway also acts as a NAT for instances that have a public IPv4 address.
An Internet Gateway does not provide internet access on its own; the route table must also be updated.
One Internet Gateway per VPC.

Route Table:

The route table controls where traffic leaving a subnet is directed.
A 0.0.0.0/0 route that targets an Internet Gateway gives the subnet a path to the internet.

NAT Instance (Network Address Translation):

(Outdated but in exam topic):

Allows instances in a private subnet to connect to the internet.
The NAT instance must be launched in a public subnet.
Must disable the EC2 flag: Source/Destination check.
Must have an Elastic IP attached to it.
The route table must be configured to route traffic from the private subnet to the NAT instance.

Note: Search for a NAT image when selecting the AMI for the instance.

NAT Gateway:

AWS-managed NAT: higher bandwidth, better quality and no administration.
Pay by the hour for usage and bandwidth.
A NAT Gateway is created in a specific AZ and uses an EIP.
Cannot be used by instances in the same subnet (only from other subnets).
Requires an Internet Gateway (Private Subnet ⇒ NAT ⇒ IGW).
5 Gbps of bandwidth, which automatically scales up to 45 Gbps.
No Security Groups need to be managed.

NAT Gateway HA:

A NAT Gateway is resilient, but only within a single AZ.
Must create multiple NAT Gateways in multiple AZs for fault tolerance.
No cross-AZ failover is needed, because if an AZ goes down, the instances in that AZ no longer need NAT.
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-comparison.html

Attribute	NAT gateway	NAT instance
Availability	Highly available. NAT gateways in each Availability Zone are implemented with redundancy. Create a NAT gateway in each Availability Zone to ensure zone independent architecture	Use a script to manage failover between instances
Bandwidth	Scales up to 45 Gbps	Depends on instance bandwidth
Performance	Software is optimized to handle the NAT traffic	A generic Amazon Linux AMI that's configured to perform NAT.
Cost	Charged depending on the number of NAT gateways you use, duration of usage, and amount of data that you send through the NAT gateways.	Charged depending on the number of NAT instances that you use, duration of usage, and instance type and size.
//TODO

DNS Resolution in VPC:

enableDnsSupport (= DNS Resolution setting)
Default is true.
Decides whether DNS resolution is supported for the VPC.
If enableDnsSupport=true, then DNS queries go to the AWS DNS server at 169.254.169.253.
enableDnsHostnames (= DNS Hostnames setting)
By default it's false when you create a new VPC (for the default VPC it is set to true).
Has no effect if enableDnsSupport=false; it requires enableDnsSupport=true.
If it is set to true, a public hostname is assigned to an EC2 instance that has a public IP.
If you use custom DNS hostnames in a private hosted zone in Route 53, then both flags need to be set to true.

Network ACL & Security Group:

The Network ACL (NACL) works at the subnet level, so traffic is allowed or denied per subnet.
Unlike a Security Group, it has deny rules as well.
A Security Group is stateful and a NACL is stateless.
Stateful means that if a request is allowed in by a Security Group inbound rule, its response is automatically allowed out, regardless of the outbound rules.
NACLs are like firewalls that control the traffic to and from a subnet.
The default NACL allows everything outbound and everything inbound (it does not restrict anything).
One NACL per subnet, and new subnets are assigned to the default NACL.

Defining NACL:

Rules have a number from 1 to 32766; lower numbers have higher precedence.
Eg: #100 allow <IP> & #200 deny <IP>, the IP will be allowed since precedence is given to the lower number.
The last rule is an asterisk (*) and denies the request if no match is found.
AWS recommends adding rules in increments of 100.
A newly created NACL denies everything.
NACL is a great way of blocking any specific IP at the subnet level.
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Security.html
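The rule precedence described above can be modelled in a few lines. This is a toy in-memory sketch, not an AWS API: the rule tuples, the function name `evaluate_nacl` and the sample IPs are assumptions made for illustration, and real NACL rules also match on protocol and port.

```python
import ipaddress


def evaluate_nacl(rules, source_ip):
    """rules: list of (rule_number, cidr, action). Returns "allow" or "deny"."""
    ip = ipaddress.ip_address(source_ip)
    # Rules are checked in ascending rule number; the first match wins.
    for _number, cidr, action in sorted(rules, key=lambda rule: rule[0]):
        if ip in ipaddress.ip_network(cidr):
            return action
    # The final "*" rule denies anything that matched no numbered rule.
    return "deny"


# Block one specific IP ahead of a general allow rule.
rules = [
    (100, "0.0.0.0/0", "allow"),
    (90, "203.0.113.7/32", "deny"),
]
print(evaluate_nacl(rules, "203.0.113.7"))   # matched by rule 90 first
print(evaluate_nacl(rules, "198.51.100.1"))  # falls through to rule 100
```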

Friday, September 11, 2020

Key Services Focus for AWS Solution Architect Exam (Associate level)

AWS Solution Architect - Associate :

IAM ------------------------------------------------------------> 4 to 7 questions

S3 ------------------------------------------------------------> 4 to 6 questions

Non S3 (CloudFront, Snowball, Storage Gateway) -----> 3 to 4 questions

Route 53 ------------------------------------------------------> 2 to 3 questions

DB Service ---------------------------------------------------> 3 to 4 questions

VPC ------------------------------------------------------------> 8 to 10 questions

Managed Services (SQS, SWF, SNS) ---------------------> 4 to 7 questions

Directory Services and Federation -------------------------> 2 to 3 questions

All other Services --------------------------------------------> 2 to 3 questions

White Paper ---------------------------------------------------> 6 to 8 questions

Exam Preparation: (3-6 weeks):

1. First phase: [30 hours]

Focus on 40% Videos & 60% hands on

2. Second Phase: [20 hours ]

Focus on 30% Videos, 30% hands on & 40% going through FAQ's

3. Third Phase: [20 hours]

Focus on 25% Videos, 25% hands on & 50% going through FAQ's & whitepapers

4. Fourth Phase: [4-5 hours]

Mock Exams

5. Fifth Phase: [130 mins]

Final Exam.

Tuesday, September 1, 2020

AWS - Different Messaging Services [SQS/SNS/Kinesis]

Messaging Service:

Direct communication (Synchronous)
Communication through queue in between service (Asynchronous)

Synchronous communication can be problematic when there is a sudden spike in traffic.
Ex: a video uploading service accepts 10 videos at a time, and all of a sudden 1000 videos are uploaded, which crashes the service.
In this case, decoupling can solve the problem:

Using SQS → queue model
Using SNS → pub/sub model
Kinesis → real time streaming model

These services can scale independently from our application.

SQS: Simple Queue Service

The producer sends (pushes) messages to the SQS queue, and the consumer polls messages from the SQS queue.

SQS Standard Queue:

One of the oldest services in AWS (over 10 years old).
Fully managed service, used to decouple applications.
Can have duplicate messages (at-least-once delivery, occasionally).
Can have out-of-order messages (best-effort ordering).

Attributes:

Unlimited throughput, unlimited number of messages in the queue.
Default message retention is 4 days, and up to 14 days.
Low latency (<10 ms on publish and receive).
Limit of 256 KB per message sent.

SQS Producer Messages:

The producer sends messages to SQS using the SDK (SendMessage API).
The message is persisted in the queue until a consumer reads it and deletes it.
Default message retention is 4 days, and up to 14 days.
Example: e-commerce website (send an order to be processed)

SQS Consumers Messages:

Consumers can run on an EC2 instance, a Lambda function or even an on-prem server.
The consumer polls the SQS queue for messages.
Consumers can receive up to 10 messages at a time.
Process the message (ex: insert the message into RDS).
Delete the messages from the queue using DeleteMessage API
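The consumer steps above (poll, process, delete) can be sketched as one function. The `sqs` argument is assumed to behave like a boto3 SQS client (for example `boto3.client("sqs")`); `process_batch`, `handler` and the queue URL are hypothetical names. `WaitTimeSeconds` enables long polling, which the notes above do not cover and which is optional.

```python
def process_batch(sqs, queue_url, handler):
    """Receive up to 10 messages, handle each one, then delete it.

    Returns the number of messages processed.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # the per-call maximum
        WaitTimeSeconds=20,      # long polling (optional)
    )
    processed = 0
    for message in response.get("Messages", []):
        handler(message["Body"])  # e.g. insert the order into a database
        # Delete only after successful processing; otherwise the message
        # becomes visible again once the visibility timeout expires.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

Passing the client in as a parameter keeps the function easy to exercise with a stub object instead of a real queue.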

SQS Message Visibility Timeout:

When a consumer receives a message, it becomes invisible to other consumers until the visibility timeout expires.
The consumer can extend the visibility timeout by calling the ChangeMessageVisibility API.
If the visibility timeout is high (hours) and the consumer crashes, reprocessing takes a long time because the message stays invisible until the timeout expires.
If the visibility timeout is low (seconds), the chances of duplicate processing are high.

SQS Dead letter Queue:

If a message is not processed and deleted within the visibility timeout, it becomes visible in the queue again.
We can set a threshold on how many times a message can go back to the queue.
After the MaximumReceives threshold is exceeded, the message goes into the dead letter queue (DLQ).
The dead letter queue is useful for debugging.
Make sure to process the messages in the dead letter queue before they expire.
We can set the retention period of the dead letter queue.
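The redrive behaviour described above can be illustrated with a toy in-memory model. This does not call SQS; the function name, the `max_receive_count` parameter and the sample handler are assumptions chosen to mirror the threshold described in the list.

```python
from collections import deque


def run_with_dead_letter_queue(messages, handler, max_receive_count=3):
    """Toy model of a queue with a DLQ. Returns (processed, dead_lettered)."""
    queue = deque((body, 0) for body in messages)  # (body, times received)
    processed, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        if receives >= max_receive_count:
            # Threshold reached: move the message aside instead of retrying.
            dead_letters.append(body)
            continue
        try:
            handler(body)
            processed.append(body)  # success is equivalent to a delete
        except Exception:
            # Failure: the message returns to the queue with a higher count.
            queue.append((body, receives + 1))
    return processed, dead_letters


def sample_handler(body):
    if body == "bad":
        raise ValueError("cannot process")


print(run_with_dead_letter_queue(["a", "bad", "b"], sample_handler))
```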

SQS Delay Queue:

Delay a message by up to 15 minutes (consumers don't see it immediately).
The default delivery delay is 0.
A default can be set at the queue level.
The default can be overridden on send using the DelaySeconds parameter.

SQS FIFO

First In First Out (messages in the queue are ordered).
Ordering guarantee.
Exactly-once send capability (by removing duplicates).
Messages are processed in order by the consumer.
Limited throughput: 300 msg/s without batching, 3000 msg/s with batching.

SQS with Auto Scaling Group:

SNS (Simple Notification Service):

Pub/Sub service: publish and subscribe.
The event producer sends each message to only one SNS topic.
As many event receivers (subscribers) as we want can listen to the SNS topic notifications.
Each subscriber to the topic receives all the messages (a newer feature allows filtering messages, but by default every subscriber receives everything).
Up to 10 million subscribers per topic (high scale).
100,000 topic limit.
Subscribers can be (Protocol):

SQS
HTTP/HTTPS
Lambda functions
Email notifications
SMS messages
Mobile notifications

SNS Integrates with lot of AWS services:

SQS
Cloud Watch for alarms
Amazon s3 (on bucket events)
Auto Scaling group event.
CloudFormation (upon stack changes) etc…

How to publish SNS message:

Topic Publish:(using SDK).

Topic Creation.
Create a Subscription
Publish to the topic

Direct Publish: (For mobile apps SDK)

Create a platform application
Create a platform endpoint
Publish to the endpoint
Works with Google GCM, Apple APNS, Amazon ADM etc.

SNS Security:

In-flight encryption
At rest encryption using KMS
Client side encryption
Access control using IAM policies to regulate access to the SNS API.
You can define SNS policies similar to S3 bucket policies

Useful for cross account access to SNS topic
Useful for other services to write to SNS topics (S3).

SQS + SNS Fan Out:

Push once into SNS and receive in all the SQS queues that are SNS subscribers.
Fully decoupled, no data loss.
SQS allows for data persistence, delayed processing and retries of work.
For this to work, the SQS queue needs an access policy that allows SNS to write to it.
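A sketch of the kind of access policy the last point refers to, built as a Python dictionary and serialized to JSON. The function name and the two ARNs are placeholders; the resulting document would be set as the queue's access policy.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> str:
    """Build a queue access policy that lets one SNS topic send to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict the permission to messages from this topic only.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARNs for illustration only.
print(sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders-queue",
    "arn:aws:sns:us-east-1:123456789012:orders-topic",
))
```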

* Important Note: SNS cannot send messages to SQS FIFO Queues (AWS Limitation)

If this comes up in the exam, simply rule it out, since it is not possible as of today.

AWS Kinesis:

Kinesis is a managed alternative to Apache Kafka.
Real-time big data collection tool.
Great for stream processing frameworks like Spark, NiFi etc.
Data is automatically replicated to 3 AZs.
Three sub-products:

Kinesis Streams: low-latency streaming ingest at scale.
Kinesis Analytics: performs real-time analytics on streams using SQL.
Kinesis Firehose: loads streams into S3, Redshift, Elasticsearch etc.

At a high level, Kinesis sits in the middle: data is streamed in, analytics are performed on it, and the results are stored somewhere for the long term.
#TODO need to add remaining features of Kinesis
