Friday, August 28, 2020

Route 53

 Route 53:

  1. It is a managed DNS (Domain Name System) service.

  2. DNS is a collection of rules & records that helps clients reach a server through a domain name.

  3. In AWS the most common records are:

    1. A: hostname mapped to an IPv4 address.

    2. AAAA: hostname mapped to an IPv6 address.

    3. CNAME: hostname mapped to another hostname.

    4. Alias: hostname mapped to an AWS resource (e.g. an ELB or a CloudFront distribution).
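The record types above map directly onto the payload the Route 53 API expects. A minimal sketch of an A-record upsert, shaped like the request boto3's route53.change_resource_record_sets() takes; the hostname and IP are hypothetical:

```python
# Sketch: the ChangeBatch payload shape used when upserting an A record
# (e.g. via boto3's route53.change_resource_record_sets()).
# The hostname and IP below are hypothetical examples.
change_batch = {
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",                      # hostname
            "Type": "A",                                    # A = hostname -> IPv4
            "TTL": 300,                                     # seconds
            "ResourceRecords": [{"Value": "203.0.113.10"}]  # the IPv4 address
        },
    }]
}
print(change_batch["Changes"][0]["ResourceRecordSet"]["Name"])  # -> app.example.com
```

For an Alias record, roughly speaking, an AliasTarget block pointing at the AWS resource replaces the TTL/ResourceRecords fields.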

AWS ATHENA (For serverless analytics)

 AWS ATHENA:

  1. It is a serverless service to perform analytics directly against data stored in Amazon S3.

  2. Uses SQL query language.

  3. Has a JDBC & ODBC driver.

  4. Charged per query, based on the amount of data scanned.

  5. Supports CSV, JSON, Parquet, Avro, and ORC formats.

USE CASES: 

  1. Business Intelligence 

  2. Analytics

  3. Reporting

  4. Analyze & Query VPC Flow Logs, ELB Logs, CloudTrail etc.
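For the VPC Flow Logs use case, a query might look like the sketch below. The table name `vpc_flow_logs` and its columns are assumptions; they depend on the CREATE EXTERNAL TABLE definition that points Athena at the logs in S3.

```python
# Sketch of an Athena SQL query finding the top talkers among rejected
# connections in VPC Flow Logs. Table and column names are assumed.
query = """
SELECT srcaddr, dstaddr, COUNT(*) AS rejected_count
FROM vpc_flow_logs
WHERE action = 'REJECT'
GROUP BY srcaddr, dstaddr
ORDER BY rejected_count DESC
LIMIT 10
""".strip()
print(query.splitlines()[0])  # -> SELECT srcaddr, dstaddr, COUNT(*) AS rejected_count
```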

EXAM TIPS:

For anything related to analyzing data in S3 with SQL, think of Athena.

AWS Elastic Compute Cloud (EC2)

 Instance Launch Types:

  1. On Demand Instance Type:

    1. Short workloads with predictable pricing.

  2. Reserved Instance Type:

    1. Known amount of time (e.g. "we will keep this instance for at least 1 year").

    2. Minimum for 1 year

    3. Types:

      1. Convertible Reserved Instance (long workload and flexible instance).

      2. Scheduled Reserved Instance: reserved for a recurring window, e.g. every Thursday from 3 PM to 6 PM, committed for at least a year.

  3. Spot Instance Type:

    1. Short Workloads for cheap and can lose instance (less reliable).

  4. Dedicated Instance Type:

    1. No other customer shares the underlying hardware.

  5. Dedicated Hosts: 

    1. Book an entire physical server, control the instance placements.


Instance Types:

  1. R -> RAM

  2. C-> Compute

  3. I -> I/O

  4. M -> General purpose ("medium" of everything: balanced compute & memory)

  5. G -> GPU

  6. T2/T3 -> Burstable (up to capacity)

  7. T2/T3 Unlimited -> unlimited burst


T2/T3 → While below the baseline the instance accrues CPU credits; bursting above the baseline spends them, and once the credits are exhausted performance drops back to the baseline.

T2/T3 Unlimited Burst → once CPU credits are exhausted you pay extra for the additional CPU, but you don’t lose performance.
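The credit mechanics can be made concrete with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions (roughly a t3.micro; check the AWS docs for exact rates), where 1 credit = 1 vCPU at 100% for 1 minute:

```python
# Illustrative CPU-credit arithmetic for a burstable T2/T3 instance.
# Assumed earn rate (roughly t3.micro -- an assumption, verify in docs):
CREDITS_PER_HOUR = 12        # credits earned while running below baseline
# 1 credit = 1 vCPU running at 100% for 1 minute.

balance = 24 * CREDITS_PER_HOUR   # credits banked over one idle day
burst_minutes = balance           # minutes of full-speed burst on 1 vCPU
print(burst_minutes)              # -> 288
```

So an instance idle for a day could then burst at full speed for a few hours before (on standard T2/T3) falling back to baseline.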



AWS RDS (Relational Database Service)

 AWS RDS [Relational Database Service]:

  1. It is a managed DB service for databases that use SQL as a query language.

  2. It allows you to create databases in the cloud that are managed by AWS:

    1. PostgreSQL

    2. MySQL

    3. MariaDB

    4. Oracle

    5. Microsoft SQL Server

    6. Aurora (AWS proprietary service)


Advantages of using RDS over deploying DB on EC2.

  1. Automated provisioning & OS patching

  2. Monitoring dashboards

  3. Continuous backups & restore to specific timestamp (Point in time restore)

  4. Read replicas for improved read performance 

  5. Multi AZ setup for Disaster recovery

  6. Maintenance window for upgrades

  7. Scaling capability (vertical as well as horizontal)

  8. Storage backed by EBS (GP2 or IO1)


Note: But you can’t SSH into the underlying instance.


RDS Backups:

  1. Backups are automatically enabled in RDS.

Automated backups:

  1. Daily full backup of database (during maintenance window).

  2. Transaction logs are backed up by RDS every 5 minutes.

  3. => Together, the above two give you the ability to restore to any point in time, from the oldest backup up to 5 minutes ago.

  4. 7 days retention period (can be increased up to 35 days).
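A point-in-time restore always produces a new DB instance built from a backup plus replayed transaction logs. A sketch of the request parameters, shaped like boto3's rds.restore_db_instance_to_point_in_time (instance identifiers are hypothetical):

```python
from datetime import datetime, timezone

# Sketch: parameters for restoring an RDS instance to a point in time
# (boto3-shaped; identifiers are hypothetical). The restore creates a
# NEW instance -- the source instance is untouched.
params = {
    "SourceDBInstanceIdentifier": "prod-db",
    "TargetDBInstanceIdentifier": "prod-db-restored",
    # Any timestamp between the oldest retained backup and ~5 min ago:
    "RestoreTime": datetime(2020, 8, 27, 14, 30, tzinfo=timezone.utc),
}
print(params["RestoreTime"].isoformat())
```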

DB Snapshots:

  • Manually triggered by the user.

  • Retentions of backups for as long as you want.

Read Replicas for read Scalability:

  1. Read replicas are additional DB instances with read-only access, replicated asynchronously from the master DB instance. 

  2. Reads are eventually consistent (you may not see the latest writes immediately).

  3. Use case:

    1. You have a production database that takes usual normal load

    2. And now you want to run a reporting application to run some analytics.

    3. You create a read replica to run the new workload there

    4. Doing this the production application is unaffected.

    5. Read replicas serve only SELECT (read) statements, not INSERT, UPDATE, or DELETE.

  4. RDS Read Replicas Network costs:

    1. In AWS there is a network cost when the data goes from one AZ to another AZ

    2. To avoid this cost, you can keep your read replica in the same AZ as the master.


RDS Multi AZ [Disaster Recovery]:

  1. SYNC REPLICATION

  2. ONE DNS NAME [automatic app failover to standby].

  3. Increase availability.

  4. Failover in case loss of AZ, loss of network, instance or storage failure.

  5. No manual intervention in app.

  6. The standby becomes the master in case of failover.

Note: Multi AZ is not to be used for scaling.

  1. The Read Replicas can be set up as Multi AZ for disaster recovery.


RDS Security & Encryption:

  1. At Rest encryption.

  2. Can encrypt master as well as read replicas using AWS KMS.

  3. Encryption has to be defined during launch time.

  4. If the master is not encrypted the read replica cannot be encrypted. 

  5. Transparent Data Encryption (TDE) is available for Oracle & SQL Server.

  6. In-flight encryption:

    1. SSL certificate to encrypt data in RDS while in flight.

    2. Provide SSL options with a trust certificate when connecting to the database.

    3. To enforce SSL for users there is an explicit parameter for PostgreSQL & MySQL.
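The enforcement knobs differ per engine; as I recall them (treat as assumptions and verify against the current RDS docs), they are parameter-group settings:

```python
# Engine-specific settings that force SSL connections on RDS, as I
# recall them -- assumptions, verify in the RDS documentation.
force_ssl_params = {
    "postgres": {"rds.force_ssl": "1"},            # DB parameter group
    "mysql": {"require_secure_transport": "ON"},   # MySQL 5.7+; older
    # MySQL versions instead use per-user GRANT ... REQUIRE SSL
}
print(sorted(force_ssl_params))  # -> ['mysql', 'postgres']
```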


Amazon Aurora:

  1. AWS proprietary technology (not open source).

  2. PostgreSQL & MySQL are both supported as Aurora DB flavors, meaning existing drivers work as if Aurora were a PostgreSQL or MySQL DB.

  3. Aurora is “AWS Cloud Optimized” and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of PostgreSQL on RDS.

  4. Aurora storage automatically grows in increments of 10 GB, up to 64 TB.

  5. Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub-10 ms replica lag).

  6. Failover in Aurora is instantaneous. It is HA (highly available) by default.

  7. Aurora costs about 20% more than RDS, but it is more efficient. 


Aurora HA & Scaling:


AWS Simple Storage Service (S3)

 S3 FEATURES:

  1. Tiered Storage available.

  2. Lifecycle Management.

  3. Versioning.

  4. Encryption.

  5. MFA For Deletion

  6. Securing data using Access Control Lists & Bucket Policies.


S3 STORAGE CLASSES:

  1. S3 Standard:

  2. S3 IA (Infrequently Accessed):

  3. S3 One Zone-IA: (replaces the old RRS, Reduced Redundancy Storage)

  4. S3 Intelligent-Tiering: (uses ML to track how frequently objects are accessed and moves them between S3 Standard and IA automatically).

  5. S3 Glacier: for data archiving; minutes to hours for retrieval. 

  6. S3 Glacier Deep Archive: lowest-cost storage class; retrieval takes up to 12 hours (48 hours for bulk).


S3 COMPARISON:


S3 CHARGING:

  1. For Storing.

  2. On Requests.

  3. Storage Management pricing (Type of class for storing).

  4. Data Transfer Pricing. (Cross Region Replication)

  5. Transfer Acceleration. (Edge Location)


Exam Tips:

  1. S3 is object-based (i.e. it allows you to upload files).

  2. Files can be from 0 bytes to 5 TB in size.

  3. There is unlimited storage.

  4. Files are stored in a bucket.

  5. S3 uses a universal namespace, i.e. the bucket name must be unique across the globe.

    1. https://abhay.s3.amazonaws.com

    2. https://abhay.s3.eu-west-1.amazonaws.com

  6. Since it is object-based, it is not suitable for installing an OS or a database on S3.

  7. Only to store files.

  8. On successful upload you get an HTTP 200 OK response.

  9. You can turn on MFA for deletion. So that the data won’t get deleted accidentally.

  10. Control access to bucket using bucket ACL or bucket policies
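Bucket policies are JSON documents. A common sketch (the bucket name is hypothetical) that denies any request not made over HTTPS:

```python
import json

# Sketch of a bucket policy that denies non-HTTPS access to a bucket.
# Bucket name "abhay-bucket" is a hypothetical example.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::abhay-bucket",      # the bucket itself
            "arn:aws:s3:::abhay-bucket/*",    # all objects in it
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
print(json.dumps(policy, indent=2)[:20])
```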


KEY FUNDAMENTALS OF S3:

  1. Key: Filename (Simply name of the file).

  2. Value: the data itself, made up of a sequence of bytes.

  3. Version ID: Important in terms of versioning.

  4. Metadata: Information about the data you are storing (data about the data).

  5. Sub Resources:

    1. Access Control Lists. (permission) 

    2. Torrents.


Consistency:

  1. Read after Write consistency for PUTS of new object.

  2. Eventual Consistency for Overwrite of PUTS and DELETES (can take some time to propagate the changes.)


IMPORTANT NOTE: 

  • Read the FAQs.

  • Read the whitepapers.


S3 Pricing Tier:

  • What drives the price:

    • Storage.

    • Requests & Data Retrieval.

    • Data Transfer.

    • Management & Replication.

  • What are the different Tiers:

    • S3 Standard.

    • S3 IA

    • S3 One Zone IA

    • S3 Intelligent Tiering

    • S3 Glacier

    • S3 Deep Glacier Archive.

  • Understanding how to get the best out of S3 (tiers below ordered from most to least expensive):

    • S3 Standard -> most expensive.

    • S3 Intelligent Tiering.

    • S3 IA 

    • S3 One Zone IA

    • S3 Glacier.

    • S3 Deep Glacier Archive.

  • Tip: Avoid S3 Standard as much as possible. 

  • Scenario based questions.


S3 Security & Encryption:

  1. By Default all the buckets are private

  2. You can setup access control to your bucket using:

    1. Access Control Lists → Object Level

    2. Bucket Policies → Bucket Level

  3. S3 buckets can be configured to create access logs, which record all requests made to the bucket. The logs can be sent to another bucket, even one in another account.

  4. Note: to monitor bucket access, enable server access logging (use a separate bucket for the logs).

  5. Encryption in transit: SSL/TLS (HTTPS).

  6. Encryption at rest (Server Side)

    1. S3-Managed Keys (SSE-S3: server-side encryption with S3-managed keys)

    2. AWS Key Management Service Managed keys (SSE-KMS)

    3. Server-side encryption with customer-provided keys (SSE-C).

  7. Client Side Encryption (upload encrypted file).

  8. VPC endpoint for private access to s3

  9. MFA on delete

  10. Pre-signed URLs (temporary URLs valid for a specific time period)
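Conceptually, a pre-signed URL embeds an expiry time and an HMAC signature over the request details, so S3 can validate it without storing any state. The sketch below shows only that idea; it is NOT AWS's real SigV4 signing algorithm, and the secret is made up:

```python
import hashlib, hmac, time
from urllib.parse import urlencode

# Conceptual sketch of pre-signed URLs: sign (method, object, expiry)
# with a secret; anyone holding the URL can use it until it expires.
# NOT the real AWS SigV4 scheme -- just the underlying idea.
def presign(bucket, key, secret, expires_in=3600, now=None):
    expires = int(now if now is not None else time.time()) + expires_in
    msg = f"GET\n{bucket}/{key}\n{expires}".encode()
    sig = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    qs = urlencode({"Expires": expires, "Signature": sig})
    return f"https://{bucket}.s3.amazonaws.com/{key}?{qs}"

url = presign("abhay-bucket", "report.csv", "demo-secret", now=0)
print(url.split("?")[0])  # -> https://abhay-bucket.s3.amazonaws.com/report.csv
```

In practice you would call boto3's generate_presigned_url rather than signing anything yourself.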


S3 Versioning

  1. Stores all versions of all objects, including all writes, and even preserves objects you delete (via a delete marker).

  2. Great backup tool (mysql backup versioning)

  3. Once versioning is enabled it cannot be disabled, only suspended; you would need to delete and recreate the bucket to remove it completely.

  4. Integrates with lifecycle rules.

  5. Versioning supports MFA Delete (security)

Tips:

  1. Stores all the versions of object (including write and delete )

  2. Backup tool

  3. Once enabled cannot be disabled.

  4. Integrates with Lifecycle rule

  5. Versioning has MFA Delete capability, which requires multi-factor authentication for deletes.


S3 Lifecycle Management:

  1. Automates moving objects between the different storage tiers.

  2. Works in conjunction with versioning 

  3. Can be applied to the current version as well as the previous version.
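A lifecycle configuration is a set of JSON rules. A sketch (boto3-shaped; the prefix and day counts are arbitrary examples) that tiers logs down through the storage classes and expires old noncurrent versions:

```python
# Sketch of an S3 lifecycle configuration (boto3/JSON-shaped).
# Prefix and day counts are arbitrary illustrative choices.
lifecycle = {
    "Rules": [{
        "ID": "tier-down-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},          # only applies under logs/
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # after 30 days
            {"Days": 90, "StorageClass": "GLACIER"},      # after 90 days
        ],
        # Previous versions (needs versioning) expire after a year:
        "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
    }]
}
print(len(lifecycle["Rules"][0]["Transitions"]))  # -> 2
```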

Useful Links:

https://d1.awsstatic.com/whitepapers/Security/AWS_Security_Best_Practices.pdf


S3 Object Lock:

  • Secures your S3 objects from being overwritten or deleted.

  • It’s a write-once-read-many (WORM) model.


S3 Object Lock Governance mode:

  • Users can’t overwrite, delete, or alter an object unless they have special permission to do so.

  • In Governance mode you protect your objects against deletion by most users, but you can still grant some users permission to alter the retention settings or delete the object if necessary.


S3 Object Lock Compliance mode:

  • In this mode the object can’t be deleted or altered by any user, including the root user.

  • The retention mode can’t be changed and the retention period can’t be shortened. 

  • Compliance mode ensures that the version of the object can’t be overwritten or deleted for the duration of retention period.


Retention Period:


Legal Hold:

Glacier Vault:


CLI -> /folder1/sub1/1



S3 performance:

  • What are S3 prefixes?:

    • https://<bucket_name>/folder1/subfolder1/test.jpeg 

      • Prefix is folder1/subfolder1

    • https://<bucket_name>/folder2/subfolder1/test2.jpeg

      • Prefix  is folder2/subfolder1

    • https://<bucket_name>/folder3/abhay.jpeg

      • Prefix is folder3

  • S3 Performance:

    • S3 has extremely low latency.

    • You can get the first byte out of S3 within 100-200 milliseconds.

    • You can also achieve a high number of requests:

      • 3,500 PUT/COPY/POST/DELETE requests per second per prefix.

      • 5,500 GET/HEAD requests per second per prefix.

    • You can get better performance by spreading your reads across different prefixes.

      • Eg: If you are using 2 prefixes you can achieve 11,000 GET requests per second.

      • Eg: If you are using 4 prefixes you can achieve 22,000 GET requests per second.

        • Note: the more prefixes you spread requests across, the better the aggregate performance.

    • Multipart uploads:

      • Uploads parts of a large object in parallel (recommended for files over 100 MB, required for files over 5 GB).

    • S3 Byte-range fetch (download):

      • Speeds up downloads.

      • You can request a specific byte range of an object (a partial download), and fetch multiple ranges in parallel.
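The prefix examples above boil down to "everything in the key before the last slash". A minimal helper capturing that (the keys are the illustrative ones from above):

```python
# The "prefix" of an S3 key is the path portion before the object name;
# S3 request-rate limits apply per prefix.
def s3_prefix(key: str) -> str:
    """Return the prefix of an S3 object key ('' if the key has none)."""
    return key.rsplit("/", 1)[0] if "/" in key else ""

print(s3_prefix("folder1/subfolder1/test.jpeg"))  # -> folder1/subfolder1
print(s3_prefix("folder3/abhay.jpeg"))            # -> folder3
```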


S3 Limitations while using KMS (SSE-KMS):

  1. Each object upload calls the KMS GenerateDataKey API and each download calls Decrypt, so S3 traffic counts toward your KMS request quota.

  2. The KMS quota is limited per region (on the order of 5,500-30,000 requests per second), so very high S3 request rates can be throttled.

S3 Select:

  1. Allows you to use a simple SQL SELECT query against an object. 

    1. Eg: for a gzipped CSV file in a bucket, S3 Select filters the rows server-side (like a grep) without downloading the whole object.

  2. Get data in rows & columns

  3. Retrieve only a subset of data.

  4. Simple Sql expressions

  5. Save money on data transfer and increase speed.
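The gzipped-CSV example above maps onto a request like the sketch below, shaped like the parameters boto3's s3.select_object_content takes (bucket, key, and the query are hypothetical; field names as I recall the API):

```python
# Sketch: S3 Select request parameters (boto3-shaped, from memory --
# verify field names in the docs). Bucket/key/query are hypothetical.
select_params = {
    "Bucket": "abhay-bucket",
    "Key": "data/users.csv.gz",
    "Expression": "SELECT s.name, s.city FROM S3Object s WHERE s.city = 'Pune'",
    "ExpressionType": "SQL",
    "InputSerialization": {
        "CSV": {"FileHeaderInfo": "USE"},   # first row holds column names
        "CompressionType": "GZIP",          # object is gzipped
    },
    "OutputSerialization": {"CSV": {}},
}
print(select_params["ExpressionType"])  # -> SQL
```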

Glacier Select:

  1. Same as S3 Select, but for objects archived in Glacier.






Terraform Cheat Sheet [WIP]

Installing Terraform