All About Amazon Web Services(AWS)

AWS is made up of Regions and Availability Zones.

AWS Essentials

Storage Basics On AWS

AWS offers several types of storage.
Here I give a core understanding of the storage options to consider when architecting applications on AWS.

1.S3 (Simple Storage Service): It's a scalable object and file storage service. You can store objects and files such as PDFs, JPGs, or anything else a regular operating system treats as a file.

You can store unlimited data as objects (each object can be from 1 byte to 5 TB).
You can upload data from your off-site location to your AWS S3 account.
You can use Amazon server-side encryption for your data.
You can host static files and websites as well.

Data in S3 is automatically replicated across multiple Availability Zones in a region, so S3 is well suited to highly available environments.

You can host private content using signed URLs/CDN.
S3 comes with lifecycle policies. A policy is assigned to a bucket (a bucket is where we store our objects); for example, a policy can delete all objects in a particular bucket after 10 days.

We can use Amazon Glacier with S3 lifecycle policies to archive backups and delete them after some amount of time.
S3 also supports versioning.

AWS S3 is region-specific, not Availability Zone-specific.

2.Amazon Glacier: 

It's an archival storage type.
Used for infrequently accessed data.
Integrates with S3 lifecycle policies.
Amazon Glacier costs about 1 cent per GB per month.

3.EBS (Elastic Block Store)

EBS is a block storage device. You can create a file system on top of an EBS volume; on Linux you can create ext4 or other file systems, just as we usually do when attaching a regular hard drive to a physical machine.
EBS is redundant, but only within a single Availability Zone: it automatically replicates data inside the zone where the volume lives.
So for high availability we have to design our own way of working across different Availability Zones.
S3, by contrast, replicates data across multiple Availability Zones in a region.

4.Ephemeral/Instance Store:

Ephemeral storage is only available when we use an instance store-backed EC2 instance.
This is temporary storage: when we stop or terminate that EC2 instance, the storage is erased and the data is lost.

5.Database Storage:

Relational databases (RDS) and NoSQL databases (DynamoDB) are also considered storage in AWS.
DynamoDB (NoSQL) is an alternative to MongoDB.
Relational databases: MySQL, PostgreSQL, Oracle Database (BYOL), SQL Server (BYOL)
           BYOL: Bring Your Own License
If you have already purchased a license for Oracle or Microsoft SQL Server, you can bring it to AWS and use it with the relational database service.

Security On AWS

AWS follows a shared security model: the user is responsible for certain security aspects and AWS is responsible for others.

Identity and Access Management (IAM): manages the users who access your account and what they can do. You can create users and groups.
You can use IAM to integrate single sign-on with federated users and to create temporary access. You don't have to create a new IAM user whenever a contractor or an employee needs access; you can use their single sign-on credentials to grant temporary access.
Password/key rotation: if developers use passwords, access keys and secret keys while developing code, you are responsible for rotating them periodically. The rotation period depends on individual or company policy.

Multi-factor authentication should be implemented, since we access the console over the web.
AWS Trusted Advisor suggests potential ways to improve security across the whole AWS infrastructure.

Security groups -- instance-level firewall, used with EC2 and the Relational Database Service.
Access Control Lists -- resource-level permissions and network traffic control: they define who can access, for example, a specific EC2 instance or an S3 bucket.

Virtual Private Cloud (VPC): allows us to build our own private cloud.

AWS is responsible for physical host-level protection and physical environment security.
AWS is responsible for decommissioning old storage devices.

Port scanning using linux tool such as nmap etc.

Understanding AWS Global Infrastructure

Whenever we log in to the AWS console, the default region is US-East (North Virginia).

Regions: Regions are the physical locations. A region is a grouping of AWS data centers (Availability Zones).
The available regions are as below.

Availability Zones: An Availability Zone consists of one or more data centers within a region. A region is made up of multiple Availability Zones. For example, the US-East (North Virginia) region has the Availability Zones below.

Edge Locations:
Edge locations are built to deliver object content. Their main use is to deliver cached data across the world; the CloudFront CDN mainly uses them to deliver content.

There are global edge locations and regional edge locations. Whenever you build an application, your data is cached in the edge locations you choose.

For example, a user coming from India is automatically served from the India edge location (if you have configured CloudFront globally), which effectively reduces latency.


Overview of amazon web services -1

Scalability:
Scalability is a fundamental property of the cloud. When we design or architect our infrastructure, we want to make sure it is scalable.

It means the ability of the system to expand according to workload demands.
Example: as the workload on an application grows, for instance as visitors to a website increase, our infrastructure should automatically scale to handle that load.

Scalable systems are
1. Resilient
2. Operationally efficient: they scale up to handle the load and scale down when the capacity is no longer required.
3. Cost-effective as the service grows.

Fault Tolerant:
Fault tolerance is the ability of your system to operate without interruption in the event of service failures.
For example, if an AWS data center goes down or somebody deletes a VM inside your AWS infrastructure, that should not affect your infrastructure's uptime or ability to operate. The infrastructure should respond and quickly heal itself without end customers noticing any interruption.
Fault tolerance can be achieved using
1. Availability Zones
2. Auto Scaling

Elasticity is also a fundamental property of the cloud.
Elasticity is the ability of the infrastructure to scale up and down automatically to adapt to the workload.

Auto Scaling

1.Proactive cyclic scaling:
For example, say 90% of the traffic to your application or website arrives between 11 AM and 11 PM every day. Since you know heavy traffic occurs in this period, you can set up an auto-scaling policy that automatically spins up additional instances behind the load balancer to handle it. After 11 PM the extra instances are automatically spun down and detached from the load balancer.

2.Proactive event-based scaling:
Proactive event-based scaling means knowing that a large event is coming up. A large event may be a presidential election, a big sale on an e-commerce site, New Year, or the launch of an application, book, video, mp3 etc. You expect a huge load, but only around this event, so you scale the instances for the event.

3.Auto-scaling based on demand:
Demand-based auto scaling looks at CPU utilization, memory, or whatever metric you choose, and launches additional instances based on the current load. For example, if you have a load balancer with two instances and have set the CPU threshold to 50%, the auto-scaling policy automatically spins up another instance when the threshold is crossed, and scales that instance back down when the CPU load decreases.
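
The cyclic and demand-based policies above can be sketched as two small decision functions. This is only an illustration in plain Python, not the AWS Auto Scaling API: the function names, the scale-in rule (half the threshold) and the size limits are assumptions made for the example, while the 11 AM to 11 PM window and the 50% CPU threshold come from the text.

```python
def scheduled_capacity(hour: int, base: int = 2, peak: int = 6) -> int:
    """Cyclic scaling: run `peak` instances from 11:00 up to 23:00, else `base`.

    `base` and `peak` are hypothetical sizes chosen for illustration.
    """
    return peak if 11 <= hour < 23 else base


def demand_capacity(current: int, avg_cpu: float, threshold: float = 50.0,
                    min_size: int = 2, max_size: int = 10) -> int:
    """Demand-based scaling: add one instance above the CPU threshold,
    remove one when CPU falls below half the threshold (assumed rule)."""
    if avg_cpu > threshold:
        return min(current + 1, max_size)
    if avg_cpu < threshold / 2 and current > min_size:
        return current - 1
    return current


if __name__ == "__main__":
    print("capacity at 09:00:", scheduled_capacity(9))
    print("capacity at 15:00:", scheduled_capacity(15))
    print("2 instances at 72% CPU ->", demand_capacity(2, 72.0))
    print("3 instances at 12% CPU ->", demand_capacity(3, 12.0))
```

In a real deployment the same decisions are expressed as scheduled actions and CloudWatch-alarm-driven scaling policies rather than hand-written code.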

Some of the services available in AWS and a brief overview

1.Amazon S3:
S3 was one of the first services launched by AWS. It is a storage service.
Designed for eleven 9's of durability (99.999999999%) and 99.99% availability.
Durability means safe + secure + redundant.
It is highly available automatically, because it replicates all of your data across multiple Availability Zones in a region. It is an object (file) storage service.
Costs --> per GB used, billed monthly. The cost also varies with Data Transfer In and Data Transfer Out.
RRS --> Reduced Redundancy Storage is a good option for data that does not need eleven 9's (99.999999999%) of durability. You can save costs by using this option.
Lifecycle policies and object versioning --> With a lifecycle policy you can, for example, send one-week-old backup data to Glacier and delete it later.
Versioning --> unlimited versions. For example, if versioning is enabled on your S3 bucket and you delete a file, no worries, sit back and relax, because all the previous versions of the file are still available to you.

2.Amazon Glacier:
It is useful for archiving data. To access data in Glacier you have to submit a retrieval job, and it takes several hours to process the job and return the results.
So it is only useful for data that is not frequently accessed.
It costs very little, i.e. $0.01 per GB per month.
It integrates with Amazon S3 lifecycle policies for archiving.

Instances and their types

There are different instance purchasing options available.
On-demand Instances
Spot Instances
Reserved Instances

On-demand Instances:
We can fire up an instance whenever we want and pay for it based on our usage. For example, if we need to fire up 4 instances to process an application for only one hour, we pay only for that hour. On-demand instances fit this scenario.

Spot Instances:
Spot instances allow us to bid, or name our own price, for unused EC2 capacity.
The difference from on-demand instances is that spot capacity is not guaranteed. We bid a price and can use the instance for as long as the capacity is not needed elsewhere. When the capacity is needed, Amazon automatically takes the spot instances back and your application stops working.
These instances are useful for jobs that can be interrupted and resumed later, when more spot capacity is available. They are a great fit for backend jobs that are not time-critical and do not need to run continuously, where we want to save money.

Reserved Instances:
Reserved instances guarantee you a specific amount of computing power in an Availability Zone. There are different reasons we may want to do this.
For example, our architecture has one instance that runs 24/7, 365 days a year. Since we know about this heavy utilization in advance, we can buy this instance from Amazon for one year up front, which saves about 40% compared with on-demand instances.
We pay in advance for what we are going to use.
There are three types of reserved instances.
    (i). Heavy utilization instances
    (ii). Medium utilization instances
    (iii). Low utilization instances
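
To make the roughly 40% saving concrete, here is a small back-of-the-envelope calculation for one instance running 24/7 for a year. The hourly rate of $0.10 is a made-up figure for illustration only, not an actual AWS price; the 40% discount is the figure quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # an instance running 24/7 for 365 days


def yearly_cost(hourly_rate: float, discount: float = 0.0,
                hours: int = HOURS_PER_YEAR) -> float:
    """Yearly cost of one instance; `discount` is a fraction such as 0.40."""
    return hourly_rate * hours * (1.0 - discount)


HYPOTHETICAL_RATE = 0.10  # assumed on-demand price in USD per hour

on_demand = yearly_cost(HYPOTHETICAL_RATE)
reserved = yearly_cost(HYPOTHETICAL_RATE, discount=0.40)

if __name__ == "__main__":
    print(f"on-demand: ${on_demand:.2f} per year")
    print(f"reserved:  ${reserved:.2f} per year")
    print(f"saved:     ${on_demand - reserved:.2f} per year")
```

The saving only materialises if the instance really runs most of the year, which is why reserved instances suit steady, heavily utilized workloads.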

Amazon EC2

It is a virtual instance computing platform.
It allows us to run and manage our own instances using an OS such as Windows, Linux etc.
We can choose to host DB servers on it instead of using the DB services.

Not highly available out of the box. It is the admin's responsibility to design a highly available infrastructure using the services available in EC2.
   (ii).Elastic Load Balancer

EC2 uses two storage types.

1.Instance store instances (ephemeral storage):
Temporary storage, which means the storage lives only for the life of the instance. When you terminate your instance the data is GONE.

2.EBS-backed instances (Elastic Block Store)
If you terminate your instance, your root volume data can live on. EBS volumes are network-attached storage, and we can also attach additional volumes to back up instance data. EBS snapshots are stored in Amazon S3, which gives a durable, consistent store.

Amazon EC2 security:

It is a shared security model.

    AWS manages hypervisor and physical layer security.
                       DDoS protection
                       Port scanning protection (even in our own environment)
                       Ingress/egress network filtering
    The CUSTOMER should manage software-level security for the instances.
                       Security groups
                       iptables/firewalls
                       By default, security groups are created 100% locked down (all inbound traffic is denied).
                       Use encrypted file systems to store encrypted data at rest
                       Apply an SSL cert to the ELB.

Amazon Virtual Private Cloud(VPC)
                          Ability to provision your own private cloud on AWS
                          Private/public subnets
                          NAT Instances
                          Network-layer security with ACLs (Access Control Lists)
                          Elastic Network Interfaces
                          ELB can serve instances that are not connected to an Internet Gateway
                          VPN Connections

Amazon Relational Database Service(RDS):
RDS is a fully managed DB service. You can host MySQL, Oracle, Microsoft SQL Server and even PostgreSQL databases.

With the Relational Database Service
     you don't have to manage the underlying OS, for example Linux
     you don't have to manage the backups, for example MySQL dumps
     you don't have to manage the updates for the MySQL service
     you don't have to manage the updates and security updates for your instance
Amazon does all of the above for us. That is, RDS manages your backups, applies your updates, and can even apply major version updates such as MySQL 5.1 to MySQL 5.5.

You pay per hour, per GB of storage, and per read/write (I/O) request to the database.
RDS offers a Multi-AZ feature.
When it is enabled, RDS automatically replicates data to a standby instance in a different Availability Zone. If the Availability Zone of your primary database instance fails, RDS automatically repoints the CNAME endpoint you were given to the standby instance. This fail-over is automatic.
With Multi-AZ, RDS is highly available by design.
RDS takes automatic backups as long as we use the InnoDB storage engine (at the time of writing this applies only to MySQL databases).
RDS performs automatic version updates as well.

Amazon CloudWatch

Amazon CloudWatch is used to manage and monitor all AWS services.
It is not only a monitoring tool; it plays a vital role in auto scaling.
The cost of CloudWatch depends on how many metrics you use and at what interval you collect them. The main cost driver is collecting metrics every 60 seconds rather than at the default 5 minutes.
You can set up monitoring for SQS, i.e. the Amazon Simple Queue Service.
You can set up a CloudWatch alarm on CPU utilization so that your auto-scaling policy knows about it and spins up instances whenever needed.
You can set up a CloudWatch monitor for your AWS billing so that you need not worry about huge bills and can save a lot of cost.
You can also integrate CloudWatch with Amazon SNS (Simple Notification Service) to receive text messages and e-mails.
The really cool thing is that everything above can also be done through an API.

Cloud Formation

CloudFormation allows us to build our entire infrastructure as a template. That is infrastructure as code. We write very simple JSON that says, for example: fire up 3 instances in a VPC with specific security group rules, add auto scaling to them, add an S3 bucket, add our domain to Route53, and create a Relational Database Service instance.
The really cool thing is that you can version this template and keep our infrastructure inside of code.

Amazon Elastic BeanStalk

Amazon Elastic Beanstalk is very useful for developers. We can provision Development/QA/Staging/Production environments by integrating with the Elastic Beanstalk tool, which in turn integrates with Git.
Just by pushing code to the production repo, for example, Elastic Beanstalk automatically fires up the infrastructure, takes your PHP/Python/Java/.NET code and deploys it on the instances, puts them behind a load balancer with auto scaling, and creates your Relational Database Service instance. It deploys the infrastructure based on the code.
It is a great tool for developers and it is automatic.

Amazon Route53

Amazon Route53 is a DNS service in the cloud.
For example, if we need to host DNS for a domain, we can do this using Route53. It is highly available by design, which means redundancy is built in and it is fault-tolerant. You pay 50 cents per domain per month.
Route53 integrates with EC2 and S3.
Route53 has different routing policies
(i).Weighted routing policy: for example, 50% of the requests for a specific DNS entry go to your AWS infrastructure and the other 50% go to your corporate network. This can help with load testing. Or say you are rolling out a new application and want only 10% of your customers to reach it at first, because you don't want issues. You can configure 10% of requests to go to the new application and 90% to go to the existing one.
(ii).Latency routing policy:
A latency check sends each request to whichever location has the least latency, so wherever users come from, they are routed to the data center and region that responds fastest for them.
(iii).Failover policy:
It is a great tool. For example, if an AWS service is unavailable, Route53 fails over to another AWS service.
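
The weighted routing idea (10% of requests to a new rollout, 90% to the existing deployment) can be simulated in a few lines. This is a sketch of the concept only, not how Route53 is configured or implemented; the endpoint names `new-app` and `current-app` are hypothetical, and the random seed is fixed just to make the run repeatable.

```python
import random
from collections import Counter
from typing import Dict


def pick_endpoint(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one endpoint with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical record set: 10% of traffic to the new rollout, 90% to the old one.
weights = {"new-app": 10, "current-app": 90}

rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(pick_endpoint(weights, rng) for _ in range(10_000))

if __name__ == "__main__":
    for name, hits in counts.most_common():
        print(f"{name}: {hits / 100:.1f}% of simulated requests")
```

Raising the weight of the new endpoint step by step is the usual way to widen a rollout once the first 10% of traffic looks healthy.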

Amazon Storage Gateway

Amazon Storage Gateway connects software appliances in your local data center to cloud-based storage such as Amazon S3.
The data can be stored in S3 with frequently accessed data cached locally; these are called Gateway-Cached Volumes.

Alternatively, all the data can be stored locally and backed up asynchronously as snapshots to S3.
These are called Gateway-Stored Volumes.

Amazon Import/Export
For example, if you have 200 terabytes of data and don't have sufficient bandwidth to send it to the cloud, Amazon Import/Export is an apt option for you. You can say where you want this large data set stored, for example in Amazon S3 or in Amazon's data warehouse, Amazon Redshift.
You can also use Amazon Import/Export for off-site backup as part of disaster recovery.

Amazon Dynamo DB

Amazon DynamoDB is a really cool tool developed by AWS. It is a fully managed NoSQL service.
Two concepts here: 1. Fully managed 2. NoSQL
Fully Managed:
Fully managed means you don't need to manage anything whatsoever; it is even more fully managed than the Relational Database Service (RDS).
There is an interface to create and view tables. There is also an offline client that lets developers build apps offline, i.e. on their own laptop, while still integrating through the API.
DynamoDB automatically replicates data across the available data centers in a region.
By design DynamoDB is fault-tolerant, distributed and highly available. It auto-scales based on storage needs.
Even uses the DynamoDB as well. It integrates easily with Elastic MapReduce (Hadoop): if you want to run compute-heavy calculations, your DynamoDB data can be pushed to Elastic MapReduce to perform them.

How to access the AWS Services

There are three ways to access all AWS services.
(i).Amazon Control Panel (the AWS Management Console): a web interface for firing up instances and managing resources visually.
(ii).Amazon API: this allows us to script our infrastructure, i.e. infrastructure as code.
(iii).Amazon Command Line Tools: these also allow us to script our infrastructure, from a shell.

Amazon Simple Work Flow

It allows us to track workflow executions.
For example, take a process. Every process has different steps it needs to go through, and that is what SWF tracks for us. In the Simple Workflow Service we can see where the process is, i.e. which step it is executing and which steps it has completed successfully.

Amazon SQS(Simple Queue Service)

It is a great way to decouple (separate) your infrastructure and build scalable applications.

When might we want to use this? We use it to decouple our infrastructure/systems, and we can auto scale based on queue size. The queue guarantees that each message is delivered at least once, so you can potentially get duplicates. So what does the Simple Queue Service actually do?

Envision a message, a small text file that contains JSON data. Say you own an image manipulation company or application: the user uploads an image and your service applies a filter to it, such as a grey filter. Now what happens if you suddenly get a spike in load, from a regular 100 users per minute to 100 thousand users per minute? Your application running on AWS should be designed to absorb that increase, so what is the best way of doing this?

If images are uploaded to an EC2 instance, you can't scale that quickly, and your customers will face a really big backlog.
What we would want to do is let the user upload the image to S3. When the image has been uploaded to Amazon S3, your app creates a little message with JSON information: the UserId and UserName associated with that file, the file name, and its location inside S3.
This message is sent to a queue, i.e. the Amazon Simple Queue Service. It is a distributed service run by Amazon, so we don't have to manage any underlying software or hardware for it.
On the other side of the queue we have worker instances that continually poll the queue, basically asking: hey, do you have any work for me?
Each message contains work information, so a worker grabs the message, runs on an EC2 instance, connects to Amazon S3, fetches the user's image, applies the filter, and then records for the user that the job is completed.
So say we only have 100 messages in the queue, i.e. 100 user requests; 3 or 4 EC2 instances that continuously poll the queue can process them.

Now think about what happens when the queue grows to 100 thousand messages. We can use parallel processing because our infrastructure is decoupled. Users upload their images and don't have to wait; they are notified when the work is done, and they can leave the page and come back.
All of the information sits in the queue, so we are not relying on our worker EC2 instances. It's okay if something fails, because the information is still in the queue when our infrastructure comes back up or when we fire up new instances to help with the load.
A new instance asks the queue what it is supposed to do: it polls the queue, grabs a message and processes it. So if we have 100 thousand messages, we use CloudWatch to set up a metric that says: as soon as we have this many messages, start firing up additional instances. That way we can fire up 10 to 30 instances automatically to start working through the queue.

Even though we have a big load increase, the infrastructure is decoupled, so you can fire up auto-scaled instances that ask the queue what their job is. If all of your EC2 worker instances die for some reason, such as an Availability Zone failure, new worker instances automatically pick up where the old ones left off. So you have a fault-tolerant application with no downtime or unavailability for your users.
You can even use SQS to process data. You can auto scale based on queue size.
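
The decoupled upload pipeline described above can be sketched with an in-process queue standing in for SQS. This is a simplified model, not the SQS API: Python's `queue.Queue` replaces the real service, and the bucket name `example-uploads`, the message fields and the "100 messages per worker" sizing rule are assumptions for illustration. The limits of 3 and 30 workers echo the numbers used in the text.

```python
import json
import queue


def enqueue_upload(q: queue.Queue, user_id: int, bucket: str, key: str) -> None:
    """Producer side: after the image lands in S3, publish a small JSON job."""
    q.put(json.dumps({"user_id": user_id, "bucket": bucket, "key": key}))


def workers_needed(backlog: int, per_worker: int = 100,
                   min_workers: int = 3, max_workers: int = 30) -> int:
    """Scale on queue size: one worker per `per_worker` queued messages,
    clamped between `min_workers` and `max_workers`."""
    needed = -(-backlog // per_worker)  # ceiling division
    return max(min_workers, min(needed, max_workers))


def drain(q: queue.Queue, handler) -> int:
    """Worker side: poll until the queue is empty, handing each job to `handler`."""
    processed = 0
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            return processed
        handler(json.loads(body))
        processed += 1


jobs = queue.Queue()
for i in range(5):
    enqueue_upload(jobs, user_id=i, bucket="example-uploads", key=f"images/{i}.jpg")

backlog_before = jobs.qsize()
done = []  # stands in for "apply the filter and mark the job complete"
processed = drain(jobs, done.append)

if __name__ == "__main__":
    print("workers for current backlog:", workers_needed(backlog_before))
    print("workers for 100,000 messages:", workers_needed(100_000))
    print("jobs processed:", processed)
```

Because real SQS delivers each message at least once, a production handler should be idempotent so that a duplicate delivery does not apply the filter twice.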

Understanding AWS Instance Types And Utilization  

There are different types of instances available in AWS. Some instance types are CPU intensive. Some are memory intensive, specially designed for high memory loads.

Below are the core instance types available in AWS

1)T2 type instances were launched relatively recently, in 2014. They are backed by the HVM virtualization type, so they require HVM AMIs. They are intended for workloads that do not use the full CPU often or consistently. T2 instances incorporate what is called burstable CPU performance, so you have a baseline performance. Say you have a t2.micro: it has a baseline performance of 10% CPU utilization. For the time you run below the baseline you accrue what are called CPU credits.
One CPU credit means your CPU can run for one minute at 100%. What happens when you run out of CPU credits? You can only use 10% of your CPU, which is the baseline performance. So clearly, if we have high-intensity CPU workloads that use a lot of processor speed, T2 instances are not for us.
T2 instances are not intended for high-volume databases or high-volume web servers.

2)C4 type instances are ideal for compute-bound applications that benefit from high-performance processors. If you have an application that uses a lot of CPU and requires consistently high CPU performance, C4 instances are for you. Examples: a MongoDB server, a high-load database server, or any application with heavy CPU utilization.
3)R3 type instances: high memory performance, also suitable for high bandwidth. If you are using a memory caching engine or performing in-memory analytics for big data, you will want to use the R3 instance type.
4)GPU: generally used for parallel processing, typically for scientific, engineering and rendering applications that benefit from a Graphical Processing Unit. GPUs are commonly used for this kind of processing, for example bitcoin mining (bitcoin is a digital asset and peer-to-peer payment system; users can transact directly without an intermediary).
5)T1 type instances: these instances provide a small amount of consistent CPU resources and allow CPU to increase in short bursts. T1 has the ability to use a lot more CPU than T2, but T2 instances burst consistently as long as you have CPU credits available.

6)EBS-optimized: these instances give priority to EBS traffic. EBS volumes are network-attached storage, so that network traffic gets priority over other traffic coming from your instance. This ensures faster input/output, i.e. faster reads and writes, on your EBS volumes.
7)M3 type instances: a general-purpose balance of compute, memory and network resources. If you need a significant amount of network as well as memory and CPU, M3 instances are very useful. These instances offer a good average of all three (network, CPU, memory).

Understanding the T2 type instances:

T2 instance types (which require HVM virtualization) provide a baseline performance for our CPU. They should only be used if we do not have a consistent need for high CPU, for example in development environments.
One of the main advantages of T2 is that you pay very little for these instances, while accepting that you will use fewer resources. Development environments fit T2 instances well.

For example: a t2.micro has an initial CPU credit of 30. One CPU credit lets the CPU run at 100% for 1 minute, or at 50% for 2 minutes. CPU credits are earned on a per-hour basis.
A t2.micro earns 6 CPU credits per hour, so if you use CPU below the baseline performance your credit balance grows.
If you use CPU above the baseline, you spend some of your CPU credits. That means your CPU credit balance goes down whenever you use more than 10% of the CPU, which is the baseline CPU performance of a t2.micro.
So these calculations become tricky if you are using a t2.micro in a production environment.

Setting up auto scaling for T2 instances can be a bit of a problem. You can't necessarily rely on an auto-scaling metric of, for example, 40% CPU utilization: if your baseline CPU performance is only 10% (for t2.micro instances) and you have used up all your CPU credits, the CPU will never reach 40%, so auto scaling will never fire up new instances.
So there are a lot of things to consider when we use T2 instance types.
Each instance type also has a maximum CPU credit balance it can accumulate.
For example t2.micro's maximum CPU credit balance is 144, with a 10% baseline performance
                     t2.small's maximum CPU credit balance is 288, with a 20% baseline performance
                     t2.medium's maximum CPU credit balance is 576, with a 40% baseline performance
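
The credit arithmetic is easier to follow with a small simulation. The sketch below is a deliberately simplified hourly model (real accounting is finer-grained) using the t2.micro figures quoted above: 30 initial credits, 6 credits earned per hour, a cap of 144, and one credit equal to one minute at 100% CPU. The function name and the hour-by-hour input format are assumptions for the example.

```python
from typing import List, Sequence


def simulate_credits(hourly_cpu_percent: Sequence[float], initial: float = 30.0,
                     earn_per_hour: float = 6.0,
                     max_balance: float = 144.0) -> List[float]:
    """Return the CPU credit balance at the end of each hour.

    Each hour the instance earns `earn_per_hour` credits and spends
    60 * cpu / 100 credits. The balance is clamped to [0, max_balance];
    at 0 a real instance would be throttled to its baseline.
    """
    balance = initial
    history = []
    for cpu in hourly_cpu_percent:
        spent = 60 * cpu / 100.0
        balance = min(max_balance, max(0.0, balance + earn_per_hour - spent))
        history.append(balance)
    return history


if __name__ == "__main__":
    # Two idle hours, then two hours at 50% CPU.
    print(simulate_credits([0, 0, 50, 50]))
    # Running exactly at the 10% baseline: earning and spending cancel out.
    print(simulate_credits([10, 10, 10]))
```

At the 10% baseline a t2.micro spends 6 credits per hour, exactly what it earns, which is why sustained load above the baseline eventually drains the balance.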

Practical Approch to AWS Services

Creating a role in IAM:

Creating a role in AWS is important: when launching an instance you can assign it a role, so that you can access AWS services from that instance using the AWS CLI. At the time of writing a role had to be assigned while launching the instance and could not be attached afterwards; newer AWS releases also allow attaching a role to a running instance. Go to the AWS console, select IAM (Identity and Access Management) and click on Roles.

Attach policies to the role. I assigned the S3 full-access policy and the Administrator full-access policy (the latter already includes S3 and all other services).

After this I created a user with these policies and generated an access key/secret key pair using the AWS console. The access key and secret key are needed when configuring the AWS CLI,
as below.


yum install -y wget telnet openssl


rpm -ivh epel-release-6-8.noarch.rpm

yum install -y python-pip

pip install awscli

[root@bharath bharath]# aws configure


AWS Secret Access Key [None]: KLh8TMO1JDAeg5tYSzaPsslfdjHJHR6tvig4zKm4bqg2
Default region name [None]: us-west-1
Default output format [None]: json
[root@bharath bharath]# 

yum install -y jq

[root@bharath bharath]# aws ec2 describe-instances | jq '.Reservations[].Instances[].InstanceId'
[root@bharath bharath]# 

Create S3 bucket using aws cli as below
[root@bharath bharath]# aws s3 mb s3://jan10
make_bucket: s3://jan10/

[root@bharath bharath]#

We can create S3 buckets and objects using the s3cmd tool as well.

AMAZON S3(Simple Storage Service)

S3 Essentials:

It is object storage. Whenever you upload an object to a bucket, that object is visible to the entire region, which means it has been synced to all the availability zones in that region.
11 9's durability (99.999999999%) and 4 9's availability (99.99%).
Bucket names must be unique across all of AWS S3, globally, not just within your own account. For example, if you try to create a bucket named test it will not work, because somebody else in S3 already has that bucket name. Object names only need to be unique within their bucket.
Objects contain metadata. Objects are static files. Objects can be up to 5TB in size.

S3-storage types:

Standard S3 storage has 99.999999999% durability and 99.99% availability.
RRS (Reduced Redundancy Storage) has 99.99% durability and 99.99% availability.
Amazon Glacier is used to store infrequently accessed data.
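
To get a feel for what these durability figures mean, here is a small worked calculation. It reads durability as the yearly chance that any one object survives, which is a simplification, and the function name is made up for illustration. Decimal is used so the long runs of nines are not lost to floating-point rounding.

```python
from decimal import Decimal


def expected_annual_loss(object_count, durability_percent):
    """Average number of objects lost per year at a given durability."""
    loss_fraction = Decimal(1) - Decimal(durability_percent) / Decimal(100)
    return Decimal(object_count) * loss_fraction


# Standard storage: 10 million objects at eleven nines.
print(expected_annual_loss(10_000_000, "99.999999999"))
# RRS: 10 thousand objects at 99.99%.
print(expected_annual_loss(10_000, "99.99"))
```

With Standard storage, ten million objects lose on average 0.0001 objects a year, or one object every 10,000 years. With RRS, ten thousand objects lose on average one object a year, so RRS only suits data you can regenerate.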

The two core features of a bucket are versioning and life-cycle policies.
Versioning works like a backup of all the objects in a bucket. If you delete an object by mistake, you can go back to the previous version of that object, but only if versioning was enabled on the bucket beforehand.
Early versions of S3 did not allow versioning and life-cycle policies on the same bucket; today both can be enabled together.
Once versioning is enabled on a bucket it cannot be disabled, only suspended. To get back to a bucket with no versioning at all, you have to delete all the objects inside it, delete the bucket, and create it again without versioning.

A life-cycle policy allows us to define how long an object can live inside a bucket.
For example, we can create a life-cycle policy on the bucket saying that 3 months after an object is uploaded it should be deleted or sent to Glacier storage.
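
A rule like that can be written down as a life-cycle configuration document. The sketch below builds one in Python in the shape the S3 life-cycle API expects. The rule ID, the helper name and the 90-day figure (standing in for "3 months") are assumptions for illustration, and nothing is sent to AWS.

```python
import json


def lifecycle_rule(days, action, prefix=""):
    """Build a one-rule life-cycle configuration.

    action is "delete" (expire the object) or "glacier" (archive it).
    """
    rule = {
        "ID": f"{action}-after-{days}-days",  # hypothetical rule name
        "Filter": {"Prefix": prefix},         # empty prefix = whole bucket
        "Status": "Enabled",
    }
    if action == "delete":
        rule["Expiration"] = {"Days": days}
    elif action == "glacier":
        rule["Transitions"] = [{"Days": days, "StorageClass": "GLACIER"}]
    else:
        raise ValueError("action must be 'delete' or 'glacier'")
    return {"Rules": [rule]}


print(json.dumps(lifecycle_rule(90, "glacier"), indent=2))
```

Saved to a file, a document of this shape is what `aws s3api put-bucket-lifecycle-configuration` takes as its life-cycle configuration.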

Amazon S3 Security:

Every bucket and object we create is private by default. So if you want to make it public to the internet, or to hand out a downloadable URL, you need to grant specific permissions on the object or on the bucket itself.
We can grant access based on specific privileges using the Identity and Access Management console, provided we have created groups and users inside our AWS account.
We can also use Access Control Lists for fine-grained permissions at the resource level. For example, a user can be given access to only one bucket, with only read access to that bucket.
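
The "one bucket, read only" case can also be expressed as an IAM policy attached to the user, which is a different mechanism from the ACLs mentioned above. The sketch below builds such a policy document in Python; the bucket name and function name are placeholders for illustration.

```python
import json


def read_only_bucket_policy(bucket):
    """IAM policy allowing listing and reading of a single bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(read_only_bucket_policy("example-bucket"), indent=2))
```

Because nothing else is allowed, the user cannot write to this bucket or see any other bucket.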

Let's say you want to distribute video files to employees throughout your company. These employees might work in different remote locations, so we need to deliver the files over the open internet. What we can do is use the developer tools to create signed URLs. The signature on the URL verifies that the holder has permission to access the object, and you can also specify how long the link stays valid.
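
The idea behind a signed URL can be sketched with nothing but the Python standard library. This is a simplified illustration of the principle (an expiry time plus a keyed signature over the URL), not the actual AWS signing format. The parameter names, the secret and the example.com address are all made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(base_url, expires_at, secret):
    message = f"{base_url}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url, secret, expires_at):
    """Append an expiry timestamp and a signature covering URL and expiry."""
    query = urlencode(
        {"Expires": expires_at, "Signature": _signature(base_url, expires_at, secret)}
    )
    return f"{base_url}?{query}"


def verify_url(signed_url, secret, now):
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["Expires"][0])
        signature = params["Signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires_at, secret)
    return hmac.compare_digest(signature, expected) and now < expires_at


secret = b"demo-secret"  # placeholder key, known only to the signer
url = sign_url("https://example.com/videos/intro.mp4", secret, expires_at=1000)
print(verify_url(url, secret, now=999))
print(verify_url(url, secret, now=1001))
```

Anyone holding the link can fetch the file until the expiry time passes, and changing the path or the expiry invalidates the signature. In practice the AWS SDKs and CLI generate the real signed URLs for you.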

All API endpoints are SSL-terminated endpoints.

You can host static websites in S3. We can integrate with Route53 to point a domain at the index.html file inside your S3 bucket. This gives us a highly available static website.
Let's say, hypothetically, you have a blog post that you expect to get millions and millions of hits. What you can do is create a static page of that blog post and point the domain to your bucket using Route53. Since Amazon S3 is highly available and highly scalable, you do not need to worry about the hits; AWS automatically takes care of it.

Another beautiful feature of Amazon S3 is that it can serve as the origin for a CDN (Content Delivery Network): S3 content can be delivered over the CloudFront CDN.

For example, say you have a WordPress application that delivers static files such as PDFs, images or videos. We can store those files on S3 and integrate with the CloudFront CDN, where CloudFront fetches the S3 data and caches it at its global edge locations. So you can use S3's high availability and CloudFront's scalability to deliver content anywhere in the world with low latency.

Amazon S3 has something called multipart upload.
Multipart upload allows us to upload parts of a file concurrently. It essentially breaks a file up into individual pieces. If we write proper scripts using multithreaded applications, for example in Java or Python, we can break the file into parts and send them at the same time.
Using multipart upload we can also resume stopped file uploads. It is best practice to use multipart upload for any object that is 100MB or larger, and it is required for objects larger than 5GB.
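
The splitting step can be sketched as follows. This only plans the parts (part number, byte offset, length) and does not talk to S3. The function name and the 100 MiB default part size are assumptions; the 5 MiB minimum part size and the 10,000-part ceiling are S3's documented limits.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000        # S3 allows at most this many parts per upload


def plan_parts(file_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) tuples covering the file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_count = -(-file_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    parts = []
    for index in range(part_count):
        offset = index * part_size
        length = min(part_size, file_size - offset)
        parts.append((index + 1, offset, length))  # part numbers start at 1
    return parts


parts = plan_parts(250 * MIB)
print(len(parts), parts[-1])

# Resuming: skip the part numbers that were already uploaded.
already_uploaded = {1}
remaining = [part for part in parts if part[0] not in already_uploaded]
print([part[0] for part in remaining])
```

Each tuple can be handed to its own thread, which reads that byte range from the file and uploads it. A stopped upload only needs to send the parts that did not make it.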

When we are architecting our application, it is very important to understand how data is added and retrieved within an Amazon region.
Eventual consistency: whenever you add an object to Amazon S3, that object is synced across all the availability zones in the region, and this takes a short time.
In the US-Standard region (us-east-1), every PUT (upload or write) request, and any other change we make to an S3 object, is eventually consistent.
This means you can make a change and then immediately make an API request to view that file, and since S3 is designed to be highly available, you might hit an availability zone which does not yet have the newest version of the file. All regions have eventual consistency when it comes to updating (overwriting) an object and deleting an object, but not when adding or uploading a new object.

Read after Write for new objects:
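The only region that does not have read after write for new objects is the US-Standard region (us-east-1).
Read after write means that immediately after you upload an object, that object is available to be read by an application.
Since December 2020 S3 provides strong read-after-write consistency in every region, so the eventual-consistency behaviour described here applies to S3 as it worked before that change.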
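
An application that has to cope with eventually consistent reads usually retries until it sees the version it expects. Below is a minimal sketch of that pattern with exponential backoff. The function and parameter names are made up, and the fake fetch function stands in for a real S3 read.

```python
import time


def read_until_fresh(fetch, is_fresh, attempts=5, delay=0.2, sleep=time.sleep):
    """Call fetch() until is_fresh(value) is true, backing off between tries."""
    for attempt in range(attempts):
        value = fetch()
        if is_fresh(value):
            return value
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))  # 0.2s, 0.4s, 0.8s, ...
    raise TimeoutError(f"object still stale after {attempts} attempts")


# Simulated store that serves the old version twice before the new one.
responses = iter(["v1", "v1", "v2"])
result = read_until_fresh(
    fetch=lambda: next(responses),
    is_fresh=lambda value: value == "v2",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

A common way to implement the freshness check is to compare a version marker, such as the object's ETag, against the one returned by the write.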

Below are the scenarios where we can use S3 effectively.

Getting started with S3 and RRS Storage Class:

Create an S3 Bucket using AWS Console:

Bucket names are unique across the entire AWS S3 ecosystem, so you cannot use a name that somebody else already has. You also have to think about latency when selecting the region.
Select the region that your application mainly communicates from. For example, if the EC2 instance where your application resides is in the Oregon region, select the same Oregon region for your S3 bucket in order to achieve low latency.

We just created a bucket called test-jan11, but we didn't specify a storage class. The storage class is not bucket specific; it is object specific.
Let's upload an object to the bucket test-jan11 using the AWS console. After choosing a file, select the Set Details tab to choose the storage class and the type of encryption we want for our object.
Then select the Set Permissions tab to grant access to whoever needs this object.
Also select the Set Metadata tab to set the information associated with the object. Only after these three settings select the upload option for that object.


