In this hands-on session, you will deploy a working web application, served from within your Amazon Virtual Private Cloud (VPC), complete with a logging back end provided by Amazon Elasticsearch Service (Amazon ES) and real-time monitoring with Kibana. The application provides a search experience across 5,000 movies, powered by Amazon ES and served by a React front end that uses Amazon API Gateway and AWS Lambda. The logging infrastructure, composed of Fluentd and Metricbeat, sends metrics and application logs to Amazon ES through Amazon Kinesis Data Streams. Kinesis Data Streams buffers the log lines, which architecturally decouples the delivery components and provides a highly available buffer that absorbs “load storms”. Finally, Logstash transforms the records and delivers them to Amazon ES.
All components of the solution reside in a VPC. In this lab, we explore how to use Amazon ES in a VPC for scalable log handling as well as for full-text search. In addition to the application and logging infrastructure, an Internet Gateway lets customers view your website through an Application Load Balancer in a DMZ. To interact with the Amazon EC2 instances deployed in the solution, you use AWS Systems Manager to configure logging on them. Finally, you interact with Kibana through NGINX deployed on a proxy server.
For the logging infrastructure, we use Metricbeat, Fluentd, and Logstash on Amazon EC2, along with Amazon Kinesis Data Streams and, of course, Amazon Elasticsearch Service. Metricbeat is a host-based metrics collector that writes log files, which serve as inputs for Fluentd. Fluentd uses plugins to read data from the file system and instance metadata, and then writes it to Amazon Kinesis Data Streams. Logstash collects and transforms that data and writes it to an Amazon Elasticsearch Service domain. Together, these components give you a flexible, configurable, privately networked solution within a VPC that scales as your log volume increases.
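A Kinesis output plugin typically batches records before sending them, and the service limits it works within shape how the pipeline behaves under load. The sketch below is a hypothetical, standalone Python illustration (not the plugin's code) that serializes log events into Kinesis `PutRecords` entries and groups them so each call stays within the documented limits of 500 records and 5 MiB per request, with at most 1 MiB of data per record. The function names and the per-host partition keys are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator, List

# Kinesis PutRecords limits: up to 500 records per call, up to 5 MiB per call
# (including partition keys), and up to 1 MiB of data per record.
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
MAX_DATA_BYTES_PER_RECORD = 1024 * 1024


def to_record(event: dict, partition_key: str) -> dict:
    """Serialize one log event as a PutRecords entry (Data + PartitionKey)."""
    data = (json.dumps(event) + "\n").encode("utf-8")
    if len(data) > MAX_DATA_BYTES_PER_RECORD:
        raise ValueError("record data exceeds the 1 MiB Kinesis limit")
    return {"Data": data, "PartitionKey": partition_key}


def record_size(record: dict) -> int:
    """Bytes a record counts against the per-call limit (data + key)."""
    return len(record["Data"]) + len(record["PartitionKey"].encode("utf-8"))


def batch_records(records: Iterable[dict]) -> Iterator[List[dict]]:
    """Group records into batches that stay within the PutRecords limits."""
    batch: List[dict] = []
    batch_bytes = 0
    for rec in records:
        size = record_size(rec)
        # Start a new batch if adding this record would break either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(rec)
        batch_bytes += size
    if batch:
        yield batch
```

To send a batch, a producer using boto3 could call `client.put_records(StreamName=..., Records=batch)`. Because `PutRecords` can partially fail, a real producer would also retry the entries counted in the response's `FailedRecordCount`.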
Deploy a secure end-to-end solution using VPC private networking, with private endpoints for the services that support them
Host multiple indexes (movies, proxy, metrics and application logs) with which the solution interacts
Leverage managed services from AWS and popular tools from the Elasticsearch ecosystem
Configure the solution to create data used to power the application
Configure the solution to deliver log data to Amazon ES
Visualize the log interactions with Kibana and create real-time dashboards to view customer and component activity
Most of this lab uses nested AWS CloudFormation templates. The templates create the resources the lab needs, so you can achieve its goals without worrying about the details of setting up each component.
The JSON templates are organized as follows (you can convert them to YAML if you prefer):
bootcamp-aes-kickoff - (already deployed for you) coordinates the creation of foundational elements that are used in the labs. These are:
bootcamp-aes-network – all the necessary network components such as the VPC, subnets, routes and baseline security elements
bootcamp-aes-cognito – creates the Amazon Cognito user pool and identity pool so that Kibana can be accessed securely with a login
bootcamp-aes-kinesis – creates the stream used for buffering data to your solution
bootcamp-aes-domain – creates the Amazon Elasticsearch Service cluster with Kibana and Elasticsearch endpoints
bootcamp-aes-kibana-proxy – creates the NGINX proxy server that brokers calls from the public internet to Kibana on the Elasticsearch domain
bootcamp-aes-lab1 – deploys the application and API layers
bootcamp-aes-lab2 – deploys the Logstash layer used to complete the log delivery pipeline of Fluentd -> Amazon Kinesis Data Streams -> Logstash -> Elasticsearch
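To make the nesting concrete, here is a minimal Python sketch that builds a parent template in the spirit of bootcamp-aes-kickoff and derives a deployment order from it. Each child is an `AWS::CloudFormation::Stack` resource pointing at a `TemplateURL`, and a child output is passed to a sibling with `Fn::GetAtt` on `Outputs.<Name>`. The S3 location, the `VpcId` output name, and the dependencies between stacks are assumptions, not the lab's actual templates.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical S3 location of the child templates.
TEMPLATE_BASE = "https://s3.amazonaws.com/example-bucket/templates"


def nested_stack(name, depends_on=(), parameters=None):
    """Return an AWS::CloudFormation::Stack resource for a child template."""
    resource = {
        "Type": "AWS::CloudFormation::Stack",
        "Properties": {"TemplateURL": f"{TEMPLATE_BASE}/{name}.json"},
    }
    if parameters:
        resource["Properties"]["Parameters"] = parameters
    if depends_on:
        resource["DependsOn"] = list(depends_on)
    return resource


def build_kickoff_template():
    """Parent template wiring the foundational child stacks together."""
    # Child outputs are read with Fn::GetAtt <Stack>.Outputs.<Name>;
    # the output name "VpcId" is hypothetical.
    vpc_id = {"Fn::GetAtt": ["Network", "Outputs.VpcId"]}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Network": nested_stack("bootcamp-aes-network"),
            "Cognito": nested_stack("bootcamp-aes-cognito"),
            "Kinesis": nested_stack("bootcamp-aes-kinesis"),
            "Domain": nested_stack("bootcamp-aes-domain",
                                   depends_on=["Network", "Cognito"],
                                   parameters={"VpcId": vpc_id}),
            "KibanaProxy": nested_stack("bootcamp-aes-kibana-proxy",
                                        depends_on=["Network", "Domain"],
                                        parameters={"VpcId": vpc_id}),
        },
    }


def deployment_order(template):
    """Order stacks so each one comes after the stacks it depends on."""
    graph = {name: set(res.get("DependsOn", []))
             for name, res in template["Resources"].items()}
    return list(TopologicalSorter(graph).static_order())
```

`json.dumps(build_kickoff_template(), indent=2)` prints the parent template, which you could then convert to YAML. CloudFormation computes this ordering itself from `DependsOn` and `Fn::GetAtt` references and creates independent stacks in parallel; the sort here only shows why Network must finish before Domain.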
Some components do not appear in the diagrams, because this walkthrough is meant to clarify items discussed in the kickoff presentation. These components are:
Amazon DynamoDB – hosts a table used by the Kinesis Client Library (KCL), which the Amazon Kinesis input plugin for Logstash uses to track shard leases and checkpoints so that multiple consumers do not process the same data from the Amazon Kinesis data stream.
Amazon API Gateway – provides a façade between the React web application and Amazon Elasticsearch Service. Separating layers this way is an architectural best practice, because it lets you version or replace one layer without affecting the others.
AWS Lambda – does the actual work of sending requests to Amazon ES and parsing the responses for the web application.
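The Lambda layer's job can be sketched in a few lines of Python. The sketch below assumes an API Gateway Lambda proxy integration (the handler returns `statusCode`, `headers`, and `body`) and a `multi_match` query over hypothetical movie fields. The HTTP call to the domain is injected as a `search` function so the example runs without a real domain. Field names, the `q` query parameter, and the CORS header are assumptions.

```python
import json
from typing import Callable, Dict, List

# Hypothetical field names; the lab's movies index mapping may differ.
# "title^3" boosts matches in the title field.
SEARCH_FIELDS = ["title^3", "plot", "actors", "directors"]
CORS_HEADERS = {"Content-Type": "application/json",
                "Access-Control-Allow-Origin": "*"}


def build_query(text: str, size: int = 20) -> dict:
    """Elasticsearch query DSL body: full-text search over several fields."""
    return {"size": size,
            "query": {"multi_match": {"query": text, "fields": SEARCH_FIELDS}}}


def parse_hits(es_response: dict) -> List[dict]:
    """Extract the stored documents from an Elasticsearch search response."""
    return [hit["_source"] for hit in es_response.get("hits", {}).get("hits", [])]


def make_handler(search: Callable[[dict], dict]):
    """Build a Lambda handler for an API Gateway proxy integration.

    `search` sends a query body to the domain and returns the parsed JSON
    response; it is injected so the sketch runs without a real domain.
    """
    def handler(event: dict, context) -> Dict[str, object]:
        params = event.get("queryStringParameters") or {}
        text = params.get("q", "").strip()
        if not text:
            return {"statusCode": 400, "headers": CORS_HEADERS,
                    "body": json.dumps({"error": "missing q parameter"})}
        movies = parse_hits(search(build_query(text)))
        return {"statusCode": 200, "headers": CORS_HEADERS,
                "body": json.dumps(movies)}
    return handler
```

In a deployed function, `search` would send a SigV4-signed request to the domain's `movies/_search` endpoint; that part is left out here.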
The following diagram depicts the foundational elements of the VPC at a high level.
The network stack establishes the backbone for the solution. Without networking, the instances that need Amazon ES cannot reach it. For this workshop, the network stack deploys subnets across three availability zones. The stack creates both “public” and “private” subnets (ranges of IP addresses used for communication in a VPC). A “public subnet” is one whose route table sends internet-bound traffic to an Internet Gateway, which makes it reachable from the internet; a private subnet has no such route.
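As an illustration of how the network stack might lay out its address space, this Python sketch uses the standard `ipaddress` module to carve one public and one private /24 per availability zone out of a VPC range. The `10.0.0.0/16` range, the AZ names, and the /24 size are assumptions; the lab's network template may use different values.

```python
import ipaddress
from typing import Dict, List

# Hypothetical VPC range and availability zone names.
VPC_CIDR = "10.0.0.0/16"
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]


def plan_subnets(vpc_cidr: str, azs: List[str],
                 new_prefix: int = 24) -> List[Dict[str, str]]:
    """Carve one public and one private subnet per AZ out of the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for tier in ("public", "private"):
        for az in azs:
            # Take the next unused block so subnets never overlap.
            plan.append({"tier": tier, "az": az, "cidr": str(next(blocks))})
    return plan
```

AWS reserves five addresses in every subnet, so each /24 leaves 251 usable addresses for instances and ENIs.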
This solution keeps most traffic private and public access to a minimum, as an example of how to build a secure environment. Instances in the private subnets reach the internet only through a NAT Gateway, which has a public IP address but does not accept connections initiated from the internet.
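What makes a subnet public or private is its route table. The following Python sketch classifies a subnet from its routes, using the field names that EC2 `DescribeRouteTables` returns for each route (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`). The function names are hypothetical, and the sketch ignores other route targets such as peering connections.

```python
from typing import List, Optional


def default_route_target(routes: List[dict]) -> Optional[str]:
    """Return the target of the 0.0.0.0/0 route, if the table has one."""
    for route in routes:
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            return route.get("GatewayId") or route.get("NatGatewayId")
    return None


def classify_subnet(routes: List[dict]) -> str:
    """Classify a subnet by where its route table sends internet-bound traffic."""
    target = default_route_target(routes) or ""
    if target.startswith("igw-"):
        return "public"    # reachable through the Internet Gateway
    if target.startswith("nat-"):
        return "private"   # outbound only, through the NAT Gateway
    return "isolated"      # no internet path at all
```

For example, a table holding only the `local` route plus `0.0.0.0/0 -> nat-...` classifies as private, matching the instance subnets in this solution.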
The right-hand side of the image shows “Service VPCs”. Most AWS services deploy into a “service VPC”, one managed by the service team, which provides additional layers of isolation for security and service stability.
This diagram shows where the authentication layer deploys.
The authentication layer is Amazon Cognito. This layer creates a user pool (user names and passwords) and an identity pool (which exchanges authenticated identities for temporary AWS credentials through IAM roles). It also creates a default user and password, which you can find in the Outputs section of the AWS CloudFormation stack that creates this foundational element. In Lab 1, you pair this service with Amazon Elasticsearch Service to control who can access your domain.
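Pairing Cognito with the domain typically also means allowing the identity pool's authenticated IAM role in the domain's access policy. The sketch below builds such a resource-based policy in Python. The account ID, Region, domain name, and role name are placeholders, and the lab's CloudFormation templates may grant access differently.

```python
import json


def cognito_kibana_access_policy(account_id: str, region: str,
                                 domain_name: str, auth_role_name: str) -> dict:
    """Domain access policy that lets the Cognito authenticated role call the domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The identity pool's authenticated role (name is hypothetical).
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{auth_role_name}"},
                # es:ESHttp* covers HTTP methods such as GET, POST, and PUT.
                "Action": "es:ESHttp*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
            }
        ],
    }
```

`json.dumps(policy, indent=2)` produces the JSON document you would paste into the domain's access policy.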
This layer provisions the Elasticsearch cluster via the Amazon Elasticsearch Service.
When you select the VPC option for your domain, the service places elastic network interfaces (ENIs) in your VPC subnets, and each data node in your Elasticsearch domain gets an ENI for network communication. These private endpoints let you control the scope of traffic, which stays private, giving you a secure foundation for your solution.
The proxy layer creates an NGINX server in a public subnet.
You use the proxy server to access Kibana (with Amazon Cognito authentication). When you deploy your domain in a VPC, its Kibana URL is always private, so for this lab you need a way to reach Kibana, with authentication, from the public internet. The lab also captures logs from NGINX and pushes them to the buffer layer. In Lab 3, you monitor requests from this layer.
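Conceptually, the NGINX proxy forwards each public request to the private domain endpoint, where Amazon ES serves Kibana under `/_plugin/kibana/`. This Python sketch models only that URL mapping (the real proxy is configured in NGINX, not Python). The endpoint hostname is hypothetical, and redirecting `/` to the Kibana path is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Amazon ES serves Kibana under this path on the domain endpoint.
KIBANA_PATH = "/_plugin/kibana/"


def upstream_url(domain_endpoint: str, request_url: str) -> str:
    """Map a request received by the public proxy to the private Kibana URL."""
    parts = urlsplit(request_url)
    # Send bare requests for the site root to Kibana; keep other paths as-is.
    path = parts.path if parts.path not in ("", "/") else KIBANA_PATH
    return urlunsplit(("https", domain_endpoint, path, parts.query, ""))
```

For example, with a hypothetical endpoint `vpc-bootcamp-abc123.us-east-1.es.amazonaws.com`, a request for `/` maps to `https://vpc-bootcamp-abc123.us-east-1.es.amazonaws.com/_plugin/kibana/`.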
When you operate at scale, you need to plan for individual component failures. Decoupling producers from consumers gives you freedom to swap components, take maintenance outages at the producer and consumer layers, and survive “data storms” (high volumes of direct traffic that can severely degrade the operational performance of your domain).
Buffers must be persistent and highly available. Amazon Kinesis Data Streams replicates data across three availability zones, and records persist until they age past the stream's retention period, which you configure. The buffer properties needed in a solution like this are:
Persistence – data cannot be lost
Availability – survives zonal failures and still works
Multiple producers – any number of producers can write at any time, up to the throughput you configure
Multiple consumers – consumers can act on the same data for different goals. Streams give you concurrent processing of the same data, and each consumer reads from its own position in the stream based on its processing capacity and business needs.
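The buffer properties above can be illustrated with a small in-memory model. In the Python sketch below, `shard_for` mirrors how Kinesis assigns records to shards (an MD5 hash of the partition key, assuming evenly split hash key ranges), and `ToyStream` stands in for a single shard: producers append, old records age out after a retention limit, and each consumer keeps its own position. It is a teaching toy, not a Kinesis client. Its count-based retention replaces Kinesis's time-based retention, and the consumer names are hypothetical.

```python
import hashlib
from typing import Dict, List


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard, assuming evenly split hash key ranges."""
    # Kinesis hashes the partition key with MD5 into a 128-bit integer.
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_width = 2 ** 128 // shard_count
    return min(hash_key // range_width, shard_count - 1)


class ToyStream:
    """One shard: append-only, bounded retention, independent readers."""

    def __init__(self, retention: int):
        self.retention = retention        # records kept (Kinesis uses time)
        self.records: List[str] = []
        self.first_seq = 0                # sequence number of records[0]
        self.positions: Dict[str, int] = {}  # next sequence per consumer

    def put(self, data: str) -> int:
        """Append a record and return its sequence number."""
        self.records.append(data)
        seq = self.first_seq + len(self.records) - 1
        if len(self.records) > self.retention:
            self.records.pop(0)           # oldest record ages out
            self.first_seq += 1
        return seq

    def read(self, consumer: str, limit: int = 10) -> List[str]:
        """Return up to `limit` records after the consumer's own checkpoint."""
        pos = max(self.positions.get(consumer, 0), self.first_seq)
        start = pos - self.first_seq
        batch = self.records[start:start + limit]
        self.positions[consumer] = pos + len(batch)
        return batch
```

Two consumers reading the same records at different speeds never interfere with each other. In the lab, the KCL stores each consumer application's checkpoints in the DynamoDB table described earlier.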
This layer presents a front end to your customers. It is constructed with the following components previously discussed:
Amazon API Gateway
Application Load Balancer (Elastic Load Balancing)
For architectural simplicity, I show only an EC2 fleet behind a load balancer and Auto Scaling group, so as not to clutter the high-level architecture.
Requests funnel through an Application Load Balancer that has a publicly accessible IP address. The EC2 instances run an Apache server.
The final part of this solution uses the open-source Logstash. The Logstash instances use the Kinesis input plugin to read from the Amazon Kinesis data stream and an Elasticsearch output plugin to write to the Amazon Elasticsearch Service endpoint.
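Between the Kinesis input and the Elasticsearch output, Logstash turns each record into a document and writes documents in batches through the Elasticsearch `_bulk` API. This Python sketch shows what such a bulk body looks like: one action line naming the target index, then the document, as newline-delimited JSON ending with a newline. The `time` field, the `logs-YYYY.MM.dd` daily index naming, and the function names are assumptions. It also assumes typeless indexes as in Elasticsearch 7.x; 6.x domains may require a `_type` in the action line.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def daily_index(prefix: str, ts: datetime) -> str:
    """Daily index name such as logs-2019.06.01 (naming scheme is an assumption)."""
    return f"{prefix}-{ts.strftime('%Y.%m.%d')}"


def to_bulk_body(events: Iterable[dict], prefix: str = "logs") -> str:
    """Render events as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for event in events:
        ts = datetime.fromtimestamp(event["time"], tz=timezone.utc)
        doc = {k: v for k, v in event.items() if k != "time"}
        doc["@timestamp"] = ts.isoformat()
        # Action line first, then the document source on the following line.
        lines.append(json.dumps({"index": {"_index": daily_index(prefix, ts)}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

Daily indexes keep each index a manageable size and make it simple to drop old log data by deleting whole indexes.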
Placing an Amazon ES domain within a VPC enables secure communication between Amazon ES and other services without the need for an Internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud. Domains that reside within a VPC have an extra layer of security when compared to domains that use public endpoints: you can use security groups as well as IAM policies to control access to the domain.
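Security groups add a network-level check in front of the IAM policy. The Python sketch below models how the inbound rules of the domain's security group are evaluated: security groups contain only allow rules, so a connection is permitted if any rule matches its protocol, port, and source group. The group names and the choice of allowed sources are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    """One inbound security group rule that references a source security group."""
    protocol: str
    from_port: int
    to_port: int
    source_group: str


def is_allowed(rules: Iterable[IngressRule], protocol: str, port: int,
               source_group: str) -> bool:
    """Security groups only contain allow rules: traffic passes if any rule matches."""
    return any(
        r.protocol == protocol
        and r.from_port <= port <= r.to_port
        and r.source_group == source_group
        for r in rules
    )


# Hypothetical rules for the domain's security group: HTTPS only, from the
# Logstash, search Lambda, and Kibana proxy security groups.
DOMAIN_RULES = [
    IngressRule("tcp", 443, 443, "sg-logstash"),
    IngressRule("tcp", 443, 443, "sg-search-lambda"),
    IngressRule("tcp", 443, 443, "sg-kibana-proxy"),
]
```

A request that passes this check must still be authorized by the domain's IAM-based access policy, which is what gives VPC domains their extra layer of security.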
To support VPCs, Amazon ES places an endpoint into one, two, or three subnets of your VPC. A subnet is a range of IP addresses in your VPC.
The following illustration shows the VPC architecture if zone awareness is not enabled.
The following illustration shows the VPC architecture if zone awareness is enabled.
Amazon ES also places elastic network interfaces (ENIs) in the VPC for each of your data nodes. Amazon ES assigns each ENI a private IP address from the IPv4 address range of your subnet and also assigns a public DNS hostname (which is the domain endpoint) for the IP addresses. You must use a public DNS service to resolve the endpoint (which is a DNS hostname) to the appropriate IP addresses for the data nodes:
If your VPC uses the Amazon-provided DNS server by setting the enableDnsSupport option to true (the default value), resolution for the Amazon ES endpoint will succeed.
If your VPC uses a private DNS server and the server can reach the public authoritative DNS servers to resolve DNS hostnames, resolution for the Amazon ES endpoint will also succeed.
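A quick way to confirm this behavior is to resolve the domain endpoint from inside the VPC and check that every address falls in the VPC range. This Python sketch uses only the standard library (`socket.getaddrinfo` and `ipaddress`). The VPC CIDR you pass in is an assumption about your network, and the function name is hypothetical.

```python
import ipaddress
import socket
from typing import Dict


def resolves_into_vpc(hostname: str, vpc_cidr: str, port: int = 443) -> Dict[str, bool]:
    """Resolve a hostname and report, per address, whether it lies in the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the address string; drop any IPv6 scope suffix.
    addresses = sorted({info[4][0].split("%")[0] for info in infos})
    # Addresses of a different IP version are never "in" the network.
    return {addr: ipaddress.ip_address(addr) in vpc for addr in addresses}
```

On an instance in the VPC, you would pass the domain's endpoint hostname; passing a literal IP such as `10.0.3.25` skips DNS entirely, which is handy for trying the function offline.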
This solution uses three availability zones, a recently introduced feature of the service that provides the highest availability: it distributes your data nodes across three AZs, which limits the impact of a single AZ failure to one third of your domain. Deploying across multiple AZs also lets you take advantage of the service SLA, so your business can deploy solutions on Amazon Elasticsearch Service with confidence.
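The one-third figure follows from spreading data nodes evenly across zones. This Python sketch places nodes round-robin across availability zones and computes how much capacity survives the loss of one zone. The node names and round-robin placement are illustrative assumptions, since Amazon ES handles placement itself.

```python
from collections import Counter
from typing import Dict, List


def place_nodes(node_count: int, azs: List[str]) -> Dict[str, str]:
    """Spread data nodes round-robin across AZs (illustrative placement only)."""
    return {f"data-node-{i}": azs[i % len(azs)] for i in range(node_count)}


def capacity_after_az_loss(placement: Dict[str, str], failed_az: str) -> float:
    """Fraction of data nodes still running when one AZ fails."""
    per_az = Counter(placement.values())
    total = sum(per_az.values())
    return (total - per_az[failed_az]) / total
```

With six data nodes, losing one of three zones leaves two thirds of the nodes running, compared with half when the same nodes span two zones. Because zone awareness places replica shards in a different zone from their primaries, a domain with at least one replica keeps all of its data available through a single-zone failure.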
For this lab, you do not need SSH keys or any external tools to access the instances deployed with this solution. Instead, you use AWS Systems Manager Session Manager, which provides a browser-based shell for the instances listed in the Managed Instances tab. For more details on managed instances and the Session Manager integration, see the AWS Systems Manager documentation.
Session Manager launches in your web browser, making it easy to access instances without SSH.
Keep your favorite text editor or notepad at hand, because this lab involves configuring and running scripts. This bootcamp makes heavy use of vi and installed tools such as vim. If you prefer another editor, you need to install it on each instance.