Cloud FinOps

mins read

Different ways of enabling memory metrics on AWS EC2 and their respective costs

Memory Metrics on AWS EC2: Methods and Costs Explained

Pratik Kulkarni

Published on

September 2, 2024

What & Why of Memory Metrics

Memory metrics are crucial for monitoring and optimizing the performance of AWS EC2 instances. Accurate memory metrics facilitate rightsizing by helping to understand memory consumption patterns, identify performance bottlenecks, and optimize instance configurations. These metrics become particularly critical during application launches, significant traffic spikes, or when introducing new features that may impact memory consumption patterns.

By default, AWS does not enable users to measure memory metrics through CloudWatch.

This article will help you understand the ways to measure memory metrics, and the cost implications of measuring memory!

There are three main categories of tools available for capturing and analyzing these memory metrics:

AWS Native Solutions ‍
Paid Observability Tools such as New Relic, DataDog etc ‍
Open Source Tools such as Prometheus, Zabbix etc.

Each of these provide a range of features with varying integration capabilities, cost structures, and implementation complexities, which we'll explore in detail in the following sections.

‍

‍

Ways to Capture Memory Metrics

1) CloudWatch Agent (AWS Native Solution)

AWS EC2 instances come with basic CloudWatch metrics such as CPU utilization, disk I/O, and network traffic, but they do not provide memory-related metrics out of the box. To monitor memory, you need to set up the CloudWatch (CW) Agent. It offers two tiers of data collection with different frequencies and granularity. Standard Resolution that typically collects data at intervals of 1 minute and High Resolution at one second (with added cost).

Installing CW agent is straightforward, it comes as a package which can be downloaded and configured using the command line. For the fleet of EC2 instances, use AWS Systems Manager to install CW agent. Once installed, it collects data from the host's memory management subsystem and sends it to CloudWatch, enabling detailed memory metrics monitoring. The agent can be configured to monitor custom memory metrics specific to application needs.

Metrics Tracked: The CW Agent collects various memory metrics, such as memory utilization, available memory, and swap utilization, to provide insights into memory performance and usage. These metrics include details like total memory, free memory, active memory, and buffer/cache memory. This link provides a detailed list and the definitions of memory metrics collected by the CloudWatch agent. ‍
Use Cases: CloudWatch, as an AWS native service, is ideal for basic memory monitoring needs, cost-conscious organizations, and AWS-centric workloads. It provides essential memory metrics with varying granularity, making it suitable for applications that do not require advanced analytics. Perfect for quick deployment and leveraging existing AWS infrastructure, CloudWatch offers a cost-effective and straightforward solution for EC2 memory monitoring. ‍
Implementation Consideration: The CloudWatch Agent offers seamless integration with AWS ecosystem, providing a native solution for monitoring EC2 memory metrics. While effective for AWS-focused environments, it may have limitations in cross-platform monitoring capabilities and advanced visualizations compared to specialized observability tools.

2) Paid Observability Tools

Third-party tools like Datadog, New Relic, Dynatrace, and AppDynamics offer advanced capabilities for monitoring memory metrics on AWS EC2 instances. These solutions typically feature sophisticated analytics and visualization tools, AI-driven anomaly detection, and broad coverage of both infrastructure and application performance along with cross-platform integration capabilities.

Implementing these tools generally involves deploying an agent on each EC2 instance, which collects and sends memory usage data to the monitoring platform. SSM can be used to install agents on EC2, manage them, and automate monitoring. For example datadog agents can be installed and automated using SSM.

Metrics Tracked: Comprehensive EC2 memory metrics, including detailed utilization breakdowns, available memory, cached memory, swap usage, and custom memory-related metrics. These tools often offer real-time monitoring, historical analysis, and predictive insights specific to EC2 instance memory performance. For example, they can predict memory leaks before they cause issues, allowing for proactive management. ‍
Use Cases: Ideal for organizations running critical workloads on EC2 instances that require in-depth memory analysis. Suitable for complex EC2 deployments where native AWS tools may not provide sufficient granularity or analysis capabilities. Particularly useful for businesses needing to correlate EC2 memory metrics with broader application performance data. ‍
Implementation Consideration: While these tools provide extensive EC2 memory monitoring capabilities, they come at an additional cost on top of AWS charges. Pricing is typically based on a per-host or per-resource model, which can become significant for large-scale deployments. However, the investment can be justified by enhanced performance monitoring, reduced downtime, and optimized resource usage leading to potential overall cost savings.

3) Open Source Tools

Open source monitoring tools like Prometheus, Grafana, and Zabbix offer powerful alternatives for capturing memory metrics on AWS EC2 instances. These tools provide flexibility, customization options, and cost-effectiveness, and they require more setup and maintenance compared to managed/paid solutions. Open source tools usually work using scraping mechanisms with data being collected by periodic HTTP requests.

For example, Prometheus provides following way to collect metrics from EC2 -

Node Exporter: The Prometheus Node Exporter can be installed on EC2 instances to expose hardware and OS metrics, including memory data. ‍
CloudWatch Integration: Use the CloudWatch Agent to collect metrics from EC2 instances, which Prometheus can access via the exporter. This AWS guide describes it with an exporter example. ‍
AWS Managed Service - AWS provides AWS Managed Service for Prometheus (AMP) which offers native integration between Prometheus and CloudWatch.

Each method offers different levels of customization and integration with existing AWS services, allowing users to choose the most suitable approach for their specific monitoring needs.

These open source tools offer several advantages and considerations:

Metric Coverage: Standard EC2 memory metrics including total memory, used memory, free memory, cached memory, and swap usage. Many open source tools allow for custom metric collection, enabling detailed monitoring of specific memory-related parameters on EC2 instances. ‍
Use Cases: Best for organizations with the technical expertise to deploy and manage monitoring solutions on EC2 instances. Helps to minimize additional monitoring costs while maintaining detailed oversight of EC2 memory usage. Ideal for scenarios requiring customized memory metrics or when compliance needs mandate self-hosted monitoring solutions. ‍
Implementation Considerations: While these tools are free to use, they require time and expertise to set up and maintain on EC2 instances. There may be additional EC2 costs for running the monitoring software. Support typically relies on community forums rather than dedicated customer service. However, they offer high flexibility and can be an economical choice for organizations willing to invest time in configuration and management.

Cost Analysis

Scenario: You're running a medium-sized e-commerce website with around 15k transactions daily. You want to track 15 metrics (10 standard + 5 custom). Your infrastructure consists of 10 EC2 instances:

6 x t3.medium (2 vCPU, 4 GB RAM) for the web servers
3 x m5.large (2 vCPU, 8 GB RAM) for the application servers
1 x r5.large (2 vCPU, 16 GB RAM) for the database

You want to monitor memory metrics for all instances 24/7, with a 14-day retention period for logs and metrics.

Note - Following cost calculation is just for monitoring solutions. It does not consider the underlying EC2 and other services cost for the given application.

Ways to capture memory metrics	Total Cost per Month	Approx Cost Per EC2 Instance	Comments
CloudWatch Agent With 1 min polling	$49.32	$5	Data collection with 1-minute frequency
CloudWatch Agent With 1 second polling	$304.80	$30	Data collection with 1-second frequency
Paid Observability Tools (e.g., Datadog)	$400	$40	Includes advanced analytics and custom metrics
Open Source Tools	$50-150	$5-$15	Cost includes infrastructure, storage, and data transfer, and excludes development and maintenance

Note - Refer appendix for detailed cost calculations

This table provides an outlook of the cost implications of capturing memory metrics for an ecommerce application running on 10 instances with 15 metrics (10 standard + 5 custom).

However, these costs can quickly become prohibitive as scale grows. For a fleet of 1000 instances, cost can reach $50k-$150k. Thus, metrics need to be tracked mindfully.

Following strategies suggest some ways to monitor memory selectively considering the requirements of the application.

‍

Strategies for Memory Monitoring at Scale

As your EC2 fleet grows, implementing efficient memory monitoring becomes crucial for maintaining performance while managing costs. The following strategies offer practical approaches to scale your monitoring efforts effectively.

1) Exclude Stable Workloads

Stable environments with low variability, such as archived or backup instances, batch processing jobs, development and test environments, static content servers, or low-traffic applications, may not benefit from continuous memory metrics data. Configure monitoring tools to exclude these instances from regular memory metric collection by using tags like memorymetrics:false. This strategy becomes crucial with ‘per instance billing’ solutions as it limits the number of instances to be monitored as opposed to other strategies that optimize on frequency of capturing metrics.

‍

2)Tagging and Tiered Monitoring

Implement a comprehensive tagging strategy to categorize EC2 instances based on their memory criticality and monitoring needs. This approach can significantly reduce monitoring costs in large-scale deployments by decreasing the frequency of metric collection for less critical instances.

Define tags like "memory-critical:high", "memory-monitor:standard", or "memory-monitor:low" and configure monitoring tool to adjust monitoring frequency based on tags, for example, monitor production instances every minute, staging every 5 minutes, and development hourly
Use AWS Resource Groups to manage and organize instances based on these memory monitoring tags. Configure monitoring dashboards to display memory metrics grouped by these tags for easy visualization.
In mature environments, implement automated tag assignment based on instance roles or application types to ensure consistent monitoring as your infrastructure scales

This tiered approach ensures that you allocate more resources to monitoring memory-critical instances while maintaining a basic level of oversight on less critical systems. It allows for a more nuanced and cost-effective memory monitoring strategy across your entire EC2 fleet.

‍

3) Trigger-Based Monitoring

Implement a dynamic memory monitoring system that adjusts based on specific conditions or events. This approach ensures you capture crucial memory data during critical periods without the constant cost of full-time, high-frequency monitoring. Use EventBridge to orchestrate the trigger-based monitoring system.

Create CloudWatch alarms for indicators that often precede memory issues, such as high CPU utilization, increased swap usage, or application-specific metrics. Configure these alarms to trigger when thresholds are exceeded for a specified duration.
Set up AWS Lambda functions to enable detailed memory metrics collection when these alarms trigger. These functions can increase the monitoring frequency, enable additional memory-related metrics, or start more granular logging processes.

This trigger-based strategy allows you to maintain a baseline level of memory monitoring and automatically scale up monitoring efforts during critical events. It provides a balance between comprehensive memory insights and cost-efficiency which is particularly beneficial for large-scale deployments with varying workload patterns.

By implementing these strategies, you can create an efficient and economical monitoring system for your EC2 fleet's memory usage, regardless of its scale. Regularly review and adjust your approach as your applications' memory requirements evolve to maintain optimal performance insights while keeping costs under control.

‍

Conclusion

Effective memory monitoring on AWS EC2 is crucial for optimizing application performance. The ideal approach depends on your specific needs and scale. For smaller AWS-focused deployments, the CloudWatch agent offers a cost-effective solution. It integrates well with the AWS ecosystem and can be relatively straightforward to set up on EC2.

Larger-scale multi-platform applications are when specialized observability tools like Datadog or Prometheus are required. They can act as a central point to receive the logs from across the cross-platform application, servers, kubernetes clusters, containerized workloads, etc to build a comprehensive analytics and correlate it with, for example, memory metrics.

Open source alternatives, as the name suggests, require high implementation and maintenance efforts. They can be a great choice for teams with technical bandwidth who want to avoid locking in with commercial vendors. While for teams requiring sophisticated UI, visualization capabilities, AI based anomaly detection etc, paid tools like Datadog can work best.

It is important to keep an eye on costs these tools might incur. They employ varied and often complex pricing models based on factors such as metric count, collection frequency, data volume, and number of instances. As the EC2 fleet grows, implementing advanced strategies for efficiently monitoring the right resources becomes essential. It can simply start with excluding stable and predictable workloads followed by more nuanced strategies like tiered monitoring, and trigger-based monitoring systems etc.

By regularly reassessing strategies and tailoring it to the evolving needs, you can achieve a stable, cost-effective cloud environment that maximizes resource utilization and application performance.

‍

Appendix I - Cost Calculation Details

1) CloudWatch Agent

CloudWatch Agent With Standard Resolution: $49.32

Frequency: per 1 minute.

Ingestions per day: 24 hours * 60 (1-minute intervals) = 1,440 ingestions/day/instance

Cost Calculation:

Custom metrics: 10 instances x 15 metrics (10 standard + 5 custom) x $0.30 = $45 ‍
Data ingestion (for custom metrics only): 10 instances x 1,440 ingestions/day x 30 days x $0.01/1,000 = $4.32 ‍
Total Cost per Month: $45 + $4.32 = $49.32

CloudWatch Agent With High Resolution: $304.80

Frequency: per 1 second.

Ingestions per day: 24 hours * 60 minutes * 60 seconds (1-second intervals) = 86,400 ingestions/day/instance

Cost Calculation:

Custom metrics: 10 instances x 15 metrics x $0.30 = $45 ‍
Enhanced Monitoring log ingestion: 10 instances x 86,400 ingestions/day x 30 days x $0.01/1,000 = $259.20 ‍
Enhanced Monitoring log storage: 10 instances x 0.6 MB/hour x 24 hours x 14 days x $0.03/GB = $0.60 ‍
Total Cost per Month: $45 + $259.20 + $0.60 = $304.80

2) Paid Observability Tools (e.g., Datadog): $400

Infrastructure Monitoring: $15 per host per month x 10 instances = $150 ‍
Custom Metrics: $5 per custom metric per host per month x 5 metrics x 10 instances = $250 ‍
Total Cost per Month: $150 + $250 = $400

3) Open Source Observability Tools (e.g., Prometheus with node_exporter):

Cost Calculation

A small EC2 instance (t3.small or t3.micro) for running the monitoring stack: $15 - $30/month
EBS storage for metrics (assuming 1-5 GB per day): $5 - $50/month
Data transfer costs: $10 - $30/month
Miscellaneous (e.g., small S3 bucket for backups): $5 - $10/month ‍
Total Cost per Month: $50-150

Note - Open source monitoring tools have no license fees, but costs are highly variable. Expenses include infrastructure, development, and maintenance, which depend heavily on each team's context and existing resources. The decision to use open source tools should consider factors beyond cost, such as in-house expertise, customization requirements, and long-term scalability. For the given scenario, estimated monthly costs could range from $50 to $150, primarily covering a small EC2 instance for the monitoring stack and associated storage and data transfer costs. However, this estimate excludes the potentially significant time investment for setup and ongoing management.

This provides the understanding of the costs for storage monitoring for each of the given options.

‍

References

1. Amazon CloudWatch Pricing – Amazon Web Services (AWS)

2. https://www.datadoghq.com/pricing/list/

FAQs

No items found.

Table of content

Example H2