Amazon Bedrock is a fully managed service that helps developers easily build and run machine learning applications. It provides access to foundation models, which are large language models trained on huge datasets, for tasks like natural language processing and content creation.
Understanding Bedrock's pricing and optimization options is essential for managing costs, avoiding overspending, and getting the most out of the service. This guide explains Bedrock's pricing structure and shares tips to optimize usage, so you can control costs while maximizing the value of its features.
Amazon Bedrock offers several pricing models to suit different needs, whether you want flexible, on-demand access or consistent throughput for large-scale jobs.
The On-Demand pricing model in AWS Bedrock is flexible, charging users based on actual token usage during API calls. It's ideal for use cases with unpredictable demand, offering pay-as-you-go access without long-term commitments.
Users are charged for every token input to and output from the model. Pricing is typically calculated in 1,000-token units, and different models have different per-token pricing.
On-Demand Example
Suppose Claude by Anthropic is priced at $0.01 per 1,000 tokens (an illustrative rate; actual prices vary by model and differ for input and output tokens). If you process 10,000 tokens (5,000 input and 5,000 output), your cost would be: 10,000 / 1,000 × $0.01 = $0.10.
Batch pricing, a variant of the on-demand model, is optimized for large-scale data processing. It's ideal for bulk or periodic inference jobs, allowing users to group requests together at a discounted per-token rate.
Batch Example
If you batch 1,000 API calls, each processing 1,000 tokens (500 input + 500 output), the cost per call at the standard on-demand rate would be:
Cost per call: 1,000 / 1,000 × $0.01 = $0.01
Batch pricing is discounted relative to on-demand (assume 50% here, i.e., $0.005 per call), so the total for 1,000 calls would be $5.00.
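The arithmetic in these examples is easy to script. Below is a minimal Python sketch using the illustrative rates above; the 50% batch discount is an assumption for this example, and real per-token prices vary by model.
def token_cost(tokens, rate_per_1k):
    # Cost of processing `tokens` at `rate_per_1k` dollars per 1,000 tokens
    return tokens / 1000 * rate_per_1k

# On-demand example: 10,000 tokens at an illustrative $0.01 per 1,000 tokens
print(token_cost(10_000, 0.01))  # 0.1

# Batch example: 1,000 calls of 1,000 tokens each at an assumed 50% discount
batch_rate = 0.01 * 0.5  # $0.005 per 1,000 tokens
print(1_000 * token_cost(1_000, batch_rate))  # 5.0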
The Latency Optimized pricing model is designed for use cases where low-latency inference is crucial, such as interactive applications or real-time decision-making systems. In this model, AWS optimizes the underlying infrastructure to reduce response time for each request.
As of now, this model is in public preview, meaning it may be available at a discounted rate or subject to changes in pricing.
Because this service is optimized for low latency, it generally incurs a higher cost than the standard on-demand model, reflecting the infrastructure optimization that shortens request processing time.
Example Pricing: Price: $0.02 per 1,000 tokens
If you process 50,000 tokens in a low-latency use case, the cost would be: Cost: 50,000 / 1,000 × 0.02 = 1.00 USD
The Provisioned Throughput model is suited for businesses that need consistent, high-volume, and high-throughput access to foundational models. This model provides customers with guaranteed throughput, ensuring the requested volume of processing will be met without delays.
Customers provision a specific amount of throughput, ensuring that a predetermined number of tokens can be processed per second or minute.
This model is ideal for customers with predictable workloads and can help avoid the variability associated with on-demand pricing.
Example Pricing: If you require 1 million tokens per day processed with guaranteed throughput, the cost might be a flat rate of $10 per hour for that capacity. Cost for 24 hours of usage: $10 × 24 = $240 per day.
The Custom Model Import option allows users to bring their own models and run them on Amazon Bedrock's scalable infrastructure. Custom models typically incur both storage costs (for storing the model in Amazon S3 or other storage options) and per-token usage costs (for processing requests).
Example Pricing: Suppose model storage costs $0.046 per month (for instance, a 2 GB model in Amazon S3 at $0.023 per GB-month) and you process 500,000 tokens at $0.01 per 1,000 tokens, which comes to $5.00.
Total Cost: $0.046 (storage) + $5.00 (token usage) = $5.046
AWS also provides the option to purchase models through the AWS Marketplace, where third-party vendors offer additional machine learning models beyond what is available in the foundational model set. These models are typically optimized for specific use cases and can be integrated into AWS Bedrock.
Example Pricing:
Vendor charges: $0.05 per 1,000 tokens.
If you process 100,000 tokens, the cost would be: 100,000 / 1,000 × 0.05 = 5.00 USD
In addition, there may be a flat subscription fee for accessing the model, say $50 per month.
Total Cost: $5.00 (usage) + $50 (subscription) = $55.00 per month.
Amazon Bedrock offers features for customizing foundation models (FMs), optimizing model usage, and reducing costs. This table summarizes key features, charges, and benefits.
The following table summarizes the pricing for AI models offered through AWS Bedrock, which enables access to a variety of pre-trained models for tasks like natural language processing, text generation, and more. Prices are based on the input and output tokens, with different rates for batch processing as applicable.
This pricing table helps users determine the most cost-effective model based on their specific needs for AI-based text generation, image generation, embeddings, or other use cases.
Amazon Bedrock provides tools to help you build, manage, and improve your AI workflows. Each tool is designed to handle specific tasks, like processing data, generating accurate responses, or ensuring safe content. Understanding the pricing for these tools is important so you can plan costs and only pay for what you use.
Here's a simple breakdown of the tools, what they do, and their pricing.
Selecting the right model is crucial for cost optimization in Amazon Bedrock. Running large models for tasks that smaller models can handle wastes resources and money. Amazon Bedrock offers different models from providers like AI21 Labs, Anthropic, and Stability AI, each with varying pricing. Choose the smallest model that meets your needs. You can compare models using the AWS Console, CLI, or SDK to evaluate performance and cost trade-offs.
Example: If you use a large language model for simple summaries, consider switching to a smaller, cheaper model (such as Amazon Titan Text Lite or Claude Haiku) to reduce resource consumption and costs. Check model details in the AWS Console to assess cost differences.
aws bedrock list-foundation-models --region us-west-2
This command lists the foundation models available in your region; you can then compare them by provider, size, and cost.
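The same information is available programmatically. A minimal boto3 sketch, assuming credentials and region are already configured:
import boto3

# The 'bedrock' control-plane client exposes model metadata
client = boto3.client('bedrock', region_name='us-west-2')

# Print basic identifying details for each available foundation model
for model in client.list_foundation_models()['modelSummaries']:
    print(model['modelId'], model.get('providerName'))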
Efficiently processing large volumes of data can reduce costs by batching requests and using asynchronous inference. Instead of sending individual requests for each data point, batch multiple requests together to optimize throughput. For workloads that don’t require immediate results, use asynchronous processing to save on real-time processing costs, allowing for more cost-effective handling of large datasets.
Example: A simple way to batch in Amazon Bedrock is to group multiple inputs into a single API call. The payload below is illustrative; each model defines its own request schema.
import json
import boto3

# invoke_model is served by the bedrock-runtime client,
# not the 'bedrock' control-plane client
client = boto3.client('bedrock-runtime', region_name='us-west-2')

# Sample inputs grouped into a single request payload
# (illustrative structure; real models expect model-specific schemas)
batch_input = [
    {"input": "Text 1"},
    {"input": "Text 2"},
    {"input": "Text 3"}
]

# Send the grouped request in one call
response = client.invoke_model(
    modelId="model-id",  # replace with a real model ID
    contentType='application/json',
    accept='application/json',
    body=json.dumps(batch_input)
)
print(response)
This method sends multiple data points in one request, reducing overhead.
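For large offline workloads, Bedrock also provides a dedicated batch inference API that reads records from Amazon S3 and writes results back asynchronously, which pairs naturally with the batch pricing discussed earlier. A minimal sketch; the job name, role ARN, bucket URIs, and model ID below are placeholders:
import boto3

client = boto3.client('bedrock', region_name='us-west-2')

# Submit an asynchronous batch job: input records are read from S3 (JSONL)
# and results are written back to S3 when the job completes
response = client.create_model_invocation_job(
    jobName='nightly-batch-job',
    roleArn='arn:aws:iam::123456789012:role/BedrockBatchRole',
    modelId='model-id',
    inputDataConfig={'s3InputDataConfig': {'s3Uri': 's3://my-input-bucket/records.jsonl'}},
    outputDataConfig={'s3OutputDataConfig': {'s3Uri': 's3://my-output-bucket/results/'}}
)
print(response['jobArn'])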
To ensure you're not overusing the service, continuously monitor and adjust your workloads. Set up monitoring for usage and costs using CloudWatch, which lets you track the number of tokens processed and spot cost anomalies. Review this data regularly to confirm you're using resources optimally, focusing on reducing unnecessary requests.
Example: To set up basic monitoring for Bedrock usage (Bedrock publishes metrics such as InputTokenCount and OutputTokenCount under the AWS/Bedrock namespace):
aws cloudwatch put-metric-alarm --alarm-name "High-Token-Usage" \
--metric-name "InputTokenCount" --namespace "AWS/Bedrock" \
--statistic "Sum" --period 86400 --threshold 1000000 \
--comparison-operator "GreaterThanThreshold" --evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-west-2:123456789012:MyTopic
This sets up an alarm to notify you if token usage exceeds a specified threshold.
Optimizing the data you send to Bedrock can drastically reduce costs, particularly when you need to minimize token usage. Process and clean your data to reduce token count, which is typically the main cost driver in Bedrock. In natural language processing (NLP) tasks, trim unnecessary words, use abbreviations, and avoid overly verbose inputs.
Example: Use Python’s nltk library to preprocess text and remove unnecessary words.
import nltk
from nltk.corpus import stopwords

# Download the stopword list on first use
nltk.download('stopwords')

text = "This is an example sentence that we want to optimize."
stop_words = set(stopwords.words('english'))

# Drop common stopwords to shrink the token count sent to the model
optimized_text = ' '.join([word for word in text.split() if word.lower() not in stop_words])
print(optimized_text)
This reduces the number of tokens that need to be processed, lowering costs.
By tagging Amazon Bedrock resources (like models, jobs, and agents) with AWS cost allocation tags (e.g., cost centers, departments, business units), organizations can track spending more accurately. Tags help ensure that costs are allocated to the right business units or projects, allowing for better visibility and informed decision-making. Using cost allocation tags can help streamline cost management and reduce overspending by aligning usage with business priorities.
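As a sketch, taggable Bedrock resources can be tagged through the CLI; the ARN and tag values below are placeholders:
aws bedrock tag-resource \
--resource-arn arn:aws:bedrock:us-west-2:123456789012:provisioned-model/example \
--tags key=CostCenter,value=ML-Research key=Department,value=DataScience
After tagging, activate the tags as cost allocation tags in the AWS Billing console so they appear in Cost Explorer and cost reports.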
Minimizing data transfer costs is key when dealing with large amounts of input and output data in Amazon Bedrock. Store input and output data in S3 buckets that are in the same region as Bedrock. This eliminates inter-region transfer costs. Compress your data before sending it to reduce network transfer size.
Example: To compress a file using gzip in Python before sending it:
import gzip
import shutil

# Compress input.txt to input.txt.gz before upload to cut transfer size
with open('input.txt', 'rb') as f_in:
    with gzip.open('input.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
This reduces the data sent, cutting down on both storage and transfer costs.
To optimize costs effectively, combine Spot Instances for flexible, non-critical tasks like model training with reserved capacity for predictable, long-term workloads. Spot Instances offer substantial savings by letting you run EC2 instances at a fraction of the cost of On-Demand instances, making them ideal for tasks that can tolerate interruptions. Meanwhile, Reserved Instances or Savings Plans reduce costs for consistent compute usage over a fixed term, offering substantial discounts for workloads with predictable demand.
By utilizing both, you can maximize savings across different workload types, ensuring cost-efficiency while maintaining the necessary compute resources.
Many companies, including ElysianNxt, Amazon Finance Automation, Namirial, Showpad, Infor, KONE, and others, leverage Amazon Bedrock for building advanced generative AI solutions.[5] These organizations use Bedrock to streamline processes, enhance productivity, improve response times, and drive innovation, all while reducing operational costs and adapting quickly to evolving customer needs.[6]
To optimize Amazon Bedrock pricing, choose the right pricing model (On-Demand, Batch, or Provisioned Throughput) based on your workload and cost needs. On-Demand charges based on usage, Batch is suited for larger, less time-sensitive tasks with a lower cost, and Provisioned Throughput offers a fixed cost for consistent, high-throughput needs. Use smaller models, optimize throughput, and leverage cost-effective tools for tasks like automation and evaluation. Regularly monitor usage to ensure efficient and cost-effective AI-driven applications.
1. Amazon Bedrock – Market Share, Competitor Insights in Data Science And Machine Learning
2. AWS Amazon Bedrock Documentation
3. Build Generative AI Applications with Foundation Models – Amazon Bedrock Pricing
5. Build Generative AI Applications with Foundation Models – Amazon Bedrock Customer Testimonials