
How to Optimize Amazon Bedrock Pricing and Reduce Costs


Did you know?

Amazon Bedrock lets you add generative AI to your apps with minimal code! Its plug-and-play approach allows quick and easy integration without complex setup.

Amazon Bedrock is a fully managed service that gives developers access to foundation models (large models pre-trained on vast datasets) from leading AI providers through a single API, so they can build and scale generative AI applications for tasks like natural language processing and content generation.

Understanding Bedrock's pricing and optimization is important to manage costs, avoid overspending, and get the most out of its features for building affordable and scalable machine learning applications. This guide will help you understand Bedrock's pricing structure and share tips to optimize usage, ensuring you can control costs while maximizing the value of its powerful features.

Amazon Bedrock Pricing Models

AWS Bedrock offers several pricing models to suit different needs. Whether you need flexible, on-demand access or consistent throughput for large-scale jobs, AWS provides a variety of options.

1. On-Demand and Batch Pricing

The On-Demand pricing model in AWS Bedrock is flexible, charging users based on actual token usage during API calls. It's ideal for use cases with unpredictable demand, offering pay-as-you-go access without long-term commitments.

  • On-Demand Pricing

Users are charged for every token input to and output from the model. Pricing is typically calculated in 1,000-token units, and different models have different per-token pricing.                                                                            

On-Demand Example

Suppose a model (say, an Anthropic Claude variant) is priced at $0.01 per 1,000 tokens. If you process 10,000 tokens (5,000 input and 5,000 output), your cost would be: 10,000 / 1,000 × $0.01 = $0.10

  • Batch Pricing

Batch pricing, part of the on-demand model, is optimized for large-scale data processing. It's ideal for bulk or periodic inference jobs, allowing users to batch data together and reduce per-request costs.

Batch Example

If you submit 1,000 calls as a batch, each consuming 1,000 tokens (500 input + 500 output), and the model's on-demand rate is $0.01 per 1,000 tokens, a 50% batch discount gives:

Cost per call: 1,000 / 1,000 × $0.01 × 0.5 = $0.005

Total cost for 1,000 calls: 1,000 × $0.005 = $5.00, versus $10.00 at the on-demand rate. The helper sketched below makes this arithmetic repeatable.
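This minimal Python helper reproduces both calculations; the rate and discount are the illustrative figures from the examples above, not published prices.

def estimate_cost(input_tokens, output_tokens, price_per_1k, batch_discount=0.0):
    """Estimate request cost at a per-1,000-token rate, with an optional batch discount."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * price_per_1k * (1 - batch_discount)

# On-demand example: 5,000 input + 5,000 output tokens at $0.01 per 1K tokens
print(estimate_cost(5000, 5000, 0.01))      # 0.10
# Batch example: one 1,000-token call with a 50% batch discount
print(estimate_cost(500, 500, 0.01, 0.5))   # 0.005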

2. Latency Optimized (Public Preview)

The Latency Optimized pricing model is designed for use cases where low-latency inference is crucial, such as interactive applications or real-time decision-making systems. In this model, AWS optimizes the underlying infrastructure to reduce response time for each request.

  • Public Preview

As of now, this model is in public preview, meaning it may be available at a discounted rate or subject to changes in pricing.

  • Higher Cost

Because the underlying infrastructure is tuned for fast responses, latency-optimized inference generally costs more per token than the standard on-demand model.

Example pricing: Suppose the latency-optimized rate is $0.02 per 1,000 tokens. If you process 50,000 tokens in a low-latency use case, the cost would be: 50,000 / 1,000 × $0.02 = $1.00. A sketch of requesting latency-optimized inference through the API follows.
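The snippet below is a minimal sketch of requesting latency-optimized inference with boto3. It assumes a recent boto3 version that exposes the performanceConfigLatency parameter on InvokeModel; the model ID and request body are placeholders, since payload schemas are model-specific and only select models support this option.

import json
import boto3

# Inference calls go through the bedrock-runtime client
client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.invoke_model(
    modelId="model-id",  # placeholder; must be a model that supports latency optimization
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Classify this support ticket ..."}),  # schema varies by model
    performanceConfigLatency="optimized",  # request the latency-optimized tier
)
print(json.loads(response["body"].read()))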

3. Provisioned Throughput

The Provisioned Throughput model is suited for businesses that need consistent, high-volume, and high-throughput access to foundational models. This model provides customers with guaranteed throughput, ensuring the requested volume of processing will be met without delays.

  • Guaranteed Throughput

Customers provision a specific amount of throughput, ensuring that a predetermined number of tokens can be processed per second or minute.

  • Predictable Costs

This model is ideal for customers with predictable workloads and can help avoid the variability associated with on-demand pricing.

Example pricing: Suppose guaranteed throughput covering 1 million tokens per day is billed at a flat rate of $10 per hour. Running it around the clock costs $10 × 24 = $240 per day. A sketch of purchasing provisioned throughput via the API follows.
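As a rough illustration, provisioned throughput can be purchased programmatically with boto3's create_provisioned_model_throughput; the model ID, name, and unit count below are placeholders, and actual model-unit pricing and commitment options vary by model.

import boto3

# Provisioning is a control-plane operation, so use the 'bedrock' client
client = boto3.client("bedrock", region_name="us-west-2")

response = client.create_provisioned_model_throughput(
    provisionedModelName="my-provisioned-model",  # placeholder name
    modelId="model-id",                           # placeholder model identifier
    modelUnits=1,                                 # throughput is sold in model units
    commitmentDuration="OneMonth",                # or "SixMonths"; omit for hourly, no-commitment pricing
)
print(response["provisionedModelArn"])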

4. Custom Model Import

This model allows users to import their own models and run them on AWS Bedrock's scalable infrastructure. Custom models typically incur both storage costs (for storing the model in Amazon S3 or other storage options) and per-token usage costs (for processing requests).

Example Pricing:

  • Storage cost for the custom model: $0.023 per GB per month (for S3 storage).
  • Processing tokens: $0.01 per 1,000 tokens.
  • Storage: Assume the model is 2 GB, so monthly storage cost is: 2 GB × $0.023 = $0.046 per month
  • Token usage: Processing 500,000 tokens in one month: 500,000 / 1,000 × 0.01 = 5.00 USD

Total Cost: $0.046 (storage) + $5.00 (token usage) = $5.046
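For reference, a custom model import can be kicked off with boto3's create_model_import_job; this is a minimal sketch in which the job name, role ARN, and S3 location are all placeholder values.

import boto3

client = boto3.client("bedrock", region_name="us-west-2")

# Import model artifacts from S3 into Bedrock (all identifiers are placeholders)
response = client.create_model_import_job(
    jobName="my-import-job",
    importedModelName="my-custom-model",
    roleArn="arn:aws:iam::123456789012:role/BedrockImportRole",  # role with S3 read access
    modelDataSource={"s3DataSource": {"s3Uri": "s3://my-bucket/model-artifacts/"}},
)
print(response["jobArn"])

Once the import completes, the model is billed as described above: a monthly storage fee plus inference charges.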

5. Marketplace Models

AWS also provides the option to purchase models through the AWS Marketplace, where third-party vendors offer additional machine learning models beyond what is available in the foundational model set. These models are typically optimized for specific use cases and can be integrated into AWS Bedrock.

  • Model Marketplace - Customers can explore a variety of third-party models available in the AWS Marketplace.
  • Pricing Variability - Pricing for marketplace models is set by the third-party vendors. These models may offer specialized features not available in AWS’s core set of foundational models.

Example Pricing:

Vendor charges: $0.05 per 1,000 tokens.                                

If you process 100,000 tokens, the cost would be: 100,000 / 1,000 × 0.05 = 5.00 USD

In addition, there may be a flat subscription fee for accessing the model, say $50 per month.

Total Cost: $5.00 (usage) + $50 (subscription) = $55.00 per month.

Pricing Comparison

Model | Cost Structure | Best For
On-Demand & Batch | Per-token charge, bulk discounts | Flexible, unpredictable workloads
Latency Optimized | Higher per-token cost, faster response | Real-time applications
Provisioned Throughput | Flat fee for guaranteed throughput | High-volume, predictable workloads
Custom Model Import | Storage + per-token charge | Custom models, specific needs
Marketplace Models | Vendor-set pricing | Specialized, third-party models

Amazon Bedrock Customization and Optimization

Amazon Bedrock offers features for customizing foundation models (FMs), optimizing model usage, and reducing costs. This table summarizes key features, charges, and benefits.

Feature | Description | Charges | Notes
Model Customization | Fine-tune models with labeled/unlabeled data for tailored responses. | Training: tokens processed (tokens × epochs). Storage: monthly fee per model. Inference: requires a Provisioned Throughput plan. | One model unit included; extra throughput needs a 1-month or 6-month commitment.
Model Distillation | Create smaller, optimized models via synthetic data and fine-tuning. | Synthetic data: on-demand pricing. Fine-tuning: customization rates. Inference: requires a Provisioned Throughput plan. | Distilled models are treated as customized, and inference is charged accordingly.
Prompt Caching | Cache repeated prompts for up to 5 minutes to reduce costs and latency. | Cached tokens: up to 90% discount. Performance: up to 85% latency improvement. | Pricing and performance vary by model and prompt length. Cache is specific to your AWS account.
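As a rough sketch of how prompt caching is used in practice, the Converse API accepts cache-checkpoint content blocks. The snippet below assumes a model with prompt caching support (the model ID is a placeholder) and a boto3 version that exposes cachePoint blocks.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

long_context = "..."  # a long, frequently reused prompt prefix

# Place a cache checkpoint after the static prefix; on repeat calls within
# the cache window, tokens before the checkpoint are billed at the
# discounted cached-token rate.
response = client.converse(
    modelId="model-id",  # placeholder; must support prompt caching
    system=[
        {"text": long_context},
        {"cachePoint": {"type": "default"}},
    ],
    messages=[{"role": "user", "content": [{"text": "Summarize the key points."}]}],
)
print(response["output"]["message"]["content"][0]["text"])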

Amazon Bedrock Pricing

The following table summarizes the pricing for AI models offered through AWS Bedrock, which enables access to a variety of pre-trained models for tasks like natural language processing, text generation, and more. Prices are based on the input and output tokens, with different rates for batch processing as applicable.

Provider | Model | Price per 1,000 Input Tokens | Price per 1,000 Output Tokens
AI21 Labs | Jamba 1.5 Large | $0.002 | $0.008
AI21 Labs | Jamba 1.5 Mini | $0.0002 | $0.0004
AI21 Labs | Jurassic-2 Mid | $0.0125 | $0.0125
AI21 Labs | Jurassic-2 Ultra | $0.0188 | $0.0188
AI21 Labs | Jamba-Instruct | $0.0005 | $0.0007
Amazon | Nova Micro, Lite, Pro | $0.000035 - $0.0008 | $0.00014 - $0.0032
Amazon | Nova Image Generator | $0.04 - $0.08 per image | N/A
Amazon | Nova Video Generator | $0.08 per second | N/A
Amazon | Titan Text Embeddings V2 | $0.00002 | N/A
Anthropic | Claude 3.5 Sonnet/Haiku | $0.00025 - $0.003 | $0.00125 - $0.015
Anthropic | Claude 3 Opus | $0.0075 | $0.0375
Cohere | Command | $0.0015 - $0.0030 | $0.0020 - $0.0150
Cohere | Embed - English/Multilingual | $0.0001 | N/A
Meta (Llama) | 3.2/3.3 Instruct (1B - 3B) | $0.0001 - $0.00016 | $0.0001 - $0.00016
Meta (Llama) | 3.2 Instruct (8B - 11B) | $0.00016 - $0.00022 | $0.00016 - $0.00022
Meta (Llama) | 3.2/3.3 Instruct (70B - 90B) | $0.00072 | $0.00072
Meta (Llama) | 3.1 Instruct (70B - 405B) | $0.00072 - $0.0024 | $0.00072 - $0.0024
Meta (Llama) | 3.1 Instruct (Latency Optimized) | $0.0009 | $0.0009
Mistral AI | Mistral 7B | $0.00015 | $0.0002
Mistral AI | Mixtral 8x7B | $0.00045 | $0.0007
Mistral AI | Mistral Small (24.02) | $0.001 | $0.003
Mistral AI | Mistral Large (24.02) | $0.004 | $0.012
Stability AI | Stable Diffusion 3.5 Large | $0.08 per image | N/A
Stability AI | Stable Image Core | $0.04 per image | N/A
Stability AI | Stable Diffusion 3 Large | $0.08 per image | N/A
Stability AI | Stable Image Ultra | $0.14 per image | N/A
Stability AI | SDXL 1.0 | $0.04 per image (<=50 steps) | $0.08 per image (>50 steps)
Custom Models | Llama, Mistral, Mixtral, Flan (inference) | $0.0785 per minute | N/A
Custom Models | Custom Model Storage | $1.95 per month | N/A

(For image and video models, prices are per image or per second of output rather than per token.)

This pricing table helps users determine the most cost-effective model based on their specific needs for AI-based text generation, image generation, embeddings, or other use cases.

Amazon Bedrock Pricing Tools

Amazon Bedrock provides tools to help you build, manage, and improve your AI workflows. Each tool is designed to handle specific tasks, like processing data, generating accurate responses, or ensuring safe content. Understanding the pricing for these tools is important so you can plan costs and only pay for what you use. 

Here's a simple breakdown of the tools, what they do, and their pricing.

Pricing Tool | Details | Price
Flows | Charged based on node transitions; each execution of a node in your workflow counts as one transition. | $0.035 per 1,000 node transitions
Knowledge Bases: Structured Data Retrieval (SQL Generation) | Charged per request to generate a SQL query that retrieves data from structured data stores. | $2.00 per 1,000 queries
Knowledge Bases: Rerank Models | Charged per query to improve response relevance in Retrieval Augmented Generation (RAG) applications; each query can contain up to 100 document chunks, and larger queries are billed as multiple queries. | $1.00 per 1,000 queries for Amazon-rerank-v1.0; pricing for Cohere rerank models varies (see provider)
Guardrails | Charges for individual guardrail policies (e.g., content filters, denied topics); optional based on application requirements. | $0.15 per 1,000 text units for content filters and denied topics; $0.10 per 1,000 text units for contextual grounding checks
Model Evaluation | Charged for inference from the selected model; automatically generated algorithmic scores are free, but human-based evaluations incur a charge. | $0.21 per completed human task
Data Automation | Standard and custom output pricing for data automation tasks, such as processing audio, documents, images, and video. | Standard output: $0.006/minute (audio), $0.010/page (documents), $0.003/image (images), $0.050/minute (video); Custom output: $0.040/page (documents), $0.005/image (images)

Top 7 Strategies for Amazon Bedrock Cost Optimization  

1. Right-Size Model Usage

Selecting the right model is crucial for cost optimization in Amazon Bedrock. Running large models for tasks that smaller models can handle wastes resources and money. Amazon Bedrock offers different models from providers like AI21 Labs, Anthropic, and Stability AI, each with varying pricing. Choose the smallest model that meets your needs. You can compare models using the AWS Console, CLI, or SDK to evaluate performance and cost trade-offs.

Example: If you use a large flagship model for routine summaries, consider switching to a smaller Bedrock-hosted model, such as Claude 3.5 Haiku or a Llama 3.2 1B/3B variant, to reduce resource consumption and costs. Compare per-token rates in the pricing table above, or inspect model details in the AWS Console, to assess the difference.

aws bedrock list-foundation-models --region us-west-2

This command lists the foundation models available in your region, so you can compare them by provider, modality, and price.

2. Optimize Throughput

Efficiently processing large volumes of data can reduce costs by batching requests and using asynchronous inference. Instead of sending individual requests for each data point, batch multiple requests together to optimize throughput. For workloads that don’t require immediate results, use asynchronous processing to save on real-time processing costs, allowing for more cost-effective handling of large datasets.

Example: Batching requests in Amazon Bedrock can be done through the API by grouping multiple queries into a single call.

import json
import boto3

# Inference requests go through the bedrock-runtime client
client = boto3.client('bedrock-runtime', region_name='us-west-2')

# Sample input for batch processing; the request body schema is
# model-specific, so this grouped payload is illustrative
batch_input = [
    {"input": "Text 1"},
    {"input": "Text 2"},
    {"input": "Text 3"}
]

# Send the grouped inputs in a single request ("model-id" is a placeholder)
response = client.invoke_model(
    modelId="model-id",
    contentType='application/json',
    accept='application/json',
    body=json.dumps(batch_input)
)

print(json.loads(response['body'].read()))

This method sends multiple data points in one request, reducing overhead.
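For large offline workloads, Bedrock also offers managed batch inference jobs that read records from S3 and write results back, billed below on-demand rates. Here is a minimal sketch using boto3's create_model_invocation_job; the job name, role ARN, model ID, and bucket paths are placeholders.

import boto3

# Batch jobs are a control-plane operation, so use the 'bedrock' client
client = boto3.client('bedrock', region_name='us-west-2')

response = client.create_model_invocation_job(
    jobName='nightly-summaries',  # placeholder
    modelId='model-id',           # placeholder
    roleArn='arn:aws:iam::123456789012:role/BedrockBatchRole',  # role with S3 access
    inputDataConfig={'s3InputDataConfig': {'s3Uri': 's3://my-bucket/batch-input/'}},
    outputDataConfig={'s3OutputDataConfig': {'s3Uri': 's3://my-bucket/batch-output/'}},
)
print(response['jobArn'])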

3. Monitor and Adjust Workloads

To ensure you're not overusing the service, continuously monitor and adjust your workloads. Set up usage and cost monitoring with CloudWatch, which lets you track the number of tokens processed and spot cost anomalies. Review the data regularly to confirm you're using resources optimally, focusing on eliminating unnecessary requests.

Example: To set up basic monitoring for AWS services, including Bedrock usage:

aws cloudwatch put-metric-alarm --alarm-name "High-Token-Usage" \
  --metric-name "InputTokenCount" --namespace "AWS/Bedrock" \
  --dimensions Name=ModelId,Value="model-id" \
  --statistic "Sum" --period 86400 --threshold 1000000 \
  --comparison-operator "GreaterThanThreshold" --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-west-2:123456789012:MyTopic

This sets up an alarm that notifies you when the daily input-token count for the specified model exceeds the threshold. Bedrock publishes InputTokenCount and OutputTokenCount metrics per model; replace "model-id" with the model you want to track.

4. Optimize Data Processing

Optimizing the data you send to Bedrock can drastically reduce costs, because token count is typically the main cost driver. Process and clean your data to minimize tokens before submission. In natural language processing (NLP) tasks, trim unnecessary words, use abbreviations, and avoid overly verbose inputs.

Example: Use Python’s nltk library to preprocess text and remove unnecessary words.

import nltk
from nltk.corpus import stopwords

# Download the stop-word list on first use
nltk.download('stopwords')

text = "This is an example sentence that we want to optimize."
stop_words = set(stopwords.words('english'))

# Drop common stop words to shrink the token count before sending to Bedrock
optimized_text = ' '.join([word for word in text.split() if word.lower() not in stop_words])

print(optimized_text)

This reduces the number of tokens that need to be processed, lowering costs.

5. Implement Tagging for Cost Allocation

By tagging Amazon Bedrock resources (like models, jobs, and agents) with AWS cost allocation tags (e.g., cost centers, departments, business units), organizations can track spending more accurately. Tags help ensure that costs are allocated to the right business units or projects, allowing for better visibility and informed decision-making. Using cost allocation tags can help streamline cost management and reduce overspending by aligning usage with business priorities.
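As a minimal sketch, tags can be attached with boto3's tag_resource on the Bedrock control-plane client; the resource ARN and tag values below are placeholders, and only certain Bedrock resource types (such as provisioned models, custom models, and agents) are taggable.

import boto3

client = boto3.client('bedrock', region_name='us-west-2')

# Attach cost allocation tags to a taggable Bedrock resource (placeholder ARN)
client.tag_resource(
    resourceARN='arn:aws:bedrock:us-west-2:123456789012:provisioned-model/abc123',
    tags=[
        {'key': 'CostCenter', 'value': 'ml-platform'},
        {'key': 'Department', 'value': 'research'},
    ],
)

Remember to activate the tag keys as cost allocation tags in the Billing console before they appear in Cost Explorer.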

6. Optimize Cross-Service Data Transfers

Minimizing data transfer costs is key when dealing with large amounts of input and output data in Amazon Bedrock. Store input and output data in S3 buckets that are in the same region as Bedrock. This eliminates inter-region transfer costs. Compress your data before sending it to reduce network transfer size.

Example: To compress a file using gzip in Python before sending it:

import gzip
import shutil

# Compress input.txt to input.txt.gz before uploading to S3
with open('input.txt', 'rb') as f_in:
    with gzip.open('input.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

This reduces the data sent, cutting down on both storage and transfer costs.

7. Combine Spot Instances and Reserved Capacity

To optimize costs effectively, combine Spot Instances for flexible, non-critical compute (such as EC2-based data preparation or model training outside Bedrock) with Reserved capacity for predictable, long-term workloads. Spot Instances let you run EC2 instances at a fraction of On-Demand cost, making them ideal for tasks that can tolerate interruptions. Meanwhile, Reserved Instances or Savings Plans provide discounts for consistent compute usage over a fixed term, suiting workloads with predictable demand.

By utilizing both, you can maximize savings across different workload types, ensuring cost-efficiency while maintaining the necessary compute resources.

Many companies, including ElysianNxt, Amazon Finance Automation, Namirial, Showpad, Infor, KONE, and others, leverage Amazon Bedrock for building advanced generative AI solutions.[5] These organizations use Bedrock to streamline processes, enhance productivity, improve response times, and drive innovation, all while reducing operational costs and adapting quickly to evolving customer needs.[6]

Conclusion 

To optimize Amazon Bedrock pricing, choose the right pricing model (On-Demand, Batch, or Provisioned Throughput) based on your workload and cost needs. On-Demand charges based on usage, Batch is suited for larger, less time-sensitive tasks with a lower cost, and Provisioned Throughput offers a fixed cost for consistent, high-throughput needs. Use smaller models, optimize throughput, and leverage cost-effective tools for tasks like automation and evaluation. Regularly monitor usage to ensure efficient and cost-effective AI-driven applications.

References 

1. Amazon Bedrock - Market Share, Competitor Insights in Data Science And Machine Learning

2. AWS Amazon Bedrock Documentation

3. Build Generative AI Applications with Foundation Models – Amazon Bedrock Pricing

4. Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock | AWS Machine Learning Blog

5. Build Generative AI Applications with Foundation Models - Amazon Bedrock Customer Testimonials

6. Amazon Bedrock Customer Success Stories - YouTube
