Caching for Microservices

Saurav Kumar
Nov 17, 2023


What is Caching?

Caching is a technique used in computer science and software development to temporarily store copies of frequently accessed or computationally expensive data in order to reduce the time or resources required to fetch the data again. The primary purpose of caching is to improve the performance and efficiency of a system by providing quicker access to data.

In a caching system, there is a cache — a high-speed storage layer — that sits between the data source (e.g., a database, a web service, or an API) and the application that needs the data. When the application requests data, the caching system first checks if the data is already in the cache. If the data is present, it can be retrieved more quickly than fetching it from the original source. If the data is not in the cache, the system fetches it from the source and stores a copy in the cache for future use.

Caching mechanisms commonly used in various architectures

1. Embedded Cache:

In an embedded cache, caching functionality is integrated directly within the application or service, rather than relying on an external caching layer or system. The cache is part of the application’s runtime environment, providing an in-memory storage space for frequently accessed data. This approach is often straightforward to implement and is suitable for scenarios where a lightweight, in-process cache is sufficient.

Examples of Embedded Cache:

  1. Ehcache: Ehcache is a widely used open-source Java-based caching library. It allows developers to easily integrate caching into their Java applications. Ehcache provides features like in-memory caching, disk storage for overflow, and support for distributed caching.
  2. Caffeine: Caffeine is a high-performance, near-optimal caching library for Java (Caffeine 2.x supports Java 8, while 3.x requires Java 11 or above). It offers in-memory caching with features such as automatic removal of entries based on various policies, asynchronous loading, and support for maximum size and expiration.

Use Case:

Imagine a web application that displays a list of popular articles. The list is generated by querying a database, and the same set of articles is requested frequently. To optimize performance, the application can use an embedded cache. Here’s how it might work:

  1. Data Retrieval: When the application receives a request for the list of popular articles, it first checks the embedded cache.
  2. Cache Hit: If the list of articles is found in the cache (cache hit), the application retrieves the data from the cache, avoiding the need to query the database.
  3. Cache Miss: If the list of articles is not in the cache (cache miss), the application queries the database to fetch the data.
  4. Update Cache: After fetching the data from the database, the application updates the embedded cache with the newly retrieved list of popular articles.
  5. Subsequent Requests: For subsequent requests for the same data, the application can quickly retrieve it from the embedded cache, improving response times.
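The five steps above follow the cache-aside pattern. Here is a minimal Python sketch of it, assuming a plain in-process dictionary as the embedded cache; `fetch_popular_articles_from_db` and `PopularArticlesService` are hypothetical names standing in for the real query and service. It deliberately leaves out expiry, size limits, and thread safety, which libraries such as Ehcache or Caffeine handle in Java.

```python
from typing import Callable, Dict, List


def fetch_popular_articles_from_db() -> List[str]:
    """Hypothetical stand-in for the real (slow) database query."""
    return ["Caching 101", "Intro to Redis", "Kubernetes Sidecars"]


class PopularArticlesService:
    CACHE_KEY = "popular_articles"

    def __init__(self, loader: Callable[[], List[str]] = fetch_popular_articles_from_db):
        self._cache: Dict[str, List[str]] = {}  # embedded cache: lives in this process's memory
        self._loader = loader
        self.db_calls = 0  # counts trips to the database, for illustration only

    def get_popular_articles(self) -> List[str]:
        # 1. Data retrieval: look in the embedded cache first.
        cached = self._cache.get(self.CACHE_KEY)
        if cached is not None:
            # 2. Cache hit: no database query needed.
            return cached
        # 3. Cache miss: query the database.
        self.db_calls += 1
        articles = self._loader()
        # 4. Update the cache so that 5. subsequent requests are served from memory.
        self._cache[self.CACHE_KEY] = articles
        return articles


if __name__ == "__main__":
    service = PopularArticlesService()
    service.get_popular_articles()  # miss: goes to the database
    service.get_popular_articles()  # hit: served from the embedded cache
    print(service.db_calls)  # 1
```

Because each service instance keeps its own dictionary, replicas of the same service can hold different copies of the list until each one reloads it.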


  • Embedded caching is well-suited for small to medium-sized applications or scenarios where simplicity and low latency are more critical than extensive scalability.

2. Client-Server Cache:

In a client-server cache architecture, caching is performed on the client side, the server side, or both, so that each side can reuse data instead of repeatedly requesting it from the original data source.


Client-Side Caching:

  • Description: Web browsers often use client-side caching to store static assets locally, such as images, stylesheets, and scripts.
  • Use Case: When a user visits a website, the browser caches these static resources. If the user revisits the same website, the browser can retrieve these assets from its local cache rather than re-downloading them from the server.
  • Considerations: HTTP caching headers (e.g., Cache-Control and Expires) control how long resources stay in the client's cache, and validators such as ETag and Last-Modified let the browser check cheaply whether a stale copy is still current before re-downloading it.
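To make the header handling concrete, here is a simplified Python sketch of a client that honors `Cache-Control: max-age`. It is not a full HTTP cache: it assumes only the `max-age`, `no-store`, and `no-cache` directives matter, ignores `Expires` and revalidation, and `BrowserLikeCache` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if the response should not be reused."""
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    # no-store: never keep it; no-cache: must revalidate before each reuse (treated as "don't reuse" here).
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        name, _, value = directive.partition("=")
        if name == "max-age" and value.isdigit():
            return int(value)
    return None


class BrowserLikeCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)

    def store(self, url: str, body: bytes, cache_control: str) -> None:
        max_age = parse_max_age(cache_control)
        if max_age is not None:
            self._entries[url] = (body, self._clock() + max_age)

    def lookup(self, url: str) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        if self._clock() >= expires_at:
            # Stale: a real browser would revalidate (e.g. If-None-Match with the ETag) or re-download.
            del self._entries[url]
            return None
        return body


if __name__ == "__main__":
    print(parse_max_age("public, max-age=3600"))  # 3600
    print(parse_max_age("no-store"))  # None
```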

Server-Side Caching:

  • Description: The server caches responses to specific requests, reducing the need to repeatedly generate the same response for identical requests.
  • Use Case: In a web server, if a resource-intensive query is made, the server can cache the result. Subsequent requests for the same data can be served directly from the cache, reducing the load on the backend system.
  • Considerations: Cache management policies, such as cache expiration times and cache invalidation strategies, are crucial to ensure that the server cache remains up-to-date.
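The expiration and invalidation policies mentioned above can be sketched as a small server-side cache in Python. This assumes a single server process: each cached response expires after a fixed time-to-live (TTL) and can also be invalidated explicitly when the underlying data changes. `TTLCache` and `run_expensive_query` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]  # fresh cached response
        value = compute()  # missing or expired: regenerate the response
        self._entries[key] = (value, self._clock() + self.ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call this when the underlying data changes, instead of waiting for the TTL.
        self._entries.pop(key, None)


if __name__ == "__main__":
    calls = []

    def run_expensive_query():  # hypothetical resource-intensive query
        calls.append(1)
        return {"total_sales": 1234}

    cache = TTLCache(ttl_seconds=30)
    cache.get_or_compute("/reports/sales?region=eu", run_expensive_query)
    cache.get_or_compute("/reports/sales?region=eu", run_expensive_query)
    print(len(calls))  # 1: the second request was served from the cache
```

The TTL bounds how stale a response can get, while `invalidate` lets the server drop an entry immediately after a write.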

3. Distributed Cache / Cloud Cache:

Distributed caching involves the use of a caching system that spans multiple nodes or servers, allowing for the storage and retrieval of data across a distributed environment. This type of caching is particularly useful in scenarios where a centralized cache on a single server is not sufficient, and the application needs to scale horizontally.

Examples of Distributed Cache:

  1. Redis: Redis is an in-memory data structure store that can be used as a distributed cache. It supports various data structures, and its fast read and write operations make it suitable for caching frequently accessed data in a distributed system.
  2. Memcached: Memcached is a high-performance, distributed in-memory caching system that stores key-value pairs and is commonly used to accelerate dynamic web applications. Memcached servers do not communicate with each other; the client library decides which server holds a given key, typically by hashing the key.

Use Case:

  • Consider a microservices architecture where multiple services need quick access to shared data. A distributed caching system can store this shared data across multiple nodes, reducing the need for each service to make frequent requests to the original data source, such as a database.

Example Scenario:

Consider an e-commerce platform where product information is frequently accessed by multiple services responsible for displaying product details, managing inventory, and processing orders. Using a distributed cache like Redis or Memcached allows for the quick retrieval of product information, reducing the load on the product database and improving overall system performance. Each microservice can read product details from the distributed cache instead of querying the database for every request; because cached entries go stale when products change, the platform also needs an expiration or invalidation strategy to keep them current.
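A distributed cache has to decide which node stores each key. As one concrete illustration, Redis Cluster maps every key to one of 16,384 hash slots using CRC16 of the key modulo 16384; when a key contains a non-empty `{...}` section (a "hash tag"), only that part is hashed, so related keys can be kept on the same node. The Python sketch below reimplements that slot calculation for illustration only. The three-node slot split and the `product:{42}:...` key names are assumptions for the example, and a real service would let its Redis client library do this routing.

```python
from bisect import bisect_left
from typing import List

HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), the CRC16 variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty hash tag: hash only the text between the braces
            raw = raw[start + 1 : end]
    return crc16_xmodem(raw) % HASH_SLOTS


# Assumed layout: three cache nodes, each owning a contiguous range of slots.
NODES: List[str] = ["cache-node-a", "cache-node-b", "cache-node-c"]
RANGE_ENDS: List[int] = [5460, 10922, 16383]  # last slot owned by each node


def node_for_key(key: str) -> str:
    return NODES[bisect_left(RANGE_ENDS, key_slot(key))]


if __name__ == "__main__":
    # Both keys share the hash tag "42", so they map to the same slot and therefore the same node.
    print(node_for_key("product:{42}:details") == node_for_key("product:{42}:stock"))  # True
```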

4. Reverse-Proxy Cache:

A reverse proxy cache is a server that sits between client devices (such as web browsers) and web servers, acting as an intermediary for requests. It is called a “reverse” proxy because it handles requests on behalf of the server, as opposed to a traditional forward proxy that handles requests on behalf of the client. The primary function of a reverse proxy cache is to store and serve cached copies of responses from backend servers to improve performance and reduce the load on those servers.

Examples of Reverse-Proxy Cache:

  1. Nginx: Nginx is a popular web server and reverse proxy that can also function as a caching server. It can be configured to cache static content, such as images and stylesheets, as well as dynamic responses, reducing the load on backend servers.
  2. Varnish: Varnish is a powerful HTTP accelerator and reverse proxy cache. It is designed to cache entire web pages and accelerate content delivery. Varnish is often used in front of web servers like Apache or Nginx.
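The core behavior of a reverse-proxy cache can be modeled in a few lines of Python. This is not how Nginx or Varnish are implemented or configured; it is a simplified model that assumes an `upstream` function standing in for the backend, caches only successful GET responses not marked `no-store` or `private`, matches the `Cache-Control` header name case-sensitively, and leaves out expiry for brevity. The `X-Cache` header is a common convention for reporting hits and misses, not a standard.

```python
from typing import Callable, Dict, Tuple

# (status code, headers, body) as returned by the hypothetical upstream function.
Response = Tuple[int, Dict[str, str], bytes]


class ReverseProxyCache:
    """Simplified model of a caching reverse proxy placed in front of a backend."""

    def __init__(self, upstream: Callable[[str, str], Response]):
        self.upstream = upstream
        self._cache: Dict[Tuple[str, str], Response] = {}

    @staticmethod
    def _is_cacheable(method: str, response: Response) -> bool:
        status, headers, _ = response
        cache_control = headers.get("Cache-Control", "").lower()
        # A shared cache must not store private or no-store responses.
        return (
            method == "GET"
            and status == 200
            and "no-store" not in cache_control
            and "private" not in cache_control
        )

    def handle(self, method: str, path: str) -> Response:
        key = (method, path)
        if key in self._cache:
            status, headers, body = self._cache[key]
            return status, {**headers, "X-Cache": "HIT"}, body  # served without touching the backend
        response = self.upstream(method, path)  # miss: forward the request to the backend
        if self._is_cacheable(method, response):
            self._cache[key] = response
        status, headers, body = response
        return status, {**headers, "X-Cache": "MISS"}, body


if __name__ == "__main__":
    def backend(method: str, path: str) -> Response:  # hypothetical origin server
        return 200, {"Content-Type": "text/html"}, b"<h1>Home</h1>"

    proxy = ReverseProxyCache(backend)
    print(proxy.handle("GET", "/")[1]["X-Cache"])  # MISS
    print(proxy.handle("GET", "/")[1]["X-Cache"])  # HIT
```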

5. Side-Car Cache:

In a microservices architecture, a sidecar is a secondary container that runs alongside a main application container. A sidecar cache, in the context of caching mechanisms, involves placing a caching system in a sidecar container. This allows the main application to offload caching responsibilities to the sidecar, which manages the caching logic independently.


Let’s consider a scenario where multiple microservices in a Kubernetes cluster need to cache certain data to improve performance. Each microservice is accompanied by a sidecar container running a caching system like Redis or Memcached. The main microservice communicates with its respective sidecar to store and retrieve cached data.
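Containers in the same Kubernetes Pod share a network namespace, so the main service can reach its side-car cache on `localhost`. The Python sketch below imitates that split inside one process: `SidecarCacheHandler` plays the side-car (a tiny in-memory key-value store) and `SidecarCacheClient` is what the main service would call. The `/cache/<key>` HTTP API is a hypothetical protocol invented for this example; with a real Redis or Memcached side-car, the service would use that system's own client library and protocol.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Optional
from urllib.parse import quote, unquote

PREFIX = "/cache/"  # hypothetical API: GET/PUT /cache/<url-encoded key>


class SidecarCacheHandler(BaseHTTPRequestHandler):
    """Plays the side-car: an in-memory key-value store behind a tiny HTTP API."""

    store: Dict[str, bytes] = {}  # owned by the side-car process, not by the main service
    lock = threading.Lock()

    def _key(self) -> str:
        return unquote(self.path[len(PREFIX):])  # sketch: assumes the path starts with PREFIX

    def do_GET(self) -> None:
        with self.lock:
            value = self.store.get(self._key())
        if value is None:
            self.send_response(404)  # cache miss
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(value)))
        self.end_headers()
        self.wfile.write(value)

    def do_PUT(self) -> None:
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        with self.lock:
            self.store[self._key()] = body
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args) -> None:  # silence per-request logging
        pass


class SidecarCacheClient:
    """Used by the main service; inside a Pod the base URL would point at localhost."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        # ProxyHandler({}) disables any system HTTP proxy for these local calls.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _url(self, key: str) -> str:
        return f"{self.base_url}{PREFIX}{quote(key, safe='')}"

    def get(self, key: str) -> Optional[bytes]:
        try:
            with self._opener.open(self._url(key)) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # miss: the caller falls back to the real data source
            raise

    def put(self, key: str, value: bytes) -> None:
        request = urllib.request.Request(self._url(key), data=value, method="PUT")
        with self._opener.open(request):
            pass


def start_sidecar(port: int = 0) -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", port), SidecarCacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    sidecar = start_sidecar()  # port 0: let the OS pick a free port
    host, port = sidecar.server_address
    client = SidecarCacheClient(f"http://{host}:{port}")
    client.put("product:42", b'{"name": "Desk Lamp"}')
    print(client.get("product:42"))
    sidecar.shutdown()
```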

6. Reverse-Proxy Side-Car Cache:

This approach combines reverse-proxy caching with side-car caching: a reverse proxy server (like Nginx or Varnish) handles caching at the network level, while side-car containers handle caching at the application level.

Example Use Case:

  • Consider a microservices architecture with multiple services, each having its own dedicated caching needs.
  • Nginx or Varnish serves as a reverse proxy, caching common static content at the network level, and providing fast responses to clients.
  • Each microservice has its own side-car cache container (e.g., using Redis or Memcached) for caching dynamic or service-specific data.

Benefits and Considerations:

  • Isolation of Concerns: The reverse proxy handles network-level caching, and side-car caches handle application-level caching, allowing for clear isolation of concerns.
  • Configuration Complexity: Managing configurations for both reverse proxy caching and side-car caching may introduce complexity, requiring careful planning.
  • Resource Usage: Running multiple caching components might increase resource consumption, so resource allocation should be optimized.

By combining a reverse proxy for network-level caching and side-car containers for application-level caching, the reverse proxy side-car cache architecture provides a flexible and scalable solution for managing caching in a microservices environment. Proper configuration, cache invalidation strategies, and monitoring are essential to ensure optimal performance and consistency.

What is a Cache Eviction Policy?

A cache eviction policy defines the rules and criteria used to determine which items (entries or records) in a cache should be removed or “evicted” when the cache reaches its capacity limit. Caches have finite storage, and when new data needs to be stored but the cache is full, eviction policies help decide which existing items to remove to make room for the new ones.

There are several common cache eviction policies, each with its own characteristics and use cases. Here are some notable ones:

1. Least Recently Used (LRU):

  • Description: Evicts the least recently accessed items first.
  • Logic: Items that haven’t been accessed for the longest time are considered less likely to be used soon.
  • Advantages: Simple and often effective for scenarios where recent access patterns are relevant.
  • Considerations: Requires tracking access order on every read (typically with a hash map plus a doubly linked list), which adds some bookkeeping overhead.
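A compact way to implement LRU is to keep entries in an ordered map and move an entry to the end whenever it is read, so the front always holds the least recently used one. The Python sketch below (class name illustrative) uses `collections.OrderedDict` for this; in Java, a `LinkedHashMap` constructed with `accessOrder = true` and an overridden `removeEldestEntry` gives the same behavior.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()  # least recently used first

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")  # "a" is now the most recently used
    cache.put("c", 3)  # evicts "b"
    print(cache.get("b"))  # None
```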

2. Most Recently Used (MRU):

  • Description: Evicts the most recently accessed items first.
  • Logic: Assumes that the item just used is the least likely to be needed again soon, as in repeated sequential or cyclic scans over a data set larger than the cache.
  • Advantages: Simple, and outperforms LRU on such scans, where LRU evicts exactly the items that are about to be requested again.
  • Considerations: Performs poorly on workloads with temporal locality (the common case), because it discards the very items most likely to be reused.
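MRU differs from LRU only in which end of the recency order it evicts from. In this Python sketch (class name illustrative), eviction happens before the new key is inserted; evicting afterwards would throw away the entry that was just added.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class MRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()  # most recently used last

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=True)  # evict the most recently used entry
        self.entries[key] = value
        self.entries.move_to_end(key)  # the new or updated entry becomes most recently used


if __name__ == "__main__":
    cache = MRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")  # "a" is now the most recently used
    cache.put("c", 3)  # evicts "a", not "b"
    print(cache.get("a"))  # None
```

Replaying a cyclic scan over one more key than the cache can hold shows the trade-off: LRU misses on every request, while MRU keeps part of the cycle resident and gets some hits.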

3. Least Frequently Used (LFU):

  • Description: Evicts the least frequently accessed items first.
  • Logic: Items with the lowest access frequency are considered less likely to be used soon.
  • Advantages: Effective in scenarios where access frequencies vary widely.
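  • Considerations: Requires tracking access frequencies, which adds overhead, and items that were popular in the past can linger long after they stop being requested unless their counts are aged or decayed.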
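The Python sketch below (class name illustrative) keeps a hit counter per key and, when the cache is full, evicts the key with the smallest count, breaking ties by evicting the oldest key. It scans every counter on eviction, which is O(n); production LFU implementations typically use frequency buckets for O(1) eviction, and Caffeine uses an approximate frequency sketch (W-TinyLFU) instead of exact counters.

```python
from typing import Any, Dict, Hashable, Optional


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values: Dict[Hashable, Any] = {}
        self.counts: Dict[Hashable, int] = {}  # access frequency per key

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self.values:
            self.values[key] = value
            self.counts[key] += 1
            return
        if len(self.values) >= self.capacity:
            # O(n) scan; min() returns the first key with the lowest count in insertion order,
            # so ties are broken by evicting the oldest key.
            victim = min(self.counts, key=self.counts.__getitem__)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = 1


if __name__ == "__main__":
    cache = LFUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")  # "a" now has 2 accesses, "b" has 1
    cache.put("c", 3)  # evicts "b"
    print(cache.get("b"))  # None
```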

4. Random Replacement (RR):

  • Description: Evicts a randomly selected item.
  • Logic: Simple and avoids the need for detailed tracking of access patterns.
  • Advantages: Simplicity and ease of implementation.
  • Considerations: May not be as effective as more sophisticated algorithms in certain scenarios.
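Random replacement needs no bookkeeping beyond the stored entries. This Python sketch (class name illustrative) picks the victim with `random.Random.choice`; copying the keys into a list makes eviction O(n), which keeps the sketch short, whereas an O(1) version would keep keys in an array and swap the victim with the last element before removing it.

```python
import random
from typing import Any, Dict, Hashable, Optional


class RandomReplacementCache:
    def __init__(self, capacity: int, seed: Optional[int] = None):
        self.capacity = capacity
        self.values: Dict[Hashable, Any] = {}
        self.rng = random.Random(seed)  # seedable for reproducible tests

    def get(self, key: Hashable) -> Optional[Any]:
        return self.values.get(key)  # reads need no access bookkeeping at all

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.values and len(self.values) >= self.capacity:
            victim = self.rng.choice(list(self.values))  # any entry may be evicted
            del self.values[victim]
        self.values[key] = value
```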

5. First-In-First-Out (FIFO):

  • Description: Evicts the oldest items first based on their arrival time in the cache.
  • Logic: Items that have been in the cache the longest are evicted first.
  • Advantages: Simple and easy to implement.
  • Considerations: Ignores how often or how recently an item is read, so a heavily used item is evicted simply because it was inserted early.
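In this Python sketch (class name illustrative), reads never change the eviction order, and overwriting an existing key keeps its original position in the queue, because assigning to an existing `OrderedDict` key does not move it.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class FIFOCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()  # insertion order, oldest first

    def get(self, key: Hashable) -> Optional[Any]:
        return self.entries.get(key)  # reads do not affect eviction order

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest inserted entry
        self.entries[key] = value


if __name__ == "__main__":
    cache = FIFOCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")  # a recent read does not protect "a"
    cache.put("c", 3)  # evicts "a", the first one inserted
    print(cache.get("a"))  # None
```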

6. Adaptive Replacement Cache (ARC):

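  • Description: Balances recency and frequency by keeping two LRU lists, one for items seen once recently and one for items seen at least twice, plus "ghost" lists that remember recently evicted keys.
  • Logic: A hit in a ghost list signals that ARC evicted from that side too early, so it shifts capacity toward recency or frequency to adapt to workload changes.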
  • Advantages: Adapts well to varying access patterns.
  • Considerations: More complex to implement compared to basic eviction policies.
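ARC is the most involved policy in this list, so its sketch is longer. The Python code below follows the algorithm as published by Megiddo and Modha ("ARC: A Self-Tuning, Low Overhead Replacement Cache", FAST 2003), written for readability rather than speed; the `get(key, loader)` interface and the class name are assumptions of this example.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ARCCache:
    """Readable (not optimized) sketch of ARC.

    t1: cached keys seen once recently       b1: ghost keys recently evicted from t1
    t2: cached keys seen at least twice      b2: ghost keys recently evicted from t2
    p:  adaptive target size for t1 (0 <= p <= capacity)
    """

    def __init__(self, capacity: int):
        self.c = capacity
        self.p = 0.0
        self.t1: OrderedDict = OrderedDict()  # key -> value, least recently used first
        self.t2: OrderedDict = OrderedDict()
        self.b1: OrderedDict = OrderedDict()  # key -> None (ghosts hold no values)
        self.b2: OrderedDict = OrderedDict()

    def __contains__(self, key: Hashable) -> bool:
        return key in self.t1 or key in self.t2

    def _replace(self, key_in_b2: bool) -> None:
        # Evict one cached entry into the matching ghost list, steered by the target p.
        if self.t1 and (len(self.t1) > self.p or (key_in_b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        if key in self.t1:  # hit on a key seen once: it is now "frequent"
            self.t2[key] = self.t1.pop(key)
            return self.t2[key]
        if key in self.t2:  # hit on a frequent key: refresh its recency
            self.t2.move_to_end(key)
            return self.t2[key]
        value = loader(key)  # every other case is a miss on the cached data
        if key in self.b1:  # evicted from t1 too early: give recency more room
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key_in_b2=False)
            del self.b1[key]
            self.t2[key] = value
        elif key in self.b2:  # evicted from t2 too early: give frequency more room
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key_in_b2=True)
            del self.b2[key]
            self.t2[key] = value
        else:  # brand-new key
            l1 = len(self.t1) + len(self.b1)
            total = l1 + len(self.t2) + len(self.b2)
            if l1 == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(key_in_b2=False)
                else:
                    self.t1.popitem(last=False)  # b1 is empty: drop t1's LRU entry outright
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(key_in_b2=False)
            self.t1[key] = value
        return value


if __name__ == "__main__":
    arc = ARCCache(2)
    for k in [1, 1, 2, 3]:
        arc.get(k, lambda key: f"value-{key}")
    print(1 in arc, 2 in arc, 3 in arc)  # True False True
```

In the usage example, key 1 survives the later misses because its second access moved it into `t2`, while the one-off key 2 is pushed out into the ghost list `b1`.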

7. Last In First Out (LIFO):

  • Description: LIFO (Last-In-First-Out) is a cache eviction policy where the most recently added item is the first to be removed when the cache reaches its capacity limit.
  • Logic: Keeps the entries that were loaded first and treats the newest arrivals as the most expendable, which can suit workloads where an early-loaded working set stays hot while later items are accessed only once.
  • Advantage: Simple and easy to implement; it needs only the insertion order, with no tracking of access times.
  • Considerations: This may not perform well in scenarios where access patterns do not align with the recency of data additions.
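Python's built-in `dict` makes LIFO eviction short, because since Python 3.7 `dict.popitem()` is guaranteed to remove the most recently inserted pair. In this sketch (class name illustrative), overwriting an existing key keeps its original position.

```python
from typing import Any, Dict, Hashable, Optional


class LIFOCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[Hashable, Any] = {}  # dicts preserve insertion order

    def get(self, key: Hashable) -> Optional[Any]:
        return self.entries.get(key)  # reads do not affect eviction order

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem()  # removes the most recently inserted entry (guaranteed since 3.7)
        self.entries[key] = value


if __name__ == "__main__":
    cache = LIFOCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.put("c", 3)  # evicts "b", the newest entry; "a" stays
    print(cache.get("a"), cache.get("b"))  # 1 None
```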

The choice of a cache eviction policy depends on the specific requirements and characteristics of the application and the nature of the data access patterns. Different policies may be more suitable for different scenarios, and some caching systems may allow for the configuration of custom eviction policies based on the application’s needs.

Why Use a Cache?

  1. Faster Response Times: Caching allows frequently accessed data to be stored in a faster-access medium, such as memory, so that subsequent requests for the same data can be served more quickly. This reduces response times and improves the user experience.
  2. Reduced Latency: By storing copies of data closer to the point of access, caching reduces the need to fetch the data from the original source, such as a database or an external service. This helps in minimizing network latency and improves overall system responsiveness.
  3. Scalability: Caching helps distribute the load on backend services and databases by serving cached data, reducing the overall demand for resources. This is particularly important in microservices architectures and distributed systems.
  4. Bandwidth Conservation: Caching reduces the amount of data that needs to be transmitted over the network, conserving bandwidth. This is particularly beneficial in scenarios where network resources are limited or expensive.
  5. Resource Optimization: Retrieving data from a cache is often less resource-intensive than fetching it from the original source. Caching helps optimize resource usage, as it involves fewer computational and I/O operations.
  6. Improved User Experience: Faster response times and reduced latency contribute to an improved user experience. Applications that load quickly and respond promptly to user interactions tend to be more user-friendly and engaging.
  7. High Availability: Caching can contribute to improved system availability by reducing the reliance on external dependencies. In scenarios where external services are slow or temporarily unavailable, cached data can still be served.
  8. Load Balancing: Caching helps distribute the load more evenly across different components of a system. By serving cached content, the demand on backend servers is reduced, contributing to better load balancing.
  9. Cost Reduction: Caching can lead to cost savings by minimizing the need for expensive computational resources or reducing the consumption of external services. It allows organizations to achieve better performance without a proportional increase in infrastructure costs.
  10. Offline Access: Cached data can be useful for providing functionality even when the application is offline or when there are connectivity issues. Users can still access cached content, improving the robustness of the application.

How Do You Request Data from a Cache?

  1. Check the Cache: Before making a request to the original data source (e.g., a database or a service), the application checks the cache to see if the required data is already present.
  2. Generate a Cache Key: To uniquely identify the data in the cache, a cache key is generated from the parameters of the request. Key generation must be deterministic: the same parameters must always produce the same key, and different parameters must produce different keys, so that different requests are stored separately in the cache.
  3. Lookup in the Cache: The application performs a lookup in the cache using the cache key. If the data associated with the cache key is found, it is considered a cache hit, and the cached data can be returned without accessing the original data source.
  4. Handle Cache Hit: If the cache lookup is successful (cache hit), the application retrieves the cached data and uses it as needed. This process helps avoid the overhead of fetching the data from the original source.
  5. Handle Cache Miss: If the cache lookup is unsuccessful (cache miss), meaning the required data is not in the cache, the application proceeds to fetch the data from the original data source.
  6. Update the Cache: After fetching the data from the original source, the application updates the cache with the newly retrieved information. This helps improve performance for subsequent requests for the same data.




Experienced Software Engineer adept in Java, Spring Boot, Microservices, Kafka & Azure.