My current client has a service which connects to an old IBM z/OS application (legacy system). The data centre charges for each message sent to this legacy system, rather than using a processor or hardware based pricing model. The output from this legacy system is always the same, since the calculations are idempotent. The application calculates prices for travelling along a given route of the train network. Prices are changed only twice a year through an administration tool. So in order to save money (a hundered thousand dollars a year), the service which connects to this legacy system had an in-memory least-recently-used (LRU) cache built into it, which removes the least recently used entries when it gets full in order to make space for new entries. The cache is quite large, thus avoiding making costly calls to the legacy system. To avoid losing the cache content upon server restarts, a background task was later built to periodically persist the latest data inside this cache. Upon starting a server, the cache content is re-read. Entries within the cache have a TTL (time to live) so that stale entries are discarded and re-fetched from the legacy system. This cache was great in the beginning because it saved the customer a lot of money, but in the mean time several similar caches have been added, as well as more general caches for avoiding repeated database reads, causing our nodes to need over 1.5 GB of RAM. Analysis has showed that the caches are…