Front end fails to load with timeout errors when the memcache SG is removed or DNS has not fully flushed the old cache IPs

Description

This is a follow-up from the first war game.
When a newly created ehCache cluster has no SG, pointing the front end at it causes the front end to fail to load: the host resolved through DNS is unreachable, so every attempt to talk to EC times out (many times within a single request, 2.5 s per timeout).

WARN ElasticCacheImpl:180 Error calling memcached operation 'get' with key entry with key 'dev::cloudIdTenantId::00742506-9288-4ed5-aa55-823e5b4a7f79': OperationTimeoutException: Timeout waiting for value: waited 2,500 ms. Node status: Connection Status { /172.31.30.98:11211 active: false, authed: true, last read: 12,516 ms ago }

We expect the request to move forward and hit the DB instead.

Even when we stop the cache cluster, if DNS does not flush in time we still try to reconnect to the old cache IPs and fall into the same reconnect loop. Retrying repeatedly after every failure is what breaks the whole request.
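
A minimal sketch of the expected behaviour, for discussion only: a failed or timed-out cache lookup falls through to the DB, and after a few consecutive failures the cache is skipped for a cool-down period so that one request does not pay the 2.5 s timeout over and over. FailFastCache, failureThreshold and coolDownMillis are hypothetical names and are not part of ElasticCacheImpl; the cache lookup is passed in as a Callable so the sketch does not assume any particular memcached client API.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

/** Hypothetical helper: cache-aside read with DB fallback and a simple cool-down. */
final class FailFastCache<V> {
    private final int failureThreshold;   // consecutive failures before the cache is skipped
    private final long coolDownMillis;    // how long to skip the cache once the threshold is hit
    private int consecutiveFailures = 0;
    private long skipUntilMillis = 0L;

    FailFastCache(int failureThreshold, long coolDownMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    /** Returns the cached value if available, otherwise the value from dbLoader. */
    V get(Callable<V> cacheLookup, Supplier<V> dbLoader) {
        long now = System.currentTimeMillis();
        if (isSkipping(now)) {
            // Cache was recently unhealthy: go straight to the DB, no timeout paid.
            return dbLoader.get();
        }
        try {
            V cached = cacheLookup.call();
            recordSuccess();
            if (cached != null) {
                return cached;
            }
        } catch (Exception e) {
            // Timeout or connection error: count it and fall through to the DB.
            recordFailure(now);
        }
        return dbLoader.get();
    }

    private synchronized boolean isSkipping(long now) {
        return now < skipUntilMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure(long now) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            skipUntilMillis = now + coolDownMillis;
            consecutiveFailures = 0;
        }
    }
}
```

With a threshold of 2 and a 60 s cool-down, a page that performs many lookups against an unreachable node would wait for at most two timeouts before the remaining lookups go directly to the DB. The actual threshold and cool-down values would need to be tuned against real traffic.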

Environment

None

Assignee

Sam Harding

Reporter

Sherry Goyal

Labels

Links

None

Sprint

None

Fix versions

Priority

High