October 30, 2016

Mocking Endpoint Behaviours for Troubleshooting in ESB


This article explains some common timeout and latency scenarios in endpoints and how to troubleshoot them. The following four scenarios are discussed:
  1. Backend responds as expected, with a low response time
  2. Backend timeout
  3. Error 101503: Backend connection refused
  4. Unknown host exception
To explain the different scenarios, the easiest approach is to create a mock service for troubleshooting. For this article I am creating a mock service with SoapUI, but you may use any other tool to do the same. Refer to [1] for more details on creating mock services using SoapUI.

Once the mock service is created, it is accessible at http://localhost:8080/mockservice.

Case 1 : Endpoint Responds with Low Response Time

This case is fairly straightforward. First, create the mock service and point the ESB endpoint definition to the mock service URL. Usually, what needs to be verified is how the mediation flow behaves for different responses. To simulate this, we can create multiple responses and attach them to the mock service, which lets us validate the behaviour for each response.
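If the mock is built with a tool other than SoapUI, the same "several responses" idea takes only a few lines. The following is a minimal Java sketch, assuming the responses are plain strings (for example SOAP envelopes) served in round-robin order. The class name RotatingResponses is hypothetical and not part of SoapUI or the ESB.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out canned responses in round-robin order,
// similar in spirit to attaching several responses to one mock operation.
final class RotatingResponses {
    private final List<String> responses;
    private final AtomicInteger next = new AtomicInteger();

    RotatingResponses(List<String> responses) {
        if (responses.isEmpty()) {
            throw new IllegalArgumentException("at least one response is required");
        }
        // Defensive copy so the rotation cannot change underneath callers.
        this.responses = List.copyOf(responses);
    }

    // Returns the next canned response, wrapping around after the last one.
    // Uses an atomic counter because a mock may serve concurrent requests.
    String nextResponse() {
        int index = Math.floorMod(next.getAndIncrement(), responses.size());
        return responses.get(index);
    }
}
```

For example, constructing it with a success envelope and a fault envelope would alternate between a normal reply and a fault on consecutive requests, so both mediation paths get exercised.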


Case 2 : Endpoint Timeout

Note the timeout duration in the endpoint definition below, which is set to 3000 ms.


<?xml version="1.0" encoding="UTF-8"?>
<endpoint name="MockServiceEndpoint" xmlns="http://ws.apache.org/ns/synapse">
    <http method="post" uri-template="http://localhost:8080/mockservice">
        <!-- Give up after 3000 ms without a response and invoke the fault handler -->
        <timeout>
            <duration>3000</duration>
            <responseAction>fault</responseAction>
        </timeout>
        <suspendOnFailure>
            <errorCodes>-1</errorCodes>
            <initialDuration>0</initialDuration>
            <progressionFactor>1.0</progressionFactor>
            <maximumDuration>0</maximumDuration>
        </suspendOnFailure>
        <markForSuspension>
            <errorCodes>-1</errorCodes>
        </markForSuspension>
    </http>
</endpoint>

Since the endpoint timeout duration is 3000 ms, we can verify the following four cases:

  1. No backend latency
  2. 2000 ms < latency < 3000 ms (latency close to 3000 ms)
  3. latency > 4000 ms (latency much higher than 3000 ms)
  4. latency = 3000 ms
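The expected outcome of each case can be written down as a tiny check to compare against what is observed. This is a hypothetical Java sketch: it hard-codes the 3000 ms duration from the endpoint definition and, following this article, treats a latency exactly equal to the timeout as a timeout (in practice that exact boundary is a race). The sample latencies are illustrative values picked from each range.

```java
// Hypothetical sketch: expected result of each latency case against the
// 3000 ms endpoint timeout. A latency equal to the timeout counts as a timeout.
final class TimeoutExpectation {
    static final long ENDPOINT_TIMEOUT_MS = 3000;

    enum Outcome { RESPONDS, TIMES_OUT }

    static Outcome expected(long backendLatencyMs) {
        return backendLatencyMs >= ENDPOINT_TIMEOUT_MS ? Outcome.TIMES_OUT : Outcome.RESPONDS;
    }

    public static void main(String[] args) {
        // Illustrative latencies for cases 1 to 4, in the order listed above.
        long[] latencies = {0, 2500, 4500, 3000};
        for (int i = 0; i < latencies.length; i++) {
            System.out.printf("case %d: %d ms -> %s%n", i + 1, latencies[i], expected(latencies[i]));
        }
    }
}
```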

Of these four cases, cases 3 and 4 should cause the endpoint to time out. In the other cases the backend should respond, and the rest of the mediation flow should work correctly. To add response latency to the mock service, add a sleep command to the OnRequest script in the mock service window, as below.



Notice the delay in response time for the request that went past 3000 ms in the screenshot below (bottom left).


By changing the sleep duration, we can test the response time behaviour in each of the four scenarios listed above.
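As an alternative to the SoapUI sleep script, a delayed mock can also be written with the JDK's built-in com.sun.net.httpserver.HttpServer. This is a minimal sketch, assuming the /mockservice path and a fixed delay per server; DelayedMockService and its methods are hypothetical names. The call method uses an HttpURLConnection read timeout as a rough client-side stand-in for the endpoint's timeout duration, not the ESB's actual mechanism.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the SoapUI mock: serves /mockservice and sleeps
// before answering, like a sleep command in the OnRequest script.
final class DelayedMockService {
    // Starts the mock on the given port (0 picks a free port) with a fixed delay.
    static HttpServer start(int port, long delayMs) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/mockservice", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                in.readAllBytes(); // consume the request before replying
            }
            try {
                Thread.sleep(delayMs); // simulated backend latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "<response>ok</response>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/xml");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // default executor: requests are handled one at a time
        return server;
    }

    // POSTs to the mock with a read timeout. Returns the HTTP status; throws an
    // IOException (normally a SocketTimeoutException) if no response arrives in time.
    static int call(int port, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
            URI.create("http://localhost:" + port + "/mockservice").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setReadTimeout(readTimeoutMs);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write("<request/>".getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

To reproduce the four cases, start it with DelayedMockService.start(8080, delay) using delays such as 0, 2500, 4500 and 3000 ms, keep the ESB endpoint pointed at http://localhost:8080/mockservice, and stop the server between runs.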

Case 3 : 101503 - Endpoint Connection Refused

When this error occurs, a latency of ~10000 ms is observed from the backend. This is unexpected, since the endpoint timeout is 3000 ms and the response time should not exceed it.

To mock this scenario, host the mock service on a different machine in the same network (referred to as the ‘mock server’ hereafter) and point the endpoint to it. When the ‘mock server’ was disconnected from the network while requests were being sent, a latency of ~10000 ms was observed. Once the ‘mock server’ was reconnected to the network, normal behaviour resumed.

To explain further, the ~10000 ms latency is caused by a connection refused error, where the hostname is valid in DNS but the host is no longer available. This is why the ‘mock server’ had to be disconnected from the network to mock this behaviour. This differs from Case 4 described below, where the hostname itself cannot be resolved.

Another way to mock this behaviour is to use, as the host in the endpoint definition, an IP address that times out when pinged.
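Before digging into the ESB, it can help to time a bare TCP connection attempt from the ESB host to the mock server. The sketch below is hypothetical (ConnectProbe is not part of the ESB) and reports the elapsed time and the exception type that ended the attempt. A reachable host with nothing listening on the port usually fails immediately with ConnectException, whereas an unreachable host typically makes the attempt wait until connectTimeoutMs; the exact exception depends on the OS and network.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: times a single TCP connection attempt to the backend.
final class ConnectProbe {
    // Outcome of one probe: elapsed time and the exception type that ended it
    // (null if the connection succeeded).
    static final class Result {
        final long elapsedMs;
        final String failure;

        Result(long elapsedMs, String failure) {
            this.elapsedMs = elapsedMs;
            this.failure = failure;
        }
    }

    static Result probe(String host, int port, int connectTimeoutMs) {
        long start = System.nanoTime();
        String failure = null;
        try (Socket socket = new Socket()) {
            // An unresolvable host fails here with UnknownHostException; an
            // unreachable one typically with SocketTimeoutException after the timeout.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        } catch (IOException e) {
            failure = e.getClass().getSimpleName();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new Result(elapsedMs, failure);
    }
}
```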

Explanation of the Observation from the WSO2 ESB Perspective

In the ESB, a callback is registered whenever a request is made. This callback is responsible for sending back the response and doing the required processing. When the backend does not respond, these callbacks have to be cleared. Clearing is done by a task (TimeoutHandler) that runs every 15000 ms by default. This interval can be changed by overriding the default with the “synapse.timeout_handler_interval” property in the ‘synapse.properties’ file.
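The override itself is one line in synapse.properties, for example synapse.timeout_handler_interval=5000. The Java sketch below only illustrates the default-versus-override rule described above, reading the value with java.util.Properties; it is not Synapse's own code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustration only (not Synapse code): the interval falls back to the
// 15000 ms default unless synapse.properties overrides it.
final class TimeoutHandlerInterval {
    static final String KEY = "synapse.timeout_handler_interval";
    static final long DEFAULT_MS = 15000;

    // Parses properties-file text and returns the effective interval in ms.
    static long effectiveIntervalMs(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String value = props.getProperty(KEY);
        return value == null ? DEFAULT_MS : Long.parseLong(value.trim());
    }
}
```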

To elaborate further, the TimeoutHandler is executed every 15 s, and an expired callback is cleared at its next run. Thus the response time will be above 3 s (the endpoint timeout duration), plus a wait of up to 15 s until the TimeoutHandler next runs. The attached graph of response times for the mock API we created, at the default interval, further illustrates this.
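The timing can be reproduced with a small model. This sketch assumes, as a simplification, that the TimeoutHandler runs at a fixed rate (at 0, one interval, twice the interval and so on) and that a callback is cleared at the first run strictly after its send time plus the endpoint timeout. It is not Synapse's implementation, and TimeoutHandlerModel is a hypothetical name.

```java
// Simplified model (assumption, not Synapse code): the handler runs at
// 0, interval, 2 * interval, ...; a callback registered at sentAtMs is cleared
// at the first run strictly after sentAtMs + endpointTimeoutMs.
final class TimeoutHandlerModel {
    // All times in ms and assumed non-negative.
    static long responseTimeMs(long sentAtMs, long endpointTimeoutMs, long intervalMs) {
        long deadline = sentAtMs + endpointTimeoutMs;
        long firstRunAfterDeadline = (deadline / intervalMs + 1) * intervalMs;
        return firstRunAfterDeadline - sentAtMs;
    }

    public static void main(String[] args) {
        // Requests sent at different points in the default 15000 ms cycle.
        for (long sentAt : new long[] {0, 5000, 10000}) {
            System.out.println(sentAt + " ms -> " + responseTimeMs(sentAt, 3000, 15000) + " ms");
        }
    }
}
```

With the defaults (3000 ms timeout, 15000 ms interval), a request sent 5000 ms after a handler run is cleared at the 15000 ms run, giving a 10000 ms response time, in line with the ~10000 ms observed in Case 3.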


Reducing the timeout_handler_interval brings the response time into a lower range. However, the TimeoutHandler task then runs more frequently, which in turn impacts performance.
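Under the same simplified fixed-rate assumption (a model, not measured data), the trade-off can be put into numbers: the worst-case wait is roughly the endpoint timeout plus one interval, while the number of handler runs per minute grows as the interval shrinks. The sketch below uses the two reduced intervals from the observations that follow; IntervalTradeoff is a hypothetical name.

```java
// Rough trade-off under an assumed fixed-rate TimeoutHandler schedule.
final class IntervalTradeoff {
    // Worst case: the deadline falls just as a run happens, so the callback
    // waits one more full interval.
    static long worstCaseResponseMs(long endpointTimeoutMs, long intervalMs) {
        return endpointTimeoutMs + intervalMs;
    }

    // How often the handler task executes.
    static long runsPerMinute(long intervalMs) {
        return 60_000 / intervalMs;
    }

    public static void main(String[] args) {
        for (long interval : new long[] {5000, 1000}) {
            System.out.printf("interval %d ms: worst case ~%d ms, %d runs/min%n",
                interval, worstCaseResponseMs(3000, interval), runsPerMinute(interval));
        }
    }
}
```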

To confirm the above description, refer to the following observations with a reduced timeout_handler_interval.

timeout_handler_interval = 5000 ms

timeout_handler_interval = 1000 ms

Case 4 : Unknown Host Exception

Another scenario verified was an invalid host name, which results in an ‘Unknown Host Exception’. This is handled within the 3000 ms timespan, because the name fails to resolve before any connection is attempted.
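The difference from Case 3 is visible with a plain DNS lookup: an unresolvable name makes InetAddress.getByName throw UnknownHostException before any TCP connection is attempted. A minimal Java sketch; the host name mockservice.invalid is a hypothetical example, chosen because the .invalid top-level domain is reserved and never resolves.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: an unresolvable host name fails at the DNS lookup, before any
// connection attempt, so it does not wait on the endpoint timeout.
final class UnknownHostCheck {
    // Returns true if the name resolves, false if the lookup throws UnknownHostException.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ".invalid" is a reserved top-level domain, so this lookup fails.
        System.out.println(resolves("mockservice.invalid"));
    }
}
```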

Apart from these, there are a number of other error scenarios to consider when troubleshooting endpoints. Refer to [2] for the possible endpoint error scenarios and their codes.

References

[1] https://www.soapui.org/soap-mocking/getting-started.html
[2] https://docs.wso2.com/display/ESB480/Error+Handling#ErrorHandling-codes