This document describes domain failure retry processing in the Oracle WebLogic Server in Kubernetes environment.
The WebLogic Kubernetes Operator may encounter various failures during its processing of a Domain resource.
Failures are reported using Kubernetes events and conditions in the status.conditions field of the Domain resource.
See Domain debugging.
Failures fall into different categories and are handled differently by the operator, where most failures lead to automatic retries.
Refer to Retry behavior for information on tuning failure retry limits and intervals.
Domain resource failures fall into three severity levels:
SEVERE errors in the introspector log that do not contain the special marker string
Severe level that have reached the expected maximum retry time.
For reasons for Domain failures, see Domain failure reasons.
Status:
  ...
  Conditions:
    Last Transition Time:  2022-10-10T23:48:09.157398Z
    Message:               10 replicas specified for cluster 'cluster-1' which has a maximum cluster size of 5
                           10 replicas specified for cluster 'cluster-2' which has a maximum cluster size of 2
    Reason:                ReplicasTooHigh
    Severity:              Warning
    Status:                True
    Type:                  Failed
  ...
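A script that reads the Domain resource as JSON could pick out failures like the one above from status.conditions. The following is a minimal sketch, not the operator's own code. It assumes the JSON condition keys are `type`, `status`, `reason`, `severity`, and `message` (inferred from the fields displayed in the status), and the `domain` dictionary is a hypothetical fragment built for illustration.

```python
from typing import Any


def failed_conditions(domain: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the conditions of type Failed whose status is True."""
    conditions = domain.get("status", {}).get("conditions", [])
    return [
        c for c in conditions
        if c.get("type") == "Failed" and str(c.get("status")) == "True"
    ]


# Hypothetical Domain fragment mirroring the status shown above.
# The key names are assumptions; check them against your own Domain JSON.
domain = {
    "status": {
        "conditions": [
            {
                "type": "Failed",
                "status": "True",
                "reason": "ReplicasTooHigh",
                "severity": "Warning",
                "message": "10 replicas specified for cluster 'cluster-1' "
                           "which has a maximum cluster size of 5",
            },
            # A non-failure condition, included only to show the filtering.
            {"type": "Available", "status": "False"},
        ]
    }
}

for cond in failed_conditions(domain):
    print(f"{cond['severity']}: {cond['reason']}: {cond['message']}")
```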
Domains that have failures with a severity of Warning will not be retried. The domain status should contain a message indicating what action is needed to fix the failure condition.
Domain failures with a severity of Severe will be retried as follows:
The next retry time is measured from the lastFailureTime field in the domain status.
The interval between retries is set by the failureRetryIntervalSeconds field in the Domain spec. It has a default value of 120 seconds. A value of zero seconds means retry immediately after failure.
The retry time limit is measured from the initialFailureTime field in the domain status.
The retry time limit is set by the failureRetryLimitMinutes field in the Domain spec. It has a default value of 1440 minutes (24 hours). A value of zero minutes will disable retries, which can be useful for accessing log files for debugging purposes.
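The retry rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the operator's implementation: the function name and parameters are hypothetical, and the choice to schedule no retry once the next attempt would fall after the time limit is an assumption about boundary behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_retry_time(
    severity: str,
    initial_failure: datetime,
    last_failure: datetime,
    interval_seconds: int = 120,  # default of failureRetryIntervalSeconds
    limit_minutes: int = 1440,    # default of failureRetryLimitMinutes
) -> Optional[datetime]:
    """Return when the next retry would be due, or None if none is expected."""
    if severity != "Severe":
        return None  # only Severe failures are retried automatically
    if limit_minutes == 0:
        return None  # a limit of zero minutes disables retries
    retry_until = initial_failure + timedelta(minutes=limit_minutes)
    # An interval of zero seconds yields an immediate retry.
    candidate = last_failure + timedelta(seconds=interval_seconds)
    # Assumption: no retry is scheduled past the time limit.
    return candidate if candidate <= retry_until else None


first_failure = datetime(2022, 10, 11, 23, 16, 21, tzinfo=timezone.utc)
print(next_retry_time("Severe", first_failure, first_failure))
print(next_retry_time("Warning", first_failure, first_failure))
print(next_retry_time("Severe", first_failure, first_failure, limit_minutes=0))
```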
The following is an example of a domain status showing a failure with pending retries. This Domain resource is configured with a failureRetryLimitMinutes of 10 minutes. Note that the next retry is 120 seconds after the Last Failure Time, and the retry until time is 10 minutes after the Initial Failure Time.
Status:
  ...
  Initial Failure Time:  2022-10-11T23:16:21.851801Z
  Last Failure Time:     2022-10-11T23:21:53.109997Z
  Message:               Failure on pod 'domain1-introspector-hlvwt' in namespace 'default': Back-off pulling image "oracle/weblogic:12214". Will retry next at 2022-10-11T23:23:53.109997240Z and approximately every 120 seconds afterward until 2022-10-11T23:26:21.851801Z if the failure is not resolved.
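The two times quoted in this message can be checked by hand. The short calculation below is only a worked check of the arithmetic, using the timestamps from the status, the 120-second interval, and the 10-minute limit. The helper name `parse_status_time` is hypothetical.

```python
from datetime import datetime, timedelta


def parse_status_time(text: str) -> datetime:
    """Parse a timestamp such as 2022-10-11T23:16:21.851801Z."""
    # %f accepts up to six fractional digits; %z accepts the trailing Z.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


initial_failure = parse_status_time("2022-10-11T23:16:21.851801Z")
last_failure = parse_status_time("2022-10-11T23:21:53.109997Z")

# Next retry: Last Failure Time plus the 120-second retry interval.
next_retry = last_failure + timedelta(seconds=120)

# Retry until: Initial Failure Time plus the 10-minute retry limit.
retry_until = initial_failure + timedelta(minutes=10)

print("next retry: ", next_retry.isoformat())
print("retry until:", retry_until.isoformat())
```

Both results agree with the times in the message, apart from the extra fractional digits that the operator prints for the next retry time.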
In the following example, all retries failed to start the domain before the predefined retry time limit, and the domain status shows a Fatal failure with reason Aborted:
Status:
  Clusters:
  Conditions:
    Last Transition Time:  2022-10-11T23:26:34.107662Z
    Message:               The operator failed after retrying for 10 minutes. This time limit may be specified in spec.failureRetryLimitMinutes. Please resolve the error and then update domain.spec.introspectVersion to force another retry.
    Reason:                Aborted
    Severity:              Fatal
    Status:                True
    ...
To manually initiate an immediate retry, or to restart retries that have reached their spec.failureRetryLimitMinutes, update a domain field that will cause immediate action by the operator. For example, change spec.introspectVersion or spec.restartVersion as appropriate. See Startup and shutdown and Initiating introspection.
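One way to make such a change is with a JSON merge patch on the Domain resource. The sketch below only builds the patch body. The function name is hypothetical, and incrementing a numeric version is an assumption made for illustration, since any changed value of spec.introspectVersion should serve the same purpose.

```python
import json


def introspect_version_patch(current: str) -> str:
    """Build a JSON merge patch that changes spec.introspectVersion."""
    # Assumption: increment numeric versions, otherwise append a suffix.
    if current.isdigit():
        new_version = str(int(current) + 1)
    else:
        new_version = current + "-retry"
    return json.dumps({"spec": {"introspectVersion": new_version}})


patch = introspect_version_patch("1")
print(patch)

# The patch could then be applied with kubectl, for example (the domain
# name and namespace here are assumptions taken from the example status):
#   kubectl patch domain domain1 -n default --type=merge -p '<patch body>'
```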
The following is a list of reasons for failures that may be encountered by the operator while processing a Domain resource.
| Domain Failure Reason | Description |
|---|---|
| | One or more configuration validation errors in the Domain resource, such as the |
| | One or more |
| | Unrecoverable response code received from a Kubernetes API call. |
| | One or more WebLogic Server pods failed or did not get into the ready state within a predefined maximum wait time as configured in |
| ReplicasTooHigh | The replicas field is set or changed to a value that exceeds the maximum number of servers in the WebLogic cluster configuration. |
| | The operator encountered an internal exception while processing the Domain resource. |
| | One or more servers or clusters configured in the Domain resource do not exist in the WebLogic domain configuration, or the monitoring exporter port is specified and it conflicts with a server port. |
| Aborted | The introspector encountered a fatal error or the operator has exceeded the maximum retry time. |