This document describes domain failure retry processing in the Oracle WebLogic Server in Kubernetes environment.
The WebLogic Kubernetes Operator may encounter various failures during its processing of a Domain resource.
Failures are reported using Kubernetes events and conditions
in the status.conditions field in the Domain resource.
See Domain debugging.
Failures fall into different categories and are handled differently by the operator, where most failures lead to automatic retries.
Refer to Retry behavior for information on tuning failure retry limits and intervals.
Domain resource failures fall into three severity levels:
- … DeadlineExceeded error).
- SEVERE errors in the introspector log that do not contain the special marker string FatalIntrospectorError.
- … FatalIntrospectorError.
- … Severe level that have reached the expected maximum retry time.

For the reasons for Domain failures, see Domain failure reasons.
Domains that have failures with a severity of Fatal or Warning will not be retried. The domain status should contain a message indicating what action is needed to fix the failure condition.
Domain failures with a severity of Severe will be retried as follows:
- The time of the most recent failure is reported in the lastFailureTime field in the domain status.
- The retry interval is specified by the failureRetryIntervalSeconds field in the Domain spec. It has a default value of 120 seconds. A value of zero seconds means retry immediately after failure.
- The time of the first failure is reported in the initialFailureTime field in the domain status.
- The retry time limit is specified by the failureRetryLimitMinutes field in the Domain spec. It has a default value of 1440 minutes (24 hours). A value of zero minutes disables retries, which can be useful for accessing log files for debugging purposes.

The following is an example of domain status showing a failure with pending retries. This Domain resource is configured to have a
failureRetryLimitMinutes of 10 minutes. Note that the next retry is 120 seconds after the Last Failure Time,
and the retry until time is 10 minutes after the Initial Failure Time.
Status:
...
Initial Failure Time: 2022-10-11T23:16:21.851801Z
Last Failure Time: 2022-10-11T23:21:53.109997Z
Message: Failure on pod 'domain1-introspector-hlvwt' in namespace 'default': Back-off pulling image "oracle/weblogic:12214". Will retry next at 2022-10-11T23:23:53.109997240Z and approximately every 120 seconds afterward until 2022-10-11T23:26:21.851801Z if the failure is not resolved.
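The two times in the status message can be reproduced with simple arithmetic. The following Python sketch is illustrative only and is not operator code: the function names `parse_status_time` and `retry_window` are hypothetical, and the sketch assumes timestamps with microsecond precision in the format shown for the Initial Failure Time and Last Failure Time fields.

```python
from datetime import datetime, timedelta, timezone


def parse_status_time(value: str) -> datetime:
    """Parse a UTC timestamp such as '2022-10-11T23:16:21.851801Z'.

    Assumption: at most six fractional digits, as in the failure time fields.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def retry_window(initial_failure: str, last_failure: str,
                 interval_seconds: int = 120, limit_minutes: int = 1440):
    """Return (next_retry, retry_until) for a Severe failure.

    The defaults mirror the documented defaults of
    failureRetryIntervalSeconds and failureRetryLimitMinutes.
    """
    next_retry = parse_status_time(last_failure) + timedelta(seconds=interval_seconds)
    retry_until = parse_status_time(initial_failure) + timedelta(minutes=limit_minutes)
    return next_retry, retry_until


# Values taken from the example status, with a 10 minute retry limit.
next_retry, retry_until = retry_window(
    "2022-10-11T23:16:21.851801Z",
    "2022-10-11T23:21:53.109997Z",
    interval_seconds=120,
    limit_minutes=10,
)
print(next_retry.isoformat())   # 2022-10-11T23:23:53.109997+00:00
print(retry_until.isoformat())  # 2022-10-11T23:26:21.851801+00:00
```

Both results agree with the "Will retry next at" and "until" times in the example message.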
In the following example, all retries failed to start the domain before the retry time limit expired, and the domain status shows a Fatal failure with the reason Aborted.
Status:
Clusters:
Conditions:
Last Transition Time: 2022-10-11T23:26:34.107662Z
Message: The operator failed after retrying for 10 minutes. This time limit may be specified in spec.failureRetryLimitMinutes. Please resolve the error and then update domain.spec.introspectVersion to force another retry.
Reason: Aborted
Severity: Fatal
Status: True
...
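As a rough illustration of how a script might act on these conditions, the following Python sketch separates failures that the operator will keep retrying from those that need manual action. It is a sketch under stated assumptions, not operator code: the function name `needs_manual_action` is hypothetical, and the lowercase keys `status`, `severity`, and `reason` are assumed to be how the fields shown above appear when status.conditions is read as JSON.

```python
# Severities that the operator does not retry, per the rules described earlier.
MANUAL_SEVERITIES = {"Fatal", "Warning"}


def needs_manual_action(conditions):
    """Return the active failure conditions that will not be retried.

    `conditions` is a list of dicts; the key names are assumptions
    (see the paragraph above).
    """
    return [
        c for c in conditions
        if c.get("status") == "True" and c.get("severity") in MANUAL_SEVERITIES
    ]


# Example modeled on the Aborted condition shown above, plus a Severe one.
conditions = [
    {"status": "True", "severity": "Fatal", "reason": "Aborted"},
    {"status": "True", "severity": "Severe", "reason": "ServerPod"},
]
for condition in needs_manual_action(conditions):
    print(condition["reason"])  # Aborted
```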
To manually initiate an immediate retry, or to restart retries that have reached their
spec.failureRetryLimitMinutes, update a domain field that will cause immediate action by the operator.
For example, change spec.introspectVersion or spec.restartVersion as appropriate.
See Startup and shutdown and Initiating introspection.
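One way to change spec.introspectVersion is a JSON merge patch applied with `kubectl patch`. The following Python sketch only builds the command and does not run it. The function name `introspect_patch_command` and the domain name, namespace, and version values are hypothetical placeholders, and the sketch assumes that the Domain resource can be addressed as `domain` by `kubectl`.

```python
import json


def introspect_patch_command(domain_name: str, namespace: str, new_version: str):
    """Build a kubectl command that sets spec.introspectVersion.

    Changing this field prompts the operator to act immediately, which
    also restarts retries that have reached their time limit.
    """
    patch = json.dumps({"spec": {"introspectVersion": str(new_version)}})
    return [
        "kubectl", "patch", "domain", domain_name,
        "-n", namespace,
        "--type=merge",
        "-p", patch,
    ]


# Hypothetical values; substitute those of the affected Domain resource.
command = introspect_patch_command("domain1", "default", "2")
print(command[-1])  # {"spec": {"introspectVersion": "2"}}
```

The returned list can be passed to `subprocess.run` on a machine where `kubectl` is configured for the cluster. The same pattern applies to spec.restartVersion.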
The following is a list of reasons for failures that may be encountered by the operator while processing a Domain resource.
| Domain Failure Reason | Description |
|---|---|
| DomainInvalid | One or more configuration validation errors in the Domain resource, such as a domainUID that is too long, or configuration overrides used in a Model In Image domain. |
| Introspection | One or more SEVERE log messages are found in the introspector’s log file. |
| Kubernetes | Unrecoverable response code received from a Kubernetes API call. |
| ServerPod | One or more WebLogic Server pods failed or did not get into the ready state within a predefined maximum wait time as configured in spec.serverPod.maxReadyWaitTimeSeconds in the Domain resource, or the introspector job pod did not complete. |
| ReplicasTooHigh | The replicas field is set or changed to a value that exceeds the maximum number of servers in the WebLogic cluster configuration. |
| Internal | The operator encountered an internal exception while processing the Domain resource. |
| TopologyMismatch | One or more servers or clusters configured in the Domain resource do not exist in the WebLogic domain configuration, or the monitoring exporter port is specified and it conflicts with a server port. |
| Aborted | The introspector encountered a fatal error or the operator has exceeded the maximum retry time. |