This document describes domain failure retry processing in the Oracle WebLogic Server in Kubernetes environment.
The WebLogic Kubernetes Operator may encounter various failures during its processing of a Domain resource.
Failures are reported using Kubernetes events and conditions in the status.conditions field in the Domain resource.
See Domain debugging.
Failures fall into different categories and are handled differently by the operator, where most failures lead to automatic retries.
For information about tuning failure retry limits and intervals, see Retry behavior.
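Domain resource failures fall into three severity levels:
- Warning: failures such as the ReplicasTooHigh failure in the example status below.
- Severe: failures that include a DeadlineExceeded error, and SEVERE errors in the introspector log that do not contain the special marker string FatalIntrospectorError.
- Fatal: errors marked with FatalIntrospectorError, and failures at the Severe level that have reached the expected maximum retry time.

For the reasons for Domain failures, see Domain failure reasons.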
Status:
  ...
  Conditions:
    Last Transition Time: 2022-10-10T23:48:09.157398Z
    Message: 10 replicas specified for cluster 'cluster-1' which has a maximum cluster size of 5
             10 replicas specified for cluster 'cluster-2' which has a maximum cluster size of 2
    Reason: ReplicasTooHigh
    Severity: Warning
    Status: True
    Type: Failed
  ...
Domains that have failures with a severity of Fatal or Warning will not be retried. The domain status should contain a message indicating what action is needed to fix the failure condition.
Domain failures with a severity of Severe will be retried as follows:
- Each retry is scheduled relative to the lastFailureTime field in the domain status.
- The interval between retries is specified by the failureRetryIntervalSeconds field in the Domain spec. It has a default value of 120 seconds. A value of zero seconds means retry immediately after failure.
- Retries stop once the time limit, measured from the initialFailureTime field in the domain status, is reached.
- The time limit is specified by the failureRetryLimitMinutes field in the Domain spec. It has a default value of 1440 minutes (24 hours). A value of zero minutes will disable retries, which can be useful for accessing log files for debugging purposes.

The following is an example of domain status showing a failure with pending retries. This Domain resource is configured to have a failureRetryLimitMinutes of 10 minutes. Note that the next retry is 120 seconds after the Last Failure Time, and the retry until time is 10 minutes after the Initial Failure Time.
Status:
  ...
  Initial Failure Time: 2022-10-11T23:16:21.851801Z
  Last Failure Time: 2022-10-11T23:21:53.109997Z
  Message: Failure on pod 'domain1-introspector-hlvwt' in namespace 'default': Back-off pulling image "oracle/weblogic:12214". Will retry next at 2022-10-11T23:23:53.109997240Z and approximately every 120 seconds afterward until 2022-10-11T23:26:21.851801Z if the failure is not resolved.
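The timing in this status message can be checked with a little arithmetic. The following Python sketch is only an illustration of the rule described above, not the operator's implementation; the function names `parse_time` and `retry_schedule` are hypothetical, and the timestamps are assumed to use microsecond precision with a trailing `Z`, as in the status fields shown.

```python
from datetime import datetime, timedelta, timezone


def parse_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2022-10-11T23:16:21.851801Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def retry_schedule(initial_failure_time: str,
                   last_failure_time: str,
                   retry_interval_seconds: int = 120,
                   retry_limit_minutes: int = 1440):
    """Return (next_retry, retry_until), or None when retries are disabled.

    The defaults mirror the documented defaults of
    failureRetryIntervalSeconds and failureRetryLimitMinutes.
    """
    if retry_limit_minutes == 0:
        # A limit of zero minutes disables retries.
        return None
    next_retry = parse_time(last_failure_time) + timedelta(
        seconds=retry_interval_seconds)
    retry_until = parse_time(initial_failure_time) + timedelta(
        minutes=retry_limit_minutes)
    return next_retry, retry_until


# Values taken from the example status, with a 10 minute retry limit.
schedule = retry_schedule("2022-10-11T23:16:21.851801Z",
                          "2022-10-11T23:21:53.109997Z",
                          retry_limit_minutes=10)
print(schedule[0].isoformat())  # 2022-10-11T23:23:53.109997+00:00
print(schedule[1].isoformat())  # 2022-10-11T23:26:21.851801+00:00
```

Both results agree with the "Will retry next at" and "until" times in the example message.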
In the following example, all retries failed to start the domain before the predefined retry time limit, and the domain status shows a Fatal failure with the reason Aborted.
Status:
  Clusters:
  Conditions:
    Last Transition Time: 2022-10-11T23:26:34.107662Z
    Message: The operator failed after retrying for 10 minutes. This time limit may be specified in spec.failureRetryLimitMinutes. Please resolve the error and then update domain.spec.introspectVersion to force another retry.
    Reason: Aborted
    Severity: Fatal
    Status: True
    ...
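The rule that only Severe failures are retried can be sketched as a small check over the conditions in the domain status. This is a minimal illustration, assuming the status has been loaded as a Python dictionary (for example, from JSON output) and that the condition keys are the lower camel case forms `type`, `status`, `severity`, and `reason` of the fields shown in the describe output; the function names are hypothetical.

```python
RETRIED_SEVERITIES = {"Severe"}  # Fatal and Warning failures are not retried


def failed_conditions(status: dict) -> list:
    """Return the conditions of type Failed whose status is True."""
    return [
        condition
        for condition in status.get("conditions", [])
        if condition.get("type") == "Failed"
        and str(condition.get("status")) == "True"
    ]


def needs_manual_action(condition: dict) -> bool:
    """Return True when the failure will not be retried automatically."""
    return condition.get("severity") not in RETRIED_SEVERITIES


# Sample status modeled on the ReplicasTooHigh example in this document.
sample_status = {
    "conditions": [
        {"type": "Failed", "status": "True",
         "severity": "Warning", "reason": "ReplicasTooHigh"},
    ]
}
for condition in failed_conditions(sample_status):
    action = "fix and retry manually" if needs_manual_action(condition) \
        else "operator will retry"
    print(condition["reason"], condition["severity"], action)
```

For the sample status, the loop reports the ReplicasTooHigh failure as one that needs manual action, because its severity is Warning.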
To manually initiate an immediate retry, or to restart retries that have reached their spec.failureRetryLimitMinutes, update a domain field that will cause immediate action by the operator.
For example, change spec.introspectVersion or spec.restartVersion as appropriate.
See Startup and shutdown and Initiating introspection.
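One way to change spec.introspectVersion is with a JSON merge patch. The sketch below only builds the patch body and a matching `kubectl patch` argument list; it does not run anything. The helper names are hypothetical, and the domain name `domain1`, the namespace `default`, and the new version value `2` are assumptions chosen for illustration.

```python
import json


def introspect_version_patch(new_version: str) -> str:
    """Build a JSON merge patch body that sets spec.introspectVersion."""
    return json.dumps({"spec": {"introspectVersion": new_version}})


def kubectl_patch_command(domain_name: str, namespace: str,
                          new_version: str) -> list:
    """Build the kubectl argument list that applies the merge patch."""
    return [
        "kubectl", "patch", "domain", domain_name,
        "-n", namespace,
        "--type=merge",
        "-p", introspect_version_patch(new_version),
    ]


# Assumed names: domain 'domain1' in namespace 'default'.
print(" ".join(kubectl_patch_command("domain1", "default", "2")))
```

The new value only needs to differ from the current introspectVersion. The argument list could be passed to `subprocess.run` once the names match the actual deployment.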
The following is a list of reasons for failures that may be encountered by the operator while processing a Domain resource.
Domain Failure Reason | Description |
---|---|
DomainInvalid | One or more configuration validation errors in the Domain resource, such as a domainUID that is too long, or configuration overrides used in a Model In Image domain. |
Introspection | One or more SEVERE log messages are found in the introspector's log file. |
Kubernetes | Unrecoverable response code received from a Kubernetes API call. |
ServerPod | One or more WebLogic Server pods failed or did not get into the ready state within a predefined maximum wait time as configured in spec.serverPod.maxReadyWaitTimeSeconds in the Domain resource, or the introspector job pod did not complete. |
ReplicasTooHigh | The replicas field is set or changed to a value that exceeds the maximum number of servers in the WebLogic cluster configuration. |
Internal | The operator encountered an internal exception while processing the Domain resource. |
TopologyMismatch | One or more servers or clusters configured in the Domain resource do not exist in the WebLogic domain configuration, or the monitoring exporter port is specified and it conflicts with a server port. |
Aborted | The introspector encountered a fatal error or the operator has exceeded the maximum retry time. |