
dns: introduce CRD DNS architecture and DNS status reporting #251

Merged: 7 commits into openshift:master on Jun 19, 2019

Conversation

ironcladlou (Contributor)

  • Introduce a dnsrecord.ingress.operator.openshift.io resource type to represent
    DNS records in a Kube-native API (a sketch of the type follows this list).

  • Refactor ingress controller to manage dnsrecords rather than inferring DNS
    record requirements indirectly through services.

  • Teach the ingresscontroller status computer about dnsrecords so DNS management
    status can surface through all operator CRDs in a way consistent with other
    resources (e.g. services).

  • Introduce a DNS controller which syncs dnsrecords with the cluster's
    configured DNS zones, reporting status on the dnsrecord.
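
A minimal Go sketch of the new resource type, inferred from the YAML examples below (package layout, field names, and the condition struct are assumptions, not the exact definitions merged in this PR):

package v1

import (
	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DNSRecord represents a DNS record managed by the ingress operator.
type DNSRecord struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DNSRecordSpec   `json:"spec,omitempty"`
	Status DNSRecordStatus `json:"status,omitempty"`
}

// DNSRecordSpec describes the desired record.
type DNSRecordSpec struct {
	// DNSName is the fully qualified name to publish (e.g. a wildcard).
	DNSName string `json:"dnsName,omitempty"`
	// RecordType is the record type, e.g. CNAME or ALIAS.
	RecordType string `json:"recordType,omitempty"`
	// Targets are the values the record points at.
	Targets []string `json:"targets,omitempty"`
}

// DNSRecordStatus reports per-zone publication state.
type DNSRecordStatus struct {
	Zones []DNSZoneStatus `json:"zones,omitempty"`
}

// DNSZoneStatus couples a configured zone with the conditions observed
// while publishing the record to that zone.
type DNSZoneStatus struct {
	DNSZone    configv1.DNSZone   `json:"dnsZone"`
	Conditions []DNSZoneCondition `json:"conditions,omitempty"`
}

// DNSZoneCondition is the operator's observation of a zone's state.
type DNSZoneCondition struct {
	Type               string      `json:"type"`
	Status             string      `json:"status"`
	Reason             string      `json:"reason,omitempty"`
	Message            string      `json:"message,omitempty"`
	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty"`
}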

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 13, 2019
@ironcladlou (Contributor Author)

Here are some status examples:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec: {}
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2019-06-13T15:26:46Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-13T15:26:16Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2019-06-13T15:26:19Z"
    message: The LoadBalancer service is provisioned
    reason: LoadBalancerProvisioned
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2019-06-13T15:26:16Z"
    message: DNS management is supported and zones are specified in the cluster DNS
      config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2019-06-13T17:29:32Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady
  domain: apps.dmace.devcluster.openshift.com
  endpointPublishingStrategy:
    type: LoadBalancerService
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

apiVersion: ingress.operator.openshift.io/v1
kind: DNSRecord
metadata:
  creationTimestamp: "2019-06-13T16:00:59Z"
  finalizers:
  - operator.openshift.io/ingress
  generation: 1
  labels:
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
  name: default-wildcard
  namespace: openshift-ingress-operator
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    controller: true
    kind: IngressController
    name: default
    uid: 953bfbd0-8def-11e9-b5d6-02915d60e384
  resourceVersion: "58271"
  selfLink: /apis/ingress.operator.openshift.io/v1/namespaces/openshift-ingress-operator/dnsrecords/default-wildcard
  uid: 6fdb10e0-8df4-11e9-b5d6-02915d60e384
spec:
  dnsName: '*.apps.dmace.devcluster.openshift.com'
  recordType: ALIAS
  targets:
  - a95dad0368def11e9b5d602915d60e38-93147362.us-east-2.elb.amazonaws.com
status:
  zones:
  - dnsZone:
      tags:
        Name: dmace-rl775-int
        kubernetes.io/cluster/dmace-rl775: owned
  - dnsZone:
      id: Z3URY6TWQ91KVV

@ironcladlou (Contributor Author)

Here's what things look like when dnsrecord status doesn't report any zones:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: "2019-06-13T15:26:15Z"
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: default
  namespace: openshift-ingress-operator
  resourceVersion: "58272"
  selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/default
  uid: 953bfbd0-8def-11e9-b5d6-02915d60e384
spec: {}
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2019-06-13T15:26:46Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-13T15:26:16Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2019-06-13T15:26:19Z"
    message: The LoadBalancer service is provisioned
    reason: LoadBalancerProvisioned
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2019-06-13T15:26:16Z"
    message: DNS management is supported and zones are specified in the cluster DNS
      config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2019-06-13T15:26:19Z"
    message: The record isn't present in any zones.
    reason: NoZones
    status: "False"
    type: DNSReady
  domain: apps.dmace.devcluster.openshift.com
  endpointPublishingStrategy:
    type: LoadBalancerService
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

apiVersion: ingress.operator.openshift.io/v1
kind: DNSRecord
metadata:
  creationTimestamp: "2019-06-13T16:00:59Z"
  finalizers:
  - operator.openshift.io/ingress
  generation: 1
  labels:
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
  name: default-wildcard
  namespace: openshift-ingress-operator
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    controller: true
    kind: IngressController
    name: default
    uid: 953bfbd0-8def-11e9-b5d6-02915d60e384
  resourceVersion: "39161"
  selfLink: /apis/ingress.operator.openshift.io/v1/namespaces/openshift-ingress-operator/dnsrecords/default-wildcard
  uid: 6fdb10e0-8df4-11e9-b5d6-02915d60e384
spec:
  dnsName: '*.apps.dmace.devcluster.openshift.com'
  recordType: ALIAS
  targets:
  - a95dad0368def11e9b5d602915d60e38-93147362.us-east-2.elb.amazonaws.com
status: {}


// If the DNS record was deleted, clean up and return.
if record.DeletionTimestamp != nil {
	for i := range zones {
Contributor Author

Should we look at record.Status.Zones instead?

Contributor Author

Fixed
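
A sketch of cleanup driven by status (the provider's Delete method and the status field names follow code quoted elsewhere in this thread; exact signatures are assumptions):

// If the DNS record was deleted, clean up using the zones recorded in
// status rather than the currently configured zones, so we only touch
// zones the record was actually published to.
if record.DeletionTimestamp != nil {
	var errs []error
	for i := range record.Status.Zones {
		zone := record.Status.Zones[i].DNSZone
		if err := r.dnsProvider.Delete(record, zone); err != nil {
			errs = append(errs, err)
		}
	}
	return utilerrors.NewAggregate(errs) // k8s.io/apimachinery/pkg/util/errors
}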

}
if len(errs) == 0 {
	updated := record.DeepCopy()
	if slice.ContainsString(updated.Finalizers, "operator.openshift.io/ingress") {
Contributor Author

Use a constant for the finalizer
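
A minimal sketch of the suggested constant (the constant name and the RemoveString helper are assumptions; ContainsString is used in the quoted code above):

// dnsRecordFinalizer guards dnsrecords against deletion until the
// provider records have been cleaned up.
const dnsRecordFinalizer = "operator.openshift.io/ingress"

if slice.ContainsString(updated.Finalizers, dnsRecordFinalizer) {
	updated.Finalizers = slice.RemoveString(updated.Finalizers, dnsRecordFinalizer)
}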

})
}

// TODO: only update if status changed
Contributor Author

Implement this and the AWS client caching can be deleted

Contributor Author

Improved the TODO but don't intend to fix now because:

  1. It's a bit of work to get right
  2. It involves refactoring the DNS provider implementations
  3. We're already better off than before because we now only call ensure on creation (so once during startup, and once during ingresscontroller creation), so fixing the last problem here has low ROI for this particular PR (which is already huge)
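
A minimal sketch of the guard the TODO describes, assuming the controller-runtime client used elsewhere in the operator (a semantic compare would be preferable to reflect.DeepEqual in practice):

// Skip the API write (and the provider churn it triggers) when the
// computed status matches what is already persisted.
if !reflect.DeepEqual(current.Status, updated.Status) {
	if err := r.client.Status().Update(context.TODO(), updated); err != nil {
		return fmt.Errorf("failed to update dnsrecord %s/%s status: %v", updated.Namespace, updated.Name, err)
	}
}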

if dnsConfig.Spec.PublicZone == nil && dnsConfig.Spec.PrivateZone == nil {
	return []operatorv1.OperatorCondition{
		operatorv1.OperatorCondition{
			Type: operatorv1.DNSManagedIngressConditionType,
Contributor Author

In the e2e test, if this condition is true, wait for DNS on the ingress status to be ready

@knobunc

knobunc commented Jun 14, 2019

The CI failure was caused by https://bugzilla.redhat.com/show_bug.cgi?id=1720678. It should be fixed shortly.

@ironcladlou ironcladlou changed the title WIP: dns: introduce CRD DNS architecture and DNS status reporting dns: introduce CRD DNS architecture and DNS status reporting Jun 14, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 14, 2019
@ironcladlou (Contributor Author)

I think this is actually ready for review

// The targets the DNS record points to.
Targets []string `json:"targets,omitempty"`
// RecordType is the type of record, e.g. CNAME, A, SRV, TXT, etc.
RecordType string `json:"recordType,omitempty"`
Contributor

Can we use a new string type and define named constants for the record types?

Contributor Author

Fixed and switched A to CNAME

Contributor

You added a type definition for DNSRecordType but continue to use string instead of using DNSRecordType.

Contributor Author

Actually intentional for now due to a random issue I was having with generators that will take some time to run down — I need to do a followup here for deepcopy scripts and cleaning up these types.

Also file under: should this be a "v1" API? It's not yet set in stone.
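
For reference, a sketch of the named type and constants under discussion (CNAMERecordType matches usage quoted later in this thread; the ALIAS constant name is an assumption):

// DNSRecordType is the type of a DNS record.
type DNSRecordType string

const (
	// CNAMERecordType is a canonical-name record.
	CNAMERecordType DNSRecordType = "CNAME"
	// ALIASRecordType is an AWS Route 53 alias record.
	ALIASRecordType DNSRecordType = "ALIAS"
)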

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Record represents a DNS record.
Contributor

Record → DNSRecord.

Contributor Author

Fixed

 	return fmt.Errorf("missing alias record")
 }
-	domain, target := alias.Domain, alias.Target
+	domain, target := record.Spec.DNSName, record.Spec.Targets[0]
Contributor

Need a "TODO" not to ignore all but the first target.

Contributor Author

Done

@@ -76,7 +79,7 @@ func (m *manager) Ensure(record *dns.Record) error {
 		context.TODO(),
 		*targetZone,
 		client.ARecord{
-			Address: record.ARecord.Address,
+			Address: record.Spec.Targets[0],
Contributor

Need a "TODO" not to ignore all but the first target.

Contributor Author

Fixed

@@ -102,7 +105,7 @@ func (m *manager) Delete(record *dns.Record) error {
 		context.TODO(),
 		*targetZone,
 		client.ARecord{
-			Address: record.ARecord.Address,
+			Address: record.Spec.Targets[0],
Contributor

As above, this needs a "TODO".

Contributor Author

Done

	return fmt.Errorf("failed to get current wildcard dnsrecord: %v", err)
} else {
	if record != nil {
		log.V(1).Info("waiting for wildcard dnsrecord to be deleted", "dnsrecord", record.Name)
Contributor

Elsewhere we pass record in its entirety to log.Info.

Contributor Author

Fixed

	Domain: domain,
	Target: target,
},
desired := desiredWildcardRecord(ic, service)
Contributor

Could service.Status.LoadBalancer.Ingress change from non-empty to empty, and should we handle that case? Probably that would only happen in case of an intermittent outage, in which case leaving the old record might be preferable. In any case, I'd like us to make a deliberate choice and document it.

Contributor Author

Added a TODO with these considerations. Not going to be able to solve it here yet.

if err := r.client.Create(context.TODO(), desired); err != nil {
	return nil, fmt.Errorf("failed to create dnsrecord %s/%s: %v", desired.Namespace, desired.Name, err)
}
log.Info("created dnsrecord", "namespace", desired.Namespace, "name", desired.Name)
Contributor

Could we log desired in its entirety?

Contributor Author

Fixed

var recordType string

if len(ingress.Hostname) > 0 {
	recordType = "ALIAS"
Contributor

"ALIAS" is AWS-specific. Let's document that and think about where the logic to choose between "ALIAS" or "CNAME" belongs.

Contributor Author

Added some comments
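
The gist of the documentation added, as a hedged sketch (comment text is illustrative, not the exact wording merged in this PR):

// "ALIAS" is an AWS Route 53 concept: for ELB hostname targets the
// provider publishes an A record aliased to the load balancer, so the
// record tracks the ELB's addresses without updates from the operator.
// Choosing between ALIAS and CNAME is currently a provider detail.
if len(ingress.Hostname) > 0 {
	recordType = "ALIAS"
}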

}
ingress := &operatorv1.IngressController{}
if err := r.cache.Get(context.TODO(), types.NamespacedName{Namespace: record.Namespace, Name: ingressName}, ingress); err != nil {
	return reconcile.Result{}, fmt.Errorf("failed to get dnsrecord for record %s: %v", record.Name, err)
Contributor

Is it a deliberate design choice not to handle orphaned DNSRecord resources?

Contributor Author

I don't know that this state implies the dnsrecord itself is an orphan. It usually means the resource was deleted, and due to timing we have a benign leftover entry to flush from the controller queue.

Also, how would the resource have been deleted from storage without having passed through finalization in delete()? So I'm also not sure there's any orphan in the cloud provider, either.

Help me understand the concern?

Contributor

There is a possible state that we need to consider: a DNSRecord exists with no corresponding IngressController. If you want to say that we will do nothing to try to recover from this state because the only ways to get there involve, say, etcd corruption or user tinkering, then that is fine, but I want to make sure we don't simply ignore the case.

@ironcladlou (Contributor Author), Jun 18, 2019

Okay, so I tried to split the difference for now. I handle not found by logging a specific message, added a TODO, and made the retry far less aggressive in this case. For other errors, I retry somewhat more aggressively since those are likely transient.
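
A sketch of that handling (requeue intervals are illustrative; IsNotFound is from k8s.io/apimachinery/pkg/api/errors):

ingress := &operatorv1.IngressController{}
if err := r.cache.Get(context.TODO(), types.NamespacedName{Namespace: record.Namespace, Name: ingressName}, ingress); err != nil {
	if kerrors.IsNotFound(err) {
		// TODO: decide whether to clean up or adopt dnsrecords whose
		// owning ingresscontroller no longer exists. For now, log and
		// retry gently, since this state should be rare.
		log.Info("no ingresscontroller found for dnsrecord", "dnsrecord", record.Name)
		return reconcile.Result{RequeueAfter: 10 * time.Minute}, nil
	}
	// Other errors are likely transient; return them so the controller
	// requeues with its usual backoff.
	return reconcile.Result{}, fmt.Errorf("failed to get ingresscontroller for dnsrecord %s: %v", record.Name, err)
}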

@ironcladlou (Contributor Author)

@Miciah PTAL. I think all feedback is addressed

@danehans (Contributor)

@ironcladlou here is the status of the default wildcard DNSRecord when the record already exists:

    - conditions:
      - lastTransitionTime: 2019-06-18T16:10:28Z
        message: "The DNS provider failed to ensure the record: failed to update alias
          in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV:
          InvalidChangeBatch: [RRSet of type A with DNS name \\052.apps.dhansen.devcluster.openshift.com.
          is not permitted because a conflicting RRSet of type  CNAME with the same
          DNS name already exists in zone devcluster.openshift.com.]\n\tstatus code:
          400, request id: 96a43bad-91e3-11e9-898b-19a2d7b35b17"
        reason: ProviderError
        status: "True"
        type: Failed

Do you feel this DNSRecord status condition is accurate or should this type of error indicate a False status?

@danehans (Contributor)

@ironcladlou I also see the operator continuing to reconcile the default IngressController, even after the resource has been successfully reconciled:

2019-06-18T09:18:06.963-0700	INFO	operator.controller	controller/controller.go:117	reconciling	{"request": "openshift-ingress-operator/default"}
2019-06-18T09:18:07.958-0700	DEBUG	operator.init.controller-runtime.controller	controller/controller.go:236	Successfully Reconciled	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default"}
2019-06-18T09:18:07.958-0700	INFO	operator.controller	controller/controller.go:117	reconciling	{"request": "openshift-ingress-operator/default"}
2019-06-18T09:18:08.929-0700	DEBUG	operator.init.controller-runtime.controller	controller/controller.go:236	Successfully Reconciled	{"controller": "ingress_controller", "request": "openshift-ingress-operator/default"}
...

@danehans (Contributor)

It appears #251 (comment) and #251 (comment) are related. Reconciliation for the default IngressController stopped at Successfully Reconciled {"controller": "ingress_controller", "request": "openshift-ingress-operator/default"} after I deleted the default IngressController, manually removed the associated wildcard record from the public hosted zone and restarted the operator.

@danehans (Contributor)

@ironcladlou after deleting the default IngressController, the associated DNSRecord resource is deleted. However, the wildcard records still exist in the public and private hosted zones. Is this expected behavior?

if err != nil {
	errs = append(errs, err)
} else {
	log.Info("provider deleted record %s in zone %v", record.Name, zone)
@danehans (Contributor), Jun 18, 2019

I believe L154 should be something like:

log.Info("provider deleted record", "name", record.Name, "zone", zone)

Contributor Author

Fixed

@danehans (Contributor)

@ironcladlou Here is the default-wildcard DNSRecord:

spec:
  dnsName: '*.apps.dhansen.devcluster.openshift.com'
  recordType: CNAME
  targets:
  - aaf2e0f3691ea11e98be2068844bff81-651064175.us-west-2.elb.amazonaws.com
status:
  zones:
  - dnsZone:
      tags:
        Name: dhansen-z9lgn-int
        kubernetes.io/cluster/dhansen-z9lgn: owned
  - dnsZone:
      id: Z3URY6TWQ91KVV

A couple things come to mind. First, the spec indicates a CNAME record type but an Alias record is created since AWS is the configured provider. This may cause confusion. Secondly, the status provides zone details, but no provider information. IMO it's beneficial to indicate the provider associated with the zones to provide a complete picture. Thoughts?

//
// TODO: If .status.loadbalancer.ingress is processed once as non-empty and then
// later becomes empty, what should we do? Currently we'll treat it as an intent
// to not have a desired record.
Contributor

desiredWildcardRecord will return nil, but ensureWildcardDNSRecord will not delete the current record if desired is nil, which is what I was getting at when I suggested adding a comment.

Contributor Author

delete is deliberately not handled during ensure in this case, so that part is intentional... I think mutations of .status.ingress are a larger topic at this point. Unless we compute desired state from a persisted "last observed" state, I think there's probably more than one of these edge cases we don't coherently account for...

@danehans (Contributor)

@ironcladlou disregard #251 (comment). I no longer see the records in Route53 after deleting the default IngressController. The behavior in #251 (comment) may have been related to the other comments I provided.

@ironcladlou (Contributor Author)

A couple things come to mind. First, the spec indicates a CNAME record type but an Alias record is created since AWS is the configured provider. This may cause confusion.

Agree — not sure yet how to resolve it in the abstract. The fact we're using an A+Alias in Route53 is a provider detail, and one we aren't exposing any configuration around, so maybe it's a doc gap in the meantime? I wonder if this api should be in an alpha or beta group or something?

Secondly, the status provides zone details, but no provider information. IMO it's beneficial to indicate the provider associated with the zones to provide a complete picture. Thoughts?

Agree with this, but would like to address it in a followup if that's okay since we don't actually need it yet.

}
ingress := &operatorv1.IngressController{}
if err := r.cache.Get(context.TODO(), types.NamespacedName{Namespace: record.Namespace, Name: ingressName}, ingress); err != nil {
	return reconcile.Result{}, fmt.Errorf("failed to get dnsrecord for record %s: %v", record.Name, err)
Contributor

A separate problem I failed to notice before: failed to get dnsrecord should be failed to get ingresscontroller.

Contributor Author

Fixed

if len(ingress.Hostname) > 0 {
	recordType = iov1.CNAMERecordType
	target = ingress.Hostname
} else {
Contributor

Could the service have .Status.LoadBalancer.Ingress[0] with empty .Hostname and .IP?

Contributor Author

Not sure; there are a lot of unresolved questions about that status field and our handling of it... but the environments we support are predictable enough to justify our current assumptions, so I'm not focused on solving them in this PR.

}
}

func (r *reconciler) currentWilcardDNSRecord(ic *operatorv1.IngressController) (*iov1.DNSRecord, error) {
Contributor

currentWilcardDNSRecord → currentWildcardDNSRecord.

Contributor Author

Fixed

@ironcladlou (Contributor Author)

Anything substantial left blocking merge?

@Miciah (Contributor)

Miciah commented Jun 18, 2019

Anything substantial left blocking merge?

Are we possibly going to want to extricate dns_controller from the operator, and its reconciliation logic from IngressController? If so, we will need a plan for migrating DNSRecord, for example to add zones to its spec (which we should be able to achieve with a one-off migration on start-up).

@ironcladlou (Contributor Author)

Are we possibly going to want to extricate dns_controller from the operator, and its reconciliation logic from IngressController? If so, will need to have a plan for migrating DNSRecord, for example to add zones to its spec (which we should be able to achieve with a one-off migration on start-up).

Maybe? Pretty far out in any case

@ironcladlou (Contributor Author)

Going to retest to see if the scheduling issue is a flake — given the unexplained co-location I observed on my new cluster yesterday I'm getting nervous something is wrong there.

@ironcladlou (Contributor Author)

/retest

@Miciah (Contributor)

Miciah commented Jun 18, 2019

Looks like the ingresscontroller deployment's pods weren't being scheduled, check out the scheduler events complaining about affinity rules... very suspicious

Also, the operator's container is inexplicably killed at 2019-06-18T21:51:02Z.

@Miciah (Contributor)

Miciah commented Jun 18, 2019

Actually, it's the operand's container that is getting killed. I think I'm confused because Prow is doubling the test output and injecting misleading timestamps.

@ironcladlou (Contributor Author)

Okay, looks like a flake after all. Seems unlikely to be related to anything going on in this PR.

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// DNSRecord represents a DNS record.
Contributor

How about a brief statement such as the following? "This resource kind is intended for internal use by the operator only."

Contributor Author

Would like to run this entire thing by API folks before code freeze. I'm not even sure if a v1 version is appropriate. We can tweak it.

Contributor

We could add something about how it is unstable too. The idea is that anyone looking at the resource using oc explain or OpenShift Console would at least have some idea of what the thing is and know not to mess with it.

failedZones := []configv1.DNSZone{}
for _, zone := range wildcardRecord.Status.Zones {
	for _, cond := range zone.Conditions {
		if cond.Type == iov1.DNSRecordFailedConditionType {
Contributor

Need the following: && cond.Status == string(operatorv1.ConditionTrue).

Contributor Author

Fixed
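
With the fix applied the loop reads roughly as follows (the append of zone.DNSZone is inferred from the declaration of failedZones above):

failedZones := []configv1.DNSZone{}
for _, zone := range wildcardRecord.Status.Zones {
	for _, cond := range zone.Conditions {
		// Count a zone as failed only when the Failed condition is
		// actually True, not merely present.
		if cond.Type == iov1.DNSRecordFailedConditionType && cond.Status == string(operatorv1.ConditionTrue) {
			failedZones = append(failedZones, zone.DNSZone)
		}
	}
}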

conditions := []iov1.DNSZoneCondition{}
err := r.dnsProvider.Ensure(record, zone)
if err != nil {
	conditions = append(conditions, iov1.DNSZoneCondition{
Contributor

Didn't we decide to set Failed=False when err is nil?

Contributor Author

I don't think we explicitly decided. If we publish True when there's a problem, does that imply we must publish False when there's not a problem? There's no internal reason we need the condition when things are normal, not sure what the conventions are here. We can change in a followup.
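
For comparison, publishing the negative case might look like this (a sketch; the Failed=True branch mirrors the ProviderError condition shown earlier in this thread, and timestamp handling is omitted):

conditions := []iov1.DNSZoneCondition{}
if err := r.dnsProvider.Ensure(record, zone); err != nil {
	conditions = append(conditions, iov1.DNSZoneCondition{
		Type:    iov1.DNSRecordFailedConditionType,
		Status:  string(operatorv1.ConditionTrue),
		Reason:  "ProviderError",
		Message: fmt.Sprintf("The DNS provider failed to ensure the record: %v", err),
	})
} else {
	conditions = append(conditions, iov1.DNSZoneCondition{
		Type:   iov1.DNSRecordFailedConditionType,
		Status: string(operatorv1.ConditionFalse),
		Reason: "ProviderSuccess",
	})
}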

	return nil, fmt.Errorf("failed to create dnsrecord %s/%s: %v", desired.Namespace, desired.Name, err)
}
log.Info("created dnsrecord", "dnsrecord", desired)
return desired, nil
}
Contributor

I am not sure we can do without update handling. I would like to test the following on AWS:

  1. Scale the default ingress controller to 1.
  2. Look up the AZ of the sole ingress controller pod.
  3. Kill ingress controller pods till one appears in a different AZ, which should cause the ELB's DNS name to change.
  4. Check whether the wildcard DNS name still resolves.

Contributor

I performed the test, and the ELB's DNS name does not change; only the address to which it resolves changes, and because we use an alias on AWS, the wildcard DNS record likewise changes over to the new address without requiring an update by the operator.

 },
 {
-	description: "private only A",
+	description: "no ingresses",
 	domain: "",
Contributor

This test case should have a non-empty value for domain because an empty value also causes a nil result.

Contributor Author

Fixed

 },
 {
-	description: "global ALIAS",
+	description: "hostname to ALIAS record",
Contributor

ALIAS → CNAME.

Contributor Author

Fixed

recorder          record.EventRecorder
operatorNamespace string
dnsProvider       dns.Provider
dnsConfig         *configv1.DNS
Contributor

Half of these fields are not used:

  • recorder is not used (but maybe it should be).
  • operatorNamespace is not used and probably should be deleted.
  • dnsConfig is not used and should be deleted.

Contributor Author

Fixed

@ironcladlou (Contributor Author)

/retest

@Miciah (Contributor)

Miciah commented Jun 19, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit a140e2d into openshift:master Jun 19, 2019
@DanyC97

DanyC97 commented Jun 22, 2019

@ironcladlou qq if you don't mind - is this going to work for UPI too on non-AWS env?

@ironcladlou (Contributor Author)

@DanyC97

Quick question, if you don't mind: is this going to work for UPI on non-AWS environments too?

The managed wildcard DNS record remains supported today only on AWS and Azure IPI.

This change is a functionally transparent internal refactor and doesn't imply any change in our DNS management capabilities as expressed by the public API.
