@QxBytes QxBytes commented Dec 26, 2025

Reason for Change:

The pipeline has a known issue with CNI v1 during IP allocation. One symptom is the phrase "Initializing HTTP client with connection timeout" appearing in the CNI logs. This PR adds a script that checks the logs for such known phrases and, when one is found, marks the stage as succeeded with warnings. If no known phrase is found but there is an error, the stage fails as normal.
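A minimal sketch of the phrase check described above. The function names and the log directory argument are hypothetical and are not taken from hack/scripts/check-cni-log-contents.sh; only the searched phrase comes from this description.

```shell
#!/usr/bin/env bash
# Sketch only: function names and arguments are illustrative assumptions,
# not the contents of hack/scripts/check-cni-log-contents.sh.

# The phrase quoted in the PR description as a symptom of the known issue.
KNOWN_PHRASE="Initializing HTTP client with connection timeout"

# check_known_issue LOG_DIR
# Returns 0 if any file under LOG_DIR contains the known phrase,
# non-zero otherwise (including when LOG_DIR cannot be read).
check_known_issue() {
  local log_dir="$1"
  # -r: recurse, -q: quiet, -F: treat the phrase as a fixed string
  grep -rqF -- "$KNOWN_PHRASE" "$log_dir"
}

# classify_failure LOG_DIR
# Prints "warning" when the logs match the known issue, else "failure".
classify_failure() {
  if check_known_issue "$1"; then
    echo "warning"
  else
    echo "failure"
  fi
}
```

A caller would run `classify_failure <log-dir>` only after a stage has already reported an error, so a clean run is never downgraded to a warning.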

It also adds tolerations to the privileged pods so that they are always scheduled, even if Cilium or other components add taints to the nodes.
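For illustration, a toleration with `operator: Exists` and no key matches every taint in Kubernetes. The sketch below assumes that form; the exact toleration in the changed manifests is not shown in this conversation, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: tolerate_all_patch is a hypothetical helper. It prints a JSON
# merge patch that sets a single catch-all toleration on a DaemonSet's pods.
tolerate_all_patch() {
  cat <<'EOF'
{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}
EOF
}

# Usage (needs kubectl and cluster access; <name> and <namespace> are placeholders):
#   kubectl patch daemonset <name> -n <namespace> --type merge -p "$(tolerate_all_patch)"
```

Note that a merge patch replaces the whole `tolerations` list, so any existing entries would need to be included in the patch.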

It also moves the CNI/CNS log collection steps into Windows- and Linux-specific scripts. The goal is that anyone can set their kubectx to a cluster, run the collection scripts with the appropriate parameters, and have the logs downloaded automatically, even outside pipeline environments.
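A rough sketch of what a standalone Linux collection step could look like, assuming the logs are readable from inside pods matched by a label selector. The function name, its parameters, and the use of `cat` inside the pod are assumptions, not the actual interface of hack/scripts/collect-linux-logs.sh.

```shell
#!/usr/bin/env bash
# Sketch only: collect_logs and its parameters are hypothetical.

# collect_logs NAMESPACE SELECTOR REMOTE_PATH OUT_DIR
# Copies REMOTE_PATH from every pod matching SELECTOR into OUT_DIR/<pod>.log,
# using whichever cluster the current kube context points at.
collect_logs() {
  local ns="$1" selector="$2" remote_path="$3" out_dir="$4"
  local pod
  mkdir -p "$out_dir"
  # "-o name" yields entries of the form "pod/<name>"
  for pod in $(kubectl get pods -n "$ns" -l "$selector" -o name); do
    # Stream the file out of the pod into a per-pod local file
    kubectl exec -n "$ns" "$pod" -- cat "$remote_path" > "$out_dir/${pod#pod/}.log" \
      || echo "warning: could not read $remote_path from $pod" >&2
  done
}
```

Because the function relies only on `kubectl` and the current context, it can run from a developer machine as well as from a pipeline agent.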

In the future, the log checking script may also be used to detect other known issues in the pipeline.

Issue Fixed:

See above

Requirements:

Notes:
Green: https://msazure.visualstudio.com/One/_build/results?buildId=147727074&view=results
Detect: https://msazure.visualstudio.com/One/_build/results?buildId=147893558&view=results

@QxBytes QxBytes self-assigned this Dec 26, 2025
@QxBytes QxBytes added the cni Related to CNI. label Dec 26, 2025
@QxBytes QxBytes requested a review from a team as a code owner December 26, 2025 22:19
Copilot AI review requested due to automatic review settings December 26, 2025 22:19
@QxBytes QxBytes added the ci Infra or tooling. label Dec 26, 2025
QxBytes commented Dec 26, 2025

/azp run Azure Container Networking PR

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Pull request overview

This PR addresses a known CNI v1 pipeline issue during IP allocation and improves log collection infrastructure. The changes introduce automated detection of known issues, enhance pod scheduling reliability, and refactor log collection into reusable scripts.

Key changes:

  • Adds tolerations to privileged DaemonSets to ensure scheduling on all nodes regardless of taints
  • Creates standalone log collection scripts for Linux and Windows that can be run both in pipelines and locally
  • Implements a warning handler job that checks for known error patterns in logs and marks stages as succeeded with issues when detected
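As a sketch of the last point, a warning handler step could turn the outcome of the log check into an Azure Pipelines result using logging commands. The function name, its argument values, and the message text are hypothetical; the actual logic in warning-handler-job-template.yaml is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch only: report_stage_result is a hypothetical helper.

# report_stage_result CLASSIFICATION
# CLASSIFICATION is "warning" when a known error pattern was found in the logs;
# anything else is treated as an ordinary failure.
# Prints Azure Pipelines logging commands on stdout, which the agent interprets.
report_stage_result() {
  case "$1" in
    warning)
      echo "##vso[task.logissue type=warning]Known CNI v1 IP allocation issue detected in logs"
      echo "##vso[task.complete result=SucceededWithIssues;]Known issue detected"
      ;;
    *)
      echo "##vso[task.logissue type=error]Failure did not match a known issue"
      echo "##vso[task.complete result=Failed;]Unrecognized failure"
      return 1
      ;;
  esac
}
```

Returning non-zero in the default branch keeps an unrecognized failure failing even if the agent ignores the logging command.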

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 21 comments.

| File | Description |
| --- | --- |
| test/integration/manifests/load/privileged-daemonset.yaml | Adds broad toleration to ensure privileged pods schedule on all nodes |
| test/integration/manifests/load/privileged-daemonset-windows.yaml | Adds broad toleration to Windows privileged pods |
| hack/scripts/collect-windows-logs.sh | New reusable script for collecting Windows CNI/CNS logs |
| hack/scripts/collect-linux-logs.sh | New reusable script for collecting Linux CNI/CNS logs |
| hack/scripts/check-cni-log-contents.sh | New script to search logs for known issue patterns |
| .pipelines/templates/warning-handler-job-template.yaml | New template for handling warnings when known issues are detected |
| .pipelines/templates/log-template.yaml | Refactored to use new log collection scripts and added NNC description |
| .pipelines/singletenancy/aks/e2e-job-template.yaml | Integrates warning handler for CNI v1 Linux jobs |
| .pipelines/singletenancy/azure-cni-overlay-stateless/azure-cni-overlay-stateless-e2e-step-template.yaml | Adds verbose flag to datapath test |

@vipul-21

Approved; we discussed the comments offline. So far the issue has only occurred in the pipeline, so we will skip it, as discussed with @tamilmani1989 per @QxBytes.
