[CI/Infra] E2E tests on Argo Workflows #737
Merged
35 commits
64eb38f test 1 (okdas)
9484791 test (okdas)
86c92c8 try token (okdas)
bd02e3f try to submit (okdas)
8b8ac77 use GITHUB_WORKSPACE (okdas)
593ee08 troubleshoot (okdas)
e5951ab troubleshoot (okdas)
192c154 test this (okdas)
bd8eeaf try this (okdas)
cc9c0f0 added perms (okdas)
e4bdc03 Empty-Commit (okdas)
6238b95 Empty-Commit (okdas)
674055c put it all together (okdas)
271f321 try without artifacts (okdas)
acb9d29 safedump (okdas)
9c489f2 add cluster manager sts pods kill (okdas)
3dc987c time limit wait-for-infra (okdas)
f2aae81 fix linting errors (okdas)
0036d05 move into the separate CI (okdas)
9ade326 fix spell mistake (okdas)
79d691f Merge branch 'main' into e2e-automation (okdas)
5207af2 Update build/localnet/cluster-manager/sts_kill.go (okdas)
3bef80c Update build/localnet/cluster-manager/sts_kill.go (okdas)
a13b2b3 Update build/localnet/cluster-manager/sts_kill.go (okdas)
39f07c2 Update build/localnet/cluster-manager/sts_kill.go (okdas)
293f5d7 requested changes (okdas)
385d7a3 Merge remote-tracking branch 'origin/main' into e2e-automation (okdas)
30b7557 bump the date (okdas)
e1a66e9 remove unused sa (okdas)
6de718c update pointers (okdas)
5b0da95 Update .github/workflows/e2e-test.yml (okdas)
6bc8ee2 requested changes (okdas)
5f8da74 [CI] Add inline error check linter (#770) (okdas)
535246f nump changelog (okdas)
1e6aad9 Merge remote-tracking branch 'origin/main' into e2e-automation (okdas)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| name: E2E test on DevNet | ||
|
|
||
| # Trigger when the build workflow completes (whether or not it succeeded), and allow manual triggering. | ||
| on: | ||
| workflow_dispatch: | ||
| workflow_run: | ||
| workflows: ["Test, build and push artifacts"] | ||
| types: | ||
| - completed | ||
|
|
||
| jobs: | ||
| e2e-tests: | ||
| runs-on: ubuntu-latest | ||
| if: contains(github.event.pull_request.labels.*.name, 'e2e-devnet-test') | ||
| env: | ||
| ARGO_SERVER: "workflows.dev-us-east4-1.poktnodes.network:8443" | ||
| ARGO_HTTP1: true | ||
| ARGO_SECURE: true | ||
| permissions: | ||
| contents: "read" | ||
| id-token: "write" | ||
|
|
||
| steps: | ||
| - id: "auth" | ||
| uses: "google-github-actions/auth@v1" | ||
| with: | ||
| credentials_json: "${{ secrets.ARGO_WORKFLOW_EXTERNAL }}" | ||
|
|
||
| - id: "get-credentials" | ||
| uses: "google-github-actions/get-gke-credentials@v1" | ||
| with: | ||
| cluster_name: "nodes-gcp-dev-us-east4-1" | ||
| location: "us-east4" | ||
|
|
||
| - id: "install-argo" | ||
| run: | | ||
| curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.4.7/argo-linux-amd64.gz | ||
| gunzip argo-linux-amd64.gz | ||
| chmod +x argo-linux-amd64 | ||
| mv ./argo-linux-amd64 /usr/local/bin/argo | ||
| argo version | ||
|
|
||
| - id: "wait-for-infra" | ||
| shell: bash | ||
| run: | | ||
| start_time=$(date +%s) # store current time | ||
| timeout=900 # 15 minute timeout in seconds | ||
|
|
||
| until argo template get dev-e2e-tests --namespace=devnet-issue-${{ github.event.pull_request.number }}; do | ||
| current_time=$(date +%s) | ||
| elapsed_time=$(( current_time - start_time )) | ||
| if (( elapsed_time > timeout )); then | ||
| echo "Timeout of $timeout seconds reached. Exiting..." | ||
| exit 1 | ||
| fi | ||
| echo "Waiting for devnet-issue-${{ github.event.pull_request.number }} to be provisioned..." | ||
| sleep 5 | ||
| done | ||
|
|
||
| - id: "run-e2e-tests" | ||
| run: | | ||
| argo submit --wait --log --namespace devnet-issue-${{ github.event.pull_request.number }} --from 'wftmpl/dev-e2e-tests' --parameter gitsha="${{ github.event.pull_request.head.sha }}" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| package main | ||
|
|
||
| // Monitors Pods created by StatefulSets. If a monitored container is waiting in an error state such as | ||
| // `CrashLoopBackOff` and its image differs from the StatefulSet's, the Pod is deleted so that the StatefulSet recreates it with the new image. | ||
|
|
||
| import ( | ||
| "context" | ||
| "errors" | ||
| "strings" | ||
|
|
||
| pocketk8s "github.com/pokt-network/pocket/shared/k8s" | ||
| corev1 "k8s.io/api/core/v1" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| watch "k8s.io/apimachinery/pkg/watch" | ||
| "k8s.io/client-go/kubernetes" | ||
| appstypedv1 "k8s.io/client-go/kubernetes/typed/apps/v1" | ||
| coretypedv1 "k8s.io/client-go/kubernetes/typed/core/v1" | ||
| ) | ||
|
|
||
| // Loop through existing pods and set up a watch for new Pods so we don't hit Kubernetes API all the time | ||
| // This is a blocking function, intended for running in a goroutine | ||
| func initCrashedPodsDeleter(client *kubernetes.Clientset) { | ||
| stsClient := client.AppsV1().StatefulSets(pocketk8s.CurrentNamespace) | ||
| podClient := client.CoreV1().Pods(pocketk8s.CurrentNamespace) | ||
|
|
||
| // Loop through all existing Pods and delete the ones that are in CrashLoopBackOff status | ||
| podList, err := podClient.List(context.TODO(), metav1.ListOptions{}) | ||
| if err != nil { | ||
| logger.Error().Err(err).Msg("error listing pods on init") | ||
| } | ||
|
|
||
| for i := range podList.Items { | ||
| pod := &podList.Items[i] | ||
| if err := deleteCrashedPods(pod, stsClient, podClient); err != nil { | ||
| logger.Error().Err(err).Msg("error deleting crashed pod on init") | ||
| } | ||
| } | ||
|
|
||
| // Set up a watch for new Pods | ||
| w, err := podClient.Watch(context.TODO(), metav1.ListOptions{}) | ||
| if err != nil { | ||
| logger.Error().Err(err).Msg("error setting up watch for new pods") | ||
| } | ||
| for event := range w.ResultChan() { | ||
| switch event.Type { | ||
| case watch.Added, watch.Modified: | ||
| pod, ok := event.Object.(*corev1.Pod) | ||
| if !ok { | ||
| logger.Error().Msg("error casting pod on watch") | ||
| continue | ||
| } | ||
|
|
||
| if err := deleteCrashedPods(pod, stsClient, podClient); err != nil { | ||
| logger.Error().Err(err).Msg("error deleting crashed pod on watch") | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| func isContainerStatusErroneous(status *corev1.ContainerStatus) bool { | ||
| return status.State.Waiting != nil && | ||
| (strings.HasPrefix(status.State.Waiting.Reason, "Err") || | ||
| strings.HasSuffix(status.State.Waiting.Reason, "BackOff")) | ||
| } | ||
|
|
||
| func deleteCrashedPods( | ||
| pod *corev1.Pod, | ||
| stsClient appstypedv1.StatefulSetInterface, | ||
| podClient coretypedv1.PodInterface, | ||
| ) error { | ||
| // If annotation is present, we monitor the Pod | ||
| containerToMonitor, ok := pod.Annotations["cluster-manager-delete-on-crash-container"] | ||
| if !ok { | ||
| return nil | ||
| } | ||
|
|
||
| for ci := range pod.Spec.Containers { | ||
| podContainer := &pod.Spec.Containers[ci] | ||
|
|
||
| // Only proceed if container is the one we monitor | ||
| if podContainer.Name != containerToMonitor { | ||
| continue | ||
| } | ||
|
|
||
| for si := range pod.Status.ContainerStatuses { | ||
| containerStatus := &pod.Status.ContainerStatuses[si] | ||
|
|
||
| // Only proceed if container is in some sort of Err status | ||
| if !isContainerStatusErroneous(containerStatus) { | ||
| continue | ||
| } | ||
|
|
||
| // Get StatefulSet that created the Pod | ||
| var stsName string | ||
| for _, ownerRef := range pod.OwnerReferences { | ||
| if ownerRef.Kind == "StatefulSet" { | ||
| stsName = ownerRef.Name | ||
| break | ||
| } | ||
| } | ||
|
|
||
| if stsName == "" { | ||
| return errors.New("no StatefulSet found for this pod") | ||
| } | ||
|
|
||
| sts, err := stsClient.Get(context.TODO(), stsName, metav1.GetOptions{}) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // Loop through all containers in the StatefulSet and find the one we monitor | ||
| for sci := range sts.Spec.Template.Spec.Containers { | ||
| stsContainer := &sts.Spec.Template.Spec.Containers[sci] | ||
| if stsContainer.Name != containerToMonitor { | ||
| continue | ||
| } | ||
|
|
||
| // If images are different, delete the Pod | ||
| if stsContainer.Image != podContainer.Image { | ||
| deletePolicy := metav1.DeletePropagationForeground | ||
|
|
||
| if err := podClient.Delete(context.TODO(), pod.Name, metav1.DeleteOptions{ | ||
| PropagationPolicy: &deletePolicy, | ||
| }); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| logger.Info().Str("pod", pod.Name).Msg("deleted crashed pod") | ||
| } else { | ||
| logger.Info().Str("pod", pod.Name).Msg("pod crashed, but image is the same, not deleting") | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return nil | ||
| } | ||
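The deletion rule in `deleteCrashedPods` reduces to two checks: the monitored container is waiting with a reason that starts with `Err` or ends with `BackOff`, and the image in the Pod spec differs from the image in the StatefulSet template. The sketch below restates that rule on plain strings so it can be run without a cluster or client-go. `isErroneousReason` and `shouldDelete` are hypothetical names introduced for illustration, and the image names in `main` are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// isErroneousReason mirrors isContainerStatusErroneous from the diff, but takes
// the waiting reason as a plain string. An empty string stands in for a
// container that is not in the Waiting state. (Hypothetical helper.)
func isErroneousReason(waitingReason string) bool {
	return waitingReason != "" &&
		(strings.HasPrefix(waitingReason, "Err") ||
			strings.HasSuffix(waitingReason, "BackOff"))
}

// shouldDelete reports whether the rule in deleteCrashedPods would delete the
// Pod: the container must be in an erroneous waiting state AND the Pod's image
// must differ from the image in the StatefulSet template. (Hypothetical helper.)
func shouldDelete(waitingReason, podImage, stsImage string) bool {
	return isErroneousReason(waitingReason) && podImage != stsImage
}

func main() {
	// Image names below are invented for the example.
	cases := []struct {
		reason, podImage, stsImage string
	}{
		{"CrashLoopBackOff", "pocket:old", "pocket:new"},
		{"CrashLoopBackOff", "pocket:new", "pocket:new"},
		{"ErrImagePull", "pocket:old", "pocket:new"},
		{"ContainerCreating", "pocket:old", "pocket:new"},
		{"", "pocket:old", "pocket:new"},
	}

	for _, c := range cases {
		fmt.Printf("reason=%q pod=%s sts=%s delete=%v\n",
			c.reason, c.podImage, c.stsImage,
			shouldDelete(c.reason, c.podImage, c.stsImage))
	}
}
```

Because the rule only fires on an image mismatch, a Pod that crash-loops on the current image is left alone, which corresponds to the "pod crashed, but image is the same, not deleting" log line in the diff.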