Managing cloud infrastructure efficiently has become a critical challenge for modern DevOps teams. While Kubernetes provides powerful orchestration capabilities, it often requires manual intervention for routine operational tasks like scaling resources based on schedules or cleaning up temporary resources. This is where Kubernetes operators shine — they extend Kubernetes’ native automation capabilities to handle complex, application-specific operations.

In this article, we’ll explore what Kubernetes operators are, why they’re essential for production environments, and walk through the development and practical implementation of the CronJob-Scale-Down-Operator — a real-world solution that addresses resource lifecycle management through automated scaling and cleanup.

The Operator Pattern

The operator pattern follows a simple but powerful concept:

  1. Observe: Watch the current state of resources in the cluster
  2. Analyze: Compare the current state with the desired state
  3. Act: Take corrective actions to align current state with desired state
  4. Repeat: Continuously monitor and adjust

This control loop is the foundation of Kubernetes’ declarative model, and operators extend this pattern to application-specific scenarios.
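The loop above can be sketched in Go, the language most operators (including Kubebuilder projects) are written in. This toy reconcile function models only the analyze/act steps against plain values; a real controller would read and write cluster state through the Kubernetes API:

```go
package main

import "fmt"

// state holds the replica count of a workload, standing in for the
// much richer state a real controller observes from the cluster.
type state struct{ replicas int }

// reconcile compares observed state with desired state and returns the
// action needed to converge them (the "analyze" and "act" steps of the loop).
func reconcile(observed, desired state) string {
	switch {
	case observed.replicas < desired.replicas:
		return fmt.Sprintf("scale up to %d", desired.replicas)
	case observed.replicas > desired.replicas:
		return fmt.Sprintf("scale down to %d", desired.replicas)
	default:
		return "no action"
	}
}

func main() {
	// Observe → analyze → act; a controller re-runs this continuously.
	fmt.Println(reconcile(state{replicas: 2}, state{replicas: 5}))
}
```

In a real operator the "repeat" step is handled by the framework, which re-queues the reconcile function whenever watched resources change.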

Why Build Custom Operators?

While Kubernetes provides basic resource management through Deployments, Services, and ConfigMaps, many operational tasks require domain-specific logic:

  • Time-based operations: Scaling resources based on schedules, time zones, or business hours
  • Resource lifecycle management: Automatically cleaning up temporary resources created by CI/CD pipelines
  • Complex upgrade procedures: Managing stateful applications with specific upgrade sequences
  • Cross-resource coordination: Orchestrating multiple Kubernetes resources as a single unit

Generic solutions often fall short because they can’t encode the nuanced operational knowledge that each application requires.

The CronJob-Scale-Down-Operator: Solving Real Problems

The Challenge

During my experience managing Kubernetes environments across multiple teams and time zones, I identified two recurring operational challenges:

  1. Resource waste in non-production environments: Development and staging environments often run 24/7, consuming unnecessary compute resources during off-hours when no one is actively developing or testing.

  2. Resource accumulation from CI/CD pipelines: Continuous integration processes create temporary resources (ConfigMaps, Secrets, test deployments) that accumulate over time, leading to namespace pollution and increased costs.

Existing solutions were either too generic (basic CronJobs that couldn’t handle complex scenarios) or too specific (custom scripts that were hard to maintain and didn’t integrate well with Kubernetes’ declarative model).

The Solution Architecture

The CronJob-Scale-Down-Operator addresses these challenges through two main capabilities:

Time-based Scaling: Automatically scale Deployments and StatefulSets based on cron schedules, supporting multiple time zones for global teams.

Annotation-driven Cleanup: Remove resources marked with cleanup annotations after specified time periods, enabling self-managing temporary resources.

📦 GitHub Repository: cronschedules/cronjob-scale-down-operator

Building with Kubebuilder

I chose Kubebuilder as the development framework because it provides:

  • Scaffolding: Automated generation of operator boilerplate code
  • API Definition: Easy creation of Custom Resource Definitions (CRDs)
  • Controller Logic: Framework for implementing the reconciliation loop
  • Testing Support: Built-in testing utilities and patterns
  • Production Features: RBAC, webhooks, and deployment configurations

The development process involved defining the API structure, implementing the controller logic, and extensive testing across different scenarios.

Practical Implementation Examples

Let’s explore how the operator works in practice across different use cases:

Development Environment Cost Optimization

Development environments typically need resources only during business hours. The operator enables automatic scaling based on team schedules:

apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: dev-environment-scaler
  namespace: development
spec:
  targetRef:
    name: web-application
    namespace: development
    kind: Deployment
    apiVersion: apps/v1
  scaleDownSchedule: "0 0 22 * * 1-5"  # 10 PM weekdays
  scaleUpSchedule: "0 0 8 * * 1-5"     # 8 AM weekdays  
  timeZone: "America/New_York"

This configuration automatically scales down development workloads outside business hours while ensuring they’re ready when developers arrive.

Weekend Environment Shutdown

For maximum resource efficiency, completely shut down non-production environments during weekends:

apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: weekend-shutdown
  namespace: staging
spec:
  targetRef:
    name: staging-api
    namespace: staging
    kind: StatefulSet
    apiVersion: apps/v1
  scaleDownSchedule: "0 0 18 * * 5"    # Friday 6 PM
  scaleUpSchedule: "0 0 8 * * 1"       # Monday 8 AM
  timeZone: "UTC"

Multi-Timezone Team Coordination

Global teams require different scaling schedules per region:

# US East Coast schedule
apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: us-east-schedule
  namespace: development
spec:
  targetRef:
    name: api-us-east
    namespace: development
    kind: Deployment
    apiVersion: apps/v1
  scaleDownSchedule: "0 0 19 * * 1-5"  # 7 PM EST
  scaleUpSchedule: "0 0 7 * * 1-5"     # 7 AM EST
  timeZone: "America/New_York"
---
# European schedule
apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: eu-schedule
  namespace: development
spec:
  targetRef:
    name: api-eu
    namespace: development
    kind: Deployment
    apiVersion: apps/v1
  scaleDownSchedule: "0 0 18 * * 1-5"  # 6 PM CET
  scaleUpSchedule: "0 0 8 * * 1-5"     # 8 AM CET
  timeZone: "Europe/Paris"

Advanced Use Case: Cleanup-Only Mode

Beyond scaling, the operator provides standalone resource cleanup capabilities — particularly valuable for CI/CD pipeline management and test environment hygiene.

CI/CD Pipeline Resource Management

Continuous integration pipelines often create temporary resources that accumulate over time. The cleanup-only mode addresses this without requiring scaling targets:

apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: ci-cleanup-controller
  namespace: default
spec:
  # No targetRef - pure cleanup mode
  cleanupSchedule: "0 0 */6 * * *"  # Every 6 hours
  cleanupConfig:
    annotationKey: "ci.cleanup/after"
    resourceTypes:
      - "ConfigMap"
      - "Secret"
      - "Service"
      - "Deployment"
      - "Job"
    namespaces:
      - "ci-test"
      - "integration-test"
    labelSelector:
      created-by: "ci-pipeline"
      environment: "test"
    dryRun: false
  timeZone: "UTC"

Resource Annotation Patterns

Resources are marked for cleanup using flexible annotation formats:

# CI-generated ConfigMap with 2-hour expiration
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-config-pr-1234
  namespace: ci-test
  labels:
    created-by: "ci-pipeline"
    pr-number: "1234"
  annotations:
    ci.cleanup/after: "2h"    # Clean up 2 hours after creation
data:
  test-config.json: |
    {"environment": "test", "pr": 1234}
---
# Integration test deployment with absolute cleanup time
apiVersion: apps/v1
kind: Deployment
metadata:
  name: integration-test-app
  namespace: integration-test
  annotations:
    ci.cleanup/after: "2024-12-31T23:59:59Z"  # Absolute cleanup time
spec:
  replicas: 1
  selector:
    matchLabels:
      app: integration-test-app
  template:
    metadata:
      labels:
        app: integration-test-app
    spec:
      containers:
      - name: test-app
        image: nginx:latest

Cleanup Time Format Options

The operator supports multiple time specification formats:

annotations:
  # Alternative formats (use one key per resource):
  cleanup-after: "24h"                      # Duration: 24 hours after creation
  cleanup-after: "7d"                       # Duration: 7 days after creation
  cleanup-after: "30m"                      # Duration: 30 minutes after creation
  cleanup-after: "2024-12-31T23:59:59Z"     # Absolute: RFC3339 timestamp
  cleanup-after: "2024-12-31"               # Absolute: Date (midnight UTC)
  cleanup-after: ""                         # Immediate: Next cleanup cycle

Combined Scaling and Cleanup

Production scenarios often benefit from both scaling and cleanup in a single resource:

apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: comprehensive-management
  namespace: staging
spec:
  # Scaling configuration
  targetRef:
    name: staging-app
    namespace: staging
    kind: Deployment
    apiVersion: apps/v1
  scaleDownSchedule: "0 0 22 * * *"    # Scale down at 10 PM
  scaleUpSchedule: "0 0 6 * * *"       # Scale up at 6 AM
  
  # Cleanup configuration
  cleanupSchedule: "0 0 2 * * *"       # Clean up at 2 AM
  cleanupConfig:
    annotationKey: "staging.cleanup/after"
    resourceTypes:
      - "ConfigMap"
      - "Secret"
    labelSelector:
      app: "staging-app"
      temporary: "true"
    dryRun: false
  timeZone: "UTC"

Production Safety and Testing

Dry-Run Mode for Safety

Always validate cleanup operations before production deployment:

cleanupConfig:
  dryRun: true  # Only log what would be deleted, don't actually delete

When dry-run mode is enabled, the operator logs all matching resources without performing deletions, allowing verification of cleanup logic.
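Conceptually, a dry-run guard is a branch taken before every destructive call. The hypothetical helper below models that pattern; it is not the operator's real code:

```go
package main

import "fmt"

// resource is a minimal stand-in for a cleanup candidate.
type resource struct{ kind, name string }

// cleanup either deletes the matched resources via del, or in dry-run mode
// only records what would be deleted, so the same matching logic can be
// verified safely before it is allowed to act.
func cleanup(matched []resource, dryRun bool, del func(resource) error) ([]string, error) {
	var actions []string
	for _, r := range matched {
		if dryRun {
			actions = append(actions, fmt.Sprintf("would delete %s/%s", r.kind, r.name))
			continue
		}
		if err := del(r); err != nil {
			return actions, err
		}
		actions = append(actions, fmt.Sprintf("deleted %s/%s", r.kind, r.name))
	}
	return actions, nil
}

func main() {
	matched := []resource{{"ConfigMap", "test-config-pr-1234"}}
	actions, _ := cleanup(matched, true, func(resource) error { return nil })
	fmt.Println(actions) // [would delete ConfigMap/test-config-pr-1234]
}
```

Because matching and deletion are separated, flipping dryRun to false changes only the action taken, not which resources are selected.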

Testing Scaling Schedules

Use frequent schedules for quick validation:

apiVersion: cronschedules.elbazi.co/v1
kind: CronJobScaleDown
metadata:
  name: quick-test
  namespace: test
spec:
  targetRef:
    name: nginx-test
    namespace: test
    kind: Deployment
    apiVersion: apps/v1
  scaleDownSchedule: "0 */2 * * * *"  # Every 2 minutes
  scaleUpSchedule: "30 */2 * * * *"   # 30 seconds later
  timeZone: "UTC"

Installation and Configuration

Helm Installation

# Add the charts repository
helm repo add cronschedules https://cronschedules.github.io/charts
helm repo update

# Install the operator
helm install cronjob-scale-down-operator cronschedules/cronjob-scale-down-operator

Container Image Deployment

docker pull ghcr.io/cronschedules/cronjob-scale-down-operator:0.3.0

Quick Validation

# Create test deployment
kubectl apply -f examples/test-deployment.yaml

# Apply scaling schedule
kubectl apply -f examples/quick-test.yaml

# Monitor operations
kubectl get cronjobscaledown -w
kubectl get deployment nginx-test -w

Web Dashboard Access

The operator includes a built-in web dashboard:


kubectl port-forward -n cronjob-scale-down-operator-system \
  deployment/cronjob-scale-down-operator-controller-manager 8082:8082

# Access dashboard at http://localhost:8082

Schedule Configuration Reference

The operator uses 6-field cron expressions with second precision:

┌───────────── second (0 - 59)
│ ┌───────────── minute (0 - 59)
│ │ ┌───────────── hour (0 - 23)
│ │ │ ┌───────────── day of month (1 - 31)
│ │ │ │ ┌───────────── month (1 - 12)
│ │ │ │ │ ┌───────────── day of week (0 - 6) (0 = Sunday)
│ │ │ │ │ │
* * * * * *

Common Schedule Patterns

Pattern              Description
"0 0 22 * * *"       Every day at 10:00 PM
"0 0 6 * * 1-5"      Weekdays at 6:00 AM
"0 0 18 * * 5"       Every Friday at 6:00 PM
"0 0 0 * * 0"        Every Sunday at midnight
"0 */6 * * * *"      Every 6 minutes
"*/30 * * * * *"     Every 30 seconds (testing)

Supported Resource Types

  • Deployments: Standard application scaling
  • StatefulSets: Database and stateful application scaling
  • Cleanup targets: ConfigMaps, Secrets, Services, Jobs, CronJobs, Ingress, PersistentVolumeClaims

Conclusion

Kubernetes operators provide a robust framework for extending Kubernetes automation to meet application-specific operational requirements. The CronJob-Scale-Down-Operator demonstrates how operators can address practical problems of resource efficiency and lifecycle management.

Essential takeaways from this implementation:

  • Operators tackle specific issues: The most effective operators focus on clearly identified operational problems instead of attempting to be all-encompassing solutions.
  • Developer experience matters: Tools such as Kubebuilder greatly simplify operator development, letting teams concentrate on business logic instead of boilerplate.
  • Production safety is critical: Features such as dry-run mode and thorough monitoring are essential for operating safely in production environments.

Whether you want to optimize your cloud costs, improve operational efficiency, or better understand Kubernetes operator development, the methods and techniques covered here provide a solid foundation for creating production-ready automation solutions.


Resources: