Editorial

January 20, 2025

From Cloud to Metal: €100k+ Cost Reduction with Cloud-Agnostic Kubernetes Migration

Gerrit Schumann

Modern businesses live and breathe the cloud. The public cloud promises scalability and ease, but often at a steep cost. One of our customers, a fast-growing company developing supply chain management solutions, was on that track. As bills began growing past €10,000 a month, they considered a shift – toward leaner, more efficient operations without sacrificing performance. Using the mogenius solution to streamline development workflows in Kubernetes on a public cloud, they turned to us with a new challenge: How can we migrate from hyperscaler to an agile cloud provider or bare metal setup, with substantial savings and operational efficiencies? And can we be fully cloud-agnostic?

‍

Given this challenge, there were several critical aspects to consider:

Preparation time should be minimal.
The maximum downtime during migration should be less than 1 hour.
The developer experience should remain the same, with a fully automated workflow.

‍

With tools like K3S, GitOps, Ansible and the mogenius operator, we set out to cut their costs drastically and gain operational advantages, such as streamlined developer experience, improved disaster recovery and more independence from vendor lock-in.

Let’s break it down step by step.

‍

The Journey: A Cloud-Agnostic Kubernetes Migration for Cost Reduction

Step 1: Setting the Scene - Cloud Architecture & Initial Kubernetes Setup

The architecture was a symphony of interconnected parts:

Microservice Architecture: At its core was the public cloud provider’s managed Kubernetes services, consisting of three clusters, orchestrating 150 pods through 12 controllers, encompassing stateful sets, deployments, and daemon sets. Each node was a critical element in the group of services, supporting authentication, caching, APIs, and real-time web-eventing through web sockets.
Standard Setup: They had followed the cloud provider’s recommended framework with three clusters, each containing three nodes – main and worker nodes, equipped with 16 cores and 64 GB RAM each. This was augmented by standard load balancers and container registries. Initially, our customer had enabled node auto-scaling, but the complications outweighed the benefits, leading them to deactivate it.

‍

Step 2: The Decision to Change - Moving to a Cloud-Agnostic Architecture for Kubernetes

Our exploration for a cost-effective alternative led us to a hosting provider, where running a similar architecture would cost a fraction but had no tooling for anstraction and automation. Could we get similar performance with reduced resources and budget?

‍

Step 3: The Toolkit - Preparing for Migration

Equipped with a robust toolkit including best open-source DevOps tools like Ansible and K3s Kubernetes, we laid the groundwork for the migration. The decision was to use K3s on bare metal, which meant provisioning servers, setting up networking, running K3s, installing the mogenius operator, and deploying the application services.

‍

Ansible:

The backbone of our migration, it managed infrastructure provisioning with Terraform and the deployment of the mogenius operator using Helm. Our Ansible scripts were housed in a repository, featuring distinct roles for each component. This allowed us to selectively deploy and manage tools across diverse K3s environments. Special cluster administration tools were installed solely on system nodes – Ansible handled the decision-making, optimizing node-specific installations.

‍

Modular Ansible Scripts:

The scripts were flexible, enabling us to execute only specific roles or playbooks as needed – ideal for tool upgrades without full infrastructure redeployment. Scaling easy with Ansible's capability to add new nodes on the fly by simply running a single command.

‍

Security and Accessibility:

Keys essential to the operations, hosted in the repository, were highly encrypted. Before deployment, decryption was necessary, and access was managed via One Password. This setup empowered any team member to deploy the complete system globally, even theoretically from a personal laptop, by using Cloudflare to adjust IP associations. The versatility extended to deployments across multiple infrastructure services.

‍

Step 4: Charting a New Course - Infrastructure Provisioning

Transitioning to bare metal involved calculated resource allocation:

Resource Efficiency: We initiated with three nodes per cluster, following Kubernetes best practices while right-sizing each node's resources to 8 cores and 32 GB RAM – a modest yet effective configuration.
Private Network Configuration: Ansible automated subnet creation within the hosting provider, establishing an internal communication network akin to Virtual Private Clouds (VPCs) used by major providers. This setup ensured nodes communicated securely, isolated from external internet threats, with each node possessing a public IP.

‍

Step 5: Stepping Out - Migrating Applications and Databases

Deploying applications successfully required continuity and strategy:

Application Deployment: With the mogenius operator, migrating the applications was the easiest part. The operator, installed on both the customer’s public cloud and the new target environment, automatically synced all deployed services from the public cloud to the new environment, ensuring structural integrity and service reliability were maintained. At the same time, the full developer experience remained the same, with full visibility for development teams. Their workflows and project workspaces did not change, and they had full access to monitoring and deployment functions from the start.
Database Strategy: In a strategic twist, the database remained on the cloud provider to leverage its stability and facilitate quick, cross-provider transitions if needed. This separation balanced performance and redundancy.

‍

Step 6: The DNS Setup - Managing Traffic

Navigating traffic was pivotal to a successful migration:

DNS Management with Cloudflare: Leveraging Cloudflare, our Ansible scripts automatically updated DNS configurations, redirecting traffic to our new servers without missing a beat. Multiple hostnames were seamlessly transitioned, ensuring uninterrupted service continuity.

‍

The Results

The whole project took a mere three weeks to prepare and complete. The actual migration process took only 20 minutes, well below the target. And with impressive results:

‍

1. Over 90% Cost Savings

Before: Operating a microservices-based architecture on the public cloud provider came at €10,000 monthly costs for infrastructure services. They provide excellent scalability, but it comes with a price for each managed service: CPU, memory, load balancing, and even the container registry.
After: With the new hosting partner, the cost dropped to less than €1,000 per month

This represents savings of 90%, proving that a self-managed hardware solution is often far more cost-effective for the right use case than fully managed cloud platforms.

‍

2. Improved Performance: 70ms Faster Response Times

Speed is king in the modern software landscape. The migration led to a consistent 70ms improvement in response times.

‍

Why it’s faster:

Reduced Latency in Internal Networks: With fewer subsystems specific to the cloud provider and routing overhead, internal service-to-service communication is faster.
Simpler Networking Topology: The move simplified traffic routing, reducing bottlenecks.
More Proximity in Backend Processes: By managing the nodes directly, we trimmed delays caused by overly complex cloud-native traffic management layers.

This faster response time translates to smoother app performance, better user experience, and improved trust in the platform’s responsiveness.

‍

3. Improved Control and Flexibility

With the hyperscaler, our customer relied heavily on the provider’s ecosystem, which handled many services behind the scenes – but at the cost of reduced control. Now they gained complete control over their infrastructure.

‍

Key upgrades include:

Self-managed Kubernetes with K3S + mogenius operator: Kubernetes is at the heart of mogenius’ microservices architecture. By switching to K3S + mogenius , they now control deployment pipelines, scaling policies, and workloads.
Infrastructure-as-Code: They adopted Ansible, Terraform, and Helm for infrastructure automation and deployment. These tools enable granular, reusable configuration management and scalable deployments.
Freedom from Proprietary Tools: Many services (e.g., Load Balancer, Container Registry) were proprietary. Now, open-source and standardized tools replaced these, ensuring portability and flexibility.

This transition makes our customer less dependent on any vendor’s ecosystem and more adaptable to business or technical changes.

‍

4. Disaster Recovery in Under 5 Minutes

One of the most impressive improvements was in disaster recovery (DR). Our customer can now restore its infrastructure in less than 5 minutes, regardless of location.

‍

How did we achieve this?

GitOps for Version-Controlled Deployments: Infrastructure configurations are stored in Git, enabling the entire system (from server provisioning to service deployments) to be rebuilt from scratch with precise consistency.
Highly Modular Setup: By designing its architecture modularly, we eliminated barriers that often slow down recovery efforts.
Cloudflare for High-Speed DNS Propagation: Cloudflare’s proxy DNS service does not hinder disaster recovery efforts by slow DNS record updates.
Encrypted Backups Across Locations: All database and infrastructure backups are stored securely in various physical and cloud locations, allowing rapid recovery without data loss.

‍

What Happens When Production Nodes Fail?

Dealing with the failure of production nodes is one of the most critical scenarios to prepare for when running a self-managed infrastructure. With the right disaster recovery and automation strategies, failures can be mitigated swiftly and without much downtime.

‍

Handling Node Failures

Automated Redeployment via GitOps: The infrastructure relies on a GitOps approach where all configurations (e.g., Kubernetes manifests, Helm charts) are stored in version control. If a node fails, GitOps ensures rapid redeployment of all services onto a healthy replacement node. This eliminates manual intervention and minimizes downtime.
Load Balancer Redistribution: With bare-metal servers, global and local load balancers are configured to automatically redirect traffic away from the failed node to the remaining healthy nodes. This ensures a seamless user experience during failover.
DNS Strategy via Cloudflare: We used Cloudflare’s DNS service, which propagates IP changes rapidly. DNS updates can point traffic toward infrastructure at an alternative location if an entire site or region becomes unavailable.
Node Monitoring and Alerts: Tools like Prometheus and Grafana continuously monitor node health and trigger alerts for anomalies. Automatic actions (e.g., spinning up a replacement node) can be executed within minutes.

‍

5. Independence and Portability: A Vendor-Agnostic Setup

With the migration, we achieved a vendor-independent, easily portable infrastructure.

Provider-specific systems like load balancers required configuration migration to other providers and proprietary tools for networking were deeply integrated into their backends. The Container Registry locked container storage into their specific ecosystem.

These dependencies made moving to another provider costly and time-consuming.

‍

What makes it more flexible now?

Platform-Agnostic Tools: We utilized universal technologies such as K3S, Terraform, and Helm, which work seamlessly across any provider.
No Vendor-Specific Services: Core services like DNS (via Cloudflare), load balancers, and backups are entirely independent of the hosting provider, making future migrations simple.
Ease of Reproducibility: With their entire system defined via code (Infrastructure-as-Code), mogenius can replicate their environment on any infrastructure, from AWS to Google Cloud, without vendor-imposed barriers.

This approach eliminates the dreaded vendor lock-in, making future transitions effortless.

‍

Overview and Conclusion

‍

Styled Table

Feature	Public cloud provider	Hosting Partner, Self-Managed
Cost	~ €10,000/month	~ €1,000/month
Compute Resources	Fully managed VMs, Managed Kubernetes Service	User-managed bare-metal servers
Auto-Scaling	Built-in (extra cost)	Custom configuration via Horizontal Pod Autoscaler
Monitoring	Provider Monitoring / Log Analytics (paid)	Open-source tools (e.g., Prometheus + Grafana)
Disaster Recovery	Additional cost for Backup/Site Recovery	Fully managed in-house with GitOps structure
Vendor-Lock-In Risk	High due to provider-specific services	Minimal

‍

We helped our customer achieve what many companies aim for: reduce their infrastructure spending by over 90%, boost performance, and gain full control over their infrastructure. By moving to bare-metal servers and embracing open-source technologies like K3S and GitOps, they now run a leaner, faster, and more independent operation.

‍

This success story exemplifies how companies can balance cost-efficiency, performance, and adaptability through thoughtful architecture design. It demonstrates that a self-managed approach, when executed effectively, can yield dramatic results, especially when compared to the limitations and costs of fully managed cloud solutions.

‍

For businesses rethinking cloud costs and operational structures, this migration presents a roadmap worth studying: Think lean, embrace open standards, and take back control of your infrastructure.

‍

3 Practical Tips for Planning a Migration

Successfully migrating from a fully managed cloud provider to a self-managed infrastructure requires careful planning and execution. Here are three essential tips to make the process smooth and efficient:

‍

1. Assess Your Application Architecture

‍

Not all workloads are equally suited for self-managed infrastructure. During your migration planning phase, evaluate which services can effectively run in a self-managed environment and which might be better retained as managed services in the cloud.

Key considerations:

Services suited for self-managed setups: Stateless microservices are a perfect fit because they can scale quickly using container orchestration tools like Kubernetes.
Caching services: Tools like Redis or Memcached can run efficiently on bare-metal servers, providing better control over performance tuning.
Web sockets and APIs: Services with predictable and consistent workloads are well-suited for self-managed environments.
Services better kept in the cloud: Databases: Self-managing databases like PostgreSQL or MySQL can add operational complexity (e.g., handling backups, replication, and scaling). Sticking to cloud-managed databases can reduce maintenance headaches - at least initially.

‍

Actionable Tip: Use a hybrid approach for non-critical components during testing (e.g., keep your database in the cloud while migrating microservices to minimize operational risks while achieving significant cost savings).

‍

2. Test in Parallel

To avoid downtime or service interruptions during migration, set up a parallel infrastructure in the target environment and conduct thorough testing before switching entirely.

‍

Steps to implement parallel testing:

Spin up nodes early: Use your infrastructure-as-code tools (e.g., Terraform, Ansible) to provision your target environment identical to your current setup.
Trigger service duplication: Clone your services and run them alongside the existing Azure instances. Use non-production traffic to test them and ensure they work as expected.
Configure shadow testing (if needed): Route a portion of live traffic to the parallel environment. Tools like Envoy or Traefik load balancers can help distribute traffic selectively to test the new infrastructure's performance.
Simulate failures: Use tools like Chaos Monkey to test how your new environment handles node or network failures.

Once confidence is established (e.g., full traffic testing, no performance or stability issues detected), you can easily flip the DNS switch to direct live traffic to the new infrastructure.

‍

3. Avoid Common Migration Pitfalls

Avoidable issues often trip up migrations. Here are the most critical pitfalls and how to avoid them:

Mismanaged Backups: Ensure all data backups are properly synced to the new infrastructure. Missing or inconsistent backups are one of the top causes of disruptions. Verify backup integrity by testing restores in the target environment before the final cutover.
Forgotten Access Permissions: Recreate RBAC (Role-Based Access Control) and IAM permissions in the new setup. Overlooking this can result in access issues for team members or even security breaches.
Overlooked Security Configurations: Misconfigured firewalls (e.g., exposing public IPs unintentionally) or not replicating SSL certificates to the new environment can create vulnerabilities. Use tools like HashiCorp Vault to migrate secrets and credentials securely.

‍

Early identification and mitigation of these pitfalls are critical for a smooth transition.

‍

FAQ

What makes K3s Kubernetes a good choice for migration efficiency?

K3s is a lightweight Kubernetes distribution designed for resource efficiency and quick deployment, making it ideal for migration to bare-metal servers. Its minimal resource requirements allow smaller nodes (e.g., 8 cores, 32GB RAM) to run Kubernetes clusters efficiently, reducing costs without sacrificing performance. It also simplifies cluster management with built-in automation for tools like Helm and Traefik.

How can GitOps streamline a cloud migration strategy?

GitOps enables version-controlled infrastructure and application deployments, ensuring consistency and repeatability during migration. With tools like ArgoCD or Flux, infrastructure as code (IaC) automates the setup of your new environment, while also allowing rapid recovery in case of failures. This approach reduces human error, minimizes downtime, and ensures every change is trackable via Git.

What are the best open-source DevOps tools for Kubernetes migration?

The top open-source DevOps tools for Kubernetes migration include:

‍

K3s: For lightweight Kubernetes clusters.
‍Ansible: To automate infrastructure provisioning and management.
Helm: For simplifying Kubernetes application deployments.
Prometheus & Grafana: For monitoring and performance optimization. These tools are versatile, cost-effective, and provide all the flexibility needed for a vendor-agnostic setup.

Interesting Reads

Best practices

Jan Lepsky

September 24, 2024

A guide to Kubernetes troubleshooting

Facing Kubernetes issues? This guide offers practical steps to fix k8s application failures, manage logs, and improve performance and security in your cluster.