How we rebuild a legacy monolithic platform at CERN and still sleep at night by Dmitry Kekelidze
Summary
This talk describes how CERN is modernizing EDH, a 30-year-old monolithic system that runs critical administrative workflows such as HR, finance, approvals, and leave requests. Dmitry Kekelidze explains why a full rewrite was rejected, how the team moved from a tightly coupled monolith toward clearer domain ownership, and why they chose an incremental fork-and-migrate approach instead of a big-bang replacement. The presentation focuses on the operational and architectural pain points of the legacy platform: shared code paths, shared deployments, shared database access, and changes that can break unrelated domains. It then walks through the target direction: a forked codebase with a cleaner core exposed as versioned libraries, domain-owned back ends, a React front end, API boundaries, downstream testing, and governance around semantic versioning and coordinated migrations. The talk emphasizes trade-offs, team buy-in, and gradual migration as the practical path for a critical system that cannot afford downtime or large regressions.
Key Takeaways
- EDH is a mission-critical monolith at CERN that handles administrative workflows for HR, finance, operations, and approvals.
- A full rewrite was rejected because of integration risk, compliance concerns, and the cost of revalidating decades of behavior.
- The team’s target direction is clearer domain ownership, stable contracts, and reduced blast radius for changes.
- Instead of modularizing in place indefinitely, CERN moved toward forking the legacy system and migrating incrementally.
- The core is being extracted as versioned libraries rather than a standalone REST service, to minimize disruption and preserve existing contracts.
- Semantic versioning, coordinated major upgrades, and downstream compatibility tests are central to the governance model.
- Product teams are expected to take more responsibility for their own back ends, testing, and deployments.
- A React front end and API layers are being introduced while keeping the user experience unified where needed.
Sections
Legacy monolith and why it became fragile
EDH started as a small automation tool and grew over 30 years into the system behind most administrative processes at CERN. The application became a large monolith with tangled dependencies, shared core classes, and logic spread across core and domain code. Because many teams could modify shared code, changes in one area could affect unrelated workflows. The talk highlights a Spring 5 to Spring 6 upgrade that was expected to take two weeks but took more than two quarters because the team had to uncover and understand hidden dependencies first.
Operational coupling and blast radius
The system was coupled not only in code but also in operations. All domains were deployed and rolled back together, so a bug in one document flow could take down everyone’s work. Because the application is stateful, rollbacks could invalidate sessions and interrupt ongoing user tasks. The shared database created another failure mode: a small mistake in one team’s access rule could block access to many unrelated records. The team needed a way to reduce the blast radius of changes without losing the reliability required for critical administrative workflows.
Why not rewrite from scratch
A clean rewrite was considered but rejected. EDH has decades of integrations with other CERN systems, and re-creating and revalidating all of them would be risky and expensive. A replacement system could easily introduce regressions in high-stakes workflows such as payments or administrative approvals. The team concluded that the safest path was to reuse the existing system’s behavior and modernize it gradually rather than restart from zero.
Chosen migration strategy: fork and evolve
The final approach is to fork the legacy codebase and clean up the fork incrementally while the old system remains in production. This lets the team define contracts, refactor the core, and port documents one by one without forcing an all-at-once migration. The fork can be validated using the existing documents and workflows as test cases. Once a given document or domain reaches feature parity, that workflow moves to the new architecture and its legacy implementation is decommissioned.
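Using existing documents as test cases amounts to a parity check: feed the same document to the legacy path and to the fork, then compare the results. The sketch below illustrates the idea in Python for brevity (EDH itself is Spring-based). Every name in it (`check_parity`, `ParityResult`, the two leave-request handlers, and the five-day rule) is hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ParityResult:
    """Outputs of both implementations for one document (hypothetical type)."""
    document_id: str
    legacy_output: dict
    fork_output: dict

    @property
    def matches(self) -> bool:
        return self.legacy_output == self.fork_output


def check_parity(
    documents: Iterable[dict],
    legacy_handler: Callable[[dict], dict],
    fork_handler: Callable[[dict], dict],
) -> List[ParityResult]:
    """Run each document through both handlers and return only the mismatches."""
    mismatches = []
    for doc in documents:
        result = ParityResult(doc["id"], legacy_handler(doc), fork_handler(doc))
        if not result.matches:
            mismatches.append(result)
    return mismatches


# Illustrative handlers: the fork has an off-by-one regression at exactly 5 days.
def legacy_leave_request(doc: dict) -> dict:
    return {"status": "approved" if doc["days"] <= 5 else "needs_review"}


def fork_leave_request(doc: dict) -> dict:
    return {"status": "approved" if doc["days"] < 5 else "needs_review"}


documents = [
    {"id": "LR-1", "days": 3},
    {"id": "LR-2", "days": 5},
    {"id": "LR-3", "days": 9},
]
mismatches = check_parity(documents, legacy_leave_request, fork_leave_request)
for m in mismatches:
    print(m.document_id, m.legacy_output, m.fork_output)
```

In this example only `LR-2` is reported, because it sits on the boundary where the two handlers disagree. A document type would be a candidate for migration once a check like this returns no mismatches over a representative set of real documents.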
Target architecture and backend boundaries
The target model separates concerns more clearly. Product teams own domain-specific back ends, while the shared core is exposed through stable APIs and libraries. The workflow layer is separated from business logic, and domain schemas are owned by the corresponding back ends. The talk also notes a move from JSPs to a React front end. This lets CERN modernize the user interface while keeping the business logic concentrated in the back ends and reducing accidental coupling.
Why the core is being turned into a library
CERN considered exposing the core as a standalone REST service, but chose to extract it as a library first because that is the least disruptive path from the current monolith. As a library, the core keeps familiar Spring-based contracts and can be adopted without introducing a new service boundary immediately. The trade-off is that the library becomes a versioned framework, so teams must manage compatibility carefully.
Governance, testing, and versioning
To make the library model viable, the team introduced semantic versioning and a governance process for changes. Minor and patch releases remain backward compatible; major releases trigger a coordinated migration campaign with a defined support window for old versions. Testing is layered: core unit and component tests, plus downstream compatibility pipelines that run product-team tests against candidate core releases. QA is involved for major changes or when broader validation is needed.
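The versioning rule and the downstream gate described above can be sketched in a few lines. This is a minimal illustration, not CERN's tooling: `Version`, `is_drop_in_upgrade`, `needs_migration_campaign`, `failing_downstream_suites`, and the team names are all assumptions, and each product-team suite is modeled as a plain function that returns `True` when its tests pass against a candidate core version.

```python
from typing import Callable, Dict, List, NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuple ordering gives semver precedence."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_drop_in_upgrade(current: Version, candidate: Version) -> bool:
    """Minor and patch bumps within the same major are expected to be compatible."""
    return candidate.major == current.major and candidate >= current


def needs_migration_campaign(current: Version, candidate: Version) -> bool:
    """A higher major version signals breaking changes and a coordinated migration."""
    return candidate.major > current.major


def failing_downstream_suites(
    candidate: Version,
    suites: Dict[str, Callable[[Version], bool]],
) -> List[str]:
    """Run every product-team suite against the candidate; return the failing teams."""
    return sorted(team for team, suite in suites.items() if not suite(candidate))


current = Version.parse("4.2.1")

# Hypothetical suites: "finance" still depends on behavior removed in 5.x.
suites = {
    "hr": lambda version: True,
    "finance": lambda version: version.major < 5,
}

for text in ("4.3.0", "5.0.0"):
    candidate = Version.parse(text)
    print(
        text,
        is_drop_in_upgrade(current, candidate),
        needs_migration_campaign(current, candidate),
        failing_downstream_suites(candidate, suites),
    )
```

Under these assumptions a release would be published only when `failing_downstream_suites` returns an empty list, or, for a major release, once the failing teams have been scheduled into the migration campaign.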
Team buy-in and operating model changes
The architecture change also shifts accountability. In the monolith, the core team often handled incidents, upgrades, and firefighting for everyone. In the new model, product teams own their back ends and are responsible for their own upgrades and tests, while the core team provides support and helps with migrations. The talk stresses that this shift had to be negotiated with the teams; it could not simply be imposed. The overall goal is to scale the system and the organization without making the core team a permanent bottleneck.
Incremental rollout and lessons learned
The migration starts by freezing the stable legacy core, then cleaning up the fork and defining boundaries. CERN pilots one document flow end-to-end, validates it through testing and user feedback, and then moves additional documents one by one. The talk’s main lessons are that no architecture is permanent, that trial and error is necessary to find the right path, and that modernization of a long-lived platform depends as much on governance and team alignment as on code structure.
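One way to picture a document-by-document rollout is a small registry that records which document types have moved, with everything else defaulting to the legacy system. The talk does not describe how requests are routed between the two systems, so the sketch below is purely an assumption: `MigrationRouter`, `Backend`, and the document type names are hypothetical.

```python
from enum import Enum
from typing import Set


class Backend(Enum):
    LEGACY = "legacy"
    NEW = "new"


class MigrationRouter:
    """Tracks which document types are served by the new architecture."""

    def __init__(self) -> None:
        self._migrated: Set[str] = set()

    def migrate(self, document_type: str) -> None:
        """Mark a document type as migrated after it has been validated."""
        self._migrated.add(document_type)

    def roll_back(self, document_type: str) -> None:
        """Send a single document type back to legacy without touching the others."""
        self._migrated.discard(document_type)

    def backend_for(self, document_type: str) -> Backend:
        """Unmigrated document types default to the legacy system."""
        return Backend.NEW if document_type in self._migrated else Backend.LEGACY


router = MigrationRouter()
router.migrate("leave_request")  # the pilot document flow in this example

print(router.backend_for("leave_request"))
print(router.backend_for("purchase_order"))
```

The point of the design is that a decision about one document type never affects another, which mirrors the reduced blast radius the migration is aiming for.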
Keywords: cern edh modernization, legacy monolith refactoring, monolith to modular architecture, fork and migrate strategy, domain ownership, spring framework upgrade, spring 5 to spring 6 migration, library-based core extraction, semantic versioning for internal libraries, downstream compatibility testing, react frontend migration, jsp to react, shared database monolith, operational blast radius, workflow system architecture, backend boundaries, contract-first apis, incremental migration, technical governance, qa and test pipelines, enterprise content management system, stateful application rollback