The Clinical System Cutover Checklist (Built From What Goes Wrong at 2am)
Most cutover checklists miss the things that actually cause problems during live migration windows. This checklist is built from real go-live events — not from PRINCE2 templates or vendor documentation. The items here are the ones that surface at 2am, when the integration engine is running, the clinical team is waiting, and something is silently wrong.
Pre-Cutover: What Must Be True Before You Start
The mistakes that cause go-live failures are almost always made weeks before cutover night. By the time you are in the migration window, the decision space is narrow. These items must be resolved — not in progress, not tracked — before you declare readiness.
All interfaces have versioned specifications, not verbal agreements
Verbal agreements hold until the moment of pressure. During a live cutover, when an HL7 ADT message arrives with an unexpected segment or a field populated differently than expected, the integration developer needs a document they can point to — not a recollection of what was discussed in a meeting three months ago. Versioned interface specifications force precision. They also make accountability possible: if an endpoint delivers data in a format that contradicts the specification, that is a vendor defect, not an integration team problem. Without a document, it is always a conversation.
Every interface has a named technical owner and a named clinical owner
The classic failure mode on go-live night is an interface that is not working and nobody who owns it. The integration team says it is a clinical configuration issue. The clinical team says the data is not arriving. The vendor says their system is sending correctly. This loop runs for forty minutes until someone with authority makes a decision. Named ownership cuts this short. The technical owner understands the message flow. The clinical owner understands what the failure means for patient care. Both must be reachable during the cutover window.
Full cutover rehearsal completed at realistic data volume, not UAT data
UAT environments typically run with seeded test patients, low message volumes, and pre-warmed caches. Production environments run with years of patient history, concurrent message bursts during shift change, and database query patterns that have never been tested under load. A rehearsal that does not reproduce these conditions tells you the choreography is roughly right but says nothing about whether the system will behave when the cardiology ward has fifty active patients generating order messages simultaneously. Volume differences matter for interface engines, database query performance, and third-party feed latency.
Rollback decision criteria documented and agreed — not just "if it goes wrong"
"If it goes wrong, we roll back" is not a plan. It is a statement of optimism that avoids the hard conversation. Rollback triggers must be specific: if lab result routing is not confirmed working within 45 minutes of go-live, rollback begins. If more than 15% of ADT messages fail validation in the first hour, rollback begins. If pharmacy integration cannot be confirmed within the cutover window, rollback begins. These numbers must be agreed before cutover week by the clinical lead, technical lead, and programme director. Under pressure at 3am, the trigger either exists in writing or it does not exist.
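Written triggers have the useful property that they can be checked mechanically during the window. A minimal sketch of that idea, using the example thresholds above — the metric names (`minutes_since_golive`, `lab_routing_confirmed`, `adt_failure_rate`) are illustrative assumptions, not a real monitoring schema:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RollbackTrigger:
    """One pre-agreed, written rollback condition."""
    name: str
    condition: Callable[[dict], bool]  # evaluated against current cutover metrics

# Illustrative triggers mirroring the thresholds in the text above.
TRIGGERS = [
    RollbackTrigger(
        "Lab routing not confirmed within 45 min",
        lambda m: m["minutes_since_golive"] >= 45 and not m["lab_routing_confirmed"],
    ),
    RollbackTrigger(
        "ADT validation failure rate > 15% in first hour",
        lambda m: m["minutes_since_golive"] <= 60 and m["adt_failure_rate"] > 0.15,
    ),
]

def fired_triggers(metrics: dict) -> list[str]:
    """Return the names of all triggers that have fired.
    Any non-empty result means rollback is non-negotiable."""
    return [t.name for t in TRIGGERS if t.condition(metrics)]

# Example: 50 minutes in, lab routing still unconfirmed -> the trigger fires.
print(fired_triggers({
    "minutes_since_golive": 50,
    "lab_routing_confirmed": False,
    "adt_failure_rate": 0.04,
}))
```

The point is not the code — it is that each trigger is specific enough to be expressed as a yes/no condition. If a trigger cannot be written this way, it is not yet a trigger.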
Go/no-go criteria signed off by clinical lead, technical lead, and programme director
Three signatures matter because they represent three different risk domains. The clinical lead is accountable for patient safety implications. The technical lead is accountable for system and integration readiness. The programme director is accountable for the delivery commitment. If any of the three cannot sign off, that is a genuine signal that readiness is not achieved. The most common failure here is the programme director signing off because the date is locked in, while the clinical lead's concerns are recorded as "noted" rather than resolved. Unresolved clinical concerns are not go/no-go criteria — they are deferred risk.
Consuming system readiness confirmed with written sign-off
"We're ready" said verbally in a readiness meeting is not confirmation. Verbal statements are made with incomplete information, under social pressure, and without individual accountability. Written sign-off requires the consuming system team to actually check: Is the connection string pointing at production? Is the certificate valid? Is the receiving service running? Are the correct staff monitoring the queue? These are questions that the act of writing the sign-off forces. Without it, you will discover on go-live night that the receiving system was still pointing at the test endpoint.
Downtime procedures tested and understood by clinical staff
Paper downtime procedures exist in every hospital. They are almost never tested in conditions that resemble real downtime. During a go-live window, the clinical system may be unavailable for hours. If the ward team has not actually printed, located, and used the paper forms in a simulated scenario, the gap between "procedure exists" and "procedure works" will show up when it matters most. A nurse who has never physically picked up a downtime form does not know where they are kept, how they are completed, or what happens to them after the system comes back. Test this, in person, with the actual staff who will be on shift during the cutover window.
The Cutover Window: Hour-by-Hour Control
The cutover window is where preparation meets reality. The structure of your command function during that window determines whether decisions get made cleanly or whether the team burns time on coordination overhead instead of problem-solving.
Command and communication structure
- Cutover coordinator with authority to make decisions — one person, not a committee. Every cutover that runs by committee consensus runs slowly. The coordinator must be empowered to make calls on deviations from plan, escalate issues, and — when the trigger is met — initiate rollback. This person does not need to be the most senior person in the room. They need to be trusted with the authority to act.
- Communication cadence: who gets updates, how often, through what channel. Define this before the window opens. Update cadence during a stable period (every 30 minutes to a key stakeholder group) is different from update cadence during an active incident (every 10 minutes, via a named channel). If the executive sponsor is receiving updates via a different channel than the technical team, information will diverge. One channel, one cadence, one owner.
- Decision log: every deviation from plan captured in real time. Decisions made under pressure at 2am are rarely remembered accurately in the morning. The decision log — even a shared document with timestamps — creates accountability and enables the post-go-live review to understand what actually happened, rather than what people think happened. Any time the team does something not in the runbook, it goes in the log.
- Escalation path: if X fails, who calls whom, in what order, with what authority. The escalation path must be written down and shared before the window opens. "We'll figure it out if something goes wrong" is not an escalation path. If the integration engine throws an unhandled error, who is the first call? If that person is unavailable, who is next? If pharmacy integration fails at 4am, who has the authority to authorise manual workaround procedures?
- "Canary" system checks: the first interfaces to test, chosen to give early signal of broader problems. Not all interfaces are equal as diagnostic signals. Choose your canary tests to be the ones that would surface systemic problems early — typically the interface with the highest message volume, the one most sensitive to routing configuration, or the one that touches the most downstream systems. If your canary tests pass, you have reasonable confidence that the configuration is broadly correct. If they fail, you have caught it before clinical activity begins.
- Rollback trigger: the specific time or condition that makes rollback non-negotiable. The rollback trigger is not the moment rollback begins — it is the moment the decision to roll back becomes non-negotiable. Once the trigger is met, the coordinator initiates rollback without further committee discussion. This is why the trigger must be agreed in advance. When it fires, the answer is already known.
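The decision log described above needs almost no tooling — a timestamped, append-only record is enough. A minimal sketch, assuming a shared file location (the path and record fields here are illustrative, not a prescribed format):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("cutover_decision_log.jsonl")  # shared location is an assumption

def log_decision(who: str, decision: str, rationale: str) -> dict:
    """Append one timestamped decision record; never overwrite earlier entries."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "who": who,
        "decision": decision,
        "rationale": rationale,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Example: a deviation from the runbook, captured as it happens
log_decision(
    who="cutover coordinator",
    decision="Re-ran ADT backload for ward 7",
    rationale="Initial load skipped 12 encounters due to a filter misconfiguration",
)
```

Append-only matters: the morning review needs the sequence of decisions as they were made, not a document that was tidied up afterwards.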
Interface Readiness at Go-Live
Interface failures during go-live are not random. They cluster around the same categories in almost every programme. Check these in order — they are sequenced by clinical criticality, not alphabetically.
Lab result routing: first and most critical check after go-live
A lab result that routes to the wrong patient record, or that fails silently and never arrives, is a direct patient safety risk. Clinicians make treatment decisions based on lab results. If the ORU message routing is misconfigured, a critical potassium result can arrive in the wrong inbox or not arrive at all — and the system will show no error because the message was processed without rejection. The first check after go-live must be to verify that a known test result routes to the correct location in the new system. This is not a technical check. It requires a clinical staff member to confirm they can see the result in the right place.
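The technical half of that check — confirming the known test result carries the expected patient identifier and location before a clinician verifies it on screen — can be done directly against the HL7 v2 message. A sketch with no external libraries; the message content and identifiers are fabricated for illustration, and this does not replace the clinical confirmation described above:

```python
# A known test result (ORU^R01), pipe-delimited HL7 v2. Identifiers are fabricated.
SAMPLE_ORU = "\r".join([
    "MSH|^~\\&|LAB|SENDING|EPR|RECEIVING|202401150230||ORU^R01|MSG0001|P|2.4",
    "PID|1||1234567^^^HOSP^MR||TEST^CANARY",
    "PV1|1|I|CARD^B12^01",  # PV1-3: assigned patient location (ward^room^bed)
    "OBR|1||LAB123|K^Potassium",
    "OBX|1|NM|K^Potassium||6.2|mmol/L|3.5-5.3|H",
])

def fields(message: str, segment: str) -> list[str]:
    """Return the pipe-split fields of the first matching segment."""
    for line in message.split("\r"):
        if line.startswith(segment + "|"):
            return line.split("|")
    raise ValueError(f"{segment} segment missing")

# Verify the message carries the MRN and ward we planted in the test patient.
pid = fields(SAMPLE_ORU, "PID")
pv1 = fields(SAMPLE_ORU, "PV1")
mrn = pid[3].split("^")[0]   # PID-3, first component: patient identifier
ward = pv1[3].split("^")[0]  # PV1-3, first component: ward
assert mrn == "1234567" and ward == "CARD"
```

Even when this passes, the check is not complete until a clinical staff member confirms the result is visible in the right place in the new system.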
ADT message flow: admit, discharge, transfer — verified before clinical activity starts
ADT messages are the patient movement backbone. If an A01 admit message does not reach the downstream systems correctly, those systems do not know the patient exists. Pharmacy will not find the patient for medication orders. The lab will not have the correct ward location for result delivery. Radiology will not have the correct encounter context. Verify ADT flow — not just that messages are being sent, but that downstream systems are correctly receiving and processing them — before any clinical activity begins on the new system.
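"Correctly receiving and processing" is stronger than "the send did not error." Over MLLP — the usual transport for HL7 v2 — the minimum evidence is an application-level ACK (MSA-1 of `AA`) from the downstream system, not just a successful TCP write. A sketch of the framing and ACK parsing, with a fabricated reply message; even an `AA` only proves receipt, so downstream processing still needs its own check:

```python
# MLLP framing bytes: start block, end block, carriage return
SB, EB, CR = b"\x0b", b"\x1c", b"\x0d"

def mllp_wrap(message: str) -> bytes:
    """Frame an HL7 message for transmission over MLLP."""
    return SB + message.encode("ascii") + EB + CR

def ack_code(raw_reply: bytes) -> str:
    """Extract MSA-1 (AA/AE/AR) from an MLLP-framed ACK."""
    reply = raw_reply.strip(SB + EB + CR).decode("ascii")
    for segment in reply.split("\r"):
        if segment.startswith("MSA|"):
            return segment.split("|")[1]
    raise ValueError("No MSA segment in reply")

# A downstream ACK as it might come back (identifiers fabricated)
reply = mllp_wrap(
    "MSH|^~\\&|EPR|REC|LAB|SEND|202401150301||ACK|MSG0002|P|2.4\r"
    "MSA|AA|MSG0001"
)
assert ack_code(reply) == "AA"  # AA = application accept; AE/AR mean investigate
```

An `AE` or `AR` on an A01 during the verification pass is exactly the early signal you want before clinical activity starts.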
Pharmacy integration: medication orders cannot be assumed to have arrived correctly
Pharmacy integration failures are insidious because the clinical workflow often continues — the prescriber enters the order, the system accepts it, and nothing visible fails. But if the ORM message to pharmacy has not arrived, or has arrived with incorrect encoding, the pharmacy team is working from incomplete or incorrect information. Verify medication order delivery end-to-end: from order entry in the new system, through the integration engine, to visible receipt in the pharmacy system. Check encoding of dose, route, frequency, and patient identifiers. Do not assume.
Radiology: report routing and order acknowledgement
Confirm that radiology orders from the new system are reaching the RIS correctly and that reports are routing back to the correct clinician inbox. Order acknowledgement (ORM/ORR flow) should be verified to confirm the PACS/RIS has accepted the order and the worklist is being populated correctly. Report routing failures in radiology can mean clinicians are checking paper or calling for results that should be arriving automatically.
Third-party vendor feeds: which external systems have confirmed they are receiving correctly
Third-party vendors — national services, private lab providers, external notification systems, regional data warehouses — must confirm receipt before you can mark the interface as live. "We sent the messages" is not the same as "the vendor confirmed they received and processed them correctly." Have a named contact at each external vendor who is available during the cutover window, and require written or logged confirmation before the interface is marked green.
Post-Go-Live: The First 72 Hours
The go-live window closes, the senior team decompresses, and the system is live. This is when the second category of failure happens — the problems that were present from the start but only visible once real clinical workflow runs at volume for a sustained period.
Hour 1–4: what clinical coordinators must check
The first four hours are the highest-risk period. Clinical staff are using the new system under real conditions for the first time. The hypercare team must be physically present or immediately available — not monitoring remotely from home. Clinical coordinators should be actively checking: Can they find patients? Are orders reaching downstream systems? Are results returning to the right location? Are there any clinical workflows that were not covered in training and are now causing confusion?
Do not wait for staff to raise issues. Proactively check the highest-volume, highest-risk workflows within the first hour. The problems that are going to surface will surface here.
Day 1–3: interface metrics that indicate stability vs hidden failure
Integration stability is not the absence of errors — it is the presence of expected message volumes with acceptable error rates. Define your baseline: how many ADT messages per hour should you expect? What is an acceptable ORU rejection rate? Any significant deviation from expected volume — in either direction — is a signal.
An interface processing fewer messages than expected may indicate a silent failure upstream. An interface processing more errors than expected is obvious. Monitor both. Check message queue depths at the integration engine level, not just at the receiving system. Queue depth accumulation means throughput is not keeping pace with incoming volume — a warning sign before the receiving system shows any distress.
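Both signals — volume deviation in either direction, and queue depth accumulation — are simple to express once the baseline is defined. A sketch under those assumptions; the 30% tolerance band is illustrative and should come from your own baseline data:

```python
def volume_status(expected_per_hour: float, observed: int,
                  tolerance: float = 0.3) -> str:
    """Flag deviation in either direction; too quiet is as suspicious as too noisy."""
    if expected_per_hour <= 0:
        raise ValueError("Baseline must be positive")
    ratio = observed / expected_per_hour
    if ratio < 1 - tolerance:
        return "LOW: possible silent upstream failure"
    if ratio > 1 + tolerance:
        return "HIGH: possible loop or replay"
    return "OK"

def queue_trend(depth_samples: list[int]) -> str:
    """Accumulating queue depth means throughput is not keeping pace with inflow."""
    if len(depth_samples) >= 2 and depth_samples[-1] > depth_samples[0]:
        return "ACCUMULATING: investigate before the receiver shows distress"
    return "STABLE"

print(volume_status(expected_per_hour=400, observed=180))  # flags LOW
print(queue_trend([12, 40, 95, 210]))                      # flags ACCUMULATING
```

The low-volume branch is the one teams forget to alert on: a quiet interface looks healthy on an error dashboard.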
Escalation for "silent" failures
Silent failures are the most dangerous category: messages being accepted by the receiving system but processed incorrectly, rather than rejected with an error. A message consumed incorrectly — for example, a lab result matched to a patient by MRN when the MRN encoding has changed and is now matching to the wrong record — will not generate an integration error. The integration engine reports success. The clinical system has data. The data is wrong. Detection requires clinical review, not technical monitoring. Build explicit clinical data quality checks into the day one and day three schedule, and assign named clinical staff to perform them.
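Those clinical data quality checks can be made concrete: take a clinician-chosen sample of results and compare the identity fields against the source system record for the same result. A sketch — the field names and sample data are illustrative, and a real check would pull from both systems' APIs or reports:

```python
def mismatches(source: dict[str, dict], target: dict[str, dict],
               sample_ids: list[str],
               keys: tuple = ("mrn", "dob", "surname")) -> list[tuple]:
    """Compare identity fields for a sample of results.
    Returns (result_id, field, source_value, target_value) per mismatch."""
    out = []
    for rid in sample_ids:
        src, tgt = source.get(rid), target.get(rid)
        if tgt is None:
            out.append((rid, "<missing>", src, None))
            continue
        for k in keys:
            if src[k] != tgt[k]:
                out.append((rid, k, src[k], tgt[k]))
    return out

# A silent failure: the result exists in both systems, but against the wrong patient
source = {"R1": {"mrn": "1234567", "dob": "1950-03-01", "surname": "SMITH"}}
target = {"R1": {"mrn": "7654321", "dob": "1950-03-01", "surname": "SMITH"}}
print(mismatches(source, target, ["R1"]))
# -> [('R1', 'mrn', '1234567', '7654321')]
```

No integration error would have fired for the example above — which is exactly why this check belongs on the day one and day three schedule with a named clinical owner.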
Hypercare team: who stays, for how long, what their authority is
Define the hypercare team composition before go-live, not after. Who is on site (or on call) for the first 24 hours? Who transitions to remote support for days two and three? What is the escalation threshold that brings the senior technical team back on site? The hypercare team must have the authority to make changes to configuration without going through a full change request cycle — but with a decision log requirement for every change made. Hypercare is not business as usual with extra stress. It is a defined support model with defined authority.
The handover from cutover team to BAU: what must be documented before the senior team leaves
The senior delivery team will leave. When they do, the organisation must be able to operate the new system without them. The handover package is not the project documentation — it is the operational documentation. What are the known issues as of handover? What workarounds are in place? Which interfaces are on watch? Who is the first call for each known risk area? What monitoring is in place and who receives the alerts? If the BAU team cannot answer these questions from the handover documentation, the handover is not complete.
Need a senior view on your cutover readiness?
Book a 20-minute fit check. We will work through your go/no-go criteria, interface readiness status, rollback plan, and the gaps that programmes most commonly leave open before go-live week.