Illustration of a melting, black wi-fi symbol on a bright orange background. The skyline of Washington DC runs across the bottom and subtle smoke is simmering out of the city.

Illustration by Security Technology; iStock

A Heat Dome Hits Virginia: The Initial U.S. Government Report


Editor's Note: Below is part two of a fictional account of a meteorological event, a “heat dome,” descending in the summer of 2025 on a part of the United States near the town of Ashburn, Virginia.

Ripple effects continue after more than 45 data centers in northern Virginia experienced partial or complete failure under the stress of last July’s heat dome, Calcifer. While intermittent impacts to power generation, transmission, and distribution caused a variety of local disruptions, the loss of more than four dozen data centers in Loudoun and Prince William counties had local, regional, national, and even global consequences.

What follows is the unclassified executive summary of a 300-page report compiled by the National Risk Management Center based on analyses performed by an intergovernmental Incident Response Team (IRT)/Incident Management Team (IMT) approximately four months after the heat dome lifted. Please note it recounts only a highly selective subset of impacts observed. A more comprehensive accounting is said to be included in the full report and its classified annex. Also note: The report and its executive summary are remarkable for the candor expressed in calling out potential oversight issues within the U.S. Department of Homeland Security (DHS) itself.

DHS Heat Dome Calcifer Incident Response Team/Incident Management Team

Public Release Date: 1 December 2025

Four months after Heat Dome Calcifer stalled over northern Virginia for nine days, with temperatures peaking at 112 degrees Fahrenheit (44 degrees Celsius), reports continue to roll in via still-spotty communications channels, and the work of calculating losses goes on.


In the metro Washington, D.C., region some effects are obvious. For example, innumerable government and commercial business offices, daycares, K-12 schools, and universities closed. On the transportation front, Metro service, while partially restored, is intermittent at best; a stretch of Gallows Road near Tysons Center melted and buckled; and flights were grounded for two weeks at Dulles International Airport and for a week at Reagan National Airport (DCA). Even after the dome departed and the runways were repaired, intermittent outages of maintenance systems kept some planes grounded, and ticketing system glitches made it difficult to know which DCA or Dulles flights were operating or available. Neither Uber nor Lyft is back in business yet, though taxis, while scarce, are mainly unaffected.

Less apparent is the human toll of the heat dome. Blackouts in the area, caused by a huge surge in air conditioner use and exacerbated by problems at several substations, likely played a role in the deaths of 258 (mainly older) adults and 45 children from heat stress, according to the Federal Emergency Management Agency (FEMA), with input from the U.S. Department of Energy (DOE).

U.S. and Global Business Impacts from Data Center Outages

Details are starting to emerge from the various DHS elements about another kind of damage, less obvious but in many ways more extensive and pernicious. It appears that no fewer than 45 of about 90 data centers operating in “Data Center Alley” (a region that includes Loudoun and Prince William counties and through which more than half of all Internet traffic flows) failed or proactively shut down during a week of prolonged extreme heat.

Without the data centers operational, the IT and data communications services of dozens of cloud-based nationwide and regional banks, all manner of businesses, hospitals, state and local governments, and at least one mobile phone carrier stopped providing service altogether (See Index, p. 225).

Other examples of nationwide and world-spanning disruptions:

  • The S&P, Dow Jones, and NASDAQ all shed 10 percent, and most international exchanges dropped by as much or more, as the widespread nature of Calcifer’s economic impacts became apparent.

  • Many ATMs from coast to coast, as well as in Canada, Mexico, Brazil, Argentina, and Chile, were left nonfunctional in Calcifer’s wake. Approximately 20 to 30 percent remain out of service.

  • 911 systems in some regions have yet to be fully restored.

  • A high percentage of the electric and water utilities that moved their business applications and data to the Cloud are having difficulty accessing their smart meter data and have yet to issue bills to their customers.

  • Inside and outside the United States, access to and functionality of online stores is a hit-or-miss affair. That includes big retailers like Amazon, Best Buy, Etsy, and Walmart, as well as innumerable midsize and smaller stores that either depend on the infrastructures of the larger ones or whose co-located operations were affected.

  • Logistics companies have taken a major hit as well, so much so that even if you do get your order lined up in a shopping cart, many transactions fail due to continuing hiccups at DHL, FedEx, and UPS. And there are additional speed bumps given the issues that large packaging supplier Uline continues to experience with its IT systems.

Sweeping U.S. Government Impacts

Operations across the U.S. federal government have been impaired, especially at the many agencies that began transferring their key applications and data to the Cloud and co-located data centers back in the late 2010s. A massive postal service disruption, disruptions to income tax accounting and reporting at the IRS, and disruptions to farm support and fire suppression activities at the U.S. Department of Agriculture are examples.



Nor has the U.S. Department of Defense (DOD) been spared. One illustrative example is the Air Force and its Cloud One roster of Cloud service providers that include Amazon Web Services (AWS), Google Cloud, Microsoft Azure, and Oracle Cloud. While these “big 4” so-called hyperscalers operate among the most resilient, fault-tolerant, over-engineered data centers in the world, they were not entirely immune to the brutal conditions that attended Calcifer, and they were far from immune from the ripple effects of so many other lower-tier federal and commercial data centers going dark for days. Both the Air Force’s Cloud One and Platform One secure development and runtime environments appeared to be down until mid-August. And its Weather Enterprise system, vital to global mission planning, remains out of service, forcing the Air Force to draw on a patchwork of commercially available services, even as some of them are impacted as well.

Have there been impacts to classified systems? That is much harder to discern, but as independent consultant Vic Vickers, a lead architect for some of the area’s first large data centers and well versed in Ashburn-area data center operations, put it, the information that travels through those data centers “ranges from the extremely mundane to the very highly sensitive.”

Seeking the Root Causes of Technical Failure

More than half of the highest-throughput data connections in and around Ashburn, many of them serving essential businesses, were rerouted in the first few days after Calcifer’s arrival. Others attempted the same but failed. While the costs to the U.S. and other economies are still being tallied, the costs to data center owners and operators will be high, and that’s not counting the lawsuits likely to follow. Analysts at credit ratings firm Moody’s estimated that damage assessment of servers, cooling systems, and power equipment will take 12 to 18 months and cost $2 billion per center.

On the legal front, the National Law Review reports that law firms representing data center owner and operator companies, thousands of their co-located customers, and what are likely to be hundreds of thousands or even millions of global end-user Cloud and data-center-as-a-service (DCAAS) customers are still in the early stages of building their suits. Major class action lawsuits are expected against DCAAS owners, operators, and their engineering firms for breaching their duty of care. The suits will likely claim that Calcifer and its effects were foreseeable; that there was a duty to design and operate facilities providing data center services to meet foreseeable extreme weather circumstances such as Calcifer; and that the data center owners, operators, and systems designers breached that duty, causing enormous economic losses, and are now liable for damages.

The DCAAS providers say these suits are sending a message and driving reengineering efforts. At the same time, they will seek legislation that supports liability protections and “safe harbors” while they begin to update their design criteria and operations standards.   

How did we get here? It’s a question that the Federal Reserve (Fed), the Securities and Exchange Commission (SEC), the Federal Communications Commission (FCC), and other U.S. and international financial and data communications regulators are trying to answer. Governance issues may have played a role, as an interconnected patchwork of regulated, unregulated, and increasingly interdependent infrastructure entities each operated under distinctly different business models, standards organizations, and regulatory models.

As the first mover and perennial leader, AWS set the pace for all others when it came to the widespread use of availability zones (AZs). AZs are essentially clusters of data centers within a region that allow customers to run instances of an application in several locations to avoid a single point of failure. While beyond the reach of many budgets, this approach enabled sophisticated strategies for the failover and backup of critical applications.


With 25 AWS data centers situated across Loudoun and Prince William counties (See Index, p. 245), AWS’ US-East region is ground zero for the Cloud, for all manner of smart devices, and for the Internet itself. But even a distributed reliability plan can break down if the network fails, breaking the flow of data across public and private cloud infrastructures. So, while many in Ashburn and its vicinity used the availability zone method, Calcifer disrupted approximately half of the data centers there to a significant extent—some from network issues alone, independent of the heat factor.
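The availability zone approach, and the way a shared network can undermine it, can be illustrated with simple reliability arithmetic. The sketch below is hypothetical: the 99.9 percent figures are made-up inputs, not measurements from any provider, and the function names are illustrative only. It also assumes zone failures are independent, which is exactly the assumption a regional event like Calcifer undermines. Adding zones drives the redundant part toward 100 percent, but any dependency shared by every zone (here, the regional network) sits in series and caps the overall result.

```python
# Hypothetical sketch: redundancy across availability zones versus a
# dependency shared by all of them. All availability figures are
# illustrative assumptions, not real provider data.


def parallel_availability(zone_availability: float, zones: int) -> float:
    """Availability of N redundant zones, ASSUMING independent failures:
    the service is down only if every zone is down at the same time."""
    return 1 - (1 - zone_availability) ** zones


def with_shared_dependency(service_availability: float,
                           shared_availability: float) -> float:
    """A component every zone relies on (e.g., the regional network) is in
    series with the redundant zones, so the two availabilities multiply."""
    return service_availability * shared_availability


zone = 0.999     # assumed availability of a single zone
network = 0.999  # assumed availability of the shared regional network

for n in (1, 2, 3):
    redundant = parallel_availability(zone, n)
    overall = with_shared_dependency(redundant, network)
    print(f"{n} zone(s): zones alone {redundant:.9f}, "
          f"with shared network {overall:.9f}")
```

With these inputs, one zone plus the network comes to about 99.8 percent, and even three zones cannot push the total past the network's own 99.9 percent. A correlated event that disables several zones at once would make real-world figures worse than this simple model suggests.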

The data center world is entirely self-regulating, relying on a tiered rating system maintained by the private-sector Uptime Institute. It certifies data centers on a scale of 1 to 4 for reliability, briefly paraphrased here from Uptime’s own website (See Index, p. 255):

  • Tier I: This is the basic capacity level with infrastructure to support information technology for an office setting and beyond.

  • Tier II: Covers redundant capacity components for power and cooling that provide better maintenance opportunities and safety against disruptions.

  • Tier III: Is concurrently maintainable, with redundant components as a key differentiator and redundant distribution paths to serve the critical environment.

  • Tier IV: The most demanding certification level, a Tier IV center has several independent and physically isolated systems that act as redundant capacity components and distribution paths. The separation is necessary to prevent an event from compromising both systems. The environment will not be affected by a disruption from planned and unplanned events. Tier IV data centers also require continuous cooling to make the environment stable.

In marketing their reliability, data center companies often cite a number of nines (three, four, or five) to signify how little downtime their services will have in a given year. For example, five nines availability means that a company’s apps or websites are operational 99.999 percent of the time, or about five minutes of downtime per year. While this causes little trouble for most companies, for certain sectors (e.g., utilities, telecommunications, e-commerce, financial services, critical manufacturing, and aviation) five minutes of downtime can cause major problems (See Index, p. 242).
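The nines arithmetic is easy to check. This minimal Python sketch assumes a 365-day (525,600-minute) year and treats each additional nine as a tenfold cut in allowed downtime; the function name is illustrative only.

```python
# Minimal sketch: convert an "N nines" availability claim into the
# maximum downtime it permits over one year.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def max_downtime_minutes(nines: int) -> float:
    """Allowed downtime per year for N nines: the unavailable fraction
    is 10**-N (e.g., 5 nines -> 0.001 percent of the year)."""
    return MINUTES_PER_YEAR * 10 ** -nines


for n in (3, 4, 5):
    availability_pct = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability_pct:.{n - 2}f}%): "
          f"{max_downtime_minutes(n):.1f} minutes of downtime per year")
```

Five nines works out to roughly 5.3 minutes a year, four nines to about 53 minutes, and three nines to 525.6 minutes, or nearly 9 hours.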

Because cooling systems were sized to handle (previously) rare high-heat events, as defined by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and dubbed the ASHRAE 20-year, no engineering design team seemed to think their cooling systems would ever have to deal with temperatures over 110 degrees Fahrenheit (43 degrees Celsius) with wet-bulb temperatures of 90 degrees or higher.

What did those that failed have in common, and how did others weather the storm? The former were generally older, air-cooled Tier I, II, or III facilities, while Tier IV centers, especially the newer ones, were able to avoid running their cooling and power systems into the ground by alternating with their fully redundant backups. Some of this is attributable to the skill of the operators, and some to the foresight of Tier IV (and even some lower-tier) centers that put their worst-case scenario plans and procedures in place and began shifting loads out of the area well before the worst effects of Calcifer arrived. Also benefiting were those that had joined the recent wave of deploying cooling systems that run water or glycol past the processors and memory chips themselves, or the newer immersion cooling techniques in which server and computing components are submerged in dielectric liquid. Still others benefited because they had experimented and found they could operate safely at higher server hall temperatures than many previously thought.

The good news, if there is any, is that businesses that could afford to pay for the highest reliability data center services were less grievously impacted. Many of these were global banks for whom long-term disruption might have upended not just the U.S. economy but the global economy. Those that couldn’t afford the confidence of multiple redundant availability zones were more likely to be local or regional businesses or governments, many of whom are still scrambling today to employ alternate methods. Never have ham radio operators been more important or more appreciated. They are providing indispensable back-channel communications, once planned for other types of catastrophic disruptions to business-as-usual communications paths such as geomagnetic disturbances or electromagnetic pulse attacks (See Index, p. 267, exhibit B).

Failures of Imagination and Oversight … Again

Calcifer’s disruptions continue to affect all the critical infrastructure sectors and national critical functions within the DHS portfolio. Initial impact reports from across the nation include data losses and disrupted IT and communications services. It may be that DHS put most of its risk management attention and energy into guarding against cyber threats, and, at least at first blush, was caught off guard by this unprecedented—though far from unforeseeable—extreme weather event.

Several former DHS and commercial risk experts interviewed for this report said that this was an entirely foreseeable event that should have been on the radar since data center failures from extreme heat events started piling up three years earlier. The authors of this report note there are already loud calls for an investigation and a report as large as, or larger than, the one that followed the attacks of 9/11.



In the wake of the 9/11 attacks on New York City and Washington, D.C., and the Northeast Blackout of 2003, during which grid operators’ reaction times were slowed by a software failure in a utility’s alarm system, the U.S. government formed a Blue Ribbon Panel and launched a series of mostly classified experiments to identify other Achilles’ heels: other national blind spots that could be exploited by smaller adversaries to devastating effect.

From those inquiries, and a few others (See Index, p. 275), we learned that targeted cyberattacks on the U.S. grid might create catastrophic consequences unless electric utilities rapidly improved their cyber postures. The Energy Policy Act of 2005, which amended the Federal Power Act, set the stage for what soon became known as the North American Electric Reliability Corporation (NERC) Critical Infrastructure Protection (CIP) standards. And their mandatory nature, which included fines to compel action, largely yielded what was intended: demonstrably more secure electric utilities.


Unlike the electric sector, which regularly exercises its preparedness to operate through natural disasters such as hurricanes and earthquakes and to thwart nation-state adversaries’ malign intentions in the form of large-scale cyberattacks, the Information and Communications Technology (ICT) sector, of which data centers are perhaps the most essential element, does not practice at scale for either cyber or extreme weather events.

The variety of business models and the complex tangle of relationships among colocation providers, their tenants, cloud providers, and the end-user customers (not to mention their customers) who pay for the reliable functioning of them all make meaningful risk analysis of the data center universe extremely difficult. So difficult, in fact, that to some in the U.S. government, including but not limited to DHS’s Cybersecurity and Infrastructure Security Agency (CISA), it seemed at best impractical and at worst impossible. So, while quarterly reports touching on risks are submitted to the SEC, facts as consequential as the following remain largely opaque to government risk managers:

  • Which customers are served by which data center providers.

  • Service level agreement payouts, in cash or credits, for reliability breaches or failures to meet latency guarantees.

  • How data center owner/operators are preparing for worsening environmental conditions (e.g., extreme heat, drought, flooding, etc.).

Drawing heavily on the Sixth Assessment Report of the United Nations’ Intergovernmental Panel on Climate Change (IPCC), in 2022 the SEC proposed rules requiring businesses to report their emissions as well as their climate transition and physical risks (See Index, p. 281).

One trade group representing data center owners, the Information Technology Industry Council, was critical of the proposed reporting requirements in its public comment (See Index, p. 282):

“The proposed definitions of ‘climate-related risks,’ ‘transition risks,’ and ‘climate-related opportunities’ are overly broad and unworkable, and should be narrowed to focus on registrants’ business operations.”

In its 2022 decision in West Virginia v. Environmental Protection Agency, the U.S. Supreme Court invoked the “major questions doctrine” and dramatically curbed federal agencies’ regulatory reach. As a result, the new SEC rules that went into effect in 2023 were substantially pared back, particularly in reporting carbon emissions. Both as originally proposed and as ultimately promulgated, the rules are not nearly robust enough to define, let alone enforce, data center reliability requirements with catastrophic climate physical risks in mind.

Taking as a model the effective reliability regulation established for the bulk electric system (and, more recently, for larger water sector utilities), a regulating agency (not the SEC) could require compliance with climate-informed General Design Criteria (GDC) incorporated into Uptime Institute-registered Tier III and IV data centers.

Built into the designs of these data centers, the climate-informed GDC would provide enhanced resilience against extreme weather and other phenomena exacerbated by climate change, including:

  • Sea Level Rise (SLR), storm surge, and subsidence.

  • Increasing frequency and severity of storms.

  • Extreme heat and extreme cold conditions.

  • Drought and other water scarcity issues.

  • Melting permafrost.

  • Projected increases in the 10-, 50-, and 100-year temperature profile estimates.


Like electricity, DCAAS is an industrial, utility-scale technology that directly and instantaneously affects millions, and perhaps billions, of people in a catastrophic failure such as Calcifer. DCAAS can expect to undergo a public safety regulatory evolution, with bulk electricity as the clear precedent. Following the Northeast Blackout of 1965 (See Index, p. 276), the power sector voluntarily organized into a standards organization (1968) that later evolved, via the Energy Policy Act of 2005’s amendments to the Federal Power Act, into the mandatory reliability regime overseen by the Federal Energy Regulatory Commission (FERC) and NERC. Calcifer showed that the DCAAS sector should expect a far more accelerated regulatory timetable.

What Now and What’s Next

Civilizational dependency on data centers has now been laid bare. As more and more entities raced to the Cloud, and as the Internet of Things (IoT) begat smart devices beyond number, how could we not see the vulnerable position we were putting ourselves in? 

In companies large and small, as well as in the vast majority of residences, landline telephones have all but disappeared, replaced by Voice over Internet Protocol (VoIP) and 5G cellular services, both massively dependent on what the large telco carriers call switching centers. And note: switching centers are data centers, with a significant presence in Ashburn.

In a potentially ironic twist, going slow may have had some benefits. It’s well known in the data center community that federal data centers, not subject to competitive, market-driven reliability and latency pressures, compare poorly with their commercial counterparts, performance-wise. But outdated data centers may turn out to be more resilient in some ways when it comes to operating at the edge of the environmental parameters for which they were designed. In fact, not being 100 percent in the Cloud might have been the best defense, something folks in the bulk power system business have been saying all along, in ways roughly analogous to the caution they express about becoming overly dependent on automated grid control systems (See Index, p. 283).

On the other hand, some of the highest-performing data centers, with the lowest power usage effectiveness (PUE) scores, may have been among the first to fail. More research will be needed to determine whether the relentless pursuit of sustainability and efficiency goals has had the unintended side effect of reducing high-performing facilities’ ability to ride through extreme events like Calcifer.

Amid all the uncertainties in this incident, which will likely take years to fully play out, the one sure thing is that reliability regulation of data centers and their owners and operators should now be fully on the table. For the sake of the nation, a significant level of transparency is going to be required. And while regulation has few fans in industry, one thing it can be counted on to do is arm reliability champions inside these companies with the business cases they need to make major improvements. We saw it in energy. Don’t be surprised if we see something similar, or even more stringent, for data centers.

One final note: We have yet to see nation-state actors attempt to take advantage of this situation to inflict compounding harms on the United States. The U.S. government and DOD are continuing to closely monitor the situation and are prepared to respond if called upon.

Scene 2 END

This is part two of a three-part series that Security Technology will publish during December 2022, illustrating the ramifications of a fictional heat dome settling on Data Center Alley in Ashburn, Virginia, and the ripple effects of that event. Read part one of the series: "A Heat Dome Hits Virginia: One Data Center's Story."

Andy Bochman is a senior grid strategist and defender at the Idaho National Lab, where he provides strategic guidance on topics at the intersection of grid security and climate resilience to senior U.S. and international government and industry leaders. He is also a non-resident senior fellow at the Atlantic Council’s Global Energy Center.

Tracy Staedter is a freelance science and technology writer who has previously been published in Scientific American, IEEE Spectrum, and MIT Technology Review.

Special thanks to editors and SME fact checkers: Jamie Richards, Kelly Wilson, Matt Wombacher, Lynn Schloesser, Chris Payne, Tom Santucci, Peter Behr, and Tim Roxey.

© 2022 Andy Bochman
