The CrowdStrike Internet disaster prompted me to gather some thoughts which have been percolating in my mind for a while. The CrowdStrike event puts a slightly different spin on them and perhaps, if they merit it, will get my thoughts more attention. Of course, this could be my naivete and everyone else may already know all this.
When I worked in the Cyber Security Group at Pacific Northwest Laboratory one of the things I was asked to do was to review and comment on a DHS proposal for making the Internet more secure. In the paper was the suggestion that the Federal Government have a central location with gatekeeping capabilities to isolate sections of the Internet from each other to prevent worms from spreading to the entire Internet. I advocated against this because having a single (even if somewhat distributed) point of control/failure would make it an exceedingly attractive target. Sure, it could be made very secure. But with that big a payoff for access, it will be attacked by the best and the brightest of most nation states as well as those with common criminal intent. When the bad guys inevitably get access, those who intended to make the Internet more secure will be responsible for enabling a catastrophe.
I suspect most large companies are in a similar situation and/or are inadvertently working toward one. At my company most of our security monitoring is being migrated into the cloud. I can’t imagine a major corporation not using Office 365. Which depends on Azure. And I’m sure many other critical or nearly critical products are cloud based in every company. I used to work at Microsoft and trust Microsoft to do a good job with their security/reliability/etc. Amazon and Google do as well as or better than Microsoft, but the payoff for breaching one of these cloud providers is so great that I find it difficult to imagine it won’t someday be breached or shut down in some form.
Of course, the same goes for any widely used system. CrowdStrike probably wasn’t breached. But it was a single point of failure for a large section of the planet, and the consequences of this accident probably cost billions. And I shouldn’t have to remind anyone about the SolarWinds hack and how many companies it affected.
And if you want to get really concerned, think of what happened in the TV series Battlestar Galactica. The enemy robots compromised all the computerized systems of human civilization and used that to hide their nuclear strike and suppress the defenses. We now have AI built into our computer security. Enemy robots don’t even have to break and enter. They just need to convince their AI cousins to switch sides.
I don’t know that there is a practical solution. In my comments on the DHS proposal, I advocated for independent solutions providing diversity and redundancy across the Internet. Even if you postulate an infinitely benign government, government control of everything is a single point of failure.
Having diverse hardware, software, processes, and people (hardware and software are not the only things which can be hacked and/or broken) is very expensive to implement, operate, and maintain. And redundancy is a surprisingly difficult task. As a Boeing Reliability Engineer once told me, “It doesn’t much matter how many backup systems you have. What matters is how independent they are.” Having the ability to land safely with three out of four engines shut down doesn’t matter if someone contaminated the fuel in the supply truck.
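To put rough numbers on that engineer's point, here is a quick sketch of why independence matters more than the count of backups. Every probability below is invented purely for illustration:

```python
# Toy illustration: redundancy only helps against *independent* failures.
# All numbers are made up for the example.

p_engine = 1e-3   # chance any one engine fails on a given flight (assumed independent)
n_engines = 4

# If failures are truly independent, losing all four engines is vanishingly rare:
p_all_engines_fail = p_engine ** n_engines          # about 1e-12

# But a common-mode failure (contaminated fuel) takes out every engine at once:
p_bad_fuel = 1e-6

# Total loss-of-thrust probability is dominated by the shared dependency:
p_total = p_all_engines_fail + p_bad_fuel
print(p_total)  # roughly 1e-6: the fuel truck, not the engines, sets the floor
```

However many engines you bolt on, the shared fuel supply caps the benefit; that is the sense in which the backups' independence, not their number, drives the math.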
Perhaps there isn’t a practical solution. But people should at least be aware, so that in some instances they may be able to mitigate the risks.
As I recall, part of the point of the original ARPAnet was to have a distributed comms and information system that was explicitly *not* through a single hub.
I immediately thought of one of my favorite villains: outsourcing. In this case, outsourcing to a corporation that anyone with two brain cells to rub together knew was unreliable. While it makes sense for an individual to purchase anti-virus software (assuming the government doesn’t outlaw it), a giant corporation needs to look after its own security, which would naturally create the distributed security network. To take another recent example, are all the car dealers still paralyzed? That seems to have fallen out of the news, but it was the same problem: they were all convinced to outsource to a single contractor who got hacked.
I’ve been at the same employer long enough to have been told that outsourcing our IT services would save “millions per year”.
Volume of tickets logged and time to resolution both doubled, which meant satisfaction dropped to about a quarter of what it was… But we indeed saved about 10% on a per-ticket basis!
We finished un-making that bed (and guess the mantra justifying the switch back? “This will help us save millions!”). Leadership has now fixated on migrating *everything* to the cloud for their cost-savings delusions.
Days after the cloud initiative was declared, one of the biggest and earliest cloud advocates, DHH of 37 Signals, posted an article about how moving to in-house managed data centers has saved their company millions.
“Enemy robots don’t even have to break and enter. They just need to convince their AI cousins to switch sides.”
Ya, internet systems might be one place where diversity truly could be a strength.
I’m COMPLETELY stupid on this subject so don’t tell me I’m talking out of my ass. (I already know.)
But it seems to me the problem is that hackers can hide. They route their nefarious bullshit through networks all over the globe to be untraceable?
If so, is there some way to gate that? Keeping them from hopping through varied systems without permission?
Where I see some monster problems of the future is AI power wars. Got to have electricity to operate, and we have less than half of what up-and-coming AI/computing needs.
Rolling blackouts/brownouts in parts of the country for humans, so the machines stay up, is probably what’s going on now.
Taken a drive down the Columbia River lately? (Or just use Google Maps in satellite mode.)
You will see a ton of data centers clustered around every dam. (One of the most reliable grids on earth.) Not counting all the windmill projects. (Which I’m sure are for humans, and we’ll be forced to live with the vulgarities of green power.)
The elites are setting themselves up to be first in line for power.
It seems natural that AI would invade foreign systems for reliable power?
Maybe someone could write some really good “geek” fiction about it?
I have been having a running debate with an AI booster and I am so going to use that scenario.
Too funny! Keep us informed, please.
And something we might want to think about is how much enviro-damage will be acceptable to keep it all going?
Will AI care if a nuke plant is leaking radiation and needs to be shut down for repairs?
If it can just run it for 20 more years while poisoning humanity?
Might AI think, “I don’t care about spent fuel rods, just dump them out back and let them cook. It’s just a few humans.”
Got to break a few eggs to make an omelet, right?
I sure wonder about people who put mission-critical systems on Microsoft products.
Working with big companies and government agencies, I see more and more stupid IT people lording it over ordinary workers and making it harder and harder to get work done with all their inane rules and ever increasing authority.
I’d love to see an organization where IT actually focused on moving forward the stated mission instead of building their own little empire at the expense of everyone else.
At least at my current employer, IT workers don’t brag about their ridiculous encryption system bricking company-owned cameras…
Oh, and the issue at hand today? When did big IT departments stop testing updates on a handful of computers and start blindly accepting all updates from vendors with no checking? It seems to me it wasn’t very long ago that this was common practice.
Re companies not testing stuff, that’s a good point. The same, of course, could be said for CrowdStrike.
One wonders why its CEO still has his job.
Your automobile contains a triply redundant brake system. The only single point of failure is…the car seat.
You forgot about a bunch of stuff. For a start, think about the brake pedal, its points of attachment, and brake cable.
Design of reliable systems is full of that. You don’t necessarily make things redundant, let alone multiply redundant. Instead, the design involves looking at failure rates of all the components, and the consequences of failure of each component. If a component failure would cause a system failure, and the failure rate of that component is too high for the system failure spec, you address that in some way, and redundancy around the offending component is usually the answer. But if a particular component has a low enough failure rate it doesn’t impair the system failure rate, it can be left non-redundant.
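That design loop is easy to caricature in a few lines. A minimal sketch, with every failure rate invented for illustration (call them failures per million hours):

```python
# Caricature of the reliability-design loop described above: compare each
# component's failure rate against the system's failure budget, and flag
# only the offenders for redundancy. All rates are made up.

system_budget = 10.0  # max tolerable contribution to system failure rate

component_rates = {
    "disk_drive": 25.0,    # failure-prone: blows the budget, add redundancy
    "sheet_metal": 0.01,   # reliable enough: leave it non-redundant
    "power_supply": 12.0,  # over budget: add a second supply
}

needs_redundancy = [name for name, rate in component_rates.items()
                    if rate > system_budget]
print(needs_redundancy)  # only the components that would blow the budget
```

The point of the sketch is that redundancy is a targeted remedy applied where the numbers demand it, not a blanket policy.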
I haven’t done brake systems but I have done storage systems for computers. There you use “RAID” which is redundant disk drives, because disk drives are sufficiently failure prone you need to be concerned. But you don’t build redundant sheet metal, because the box that holds the drives is reliable enough without it. Yes, in theory the metal could fall apart and drop the drives on the floor, but the probability of that happening is low enough, by quite a comfortable margin.
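A back-of-the-envelope version of the RAID argument, with a made-up drive failure rate and rebuild window:

```python
# Back-of-the-envelope RAID-1 (mirroring) math, with made-up numbers.
# A mirrored pair only loses data if the *second* drive dies before the
# first failed drive is replaced and re-mirrored.

p_drive_year = 0.02   # assumed annual failure probability of one drive
rebuild_days = 3      # assumed time to replace and rebuild a failed drive

# Rough probability a drive fails during the rebuild window:
p_drive_rebuild = p_drive_year * rebuild_days / 365

p_single = p_drive_year                        # unprotected drive: 2% per year
p_mirror = 2 * p_drive_year * p_drive_rebuild  # either drive fails, then its twin
print(p_single, p_mirror)  # the mirror is orders of magnitude safer
```

The same arithmetic explains why nobody mirrors the sheet metal: its standalone failure probability is already far below any budget the drives could meet.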
Similarly, as Joe points out, the brake pedal and the axis on which it pivots are single points of failure, but they are reliable enough. The brake cable is a bit iffier, I would think. One thing that helps is that cars have parking brakes, which are essentially an entirely separate brake system that will stop the car (not as well, but it will) if the brake pedal decides to fall off.