Following President Biden’s May 2021 cybersecurity executive order, the Office of Management and Budget (OMB) memo M-21-31, “Improving the Federal Government’s Investigative and Remediation Capabilities Related to Cybersecurity Incidents,” gave agencies some of the first concrete steps to take on the path to zero trust. The August 2021 memo outlines a four-tier maturity model for event logging; agencies must implement the successive tiers at six-month intervals. In full compliance with M-21-31, agencies will be able to provide application container security and management, logging orchestration and automation, advanced centralized access, and user behavior monitoring.
Recently MeriTalk sat down with Monzy Merza, vice president of cybersecurity go-to-market at Databricks, a data and artificial intelligence (AI) company that offers the first and only lakehouse platform in the cloud. Merza chatted about the implications and opportunities with M-21-31 and offered insights for successfully meeting its mandates.
MeriTalk: How has the cyber threat detection and response landscape evolved over the last two years?
Monzy Merza: Two or three years ago, a security information and event management (SIEM) system was the central console for most security operations. We were just starting to see the emergence of security orchestration, automation, and response (SOAR) platforms. Since then, we’ve seen an explosion in the scalable use of machine learning (ML), data science, and big data for cybersecurity – driven partly by demand. In addition, agencies face growing requirements to collect and analyze ever-larger amounts of data. There’s no better example than OMB M-21-31. Technology is evolving as a consequence, and so are teams. Two years ago, security operations teams knew the tasks that had to be done. Today, they have additional tasks that demand new techniques, tools, and capabilities from individuals. That puts a lot of pressure on managers and staff to keep up.
MeriTalk: Why is event logging so critical to an agency’s cybersecurity?
Merza: For a security practitioner, event logs are essentially the transcript of what happened in an environment. They ensure that things are happening in the way we expect, and they’re also important for threat detection, investigation, and response. They help us react to events and also look proactively at our environment and make improvements.
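Merza’s “transcript” framing can be made concrete: in an investigation, an analyst often reconstructs a timeline by filtering and time-ordering raw events. A minimal sketch, where the records, fields, and host names are invented for illustration:

```python
# Sketch: treat event logs as a "transcript" by filtering one host's events
# and time-ordering them. Records, fields, and host names are invented.
events = [
    {"ts": "2021-08-27T12:00:09Z", "host": "web-1", "action": "file_write"},
    {"ts": "2021-08-27T12:00:01Z", "host": "web-1", "action": "login"},
    {"ts": "2021-08-27T12:00:04Z", "host": "db-1", "action": "query"},
    {"ts": "2021-08-27T12:00:03Z", "host": "web-1", "action": "privilege_change"},
]

# ISO-8601 timestamps sort correctly as plain strings.
timeline = sorted((e for e in events if e["host"] == "web-1"),
                  key=lambda e: e["ts"])
print([e["action"] for e in timeline])  # ['login', 'privilege_change', 'file_write']
```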
MeriTalk: M-21-31 is an unfunded mandate. How can agencies meet the requirements within existing budgets?
Merza: M-21-31 gives agencies an opportunity to evaluate how they’re consuming technology and structuring security operations and then make changes in order to meet the new requirements.
With M-21-31, government leaders anticipate collecting five to seven times their previous log volume. Many event logging and SIEM companies charge customers based on the amount of data collected and stored. At these new volumes, that pricing model becomes untenable. Agencies can’t go back to the same tools, because those tools can’t scale to meet these needs and the pricing model is too costly.
But there are cloud-native vendors that charge only based on the data you actually use and analyze. Those models are more tenable when data volumes are significantly larger and the data is more widespread.
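The gap between the two pricing models can be sketched with back-of-envelope arithmetic. All figures below are hypothetical assumptions, not vendor prices; only the five-to-sevenfold volume growth comes from the interview:

```python
# Illustrative cost comparison (all prices hypothetical): ingest-based
# licensing scales with everything collected; consumption-based pricing
# scales only with the slice of data actually analyzed.
baseline_tb = 100         # current monthly log volume, TB (assumed)
growth = 6                # mid-range of the 5-7x growth cited in the interview
ingest_price = 50.0       # $/TB ingested and stored per month (assumed)
query_price = 50.0        # $/TB actually scanned by analysis per month (assumed)
analyzed_fraction = 0.10  # share of stored logs actually queried (assumed)

new_volume = baseline_tb * growth
ingest_cost = new_volume * ingest_price                        # pay for everything
consumption_cost = new_volume * analyzed_fraction * query_price  # pay for what you use
print(round(ingest_cost, 2), round(consumption_cost, 2))  # 30000.0 3000.0
```

Under these assumed numbers the ingest-based bill grows with the full mandated volume, while the consumption-based bill tracks only what analysts actually touch.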
MeriTalk: What should agencies do to facilitate data sharing?
Merza: There’s an old blues song that says everybody wants to go to heaven, but no one wants to die. That’s sort of how people feel about data sharing. Everyone thinks it’s a good idea, but no one is ready to do it. Agencies face a lot of governance and administrative challenges with data sharing, as well as technical challenges. First, most security systems are built on proprietary technologies, yet agencies must produce data in a way that the receiving party can consume it. Then, the receiving agency must be able to operationalize the data in a timely fashion.
Let’s say there is a breach in an agency and the agency wants another agency to help. They are not sharing a few megabytes of a threat feed; they are sharing massive volumes of log data. All of the challenges of data sharing are compounded. Do you move the data or copy it? How do you give access? Do you put time boundaries on that data? How do you maintain governance of that information? As data volumes are getting bigger and sharing requirements grow more complex, having an open system becomes very important.
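One concrete piece of the interoperability problem is format. A sender that exports logs in an open, line-delimited format such as NDJSON (used here as a minimal stand-in for higher-volume open formats like Parquet or Delta Lake) lets the receiving agency parse them without the sender’s proprietary tooling. The records below are invented for illustration:

```python
# Sketch: serialize log events as newline-delimited JSON (NDJSON), an open
# format any receiving party can parse without proprietary tooling.
# The records below are invented for illustration.
import json

events = [
    {"ts": "2021-08-27T12:00:00Z", "host": "web-1", "action": "login_failure"},
    {"ts": "2021-08-27T12:00:05Z", "host": "web-1", "action": "login_success"},
]

# Sender: one JSON object per line.
ndjson = "\n".join(json.dumps(e, sort_keys=True) for e in events)

# Receiver: reconstruct the records line by line.
received = [json.loads(line) for line in ndjson.splitlines()]
print(received == events)  # True
```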
MeriTalk: You talked about the evolving skill sets of security staff. How can agencies empower analysts in an M-21-31 world?
Merza: Agencies need to realize the cybersecurity landscape has changed. We’re in a different terrain that requires us to rethink the way we deal with adversaries and our own people. We have an opportunity to learn new tools and techniques – or operating in the new terrain will be untenable.
When you choose technologies, you also need to consider the personas that they will serve. We often ask, how will this security tool address the threat? But you also have to look at it in reverse. How is your staff using the information? Different people use information in different ways to meet the requirements of their jobs. As technology evolves, leaders must consider how they are enabling different personas with security analytics tools.
MeriTalk: What difficulties do agencies encounter by using legacy SIEM platforms to collect and monitor their logs? How can Databricks help agencies go beyond traditional SIEM?
Merza: Legacy security tools such as SIEMs, SOARs, and compliance reporting tools fall into two major categories: data warehouses and data lakes. Data warehouses are good at transactional logic and at applying governance models to data sets, but they can’t handle today’s large data volumes. They also require schematization and normalization, so they can only work with specific kinds of data sets. Today’s security operations use unstructured, semi-structured, or binary data – and data warehouses are ill-suited for that.
Data lakes emerged in response to those limitations. Using cheap batch storage, they let you bring in semi-structured or unstructured data and run ML operations on top. But they are poor at transactional logic; some people call them data swamps.
Using both of these technologies can create data silos. You end up with disparate governance models and lose consistency across use cases. For example, if you have a credential abuse use case and want to detect credential-based threats, do you implement it in your SIEM, which is based on a relational database, or in your user behavior analytics platform, which is built on a data warehouse? What does that do to scale or governance? Everything becomes more complicated.
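The credential-abuse example illustrates what consistency across use cases means in practice: if the detection logic is written once against a common event shape, it can run unchanged over events from either store. A minimal sketch, with the field names and failure threshold as illustrative assumptions:

```python
# Sketch: one credential-abuse detection applied uniformly, regardless of
# which store the events came from. Field names and the threshold are
# illustrative assumptions, not a real product's schema.
from collections import Counter
from typing import Iterable

def flag_credential_abuse(events: Iterable[dict], threshold: int = 3) -> set:
    """Return users whose failed-login count meets the threshold."""
    fails = Counter(e["user"] for e in events if e.get("outcome") == "failure")
    return {user for user, n in fails.items() if n >= threshold}

# Events exported from two different silos, normalized to one shape.
siem_events = [{"user": "svc-backup", "outcome": "failure"}] * 3
lake_events = [{"user": "alice", "outcome": "failure"},
               {"user": "alice", "outcome": "success"}]

# The same logic runs over both sources - no per-silo reimplementation.
print(flag_credential_abuse(siem_events + lake_events))  # {'svc-backup'}
```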
In an ideal world, you want a simple system that combines the best of both technologies. And that system must be open because the ecosystem is very complex; agencies are using a hundred-plus technologies. It also needs transactional and governance model capabilities. It must scale regardless of the volume and type of data sets, and it needs to be multi-cloud native.
Databricks calls this scalable, open, collaborative platform the Lakehouse; it combines what data lakes and data warehouses do best. Agencies get transactional logic, superior governance, and massive scale for all sorts of data sets – not just for cybersecurity, but for any number of data-driven use cases. This capability is core to any data analytics and AI capability an organization would want.
MeriTalk: What are the advantages of working with Databricks to fulfill the requirements of M-21-31?
Merza: Number one, Databricks’ pricing model is driven by what you use or analyze, not what you collect and store. This helps agencies operate within the cost envelope they work in today. Number two, Databricks is built on open-source technologies. Number three, Databricks scales to accommodate massive data volumes, and because of our pricing model, agencies don’t have to worry about paying more.
Also, Databricks’ ability to do ML and AI lets you detect more sophisticated threats and do analytical automation. You can do better analytics and faster detection with the same number of people and the same processes.
Because Databricks integrates with a wide range of data sources, developer tools, and partner solutions, agencies can start achieving M-21-31 maturity without having to throw away their processes or forklift everything.
Databricks is multi-cloud; it’s native to all major cloud service providers. A lot of large government agencies have infrastructure in multiple clouds, and they need to do logging in all of them. It’s impractical to bring all that data back into one cloud because egress costs are very high. Databricks enables agencies to analyze the data where it lives. It doesn’t sound like much, but it is huge from a cost and operations perspective.
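The egress point is easy to quantify with rough arithmetic. The volume and per-gigabyte price below are assumptions for illustration, not figures from the interview:

```python
# Back-of-envelope sketch (all figures assumed): the monthly cost of moving
# logs generated in other clouds into a single "home" cloud for analysis,
# versus analyzing them where they live (which incurs no egress at all).
monthly_logs_tb = 500   # logs generated outside the home cloud, TB/month (assumed)
egress_per_gb = 0.09    # representative list egress price, $/GB (assumed)

egress_cost = monthly_logs_tb * 1024 * egress_per_gb  # $/month just to centralize
print(round(egress_cost, 2))  # 46080.0
```

Even under these modest assumptions, centralizing the data costs tens of thousands of dollars a month before a single query runs, which is the operational case for analyzing data in place.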