Agencies face constant challenges in preparing security data for machine learning
Written by FedScoop staff
The ability of federal agencies to harness artificial intelligence and machine learning to identify abnormal behavior on their networks increasingly depends on building strong capacities for data collection, preparation and analysis.
While a large majority of IT and federal agency officials polled in a new FedScoop study say their agencies' capabilities to monitor, collect, and analyze behavioral data on their networks meet or exceed industry standards, using this data for machine learning (ML) remains a significant challenge, especially for identifying and responding to abnormal behavior on their networks.
According to the survey, their main challenges include a lack of the experience and skills required to train and test machine learning algorithms; a lack of adequate tools to do all the data processing work, as well as a lack of clarity as to which market tools and services meet their ML needs; and a lack of reliable, ML-ready data to work with.
This FedScoop study, released this week, surveyed 160 prequalified IT and program managers at large, medium and small federal agencies to explore the state of their data analytics and machine learning capabilities. The study also identified obstacles that agencies continue to face throughout the lifecycle of data collection, processing and analysis, and looked at the types of services that agencies turn to for greater support. The survey was conducted online in August and September 2021 and underwritten by Cloudera.
Among other findings:
ML challenges vary by agency size – 4 in 10 respondents in large agencies (10,000+ employees), which tend to face larger-scale data challenges, cited a lack of adequate ML skills as a major challenge, compared to 2 in 10 respondents in small agencies (fewer than 1,000 employees), which are often still scaling up ML efforts or rely more on third parties.
Conversely, 1 in 3 respondents in small agencies cited the lack of adequate tools as one of their biggest challenges, compared to fewer than 1 in 4 in large agencies. And more than twice as many respondents in small and midsize agencies struggle with a lack of reliable, ML-ready data, compared to their counterparts in large agencies.
Lack of skills in the ML process – Respondents in agencies of all sizes say they face significant skill gaps across the data processing and machine learning lifecycle – from data ingestion to extraction, transformation and loading, analysis, ML training and operationalization of ML. The study suggests that these gaps hamper the ability of agencies to implement zero trust models and build greater cyber resilience.
Agencies seem to have the data they need – There was positive news in the study, which found that agencies have the capabilities to monitor, process, store and analyze behavioral data about users, devices and applications running on their networks, with more than 2 in 3 respondents saying these capabilities meet or exceed accepted industry and NIST standards. What was less clear, according to the report, was the extent to which agencies are fully or effectively using these capabilities.
Use of external support – While federal IT officials indicate they have the capabilities to handle abnormal behavior data, a significant portion also say they choose to bring in the expertise of external service providers at every step of the process, from data collection to ML. The areas in which agencies most often seek assistance are data analysis and data integration and production, but there is also strong demand for help with ML governance and ML training.
The study also addressed other dimensions of data preparation, including:
- The ability of agencies to securely collect data at the edge of their networks as well as in their network environments.
- Where agencies store their ML production data.
- The extent to which agencies rely on open source solutions versus in-house and commercial solutions to prepare their ML data.
“While federal IT officials maintain that their agencies have the capabilities to ingest, prepare, and analyze data, they still need help leveraging those capabilities to leverage machine learning to better detect and respond to abnormal behavior on their networks,” the study concludes.
Additionally, the lack of skills across most ML-related data processing steps – and the rapid evolution of data management and ML tools – suggest that agencies “would benefit from moving to more modern and integrated platforms to ingest and analyze behavioral data to improve cyber resilience. They would also likely reach zero trust frameworks faster by engaging with service providers specializing in modernized data and ML solutions capable of detecting anomalous behavior,” the report said.
Download the full report “Preparing for Data Analysis for Cyber Resilience” for detailed results and guidance on improving data collection, preparation and analysis for better threat detection.
This article was produced by FedScoop and sponsored by Cloudera.