When the U.S. State Department implemented a plan to better turn data into insights, it chose Databricks as its primary data preparation platform and as the fuel for the advanced analytics the agency needs to carry out its responsibilities.
Because its mission is to advise the president on all matters related to foreign policy and to help set the country's foreign policy through treaties and agreements with other countries, the State Department collects enormous amounts of data critical to the safety and security of the country.
Like many organizations, the State Department collects data from workplace applications such as Salesforce and ServiceNow. But more than that, it gathers data from emails, phone calls, communications between agencies, communication and social media platforms such as WhatsApp, and other sources.
And most of the data collected through those multiple channels is managed and stored in separate repositories.
In an attempt to better manage all that data, consolidate it and make key information easily accessible when needed, the State Department launched the Center for Analytics in March 2020 to better transform data into fuel for foreign policy decision-making. In some cases, that information is needed in real time as world events happen.
As part of that change, the State Department deployed the Databricks platform about 18 months ago.
Databricks, founded in 2013 and based in San Francisco, is a data management vendor whose lakehouse platform combines the capabilities of traditional data warehouses with data lakes.
“Because we pioneered Databricks, it has become the central data platform for us to source our data and clean up the data and enable it for advanced analytics,” Mark Lopez, specialist master at Deloitte, a consultant for the State Department, said recently during the Data + AI Summit, a user conference hosted by Databricks.
The mission
When the State Department adopted Databricks for data preparation and management, it also wanted to improve its overall analytics operations so that key information could be found at the moment it was needed and turned into insights that lead to action.
But within that overall goal of improving the efficiency and effectiveness of its analytics are more specific goals.
These include improving responses to Freedom of Information Act (FOIA) requests using the AI and machine learning capabilities of the Databricks platform.
The U.S. government received nearly 800,000 FOIA requests in fiscal 2020, and although the Homeland Security and Justice departments received the most, the State Department also received a high volume of requests.
Finding the exact information requested in the vast volume of documents maintained by the State Department was often difficult, but the combination of machine learning and AI capabilities such as natural language processing (NLP) and text mining now makes the process far more efficient.
In addition, the State Department wants to use machine learning and AI to discover insights from mission-centered records, conduct investigations and improve security, respond to information requests from Congress, and assist in evacuating people abroad who need to quickly leave a dangerous location.
By combining the Databricks platform's AI and machine learning capabilities with other analytics tools, the State Department has been able to meet those goals, according to Alan Gersch, also a specialist master at Deloitte.
The State Department is now using Databricks to develop machine learning models that feed into BI dashboards from vendors such as Tableau that are used to inform policy decisions. The agency also uses Databricks and NLP-powered models to enrich archived data with metadata to speed up searches, and integrates Databricks with Microsoft Azure Data Factory to aggregate different data sources and automate reports the agency delivers to the president and secretary of state.
As a result, processes that used to take days now take less than an hour, in many instances.
“Databricks acts as a force multiplier and the glue that brings other systems together and enhances them and speeds them up,” Gersch said.
Applying technology
The U.S. first sent troops to Afghanistan after the terrorist attacks on the World Trade Center and Pentagon on Sept. 11, 2001.
Twenty years later, on Aug. 30, 2021, the U.S. withdrew its last troops. But the removal of all U.S. troops from Afghanistan did not mean the U.S. was done evacuating people from the region.
Some U.S. citizens remained in Afghanistan. So did many Afghans who had helped the U.S., along with others who were in mortal danger as a result of their actions during the 20 years of war between the U.S. and the Taliban.
Determining who needs to be evacuated, however, is a complex task. So is the process for screening the various groups of people who may want to leave Afghanistan with U.S. assistance.
To identify and help the many people who were stuck in Afghanistan and needed to leave, the State Department established a task force of data scientists, data engineers and data analysts, according to Lopez. Using tools from the Databricks platform along with Azure Data Factory, the task force identified and obtained, within a few months, the relevant information needed throughout the vetting process.
“We needed to understand where these people are, do they plan to leave, who is part of their family,” Lopez said.
Ultimately, the Databricks platform along with Azure Data Factory allowed the State Department to take data from diverse sources, consolidate it in one location, explore which data points could be connected and associated with individuals and their family members, and help evacuate people in Afghanistan who needed to leave.
“The goal is to get people on flights out of Afghanistan, and at one point we had hundreds of flights coming out every day,” Lopez said. “Many of these are enabled using Databricks and the Azure stack as well. Using Databricks as our central data processing engine has really allowed us to integrate multiple source systems and processes and grow quickly.”