Google Cloud aims to support the BigLake data lake for all unstructured data


In its continued bid to support all types of data and provide a one-stop data platform in the form of BigLake, Google Cloud will add support for the most commonly used open source table formats in data lakes.

The vendor, which made the announcement at its annual Cloud Next conference, describes BigLake as a service that enables data analytics and data engineering on both structured and unstructured data.

“Our storage engine, BigLake, will add support for Apache Iceberg, Databricks’ Delta Lake, and Apache Hudi,” wrote Gerrit Kazmaier, vice president of data analytics at Google Cloud, in a blog post. “By supporting these widely adopted data formats, we can help remove barriers that prevent organizations from getting the full value from their data.”

Support for Apache Iceberg is available in preview, the company said, adding that support for Hudi and Delta Lake is coming soon. A specific timeline for preview and general availability has not been announced.

Google Cloud decided to support open-source table formats because their addition will enable transaction management capabilities in data lakes, said Matt Aslett, director of research at Ventana Research.

“More than half (57 percent) of data lake users use at least one of these emerging table formats today, which has the potential to increase the use of data lakes as a replacement for data warehousing environments, which support analytics workloads based on structured data processing,” Aslett said.

However, Ventana Research’s recent Data Lakes Dynamics Insights research indicated that less than a quarter of organizations have adopted a data lake to replace an existing data warehouse environment, and that data lakes and data warehouses co-exist in nearly three-quarters of organizations.

“This works in favor of Google’s BigLake because it has the ability to address both data warehousing and data lake approaches with a single environment,” Aslett said.

Google’s addition of support for these open-source table formats appears to be a response to Snowflake and Databricks product updates, said Doug Henschen, principal analyst at Constellation Research.

“Apache Iceberg is the hot new option gaining traction because it promises openness as well as performance gains, but Google makes it clear it’s not choosing sides by promising support for Delta Lake and Hudi too,” Henschen said.

Google Cloud competitor Oracle may also announce similar features at its upcoming annual CloudWorld conference, said Tony Baer, principal analyst at dbInsight.

BigQuery supports unstructured data

As part of its Cloud Next announcements, Google Cloud also added new features to its managed enterprise data warehouse, BigQuery, including support for unstructured data.

“Starting today, data teams can analyze structured and unstructured data in BigQuery, with easy access to Google Cloud’s capabilities in machine learning (ML), speech recognition, computer vision, translation, and text processing, using the familiar SQL interface of BigQuery,” Kazmaier wrote.

According to Google, data teams in most businesses work mainly with structured data, which makes up only 10 percent of all data produced. Structured data includes data from operational databases and software-as-a-service (SaaS) applications such as Adobe, SAP, ServiceNow, and Workday, as well as semistructured data in the form of JSON log files.

Unstructured data, on the other hand, includes video from television archives, audio from call centers or radio, and documents in various formats. Google Cloud argues that businesses face an increasing need to work with unstructured data.

Google Cloud’s move to add support for unstructured data differentiates it from other cloud service providers, analysts said.

No other rival cloud service provider is currently addressing the need to support unstructured data as aggressively as Google Cloud, Henschen said.

“Addressing all types of data on one platform promises to simplify things for CIOs, data scientists and developers,” added Henschen.

Other BigQuery updates at Cloud Next


