Google aims to have its BigLake data lake support all data, including unstructured data

In its continuing bid to support all types of data and provide a one-stop data platform in the form of BigLake, Google said Tuesday it will add support for the most commonly used open-source table formats in data lakes.

The company, which made the announcement at its annual Cloud Next conference, describes BigLake as a service that enables data analytics and data engineering on both structured and unstructured data.

“Our storage engine, BigLake, will add support for Apache Iceberg, Databricks’ Delta Lake, and Apache Hudi,” wrote Gerrit Kazmaier, vice president of data analytics at Google Cloud, in a blog post. “By supporting these widely adopted data formats, we can help remove barriers that prevent organizations from getting the full value from their data.”

Support for Apache Iceberg is available in preview, the company said, adding that support for Hudi and Delta Lake is coming soon. A specific timeline for preview and general availability has not been announced.

Google decided to support open-source table formats because their addition will enable transaction management capabilities in data lakes, said Matt Aslett, director of research at Ventana Research.

“More than half (57%) of data lake users use at least one of these emerging table formats today, which has the potential to increase the use of data lakes as replacements for data warehouse environments, which support analytics workloads based on the processing of structured data,” said Aslett.

However, recent Data Lakes Dynamics Insights research by Ventana Research indicated that less than a quarter of organizations have adopted a data lake to replace an existing data warehouse environment, and that data lake and data warehouse environments coexist in nearly three-quarters of organizations.

“This works in favor of Google’s BigLake because it has the ability to address both data warehousing and data lake approaches with a single environment,” Aslett said.

Google’s addition of support for these open-source table formats appears to be a response to Snowflake and Databricks product updates, said Doug Henschen, principal analyst at Constellation Research.

“Apache Iceberg is the hot new option gaining traction because it promises openness as well as performance gains, but Google makes it clear it’s not choosing sides by promising support for Delta Lake and Hudi too,” Henschen said.

Google’s rival Oracle may also announce similar features at its upcoming annual CloudWorld conference, said Tony Baer, chief analyst at dbInsight.

BigQuery supports unstructured data

As part of its Cloud Next announcements, Google also added new features to its managed enterprise data warehouse, BigQuery, including support for unstructured data.

“Starting today, data teams can analyze structured and unstructured data in BigQuery, with easy access to Google Cloud’s capabilities in machine learning (ML), speech recognition, computer vision, translation, and text processing, using the familiar SQL interface of BigQuery,” Kazmaier wrote.

Data teams in most enterprises, according to Google, mostly use structured data, which makes up only 10% of all data produced. Structured data includes data from operational databases; SaaS applications such as Adobe, SAP, ServiceNow, and Workday; and semistructured data in the form of JSON log files.

Unstructured data, on the other hand, includes video from television archives, audio from call centers or radio, and documents in various formats.

Google argues that businesses face an increasing need to work with unstructured data.

Google’s move to add support for unstructured data gives it a differentiating capability among cloud service providers, analysts said.

No other rival cloud service provider is currently addressing the need to support unstructured data as aggressively as Google, Henschen said.

“Addressing all types of data on one platform promises to simplify things for CIOs, data scientists and developers,” added Henschen.

Other BigQuery updates at Cloud Next

Google also announced support for the open-source unified analytics engine Apache Spark. The move is in line with the company’s strategy to position its cloud service as a modern lakehouse that supports analytics, warehousing, and data science, analysts said.

The new integration, which is in private preview, will allow enterprise data teams to create BigQuery procedures, using Apache Spark, that integrate with their SQL pipelines, the company said.

“By accepting Spark, Google is accepting the data scientist’s most popular choice,” Henschen said.

“In contrast to Google, Snowflake is still early in its data science journey using Python and other languages by offering Snowpark on top of its database, and it relies heavily on partners for support,” added Henschen.

Another rival, Databricks, has also improved support for data warehouse and business intelligence (BI) workloads on its platform.

Meanwhile, Google has also integrated its change stream service, called Datastream, into BigQuery.

“The new integration will help organizations more effectively replicate data from all kinds of sources—including real-time data in AlloyDB, PostgreSQL, MySQL and third-party databases like Oracle—directly to BigQuery,” the company said in a blog post.

Additionally, Google has updated its data unifier service, Dataplex, to automate processes related to data quality.

“For example, users can more easily understand the lineage of data (where the data came from and how it changed and moved over time), reducing the need for manual, time-consuming processes,” Kazmaier wrote in a blog post.

Looker Studio integrates business intelligence products

At Cloud Next, the company said it will unify its business intelligence products by combining Looker and Data Studio to form Looker Studio, which will be available in three options.

“Looker Studio currently supports more than 800 data sources with a catalog of more than 600 connectors, making it simple to explore data from different sources,” wrote Kate Wright, senior director of BI product management at Google Cloud, in a blog post.

Looker Studio, which currently offers access to data models in private preview, is also expected to get a new interface, the company said, adding that the base version of Looker Studio will be free.

Prior to the merging of the products, Looker was a paid service and Data Studio was a free service. The free version, according to Aslett, is not expected to come with support. To get support and additional features, businesses need to upgrade to the Pro version of Looker Studio.

“Customers upgrading to Looker Studio Pro will get new enterprise management features, team collaboration capabilities, and SLAs [service level agreements]. This is just the first release, and we’ve developed a roadmap of capabilities, starting with Dataplex integration for data lineage and metadata visibility, that our enterprise customers have been asking for,” said Wright.

Other updates to Looker include support for visualization tools, such as Tableau and Microsoft Power BI, to access data, the company said.

Vertex AI Vision released

In an effort to help developers and data scientists build and deploy computer vision-based applications, Google has added a new feature called Vertex AI Vision to extend the capabilities of its machine learning platform Vertex AI.

The company has been working to ease machine learning (ML) operations since launching the Vertex AI platform in May last year, followed by the introduction of the collaborative development environment Vertex AI Workbench in October.

“The new end-to-end application development environment will help you capture, analyze, and store visual data,” the company said, claiming the new service could reduce the time to create computer vision applications from weeks to hours, at one-tenth the cost of current offerings.

Google says it achieves these efficiencies by providing a relatively easy-to-use interface and a library of pretrained machine learning models for common tasks like occupancy counting, product recognition, and object detection.

“It also provides the option to import your existing AutoML or custom ML models, from Vertex AI, into your Vertex AI Vision applications. As always, all of our new AI products also follow our AI Principles,” the company said.

Copyright © 2022 IDG Communications, Inc.
