In its continued bid to support all types of data and provide a one-stop data platform in the form of BigLake, Google Cloud will add support for the most commonly used open source table formats in data lakes.
The vendor, which made the announcement at its annual Cloud Next conference, describes BigLake as a service that enables data analytics and data engineering on both structured and unstructured data.
“Our storage engine, BigLake, will add support for Apache Iceberg, Databricks’ Delta Lake, and Apache Hudi,” wrote Gerrit Kazmaier, vice president of data analytics at Google Cloud, in a blog post. “By supporting these widely adopted data formats, we can help remove barriers that prevent organizations from getting the full value from their data.”
Support for Apache Iceberg is available in preview, the company said, adding that support for Hudi and Delta Lake is coming soon. A specific timeline for preview and general availability has not been announced.
Google Cloud decided to support open-source table formats because their addition will enable transaction management capabilities in data lakes, said Matt Aslett, director of research at Ventana Research.
“More than half (57 percent) of data lake users use at least one of these emerging table formats today, which has the potential to increase the use of data lakes as a replacement for a data warehousing environment, which supports analytics workloads based on structured data processing,” Aslett said.
However, Ventana Research’s recent Data Lakes Dynamics Insights research indicated that fewer than a quarter of organizations have adopted a data lake to replace an existing data warehouse environment, and that data lake environments and data warehouses co-exist in nearly three-quarters of organizations.
“This works in favor of Google’s BigLake because it has the ability to address both data warehousing and data lake approaches with a single environment,” Aslett said.
Google’s addition of support for these open-source table formats appears to be a response to Snowflake and Databricks product updates, said Doug Henschen, principal analyst at Constellation Research.
“Apache Iceberg is the hot new option gaining traction because it promises openness as well as performance gains, but Google makes it clear it’s not choosing sides by promising support for Delta Lake and Hudi too,” Henschen said.
Google Cloud competitor Oracle may also announce similar features at its upcoming annual CloudWorld conference, said Tony Baer, principal analyst at dbInsight.
BigQuery supports unstructured data
As part of its Cloud Next announcements, Google Cloud also added new features to its managed enterprise data warehouse, BigQuery, including support for unstructured data.
“Starting today, data teams can analyze structured and unstructured data in BigQuery, with easy access to Google Cloud’s capabilities in machine learning (ML), speech recognition, computer vision, translation, and text processing, using the familiar SQL interface of BigQuery,” Kazmaier wrote.
Data teams in most businesses, according to Google, work mostly with structured data, which makes up only 10 percent of all data produced. Structured data includes data from operational databases and software-as-a-service (SaaS) applications such as Adobe, SAP, ServiceNow, and Workday, as well as semistructured data in the form of JSON log files.
Unstructured data, on the other hand, includes video from television archives, audio from call centers and radio, and documents in various formats. Google Cloud argues that businesses face an increasing need to work with unstructured data.
Google Cloud’s move to add support for unstructured data sets it apart from other cloud service providers, analysts said.
No other rival cloud service provider is currently addressing the need to support unstructured data as aggressively as Google Cloud, Henschen said.
“Addressing all types of data on one platform promises to simplify things for CIOs, data scientists and developers,” added Henschen.
Other BigQuery updates at Cloud Next
Google Cloud also announced support for the open-source unified analytics engine Apache Spark. The move is in line with the company’s strategy to position its cloud service as a modern lakehouse that supports analytics, warehousing, and data science, analysts said.
The new integration, which is in private preview, will allow enterprise data teams to create BigQuery procedures, using Apache Spark, that integrate with their SQL pipelines, the company said.
“By accepting Spark, Google is accepting the data scientist’s most popular choice,” Henschen said. “In contrast to Google, Snowflake is still early in its data science journey using Python and other languages by offering Snowpark on top of its database, and it relies heavily on partners for support.”
Another rival, Databricks, has also improved support for data warehouse and business intelligence (BI) workloads on its platform. Meanwhile, Google Cloud has also integrated its change stream service, called Datastream, with BigQuery.
“The new integration will help organizations more effectively replicate data from all types of sources – including real-time data in AlloyDB, PostgreSQL, MySQL and third-party databases such as Oracle – directly into BigQuery,” the company said in a blog post.
Additionally, Google Cloud has updated its data unifier service, Dataplex, to automate processes related to data quality.
“For example, users can more easily understand data lineage (where data came from and how it changed and moved over time), reducing the need for manual, time-consuming processes,” Kazmaier wrote in a blog post.
Looker Studio integrates business intelligence products
At Cloud Next, the vendor said it will unify its business intelligence products by combining Looker and Data Studio to form Looker Studio, which will be available in three options.
“Looker Studio currently supports more than 800 data sources with a catalog of more than 600 connectors, making it simple to explore data from different sources,” wrote Kate Wright, senior director of BI product management at Google Cloud, in a blog post.
Looker Studio, which currently offers access to data models in private preview, is also expected to get a new interface, the company said, adding that the base version of Looker Studio will be free.
Prior to the merging of products, Looker was a paid service and Data Studio was a free service. The free version, according to Aslett, is not expected to come with support. To get support and additional features, businesses need to upgrade to the Pro version of Looker Studio.
“Customers upgrading to Looker Studio Pro will get new enterprise management features, team collaboration capabilities, and SLAs [service level agreements],” Wright said. “This is just the first release, and we’ve developed a roadmap of capabilities, starting with Dataplex integration for data lineage and metadata visibility, that our enterprise customers have been asking for.”
Other updates to Looker include support for visualization tools, such as Tableau and Microsoft Power BI, to access data, the company said.
Vertex AI Vision released
In an effort to help developers and data scientists build and deploy computer vision-based applications, Google has added a new feature called Vertex AI Vision to extend the capabilities of its machine learning platform Vertex AI.
The company has been working to ease machine learning (ML) operations since it launched the Vertex AI platform in May of last year, followed by the introduction of the collaborative development environment Vertex AI Workbench in October.
“The new end-to-end application development environment will help you ingest, analyze, and store visual data,” the company said, claiming the new service could reduce the time to create computer vision applications from weeks to hours and at one-tenth the cost of current offerings.
Google says it achieves these efficiencies by providing an easier-to-use interface and a library of pre-trained machine learning models for common tasks like occupancy counting, product recognition, and identification of objects.
“It also provides the option to import your existing AutoML or custom ML models, from Vertex AI, into your Vertex AI Vision applications. As always, all of our new AI products also follow our AI Principles,” the company said.