Google prides itself on its open data cloud to unify information from every source

Google Cloud has ambitions to build what it says will be the most open, scalable and powerful data cloud of all, as part of its mission to ensure customers can use all their data, from any source, wherever it resides and whatever its format.

Google announced this “data cloud” vision today at Google Cloud Next 2022, where it introduced an avalanche of updates to its existing data services, as well as several new ones. The new updates are all designed to make this vision of an open and extensible data cloud a reality.

“Every company is a big-data company today,” Gerrit Kazmaier, vice president and general manager of Data Analytics at Google Cloud, told SiliconANGLE in an interview. “It’s a call for a data ecosystem. It’s going to be a fundamental basis of modern business.”

One of the first steps in realizing that vision is to ensure that customers can actually use all of their data. To that end, Google’s data warehouse service BigQuery has gained the ability to analyze unstructured streaming data for the first time.

BigQuery can now ingest all types of data, regardless of its format or storage environment. Google says that’s important because most teams today can only work with structured data from operational databases and applications like ServiceNow, Salesforce, Workday and others.

But unstructured data, such as video from television archives, audio from call centers and radio, and paper documents, accounts for more than 90% of all information available to organizations today. This data, once left gathering dust, can now be analyzed in BigQuery and used to power services such as machine learning, speech recognition, translation, text processing and data analytics through the familiar Structured Query Language interface.
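To make the idea concrete, here is a minimal sketch of what analyzing unstructured files through SQL could look like. BigQuery exposes files in Cloud Storage as rows of an "object table," and BigQuery ML functions such as `ML.TRANSCRIBE` can then be applied to them; the dataset, table and model names below are purely illustrative, not real resources.

```python
# Hedged sketch: building a BigQuery SQL statement that applies a speech
# model to call-center audio referenced by an object table. The names
# passed in are placeholders; a real query would use your own resources.

def transcription_query(object_table: str, speech_model: str) -> str:
    """Return SQL that transcribes audio files listed in an object table."""
    return (
        "SELECT uri, transcripts\n"
        f"FROM ML.TRANSCRIBE(MODEL `{speech_model}`,\n"
        f"                   TABLE `{object_table}`)"
    )

sql = transcription_query("mydataset.call_audio", "mydataset.speech_model")
```

The point of the sketch is the shape of the workflow: the unstructured files never leave storage, and the analysis is expressed entirely in SQL.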

This is a big step, but not the only one. To further its goals, Google says, it is adding support for major data formats such as Apache Iceberg, Delta Lake and Apache Hudi to its BigLake storage engine. “By supporting these widely adopted data formats, we can help remove barriers that prevent organizations from getting the full value from their data,” Kazmaier said. “With BigLake, you get the ability to manage data across multiple clouds. We’ll meet you where you are.”
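In practice, BigLake surfaces such open-format data through BigQuery's external-table DDL. The following is a hedged sketch of that DDL for an Iceberg table; the connection name, bucket path and table name are illustrative placeholders.

```python
# Hedged sketch: composing CREATE EXTERNAL TABLE DDL for Apache Iceberg
# data in Cloud Storage, accessed through a BigLake connection. All
# resource names here are made up for illustration.

def iceberg_biglake_ddl(table: str, connection: str, metadata_uri: str) -> str:
    """Return DDL defining a BigLake external table over Iceberg data."""
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        f"  uris = ['{metadata_uri}']\n"
        ")"
    )

ddl = iceberg_biglake_ddl(
    "mydataset.orders",
    "us.my-biglake-connection",
    "gs://my-bucket/iceberg/orders/metadata/v1.metadata.json",
)
```

Once such a table is defined, the Iceberg data can be queried alongside native BigQuery tables without being copied or converted.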

Meanwhile, BigQuery is getting a new integration with Apache Spark that will allow data scientists to significantly improve data processing times. Datastream is also being integrated with BigQuery, in a move that will allow customers to more effectively replicate data from sources such as AlloyDB, PostgreSQL, MySQL and other third-party databases such as Oracle.
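A Datastream-to-BigQuery replication is declared as a "stream" resource pairing a source with a destination. The sketch below expresses such a definition as the kind of JSON body sent to the Datastream API; field names roughly mirror the REST resource but are written from memory, so treat the whole structure as illustrative rather than authoritative.

```python
# Hedged sketch: a Datastream stream definition replicating a MySQL
# source into BigQuery, with a bounded data-freshness window. Field
# names approximate the Datastream REST resource and may differ.

def mysql_to_bigquery_stream(source_profile: str, dest_profile: str,
                             staleness_seconds: int = 900) -> dict:
    """Build a stream definition dict for MySQL-to-BigQuery replication."""
    return {
        "displayName": "mysql-to-bq",
        "sourceConfig": {
            "sourceConnectionProfile": source_profile,
            "mysqlSourceConfig": {},  # empty: replicate all databases/tables
        },
        "destinationConfig": {
            "destinationConnectionProfile": dest_profile,
            "bigqueryDestinationConfig": {
                # Maximum staleness of BigQuery relative to the source.
                "dataFreshness": f"{staleness_seconds}s",
            },
        },
    }

stream = mysql_to_bigquery_stream(
    "projects/p/locations/us/connectionProfiles/mysql-src",
    "projects/p/locations/us/connectionProfiles/bq-dest",
)
```

The freshness setting captures the trade-off the integration is meant to manage: tighter windows mean more current analytics at the cost of more frequent replication work.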

To ensure that users have more confidence in their data, Google said, it is expanding the capabilities of its Dataplex service, giving it the ability to automate processes related to improving data quality and lineage. “For example, users can more easily understand data lineage – where the data came from and how it changed and moved over time – reducing the need for manual, time-consuming processes,” Kazmaier said.

Unified business intelligence

Making data more accessible is one thing, but customers also need to be able to draw insights from that data. To that end, Google said it will unify its portfolio of business intelligence tools under the Looker umbrella. Looker will be integrated with Data Studio and other key BI tools to simplify how people can get insights from their data.

As part of the integration, Data Studio has been rebranded as Looker Studio, which Google said will help customers go beyond dashboards by embedding ready-made intelligence into their workflows and applications to aid data-driven decision-making. Looker, for example, will integrate with Google Workspace, providing easier access to insights from productivity tools such as Sheets.

In addition, Google said, it will make it simpler for customers to work with the BI tools of their choice. Looker already integrates with Tableau Software, for example, and will soon do the same with Microsoft Power BI.

Enabling artificial intelligence

One of the most common use cases for data today is enabling AI services — an area where Google is a clear leader. Nor does it plan to relinquish that lead anytime soon. In an effort to make AI-based computer vision and image recognition more accessible, Google is launching a new service called Vertex AI Vision.

The service extends the capabilities of Vertex AI, providing an end-to-end application development environment for ingesting, analyzing and storing visual data. Users can stream video from manufacturing plants to create AI models that improve safety, for example, or analyze footage from store shelves to better manage product inventory, Google said.

“Vertex AI Vision can reduce the time to create computer vision applications from weeks to hours at a tenth of the cost of current offerings,” Kazmaier explained. “To achieve these efficiencies, Vertex AI Vision provides an easy-to-use, drag-and-drop interface and a library of pre-trained ML models for common tasks such as occupancy counting, product identification and object discovery.”

For less technical users, Google is introducing more “AI agents”: tools that make it easy for almost anyone to apply AI models to common business tasks.

The new AI agents include Translation Hub, which enables self-service document translation with support for 135 languages at launch. Translation Hub integrates technologies like Google’s Neural Machine Translation and AutoML and works by ingesting and translating content from multiple document types, including Google Docs, Word documents, Slides and PDFs. Not only does it maintain exact layout and formatting, but it also has granular management controls, including support for human-in-the-loop post-editing feedback and document review.

With Translation Hub, researchers can share important documents with their colleagues around the world, while product and service providers can reach underserved markets. Moreover, Google said, public sector administrators can reach more community members in their native language.

The second new AI agent is the Document AI Workbench, which facilitates the development of custom document parsers that can be trained to extract and summarize key information from large documents. “The Document AI Workbench can remove the barriers to building custom document parsers, helping organizations capture areas of interest specific to their business needs,” said June Yang, vice president of cloud AI and industry solutions.

Google also introduced the Document AI Warehouse, designed to remove the challenge of tagging and extracting data from documents.

Extended integrations

Finally, Google said it is expanding its integrations with some of the most popular enterprise data platforms to ensure that the information stored within them is also accessible to its customers.

Kazmaier explained that giving customers the flexibility to work with any data platform is critical to ensuring choice and avoiding data lock-in. With that in mind, he said, Google is committed to working with all major enterprise data platform providers, including the likes of Collibra NV, Databricks Inc., Elastic NV, Fivetran Inc., MongoDB Inc., Reltio Inc. and Striim Inc., to ensure that its tools work with their products.

David Meyer, senior vice president of product management at Databricks, told SiliconANGLE in an interview that the company has been working with Google for about two years on BigQuery supporting Databricks’ Delta Lake, following similar work with Amazon Web Services Inc. and Microsoft Corp.’s Azure.

“Making it so you don’t have to move data out of your data lake reduces cost and complexity,” Meyer said. “We see this as an inflection point.” However, he added, this is just the beginning of the work with Google Cloud, and the two companies will collaborate on other challenges, such as joint management efforts.

Kazmaier said the company is also working with the 17 members of the Data Cloud Alliance to promote open standards and interoperability in the data industry. It also continues support for open-source database engines such as MongoDB, MySQL, PostgreSQL and Redis, as well as Google Cloud databases such as AlloyDB for PostgreSQL, Cloud Bigtable, Firestore and Cloud Spanner.

With reporting from Robert Hof

Images: Google


