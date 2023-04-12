

Databricks releases free data for training AI models for commercial use

Reuters | Updated: 12-04-2023 18:57 IST | Created: 12-04-2023 18:30 IST

Databricks, a San Francisco-based startup last valued at $38 billion, released a trove of data on Wednesday that it says businesses and researchers can use to train chatbots similar to ChatGPT.

The data, drawn from questionnaires completed by Databricks employees, fills an important gap in the company's efforts to create commercially usable tools to train AI systems that could offer alternatives to Microsoft-backed OpenAI. Databricks said it spent the past several weeks gathering 15,000 questions and responses from its 5,000 employees in 40 countries and then vetted the data for quality, an effort Chief Executive Ali Ghodsi estimated cost the company millions of dollars.

Databricks sells software tools for building AI systems. Ghodsi told Reuters that the company is releasing the free training data in the hope that other companies will use it to make their own AI systems, possibly using Databricks to do so.

The free dataset comes after Databricks last month released Dolly, an open-source large language model, the technology that underlies chatbots. But Dolly could not be used in commercial products because the data used to train it was generated by OpenAI's ChatGPT, whose terms of service forbid using its output to develop commercial AI systems that could compete with OpenAI. Using data generated by AI to train other AI systems has become common. New chatbots published this year by Stanford University and the University of California, Berkeley, for example, used such machine-generated data from ChatGPT, but both made clear that their models could not be used for commercial purposes.

Ghodsi acknowledged the dataset is far from perfect because it draws only on Databricks' employee base, which he said skews male. Users will be able to examine the training data themselves, which they cannot do for models such as ChatGPT or Alphabet Inc's Bard, whose training data has not been released. "We're not claiming that this is an unbiased dataset," Ghodsi said. "We're just trying to push the community to go in this direction of more transparency, and more of everyone owning their own models instead of just a few that we have to trust."

(This story has not been edited by Devdiscourse staff and is auto-generated from a syndicated feed.)
