With the rapid growth of language-based AI applications, many people are becoming concerned about where the data behind them was obtained and whether the AI companies had the rights to it.
To operate effectively, a language-based AI application must dig through the internet to develop a better understanding of how humans speak and write. Much of this data is obtained by analyzing millions of human-created posts. Reddit is an enormous source of information for companies like OpenAI; it has been the single most valuable source of data for giving AIs the information they need to produce more natural-sounding text.
This raises the question: do OpenAI and other AI companies have the rights to this data, and if they don't, who does?
Reddit and Twitter have already begun charging AI companies for access to their data. But a paper released by Jesse Dodge, a researcher at the nonprofit Allen Institute for AI, found that both Google and Facebook have taken information from countless copyrighted Wikipedia pages and news articles to develop large-scale databases commonly used in creating natural language for their AI. This has resulted in numerous cases of individual-scale intellectual property theft.
So far, OpenAI and other AI tech companies have refrained from commenting on the subject.
Source: Wall Street Journal