The company also published information about the training methodology for its Granite model series, saying that its pretraining dataset selectively blocks “websites known to disseminate pirated information,” including the controversial Books3 dataset which authors claim contains hundreds of thousands of pirated books.
The initiative underscores the AI industry’s growing recognition of the potential legal risks that come with obtaining large volumes of training data—much of which ...
Learn more about Bloomberg Law or Log In to keep reading:
See Breaking News in Context
Bloomberg Law provides trusted coverage of current events enhanced with legal analysis.
Already a subscriber?
Log in to keep reading or access research tools and resources.
