Disclosure: The views and opinions expressed here are solely those of the author and do not necessarily represent the views and opinions of crypto.news editorial.
Elon Musk has filed a lawsuit against OpenAI, claiming it has deviated from its mission of developing artificial intelligence ‘for the benefit of humanity’. Carlos E. Perez suspects the lawsuit could turn the current generative AI market leader into the next WeWork.
OpenAI’s for-profit conversion is the focus of this legal battle. But the overemphasis on profit betrays vested corporate interests, and it distracts from the issues that matter more to end users: ethical AI training and data management.
Grok, Elon Musk’s brainchild and a competitor to ChatGPT, can access ‘real-time information’ from tweets. OpenAI is already notorious for scraping copyrighted data left, right, and centre. Now Google has signed a $60 million deal to access Reddit users’ data to train Gemini and its Cloud AI models.
Simply pushing for open source does not serve users’ interests in this environment. What users need are ways to give meaningful consent and to be compensated for their contributions to AI training. Emerging platforms that build tools for crowdsourcing AI training data are critical in this regard. More on this later.
For users, it is mostly a non-profit affair
More than 5.3 billion people worldwide use the internet, and approximately 93% of them use centralized social media. It is therefore likely that most of the 147 billion terabytes of data generated online in 2023 was created by users. The volume is expected to exceed 180 billion terabytes by 2025.
While this massive trove of data, or ‘publicly available information’, is accelerating AI training and development, the users who produce it are missing out on the gains. They have neither control nor real ownership. Nor is the “I Agree” form of consent meaningful: at best it is deception, at worst coercion.
Data is the new oil, and it is not in Big Tech’s interest to give end users more control over it. For one, paying users for their data would significantly increase LLM training costs, which already exceed $100 million. But as Chris Dixon argues in his book “Read Write Own,” five big companies controlling, and potentially ruining, everything is the fast track to dystopia.
However, with blockchains evolving into distributed data layers and sources of truth, the best era for users has only just begun. Most importantly, unlike the big corporations, new-age AI companies are embracing these alternatives for better performance, cost efficiency, and, ultimately, the betterment of humanity.
Crowdsourcing data for ethical AI training
Web2’s read-write-trust model relies on entities and stakeholders not being evil. But human greed knows no bounds; as the 18th-century philosopher David Hume put it, we are all a bunch of ‘self-interested rogues’.
That’s why Web3’s read-write-own model uses blockchains, cryptography, and similar mechanisms so that distributed network participants can’t turn bad. Dixon explores this idea extensively in his book.
The Web3 technology stack is fundamentally community-driven and user-focused. Its mainstay is a set of tools that give users control over their own data, whether financial, social, or creative. Blockchains, for example, serve as distributed, verifiable data layers that process transactions and immutably establish provenance.
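To make this concrete, here is a minimal sketch, in Python, of how a contribution’s provenance might be anchored on such a data layer. The `Ledger` class is a hypothetical stand-in for a real blockchain client; note that only a hash of the data goes on the ledger, so the raw contribution never leaves the user’s hands.

```python
import hashlib
import json
import time

class Ledger:
    """Hypothetical stand-in for an append-only, verifiable data layer."""
    def __init__(self):
        self.records = []

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # the index serves as a receipt

def record_provenance(ledger: Ledger, contributor: str, payload: bytes) -> int:
    # Only the hash is recorded on the ledger; the raw data stays with the user.
    digest = hashlib.sha256(payload).hexdigest()
    return ledger.append({
        "contributor": contributor,
        "sha256": digest,
        "timestamp": int(time.time()),
    })

ledger = Ledger()
receipt = record_provenance(ledger, "user-42", b"a labeled training sample")
print(json.dumps(ledger.records[receipt], indent=2))
```

Anyone holding the original data can later rehash it and compare the digest against the ledger entry, which is the essence of verifiable provenance.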
Moreover, robust privacy and security mechanisms such as zero-knowledge proofs (zkProofs) and multi-party computation (MPC) have matured over the past few years. They open new avenues for data verification, sharing, and management by allowing counterparties to establish facts without revealing the underlying content.
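For intuition on how counterparties can establish a shared fact without exposing their inputs, here is a toy version of the MPC idea using additive secret sharing. It is a simplified, semi-honest sketch, not a production protocol; the values and party count are illustrative.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a fixed prime

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split `value` into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Each user splits a private value and sends one share to each party.
alice_shares = share(1200)
bob_shares = share(800)

# Each party adds the shares it holds; no party ever sees a raw input.
partial_sums = [(a + b) % MODULUS for a, b in zip(alice_shares, bob_shares)]

# Combining the partial results reveals only the aggregate.
print(sum(partial_sums) % MODULUS)  # 2000
```

Each individual share is indistinguishable from a random number; only the final combination reveals anything, and that is just the aggregate.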
These capabilities are highly relevant to AI training: it is now possible to source reliable data without depending on centralized providers or validators. Most importantly, Web3’s decentralized, intermediary-free design directly connects those who generate data (i.e., users) with the projects that need it to train AI models.
Removing ‘trusted agents’ and gatekeepers significantly reduces costs. It also aligns incentives so that projects can compensate users for their efforts and contributions. Users can earn crypto by completing microtasks such as recording scripts in their local dialect, recognizing and tagging objects, sorting and categorizing images, and structuring unstructured data, as the sketch below illustrates.
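Here is a minimal sketch of how such a microtask flow might work: a contribution is credited only once enough independent validators approve it. The class, threshold, and reward amount below are all hypothetical, not any specific platform’s API.

```python
from dataclasses import dataclass

REQUIRED_APPROVALS = 3   # human-in-the-loop validation threshold (illustrative)
REWARD_PER_TASK = 5      # paid in a hypothetical platform token

balances: dict[str, int] = {}

@dataclass
class Contribution:
    contributor: str
    task_id: str
    payload: str         # e.g., a recorded script in a local dialect
    approvals: int = 0

def validate(c: Contribution) -> None:
    """Register one validator's approval; pay out once the threshold is met."""
    c.approvals += 1
    if c.approvals == REQUIRED_APPROVALS:
        balances[c.contributor] = balances.get(c.contributor, 0) + REWARD_PER_TASK

c = Contribution("user-42", "dialect-audio-007", "transcribed sentence")
for _ in range(REQUIRED_APPROVALS):
    validate(c)
print(balances)  # {'user-42': 5}
```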
Companies, on the other hand, can build more accurate models using high-quality data validated by humans in the loop, and at a fair price. It’s a win-win.
Not just open source, but bottom-up advances
Traditional frameworks are so hostile to individuals and user communities that open source alone means little. Ethical AI training requires radical changes to existing business models and training pipelines.
Replacing top-down systems with a bottom-up, grassroots approach is the way forward. It is also about establishing a meritocratic order that sustains a high degree of ownership, autonomy, and cooperation. In such a world, the biggest payoff comes not from maximization but from fair distribution.
Interestingly, these systems will benefit large companies even as they empower small businesses and individual users. After all, high-quality data, affordable prices, and accurate AI models are what everyone needs.
With incentives thus aligned, it is in the industry’s collective interest to adopt and embrace these new-age models. Clinging to narrow, short-term gains won’t work in the long run. The future makes different demands than the past.
William Simonin
William Simonin is the president of Ta-da, an AI data marketplace that leverages blockchain to gamify data verification. A French entrepreneur and co-founder of several artificial intelligence, technology, and cryptocurrency companies, he previously worked for nearly six years as a software engineer and researcher at the French Ministry of Defense, served as president of the Epitech Nancy security association, and later taught functional programming.