Industry Partner Feature Article
How the Difference between AI-boosted Search and Gen AI Will Shape our Work and Creativity over the Near Term
Jack W. Plunkett, CEO, Plunkett Research, Ltd.
AIIP Industry Silver Partner
August 26, 2024
What’s behind the massive licensing deals that are suddenly popping up between major publishers, such as News Corp., and the largest generative AI firms, such as OpenAI? (The News Corp.-OpenAI deal is said to be worth about $250 million to News Corp. for certain content rights over five years. Similar recent deals include those with Associated Press, Condé Nast, Le Monde and Dotdash Meredith.)
Understanding these developments can shed light on the future of both Gen AI and AI-based search tools, and on how these booming technologies may assist, impede or shape your own work if you are a researcher, writer, analyst or publisher.
First, I believe an understanding of two nuances in AI’s use of content is vital. I’ll keep this simple:
- As you know by now, Gen AI is based on large language models (LLMs). This means that the technology can “generate” summaries, articles, responses, blog posts, etc., based on the text (written by others) that has been ingested into the LLMs. (The larger the language model, the better, and the more well-crafted the question, the better.) You might reasonably say that a brilliantly designed Gen AI system can write original materials, in a somewhat human manner, based on background text that it has studied beforehand. Eventually, when meticulously engineered, and when trained on an extremely wide variety and depth of content, Gen AI systems may not need to directly plagiarize content, but instead will write truly original responses. Training at that scale is expensive, which is why the minimum investment needed to establish an LLM is considered to be $100 million or more, and I feel that underfunded startups are finding even that amount to be woefully inadequate.
- AI can also be utilized to display (not create) full-text answers to search queries. This is not “generative,” it is AI-assisted search. Despite all of the buzz we hear about Gen AI, this amped-up search is one of the most powerful capabilities of AI when dealing with the written word, and I believe it will disrupt and redefine the way we all use search over the very near future.
Authors and publishers may or may not have significant rights that would enable them to stop Gen AI platforms from ingesting their works for the purpose of training their LLMs. Intellectual property attorneys and the courts of law in which they operate are in for many, many years of sorting this out. I am reminded of the uproar ignited when Google Books began scanning huge numbers of volumes with the intent of enabling Google-based search of the books’ contents. Plunkett Research happily participated initially, submitting certain of our Plunkett’s Industry Almanacs in ebook format. However, before long we decided this was not a good business practice for us, as extensive and important segments of text were being displayed for free, negating the need for readers to access the books through normal commercial channels.
On the other hand, content owners may have existing rights to control the extent (beyond fair use) to which words and images from their publications can be directly quoted and displayed in search results. The desire of search companies to directly quote news, in-depth articles and up-to-the-minute images in their platforms is precisely what is driving the big checks being written to publishers. OpenAI is testing a beta of its SearchGPT tool with the aim of powering the next generation of search results.
Today, we remain in the Wild West days of competition in AI platforms and related law. Authors and publishers may add “No AI Training Without a License” notices to their works, which we now do with Plunkett’s Industry Almanacs. This may be of little force and effect. Also, a website’s robots.txt file (a plain-text file served from the site’s root, separate from its HTML pages) can hold similar restrictions aimed at AI crawlers, although honoring them is voluntary on the crawler’s part.
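For readers who manage their own sites, here is a minimal sketch of what a robots.txt restriction on an AI crawler might look like, and how a crawler that chooses to honor it would interpret the rules. It uses Python’s standard urllib.robotparser module. GPTBot is the user-agent name OpenAI publishes for its crawler; the sample rules, the example.com URL and the is_allowed helper are illustrative assumptions, not a description of any particular publisher’s setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block OpenAI's GPTBot crawler
# from the whole site while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/almanac"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Keep in mind that this only models a well-behaved crawler. Nothing in robots.txt technically prevents a crawler from ignoring the rules, which is one reason such notices may carry little practical force.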
On the other hand, many website owners may want to encourage AI platform referrals (hopefully with links) to their content, in which case they can design their pages with layers of subheads (e.g., H2 and H3 segments in HTML) that help guide AI software to a rapid understanding of the category of the blocks of text that are displayed.
Meanwhile, not surprisingly, at least a few web-based services have sprung up to act as brokers between publishers (large and small) and AI companies. Their services may include model licenses and assistance in building API connections to publishers’ data that sits behind paywalls. Such companies include Tollbit and ScalePost. Getty Images, owner of iStock and other digital image platforms, has taken things further by entering into multiple major contracts enabling Gen AI companies to train on the photos, videos and art contained in Getty platforms. These contracts allow the participating photographers and artists who created the images to be paid automatic (and modest) royalties for AI usage.
Buckle your seatbelt: OpenAI and its competitors are moving with blinding speed in launching disruptive AI tools.
Jack Plunkett is CEO of Plunkett Research, Ltd., a Houston-based provider of market research and industry analysis. Plunkett’s client list includes 10,000 leading corporations, universities and government agencies worldwide. Plunkett’s data is distributed electronically through subscriptions to its website and around the globe by major booksellers and news distributors, including Bloomberg and Thomson Reuters. He is the author or editor of more than 30 books, including the gold medal-winning The Next Boom. Plunkett is frequently interviewed as an expert source by publications such as Time magazine, The Wall Street Journal and Investor’s Business Daily, and by media outlets such as NPR’s Marketplace and ABC News. He was a finalist in the Entrepreneur of the Year Awards sponsored by Ernst & Young.
Used by permission, Copyright © 2024, Plunkett Research, Ltd., All Rights Reserved