You may see the notation ‘Dataset Earnings’ on line items in the ‘License’ section of your Financials dashboard. These earnings come from licensing activity for datasets, a new product offering aimed at supporting emerging technology companies looking for metadata associated with creative assets. Read on for a summary of the program.
Please note that as of April 2023, Pond5 contributors may opt out of Dataset Earnings. See the section ‘Can I opt out of having my content included in future datasets?’ for more information about this feature.
What are datasets?
Datasets are a product offering developed to support companies building computer vision models and large language models (“LLMs”). Each dataset is a set of content and metadata organized by a specific theme or topic. The content can include images (photos, illustrations, and vectors), videos, music, sound effects, and 3D models; the metadata includes keywords, titles, and descriptions.
Datasets provide a potential new source of revenue by expanding our reach to new customers and industries, including AI researchers and leading technology developers and manufacturers.
Some of the datasets Pond5 licenses are in collaboration with Shutterstock and include both Pond5 and Shutterstock content. You can find supporting information on Shutterstock datasets, also known as data deals, here.
What is computer vision?
Computer vision is a scientific discipline that seeks to develop techniques to help computers “see” and understand the context of digital images such as photographs and videos. A model is the engine that governs the behavior of the computer vision system. Researchers train machine learning models to identify visual objects within imagery and to improve computer-assisted labeling techniques.
What are datasets used for?
In general, companies use content datasets to power computer vision applications such as:
- Visual search: People can easily search images on their smartphone library by entering a keyword like “cat” or “sunset” to find all relevant photos.
- Autonomous vehicles: Self-driving cars can operate safely by understanding their specific surroundings — including other cars, people, roads, stop signs, and more.
- Content moderation: Social media companies can rapidly identify, review, and remove content that is violent or extreme in nature.
- Product categorization: eCommerce and retail companies can recommend relevant products to their customers.
- AI content generation: AI platforms can train systems to automatically generate new images based on text prompts.
The goal is to help companies easily build, train, and automate their object recognition models to improve their technology and better serve the needs of users.
How is my content used? What kind of license is provided for datasets?
Each dataset consists of content based on the customer’s specific model training needs and a limited license that covers usage only within the scope of training machine learning technology. Companies purchasing datasets (content and metadata) may only use them to train machine learning and computer vision models. The use of content for commercial or public applications such as marketing, advertising, etc., is strictly prohibited, and companies are required to have appropriate security measures in place to ensure there is no unauthorized distribution of content.
What type of content is included in these datasets?
All media types on Pond5 are eligible; however, the content used in each dataset varies depending on the client’s needs.
What type of metadata is included with datasets?
The specific metadata provided within datasets varies based on the needs of the partner and can include a combination of information provided by contributors or sourced by Pond5-assisted technology.
Contributor-provided metadata may include the asset description, title, and keywords describing the objects depicted in the assets. In certain cases, metadata may include some geolocation information provided by contributors. Broad demographic information about the models featured in photographs, including age, gender, and race/ethnicity, may also be included in metadata labels.
Is any personal data being shared with partners?
No. The metadata included in datasets is anonymized to remove any personal information and includes only descriptive information about visual or audio assets. Model releases are never shared, and the identities of models and contributors are not disclosed.
Artist Earnings from Datasets
This is a new form of earnings for contributors beyond normal marketplace earnings, and an exciting opportunity to expand into emerging markets. We are firmly committed to including our contributors as partners on this journey, and ensuring they receive a share of the proceeds from computer vision datasets.
How do I know I’ve made a Dataset Earnings sale?
Dataset Earnings will be noted in the Financials section of the artist dashboard only if your content is used in one of the datasets. They will appear as their own line item, separate from other sales you receive through the Pond5 marketplace. The line item will read ‘Dataset Earnings’ under the ‘License’ section.

You will also receive an email notifying you of a Dataset Earnings payment if you earn $25 or more during a given period.
How are artist commissions calculated for Dataset Earnings?
Each partnership is different and will have its own pricing model depending on content and usage. Artists will receive their usual royalty rate based on the Net License Revenue received by Pond5. Note that license prices will be lower for these types of sales than for regular marketplace purchases, given the highly restricted, special-purpose rights being conveyed.
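As a rough illustration of the calculation described above, the short Python sketch below multiplies a royalty rate by the Net License Revenue attributed to a contributor's content. The function name, the 40% rate, and the $80.00 revenue figure are hypothetical and chosen only for the example; actual rates and revenue vary by contributor and by partnership, and how revenue is attributed to individual assets is not specified here.

```python
from decimal import Decimal, ROUND_HALF_UP


def dataset_earnings(net_license_revenue: str, royalty_rate: str) -> Decimal:
    """Return royalty_rate x net_license_revenue, rounded to cents.

    Both arguments are hypothetical inputs passed as strings so that
    Decimal can represent them exactly.
    """
    amount = Decimal(net_license_revenue) * Decimal(royalty_rate)
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical figures: $80.00 of Net License Revenue attributed to
# your content and a 40% royalty rate (assumptions, not Pond5 values).
example = dataset_earnings("80.00", "0.40")
print(example)  # 32.00
```

With these assumed inputs the contributor's share comes to $32.00. Substitute your own royalty rate to estimate what a given amount of Net License Revenue would pay out.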
Can I opt out of having my content included in future datasets?
Yes. As AI and other emerging technologies continue to evolve, we believe that Dataset Earnings will provide an exciting opportunity for additional revenue for our contributors. However, we understand if you do not wish to include your content in these deals and, as of April 2023, we have added an option in the contributor account settings (under Preferences) that allows you to opt out of future datasets. After opting out, you may still see Dataset Earnings in your dashboard from deals that were closed before you opted out.

AI-generated Content on Pond5
You can learn more about AI-generated content in our marketplace by viewing our Legal Guidelines, and read about the future of AI for Pond5 & Shutterstock.