The Machine Learning Powering Generative Art NFTs

Generative art has been one of the quintessential machine-learning use cases, but only recently has the space achieved mainstream prominence.

Nov 21, 2022 at 4:44 p.m. UTC
Updated Nov 21, 2022 at 6:27 p.m. UTC

Artificial intelligence (AI) is becoming increasingly relevant in the non-fungible token (NFT) space. Generative art (that is, art created by an autonomous system) has quickly emerged as one of the main categories of the NFT market, driving innovative projects and astonishing collections. From the works of AI art legends such as Refik Anadol or Sofia Crespo to Tyler Hobbs’s new QQL project, NFTs have become one of the main vehicles to access AI-powered art.

Generative art has been one of the quintessential machine-learning use cases, but only recently has the space achieved mainstream prominence. The leap has been mostly powered by computational gains and a new generation of techniques that help models learn without requiring large labeled datasets, which are scarce and expensive to build. Even though the gap between the generative art community and AI research has been closing in the last few years, many of the new generative art techniques still haven’t been widely adopted by prominent artists, as it takes time to experiment with these new methods.

Jesus Rodriguez is the CEO of IntoTheBlock.

The generative art catalysts

The rise of generative AI has come as a surprise even to many of the early AI pioneers, most of whom saw this discipline as a relatively obscure area of machine learning. The impressive progress in generative AI can be traced back to three main factors:

  1. Multimodal AI: In the last five years, we have seen an explosion of AI methods that can operate across different domains such as language, image, video or sound. This has enabled the creation of models like DALL-E or Stable Diffusion, which generate images or videos from natural language.
  2. Pretrained language models: The emergence of multimodal AI has been accompanied by remarkable progress in language models such as GPT-3. This has enabled the use of language as an input mechanism to produce artistic outputs such as images, sounds or videos. Language has played a paramount role in this new phase of generative AI because it has lowered the barrier for people to interact with generative AI models.
  3. Diffusion methods: Most of the photorealistic art produced by AI today is based on a technique called diffusion models. Before diffusion models came onto the scene, the generative AI space was dominated by methods such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), which have trouble scaling and suffer from a lack of diversity in their generated outputs. Diffusion models address those limitations with an unconventional approach: they gradually add noise to the training images until nothing but noise is left, and train a model to reverse that process step by step and reconstruct the originals. The reasoning is that if a model is able to reconstruct an image from something that is, theoretically, noise, then it should be able to do so when guided by pretty much any representation, including other domains like language. Not surprisingly, diffusion methods have become the foundation of text-to-image generation models like DALL-E and Stable Diffusion.
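
To make the "destroy, then reconstruct" idea concrete, here is a minimal sketch of the noising half of a diffusion model, written in plain Python on a toy list of pixel values. It is an illustration under stated assumptions, not the implementation behind DALL-E or Stable Diffusion: the function names are hypothetical, the linear noise schedule and its default values are simply one commonly used choice, and the neural network that would normally predict the noise is left out entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: share of the original signal variance left after step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Forward ("destroying") process: blend the image with Gaussian noise.

    Returns the noisy pixels and the noise itself, which is what a real
    model would be trained to predict.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


def recover_image(noisy, predicted_noise, alpha_bar):
    """Undo add_noise given a noise estimate.

    In a trained system a neural network supplies predicted_noise; here the
    caller passes it in, so this only shows the arithmetic of the reversal.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale
            for x, n in zip(noisy, predicted_noise)]
```

With 1,000 steps and these assumed defaults, the last entry of `cumulative_signal(linear_beta_schedule(1000))` is close to zero, which is the "complete noise" end state described above. Passing the true noise back into `recover_image` returns the original pixels; the hard part in practice, and the part this sketch skips, is training a network to estimate that noise, optionally guided by a text prompt.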

The influence of these methods on generative art has coincided with the emergence of another technology trend: NFTs, which have unlocked important capabilities for digital art such as digital ownership, programmable incentives and more democratized distribution models.

The methods powering generative art in NFTs

Text-to-image: Text-to-image (TTI) synthesis has been the most popular area of generative AI within the NFT community. The TTI space has produced some AI models that are crossing over into pop culture. OpenAI’s DALL-E has arguably become the best-known example of TTI used to generate artistic images. GLIDE is another TTI model created by OpenAI, which has been adopted in many generative art settings. Google has been dabbling in the generative art space, experimenting with different approaches such as Imagen, which is based on diffusion models, or Parti, which is based on a different technique called autoregressive models. Meta has also been cultivating the generative art community with models like Make-A-Scene. AI startups are making inroads in the TTI space as well, with Midjourney gaining a vibrant community via its Discord distribution and Stability AI shocking the AI community by open-sourcing Stable Diffusion.

From an NFT perspective, TTI models have seen the widest adoption because a disproportionate percentage of digital art collectibles today are represented as static images.

Text-to-video: Text-to-video (TTV) is a more challenging aspect of generative art but one in which we are seeing major progress. Meta and Google recently published TTV models such as Make-A-Video and Imagen Video, which can generate high-frame-fidelity video clips based on natural language. Video is one of the most active areas of research for generative art, and we should expect most image generation models to have video equivalents. Videos are still not as prominent in the NFT space as images, but this is likely to change as TTV models become more widely adopted by generative artists. Video is one of the areas that differentiates digital art from traditional art.

Image-to-image: Image generation via textual inputs feels almost natural but has limitations when it comes to capturing aspects such as the relative positions of different objects, their orientation or very specific details of the scenery. Sketches or other images are a better mechanism to convey this information. Several of the top diffusion models, such as DALL-E, Stable Diffusion and Imagen, incorporate mechanisms for generating images from sketches. Similarly, these models incorporate techniques such as in-painting and out-painting, which regenerate regions inside an image or extend it beyond its original borders.
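
In-painting and out-painting both come down to a mask that marks which pixels the model may change. The sketch below shows only that bookkeeping on a tiny grid of numbers. It is a simplified illustration, not how any of the models above implement it: the function names are hypothetical, and `generate_pixel` is a stand-in for the generative model that would actually produce the new content.

```python
def inpaint(image, mask, generate_pixel):
    """Return a copy of image with masked cells replaced by generated values.

    image is a list of rows, mask has the same shape (True = regenerate),
    and generate_pixel(row, col) is a placeholder for a generative model.
    """
    return [
        [generate_pixel(r, c) if mask[r][c] else image[r][c]
         for c in range(len(image[r]))]
        for r in range(len(image))
    ]


def outpaint_canvas(image, pad, fill=0):
    """Place image at the center of a larger canvas.

    Returns (canvas, mask), where mask is True on the new border cells.
    Out-painting then reduces to in-painting those border cells.
    """
    height, width = len(image), len(image[0])
    new_height, new_width = height + 2 * pad, width + 2 * pad
    canvas = [[fill] * new_width for _ in range(new_height)]
    mask = [[True] * new_width for _ in range(new_height)]
    for r in range(height):
        for c in range(width):
            canvas[r + pad][c + pad] = image[r][c]
            mask[r + pad][c + pad] = False
    return canvas, mask
```

For example, calling `outpaint_canvas(image, 1)` and then passing the resulting canvas and mask to `inpaint` fills in a one-pixel border around the original while leaving every original pixel untouched.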

Most of the best-established generative art practices focus on creating images from other images. Not surprisingly, several popular generative art NFT collections are based on variations of image-to-image methods.

Music generation: Automatic music generation is another common use case in generative AI that has gained prominence over the last few years. OpenAI has been at the forefront of this area too, with models including MuseNet and, more prominently, Jukebox, which is able to generate music in various styles and genres. Recently, Google entered the space with AudioLM, a model that creates realistic speech and piano music simply by listening to sound snippets. Stability AI-backed Harmonai has started pushing the boundaries of AI music generation with the release of Dance Diffusion, a set of algorithms and tools that can generate original clips of music.

AI-generated music is one of the biggest areas in which NFTs can deliver unique value. Unlike many other art forms, music is already distributed in digital form. Generative AI can evolve into a natural complement for music producers, and NFTs offer creators unique ways to express ownership of music clips or songs.

An enviable match: NFTs and generative art

Throughout the history of technology there have been several instances in which relatively distinct trends have reinforced each other and gained incredible market share together. The most recent example is the social-mobile-cloud revolution, in which each one of those trends expanded the market of the other two. Generative AI and NFTs are starting to exhibit a similar dynamic. Both trends have been able to bring a complex technology market to mainstream culture. NFTs complement generative AI with digital ownership and distribution models that would be nearly impossible to implement otherwise. Similarly, generative AI is likely to become one of the most important sources of NFT creation.



Jesus Rodriguez

Jesus Rodriguez is the CEO and co-founder of IntoTheBlock, a platform focused on enabling market intelligence and institutional DeFi solutions for crypto markets. He is also the co-founder and President of Faktory, a generative AI platform for business and consumer apps.
