PodcastsLatent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0Making Transformers Sing - with Mikey Shulman of Suno

Making Transformers Sing - with Mikey Shulman of Suno
Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0Thu Mar 14 2024
Music Generation with Suno's Mikey Shulman:
- Mikey Shulman, co-founder of Suno, spearheaded the development of Bark, an open-source transformer-based model that quickly gained popularity for its text-to-audio generation capabilities.
- The model leverages transformers to forecast and create audio continuously by predicting upcoming bits based on previous ones.
- Suno primarily focuses on music generation rather than speech due to the team's passion for music and belief in its profound human connection.
- Users interact with Suno through two modes: a simple mode for quick creation based on descriptions, and an expert mode allowing detailed customization like setting lyrics, rhythm, and song title. More than half of users prefer the expert mode, indicating a desire for deeper involvement in music creation.
Challenges and Considerations in Music AI Generation:
- Developing models for music generation involves considerations such as sound fidelity, pitch accuracy, rhythm coherence, melody quality, and user controllability.
- User-generated inventive prompts influence how well the model responds to requests and impacts the generated output.
- Exploring various musical styles like house or blues demonstrates the flexibility of AI-generated music across genres.
- While professionals may find utility in using AI-generated music for inspiration or sample generation, Suno primarily targets broadening access to music creation for general users.
Ethical and Legal Aspects of Music Generation:
- Suno prohibits users from inputting existing copyrighted lyrics or songs into the model due to legal constraints around ownership rights.
- The company emphasizes encouraging originality and creativity in music composition over remixing existing songs without proper authorization.
- Future developments may explore ways to enable users legally to work with existing songs while respecting copyright laws.
Suno's Evolution from Text-to-Audio to Music Generation:
- Suno initially gained fame with Bark, a transformer-based "text-to-audio" model.
- Despite suggestions to stick to speech-related projects, the founders shifted towards music generation due to their passion for music.
- The company is focused on enhancing models' controllability and audio fidelity continually.
Expanding Music Experiences Beyond Text-to-Music:
- Suno is committed to providing diverse musical experiences beyond text-to-music interactions.
- The emphasis lies on engaging users actively in music creation through multiplayer modes and collaborative concert concepts.
- By promoting active participation in music rather than passive consumption, Suno aims to boost engagement within the music industry.
Future of Collaborative Concerts and Continuous Music Experiences:
- Suno envisions upcoming collaborative concerts involving audience interaction without traditional artists or with significant audience input.
- Plans include creating self-contained experiences where users can customize their own models based on personal preferences.
- These initiatives aim to personalize music experiences, making them more interactive and community-driven.
Diversification of AI Applications in Music Production:
- In the broader landscape of audio generation, categories typically cover music, speech, and sound effects applications.
- Different areas within music generation include stock music for various uses, AI cover art production, new song creation using AI tools, and AI plugins for professional musicians.
Challenges with Objective Benchmarks in Audio Generation:
- While benchmarks are crucial for objectively measuring performance, they may not capture all essential aspects required for evaluating audio quality effectively.
- Aesthetics play a significant role in assessing audio outputs since human perception influences emotional impact evaluation.
Incorporating Social Science Principles in Machine Learning Engineering:
- Drawing parallels between economics principles like Goodhart's Law and natural experiments with machine learning engineering practices can offer valuable insights.