
[Tech Decoded] Spotify unveils a generative song picker: one sentence turns AI into your personal DJ

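Say “play some classic rock,” and the AI instantly serves up the perfect track? Spotify's new research, Text2Tracks, stops guessing song titles and instead uses generative AI to produce track IDs directly. It understands not only the meaning of your words but also the mood you are after, and its recommendations are substantially more accurate than those of traditional systems. The AI truly becomes your personal DJ.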

Cathy

27 Apr 2025 — 4 min read

Decoding Text2Tracks, a new approach to AI music recommendation

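Have you ever told a music app, “Play some classic rock so I can unwind,” only to get a playlist in the wrong style? In a recently published paper, Spotify's research team proposes Text2Tracks, an AI system that takes a single sentence and uses a generative method to pick the songs that best fit the mood. It ignores song titles and listens to the intent behind your request, so recommendations are faster, more accurate, and closer to your listening style.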

Why aren't traditional methods smart enough?

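Many platforms now use large language models (LLMs) to handle these voice or text requests, typically by having the model generate song titles. This runs into several problems:

Titles ≠ context: unlike movie titles, song titles often say little about genre or mood.
Version confusion: the same song can exist in several versions, so it is easy to pick the wrong one.
Slow generation: an LLM produces text one token (a word or word fragment) at a time, so spelling out full song titles is slow and expensive.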

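So the Spotify team proposed Text2Tracks: the AI recommends music by generating track IDs directly, skipping the title-matching step. The result is faster, more accurate, and closer to the vibe you want to hear.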
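To make the version-confusion and decoding-cost points concrete, here is a toy sketch. The catalog, the `trk_001`-style IDs, and the whitespace "tokenizer" are all hypothetical; real LLM tokenizers split text into subword tokens, and the sketch assumes each track ID is added to the vocabulary as a single token.

```python
# Hypothetical toy catalog: every track has a unique ID, but titles repeat
# across versions (studio, live, remaster).
CATALOG = {
    "trk_001": {"artist": "Eagles", "title": "Hotel California", "version": "studio"},
    "trk_002": {"artist": "Eagles", "title": "Hotel California", "version": "live"},
    "trk_003": {"artist": "Eagles", "title": "Hotel California", "version": "remaster"},
    "trk_004": {"artist": "Fleetwood Mac", "title": "Dreams", "version": "studio"},
}


def resolve_title(artist: str, title: str) -> list[str]:
    """Title-based pipeline: the LLM's text output must be matched back to
    catalog entries, and one title can match several versions."""
    return [tid for tid, meta in CATALOG.items()
            if meta["artist"] == artist and meta["title"] == title]


def resolve_id(track_id: str) -> list[str]:
    """ID-based pipeline: the generated ID is the catalog key itself."""
    return [track_id] if track_id in CATALOG else []


def decoding_steps(text: str) -> int:
    """Crude proxy for autoregressive cost: one step per whitespace-separated
    piece (a hypothetical simplification of real subword tokenization)."""
    return len(text.split())


print(resolve_title("Eagles", "Hotel California"))  # three candidates: ambiguous
print(resolve_id("trk_002"))                        # exactly one track
print(decoding_steps("Eagles - Hotel California"), decoding_steps("trk_002"))  # 4 1
```

The title path returns three candidate versions and needs four decoding steps in this toy setting, while the ID path maps straight to one catalog entry in a single step.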

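How does Text2Tracks work?

Text2Tracks uses generative retrieval:

Semantic learning: an LLM is first fine-tuned on a large collection of playlist data so that it learns how text descriptions relate to musical styles.
Track ID generation: instead of producing song titles, the AI directly outputs the corresponding IDs from the music catalog.
Smart selection: a Diverse Beam Search strategy keeps recommendations from collapsing onto a single popular style, balancing diversity and accuracy.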

In other words, the AI is no longer looking up song titles; it is looking for the soul of the song.
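The training step can be pictured as turning playlists into (prompt, track ID) pairs. The sketch below assumes a playlist's title or description serves as the prompt and every track on it counts as a valid target; the `Playlist` class and `build_training_pairs` helper are hypothetical, and the paper's actual data pipeline may differ.

```python
from dataclasses import dataclass


@dataclass
class Playlist:
    description: str      # playlist title/description, used as the prompt
    track_ids: list[str]  # catalog IDs of the tracks on the playlist


def build_training_pairs(playlists: list[Playlist]) -> list[tuple[str, str]]:
    """Expand each playlist into (prompt, target_track_id) examples.

    Every track on a playlist is treated as a correct answer for its
    description, so one playlist yields one example per track."""
    pairs: list[tuple[str, str]] = []
    for pl in playlists:
        prompt = pl.description.strip().lower()  # light normalisation only
        pairs.extend((prompt, tid) for tid in pl.track_ids)
    return pairs


playlists = [
    Playlist("Old-school rock to unwind", ["trk_001", "trk_004"]),
    Playlist("Mellow beach mood", ["trk_004"]),
]
pairs = build_training_pairs(playlists)
for prompt, target in pairs:
    print(f"{prompt!r} -> {target}")
```

An LLM fine-tuned on pairs like these learns to emit an ID when it sees a prompt; at inference time a decoding strategy such as beam search turns those outputs into a ranked list of tracks.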

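The secret of IDs: accurate recommendations need the right representation

The team compared three ways of representing track IDs:

Title-based IDs (Artist + Track Name): conventional, but the information is ambiguous
Integer IDs (Artist + numeric code): easy to implement, but semantically coarse
Semantic IDs: codes derived from each song's style, like a “postal code” designed for music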

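The results show that the model using semantic IDs learned from collaborative-filtering embeddings was 127% more accurate than competing approaches. It even beat title-based IDs, overturning the mainstream practice of having LLMs generate song titles.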
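One common way to derive semantic IDs from collaborative-filtering embeddings is residual k-means quantization: cluster the embeddings, subtract each track's centroid, and cluster the remainders again, so each level adds a finer code. The sketch below implements that technique under stated assumptions (NumPy, plain k-means, toy 2-D embeddings); it is not necessarily the procedure used in the paper, and `semantic_ids` is a hypothetical helper.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # squared Euclidean distance from every point to every centroid
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), centroids


def semantic_ids(embeddings: np.ndarray, codebook_sizes: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Residual k-means: each level clusters what earlier levels left unexplained."""
    residual = embeddings.astype(float)
    levels = []
    for level, k in enumerate(codebook_sizes):
        labels, centroids = kmeans(residual, k, seed=level)
        levels.append(labels)
        residual = residual - centroids[labels]  # remove the part this level captured
    # one tuple of codes per track, coarsest level first
    return [tuple(int(code) for code in codes) for codes in zip(*levels)]


# Hypothetical 2-D collaborative-filtering embeddings: tracks 0-2 are
# co-listened together, as are tracks 3-5.
cf_embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
])
ids = semantic_ids(cf_embeddings, codebook_sizes=(2, 2))
print(ids)
```

Tracks with similar listening patterns end up sharing leading codes, which is what lets these IDs behave like postal codes. In a real catalog two tracks can land on the same code tuple; one common fix is to append an extra disambiguating code.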

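What does this mean for the industry?

Text2Tracks shows how generative AI could redefine the way we find a good song from a single sentence. It is not just faster; it understands you better.

Spotify says it will keep extending the technology to other recommendation scenarios, such as automatically generating mood-based playlists, audio-summary search, and even multimodal music experiences.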

Text2Tracks: Prompt-Based Music Recommendations

Your Words, Your Vibe—How Spotify's AI Knows What You Want to Hear

Ever said, “Play some chill rock from the 80s,” and got a playlist that didn’t match the vibe? Spotify’s new research, Text2Tracks, tackles this issue by training AI to understand not just the words you say, but the intention behind them.

Why Existing Methods Don’t Cut It

Typical systems rely on LLMs to generate artist names or track titles in response to prompts. But:

Track names ≠ meaning: Song titles often don’t reflect genre or emotion.
Version confusion: Multiple song versions make selection tricky.
Generation lag: LLMs create text one token at a time, making full names slow to compute.

So, Spotify proposed a smarter route: don’t generate song names—generate track IDs. This is faster, cleaner, and closer to your actual vibe.

How Text2Tracks Works

The system uses a Generative Retrieval model to directly generate song identifiers (IDs) from user prompts:

Training: An LLM is fine-tuned on playlists, learning how phrases like “mellow beach mood” connect to actual songs.
Generation: At runtime, Text2Tracks creates track IDs (not song titles), streamlining recommendation.
Diverse Beam Search: Ensures the recommendations are varied yet relevant.

It’s like GPS for your ears—less guessing, more groove.
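Diverse beam search splits the beams into groups and, at every step, penalizes a group for choosing tokens that earlier groups already picked at that step. Below is a simplified sketch: the `toy_model` next-token distribution over two-level IDs is made up, the penalty affects only which candidates a group keeps (returned scores are plain log-probabilities), and real systems would run this over an LLM's logits rather than a lookup table.

```python
import math
from typing import Callable

Seq = tuple[str, ...]  # a (partial) ID as a sequence of code tokens


def diverse_beam_search(
    next_logprobs: Callable[[Seq], dict[str, float]],
    length: int,
    num_groups: int,
    beams_per_group: int,
    diversity_penalty: float,
) -> list[tuple[Seq, float]]:
    """Group-wise beam search with a Hamming-style diversity penalty."""
    groups: list[list[tuple[Seq, float]]] = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(length):
        picked: dict[str, int] = {}  # tokens chosen by earlier groups at this step
        for g in range(num_groups):
            candidates = []
            for seq, score in groups[g]:
                for tok, lp in next_logprobs(seq).items():
                    penalty = diversity_penalty * picked.get(tok, 0)
                    # (sequence, true score, penalised score used for selection)
                    candidates.append((seq + (tok,), score + lp, score + lp - penalty))
            candidates.sort(key=lambda c: c[2], reverse=True)
            groups[g] = [(seq, s) for seq, s, _ in candidates[:beams_per_group]]
            for seq, _ in groups[g]:
                picked[seq[-1]] = picked.get(seq[-1], 0) + 1
    beams = [beam for group in groups for beam in group]
    return sorted(beams, key=lambda b: b[1], reverse=True)


def toy_model(prefix: Seq) -> dict[str, float]:
    """Hypothetical next-token log-probs for a two-level ID."""
    if not prefix:
        return {"rock": math.log(0.6), "pop": math.log(0.3), "jazz": math.log(0.1)}
    return {"a": math.log(0.7), "b": math.log(0.3)}


plain = diverse_beam_search(toy_model, length=2, num_groups=2, beams_per_group=1, diversity_penalty=0.0)
diverse = diverse_beam_search(toy_model, length=2, num_groups=2, beams_per_group=1, diversity_penalty=1.0)
print([seq for seq, _ in plain])    # [('rock', 'a'), ('rock', 'a')]
print([seq for seq, _ in diverse])  # [('rock', 'a'), ('pop', 'b')]
```

With the penalty switched off, both groups return the same ID; with it on, the second group is pushed onto a different branch at some cost in likelihood. That is the variety-versus-relevance trade-off the list above describes.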

The Role of Track IDs: Precision Matters

Text2Tracks evaluated three ID strategies:

Artist + Title IDs: Intuitive but noisy
Artist + Integer IDs: Clean but lacks nuance
Semantic IDs: Learn song “coordinates” in a style-based vector space

The winner? Semantic IDs from collaborative filtering—they outperformed standard methods by over 120%, proving that structured, learned representations are key to musical accuracy.
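The three strategies differ mainly in what the model has to generate. The sketch below formats hypothetical tracks each way and counts shared leading semantic codes; the index numbers and semantic codes are invented for illustration, and the angle-bracket token names are an assumption about how such IDs might be spelled, not the paper's format.

```python
# Invented metadata: artist/track indices and semantic codes are hypothetical.
TRACKS = {
    "t1": {"artist": "Eagles", "title": "Hotel California",
           "artist_idx": 12, "track_idx": 3, "semantic": (2, 7, 1)},
    "t2": {"artist": "Fleetwood Mac", "title": "Dreams",
           "artist_idx": 45, "track_idx": 0, "semantic": (2, 7, 4)},
    "t3": {"artist": "Brian Eno", "title": "An Ending (Ascent)",
           "artist_idx": 88, "track_idx": 5, "semantic": (5, 0, 2)},
}


def title_id(t: dict) -> str:
    """Artist + Title: human-readable, but long and ambiguous across versions."""
    return f"{t['artist']} - {t['title']}"


def integer_id(t: dict) -> str:
    """Artist + Integer: compact, but the numbers carry no notion of style."""
    return f"<artist_{t['artist_idx']}> <track_{t['track_idx']}>"


def semantic_id(t: dict) -> str:
    """Semantic: one token per level, coarse to fine."""
    return " ".join(f"<l{level}_{code}>" for level, code in enumerate(t["semantic"]))


def shared_prefix(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of leading semantic codes two tracks have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


for key, t in TRACKS.items():
    print(key, "|", title_id(t), "|", integer_id(t), "|", semantic_id(t))
print(shared_prefix(TRACKS["t1"]["semantic"], TRACKS["t2"]["semantic"]))  # 2
print(shared_prefix(TRACKS["t1"]["semantic"], TRACKS["t3"]["semantic"]))  # 0
```

In this toy setup the first two tracks share a two-code prefix, so a model that has learned to emit `<l0_2> <l1_7>` for a given prompt is already close to both; integer IDs offer no such shared structure.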

What’s Next?

This model could revolutionize how people discover music. With faster, more meaningful recommendations, Text2Tracks sets the stage for:

Personalized ambient playlists
Music assistants that truly “get” your mood
Next-gen voice-to-music interactions

In short, it’s not about hitting the charts—it’s about hitting the mark. And this AI? It’s learning to tune into you.

source: Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval

image by Spotify