Neural Network Udio to Generate a Song

Music has always been an expression of the soul and emotions, and technology has made creating it much easier. With neural networks such as Udio, anyone can become a composer, regardless of professional skill.

In this article, we’ll look at how Udio can help you create tracks quickly and easily, opening up new horizons for musical creativity. Step by step, we’ll show you how the neural network turns ideas into full-fledged works.

What’s under the hood of Udio and what are the benefits of the platform?

Udio builds on ideas from earlier systems such as MusicLM and Meta*’s AudioGen. MusicLM is a model that generates audio tracks from text descriptions, capturing complex musical structures and patterns. AudioGen, meanwhile, focuses on generating sound sequences and effects, and likewise represents audio as discrete tokens. These projects showed that music, like text, can be “broken down” into individual elements that algorithms can analyze and use to create new tracks.

Udio has taken the best of these approaches and made it available to a wider audience, letting you create music not just from ready-made templates but with the ability to customize and change the result in real time.

The idea is to decompose musical data into sequences of discrete tokens. Think of them as the words or notes that make up musical speech. The neural network learns to predict the next “token” by analyzing the previous ones. Given a text description as well (for example, “create a rock track” or “compose a classical melody”), Udio generates full-fledged musical compositions.

First, the system generates the basis of the track: musical phrases and a rhythmic pattern. Then Udio expands the melody and adds variations in instruments and dynamics so that the composition becomes more complete and rich.
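
To make the “predict the next token” idea concrete, here is a deliberately tiny sketch. It is not Udio’s model (which the article describes as a multi-level Transformer working on audio tokens); it is a toy bigram counter over note names, and the names `train_bigram` and `generate` are invented for this illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens follow it in the training sequence."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, length):
    """Extend a sequence greedily: always pick the most frequent follower."""
    out = [start]
    while len(out) < length:
        options = follows.get(out[-1])
        if not options:  # nothing ever followed this token in training
            break
        out.append(options.most_common(1)[0][0])
    return out

# Toy "training data": a melody written as a sequence of note tokens.
melody = ["C", "E", "G", "E", "C", "E", "G", "E"]
model = train_bigram(melody)
print(generate(model, "C", 6))  # ['C', 'E', 'G', 'E', 'G', 'E']
```

A real system replaces the counting table with a neural network and the note names with learned audio tokens, but the loop is the same in spirit: look at what came before, predict what comes next, append it, repeat.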

It is worth noting that music generation is a linear process that unfolds in real time. This makes the task of generating music especially difficult: you need to not only “draw” individual sounds, but also correctly fit them into a sequence, where each moment affects the next. To cope with this task, Udio uses a multi-level Transformer architecture that helps analyze and predict time dependencies, creating a composition that is logical, coherent and pleasant to the ear.

Udio can also extend tracks and create remixes, so you can not only create a melody from scratch but also vary it by adding new elements or changing existing ones.

How to get started with Udio?

To start working with Udio, you need to register. Users from different countries, including Russia, can create an account via Google, Apple or email. Let’s go to Udio.

To register, click the “Start a Free Trial” button at the top of the screen, review the free plan, and click “Sign Up.” You can also register through platforms like X (formerly Twitter) and Discord.

After registering, choose a nickname and go to the main interface. It is intuitive, with one caveat: it is entirely in English. We will go through it in more detail below.

Important: Immediately after registration you will be given 108 credits. This is Udio’s internal currency, which is used to pay for creating tracks or for additional features. Your balance is shown above the “Updates” item.

One generation costs 2 credits, so the free 108 should be enough for dozens of attempts. In the udio-130 model mode, you can create a composition up to 2 minutes long for 4 credits. This mode is best used if you have a clear idea of the result and an active subscription with access to the “Inpaint” function, which lets you edit individual parts of the track. Without it, individual fragments cannot be changed.
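
The credit arithmetic above is easy to check. The sketch below uses only the numbers quoted in this article (108 starting credits, 2 credits per standard generation, 4 per udio-130 generation); the constant and function names are made up for the example.

```python
FREE_CREDITS = 108  # credits granted at registration, as quoted above
COST_SHORT = 2      # one standard generation
COST_LONG = 4       # one generation in the udio-130 mode

def generations_available(credits, cost_per_generation):
    """How many whole generations a credit balance can pay for."""
    return credits // cost_per_generation

print(generations_available(FREE_CREDITS, COST_SHORT))  # 54
print(generations_available(FREE_CREDITS, COST_LONG))   # 27
```

So “dozens of attempts” means 54 standard generations, or 27 if every generation uses the longer mode.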

Let’s now take a look at Udio’s interface:

Left sidebar:

Home — return to the main page

Create — create a new track (current page)

Library - access to your library of created tracks

Following - shows who you are following (other users or sources)

Updates — platform updates and news

Central panel:

The top field “Describe Your Song” is a text field where you describe the track you want to create. Here you can specify the theme or mood of the future composition

Music genre selection buttons (tags) - just below the text field there is a row of buttons with the names of music genres

The “Reset” button restores all genre selections to their defaults

The “Upload Audio” button allows you to upload your own audio file for further editing or use in a composition

The “Write Your Lyrics” block - here you can choose what to do with the song lyrics:

 • Auto-Generate — automatically generate lyrics for a song

 • Custom - create or upload your own lyrics manually

 • Instrumental - if you don’t need lyrics, you can choose an instrumental track

The Create button is a large pink button that starts the process of creating a track based on all the entered data and settings. 

Advanced Controls - a button that, when pressed, reveals additional parameters for more detailed track customization

Sidebar on the right:

This displays the tracks you’ve already created or added to your library.

The “Liked” button filters the tracks by those you liked

The “Published” button shows published tracks

The “Date Created” button allows you to sort tracks by creation date

The “Newest First” button sorts the tracks from newest to oldest

Go to Advanced Controls, where a new menu opens:

Clip Start — determines from what point in the track the generation will begin, and is available for 32-second segments. For example, to start the composition, select 0%

Clip Timing - controls the duration of the audio. This feature is only available for the udio-130 model

Lyrics Timing — sets the moment at which the lyrics in the track begin

Style Reduction — option to exclude unwanted genres and styles. Available only by subscription

Prompt Strength - controls how closely the AI will follow your prompt

Lyrics Strength - sets how closely the AI follows the lyrics, which is especially useful if you have your own lyrics

Clarity — is responsible for the purity of the sound of instruments. A higher value gives a clearer sound, but too high a level can lead to an artificial sound

Seed - the number that initializes generation. Changing it gives you different variations of the same prompt. The default value is -1, but you can set another value

Generation Quality — quality of generation. As the level increases, the track creation time will increase
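
To illustrate what a seed does in general, here is a small sketch using Python’s standard `random` module. It assumes that -1 means “pick a fresh random seed each time”, which is a common convention in generative tools; the article does not say how Udio treats the value internally, and `make_rng` and `fake_variation` are hypothetical names that merely stand in for a generator.

```python
import random

def make_rng(seed=-1):
    """Return a random generator; -1 is treated here as 'use a fresh seed'."""
    if seed == -1:
        return random.Random()   # seeded from system entropy, differs per call
    return random.Random(seed)   # fixed seed gives a repeatable sequence

def fake_variation(seed=-1, n=4):
    """Stand-in for a generation step: n pseudo-random values in 0..127."""
    rng = make_rng(seed)
    return [rng.randint(0, 127) for _ in range(n)]

# A fixed seed reproduces the same "variation" every time.
print(fake_variation(seed=42) == fake_variation(seed=42))  # True
```

With the default of -1, repeated calls will usually give different results, which matches the behavior described later in this article: the same prompt produces a different track each time.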

Getting Started with Udio

In the left sidebar, go to the “Create” section. Let’s look at the “Auto-Generate” mode, in which the AI writes the lyrics itself, without your involvement. Start with the top field: here you enter a text description of what you want to hear, that is, a prompt.

For example, let’s enter “a hip-hop beat inspired by the streets of Tokyo” and move on to the next step: setting the genre for our future track.

The second step is to add tags, which are located under the text entry field. These tags indicate the mood and musical genres of the future track. Among them are rock, hip hop, classical, pop, pop rock, folk and others. Just click on a tag you like, and it will be added to your prompt automatically. If you do not see the tag you want, you can type it yourself, and the system will take it into account when generating.

After adding tags, click on “Create” and get a unique track. 

Udio gives you more choice by generating 2 variants for each request. You can choose the best of the suggested tracks or try to generate new ones if none of them are suitable.

If you’re not sure where to start, use the random prompt feature. Click the dice icon in the upper right corner of the prompt input field, and the system will suggest a sample prompt along with tags.

In short, the main tool for creating a track is a combination of text and tags.

For example, in the prompt “a hip-hop beat inspired by the streets of Tokyo, hip hop, indie rock, punk” the text part is “a hip-hop beat inspired by the streets of Tokyo” and the tags are “hip hop”, “indie rock” and “punk”. The tags help the neural network understand what mood you want to set. The parts can be separated by commas or periods. I chose commas.
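
The “text plus tags” structure can be sketched as a pair of small helpers. This is only an illustration of the comma-separated format described above, not an Udio API; `build_prompt` and `split_prompt` are hypothetical names.

```python
def build_prompt(description, tags, sep=", "):
    """Join a free-text description and a list of tags into one prompt string."""
    parts = [description.strip()] + [t.strip() for t in tags if t.strip()]
    return sep.join(parts)

def split_prompt(prompt, sep=","):
    """Split a prompt back into (description, tags).

    Assumes the description itself contains no separator character.
    """
    description, *tags = [p.strip() for p in prompt.split(sep)]
    return description, tags

prompt = build_prompt(
    "a hip-hop beat inspired by the streets of Tokyo",
    ["hip hop", "punk"],
)
print(prompt)
# a hip-hop beat inspired by the streets of Tokyo, hip hop, punk
print(split_prompt(prompt))
# ('a hip-hop beat inspired by the streets of Tokyo', ['hip hop', 'punk'])
```

Keeping the description and the tags separate in your own notes makes it easier to change one without retyping the other between attempts.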

What’s especially interesting is that Udio lets you create tracks inspired by the style of specific artists. For example, you can change the prompt to “a smooth jazz track, relaxing, evening vibes, in the style of Miles Davis”. But it’s important to understand that Udio doesn’t recreate the voice of a specific artist; the service interprets their style through similar musical elements and tags.

Fun fact: Udio simplifies the process of creating music queries with autocomplete and suggested tags. Autocomplete helps you formulate ideas faster, while tags allow you to enrich your query with additional details. 

Another fact: during the generation process, the neural network can change your request if it determines that a small adjustment of the tags will lead to a better result.

Here are some interesting prompts with a nice result:

“a jazz song about a conversation between the moon and the stars”

“a smooth jazz track inspired by the sound of raindrops on an old piano”

“an upbeat jazz tune about a cat wandering through a smoky jazz club”

“a mellow jazz song that captures the feeling of reading a mystery novel by candlelight”

“a lively jazz piece inspired by the fluttering wings of butterflies in a secret garden”

Important: Click on the song title to open its card. Here you will see the lyrics, tags and the original prompt, which you can copy. In the same menu you can publish the track (Publish), share it (Share), change the title or description of the composition (Edit Song Details) or edit the composition itself (Create).

One of the useful features of Udio is the ability to create karaoke-style music videos. To do this, you need to go to the upload mode. It is located in the upper right corner of the composition card. Usually, this is a simple format with a picture and text that complements the track visually. Here’s what I got:

Each track generated by Udio is unique. Even if you enter the same query several times, the system will offer different tracks.

If you are not satisfied with the result or want to keep experimenting, the interface lets you edit prompts without starting over. And if you do want to start from scratch, you can always click the “Reset” button, and all changes will be discarded.

How to improve your queries?

With Udio, you can not only create tracks automatically but also set your own lyrics to music. To do this, switch to the “Custom” mode, where you can enter your own lyrics instead of the generated ones. Be sure to mark up the lyrics with special labels called descriptors.

Here is the list:

[Verse] – a verse, the part of the song that develops the plot. Each verse is unique in its lyrics, but similar in its melody.

[Chorus] – a repetitive part of a song with a key idea. It is most often the one that remains in the memory.

[Hook] – a hook is a short, catchy melody or phrase that grabs attention.

[Guitar Solo] – a guitar solo, an instrumental piece where the main emphasis is on the guitar playing.

[Sax Solo] – saxophone solo, an instrumental part in which the emphasis is on the saxophone.

[Drop] – an abrupt musical transition, usually used in electronic music to change the rhythm or tempo.
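
The descriptor markup can be assembled programmatically. The sketch below is not part of Udio; it simply formats lyrics the way the list above describes. `KNOWN_DESCRIPTORS` contains only the descriptors listed here, so treat it as an assumption and extend it as needed (this article also uses [Intro] later on).

```python
KNOWN_DESCRIPTORS = {"Verse", "Chorus", "Hook", "Guitar Solo", "Sax Solo", "Drop"}

def format_lyrics(sections, allowed=KNOWN_DESCRIPTORS):
    """Turn (descriptor, lines) pairs into text with [Descriptor] headers.

    Raises ValueError for a descriptor that is not in `allowed`,
    which catches typos such as "Chorous" before any credits are spent.
    """
    blocks = []
    for descriptor, lines in sections:
        if descriptor not in allowed:
            raise ValueError(f"unknown descriptor: {descriptor}")
        blocks.append("\n".join([f"[{descriptor}]", *lines]))
    return "\n\n".join(blocks)

print(format_lyrics([
    ("Verse", ["First line of the verse", "Second line of the verse"]),
    ("Chorus", ["The line everyone remembers"]),
]))
```

The output is plain text with one bracketed header per section and a blank line between sections, ready to paste into the “Custom” lyrics field.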

If you have trouble coming up with lyrics or an idea, you can always use the “Write for me” button. It will generate suitable lyrics based on the parameters you set. Here’s what it generated for me.

Complex tag combinations do not always lead to successful track generation. If several attempts do not give the desired result, try adjusting the tags or use the remix function (we will talk about it later). Do not be upset if the track does not work out the first time - the neural network is just a tool that helps in the creative process.

Another unique feature of Udio is the ability to recognize and correct stress in text. For example, if the system mispronounces a word, you can correct its spelling with a stress mark, putting “ˈ” on the desired letter, and the neural network will read it with the correct stress.

Udio supports many languages, more than 20 at the moment. Among them are Chinese, Japanese, Russian, German, French, Spanish, Polish and others. This allows you to create songs in different languages, expanding your creative horizons.

Important: in the “Custom” mode, it is better to limit the text to what fits in 32 seconds to minimize the number of edits. The 130-second mode often requires a lot of additional adjustments. It is recommended to enter one or two four-line stanzas with descriptors; the remaining verses can be created separately in subsequent generations.

If you need a fully instrumental track, select the “Instrumental” mode. When I generated 6 tracks in this mode, one of them still contained elements resembling a voice. The mode copes with its task well overall, though not without errors.

Create your own track:

With the basics covered, let’s add our own lyrics in the “Custom” mode and make our own song. I chose a well-known song by the band Zemlyane, “Grass at Home”. I want to create a song in the style of energetic rock: start with a short intro, continue with a verse, move smoothly to the chorus, add two more verses, and finish with an instrumental outro.

Let’s start by creating a prompt. Enter in the top field: “a powerful rock track in the style of heavy metal”. I chose the following tags: rock, intense, energetic, fast-paced, powerful. As a reminder, if you can’t find the tag you need, you can type it yourself and the system will take it into account.

Then we enter our chorus (four lines) and add the [Chorus] descriptor so that the system understands that this is a chorus. I did not touch the other settings. Press the “Create” button, and the system will generate 2 tracks. Choose the one you like more.

I wasn’t particularly happy with the result - at the end there were words that definitely weren’t Russian. As I said, it might not work the first time, so let’s try again! I deleted the unsuccessful tracks by clicking on the three dots next to the title and selecting “Delete” in the menu that opened. By the way, after deleting, a notification will appear for 5 seconds with the option to restore the track if it was deleted accidentally.

After 7 generations, I finally found a suitable option and renamed it in the track card (opened by clicking on its name) via the “Edit Song Details” option. Now you can safely move on to the next step.

Expanding the track:

Once you have found the 32-second track you want, let’s talk about expanding. Expanding tracks in Udio is an important step in creating a full-fledged track. It allows you to turn short musical fragments into complete pieces. Instead of settling for just short sections, you can extend good moments, turning them into the core of your song.

Switch to the “Extend” mode to add new fragments to the selected section. Just hover the mouse cursor over the generated track. A turquoise “Extend” label will appear; click it to open additional options for extending the track.

In the “Extend” section, one of the useful functions is “Crop and Extend”. It lets you select a specific fragment of the track for further development. Click the slider to activate it. The function is very useful if you like one part of the melody more than the others: you can make it the central element of the track and build your composition around it. For me, it’s the chorus.

After selecting the desired fragment using “Crop and Extend”, we proceed to the next step: specifying the direction in which the track is extended.

You can choose where to add the new fragment by clicking one of the options in the Extension Placement section:

Add Intro (Before) – adds an intro at the beginning of the track

Add Section (Before/After) – adds a new section (verse, chorus, etc.) before or after an existing segment

Add Outro (After) – adds an ending at the end of the track

Udio has the ability to insert new sections both before and after the original fragment. You can continue this process, expanding the track up to 10 sections. There is also an option to add an intro or outro to create a full musical composition.
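
One way to picture this process is as a list that grows at either end. The sketch below is only a mental model, not Udio’s implementation: it assumes the limit of 10 applies to the total number of sections, and `extend` is a hypothetical helper. The example grows a single chorus into an intro, three verses, a chorus and an outro.

```python
MAX_SECTIONS = 10  # assumed cap on total sections, based on the limit mentioned above

def extend(track, section, placement):
    """Return a new track with `section` added before or after the existing clip."""
    if len(track) >= MAX_SECTIONS:
        raise ValueError("section limit reached")
    if placement == "before":
        return [section] + track
    if placement == "after":
        return track + [section]
    raise ValueError(f"unknown placement: {placement}")

track = ["Chorus"]  # the 32-second fragment everything is built around
for section, placement in [
    ("Verse", "before"),
    ("Intro", "before"),
    ("Verse", "after"),
    ("Verse", "after"),
    ("Outro", "after"),
]:
    track = extend(track, section, placement)

print(track)  # ['Intro', 'Verse', 'Chorus', 'Verse', 'Verse', 'Outro']
```

Planning the order of extensions like this before generating anything helps avoid spending credits on sections that end up in the wrong place.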

I choose the section around which the rest of the composition will develop (the chorus) and add a verse using Add Section (Before). I insert 6 lines of the verse and add the [Verse] descriptor so that the system understands that this is a verse and not a chorus.

We continue working with the generated track. I decided to add an intro right away. To do this, I repeat the same steps on the new track: I hover over the title, go to “Extend” and activate the “Crop and Extend” option. I select the desired part of the track and click Add Intro (Before). I leave the prompt unchanged, since it suits me. For the intro, I insert a couple of lines and the [Intro] descriptor, then click “Extend” again. The system generates two versions of the track, and I choose the one I like best. It worked on the second try.

So, we have an intro, a verse and a chorus. Next, we add the second verse after the chorus. Working with the third track, we go to “Extend” again. The prompt and settings remain unchanged; we activate the “Crop and Extend” option and click Add Section (After).

Add the text of the second verse with the descriptor [Verse] so that the system understands that this is a verse, and click “Extend” to complete the generation. Select a track that sounds appropriate.

I won’t add another chorus; I’d rather show more variety. Instead, I’ll add another verse and finish with an outro. The principle is the same: go to “Extend”, select the desired section and click Add Section (After). Add the [Verse] descriptor to the text, since this is a verse.

Click “Extend”, generate 2 tracks. As usual, choose the best one and move on to the next step.

The song’s running time was 2:09, close to the 2:10 limit, so I trimmed the end and added an instrumental outro. To do this, I went to the “Extend” mode on the last generated track, trimmed the ending by a couple of seconds by activating “Crop and Extend”, selected Add Outro (After) and chose the “Instrumental” mode. That’s it!
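
The time budget here is simple arithmetic, sketched below with the two values from this walkthrough (a 2:09 track and a 2:10 limit). The helper names are invented for the example.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def seconds_left(current, limit="2:10"):
    """How many seconds remain before a track reaches the length limit."""
    return to_seconds(limit) - to_seconds(current)

print(seconds_left("2:09"))  # 1
```

With only one second to spare, there is no room for an outro, which is why the ending has to be trimmed first.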

I click “Extend” and again select the track I like. I selected it on the first try. 

Track skeleton:

[Intro]

Earth in the porthole,

The earth is visible through the porthole…

How a son grieves for his mother,

We are sad about the Earth - it is alone.

[Verse]

And the stars nevertheless,

And the stars nevertheless

A little closer, but still just as cold.

And as in the hours of an eclipse,

And as in the hours of the eclipse

We wait for the light and see earthly dreams.

[Chorus]

And we dream not of the roar of the cosmodrome,

Not this icy blue,

And we dream of grass, grass near the house,

Green, green grass.

[Verse]

And we fly in orbits, along unbeaten paths

The space is stitched with meteorites

Risk and courage are justified, space music

It comes into our business conversation.

[Verse]

In some kind of matte haze, the Earth is in the porthole

Evening and early dawn

And the son is sad about his mother, and the son is sad about his mother

The mother awaits her son, and the Earth awaits her sons.

[Outro]

Instrumental

The song is completely ready! To download it, click on the three dots and press “Download”. There are four formats available for export, but only MP3 and Video can be downloaded for free. The WAV and Stems formats require a subscription.

Let’s see what happened!

Fun fact: singing style, voice and genre can be changed. If you need a female voice, just specify it in the prompt. If you need the next verse in a different genre, specify it as well. The system will adjust.

To return to the main menu, click on the triangle next to the “Extend” button and select Create.

Let’s remix the track:

One of the interesting features of Udio is the ability to create remixes. A remix is a modified version of an existing track, with more noticeable or, conversely, subtle differences. To create a remix, first select the desired track. Currently, remixes are only available for 32-second fragments, so I chose the very first piece of the song, the chorus, which is exactly 32 seconds long. I click on the three dots, hover over the Create item and select Remix.

This takes us into remix mode. The interface is similar to the “Extend” mode, but without the “Crop and Extend” option. Instead, there is a Variance slider, which controls how strongly the track is remixed. In the far-left position the original track is almost unchanged; as the slider moves to the right, the changes become more noticeable and individual elements of the track begin to transform.

Low values affect only small nuances, leaving the main structure of the track almost untouched; the timbres of the instruments may change slightly. The further you drag the slider, the more drastic the changes, and at the far-right position the result will hardly resemble the original track.

Move the slider and click the yellow “Remix” button. I chose the value 0.75.

The system generates 2 tracks, from which you can choose the one you like more. Once the process is complete, the tracks are marked with a yellow icon, which makes them easier to find among the others. (Tracks created with the “Extend” function get a turquoise icon.) Let’s see what happened!

What do we have as a result? A cool function, after which the chorus began to sound much more lively.

What about copyright and terms of use?

Udio’s terms of use detail the rights and responsibilities of users when creating and using generated content. Below are the main points:

Copyright: The user retains full rights to the created content. The track is completely yours, with one important caveat: Udio requires you to credit the service as the source, especially for commercial use. Keep this in mind when posting on platforms such as YouTube.

Commercial Use: Everything you create can be used in commercial projects, including monetized videos, advertising, cinema and even TV, as well as anything that earns income through streaming or monetization.

Protected Materials: The user is responsible for ensuring that the music does not contain elements protected by copyright. If you use such materials, you must obtain appropriate permission to use them.

Editing: Music created with Udio can be modified, remixed, or used in other projects, but be sure to credit Udio as the source.

For more information, be sure to read the full text on the page: Udio Terms of Service.

What’s the subscription? Is it worth taking?

1. Free

Price: $0/month.

Key Features:

Daily Credit Limit: Free plan users get 10 credits per day to create music. An additional 100 credits per month are available.

Song Generation: You can create up to 4 songs at a time, allowing you to try out different sounds and arrangements in one go.

Full Track Limit: You can generate up to 3 full tracks per day (2 minutes 10 seconds each), which is enough for basic use or experimenting with music.

Who is it for: The tariff plan is intended for users who are just starting to work with musical neural networks or use the service in rare cases.

Note: When using the free plan, you must indicate in the credits that the melody was created using Udio. If the video is published on YouTube, be sure to note that the music was generated by a neural network.

2. Standard

Price: $10/month.

Key Features:

Monthly Credit Limit: Users of this plan get 1200 credits per month, with no daily limit. 

Song Generation: Up to 6 songs can be created at a time. Suitable for those who work with more complex music projects or create a lot of content.

Editing and customization: The plan allows editing of created tracks. Users can generate songs from audio clips, download musical elements (e.g. melodies, drums, vocal tracks), upload their own album covers. 

Who is it for: The plan is designed for people who regularly create music, such as aspiring musicians, composers or bloggers who need to constantly work with audio content.

3. Pro

Price: $30/month.

Key Features:

Monthly Credit Limit: The plan provides users with 4800 credits per month, with no daily limit. This is the largest package, designed for intensive use.

Song generation: Up to 8 songs can be generated simultaneously, increasing the speed of working on large projects.

All features of previous plans: In addition to all the features of the Free and Standard plans (track editing, generating songs from audio files, etc.), Pro users get additional exclusive features.

Early Access: Pro customers get early access to new platform features and updates.

Who is it for: Ideal for professional musicians, producers, content makers who create music on a regular basis or work on large-scale music projects.

Student discount

The plan purchase page also mentions that students can get a discount on all plans by clicking the link “Are you a student? Check out our discounted student pricing.”

In short:

All plans are based on a subscription model, which means monthly or annual renewal depending on the user’s choice.

The differences between the plans mainly concern the number of credits, the ability to generate tracks simultaneously, and access to editing and customization features.
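
To compare the plans on price alone, the figures quoted above can be turned into a cost per generation. The sketch below assumes 2 credits per generation (as stated earlier in this article), a 30-day month, and that a Free user spends all 10 daily credits every day; these assumptions are for illustration only.

```python
CREDITS_PER_GENERATION = 2  # cost of one standard generation, as quoted earlier

plans = {
    "Free": {"price": 0, "credits": 10 * 30 + 100},  # daily credits over 30 days + monthly extra
    "Standard": {"price": 10, "credits": 1200},
    "Pro": {"price": 30, "credits": 4800},
}

def summarize(plan):
    """Return (generations per month, dollars per generation) for a plan."""
    generations = plan["credits"] // CREDITS_PER_GENERATION
    cost = plan["price"] / generations
    return generations, round(cost, 4)

for name, plan in plans.items():
    generations, cost = summarize(plan)
    print(f"{name}: {generations} generations/month, ${cost:.4f} per generation")
```

Under these assumptions the output is 200 generations for Free, 600 for Standard at about $0.0167 each, and 2400 for Pro at $0.0125 each, so Pro is cheaper per generation only if you actually use the volume.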

To sum it up:

Udio is a good platform for creating and editing music with AI. It is worth admitting, though, that Russian-language songs can be difficult: because of problems with accent, stress and rhythm, the lyrics and descriptors need careful work to get a quality result. Despite these nuances, Udio makes it easy to experiment with different genres, voices and styles, which makes it a good tool for both professionals and beginners.

Thank you for your attention! It would be interesting to hear about your experience working with neural networks for creating music. Perhaps you already have a favorite track generation service? Share your impressions!

Meta* and its products (Facebook, Instagram) are banned in the Russian Federation