Picture by CorruptedMe @ DA
UTAU Tips
Warning - Bold words are emphasized or keywords. Written with much sarcasm so if you cannot stand it, please don't read.
CURRENT - This blog was written before the summer of 2014. It's now after that, so I've made some updates; any updated info I want you to notice has //CURRENT in front of it.
My preferences in UTAU - In a nutshell-
1.) I'm into VCVs, I can do CVs too, but prefer VCVs. Hime no VCVs~~//shot
2.) I love guys who are baritone and guys who have utaus that are baritone~
3.) I automatically put a velocity of 120 or 130 to avoid spacing sounds. //CURRENTLY: I do not use velocity on normal utaus anymore, only really spacey ones.
4.) I mainly use Fresamp Ver11. Sometimes I use TIPS and Resampler.
5.) What I do may or may not be what others do so don't expect my methods to be like theirs.
-Flag Tip-
Most of the time, if you're here in my tutorial/tips section, you're either curious about what I have to say or looking for some magical trick that will make your utau sound wonderful //shot. No worries x,D that's nothing to be ashamed of thinking.
Basically, you have to understand some of the flags, but when I explain my flag, I'll put it in layman's terms.
Also keep in mind there are different resamplers, and they all have their own flags, so flags may vary from resampler to resampler. //CURRENT: Remember that the flag does not make an utau great; it's the recording and the effort itself!
I'll only cover the flags I use and what I do //which is not everything x,D
My Flag : Y0H0B0BRE0F0P99C99E0M0L0w10
- Reason behind my flag -
I prefer my utau to be as clear as possible, only because my microphone/headset isn't the best x,D
It's a Logitech though, so it's decent. //CURRENT: Now it's a Blue Yeti, but I still use a flag.
Y0 --> strong
H0 --> clear
B0 --> No breathy voice
BRE0 --> Same as B0
F0 or F1 --> Some songs prefer F1 and vice versa
P99 --> This is the 'volume' of it. P99 is the strongest/loudest your utau gets, and P0 can make certain notes very quiet.
C99 --> I use this to avoid a scratchy voice.
E0 --> Leaves my voice in the treble (sounds like a vocal, or somewhat like a high pass in Audacity).
M0 --> Don't remember.
L0 --> No attack on the voice. It can change the pitch of the voice, so I recommend only L10 or under.
w10 --> Depending on what you use, this can refer to utaugrowl (does what it sounds like: growls) or produce a more robotic sound.
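If it helps to see how a flag string like mine breaks apart, here is a small Python sketch. It only assumes that every flag is some letters followed by a number, which is true for my flag but may not hold for every resampler. The names split_flags and FLAG_PATTERN are made up for this example and are not part of UTAU or any resampler.

```python
import re

# Assumption: each flag is one or more letters followed by an integer value.
# Real resamplers may read the string differently (this is only an illustration).
FLAG_PATTERN = re.compile(r"([A-Za-z]+)(-?\d+)")


def split_flags(flag_string):
    """Split a flag string into (name, value) pairs, keeping their order."""
    return [(name, int(value)) for name, value in FLAG_PATTERN.findall(flag_string)]


my_flag = "Y0H0B0BRE0F0P99C99E0M0L0w10"
for name, value in split_flags(my_flag):
    # Prints one flag per line, for example "BRE = 0" or "P = 99".
    print(f"{name:>3} = {value}")
```

This only splits the text. It has no idea which resampler understands which flag, so you still have to check what each one does in the resampler you actually use.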
OTO
Okay, if you're here, you're probably like... "What could you possibly say about otoing?? There are already tutorials and explanations on it, so why bother putting this here?"
Personally, in the three years I've been with utau, I never really explored otoing further than what I needed to know until last year.
Basically, the best way to oto is uniformly, for both VCVs and CVs.
The reason is that you want to use crossfade to your advantage.
Of course a VCV's uniform is much more obvious, a.k.a. vowel consonant vowel: just move the blue, shaded area forward or back to fit your utau's samples and you're good. The only problem is... there's so many to oto with VCV. It's like the regular sounds (about 350 sounds) x 7, so it can take a while to oto.
For CV, I'd recommend something like a 50, 20 uniform. It lets crossfade sound better, especially for other languages, and when you can, try using a VCV uniform for end sounds like 'a n' or 'a i' if you're into trying CV VC. There are actual definitions of what the green line and red line are for, and there can be special cases depending on the vocal. The majority of the time, it's best to have the green line before the red line. Don't let the distance between them become more than 50.
You can certainly do it non-uniformly, and I bet people will definitely argue with me on this issue, but this is what I find easier, both in sound and in applying an utau to a ust.
It's more of a 'lazy' way of doing it, because when you right click on the notes to clear their preutterance, they all have uniform data or uniform crossfade data, so you don't have to go through the entire ust and edit the preutterance of each sound like 'n' or 'o' (the vowels).
To get used to VCVs, I say it's all practice and repetition. I use Audacity for most of my utaus, but you can also use OREMO, which is easier and more convenient for recording and for exporting to wav files.
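For anyone who likes checking things with a script, here is a rough Python sketch of the CV guideline above. It makes a few assumptions that are mine and not stated in the post: that '50, 20' means preutterance 50 and overlap 20, that the red line is the preutterance and the green line is the overlap, and that each oto.ini line looks like file.wav=alias,offset,consonant,cutoff,preutterance,overlap with all five numbers filled in. The sample lines and the names parse_oto_line and check_entry are made up for illustration.

```python
from typing import NamedTuple


class OtoEntry(NamedTuple):
    wav: str
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float  # assumed to be the red line
    overlap: float       # assumed to be the green line


def parse_oto_line(line):
    """Parse 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap'."""
    wav, _, rest = line.strip().partition("=")
    alias, *numbers = rest.split(",")
    # Assumption: all five numeric fields are present (no blank values).
    offset, consonant, cutoff, preutterance, overlap = (float(n) for n in numbers)
    return OtoEntry(wav, alias, offset, consonant, cutoff, preutterance, overlap)


def check_entry(entry, max_gap=50.0):
    """Return a list of problems according to the guideline in this section."""
    problems = []
    if entry.overlap > entry.preutterance:
        problems.append("green line (overlap) is after the red line (preutterance)")
    if abs(entry.preutterance - entry.overlap) > max_gap:
        problems.append(f"lines are more than {max_gap:g} apart")
    return problems


# Hypothetical sample lines, not taken from a real voicebank.
for line in ["ka.wav=ka,100,120,-300,50,20", "sa.wav=sa,100,120,-300,40,110"]:
    entry = parse_oto_line(line)
    print(entry.alias, check_entry(entry) or "looks uniform-friendly")
```

It's only a sanity check for the two rules of thumb. Special cases depending on the vocal still need your ears.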
Tips for recording
Of course a good microphone would definitely help with vocal samples and the output in utau, but that doesn't mean the utau is great.
Yeah, you're going to argue a lot here, but I believe the approach to recording is what sets an utau's value (to me), if you consider 'value' to be likability or rather use.
I've heard utaus with great mics and clear voices, but some of the voices do not sound appealing. Some of them sound too nasal, like me //shot, because they were recorded with a certain pitch and 'breathiness', and some sound out of their range and scratchy.
For recording, I recommend a pitch of about E4 for a single-pitch bank, using one's 'chest' voice. 'Head' voices will usually sound nasal. If possible, a di-pitch bank would be best, so that one pitch is the chest voice and one is the head voice and each matches better in its octave.
Now for the actual recording... I recommend not trying to mimic a monotone type of voice or the vocaloids. You could try that for the pronunciation, but personally I recommend not using them for pitch (specifically Rin and Len). Sometimes it sounds good, but most of the times I tried it, it made me sound dead ~
Sounds crazy, but you're only trying to mimic 'one' of their pitches. Unless you have a multipitch, your 'monotone recorded samples' are going to sing high and sound strange. Lower pitches tend to be more monotone. You may have seen tutorials saying to use your 'normal' voice, but be careful with that. It's my opinion, but if you speak English, you'll notice your voice is deeper compared to Japanese voices (the majority of the time). Imagine an American trying to use their normal tone and accent when speaking Japanese... Yeah, not a pretty sight. Low voices sound stressed in the singing octave, like C4 to C5, because a normal speaking tone is around A3 or lower. Of course this is my opinion, but this section is basically how I do utau and what I think.
My best recommendation, if you're going for a higher octave voice, is to try and pretend you're speaking Japanese (or you ARE speaking Japanese if you're studying the language) with a higher voice. Trying to mimic the Japanese accent/pronunciation and tone will make you sound so much better. Record as if you're singing (or talking, or however you see it) in Japanese, but nothing crazy like changing from one pitch to another. Do it in a uniform pitch as well, as if you're singing your favorite Japanese song in your best voice (or desired voice). It's recommended to record non-vibrato samples to avoid weird sounds from the frequency adjusting, but you are free to try vibratos.
//CURRENT: Many great UTAUs are not recorded to sound like their voicer, but rather they ACT as their character.
Keep in mind, you should pick a quiet area or a soundproof area. Three reasons why: one, your voice can echo, and echo effects should be added to the wav exported by UTAU, not the actual sample; two, you might pick up background noises and ruin the sample (and have to rerecord it); and three, someone might come in and yell at you because you're too loud, which interrupts the recording session and can really ruin your mood to record.
Now, here's another thing about recording: you must have endurance and patience.
A VCV recording session takes me twenty minutes of straight-through recording (because I've done them so much), but for an average person who doesn't do VCVs as frequently, about half an hour //CURRENT: or more, excluding corrections and re-recordings.
CV recording sessions can take up to five minutes.
I recommend recording all the samples first, then exporting (which will take the most time).
I would recommend adding a bit of space before and after the sample, because then you'll be able to use the automatic breaths built into your VCVs instead of having to change a rest to a breath every time you want a breath in UTAU.
ENDURANCE - It'd be best to have water near you to drink, because your voice will get dry if you go for multiple pitches, especially in VCVs. Most likely, if you take a break, you won't want to finish your bank, so I highly suggest you do it in one sitting unless it's multipitch (then do one pitch bank per session).
PATIENCE - Exporting can take forever. FRQ is rather quick if you're using the resampler wrapper (RW). The normal time to FRQ a VCV without RW would be an hour or two+ for me; with RW, it's 10 minutes or less.
More 'human' tone
If you are looking for a more natural, human tone, here are a few tips.
Keep in mind which one you want: more human doesn't always mean best singer, but a good singer can sound human.
First, it's always good to use vibrato, big or small: big ones can mimic vocaloids (if you like vocaloid vibratos), and small ones sound like a natural, non-singer voice.
Why? Normally utau normalizes your tone to a straight line using the FRQ files. Normal notes without vibrato are flat, and thus more robotic.
Second, have your utau sing in the octave they were recorded in. If you try getting something like an A3 utau to sing C5, they will sound strange and definitely not as human as they possibly can. If you want your utau to sing high, like in the C5 area, record a C5 pitch bank and you'll be better off.
Pick songs that better suit your character's octave, just like how you can sing certain songs better than others.
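To put a number on how far a note is from the pitch a bank was recorded at, here is a tiny Python sketch that counts the semitones between two note names. It assumes the usual convention where C4 is MIDI note 60; only the difference matters here, so the exact convention isn't important. The function names are made up for this example.

```python
# Semitone offset of each natural note inside one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_number(name):
    """Convert a note name like 'A3' or 'C#5' to a MIDI-style number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    sharp = rest.startswith("#")
    octave = int(rest[1:] if sharp else rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + (1 if sharp else 0)


def semitones_from_recording(recorded, sung):
    """How many semitones the sung note sits above the recorded pitch."""
    return note_number(sung) - note_number(recorded)


print(semitones_from_recording("A3", "C5"))  # 15
print(semitones_from_recording("E4", "C5"))  # 8
```

So an A3 bank singing C5 is being stretched 15 semitones upward, more than a full octave, while an E4 bank only has to go 8. How much stretch still sounds okay depends on the voice and the resampler, so treat the number as a rough guide only.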
Naturally, you can tell by the voicer's sample if they have a more natural voice or acted voice. That will also affect how human they will sound.
If you're going for a more human tone in your samples (less focus on the actual product in utau), again, record naturally; don't let your voice heighten and become extra nasal.
Along the lines of using UTAU to make vocals more human-like, I suggest that when making pitch bends, you avoid changing the entire note's pitching; just edit the beginning and end of the note/sound.
Also, human vocals, when looked at with a frequency reader, will actually be a bit all over the chart, because human vocals cannot produce an absolutely perfect note (or else that's just robotic...). That's why emphasizing pitch bend edits is important in UTAU as well (considering it mimics human vocal frequencies).
Also, if you're going for more human-like samples, besides the way you record, pick a condenser microphone that won't alter your voice, unless that is what you want.
If you have extra money to spend on a more expensive USB microphone, pick one with an omnidirectional pattern (I call it the 'around sound') or, highly recommended, a cardioid pattern (which records from the front of the microphone).
Currently, I use a Logitech ClearChat (~$30).
Considering what I use it for (Skype, UTAU, a bit of gaming), I'm satisfied with it, and it's rather convenient since it's portable and needs no setup, though it's far from the best USB microphone out there quality-wise.
The traits below are both positive and negative with regard to UTAU, specifically for home usage instead of professional usage.
If I had to recommend a cheap microphone to serve the purpose of having mediocre quality and for other things like gaming, it'd be the Logitech ClearChat.
-Cheaper
-Headphones
-Easy to Use
-Not great quality, decent
-USB port fragile
-High Pass recordings (CAN BE CONSIDERED A PRO IF ONE PREFERS IT, BUT IT CAN BE REPLICATED WITH EFFECTS SO IT'S NOT RECOMMENDED)
If I had to recommend a condenser microphone, much more expensive but better in general, it'd be a Blue Yeti (~$150), considering that you're going for home recording and you're not serious about a dedicated studio with $1000+ gear hanging around.
-Great Quality
-Omnidirectional (As well as Bidirectional and cardioid)
-Made of metal
-More expensive
-Somewhat big/bulky but is light
-USB port fragile
-A bit muffled, but can be corrected with equalization and effects
If I had to recommend a good condenser microphone of about similar quality to the one above but a bit cheaper, it'd be the SnowBall Omni (~$100).
-Great Quality
-Omnidirectional (also cardioid)
-Cheaper than Blue Yeti
-A little older model microphone (release date)
-USB port fragile
-Not as flexible as the Blue Yeti
Among the UTAU community, Blue Yeti, ClearChat, Shure SM58+X2u, etc. are good microphones, so whichever one you decide on will be acceptable in quality.
If quality doesn't bother you much and you have very powerful vocals, I suggest trying dynamic microphones. Shure, for example, makes a good cardioid dynamic microphone that also has excellent quality. Not all good microphones fit all voices; they have their own Hz ranges, so look into that.
Velocity
Velocity should be rather simple:
Slower tempo songs --> slow velocity (leaving the velocity box empty), or just a straightforward velocity like 120.
Faster tempo songs --> fast velocity like 140.
Warning: Adding a velocity like 180 will give an utau with average velocity a 'twang' sound, in my opinion.
Higher velocity in general is best used for utaus with large spacing between sounds or long consonants; otherwise it is not recommended for common usage.
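To get a feel for the numbers above, here is a small Python sketch based on the commonly cited description of velocity in UTAU: the consonant part of a sample is scaled by 2 ** ((100 - velocity) / 100), with an empty box counting as 100. That formula is an assumption on my part rather than something tested for this post, and the function name consonant_scale is made up.

```python
def consonant_scale(velocity=100):
    """Assumed factor applied to consonant length for a given velocity.

    100 (or an empty box) leaves the consonant unchanged, higher values
    shorten it, and lower values stretch it.
    """
    return 2 ** ((100 - velocity) / 100)


for velocity in (100, 120, 140, 180):
    # Rounded for readability: 1.0, 0.87, 0.76, 0.57
    print(velocity, round(consonant_scale(velocity), 2))
```

If the formula holds, 180 squeezes consonants to a bit over half their recorded length, which would fit with the 'twang' I hear on utaus that don't have long consonants to begin with.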