synchronised speech?
Anyone remember Mega Apocalypse for the C64? Or ACE by Cascade? The authors claimed to have a system that enabled digitised speech without slowing the game action.
Does anyone know how this might be implemented?
The only way I can think of doing this would be to trigger interrupts from the timers of either VIA chip, but even so, wouldn't that result in a very low sample rate, rendering it extremely low quality? I recall the speech in Mega Apocalypse was quite impressive.
(I'm able to play back samples, just not 'synchronised'.)
I am about 80% through completing a new game, and thinking of something novel to do with sound effects.
3^4 is 81.0000001
- Mike
- Herr VC
- Posts: 4840
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Re: synchronised speech?
The key word is digitized speech.
With formant synthesis, only a rather low bandwidth is needed. You just have to change the amplitude and frequency of (at least) three sinusoidal tones once every 1/200 .. 1/50 second to produce discernible vowels and diphthongs; for consonants, a filtered noise generator is used instead.
The VIC however, only provides square waves 'out of the box', so there ...
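A rough sketch of the three-tone idea described above, in Python rather than on the VIC itself. The formant frequencies and amplitudes are illustrative approximations chosen for this example, not values from the post; the 8 kHz output rate and the 1/50 second update interval are likewise assumptions.

```python
import math

SAMPLE_RATE = 8000             # assumed output rate in Hz
FRAME_LEN = SAMPLE_RATE // 50  # parameters change once every 1/50 second

# Each frame holds three (frequency in Hz, amplitude) pairs.
# The numbers are illustrative approximations, not measured data.
FRAMES = [
    [(730, 1.0), (1090, 0.5), (2440, 0.25)],  # roughly an "ah" vowel
    [(270, 1.0), (2290, 0.5), (3010, 0.25)],  # roughly an "ee" vowel
]

def synthesize(frames, sample_rate=SAMPLE_RATE, frame_len=FRAME_LEN):
    """Sum three sine tones, updating their parameters once per frame."""
    phases = [0.0, 0.0, 0.0]  # kept across frames so the tones stay continuous
    out = []
    for frame in frames:
        for _ in range(frame_len):
            value = 0.0
            for i, (freq, amp) in enumerate(frame):
                phases[i] += 2 * math.pi * freq / sample_rate
                value += amp * math.sin(phases[i])
            out.append(value)
    return out

samples = synthesize(FRAMES)
print(len(samples), "samples generated")
```

Only six numbers per frame drive the output, which is where the low bandwidth comes from. Consonants would need the noise source mentioned in the post and are not modelled here.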
Re: synchronised speech?
Mike wrote: With Formant synthesis, a rather low bandwidth is only needed. You just have to change the amplitude and frequency of (at least) three sinusoidal tones once every 1/200 .. 1/50 second to produce discernible vowels and diphthongs; for consonants, a filtered noise generator is used instead.
Thanks, but the speech in Mega Apocalypse is definitely sampled, not synthesised. I can even hear the English accent in the voice.
- Mayhem
- High Bidder
- Posts: 3027
- Joined: Mon May 24, 2004 7:03 am
- Website: http://www.mayhem64.co.uk
- Location: London
Re: synchronised speech?
I can't remember exactly, but the speech in Mega Apocalypse is either Simon Nicol (the programmer) or Rob Hubbard (the musician).
Lie with passion and be forever damned...
Re: synchronised speech?
I believe that if you used a sampling rate of once per scan line, you'd get roughly 1/3 the sampling rate of CD audio, which is more than sufficient. The code to play sound samples is short, and would not steal much CPU time. And with 100% machine code and sprites, making a game run in the remaining CPU time would be no trick at all.
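As a quick check of the "roughly 1/3 of CD audio" estimate, the per-scan-line rate can be worked out from the CPU clock and the cycles per line. The 65/71 cycles-per-line figures are the ones used elsewhere in this thread; the clock frequencies are the nominal NTSC and PAL VIC-20 values and are an assumption of this sketch.

```python
# Nominal VIC-20 timing; the clock values are assumed, not taken from the post.
SYSTEMS = {
    "NTSC": {"cpu_hz": 1_022_727, "cycles_per_line": 65},
    "PAL": {"cpu_hz": 1_108_405, "cycles_per_line": 71},
}
CD_RATE_HZ = 44_100

def line_rate_hz(cpu_hz, cycles_per_line):
    """Number of scan lines (and so samples) per second."""
    return cpu_hz / cycles_per_line

for name, timing in SYSTEMS.items():
    rate = line_rate_hz(**timing)
    print(f"{name}: {rate:.0f} Hz, {rate / CD_RATE_HZ:.2f} of the CD rate")
```

Both systems come out a little above 15.6 kHz, so the ratio itself holds up; whether the CPU can afford an interrupt on every line is a separate question.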
- Kweepa
- Vic 20 Scientist
- Posts: 1315
- Joined: Fri Jan 04, 2008 5:11 pm
- Location: Austin, Texas
- Occupation: Game maker
Re: synchronised speech?
You really can't have an interrupt every scanline and get any kind of performance, even for a looping 256 byte sample:
Code: Select all
7  (interrupt)
3  pha
5  inc mod+1   ; this code on the zero page
.mod
3  lda $4000   ; sample page
4  sta $900e   ; volume control + aux color
4  pla
6  rti
32 total cycles. One scan line is only 65/71 cycles so even the simplest digitized 'speech' will take up about 50% cpu.
You could probably manage 8kHz (telephone quality) though - that would 'only' consume 32/(1000000/8000) = 25% cpu.
Once you need more than 256 samples, the code bloats up considerably.
Code: Select all
7  (interrupt)
3  pha
2  clc
2  lda mod+1   ; zero page code
3  adc #1      ; step the low byte of the sample address
3  sta mod+1
3  bcc mod
0  inc mod+2   ; only on page crossings
.mod
3  lda $4000   ; sample start
4  sta $900e   ; volume control + aux color
4  pla
6  rti
42 cycles. For 8kHz, 42/(1000000/8000) = 33% cpu.
(And here you'd still need to poll mod+2 in your main loop, checking for the sample end.)
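The load figures above follow from one formula: cycles per sample times samples per second, divided by CPU cycles per second. A small Python sketch, using the post's round 1 MHz clock (an approximation, not the exact VIC-20 frequency) and the cycle totals as quoted:

```python
CPU_HZ = 1_000_000  # the post's round figure, not the exact VIC-20 clock

def cpu_load(cycles_per_sample, sample_rate_hz, cpu_hz=CPU_HZ):
    """Fraction of CPU time spent in the sample interrupt."""
    return cycles_per_sample * sample_rate_hz / cpu_hz

def per_line_load(cycles_per_sample, cycles_per_line):
    """Same fraction when one sample is played on every scan line."""
    return cycles_per_sample / cycles_per_line

print(f"32 cycles, every line (NTSC): {per_line_load(32, 65):.0%}")
print(f"32 cycles, every line (PAL):  {per_line_load(32, 71):.0%}")
print(f"32 cycles at 8 kHz:           {cpu_load(32, 8000):.1%}")
print(f"42 cycles at 8 kHz:           {cpu_load(42, 8000):.1%}")
```

Plugging in 4000 for the sample rate halves the 8 kHz figures.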
- Mike
Re: synchronised speech?
I already pointed out the infeasibility of RJBowman's approach to him in a post from 2012: (link)
Not only does the interrupt routine already take a lot of cycles for something that is executed every 65/71 cycles, the interrupt vectors are fixed in the VIC-20 and are always routed to the following routines in the KERNAL:
Code: Select all
>FFFA  A9 FE                    ; NMI entry ($FEA9)
>FFFC  22 FD                    ; Reset entry ($FD22)
>FFFE  72 FF                    ; IRQ entry ($FF72)
; IRQ
.FF72  48        PHA            [3]
.FF73  8A        TXA            [2]
.FF74  48        PHA            [3]
.FF75  98        TYA            [2]
.FF76  48        PHA            [3]
.FF77  BA        TSX            [2]
.FF78  BD 04 01  LDA $0104,X    [4+0]
.FF7B  29 10     AND #$10       [2]
.FF7D  F0 03     BEQ $FF82      [2+1]
.FF7F  6C 16 03  JMP ($0316)
.FF82  6C 14 03  JMP ($0314)    [5]
-------------------------------------------
                                29 Cycles
; NMI
.FEA9  78        SEI            [2]
.FEAA  6C 18 03  JMP ($0318)    [5]
-------------------------------------------
                                 7 Cycles
... which means: besides the (variable) interrupt latency, there are another 29 (IRQ) or 7 (NMI) cycles before the actual interrupt server runs, and it is also necessary to acknowledge the interrupt, adding another 4 cycles for a BIT instruction.
For the first example with IRQ we now arrive at 29 (Kweepa's small routine incl. ACK) + 9 ([!] max. latency) + 29 (IRQ header) = 67, which just bombs on NTSC and barely works on PAL. Using an NMI: 29 + 9 + 7 = 45 ... means 63..70% CPU load.
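The same budget can be tabulated for both interrupt types and both video standards. This Python sketch only restates the figures given above (29-cycle routine including acknowledge, 9 cycles worst-case latency, 29 or 7 cycles of KERNAL header); the names are made up for the example.

```python
ROUTINE_WITH_ACK = 29                  # short play routine plus 4-cycle acknowledge
MAX_LATENCY = 9                        # worst-case interrupt latency, as quoted
KERNAL_HEADER = {"IRQ": 29, "NMI": 7}  # cycles spent before the user vector is reached
CYCLES_PER_LINE = {"NTSC": 65, "PAL": 71}

def cycles_per_interrupt(kind):
    """Worst-case cycles consumed by one sample interrupt."""
    return ROUTINE_WITH_ACK + MAX_LATENCY + KERNAL_HEADER[kind]

for kind in KERNAL_HEADER:
    total = cycles_per_interrupt(kind)
    for system, line in CYCLES_PER_LINE.items():
        verdict = "fits" if total <= line else "does not fit"
        print(f"{kind} on {system}: {total} of {line} cycles "
              f"({total / line:.0%}), {verdict}")
```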
Victragic wrote: Thanks, but the speech in Mega Apocalypse is definitely sampled, not synthesised. I can even hear the English accent in the voice.
You can find the PSID file of Mega Apocalypse somewhere on the 'net. It's roughly 22K in size. So it would appear most of the game is taken up by speech samples. Judging from their quality, they seem to be played back at ~4 kHz sample rate max. That is manageable. If the sample play routine is 'always-on' and just gets switched between a silent sample and the required voice sample, you get that synchronised speech and the CPU load remains constant.
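A minimal model of the 'always-on' idea, written in Python to show the control flow rather than real 6502 timing. The class name, the silent level of 8 and the buffer sizes are all hypothetical; the point is that `tick()` does the same work whether or not speech is playing.

```python
SILENCE = bytes([8]) * 16  # hypothetical constant-level buffer, looped while idle

class AlwaysOnPlayer:
    """Stand-in for a timer interrupt that outputs one sample per call."""

    def __init__(self):
        self.data = SILENCE
        self.pos = 0

    def say(self, sample):
        """Called from the game loop: point the player at a voice sample."""
        if sample:
            self.data = sample
            self.pos = 0

    def tick(self):
        """Called once per interrupt; always fetches and returns one value."""
        value = self.data[self.pos]
        self.pos += 1
        if self.pos == len(self.data):
            # End of buffer: fall back to silence (or keep looping it).
            self.data = SILENCE
            self.pos = 0
        return value

player = AlwaysOnPlayer()
player.say(bytes([1, 2, 3]))
print([player.tick() for _ in range(5)])
```

Because the game loop only swaps the buffer, starting a word costs it almost nothing, and the interrupt load it has to budget for never changes.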
Re: synchronised speech?
I can confirm that most popular C64 digitized speech is only played back at about 4 kHz. I had a simple program from a 64 programming book that let you adjust the playback rate.
Now ACE was the first game I ever had when I opened up my C128 and I believe that the samples on that were even lower than 4bit! The fidelity was so low that you could barely make out what was being said.
Also, I remember looking into ESS and how they managed to make their speech so clear yet keep the samples so small. From a couple of articles I read, they basically chopped up the samples, removed the redundant information and used loops to fill those gaps, and the end result was still a high quality reconstruction.
I tested it out using Soundforge with some simple editing of a sound bite or two (Berzerk sound bites) and it worked. I always wondered if speech could be played in real time without bogging down the CPU and causing popping sounds... I guess now I know.
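One way to picture the chop-and-loop idea is to store each short chunk once, together with a repeat count, and expand it on playback. This Python sketch is only a guess at the general principle, not a description of how ESS actually did it; the data and names are invented for the example.

```python
def expand(segments):
    """Rebuild a sample stream from (chunk, repeat_count) pairs."""
    out = bytearray()
    for chunk, repeats in segments:
        out += chunk * repeats
    return bytes(out)

# Invented data: one 8-byte cycle held for 6 repeats, then a 4-byte cycle for 3.
packed = [
    (bytes([8, 12, 15, 12, 8, 4, 1, 4]), 6),
    (bytes([8, 9, 8, 7]), 3),
]

audio = expand(packed)
stored = sum(len(chunk) for chunk, _ in packed)
print(f"{stored} bytes stored, {len(audio)} bytes played")
```

Sustained vowels repeat almost the same waveform cycle many times, which is why looping a single cycle can pass for the original.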
Learning all the time...
Re: synchronised speech?
I'm playing catch-up with this topic.
I'm thinking that the sacrifice of an 'always-on' sound sampler would be too great, having to do so much other work as well with graphics. Nice gimmick, but it will remain just that.
Quote: Also, I remember looking into ESS and how they managed to make their speech so clear yet keep the samples so small. After reading a couple of articles they basically chopped up the samples and removed the redundant information - used loops to fill those gaps and the end result was still a high quality reconstruction.
From memory, ESS speech was all 'frozen', but that raises an interesting point. Aside from data compression, I'm wondering if selective cutting of samples would allow an illusion of synchronisation, enough to fool the user into thinking that the action is continuing. After all, we're talking about very short samples.
- Mike
Re: synchronised speech?
Victragic wrote: I'm thinking that the sacrifice of an 'always-on' sound sampler would be too great, having to do so much other work as well with graphics. Nice gimmick, but it will remain just that.
You'll find that quite a lot of games with sampled sound on the C64 use that technique. At 4 kHz the slowdown can surely be tolerated - after all, the RS232 KERNAL routines put a similar load on the CPU at, say, 2400 or 4800 baud. And for the game's main loop, the timing gets easier when the interrupts impose a constant load.
Victragic wrote: From memory ESS speech was all 'frozen', but that raises an interesting point. Aside from data compression, I'm wondering if selective cutting of samples would allow an illusion of synchronisation enough to fool the user into thinking that the action is continuing. After all, we're talking about very short samples.
You might have a listen to:
o C64MP3,
o Monophono, and
o Cubase64.
All on the C64. Here's also some interesting Vocoder stuff: Frodigi, Frodigi 2 and Frodigi 3.
Re: synchronised speech?
Don't forget my humps.
I'm just a Software Guy who has no Idea how the Hardware works. Don't listen to me.