synchronised speech?

Victragic · Post by **Victragic** » Wed Oct 22, 2014 4:01 am

Anyone remember Mega Apocalypse for the C64? Or ACE by Cascade? The authors claimed to have a system that enabled digitised speech without slowing the game action.

Does anyone know how this might be implemented?

The only way I can think of doing this would be to trigger frequent interrupts using timers on either VIA chip, but even so, wouldn't that result in a very low sample rate, rendering it extremely low quality? I recall the speech in Mega Apocalypse was quite impressive.

(I'm able to play back samples, just not 'synchronised'.)

I am about 80% through completing a new game, and thinking of something novel to do with sound effects.

Mike · Post by **Mike** » Wed Oct 22, 2014 4:16 am

The key word is digitized speech.

With formant synthesis, only a rather low bandwidth is needed. You just have to change the amplitude and frequency of (at least) three sinusoidal tones once every 1/200 .. 1/50 second to produce discernible vowels and diphthongs; for consonants, a filtered noise generator is used instead.
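To make the data-rate argument concrete, here is a rough Python sketch of the idea (not VIC-20 code): three sine tones whose frequency and amplitude are updated 50 times per second. The 8 kHz output rate, the function name `synthesize` and the formant values for the two vowels are illustrative assumptions, not measured data.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed output rate
FRAME_RATE = 50     # parameter updates per second (the post says 50..200)

def synthesize(frames, sample_rate=SAMPLE_RATE, frame_rate=FRAME_RATE):
    """frames: list of frames, each a list of (freq_hz, amplitude) pairs."""
    samples_per_frame = sample_rate // frame_rate
    # One phase accumulator per tone keeps the waveform continuous across frames.
    phases = [0.0] * max(len(frame) for frame in frames)
    out = []
    for frame in frames:
        norm = sum(amp for _, amp in frame) or 1.0
        for _ in range(samples_per_frame):
            value = 0.0
            for i, (freq, amp) in enumerate(frame):
                phases[i] = (phases[i] + 2 * math.pi * freq / sample_rate) % (2 * math.pi)
                value += amp * math.sin(phases[i])
            out.append(value / norm)  # normalised to stay within -1..1
    return out

# Illustrative formant sets only (roughly "ah" and "ee").
AH = [(700, 1.0), (1200, 0.5), (2600, 0.25)]
EE = [(300, 1.0), (2300, 0.5), (3000, 0.25)]

audio = synthesize([AH] * 10 + [EE] * 10)  # 0.2 s of each vowel
print(len(audio), "samples from", 20 * 3 * 2, "parameter values")
```

The control stream here is 6 values per frame (300 values per second), compared with 8000 values per second for raw samples at the same output rate.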

The VIC however, only provides square waves 'out of the box', so there ...

Victragic · Post by **Victragic** » Wed Oct 22, 2014 6:11 pm

Mike wrote: With formant synthesis, only a rather low bandwidth is needed. You just have to change the amplitude and frequency of (at least) three sinusoidal tones once every 1/200 .. 1/50 second to produce discernible vowels and diphthongs; for consonants, a filtered noise generator is used instead.

Thanks, but the speech in Mega Apocalypse is definitely sampled, not synthesised. I can even hear the English accent in the voice.

Mayhem · Post by **Mayhem** » Thu Oct 23, 2014 6:09 am

I can't remember exactly, but the speech in Mega Apocalypse is either Simon Nicol (the programmer) or Rob Hubbard (the musician).

RJBowman · Post by **RJBowman** » Tue Oct 28, 2014 3:24 pm

I believe that if you used a sampling rate of once per scan line, you'd get roughly 1/3 the sampling rate of CD audio, which is more than sufficient. The code to play sound samples is short, and would not steal much CPU time. And with 100% machine code and sprites, making a game run in the remaining CPU time would be no trick at all.
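As a quick check of the "roughly 1/3 of CD audio" figure, the raster line rate can be computed from the CPU clock and the cycles per line. The clock values below (about 1.02 MHz NTSC, 1.11 MHz PAL) are assumptions; the 65/71 cycles per line are the figures quoted later in the thread.

```python
# Assumed VIC-20 CPU clocks in Hz; cycles per raster line as quoted in the thread.
SYSTEMS = {
    "NTSC": {"clock_hz": 1_022_727, "cycles_per_line": 65},
    "PAL": {"clock_hz": 1_108_405, "cycles_per_line": 71},
}
CD_RATE_HZ = 44_100

def line_rate_hz(clock_hz, cycles_per_line):
    """Number of raster lines (and so per-line interrupts) per second."""
    return clock_hz / cycles_per_line

for name, params in SYSTEMS.items():
    rate = line_rate_hz(**params)
    print(f"{name}: {rate:.0f} Hz = {rate / CD_RATE_HZ:.2f} x CD rate")
```

Both systems come out at a little over 15.6 kHz, so the sample-rate estimate itself holds; whether the CPU can afford an interrupt that often is a separate question.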

Kweepa · Post by **Kweepa** » Wed Oct 29, 2014 1:11 pm

You really can't have an interrupt every scanline and get any kind of performance, even for a looping 256 byte sample:

Code:

7 (interrupt)
3 pha
5 inc mod+1 ; this code on the zero page
.mod
3 lda $4000 ; sample page
4 sta $900e ; volume control + aux color
4 pla
6 rti

32 total cycles. One scan line is only 65 (NTSC) or 71 (PAL) cycles, so even the simplest digitized 'speech' will take up about 50% of the CPU.
You could probably manage 8 kHz (telephone quality) though - that would 'only' consume 32/(1000000/8000) ≈ 25% of the CPU.
Once you need more than 256 samples, the code bloats up considerably.

Code:

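7 (interrupt)
3 pha
2 clc
3 lda mod+1 ; zero page code: low byte of the sample address
2 adc #1    ; advance to the next sample
3 sta mod+1
3 bcc mod   ; no carry: still on the same page
0 inc mod+2 ; carry: next page (1 in 256 samples, not counted)
.mod
3 lda $4000 ; sample start
4 sta $900e ; volume control + aux color
4 pla
6 rti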

42 cycles. For 8 kHz, 42/(1000000/8000) ≈ 33% of the CPU.
(And here you'd still need to poll mod+2 in your main loop, checking for the sample end.)
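The load arithmetic used in this post can be wrapped in a small helper to try other sample rates. This is only a sketch: it takes the per-sample cycle counts as given, and the 1 MHz clock is the same round-number assumption as in the formulas above.

```python
def cpu_load(cycles_per_sample, sample_rate_hz, clock_hz=1_000_000):
    """Fraction of CPU time spent inside the sample interrupt."""
    cycles_between_samples = clock_hz / sample_rate_hz
    return cycles_per_sample / cycles_between_samples

for cycles in (32, 42):
    for rate in (4000, 8000):
        print(f"{cycles} cycles at {rate} Hz: {cpu_load(cycles, rate):.1%}")
```

At 8 kHz this gives about 25.6% and 33.6% for the two routines; halving the rate to 4 kHz halves the load.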

Mike · Post by **Mike** » Thu Oct 30, 2014 4:30 am

I already pointed out the infeasibility of RJBowman's approach to him in a post from 2012: (link)

Not only does the interrupt routine already take a lot of cycles for something that is executed every 65/71 cycles; the interrupt vectors are also fixed in the VIC-20 and are always routed to the following routines in the KERNAL:

Code:

>FFFA  A9 FE     ; NMI entry ($FEA9)
>FFFC  22 FD     ; Reset entry ($FD22)
>FFFE  72 FF     ; IRQ entry ($FF72)

                 ; IRQ
.FF72  48        PHA              [3]
.FF73  8A        TXA              [2]
.FF74  48        PHA              [3]
.FF75  98        TYA              [2]
.FF76  48        PHA              [3]
.FF77  BA        TSX              [2]
.FF78  BD 04 01  LDA $0104,X      [4+0]
.FF7B  29 10     AND #$10         [2]
.FF7D  F0 03     BEQ $FF82        [2+1]
.FF7F  6C 16 03  JMP ($0316)
.FF82  6C 14 03  JMP ($0314)      [5]
-------------------------------------------
                                  29 Cycles

                 ; NMI
.FEA9  78        SEI              [2]
.FEAA  6C 18 03  JMP ($0318)      [5]
-------------------------------------------
                                   7 Cycles

... which means: besides the (variable) interrupt latency, there are another 29 or 7 cycles before the actual interrupt service routine runs, and it is also necessary to acknowledge the interrupt, adding another 4 cycles for a BIT instruction.

For the first example with IRQ we now arrive at 29 (Kweepa's small routine incl. ACK) + 9 ([!] max. latency) + 29 (IRQ header) = 67, which just bombs on NTSC and barely works on PAL. Using an NMI: 29 + 9 + 7 = 45, which means 63..70% CPU load.
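The same worst-case budget, written out as a sketch so the numbers can be varied. All cycle counts are taken from the figures above; the function and variable names are just for illustration.

```python
LINE_CYCLES = {"NTSC": 65, "PAL": 71}  # cycles per scan line

def worst_case_cycles(routine, latency, header):
    """Routine (incl. ACK) + max. interrupt latency + KERNAL header."""
    return routine + latency + header

irq = worst_case_cycles(routine=29, latency=9, header=29)  # via $FF72
nmi = worst_case_cycles(routine=29, latency=9, header=7)   # via $FEA9

for system, line in LINE_CYCLES.items():
    print(f"{system}: IRQ {irq} cycles, fits in one line: {irq <= line}; "
          f"NMI {nmi} cycles, load {nmi / line:.0%}")
```

With one interrupt per line, the IRQ path does not fit into an NTSC line at all, and the NMI path leaves only about a third of the CPU for the game.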

Victragic wrote: Thanks, but the speech in Mega Apocalypse is definitely sampled, not synthesised. I can even hear the English accent in the voice.

You can find the PSID file of Mega Apocalypse somewhere on the 'net. It's roughly 22K in size. So it would appear most of the game is taken up by speech samples.

Judging from their quality, they seem to be played back at ~4 kHz sample rate max. That is manageable. If the sample play routine is 'always-on' and just gets switched between a silent sample and the required voice sample, you get that synchronised speech and the CPU load remains constant.
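A minimal Python model of that 'always-on' idea, as a sketch only: `tick()` stands in for the timer interrupt and does the same work on every call, while the game just swaps which buffer is being played. The class name, the 256-byte silent buffer and the mid-level value 8 (for a 4-bit volume register) are assumptions for illustration.

```python
SILENCE = bytes([8] * 256)  # hypothetical 'silent' sample: constant mid-level

class AlwaysOnPlayer:
    def __init__(self):
        self.sample = SILENCE
        self.pos = 0

    def trigger(self, voice):
        """Called from the game loop: switch playback to a voice sample."""
        self.sample = voice
        self.pos = 0

    def tick(self):
        """Stand-in for the timer interrupt: output one sample, every time."""
        value = self.sample[self.pos]
        self.pos += 1
        if self.pos == len(self.sample):
            # End of buffer: fall back to (or loop) the silent sample.
            self.sample = SILENCE
            self.pos = 0
        return value

player = AlwaysOnPlayer()
out = [player.tick() for _ in range(3)]   # idle: silence
player.trigger(bytes([1, 15, 3]))         # game event starts speech
out += [player.tick() for _ in range(5)]  # voice, then silence again
print(out)
```

Because the interrupt never stops, the game loop sees the same CPU budget whether or not speech is playing, which is what makes the speech appear 'synchronised'.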

darkatx · Post by **darkatx** » Sat Nov 01, 2014 10:52 am

I can confirm that most popular C64 digitized speech is only played at 4 kHz. I had a simple program from a C64 programming book that could adjust the playback rate.
Now ACE was the first game I ever had when I opened up my C128, and I believe that the samples on that were even lower than 4-bit! The fidelity was so low that you could barely make out what was being said.
Also, I remember looking into ESS and how they managed to make their speech so clear yet keep the samples so small. After reading a couple of articles: they basically chopped up the samples and removed the redundant information, then used loops to fill those gaps, and the end result was still a high-quality reconstruction.
I tested it out using Sound Forge and did some simple editing of a sound bite or two (Berzerk sound bites) and it worked. I always wondered if speech could be played in real time without bogging down the CPU and causing popping sounds. I guess now I know.
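One way to picture the 'chop out the redundant bits and loop' idea is the sketch below. It is not ESS's actual method, which is only loosely described above; it simply merges consecutive near-identical frames into one stored frame plus a repeat count. The frame length and tolerance are arbitrary assumptions.

```python
def compress(samples, frame_len, tolerance):
    """Return [frame, repeat_count] pairs; near-duplicate neighbours are merged."""
    packed = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if packed:
            last = packed[-1][0]
            similar = len(last) == len(frame) and all(
                abs(a - b) <= tolerance for a, b in zip(last, frame)
            )
            if similar:
                packed[-1][1] += 1  # loop the stored frame instead of storing a new one
                continue
        packed.append([frame, 1])
    return packed

def expand(packed):
    """Rebuild the (approximate) sample stream by looping each stored frame."""
    out = []
    for frame, count in packed:
        out.extend(frame * count)
    return out

samples = [0, 4, 8, 4,  0, 4, 8, 5,  0, 4, 9, 4,  15, 0, 15, 0]
packed = compress(samples, frame_len=4, tolerance=1)
print(packed)
print(expand(packed))
```

Here 16 samples shrink to two stored frames, and the reconstruction differs from the original by at most the tolerance in the looped section.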

Victragic · Post by **Victragic** » Mon Nov 03, 2014 7:40 pm

I'm playing catch-up with this topic.

darkatx wrote: Also, I remember looking into ESS and how they managed to make their speech so clear yet keep the samples so small. After reading a couple of articles they basically chopped up the samples and removed the redundant information - used loops to fill those gaps and the end result was still a high quality reconstruction.

From memory ESS speech was all 'frozen', but that raises an interesting point. Aside from data compression, I'm wondering if selective cutting of samples would allow an illusion of synchronisation enough to fool the user into thinking that the action is continuing. After all, we're talking about very short samples.

I'm thinking that the sacrifice of an 'always-on' sound sampler would be too great, having to do so much other work as well with graphics. Nice gimmick, but it will remain just that.

Mike · Post by **Mike** » Thu Nov 06, 2014 12:15 pm

Victragic wrote: I'm thinking that the sacrifice of an 'always-on' sound sampler would be too great, having to do so much other work as well with graphics. Nice gimmick, but it will remain just that.

You'll find that quite a lot of games with sampled sound on the C64 use that technique. At 4 kHz the slowdown surely can be tolerated - after all, the RS232 KERNAL routines put a similar load on the CPU at - say - 2400 or 4800 baud. And for the game main loop, the timing surely gets easier when interrupts remain at constant load.

Victragic wrote: From memory ESS speech was all 'frozen', but that raises an interesting point. Aside from data compression, I'm wondering if selective cutting of samples would allow an illusion of synchronisation enough to fool the user into thinking that the action is continuing. After all, we're talking about very short samples.

You might have a listen to:

o C64MP3,
o Monophono, and
o Cubase64.

All on the C64. Here's also some interesting Vocoder stuff: Frodigi, Frodigi 2 and Frodigi 3.

groepaz · Post by **groepaz** » Thu Nov 06, 2014 5:55 pm

Don't forget my humps
