Shorten text on a cc65 program for Vic20?

HarryP2 · Post by **HarryP2** » Sat Jul 09, 2022 1:08 pm

Hi! I have a program called AdvSkelVic. It is source code around which you can build your own text adventure. It uses CBMSimpleIO and some Assembler, so it's pretty efficient. I have a function for cc65/CBM targets called printtok() that expands the tokens embedded in a string into the text they stand for. It is re-entrant, so you can embed tokens in other tokens. I will reveal the URLs on request. Other than tokens, is there any way to shorten strings with little or no effort and extra code?
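For readers who have not seen nested tokens before, here is a minimal C sketch of the idea. It is not the actual printtok() source; the token table, the 0x80 token base, and the name `expand` are all assumptions made for illustration, and it writes into a buffer rather than to the screen.

```c
#include <stddef.h>

/* Hypothetical token table: byte values 0x80 and up index into it.
   String literals are split after "\x80" so the hex escape does not
   swallow the following 'd'. */
static const char *const tokens[] = {
    "the ",             /* 0x80 */
    "\x80" "door",      /* 0x81: embeds token 0x80 */
    "You see \x81.",    /* 0x82: embeds token 0x81 */
};
#define TOKEN_BASE  0x80u
#define TOKEN_COUNT (sizeof tokens / sizeof tokens[0])

/* Append the expansion of s to out, starting at offset len.
   cap is the buffer size including the terminating NUL (cap >= 1).
   Returns the new length. A token that refers to itself would
   recurse forever; this sketch does not guard against that. */
size_t expand(const char *s, char *out, size_t len, size_t cap)
{
    unsigned char c;
    while ((c = (unsigned char)*s++) != 0) {
        if (c >= TOKEN_BASE && (size_t)(c - TOKEN_BASE) < TOKEN_COUNT) {
            /* Token byte: expand its text recursively. */
            len = expand(tokens[c - TOKEN_BASE], out, len, cap);
        } else if (len + 1 < cap) {
            /* Ordinary character: copy it if there is room. */
            out[len++] = (char)c;
        }
    }
    out[len] = '\0';
    return len;
}
```

Calling `expand("\x82", buf, 0, sizeof buf)` fills `buf` with "You see the door.", so one stored byte stands for 17 characters of output.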

DarwinNE · Post by **DarwinNE** » Mon Jul 11, 2022 4:43 am

Tokenized text seems a nice approach. In my adventures, I implemented Huffman compression. It achieves a compression ratio of about 45-50%. Decompression is reasonably quick with a binary tree and is done on the fly.
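As a rough illustration of decoding with a binary tree, here is a small C sketch. It is not the decoder that aws2c generates; the node layout, the three-symbol tree, and the name `huff_decode` are assumptions chosen to keep the example short.

```c
#include <stddef.h>

/* One tree node. Node 0 is the root, so no child index is ever 0;
   left == 0 therefore marks a leaf, whose character is in sym. */
struct node { unsigned char left, right; char sym; };

/* Hypothetical tree: 'e' = 0, 't' = 10, ' ' = 11 */
static const struct node tree[] = {
    {1, 2, 0},      /* 0: root */
    {0, 0, 'e'},    /* 1: leaf */
    {3, 4, 0},      /* 2: internal */
    {0, 0, 't'},    /* 3: leaf */
    {0, 0, ' '},    /* 4: leaf */
};

/* Decode count symbols from a bit stream (most significant bit first)
   into out, which must hold count + 1 bytes. */
void huff_decode(const unsigned char *bits, size_t count, char *out)
{
    size_t bitpos = 0, i;
    for (i = 0; i < count; ++i) {
        unsigned char n = 0;                    /* start at the root */
        while (tree[n].left != 0) {             /* walk until a leaf */
            unsigned bit = (bits[bitpos >> 3] >> (7 - (bitpos & 7))) & 1u;
            n = bit ? tree[n].right : tree[n].left;
            ++bitpos;
        }
        out[i] = tree[n].sym;
    }
    out[count] = '\0';
}
```

With this tree, the single byte 0x8E (bits 10 0 0 11 10) decodes to the five characters "tee t". A real decoder would print each character as it is found instead of filling a buffer.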

HarryP2 · Post by **HarryP2** » Mon Jul 11, 2022 4:47 am

Thank you.

HarryP2 · Post by **HarryP2** » Mon Jul 11, 2022 5:22 am

I have code for Adaptive Huffman compression, but it's not ready for use yet, as I'm still trying to get better ratios from my compression techniques. Could you kindly post your compression code here, please?

DarwinNE · Post by **DarwinNE** » Mon Jul 11, 2022 5:25 am

Yes, of course. Here it is:

https://github.com/DarwinNE/aws2c/blob/ ... compress.c

Probably the compression code is not very elegant or efficient, but it is meant to work on a modern computer. The decompression code is generated by the output_decoder function:

https://github.com/DarwinNE/aws2c/blob/ ... ess.c#L207

The whole thing is part of AWS2C, which generates the C source code of an adventure from a sort of meta-language called AWS.

HarryP2 · Post by **HarryP2** » Mon Jul 11, 2022 5:29 am

Again, thank you. Does anybody here have any other ideas?

HarryP2 · Post by **HarryP2** » Mon Jul 11, 2022 6:58 am

DarwinNE: I ask you to add tokens to your text adventure creator and mention they're from me.

HarryP2 · Post by **HarryP2** » Mon Jul 11, 2022 7:32 am

I ask you to support cc65 directly and use my Cubbyhole technique and other optimizations mentioned at https://sourceforge.net/projects/cc65extra/files/. The Cubbyhole technique puts some code and data in the first 1k on a Vic20 and C64, 2k on the Plus4 and 7k on a C128 that is not used during the course of a cc65 program. My MadLib* and AdvSkel*65 programs demonstrate their use.

Mike · Post by **Mike** » Mon Jul 11, 2022 4:09 pm

HarryP2 wrote: [...] Does anybody here have any other ideas?

I used a text compression method in Bible Series, part II: Pentateuch that encodes three consecutive characters of a fixed 5-bit alphabet (all lowercase letters, blank, comma, period, semicolon, hyphen and single quote) into two bytes.

Chuck Guzis had this algorithm published in Dr. Dobb's Journal #207 (November 1993), and I made some small enhancements to ensure binary transparency. With this algorithm, most texts compress by ~30%, which was sufficient in this case to fit the whole Torah on one disk for the 1581.
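The core packing step can be sketched in C as follows. This is only an illustration of "three 5-bit codes in two bytes": the ordering of the alphabet, the function names, and leaving the 16th bit unused are assumptions, and the binary-transparency enhancements mentioned above are not reproduced.

```c
#include <string.h>

/* Assumed ordering of the 32-symbol alphabet: codes 0..25 are a..z,
   then blank, comma, period, semicolon, hyphen, single quote. */
static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz ,.;-'";

/* Pack three characters into the low 15 bits of a 16-bit value.
   s must point to at least three bytes; anything outside the
   alphabet is replaced by a blank (code 26) in this sketch. */
unsigned pack3(const char *s)
{
    unsigned v = 0;
    int i;
    for (i = 0; i < 3; ++i) {
        const char *p = (s[i] != '\0') ? strchr(alphabet, s[i]) : NULL;
        unsigned code = p ? (unsigned)(p - alphabet) : 26u;
        v = (v << 5) | code;
    }
    return v;
}

/* Reverse of pack3: writes three characters and a NUL to out. */
void unpack3(unsigned v, char out[4])
{
    out[0] = alphabet[(v >> 10) & 31u];
    out[1] = alphabet[(v >> 5) & 31u];
    out[2] = alphabet[v & 31u];
    out[3] = '\0';
}
```

For example, `pack3("the")` gives 0x4CE4 with this ordering. Since three input bytes always become two, the saving from the packing alone cannot exceed one third.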

Before you "ask" me about something - said application and its support algorithms work fine as they are.

HarryP2 · Post by **HarryP2** » Tue Jul 12, 2022 4:59 am

I can't use your algorithm as stated, since I need 7 bits per entry. But I could compress to 7 bits per entry, including tokens. What do you think?

BTW, on some versions of my text adventure code, I store most strings in a system's extra memory. I could create a function to decompress and access the text on-the-fly. Thank you for the info!

HarryP2 · Post by **HarryP2** » Thu Jul 14, 2022 9:30 am

I have a technique called POBasic, short for Placement Offset Basic. It shortens some literals to an offset to the previous occurrence of the same literal. Sort of LZ77 for single bytes. The problem with both this and Adaptive Huffman is that you need to know the values that came before the current string. I don't think POBasic would be better than Huffman, but combining the two seems to help.

BTW, if you use this method, all I ask is that you state in your docs that you use this technique and e-mail me to say so, with the name and URL of the program. If you have any questions, just reply here, PM me or ask me for my e-mail address.

HarryP2 · Post by **HarryP2** » Sun Jul 17, 2022 8:17 am

One more idea: if you have a Huffman literal that would compress to >=8 bits, don't compress it. Rather, write it directly. Unfortunately, you need an extra bit per literal to say whether it is compressed or not, but my tests show that this works. Then again, if a block doesn't include such bytes, one bit at the start of the block can say whether the idea is used in that block at all. Try it out!
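To see when the extra flag bit pays off, the bit counts can be compared per block. The following C sketch only does the accounting; the function names and the tie-breaking rule (prefer the plain encoding) are assumptions, not part of the description above.

```c
#include <stddef.h>

/* Bits needed for one literal when every literal carries a flag bit:
   1 flag bit, then either the Huffman code or a raw 8-bit byte. */
unsigned flagged_cost(unsigned huff_len)
{
    return 1u + (huff_len >= 8u ? 8u : huff_len);
}

/* Given the Huffman code length of each literal in a block, return the
   size of the block in bits, including one header bit that says whether
   per-literal flags are used. *use_flags is set to 1 if they are. */
unsigned long block_cost(const unsigned char *lens, size_t n, int *use_flags)
{
    unsigned long plain = 0, flagged = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        plain   += lens[i];
        flagged += flagged_cost(lens[i]);
    }
    *use_flags = flagged < plain;       /* ties keep the plain encoding */
    return 1ul + (*use_flags ? flagged : plain);
}
```

For code lengths {2, 14, 15} the plain encoding needs 31 bits and the flagged one 21, so the flags win. For {2, 3, 12, 2} both need 19 bits and the flags gain nothing.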

BTW, is there an easy way to compress text other than tokenization? I'm currently not ready for Huffman.

HarryP2 · Post by **HarryP2** » Tue Jul 19, 2022 11:09 am

Hi! I am offering an exchange of compression techniques. In exchange, I'd like ideas that are not listed on Wikipedia, or that add to something listed there. My ideas:

* Tokenization: often-repeated text is shortened to a one-byte token. An example of this is my cc65 printtok() function at https://sourceforge.net/projec.../files/ui/.
* POBasic: If a literal is the same as a nearby previous literal, shorten it to an offset to the previous occurrence.
* Last16: If an LZ77-compressed block is a repeat of one of the last 32 LZ77-compressed blocks, compress it to a number indicating which previous block is used.
* If a Huffman Code-compressed literal compresses to more than 8 bits, don't compress it. Rather, copy it as an 8-bit literal. This requires an extra bit per literal, but my experiments show that this usually helps.

You may use these ideas in your programs provided you mention me as the source of the ideas in them and tell me via PM or e-mail you're using these ideas. Right now, I'm especially looking for something which requires little work.

chysn · Post by **chysn** » Wed Jul 20, 2022 12:39 pm

It doesn't seem like a one-byte token would provide a big enough lexicon for a reasonably-complex text adventure. I'd probably use an extendible tokenization system (where $ff indicates that there's a second byte that switches to a second table, and so on), or maybe a straight 12-bit token.
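One possible reading of the $ff extension scheme, sketched in C. The name `read_token` and the flat numbering of token ids across tables are assumptions; the function only parses a token id and leaves the lookup of the token text to the caller.

```c
#include <stddef.h>

/* Read one token id at *pp and advance *pp past it.
   Each leading $FF byte switches to the next table of 255 entries,
   so ids 0..254 cost one byte, 255..509 two bytes, and so on. */
unsigned read_token(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    unsigned id = 0;
    while (*p == 0xFF) {    /* escape byte: move on to the next table */
        id += 255;
        ++p;
    }
    id += *p++;             /* final byte selects the entry in that table */
    *pp = p;
    return id;
}
```

With this layout the most frequent 255 tokens still cost one byte each, and rarer ones cost one extra byte per table switch. A straight 12-bit token would instead give 4096 ids at a fixed cost of a byte and a half.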

HarryP2 · Post by **HarryP2** » Thu Jul 21, 2022 4:21 am

Thank you. You're right about the tokens. I managed to compress the text of one of my text adventures by 13.6%, and I still have room for a few more tokens.
