gumby

Vocab.000 file format
« on: August 08, 2010, 05:45:50 PM »
I'm attempting to create a vocab.000 file from scratch. I exported the one that is created for a template game and opened it with a hex editor. The contents didn't make any sense to me at all.

That is, until I stumbled upon this at the freesci site:
http://freesci.linuxgames.com/scihtml/x5171.html#AEN5173

It explains the file format in detail. Thought I would pass it along in case anyone else wants to tinker with the vocab.000 file directly.
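For anyone tinkering directly, here is a minimal Python reader sketch. It assumes the layout as I understand it from the FreeSCI documentation, not anything confirmed in this thread: 26 little-endian 16-bit letter offsets, then word entries that each start with a count of characters reused from the previous word, store the remaining characters with the high bit (0x80) set on the last one, and end with three bytes packing a 12-bit word class and a 12-bit group. The function names and the `data_start` parameter are hypothetical; `data_start` only exists to skip any header your export tool may place in front of the vocabulary data.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 26 * 2  # assumed: 26 little-endian 16-bit offsets, one per letter a-z


def read_letter_offsets(data: bytes, data_start: int = 0) -> List[int]:
    """Return the 26 per-letter offsets from the (assumed) header."""
    return list(struct.unpack_from("<26H", data, data_start))


def decode_vocab(data: bytes, data_start: int = 0) -> List[Tuple[str, int, int]]:
    """Decode (word, word_class, group) tuples from SCI0-style vocab data.

    `data_start` is a hypothetical parameter for skipping any header that
    precedes the vocabulary data in an exported file.
    """
    pos = data_start + HEADER_SIZE
    words = []
    previous = ""
    while pos < len(data):
        shared = data[pos]  # count of leading characters reused from the previous word
        pos += 1
        chars = []
        while True:
            b = data[pos]
            pos += 1
            chars.append(chr(b & 0x7F))
            if b & 0x80:  # high bit set: this was the word's last character
                break
        word = previous[:shared] + "".join(chars)
        b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
        pos += 3
        word_class = (b0 << 4) | (b1 >> 4)  # upper 12 of the 24 bits
        group = ((b1 & 0x0F) << 8) | b2  # lower 12 of the 24 bits
        words.append((word, word_class, group))
        previous = word
    return words
```

If your exported file turns out to start with a small header (check in the hex editor first), pass its length as `data_start`, e.g. `decode_vocab(open("vocab.000", "rb").read(), data_start=2)` for a two-byte header.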


In the Great Underground Empire (Zork port in development)
Winter Break 2012 Rope Prop Competition

gumby

SCI Vocab Editor
« Reply #1 on: September 21, 2010, 07:17:19 PM »
Okay, so here's the tool that I've developed. It will take a vocab.000 file and export it to a spreadsheet (Excel 97/2000/XP format; OpenOffice will open it as well), and it will also convert a properly formatted spreadsheet into a vocab.000 file.

Why? I needed some way to add new vocab words within a batch process. I figured that Excel was as good a format as any, and it also allows a non-batch process (i.e., a person) to manipulate the vocabulary outside of Studio/Companion (for what purpose, I don't know).

An exported spreadsheet consists of a list of words, word classes (parts of speech) and groups (synonyms, for lack of a better description). The words do not need to be in alphabetical order if you plan to convert the spreadsheet back into a vocab file; sorting is performed when the file is created.
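Sorting matters because, assuming the prefix-compressed layout described on the FreeSCI page, each entry only stores the characters that differ from the word before it, so the words must be in order before they are written. A minimal sketch of that step (the function name is hypothetical and not taken from the tool; plain code-point order is also an assumption):

```python
from os.path import commonprefix
from typing import Iterable, List, Tuple


def sort_and_compress(rows: Iterable[Tuple[str, int, int]]) -> List[Tuple[int, str, int, int]]:
    """Sort (word, word_class, group) rows and prefix-compress the words.

    Returns (shared, suffix, word_class, group) tuples, where `shared` is how
    many leading characters are reused from the previous word. Words are
    assumed to be non-empty.
    """
    result = []
    previous = ""
    for word, word_class, group in sorted(rows, key=lambda r: r[0]):
        # Keep at least one stored character per entry and fit the count in one byte.
        shared = min(len(commonprefix([previous, word])), len(word) - 1, 255)
        result.append((shared, word[shared:], word_class, group))
        previous = word
    return result
```

For example, `sort_and_compress([("looking", 1, 2), ("apple", 3, 4), ("look", 1, 5)])` returns `[(0, "apple", 3, 4), (0, "look", 1, 5), (4, "ing", 1, 2)]`: "looking" only needs to store "ing" once "look" precedes it.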

It's written in Perl, but I've also compiled it into an executable. The executable should work on XP machines with no Perl installation required. On Vista/Win7, you may want to set the compatibility mode to XP before using it.

Here is the link for the download: SCIVocabEditor
« Last Edit: September 28, 2010, 07:10:37 PM by Cloudee1 »

Collector

Re: Vocab.000 file format
« Reply #2 on: September 21, 2010, 09:01:15 PM »
I can host it with the other SCI tools on SHP, if you like. As for the format, might it make more sense to use XML? I love working with Excel, but it seems likely to introduce odd formatting issues, much like using Word to edit a simple ASCII INI file. If you really want to use Excel, perhaps import/export using CSV?

Anyway, nice to see someone developing new SCI tools.
KQII Remake Pic

gumby

Re: Vocab.000 file format
« Reply #3 on: September 21, 2010, 09:30:24 PM »
Sure, I'd love it if you would host it on your site. The reason I went with Excel was really the built-in editing functions and the sorting capability (developers would probably want to sort by group number to develop synonyms). Whenever I work with XML, it often feels like overkill: heavy and bloated.

However, there *is* potential for formatting/translation errors when using this with Excel/OO. I didn't run into any formatting issues as long as the columns were formatted as plain text (especially the word group and class fields). I doubt that anyone will want to use it except out of curiosity; the editor within Companion works just fine for adding/manipulating words for me.

I may revisit this tool & add XML support at a later date...   

Collector

Re: Vocab.000 file format
« Reply #4 on: September 21, 2010, 10:27:10 PM »
That is why I also suggested CSV. I often use Excel for sorting and other kinds of manipulation of non-spreadsheet lists. A CSV can be opened and edited in both Excel and Notepad. When I was a trainer, I had to manage rosters that came to me as a spreadsheet with first and last names in the same cell, which limits sorting options. Rather than manually breaking them up into separate cells, I would copy and paste the list into Notepad, search and replace spaces with commas, and save it as a CSV. I could then open it in Excel with first and last names in separate columns in a matter of seconds. A CSV is essentially plain ASCII (no formatting) and would hold all of the data, with line breaks separating the rows and commas separating the columns.
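As a minimal sketch of the CSV suggestion, Python's standard `csv` module can write and read the word/class/group rows as plain text. The helper names and column headings are illustrative assumptions, not the tool's actual layout:

```python
import csv
import io
from typing import Iterable, List, Tuple


def rows_to_csv(rows: Iterable[Tuple[str, int, int]]) -> str:
    """Write (word, word_class, group) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "class", "group"])  # illustrative column names
    writer.writerows(rows)
    return buf.getvalue()


def csv_to_rows(text: str) -> List[Tuple[str, int, int]]:
    """Parse CSV text produced by rows_to_csv back into typed rows."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return [(word, int(cls), int(group)) for word, cls, group in reader]
```

`rows_to_csv([("look", 128, 291)])` produces `"word,class,group\nlook,128,291\n"`, which opens equally well in Notepad or Excel. Converting the class and group back with `int()` on import sidesteps whatever number formatting a spreadsheet applied on display.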

Cloudee1

Re: Vocab.000 file format
« Reply #5 on: September 21, 2010, 11:04:09 PM »
Here's an idea for a related tool, I would think anyway: how about something that would allow us to extract text resources so that we could run them through some sort of spelling or grammar check and reinject the corrected lines?
Halloween Competition Brass Lantern Prop Competition Groundhog Day Competition

gumby

Re: Vocab.000 file format
« Reply #6 on: September 21, 2010, 11:24:15 PM »
Ha!  I started on that today!  I've already gotten the format pretty much figured out for the text resources within the resource.001 file.  I've got my eye on several command-line tools for grammar and spelling I was thinking about integrating directly into the tool.
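A minimal sketch of the extract/correct/reinject loop follows. It assumes a text resource is a plain sequence of NUL-terminated strings and that you already have the uncompressed resource bytes (resources stored inside resource.001 may be compressed). The Latin-1 decoding, the function names, and the `fix` callback standing in for a spelling/grammar checker are all assumptions for illustration:

```python
from typing import Callable, List


def split_text_resource(data: bytes) -> List[str]:
    """Split (assumed) NUL-terminated strings out of raw text-resource bytes."""
    parts = data.split(b"\x00")
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return [p.decode("latin-1") for p in parts]


def join_text_resource(strings: List[str]) -> bytes:
    """Rebuild the resource bytes, terminating each string with NUL."""
    return b"".join(s.encode("latin-1") + b"\x00" for s in strings)


def correct_text_resource(data: bytes, fix: Callable[[str], str]) -> bytes:
    """Extract every string, pass it through `fix`, and reinject the results."""
    return join_text_resource([fix(s) for s in split_text_resource(data)])
```

The strings stay in their original order, which matters if scripts refer to them by position within the resource.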

Collector - yeah, you're right.  CSV would be a safe format; I wouldn't think that a comma would be used within a word, and certainly not in a group or word class.  I suppose if a CSV format was desired, one could always export it in CSV format from Excel (but obviously, it would be subject to however Excel formatted it).  I'll look into that as well when I revisit this tool.
« Last Edit: September 21, 2010, 11:32:16 PM by gumby »

MusicallyInspired

Re: Vocab.000 file format
« Reply #7 on: September 22, 2010, 01:39:51 AM »
Nice work, gumby! And a spell/grammar check for text resources is also a neat idea. Nice to see some new activity in creating SCI resource programs!
Brass Lantern Prop Competition

Collector

Re: Vocab.000 file format
« Reply #8 on: September 22, 2010, 04:12:17 AM »
Even if a word had a comma, you can define a cell's content by enclosing it with double quotes, i.e. "my,word","next,word" etc. It will include the comma inside the quotes as part of the cell's data.
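A quick way to check that quoting behavior is Python's standard `csv` module, which adds the quotes on write and strips them on read. `quote_roundtrip` is a hypothetical helper for the demonstration:

```python
import csv
import io
from typing import List, Tuple


def quote_roundtrip(fields: List[str]) -> Tuple[str, List[str]]:
    """Write one CSV row and read it back, returning (raw line, parsed fields)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    line = buf.getvalue()
    parsed = next(csv.reader(io.StringIO(line)))
    return line, parsed
```

`quote_roundtrip(["my,word", "next,word"])` gives the raw line `"my,word","next,word"` and parses back to `["my,word", "next,word"]`. A double quote inside a value is written doubled, so `say "hi"` is stored as `"say ""hi"""`.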

gumby

Re: Vocab.000 file format
« Reply #9 on: September 22, 2010, 08:09:24 AM »
I completely forgot that you could wrap quotes around CSV values, and I've spent enough of my professional career stripping quotes out of files not dissimilar to this! We could even go with tab-delimited; that should take care of it too.
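With Python's `csv` module, the tab-delimited variant is a one-parameter change; a minimal sketch (the helper names are hypothetical):

```python
import csv
import io
from typing import Iterable, List, Tuple


def rows_to_tsv(rows: Iterable[Tuple[str, int, int]]) -> str:
    """Write rows tab-delimited; commas inside words need no quoting."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def tsv_to_rows(text: str) -> List[Tuple[str, int, int]]:
    """Parse tab-delimited text back into (word, word_class, group) rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(w, int(c), int(g)) for w, c, g in reader]
```

`rows_to_tsv([("my,word", 1, 2)])` yields `"my,word\t1\t2\n"` with no quotes, because a comma is no longer the delimiter.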

gumby

Re: Vocab.000 file format
« Reply #10 on: April 04, 2012, 09:13:48 AM »
This vocabulary editor is broken. It builds the words section of the file correctly, but the header isn't right. I spent a couple of hours with it last night and wasn't able to figure it out; it's not creating the offset values properly.

I created a vocab.000 file and imported it into a game. I had said() strings that referenced the new vocabulary, which seemed okay. I could compile and run the game, but when I tried to exercise the said()s, none of the words starting with the letter 'c' were recognized. The header was malformed such that the offset for the 'c' words caused that whole section to be effectively skipped.

The only way I could get the vocabulary to work was to set all the offsets in the header to a value of 1, which basically makes reading of the vocabulary resource start from the head of the file each time for each letter of the alphabet, instead of skipping right to the relevant letter of the alphabet.  Suboptimal, but effective.
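For comparison, here is a minimal sketch that builds the header alongside the entries, so that each letter's offset points at the first entry for that letter. It rests on assumptions not confirmed in this thread: offsets are byte positions from the start of the vocabulary data (header included), letters with no words get 0, words are non-empty ASCII sorted in plain code-point order, and any patch header your tool writes is prepended separately. `build_vocab` is a hypothetical name; this is not a fix for the tool itself.

```python
import struct
from os.path import commonprefix
from typing import Iterable, Tuple

HEADER_SIZE = 26 * 2  # assumed: 26 little-endian 16-bit letter offsets


def build_vocab(rows: Iterable[Tuple[str, int, int]]) -> bytes:
    """Encode (word, word_class, group) rows as header + prefix-compressed entries.

    Assumptions: offsets count bytes from the start of this data (header
    included), letters without words get 0, and words are non-empty ASCII.
    """
    offsets = [0] * 26
    entries = bytearray()
    previous = ""
    for word, word_class, group in sorted(rows, key=lambda r: r[0]):
        letter = ord(word[0]) - ord("a")
        if 0 <= letter < 26 and offsets[letter] == 0:
            # First word for this letter: point the header at this entry.
            offsets[letter] = HEADER_SIZE + len(entries)
        shared = min(len(commonprefix([previous, word])), len(word) - 1, 255)
        suffix = word[shared:].encode("ascii")
        entries.append(shared)
        entries += suffix[:-1]
        entries.append(suffix[-1] | 0x80)  # high bit marks the last character
        entries += bytes([
            (word_class >> 4) & 0xFF,  # class bits 11..4
            ((word_class & 0x0F) << 4) | ((group >> 8) & 0x0F),  # class 3..0, group 11..8
            group & 0xFF,  # group bits 7..0
        ])
        previous = word
    # struct.error here would mean an offset no longer fits in 16 bits.
    return struct.pack("<26H", *offsets) + bytes(entries)
```

With `build_vocab([("cat", 1, 2), ("apple", 3, 4)])`, the 'a' offset is 52 (right after the header) and the 'c' offset is 61, because the "apple" entry takes 9 bytes (count byte, five characters, three class/group bytes). Comparing offsets computed this way against what the tool writes would be one way to narrow down where the 'c' entry goes wrong.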
« Last Edit: April 04, 2012, 09:17:24 AM by gumby »