Topic: Message file format?

ZvikaZ

Message file format?
« on: February 28, 2021, 10:42:53 AM »
Is there some documentation for the message file format?
(besides reading ScummVM's message.cpp, or the original https://github.com/OmerMor/SCI16/blob/master/INTERP/MESSAGE.C)



Kawa

Re: Message file format?
« Reply #1 on: February 28, 2021, 03:02:50 PM »
There are three common versions in use. They're laid out like this:

First, the resource identifier magic bytes as usual. Then two bytes for the version number (up to 2101 for lame, up to 3411 for okay, anything higher for best), then two pad bytes.

Then, if it's a lame version, you get two bytes for the number of messages, then that many times you get a byte for the noun, a byte for the verb, and two bytes for the offset to the text. After all of that come the actual message strings, one after another. Anything after that is general use space for comments or whatever, which can be ignored.

If it's an okay version, you skip two bytes, then read the number of messages (two bytes). Each entry is one byte each for the noun, verb, condition, sequence, and talker, then a two-byte offset to the text, and three bytes of padding. Again, the actual text follows.

If it's the best version, you get two bytes as a pointer to the end (to quickly find the comment area?), two bytes of mystery that aren't actually used, then two bytes message count. Entries are the same as in the okay version, except that where the okay version has three padding bytes, the best version has four reference bytes, in the order noun/verb/cond/seq.

And that's the format!
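
For anyone who wants that as code, here is a rough C sketch of the version check and the message count lookup described above. It is not taken from the interpreter: the names (msg_format, msg_classify, msg_entry_size, msg_count) are made up for illustration, little-endian byte order is assumed, and the buffer is assumed to start at the version field, right after the resource identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three layouts ("lame", "okay", "best"). */
enum msg_format { MSG_NOUN_VERB, MSG_PADDED, MSG_REFS };

/* Read a 16-bit value, assuming little-endian byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Classify a version number using the thresholds quoted in the post. */
static enum msg_format msg_classify(uint16_t version)
{
    if (version <= 2101) return MSG_NOUN_VERB;
    if (version <= 3411) return MSG_PADDED;
    return MSG_REFS;
}

/* Bytes per index entry in each layout. */
static size_t msg_entry_size(enum msg_format fmt)
{
    switch (fmt) {
    case MSG_NOUN_VERB: return 4;   /* noun, verb, offset(2) */
    case MSG_PADDED:    return 10;  /* noun, verb, cond, seq, talker, offset(2), pad(3) */
    default:            return 11;  /* as above, but 4 reference bytes instead of 3 pad bytes */
    }
}

/*
 * Read the message count from `data`, which is assumed to start at the
 * version field (just after the resource identifier).
 * Returns -1 if the buffer is too short.
 */
static int msg_count(const uint8_t *data, size_t len)
{
    size_t pos = 4;                       /* version(2) + pad(2) */
    enum msg_format fmt;

    assert(data != NULL);
    if (len < 4) return -1;
    fmt = msg_classify(rd16(data));
    if (fmt == MSG_PADDED) pos += 2;      /* two skipped bytes */
    if (fmt == MSG_REFS)   pos += 4;      /* end pointer(2) + unused word(2) */
    if (len < pos + 2) return -1;
    return rd16(data + pos);
}
```

The index entries start right after the count, msg_entry_size() bytes apiece.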

I know for a fact that the end of version >4000 is used for optional comments because I've seen the actual Sierra Message Editor matching that version, and I know for a well-informed guess that the padding in the middle version is just that.

In SCI11+, the comment area ends with "UTF8" if the message text is to be interpreted that way. This is for my tools' use, not the interpreter itself. The interpreter doesn't give a rat's behind.
« Last Edit: February 28, 2021, 06:35:18 PM by Kawa »

OmerMor

Re: Message file format?
« Reply #2 on: February 28, 2021, 04:41:30 PM »
Here's the actual struct for the best version:
Code:
_Packed struct MsgData {
    UInt32 version;
    UInt16 length;      // size of file starting at count field, excluding comments
    UInt16 lastMsgNum;
    UInt16 nEntries;
    IndexEntry entries[1];
};

There goes the mystery out of your mystery bytes.  ;D

Kawa

Re: Message file format?
« Reply #3 on: February 28, 2021, 06:34:40 PM »
Y'know what?

I could've known that. It says so right here in my copy of the sauce! But in the end it doesn't really matter, because lastMsgNum is unused.

ZvikaZ

Re: Message file format?
« Reply #4 on: March 11, 2021, 05:09:40 PM »
Thanks for the detailed answer!

Can you translate 'lame/ok/best' ( ;D) to some more common terms? (SCI1.1, etc.)

And what's the meaning of the best's reference bytes? What are they used for?

Daventry

Re: Message file format?
« Reply #5 on: March 12, 2021, 01:06:33 AM »
ZvikaZ,
you already asked about the messages. Here's the discussion:
http://sciprogramming.com/community/index.php?topic=1960.msg14138#msg14138
Official documentation is only available for version 4000+, thanks to OmerMor. There are three versions of messages in total: 2000+, 3000+, 4000+. Here are my notes, which I made for myself. Sorry about the English. This is a Google translation.
* .msg
Game messages. These are the basic files for translation.
Three slightly different versions are known. The differences are not too important, but still.
I found no exact version numbering or public record of the changes over time, so somewhat tentatively I number them as follows:
version 2000, 3000, 4000.

1. Version 2000 (version number inside the message file >= 2000 and < 3000).
Block 1 - file header, size = 8 bytes
Offset  Size      Description
00      2         Message file identifier, equal to 0x8F or 0x0F
                  (in some cases resource types are numbered from the base value 0x80,
                  which is added to the identifier, so 0x0F + 0x80 gives the identifier 0x8F)
02      4         Version number (0x00000835 = 2101, but not necessarily this value; other numbers between 2000 and 3000 occur,
                  for example, the diskette versions of EQ1 contain 2100, 2101, 2110, 2054)
06      2         Number of messages in the file

Block 2 - message addresses, size = 4 bytes per message
Offset  Size      Description
00      1         Noun
01      1         Verb
02      2         Address of the message text ("address" = "this value" + 2)

Block 3 - the messages themselves
Offset  Size      Description
00      variable  Message text, terminated by a 0 byte

Block 4 - developer comments (not used in the game); this block may be absent altogether
Offset  Size      Description
00      2         Always 0 (?), purpose unknown
02      variable  Comment text, terminated by a 0 byte
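
As a rough illustration of the version 2000 layout above, here is a small C sketch that looks up a message by noun and verb. The function name msg2000_find is made up, little-endian byte order is assumed, and the buffer is assumed to hold the whole file including the 2-byte identifier, so that "this value" + 2 is an offset from the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value, assuming little-endian byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Find the text for (noun, verb) in a version 2000 message file.
 * `file` is assumed to start at the 2-byte identifier.
 * Returns NULL if there is no such entry or the data looks truncated.
 */
static const char *msg2000_find(const uint8_t *file, size_t len,
                                uint8_t noun, uint8_t verb)
{
    size_t count, i;

    assert(file != NULL);
    if (len < 8) return NULL;                 /* block 1: 8-byte header */
    count = rd16(file + 6);                   /* number of messages */
    if (len < 8 + 4 * count) return NULL;     /* block 2 must fit */

    for (i = 0; i < count; i++) {
        const uint8_t *e = file + 8 + 4 * i;  /* 4 bytes per entry */
        if (e[0] == noun && e[1] == verb) {
            size_t addr = (size_t)rd16(e + 2) + 2;   /* "this value" + 2 */
            /* The text must start inside the buffer and be 0-terminated. */
            if (addr >= len || memchr(file + addr, 0, len - addr) == NULL)
                return NULL;
            return (const char *)(file + addr);
        }
    }
    return NULL;
}
```

The comment block (block 4), if present, is never touched by this lookup, which matches the note that the game does not use it.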

2. Version 3000 (version number inside the message file >= 3000 and < 4000).
Block 1 - file header, size = 10 bytes
Offset  Size      Description
00      2         Message file identifier, equal to 0x8F or 0x0F
02      4         Version number (0x00000d53 = 3411, but not necessarily this value; for example, 3300, 3340 and 3400 are found in QfG1vga)
06      2         Pointer to the first byte AFTER the text data, i.e. the start of block 4 ("address" = "this value" + 8 bytes).
                  In effect, this is the file size WITHOUT comments
08      2         Number of messages in the file

Block 2 - message addresses, size = 10 bytes per message
Offset  Size      Description
00      1         Noun (the object the action is performed on; the list is in the corresponding .hep file)
01      1         Verb (the action, i.e. which icon is selected in the menu, or which inventory item)
02      1         Case (state, position, condition)
03      1         Sequence (several consecutive message windows can be shown)
04      1         Talker
05      2         Message address ("real address" = "this address" + 2)
07      3         Meaning unclear

Block 3 - the messages themselves
Offset  Size      Description
00      variable  Message text, terminated by a 0 byte

Block 4 - comments (most likely absent)

3. Version 4000 (version number inside the message file >= 4000 and <= 5000).
Block 1 - file header, size = 12 bytes
Offset  Size      Description
00      2         Message file identifier, equal to 0x8F or 0x0F
02      4         Version number (0x00000faa = 4010, but not necessarily this value)
06      2         Pointer to the first byte AFTER the text data, i.e. the start of block 4 ("address" = "this value" + 8 bytes).
                  In effect, this is the file size WITHOUT comments
08      2         Number of the last message; its purpose is unclear.
                  Almost always greater than the number of messages (the next field).
0A      2         Number of messages in the file

Block 2 - message addresses, size = 11 bytes per message
Offset  Size      Description
00      1         Noun (the object the action is performed on; the list is in the corresponding .hep file)
01      1         Verb (the action, i.e. which icon is selected in the menu, or which inventory item)
02      1         Case (condition)
03      1         Sequence (several consecutive message windows can be shown)
04      1         Talker
05      2         Message address ("real address" = "this address" + 2)
07      1         Noun of the referenced message (this and the next 3 values form a reference to another message)
08      1         Verb of the referenced message
09      1         Condition of the referenced message
0A      1         Sequence of the referenced message (not really used in the game, probably introduced "just in case" :))

Block 3 - the messages themselves
Offset  Size      Description
00      variable  Message text, terminated by a 0 byte

Block 4 - developer comments (not used in the game); this block may be absent altogether
Offset  Size      Description
00      variable  Comment text, terminated by a 0 byte
..      6         Meaning unclear; the 6th byte is always 0
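
To make the version 4000 tables concrete, here is a C sketch that decodes one 11-byte entry from block 2. The struct and function names (msg4000_entry, msg4000_entry_at) are made up for illustration, little-endian byte order is assumed, and the buffer is assumed to contain the whole file starting at the 2-byte identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded block 2 entry of a version 4000 file (hypothetical struct). */
struct msg4000_entry {
    uint8_t noun, verb, cond, seq, talker;
    size_t  text_addr;                        /* offset of the text in the file */
    uint8_t ref_noun, ref_verb, ref_cond, ref_seq;
};

/* Read a 16-bit value, assuming little-endian byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode entry number `index` (0-based) into `out`.
 * Returns 1 on success, 0 if the index is out of range or the file is short.
 */
static int msg4000_entry_at(const uint8_t *file, size_t len, size_t index,
                            struct msg4000_entry *out)
{
    const uint8_t *e;
    size_t count, pos;

    assert(file != NULL && out != NULL);
    if (len < 12) return 0;                   /* block 1: 12-byte header */
    count = rd16(file + 0x0A);                /* number of messages */
    if (index >= count) return 0;
    pos = 12 + 11 * index;                    /* 11 bytes per entry */
    if (len < pos + 11) return 0;

    e = file + pos;
    out->noun      = e[0];
    out->verb      = e[1];
    out->cond      = e[2];
    out->seq       = e[3];
    out->talker    = e[4];
    out->text_addr = (size_t)rd16(e + 5) + 2; /* "this address" + 2 */
    out->ref_noun  = e[7];
    out->ref_verb  = e[8];
    out->ref_cond  = e[9];
    out->ref_seq   = e[10];
    return 1;
}
```

The caller still has to check that text_addr lies inside the file before reading the string.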

It can be seen that the format did not change much. First only the Noun + Verb combination was used, then Case, Sequence and Talker were added, and finally the reference values Noun, Verb, Case and Sequence.
The system works as follows. When the player clicks on something, a "verb", "noun" pair is generated. The cursor type is the verb, and the object on the screen is the noun. For example, we click the "eye" on the "door". This generates DOOR LOOK, which is translated into the corresponding numerical values 4 1. A condition is then added: if the state of the door or the environment does not matter, the result is 4 1 0. If the door can be open or closed, or for example night has fallen and that matters, there will be a corresponding number of "conditions". Nothing else needs to be coded; the message with the matching codes is simply looked up. Several messages may appear in the same situation, and Sequence gives the position of the message in that chain.
The noun-verb-case-sequence combination is unique within each "room".
The reference values exist to avoid duplicating the same message when that would otherwise happen: several different messages have empty text but carry a link to one message that is common to those situations.
A value of 0 is always interpreted as ALL, i.e. any verb, any noun, any condition. Inventory items are verbs: for example, "knife" should be read as "act with a knife on ...".
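
The lookup and the reference mechanism described above could be sketched in C roughly like this. It works on an already decoded table rather than on the raw file. The names (msg_entry, msg_find, msg_resolve) are made up, the hop limit of 8 is arbitrary, and treating a reference sequence of 0 as 1 is my own assumption, not something stated in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A decoded message entry (hypothetical in-memory form). */
struct msg_entry {
    uint8_t noun, verb, cond, seq;
    const char *text;
    uint8_t ref_noun, ref_verb, ref_cond, ref_seq;
};

/* Find the entry whose noun/verb/cond/seq match exactly, or NULL. */
static const struct msg_entry *msg_find(const struct msg_entry *tab, size_t n,
                                        uint8_t noun, uint8_t verb,
                                        uint8_t cond, uint8_t seq)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tab[i].noun == noun && tab[i].verb == verb &&
            tab[i].cond == cond && tab[i].seq == seq)
            return &tab[i];
    }
    return NULL;
}

/*
 * Look up a message and follow references while the text is empty.
 * Returns NULL if nothing matches or the reference chain is too long.
 */
static const char *msg_resolve(const struct msg_entry *tab, size_t n,
                               uint8_t noun, uint8_t verb,
                               uint8_t cond, uint8_t seq)
{
    int hops;

    assert(tab != NULL || n == 0);
    for (hops = 0; hops < 8; hops++) {        /* arbitrary guard against loops */
        const struct msg_entry *e = msg_find(tab, n, noun, verb, cond, seq);
        if (e == NULL) return NULL;
        if (e->text[0] != '\0') return e->text;
        if (!e->ref_noun && !e->ref_verb && !e->ref_cond)
            return e->text;                   /* empty text, no reference */
        noun = e->ref_noun;
        verb = e->ref_verb;
        cond = e->ref_cond;
        seq  = e->ref_seq ? e->ref_seq : 1;   /* assumption: 0 means first */
    }
    return NULL;
}
```

With DOOR = 4 and LOOK = 1 as in the example, msg_resolve(tab, n, 4, 1, 0, 1) would return the first message for "look at the door" in the default condition.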

Default values.
Verbs:
ALL  LOOK  TALK  WALK  DO
0    1     2     3     4
Nouns:
ALL
0
Conditions:
ALL
0
Talkers:
PrintMSG  SysMSG  NARRATOR
97        98      99
In a particular game these values can be redefined; for example, in KQ6 "DO" is number 5.

A little about the TraduSCI format (SCIaMano). Enrico Rolfi figured out how to preserve the original messages when translating. He added block 5, which stores the original messages, and block 2a (between blocks 2 and 3), which holds a signature and some service information.

