Author Topic: Vocab.900 - The 'Black Box'  (Read 31122 times)

Offline gumby

Vocab.900 - The 'Black Box'
« on: December 11, 2010, 05:17:23 PM »
I've been trying to understand the parse trees (using the FreeSCI parser documentation heavily).  Making good progress, I think.

Am I correct in thinking that this resource holds all valid combinations of said strings?  For example:  the user input 'turn on flashlight' translates to word classes of 'verb preposition noun', which has the corresponding hexadecimal values of 800 010 100.  And I can see some word classes referenced in some of the parser rules in the resource.  Also, I can see that the resource also refers to the 3 possible parts of the said string (predicate/subject/suffix) & some terminal values, but I haven't cracked it yet.
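To make the word-class idea concrete, here is a minimal Python sketch of the lookup described above. The three class values (800, 010, 100) are the ones quoted in this post; the tiny vocabulary `TOY_VOCAB` and the function name `classify` are made up for illustration, since a real game keeps this information in its vocab resource.

```python
# Class masks quoted in the post: verb 0x800, preposition 0x010, noun 0x100.
CLASS_MASKS = {
    "verb": 0x800,
    "preposition": 0x010,
    "noun": 0x100,
}

# Hypothetical mini-vocabulary, for illustration only.
TOY_VOCAB = {
    "turn": "verb",
    "on": "preposition",
    "flashlight": "noun",
}

def classify(sentence):
    """Return the class mask of each word as a 3-digit hex string."""
    words = sentence.lower().split()
    return [format(CLASS_MASKS[TOY_VOCAB[word]], "03x") for word in words]

print(classify("turn on flashlight"))  # ['800', '010', '100']
```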

Not exactly expecting help here, I guess this is just a status report...
« Last Edit: December 11, 2010, 05:41:30 PM by gumby »


In the Great Underground Empire (Zork port in development)
Winter Break 2012 Rope Prop Competition

Offline MusicallyInspired

Re: Vocab.900 - The 'Black Box'
« Reply #1 on: December 11, 2010, 05:43:47 PM »
Dude, you're awesome for tackling all this. I barely understand how this stuff works to begin with and you're just powering through! The parser is definitely the most ambiguously documented feature of the SCI engine.
Brass Lantern Prop Competition

Offline gumby

Re: Vocab.900 - The 'Black Box'
« Reply #2 on: December 11, 2010, 09:40:18 PM »
Hey, thanks.  It's basically necessary that I understand this stuff for my port of Zork to SCI.  At this point, it's all up to the parser for any of it to work, so I might as well figure it all out.  I would hate to do something the hard way if the parser already has provisions for it.

Offline gumby

Re: Vocab.900 - The 'Black Box'
« Reply #3 on: December 12, 2010, 03:40:37 PM »
Okay, I'm on my way.  I've dissected the file & I think I've got it.  It's all the parse tree rules - I re-read (and continue to do so) the great parser info from FreeSCI.  I've inputted all the trees into a spreadsheet & am attempting to convert it from 'symbol-based' to something that I can post here & explain without getting lost in the technical details.

Offline Collector

Re: Vocab.900 - The 'Black Box'
« Reply #4 on: December 12, 2010, 09:47:32 PM »
We'll have to give you a whole section on the Wiki for all you've worked out about the parser. I don't think anyone has teased out how to use the parser as much as you have in a how-to way, at least not while also documenting it in as much detail. Looking over Brian's help file I have found a lot of general SCI information, but not as much explaining how to use it. Perhaps I'll find more of that in his tutorial when I get back to entering it into the Wiki.

Edit: I have added an "Exploring the Parser" link in the navigation bar to link all of the parser stuff.
« Last Edit: December 12, 2010, 10:00:03 PM by Collector »
KQII Remake Pic

Offline MusicallyInspired

Re: Vocab.900 - The 'Black Box'
« Reply #5 on: December 13, 2010, 01:07:54 AM »
That is fan-flipping-tastic. This is going to be one loaded and super useful Wiki!

Offline Collector

Re: Vocab.900 - The 'Black Box'
« Reply #6 on: December 13, 2010, 02:54:53 AM »
Quote from MusicallyInspired: "That is fan-flipping-tastic. This is going to be one loaded and super useful Wiki!"
I think so. I only have about 30 more pages to do of the help files and then I can move back to the tutorial. You will be able to load the Wiki in one tab and the Help files in the other for easy reference. Both have links to each other at the top of their nav boxes. I imagine many will have them loaded in their browsers as they work on their SCI projects.

Offline gumby

Re: Vocab.900 - The 'Black Box'
« Reply #7 on: December 14, 2010, 12:29:36 AM »
Here's my first try at explaining this:

Basically, you start with your input sentence at the first rule, moving down the list until all the 'non-terminals' are finally replaced with 'terminals'.  Not all rules apply to all parses, and sometimes a parse will arrive at a 'dead-end', where the parser will 'walk' back up the tree & try the next possible rule.  Note that there can be several rule strings for any one rule number (I've separated them by rule number in the table below).

Important notes:  This grammar is ambiguous!  This means that you could parse an input sentence multiple ways (I won't show it here, but trust me).  Two things overcome this problem.  First of all, it is a left-handed parse, meaning we always proceed left-to-right through our input string matching tokens (words).  Second, the order of the rules in the grammar matters: the first complete parse is the one that is accepted, so a rule higher up in the list takes precedence in the parse.
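The two points above (left-to-right matching, and rule order deciding between ambiguous parses) can be sketched with a small backtracking parser. This is only an illustration in Python: `TOY_GRAMMAR` is a made-up ambiguous grammar, not the real vocab.900 rules, and the function names are my own.

```python
# Hypothetical toy grammar (not the real vocab.900 rules): each symbol maps to
# an ordered list of alternatives; anything that is not a key is a terminal.
TOY_GRAMMAR = {
    "S":  [["VP"], ["VP", "NP"]],
    "VP": [["verb"], ["verb", "noun"]],
    "NP": [["noun"]],
}

def parses(grammar, symbol, tokens, pos=0):
    """Yield (tree, next_pos) for each way `symbol` matches tokens[pos:].

    Alternatives are tried in listed order and tokens are consumed left to
    right; running out of options for one alternative and moving on to the
    next is the 'walk back up the tree' step.
    """
    if symbol not in grammar:  # terminal: compare with the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in grammar[symbol]:
        for children, end in match_sequence(grammar, alternative, tokens, pos):
            yield (symbol, children), end

def match_sequence(grammar, symbols, tokens, pos):
    """Yield (children, next_pos) for each way the symbol list matches."""
    if not symbols:
        yield [], pos
        return
    for first, middle in parses(grammar, symbols[0], tokens, pos):
        for rest, end in match_sequence(grammar, symbols[1:], tokens, middle):
            yield [first] + rest, end

def complete_parses(grammar, start, tokens):
    """All parses that consume every token, in the order they are found."""
    return [tree for tree, end in parses(grammar, start, tokens)
            if end == len(tokens)]

tokens = ["verb", "noun"]
found = complete_parses(TOY_GRAMMAR, "S", tokens)
print(len(found))  # how many complete parses this ambiguous input has
print(found[0])    # the first one found is the parse that would be accepted
```

With this toy grammar the input has two complete parses; because the "verb noun" alternative is reached first, that is the one an order-sensitive parser would keep.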


First, I've translated the hex 'operators' into abbreviated actual values.  Here is the table for that:

Abbr  Type          Explanation
----  ------------  -----------
Pred  Non-terminal  Predicate part: This identifies the first part of a sentence
Subj  Non-terminal  Subject part: This identifies the second part of a sentence
Suff  Non-terminal  Suffix part: This identifies the third and last part of a sentence
Refe  Non-terminal  Reference part: This identifies words that reference another word in the same sentence part
mCla  Terminal      Match on class mask
mWor  Terminal      Match on word group (note: does not appear in this grammar)
Forc  Terminal      Force storage: Apparently, this was only used for debugging (nor does this appear in this grammar)

Let's start with an example, 'open door', to walk through the table.  For reference, 'open' is an imperative verb while 'door' is a noun.  For now, we will ignore the meanings & focus on the 'values'.  Note that the rules that actually apply to the successful parse (shown with a red step number in the forum post) are the ones collected in the recap after the table.

Step  Grammar Rule                                     Notes
----  -----------------------------------------------  -----
01.   013f -> 013c                                     Really just a placeholder.  It instructs us to proceed to rule 013c.
02.   013c -> 013b                                     The first rule in the 013c 'set' instructs us to proceed to rule 013b.
03.   013b -> 0136, 013b                               The first rule in the 013b 'set' instructs us to proceed to rule 0136.
04.   0136 is a terminal rule, the 'indicative verb'   Not a match to our first word in our input string (looking for an imperative verb).
05.   013b -> 0136, 0130, 013b                         0136 is no match (same as #4 above)
06.   013b -> 0136                                     0136 is no match, skip
07.   013b -> 0136, 0133, 012f                         0136 is no match, skip
08.   013b -> 0136, 0133, 0133, 012f                   0136 is no match, skip
09.   013b -> 012f
10.   012f is a terminal rule - the 'imperative verb'  A match for our 1st token in our input string!
11.   Now the 2nd token in our input string            Start back at the beginning of the parse rules...
12.   013f -> 013c
13.   013c -> 013b                                     skip - all (9) 013b rules dead-end with no match on our noun terminal (0130).  Details removed for brevity...
14.   013c -> 013d                                     proceed to rule 013d
15.   013d -> 0131, 0139                               skip - we don't have enough input tokens left to fulfill this rule
16.   013d -> 0131, 013a, 0139                         skip, not enough input tokens
17.   013d -> 013a                                     skip - ultimately dead-ends with no 0130 to match our noun...
18.   013d -> 0139                                     proceed to this rule...
19.   0139 -> 0130, 0135, 013d                         skip, not enough tokens
20.   0139 -> 0130, 0139                               skip, not enough tokens
21.   0139 -> 0130                                     proceed...
22.   0130 is a terminal rule - the 'noun'             a match for our 2nd token in our input string!
23.   Finished.  No more tokens.

Here's a recap of which rules actually contributed to our parse:
Step  Input string  Next rule / notes
----  ------------  -----------------
0.    open door     Before parse...
1.    013f door     (next rule: 013c)
9.    013c door     (next rule: 012f)
10.   012f door     (terminal, imp. verb - done with token 1)
12.   012f 013f     (starting at beginning of parse tree with 2nd token)
14.   012f 013c     (next rule: 013d)
18.   012f 013d     (next rule: 0139)
21.   012f 0139     (next rule: 0130)
22.   012f 0130     (terminal, noun - done with token 2)
23.   Done
Note:  In many of the rules above, I cheated for brevity.  Technically, we should take the first rule in the list & navigate all the way down the 'tree' (or parse rules) to a terminal (if possible).  If all the tokens in a parse are terminated, but there are 'incomplete' rules (rules left in the parse, but no tokens to apply them to), then it's treated as 'no match'.
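The 'open door' walkthrough can also be replayed mechanically. Below is a rough Python sketch under several assumptions that are mine, not the post's: only the rule values are kept (the Pred/Subj/Refe labels are dropped), only the first few alternatives of 013c and 013b are included (in the order given in this post), a terminal matches when the word's class shares a bit with the rule's mask, and the whole sentence is derived from 013f in one pass rather than restarting at the top for each token as the step list does. All function names are hypothetical.

```python
# Rule values only, in the order given in this post (subset of the rules).
RULES = {
    "013f": [["013c"]],
    "013c": [["013b"], ["013b", "013d", "0133"], ["013b", "013d", "013a"],
             ["013b", "013d", "0137"], ["013b", "013d"]],
    "013d": [["0131", "0139"], ["0131", "013a", "0139"], ["013a", "0139"],
             ["013a"], ["0139"]],
    "0139": [["0130", "0135", "013d"], ["0130", "0139"], ["0130"]],
    "013b": [["0136", "013b"], ["0136", "0130", "013b"], ["0136"],
             ["0136", "0133", "012f"], ["0136", "0133", "0133", "012f"],
             ["012f"]],
    "013a": [["0132"], ["0132", "013a"]],
}

# Terminal rule -> class mask. 012f is listed in this post with both 0800 and
# 0200, so the two masks are combined here (an assumption).
TERMINALS = {
    "012f": 0x0800 | 0x0200, "0136": 0x0200, "0137": 0x0400, "0130": 0x0100,
    "0132": 0x0040, "0131": 0x0020, "0133": 0x0010, "0135": 0x0004,
}

def derive(symbol, classes, pos=0):
    """Yield (tree, next_pos) for each way `symbol` matches classes[pos:]."""
    if symbol in TERMINALS:
        if pos < len(classes) and classes[pos] & TERMINALS[symbol]:
            yield symbol, pos + 1
        return
    for alternative in RULES[symbol]:
        for children, end in derive_all(alternative, classes, pos):
            yield (symbol, children), end

def derive_all(symbols, classes, pos):
    """Yield (children, next_pos) for each way a rule body matches."""
    if not symbols:
        yield [], pos
        return
    for first, middle in derive(symbols[0], classes, pos):
        for rest, end in derive_all(symbols[1:], classes, middle):
            yield [first] + rest, end

def first_parse(classes, start="013f"):
    """Return the first derivation that uses up every token, else None."""
    for tree, end in derive(start, classes):
        if end == len(classes):
            return tree
    return None

def terminals_used(tree):
    """List the terminal rules at the leaves of a parse tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in terminals_used(child)]

open_door = [0x0800, 0x0100]  # 'open' = imperative verb, 'door' = noun
tree = first_parse(open_door)
print(tree)
print(terminals_used(tree))  # ['012f', '0130']
```

A derivation that matches only some of the tokens (for example 013c -> 013b alone, which stops after 'open') is discarded by `first_parse`, which is one way to read the 'no match' remark in the note above.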

Here are all the parse rules in the 'black box', represented in order.  Note that M is the 'meaning' of V, the value.

Rule  (M)   (V)   (M)   (V)   (M)   (V)   (M)   (V)
----  ----  ----  ----  ----  ----  ----  ----  ----
013f  Pred  013c

013c  Pred  013b
013c  Pred  013b  Subj  013d  Refe  0133
013c  Pred  013b  Subj  013d  Refe  013a
013c  Pred  013b  Subj  013d  Refe  0137
013c  Pred  013b  Subj  013d
013c  Pred  013b  Subj  013d  Refe  0133  Suff  013d
013c  Pred  013b  Suff  013d  Subj  013d
013c  Pred  013b  Subj  013d  Suff  013e
013c  Pred  013b  Suff  013e
013c  Subj  013d
013c  Subj  013d  Suff  013e
013c  Refe  013b  Refe  013d  Refe  0133  Pred  013c
013c  Refe  013d  Pred  013c

013d  0145  0131  Pred  0139             // I have *no* idea what operator 0145 corresponds to
013d  0145  0131  Refe  013a  Pred  0139  
013d  Refe  013a  Pred  0139
013d  Pred  013a
013d  Pred  0139

013e  Refe  0138  Pred  013d
013e  Refe  0136  Pred  013d
013e  Refe  0138  Pred  0136  Refe  013d
013e  Refe  0138  Pred  0136  Refe  0132
013e  Pred  013b
013e  Refe  013b  Pred  013e
013e  Refe  0138  Pred  013d  Refe  0133  Refe  013d

0139  Pred  0130  0145  0135  Pred  013d
0139  Refe  0130  Pred  0139
0139  Pred  0130

013b  Refe  0136  Pred  013b
013b  Refe  0136  Refe  0130  Pred  013b
013b  Pred  0136
013b  Refe  0136  Refe  0133  Pred  012f
013b  Refe  0136  Refe  0133  Refe  0133  Pred  012f
013b  Pred  012f
013b  Pred  012f  Refe  0133
013b  Pred  012f  Refe  0133  Refe  0133
013b  Pred  012f  Refe  013a
013b  Pred  012f  Refe  0137
013b  Refe  0137  Refe  0137  Pred  013b
013b  Refe  0137  Pred  013b
013b  Pred  0137
013b  Refe  0137  Refe  0130  Pred  013b
013b  Pred  012f  0145  0135  Pred  012f
013b  Pred  012f  Pred  012f  0145  0135  Pred  012f

013a  Pred  0132
013a  Refe  0132  Pred  013a

(Terminals)
0137  mCla  0400  // Adverb
012f  mCla  0800  // Imperative Verb
012f  mCla  0200  // Indicative verb
0136  mCla  0200  // Indicative verb
0130  mCla  0100  // Noun
0134  mCla  0080  // Pronoun
0132  mCla  0040  // Adjective
0131  mCla  0020  // Article
0133  mCla  0010  // Preposition
0138  mCla  0008  // Special
0135  mCla  0004  // Special
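
As a quick illustration of what 'match on class mask' could look like in practice, here is a small Python sketch built from the terminal list above. It assumes the match is a bitwise AND between the word's class and the rule's mask, so a word carrying more than one class bit can satisfy more than one terminal; that reading, and the function name `matching_terminals`, are assumptions rather than something confirmed in this thread.

```python
# (rule, mask, meaning) exactly as listed under "(Terminals)".
TERMINAL_RULES = [
    ("0137", 0x0400, "Adverb"),
    ("012f", 0x0800, "Imperative verb"),
    ("012f", 0x0200, "Indicative verb"),
    ("0136", 0x0200, "Indicative verb"),
    ("0130", 0x0100, "Noun"),
    ("0134", 0x0080, "Pronoun"),
    ("0132", 0x0040, "Adjective"),
    ("0131", 0x0020, "Article"),
    ("0133", 0x0010, "Preposition"),
    ("0138", 0x0008, "Special"),
    ("0135", 0x0004, "Special"),
]

def matching_terminals(word_class):
    """Return the terminal rules whose mask shares a bit with word_class."""
    return [rule for rule, mask, _ in TERMINAL_RULES if word_class & mask]

print(matching_terminals(0x0800))           # imperative verb
print(matching_terminals(0x0200))           # indicative verb
print(matching_terminals(0x0800 | 0x0100))  # a word that is both verb and noun
```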

In my next post, I'll explain the other part of the parse - the semantic part... (using the other tokens in the grammar we did not touch on here, the subject, predicate, suffix & reference).  From this exercise we have all the 'words' for the Said() string; the semantic part will determine the way the words are 'joined' together in the said string ('/', '<', etc).
« Last Edit: December 15, 2010, 08:01:06 AM by gumby »

Offline gumby

Re: Vocab.900 - The 'Black Box'
« Reply #8 on: December 14, 2010, 09:48:18 AM »
Someone had asked at one point how the developers at Sierra handled some of the 'ugly parses'.  I believe the answer to that lies in the 0x14d token - the terminal that matches on a word group.

The existing 'black box' currently does not use the 'match on word group' token.  However, it could be used if you had some off-the-wall sentence that needed special handling.  Just plug in a couple more rules with the word groups you want and have some special parse rules for those words.  I'll have to hack an example together later (maybe we ought to try 'basket of goodies' or something that doesn't seem to work right out of the box... focus on the 'of' word group here).
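
Until a real example exists, here is a purely hypothetical Python sketch of the difference between the two terminal kinds. Nothing here comes from an actual game: the `Token` type, the group numbers, and the function `terminal_matches` are invented, and the class-mask test is assumed to be a bitwise AND.

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    class_mask: int  # e.g. 0x010 for a preposition
    group: int       # word group number (the values below are made up)

def terminal_matches(kind, value, token):
    """Check one terminal against one token.

    'mCla' compares against the word's class mask, 'mWor' against its
    word group, so a word-group terminal can single out one specific word.
    """
    if kind == "mCla":
        return bool(token.class_mask & value)
    if kind == "mWor":
        return token.group == value
    raise ValueError(f"unknown terminal kind: {kind}")

OF_GROUP = 0x123  # hypothetical group number for 'of'
of_token = Token("of", 0x010, OF_GROUP)
on_token = Token("on", 0x010, 0x124)

# Both words are prepositions, so a class-mask terminal cannot tell them apart...
print(terminal_matches("mCla", 0x010, of_token), terminal_matches("mCla", 0x010, on_token))
# ...but a word-group terminal matches only 'of'.
print(terminal_matches("mWor", OF_GROUP, of_token), terminal_matches("mWor", OF_GROUP, on_token))
```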

I also had another revelation.  The black box combined with the ability for words to have synonyms (word groups) totally addresses game internationalization.  I know I've seen a game by Robin that is all in French, and I assume it uses the same black box as the other games (I don't know how this works; maybe the black box works just fine with Romance languages?  Perhaps it was created with foresight & all the parse rules are already there for multiple languages?).

My point is that translating a game to a different language would be a 3 step process:
1.  Update the vocab to have synonyms for all the used words in the foreign language
2.  Update/Replace the black box with rules that make sense for the foreign language
3.  Translate all user output (text resources, etc) to the foreign language.

Hell, you could have one game support multiple languages and have the user select their desired language in the user preferences (assuming the parser could handle parse rules from both languages simultaneously).  You'd have 2 sets of text resources, one for each language, but that's no problem.  
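
Step 1 of that list is easy to picture in code. The Python sketch below uses an invented vocabulary in which the English and French words share a word group, so both inputs reduce to the same group sequence before any grammar rule is applied. The group numbers and the helper name `to_groups` are hypothetical, and steps 2 and 3 (grammar rules and text resources) are not covered.

```python
# word -> (class mask, word group); the group numbers are made up.
VOCAB = {
    "open":  (0x800, 1),
    "ouvre": (0x800, 1),  # French synonym placed in the same group as 'open'
    "door":  (0x100, 2),
    "porte": (0x100, 2),  # French synonym placed in the same group as 'door'
}

def to_groups(sentence):
    """Reduce a sentence to the word-group numbers of its words."""
    return [VOCAB[word][1] for word in sentence.lower().split()]

print(to_groups("open door"))    # [1, 2]
print(to_groups("ouvre porte"))  # [1, 2]
```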

This is unbelievably awesome coming from an engine so old.  Internationalization is still a major problem in today's computing world with current applications.  I am friggin' impressed.
« Last Edit: December 14, 2010, 09:51:10 AM by gumby »

Offline MusicallyInspired

Re: Vocab.900 - The 'Black Box'
« Reply #9 on: December 14, 2010, 10:47:17 AM »
Sierra was always way ahead of its time in everything they did. It's impressive but certainly not surprising to me. Sierra was awesome.

Offline lance.ewing

Re: Vocab.900 - The 'Black Box'
« Reply #10 on: December 15, 2010, 05:09:21 PM »
To be honest I never spent much time looking into decoding SCI back when I was writing all those tools and specs for AGI. But having worked as a Java developer for the past 13 years, it occurs to me now that the way SCI works seems to be quite similar to Java. Doesn't SCI have a VM that the scripts compile to? And it's object oriented as well. Sounds very much like Java.

I guess the only difference is the Lisp syntax. Nowadays we have languages like Clojure that are lisp-like and compile to the Java VM, so that's even closer.

Yes, Sierra were definitely way ahead of their time. They were building the SCI system back in 1987 but Sun didn't build Java until around 1991 (Oak language as it was called then).

Offline Collector

Re: Vocab.900 - The 'Black Box'
« Reply #11 on: December 15, 2010, 05:25:28 PM »
Quote from lance.ewing: "Doesn't SCI have a VM that the scripts compile to? And it's object oriented as well. Sounds very much like Java."

Here is information about the SCI VM:

http://sierrahelp.com/SCI/Wiki/index.php?title=Chapter_5_-_The_SCI_Virtual_Machine

Offline gumby

Re: Vocab.900 - The 'Black Box'
« Reply #12 on: December 15, 2010, 07:24:40 PM »
I assumed that SCI Studio/Companion simply used the lisp-like syntax as its development language.  I don't know what language Sierra might have used to create their games.  I really wish that Troflip revisited the syntactical option (seen when creating a new game) for a C/C++ style syntax.  For me, LISP = Lots of Irritating and Silly Parentheses....

Offline Omni

Re: Vocab.900 - The 'Black Box'
« Reply #13 on: December 15, 2010, 10:53:10 PM »
I was under the impression that SCI32 was a bastardization of lisp and C. I used to know some of the programmers on "The Realm" and that is the way it was always explained to me =p

Offline lance.ewing

Re: Vocab.900 - The 'Black Box'
« Reply #14 on: December 16, 2010, 02:01:25 PM »
Correct me if I am wrong but I thought that 10+ years back someone in the SCI community (maybe Brian) had some real snippets of SCI code. I think it came from a book or perhaps a magazine article. Not sure exactly. If memory serves me right, it was for one of the Police Quest games (I might be wrong about that but fairly sure it was a Police Quest game). I had assumed that Brian based his syntax on the short snippets of real code that Sierra had released with books or articles or interviews or whatever. I'm impressed that Brian came up with something that actually works so well given that there was such little info available.
« Last Edit: December 16, 2010, 05:24:53 PM by lance.ewing »

