Author Topic: Original SCI syntax  (Read 143367 times)

Offline lance.ewing

Re: Original SCI syntax
« Reply #45 on: March 30, 2015, 09:02:24 AM »
Hmmm, yeah, I can see how that would be cleaner where all we're doing to the state value is, for example, incrementing it.

Offline troflip

Re: Original SCI syntax
« Reply #46 on: April 11, 2015, 01:34:50 PM »
A few other things I've found out from decompiling scripts (often where my decompiler fails):

It seems that, like LISP, comparison operators (in addition to other operators) accepted more than two operands. So to check if an event's message was between 32 and 255 you could have:

Code: [Select]
(if (<= 32 (pEvent message?) 255)
    // do something
)
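
For readers less used to LISP, here is a minimal Python sketch of what a comparison with more than two operands means. It is not SCI; `chained` and `is_printable` are made-up names, and the assumption is that the operator has to hold between every adjacent pair of operands.

```python
from operator import le


def chained(op, *operands):
    """Return True if op holds between every adjacent pair of operands."""
    return all(op(a, b) for a, b in zip(operands, operands[1:]))


def is_printable(message):
    # Mirrors (<= 32 message 255): true when 32 <= message <= 255.
    return chained(le, 32, message, 255)
```

Under that assumption, `is_printable(65)` is true while `is_printable(10)` and `is_printable(300)` are false.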


I also see a lot of code like this:
Code: [Select]
(if (globalvar0025)
    (globalvar0025 dispose:)
)

... but the request to retrieve the globalvar isn't made twice. So it's basically re-using the thing that was tested in the condition, which makes a difference if executing the code in the condition caused side effects (like calling a function might).

In Smalltalk, I believe, this might be written something like this:

Code: [Select]
globalvar0025 ifNotNilDo: [:theVar | theVar dispose]

But presumably the original SCI had some kind of syntax to evaluate an expression, and then use the result of that expression in another expression, if the expression was true. Any guesses as to what that would look like?

I've also seen a case where an entire switch statement was used as a value to pass to a function (SQ3, script 993). Basically the equivalent of:

Code: [Select]
(= handle
    (switch (paramTotal)
        (case 0
            FOpen(name 0)
        )
        (case 1
            FOpen(name param1)
        )
        (default
            0
        )
    )
)

Check out my website: http://icefallgames.com
Groundhog Day Competition

Offline lance.ewing

Re: Original SCI syntax
« Reply #47 on: April 12, 2015, 06:14:25 AM »
The example with the switch statement doesn't look particularly unusual to me. It seems to fit with the LISP-like nature of the language. I agree that it wouldn't make much sense in a language such as Java (which is what I program in for my day job), but perhaps for SCI, where everything in parentheses, i.e. every s-expression, would seem to have a value, it might make sense. That particular example you've shown would seem to make a lot of sense.

I'd say that the same type of usage would be possible for the "if" statement as well, and technically and syntactically it is probably possible for things like "for" and "while" as well, but I doubt that there'd be a good reason to write SCI code that calls for that. The code would be cryptic and ugly. But for "switch" and "if", it seems quite useful.

Obviously a "switch" and an "if" can do more than execute a single thing that returns a single value, so presumably the switch or if evaluates to the value of the final statement that was executed within the structure. This is exactly how LISP behaves for its equivalent statements. I found a number of references to this behaviour in LISP. Apparently special forms such as "if" and the like are designed to be convenient for use at the top level (so wouldn't normally be used in the way you've discovered them being used), but it is not illegal to use them embedded in other forms. Whether it is meaningful to do so in the context it is used is the real question. The example you've shown is clearly meaningful.
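
As a rough illustration of "the switch evaluates to the value of the final statement executed", here is a small Python sketch. Everything in it is hypothetical: `switch_value`, `f_open` (a stand-in for the FOpen kernel call) and `open_handle` are invented for the example, and treating an empty body as 0 is an assumption, not something known about SCI.

```python
def switch_value(selector, cases, default=()):
    """Run the matching case body and return the value of its last statement.

    cases maps a case value to a sequence of zero-argument callables.
    """
    body = cases.get(selector, default)
    result = 0  # assumption: an empty body evaluates to 0
    for statement in body:
        result = statement()
    return result


def f_open(name, mode):
    # Hypothetical stand-in for the FOpen kernel call.
    return ("handle", name, mode)


def open_handle(name, *params):
    # Mirrors (= handle (switch (paramTotal) ...)): the switch itself is the value.
    return switch_value(
        len(params),
        {
            0: [lambda: f_open(name, 0)],
            1: [lambda: f_open(name, params[0])],
        },
        default=[lambda: 0],
    )
```

With this sketch, `open_handle("save.dat")` takes the first case, `open_handle("save.dat", 2)` the second, and any other argument count falls through to the default and yields 0.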

Out of interest, how are you intending to represent an "if-elseif-elseif-else" type of structure? I noticed that LISP doesn't support this with the "if" statement itself. It uses something called "cond" for that. I guess "else if" could be achieved with nested ifs but that might get a bit too nested and ugly.

That observation about the <= operator makes sense. I agree that LISP also supported this. I just tested it on an online LISP IDE.  :-)

For the globalvar0025 example, globalvar0025 must be an object given the call to the dispose method. I'm assuming therefore that an object reference can be used as a conditional expression, i.e. if it is not yet pointing to something, it would evaluate to false, but if it is pointing to something, then it would evaluate to true. Are you able to copy and paste the byte code for this and a bit either side for context? I probably need to see that to have an opinion on the side effects of it re-using the object.

Btw, I'm fairly sure the FOpen calls would be (FOpen name 0) rather than FOpen(name 0)

Offline troflip

Re: Original SCI syntax
« Reply #48 on: April 12, 2015, 04:38:17 PM »
Quote from: lance.ewing
Out of interest, how are you intending to represent an "if-elseif-elseif-else" type of structure? I noticed that LISP doesn't support this with the "if" statement itself. It uses something called "cond" for that. I guess "else if" could be achieved with nested ifs but that might get a bit too nested and ugly.

I hadn't really thought about that yet, but I think that's pretty much covered with switch statements already. Both SCI Companion and SCI Studio support expressions for case values, not just constants (though I haven't seen any evidence of original game scripts using anything but constants).

Quote from: lance.ewing
For the globalvar0025 example, globalvar0025 must be an object given the call to the dispose method. I'm assuming therefore that an object reference can be used as a conditional expression, i.e. if it is not yet pointing to something, it would evaluate to false, but if it is pointing to something, then it would evaluate to true. Are you able to copy and paste the byte code for this and a bit either side for context? I probably need to see that to have an opinion on the side effects of it re-using the object.

In Motion::init, in the template game we have:
Code: [Select]
= theCycler (send client:cycler)
(if(theCycler)
(send theCycler:cycleCnt(0))
)

When decompiling the SQ3 version of Motion::init, my decompiler produced this:

Code: [Select]
        (if (= temp3 (send client:cycler))
            (send ERROR_UNKNOWN_OBJECT:cycleCnt(0))
        )

... because it didn't find an object being sent to.

If you look at the disassembly, with my annotated comments, you'll see why:

Code: [Select]
  0374:38 00a7          pushi a7 // $a7 cycler
  0377:76               push0
  0378:63 08             pToa client
  037a:4a 04             send 4  // This, and the above 3 instructions are "(send client:cycler)"

  037c:a5 03              sat temp3   // This assigns the result of that to temp3
  037e:30 0006            bnt code_0387   // This branches on the result of the assignment (which is equal to the thing that got assigned)
  0381:39 7e            pushi 7e // $7e cycleCnt    // sending to the cycleCnt selector
  0383:78               push1   // one parameter
  0384:76               push0                     // which is "0"
  0385:4a 06             send 6 // Normally just before here there would be an instruction that puts something into the accumulator.
                                        // Instead it is using what was left from the last instruction that changed the acc (037c:)


My decompiler doesn't (yet) know how to look backwards through branch instructions. And even if it could, I need to figure out what to output for the decompiled source code. In this case, Brian created a temp variable (theCycler) to handle the fact that SCIStudio doesn't support this kind of construct. The original script doesn't have this temp variable.

It's possible the original code used the same constructs as Brian's decompiled code, and Sierra's compiler happened to be smart enough to optimize the temp variable out of the compiled script. I kind of doubt that though, because other parts of the original scripts don't seem particularly "optimized". Like for instance, they have unreachable code, like two JMP instructions in a row (the second is never reachable).

Quote from: lance.ewing
Btw, I'm fairly sure the FOpen calls would be (FOpen name 0) rather than FOpen(name 0)

Yeah, sorry, that was using SCIStudio syntax. I guess I need to be more clear whether each snippet is SCIStudio syntax, LISP, Smalltalk, or an attempt at "original SCI syntax".  :P
« Last Edit: April 12, 2015, 04:46:11 PM by troflip »

Offline lance.ewing

Re: Original SCI syntax
« Reply #49 on: April 12, 2015, 06:38:58 PM »
Embedding the assignment in the "if" test expression looks okay to me. I think that that would be valid in the original syntax, based on what we were discussing a few posts back. At the same time, Brian's approach was probably valid as well, i.e. having it outside the "if", in fact it would have to be valid if the embedded one works. I wonder whether the compiler would produce the same byte code for both cases? I guess it probably would.

You know, I have a feeling that it actually is a compiler optimization that is causing your problem, but perhaps not the one you've suggested. I don't understand the comment about the temporary variable though. Doesn't the byte code "sat temp3" imply that there was a temporary variable in the original source code? It doesn't seem like Brian created it given the "sat". Its only use appears to be to hold the object ID of the cycler property, which we can assume might in some instances not yet contain a reference to a cycler. The same thing could be achieved like this (using assumed original syntax):

Code: [Select]
        (if (client cycler?)
            ((client cycler?) cycleCnt: 0)
        )

I'm not suggesting that that is better because it probably isn't, and it's obvious from the byte code that the original source didn't look like that since the same send doesn't appear twice. What I'm saying is that it conveys the same thing, i.e. send the cycleCnt message to the client's cycler if the client's cycler property is set. So the temporary variable's only purpose seems to be to avoid the double call to client to get the cycler property. And given this, I quite like the syntax you've already got for it, but obviously with the unknown object being resolved to the temp3 variable:

Code: [Select]
        (if (= temp3 (client cycler?))
            (temp3 cycleCnt: 0)
        )

Not sure if you've thought about it yet, but in a situation like the above, the temporary variable could be named using the property name as part of the variable's name, e.g. generated like this:  tmpCycler, or even clientCycler.

Back to the missing instruction to load the accumulator with a value (presumably from temp3). I don't think it is the temporary variable that has been optimized out given the presence of the "sat". I'm guessing that the compiler stripped out a "lat temp3" because it worked out that the accumulator already had that value. Can the decompiler recognise this scenario? Or would it need to identify and reverse such optimizations before decompiling?
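
To make that guess concrete, here is a minimal Python sketch of the kind of peephole pass that would strip a `lat` directly following a `sat` of the same temp. It is purely illustrative: the tuple encoding of instructions and the name `strip_redundant_loads` are invented, nothing here is known about Sierra's compiler, and the sketch assumes that no branch targets the removed `lat`.

```python
def strip_redundant_loads(instructions):
    """Drop `lat X` when it immediately follows `sat X`.

    instructions is a list of (opcode, operand) tuples. After `sat X` the
    accumulator still holds the value just stored, so reloading it is
    redundant. Assumption: nothing branches directly to the removed `lat`.
    """
    out = []
    for ins in instructions:
        if out and ins[0] == "lat" and out[-1] == ("sat", ins[1]):
            continue  # accumulator already holds this temp's value
        out.append(ins)
    return out
```

Run on `sat temp3 / lat temp3 / bnt code_0387`, the sketch leaves `sat temp3 / bnt code_0387`, which matches the shape of the disassembly above.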

I was wondering over recent days whether a decompiler like this actually needs to almost interpret the byte code as it goes along, for example, pushing things on to stack, keeping track of what value the accumulator currently has, effectively doing what the byte code is saying in order to work out what state things are in at important points in the code, such as this send. Are you doing something like that?

As to the evidence of non-optimized code (such as the double jmp), perhaps it was a compiler option that the individual coder could choose. Maybe if things were getting tight, they'd turn the optimizer option on. Maybe some people did it by default and others didn't. Who knows. I guess we might be able to find evidence for this though. If I get a chance, I might see if I can find something that looks optimised vs something that doesn't. Given their tight constraints back then, they must have supported a few optimizations.

Offline troflip

Re: Original SCI syntax
« Reply #50 on: April 12, 2015, 08:14:06 PM »
I'm not concerned with the assignment in the if. That works, and it's the most obvious result of the decompilation.

And this:
Code: [Select]
(if (= temp0 foo)
    // Stuff
)

would definitely produce different byte code than this:
Code: [Select]
(= temp0 foo)
(if (temp0)
    // Stuff
)

The latter will have an extra lat instruction to load the temp variable into the accumulator for the branch. Both my compiler and, I assume, Brian's compiler do this. With a little work, this could be safely optimized out though. Maybe it's true that Sierra's compiler did that.


Quote from: lance.ewing
I don't understand the comment about the temporary variable though. Doesn't the byte code "sat temp3" imply that there was a temporary variable in the original source code? It doesn't seem like Brian created it given the "sat".

Yeah, that was a bad example. But you can imagine, say, the result of a FGets kernel call or something. The only way to get identical behavior would be to pull the result into a temp variable.

Quote from: lance.ewing
Not sure if you've thought about it yet, but in a situation like the above, the temporary variable could be named using the property name as part of the variable's name, e.g. generated like this:  tmpCycler, or even clientCycler.

Yeah, that's a good idea. It shouldn't be too hard to write code to scan for how the variable is used and suggest a name.

Quote from: lance.ewing
I was wondering over recent days whether a decompiler like this actually needs to almost interpret the byte code as it goes along, for example, pushing things on to stack, keeping track of what value the accumulator currently has, effectively doing what the byte code is saying in order to work out what state things are in at important points in the code, such as this send. Are you doing something like that?

I'm doing something like that. It works backwards though. For instance, I'll see a send call, and know that it uses one value in the accumulator, and 2 values on the stack, say. Then I go backwards from there to collect the instructions that produce those values. One current flaw, though, is that the instruction groups are broken on branch boundaries. It's clear I need to change that.
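
A very rough Python sketch of that backwards walk, to show where the branch boundary bites. This is not the decompiler's actual code: the effect table is a hypothetical subset of opcodes, `collect_producers` is an invented name, the caller supplies how many stack values the consumer needs, and instructions that pop values are ignored for simplicity.

```python
# Hypothetical effect table: opcode -> (values pushed on the stack, writes accumulator).
EFFECTS = {
    "pushi": (1, False),
    "push0": (1, False),
    "push1": (1, False),
    "ldi": (0, True),
    "lat": (0, True),
    "pToa": (0, True),
}
BRANCHES = {"bnt", "bt", "jmp"}


def collect_producers(code, consumer, stack_needed, needs_acc=True):
    """Walk backwards from code[consumer], collecting producer indices.

    Returns (producer indices in program order, stack values still missing,
    whether the accumulator value is still unresolved).
    """
    producers = []
    i = consumer - 1
    while i >= 0 and (stack_needed > 0 or needs_acc):
        op = code[i][0]
        if op in BRANCHES:
            break  # the instruction group ends at a branch boundary
        pushes, writes_acc = EFFECTS.get(op, (0, False))
        if pushes and stack_needed > 0:
            stack_needed -= pushes
            producers.append(i)
        elif writes_acc and needs_acc:
            needs_acc = False
            producers.append(i)
        i -= 1
    producers.reverse()
    return producers, stack_needed, needs_acc
```

For the `send 4` in the Motion::init disassembly the walk finds all three producers. For the `send 6` it finds the three pushes but stops at the `bnt` with the accumulator still unresolved, which is the situation that produced ERROR_UNKNOWN_OBJECT.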

Offline lance.ewing

Re: Original SCI syntax
« Reply #51 on: April 13, 2015, 02:44:50 PM »
Quote from: troflip
Yeah, that was a bad example. But you can imagine, say, the result of a FGets kernel call or something. The only way to get identical behavior would be to pull the result into a temp variable.

Using FGets as an example, can you show me source code to illustrate what you mean?

Quote from: troflip
would definitely produce different byte code than this:

Yeah, I guess if Sierra's compiler was stripping out redundant instructions, then it would be difficult to distinguish between the embedded assignment vs the prior assignment. But in such a scenario, it seems appropriate for the decompiler to use the embedded syntax.

Offline troflip

Re: Original SCI syntax
« Reply #52 on: April 13, 2015, 03:48:01 PM »
Suppose we wanted to read a string from a file, and, if any bytes were read, print out how many were read. In SCI Studio syntax:

Code: [Select]
= temp0 FGets(strBuffer, 100, hFile)
(if (temp0)
    FormatPrint("Read %d bytes" temp0)
)

But in the actual disassembly, a temp variable might not be used. It might be (in pseudo assembly syntax):

Code: [Select]
callk FGets ....      // result goes into accumulator
bnt -> B              // if zero, then branch to B, otherwise...
push2                 // two parameters to FormatPrint
lofss                 // push a reference to string "Read %d bytes" to the stack
push                  // push the accumulator to the stack... this was the result of FGets
calle FormatPrint     // print
B: more code...

Which is conceptually (in SCI Studio syntax) the following impossible code:

Code: [Select]
(if (FGets(strBuffer, 100, hFile))
    FormatPrint("Read %d bytes" <the result of FGets>)
)

If I make my decompiler look backwards to see what was last put in the accumulator, and repeat that, I would get:

Code: [Select]
(if (FGets(strBuffer, 100, hFile))
    FormatPrint("Read %d bytes" FGets(strBuffer, 100, hFile))
)

... which has a totally different meaning, since calling FGets a second time will retrieve the next string in the file.

The only way to have equivalent functionality (in SCIStudio syntax) is to create an additional temp variable and store the result.
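
The same pitfall can be reproduced in Python, which happens to allow an assignment inside a condition (3.8 and later). This is only an analogy: `readline` stands in for FGets, and the function names are made up for the example.

```python
import io


def count_with_temp(f):
    # Assignment inside the condition: the read happens exactly once.
    if (line := f.readline()):
        return len(line)
    return 0


def count_with_duplicate(f):
    # Duplicating the expression reads twice, so the length reported
    # belongs to the *next* line, not the one that was tested.
    if f.readline():
        return len(f.readline())
    return 0


data = "abc\nhello world\n"
first = count_with_temp(io.StringIO(data))
second = count_with_duplicate(io.StringIO(data))
```

Here `first` is the length of the first line and `second` is the length of the second line, so the two versions disagree as soon as the expression has a side effect.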

Clearly Sierra's original code would not have had two calls to FGets (Since that is different functionality). So the only possibilities are:
1) Sierra's original code had a temp variable, and their compiler was able to figure out it wasn't needed outside that particular scenario, and thus optimize it out, or
2) Sierra's original syntax had the ability to say "if this expression is not nil, then do the following code with the expression"

For the purposes of my decompiler, I can probably just duplicate the expression (so the functionality would be wrong if the expression had side effects like FGets), and then tag it with a comment saying to look at this in more detail because it might be wrong.


As for "unoptimized" Sierra code, I just ran across this example in Laura Bow's code for DSelector::advance. Note the 3 jmp instructions in a row. There is no way to reach the second or third instruction; they are pointless.

Code: [Select]
    (method (advance) // method_0637
  0637:3f 01             link 1 // (var $1)
  0639:35 00              ldi 0
  063b:a5 00              sat temp0

        code_063d
  063d:87 01              lap param1
  063f:30 0039            bnt code_067b
  0642:7a               push2
  0643:67 22             pTos cursor
  0645:67 1c             pTos x
  0647:43 66 04         callk StrAt 4

  064a:30 002e            bnt code_067b
  064d:35 01              ldi 1
  064f:a5 00              sat temp0
  0651:67 22             pTos cursor
  0653:63 1c             pToa x
  0655:02                 add
  0656:65 22             aTop cursor
  0658:67 26             pTos mark
  065a:35 01              ldi 1
  065c:02                 add
  065d:36                push
  065e:63 1e             pToa y
  0660:22                 lt?
  0661:30 0005            bnt code_0669
  0664:6b 26            ipToa mark
  0666:32 0007            jmp code_0670

        code_0669
  0669:67 24             pTos lsTop
  066b:63 1c             pToa x
  066d:02                 add
  066e:65 24             aTop lsTop

        code_0670
  0670:e7 01              -ap param1
  0672:32 ffc8            jmp code_063d
  0675:32 0003            jmp code_067b
  0678:32 ffc2            jmp code_063d

        code_067b
  067b:85 00              lat temp0
  067d:30 0005            bnt code_0685
  0680:39 53            pushi 53 // $53 draw
  0682:76               push0
  0683:54 04             self 4


        code_0685
  0685:48                 ret
    )

« Last Edit: April 13, 2015, 03:53:07 PM by troflip »

Offline lance.ewing

Re: Original SCI syntax
« Reply #53 on: April 13, 2015, 07:43:41 PM »
Quote from: troflip
As for "unoptimized" Sierra code, I just ran across this example in Laura Bow's code for DSelector::advance. Note the 3 jmp instructions in a row. There is no way to reach the second or third instruction; they are pointless.

Yeah, that's certainly pointless. Maybe the compiler did run some optimizations on this byte code and left it in a mess (a working mess, but a mess nonetheless). I can't see how unreferenced jmp instructions like that could have ended up like that through a normal unoptimized compilation (unless there was a weird bug in the compiler). There's nothing branching or jumping to them. It's really weird. Let's say that something was originally branching or jumping to those locations in the unoptimized code. The optimizer might have seen "oh, it's a branch to a jmp" or a "jmp to a jmp" and altered the first branch and/or jmp to go to the end destination rather than the in between destination. But maybe it then didn't go far enough and strip out the redundant jmps. A bug in the optimizer, or a bug in the compiler. Seems like a bug in something.

Maybe the first step for the decompiler should be to optimize the byte code.   :)
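
A minimal Python sketch of such a cleanup pass, assuming the decompiler already knows which addresses are branch or jump targets. The `(address, opcode, operand)` tuples and the name `drop_unreachable` are invented for illustration; this is not how any existing tool is known to do it.

```python
def drop_unreachable(instructions, targets):
    """Remove instructions that follow an unconditional jmp or ret
    and are not themselves the target of any branch.

    instructions: list of (address, opcode, operand) in program order.
    targets: set of addresses that some branch or jump refers to.
    """
    out = []
    reachable = True
    for addr, op, arg in instructions:
        if addr in targets:
            reachable = True  # something can land here
        if reachable:
            out.append((addr, op, arg))
            if op in ("jmp", "ret"):
                reachable = False  # nothing falls through past this point
    return out


# Tail of DSelector::advance from the disassembly above.
tail = [
    (0x0670, "-ap", "param1"),
    (0x0672, "jmp", 0x063D),
    (0x0675, "jmp", 0x067B),
    (0x0678, "jmp", 0x063D),
    (0x067B, "lat", "temp0"),
]
cleaned = drop_unreachable(tail, {0x063D, 0x0670, 0x067B})
```

On this input the two jmps at 0675 and 0678 are dropped and everything else is kept.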

Offline troflip

Re: Original SCI syntax
« Reply #54 on: April 13, 2015, 08:15:35 PM »
Quote from: lance.ewing
I can't see how unreferenced jmp instructions like that could have ended up like that through a normal unoptimized compilation (unless there was a weird bug in the compiler).

Yeah, it would have to be dumb code to start with. Like, this would do it (assuming a "continue" statement was supported):

Code: [Select]
(while (something)
    // perform some operations in the loop
    continue   // jmp back to loop start
    break    // jmp to loop exit
)  // jmp back to loop start

Otherwise, there could have been more complex logic where the continue/break made sense, and the compiler optimized it out.

Offline lance.ewing

Re: Original SCI syntax
« Reply #55 on: April 14, 2015, 02:48:46 PM »
Yeah, the original code must have looked something like that. The language probably did support both continue and break. The second jmp was most likely a break in the original code but I doubt that their original source would have had a continue followed by a break. This is one of the common library classes present in all of the games, right? All of their devs would have seen it at some point and someone would have fixed it if it looked like that.

Regarding the FGets example, thanks for that. I guess that one is a hypothetical scenario, but you mentioned originally that you saw a lot of code that was structured along these lines. Can you find a real example that doesn't make use of a temporary variable in the byte code?

Offline troflip

Re: Original SCI syntax
« Reply #56 on: April 14, 2015, 03:31:55 PM »
Quote from: lance.ewing
Regarding the FGets example, thanks for that. I guess that one is a hypothetical scenario, but you mentioned originally that you saw a lot of code that was structured along these lines. Can you find a real example that doesn't make use of a temporary variable in the byte code?

I don't think I've found that yet. Once my decompiler is working better, I'll be able to decompile large numbers of scripts at once and easily search for all these cases.

Offline lance.ewing

Re: Original SCI syntax
« Reply #57 on: May 09, 2015, 04:28:35 PM »
Code: [Select]
(instance gregory of Actor
    (properties
        x 118
        y 120
        z 8
        heading 0
        view 200
        loop 0
        cel 0
        sightAngle 40
        approachX 118
        approachY 112
        approachDist 55
        _approachVerbs $0
        noun eat
    )
    (method (doVerb theVerb)
        (switch theVerb
            (else
                (super doVerb: theVerb)
            )
        )
    )
)

I was looking back over some of the discussion on the SCI code that can be generated by the games themselves and noticed your example above with the switch statement. I then also saw that your decompile of SQ5 has code in the feature writer n948.txt file that generates code of the same structure, but it appears to include other values for the switch, i.e. for LOOK, DO, and TALK. Did you manage to get it to generate those "case" values?

Code: [Select]
            lofsa   " \t\t\t(LOOK^M\n\t\t\t)^M\n"
            push   
            lea     @local65460
            push   
            callk   Format, 6
            pushi   1
            lea     @temp0
            push   
            call    localproc_0714, 2
code_0b3b:  ldi     0
            lali    local80
            bnt     code_0b5b
            pushi   3
            lea     @temp0
            push   
            lofsa   " \t\t\t(DO^M\n\t\t\t)^M\n"
            push   
            lea     @local80
            push   
            callk   Format, 6
            pushi   1
            lea     @temp0
            push   
            call    localproc_0714, 2
code_0b5b:  ldi     0
            lali    local280
            bnt     code_0b7e
            pushi   3
            lea     @temp0
            push   
            lofsa   " \t\t\t(TALK^M\n\t\t\t)^M\n"

Offline troflip

Re: Original SCI syntax
« Reply #58 on: May 09, 2015, 05:51:35 PM »
Ok, I'm able to invoke the feature writer, but it looks like some of the local vars need to be set to certain values in order for it to generate those cases. It also looks like it either generates or reads from a "shm" file to do something. I'll play around with it later...

Offline troflip

Re: Original SCI syntax
« Reply #59 on: May 10, 2015, 01:54:43 PM »
I don't think there's a way to get it to generate the case values without recompiling (and there are things currently preventing me from doing that for this file).

However, we can tell from the code that switch statements look like this:

Code: [Select]
(switch theVerb
    (LOOK
        // stuff...
    )
    (DO
        // stuff
    )
    (else
        (super doVerb: theVerb)
    )
)

i.e. there is no 'case' keyword. I think that's a new discovery.


As for the shm files, the code seems to read from that file to find the verb that the user entered in the feature writer. But it doesn't seem to do anything with the result (unless my decompiler is generating bad code, which is possible). At any rate, the file is supposed to exist in a msg directory one up from the exe location.

So if the exe was here:
c:\foo\sciv.exe

The shm file would be here:
c:\msg\300.shm

That gives us some insight into their development environment, I suppose.

Based on the code that reads the file, the shm file would appear to be a series of defines, I assume like this:

Code: [Select]
(define WALK 0)
(define DO 1)
(define TALK 2)
; CASES
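
If the file really does look like that (which is only a guess from the reading code), pulling the verb names out of it would be simple. A small Python sketch under that assumption; `parse_defines` is a made-up name, and treating lines starting with `;` as comments is also an assumption.

```python
import re

# Matches lines of the assumed form: (define NAME 123)
DEFINE_RE = re.compile(r"^\(define\s+(\w+)\s+(-?\d+)\)\s*$")


def parse_defines(text):
    """Return a dict of NAME -> integer value from (define NAME value) lines."""
    verbs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # assumed comment or blank line
        match = DEFINE_RE.match(line)
        if match:
            verbs[match.group(1)] = int(match.group(2))
    return verbs


sample = "(define WALK 0)\n(define DO 1)\n(define TALK 2)\n; CASES\n"
verbs = parse_defines(sample)
```

With the sample above, `verbs` maps WALK, DO and TALK to 0, 1 and 2.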


