SCI Programming => SCI Development Tools => Topic started by: lance.ewing on December 22, 2010, 05:14:59 AM

Title: SCI Decompiler?
Post by: lance.ewing on December 22, 2010, 05:14:59 AM
I've seen references to someone other than Brian writing an SCI decompiler. Does anyone know much about this? Who was it? Where is it?
Title: Re: SCI Decompiler?
Post by: MusicallyInspired on December 22, 2010, 09:54:32 AM
It was never released, is all I can remember.
Title: Re: SCI Decompiler?
Post by: lance.ewing on December 22, 2010, 01:58:27 PM
Sounds like a fun project, doesn't it? It would be interesting to talk to the author to see how far he/she got. It is essentially the equivalent of the SHOWLOG tool I wrote for AGI but obviously a lot, lot harder. You'd basically need to start by understanding the compilation process. I'd imagine that looking at some of the successful Java decompilers might throw up some ideas on how to tackle it.
Title: Re: SCI Decompiler?
Post by: MusicallyInspired on December 22, 2010, 02:18:20 PM
It would be fantastic to be able to decompile Sierra scripts and... mess with things :). Or create more efficient fan patches. But the games all use different versions of SCI, and seeing as the compiler in Studio/Companion works for only one version, would it work for all SCI0 games?
Title: Re: SCI Decompiler?
Post by: lskovlun on December 22, 2010, 06:04:48 PM
There is a generic decompiler architecture in the ScummVM codebase now (it was written in a Google Summer of Code project). It should be quite possible to write an SCI decompiler based on that. Doesn't help with compilation, of course.
Title: Re: SCI Decompiler?
Post by: MusicallyInspired on December 22, 2010, 09:01:33 PM
It'd at least be nice to see how Sierra programmers went about doing things.
Title: Re: SCI Decompiler?
Post by: lance.ewing on December 23, 2010, 01:40:54 AM
Quote
There is a generic decompiler architecture in the ScummVM codebase now (it was written in a Google Summer of Code project). It should be quite possible to write an SCI decompiler based on that. Doesn't help with compilation, of course.

Interesting. What does ScummVM use the generic decompiler architecture for?
Title: Re: SCI Decompiler?
Post by: Collector on December 23, 2010, 03:57:02 AM
A lot of the games supported by ScummVM had to be RE'd to add that support. I would assume it has it for this reason, though I am not sure why it would be incorporated into ScummVM instead of just standalone tools for the SVM developers.
Title: Re: SCI Decompiler?
Post by: lance.ewing on January 04, 2011, 06:02:56 PM
I'm sure this information must be available somewhere, but to save me having to search for it, does anyone know off the top of their head what exact version of the SCI interpreter SCI Studio and SCI Companion are intended to support?
Title: Re: SCI Decompiler?
Post by: lance.ewing on January 04, 2011, 06:13:09 PM
0.000.685?

I found the string .685 in the SCIV.EXE, so I'm guessing this is the one.
Title: Re: SCI Decompiler?
Post by: Collector on January 04, 2011, 08:59:53 PM
I believe that the template game was made from LSL3. The Free SCI documentation IDs LSL3 as 0.000.572, which is what mine shows. 0.000.685 covers SQ3, Camelot and ICEMAN.
Title: Re: SCI Decompiler?
Post by: lance.ewing on January 05, 2011, 04:50:42 PM
Quote
I believe that the template game was made from LSL3. The Free SCI documentation IDs LSL3 as 0.000.572, which is what mine shows. 0.000.685 covers SQ3, Camelot and ICEMAN.

Ah, just seen this post from back in November that seems to confirm 0.000.572 as being the template game version:

http://sciprogramming.com/community/index.php/topic,309.msg1617.html#msg1617

...although, having said this, I just found this post that seems to suggest it is 0.000.685:

http://sciprogramming.com/community/index.php/topic,50.msg113.html#msg113

...but then this next page seems to suggest a close connection between the SCI0 specifications and version 0.000.572:

http://wiki.scummvm.org/index.php/SCI/Specifications/Graphics/SCI_Ports

I'm fairly sure I saw the string .685 in the SCIV.EXE executable that comes bundled with both SCI Studio and SCI Companion, but that doesn't mean that the template game was based on that version. Perhaps the template game was based on 0.000.572 but the interpreter version packaged with the IDEs is 0.000.685. Unfortunately SCIV.EXE is a compressed EXE by the looks of it, so the strings are not visible enough for me to be sure of the version. I might need to fire up DOSBox and see if I can get the interpreter to display the version.
Title: Re: SCI Decompiler?
Post by: lance.ewing on January 05, 2011, 05:05:55 PM
Actually the better approach was to use UNLZEXE to obtain the original uncompressed SCIV.EXE file. After having done this, I can now clearly see the version number in the file:

0.000.685

So the interpreter packaged with SCI Companion is 0.000.685.

Just did the same for SCI Studio and it is also 0.000.685.

Maybe earlier versions of SCI Studio used 0.000.572. It seems like the SCI specs and potentially the template game could have been based on that version.
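
For anyone who wants to repeat the check without eyeballing a hex dump, here is a minimal Python sketch. It assumes the EXE has already been unpacked (for example with UNLZEXE, as above) and that the version is stored as plain ASCII in the form d.ddd.ddd. The function name is hypothetical and the pattern is only a guess based on the two versions mentioned in this thread.

```python
import re
from typing import List

# Assumption: interpreter versions look like "0.000.685" (digit, dot, three
# digits, dot, three digits) and are stored as plain ASCII in the unpacked EXE.
VERSION_PATTERN = re.compile(rb"\d\.\d{3}\.\d{3}")


def find_version_strings(data: bytes) -> List[str]:
    """Return each distinct version-like string found in data, in file order."""
    found: List[str] = []
    for match in VERSION_PATTERN.finditer(data):
        text = match.group().decode("ascii")
        if text not in found:
            found.append(text)
    return found
```

To use it, read the unpacked SCIV.EXE in binary mode and pass the bytes to find_version_strings. On a still-compressed EXE it will most likely find nothing useful, which is the same problem described above.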
Title: Re: SCI Decompiler?
Post by: lance.ewing on February 09, 2011, 08:20:09 AM
I believe at least a couple of you have had a go in the past at decompiling scripts by hand to the SCI Studio syntax. I was wondering if someone could describe the process they went through in doing this? For example, for a particular method, how did you recognise the higher level keywords? Was it certain patterns of instruction sequence? Is it possible to provide an example? Is this already covered somewhere?
Title: Re: SCI Decompiler?
Post by: gumby on February 09, 2011, 09:53:43 PM
Not sure if this is relevant, but I was in SCI Companion today and selected to view a file that did not have the source available, and was shown the disassembly. I found it pretty brutal - I never fully (or really partially) understood assembly, and it's been years.

It might be worthwhile to put together a working script within Companion, compile it, then decompile & see what can be learned?  I know this is somewhat of a backwards approach and perhaps not an answer to your question.
Title: Re: SCI Decompiler?
Post by: troflip on February 09, 2011, 11:16:26 PM
First of all, note there is a difference between disassembling and decompiling:
- disassembly is basically just the straight machine code, displayed "conveniently" as semi-readable instructions, with some symbol lookups to help with understanding. It's relatively straightforward to implement, but obviously the result is still difficult to understand.
- decompiling is a return to source code, and is a much more difficult task - well, it's essentially impossible to do perfectly - some of the information (such as the names of your variables) cannot be recovered.

The freesci website (http://freesci.linuxgames.com/scihtml/book1.html) essentially has documentation of everything you need to disassemble/decompile scripts (subject to the limitations I mentioned above). That is, a description of what every assembly instruction does.

Assuming you have the disassembly for a script (SCI Companion does this, and I think Brian had a tool that did it too), these are the challenges:
- You'll need to look at patterns in the branching and jump assembly instructions, and convert them to: if statements, for loops, while statements, switch statements, etc... It's not always obvious what higher level construct they would map to, so this is a bit of a judgement call
- The arithmetic instructions are pretty straightforward to convert to the higher level operators. Pretty much a 1-to-1 mapping.
- You'll need to become familiar with all the assignment (load/store) instructions to know whether it's a script-local, global, function parameter, or function-local variable that's being used.
- The method calls are a little tricky - if I remember correctly I think in the freesci documentation I listed above there were a few errors about how the "send" instructions worked. But in general, by looking at what gets pushed on the stack prior to the send instruction, you can figure out what parameters are being passed.
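
To illustrate the load/store point in the list above, here is a small Python sketch that decodes a load/store mnemonic into its parts. The letter layout is an assumption based on my reading of the FreeSCI instruction list (first letter is the operation, second says accumulator or stack, third is the variable scope, and an optional trailing i means indexed by the accumulator). It has not been checked against every interpreter version, and the function name is made up for this example.

```python
from typing import Dict, Optional

# Assumed mnemonic layout (from the FreeSCI docs, not verified everywhere):
#   1st char: l = load, s = store, + = increment, - = decrement
#   2nd char: a = accumulator, s = stack
#   3rd char: g = global, l = script-local, t = function-local, p = parameter
#   optional 4th char i: the variable index is offset by the accumulator
OPERATIONS = {"l": "load", "s": "store", "+": "increment", "-": "decrement"}
TARGETS = {"a": "accumulator", "s": "stack"}
SCOPES = {"g": "global", "l": "script-local", "t": "function-local", "p": "parameter"}


def describe_variable_access(mnemonic: str) -> Optional[Dict[str, object]]:
    """Decode a load/store mnemonic, or return None if it is not one."""
    indexed = len(mnemonic) == 4 and mnemonic.endswith("i")
    body = mnemonic[:-1] if indexed else mnemonic
    if len(body) != 3:
        return None
    operation, target, scope = body
    if operation not in OPERATIONS or target not in TARGETS or scope not in SCOPES:
        return None
    return {
        "operation": OPERATIONS[operation],
        "via": TARGETS[target],
        "scope": SCOPES[scope],
        "indexed": indexed,
    }
```

So "sat" would decode as a store from the accumulator to a function-local variable, while something like "ldi" or "pushi" is rejected because it does not fit the three-letter pattern.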

I got reasonably far along in writing a decompiler (the branching stuff was somewhat challenging, but I think I got that working pretty well in the end). I know I quit at some point though, so I must have run into some big remaining hurdles. There is a C++ file for the decompiler in the SCI Companion source code I provided, with a comment on top saying something like "abandoned attempt at writing a decompiler".
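
To give a feel for the branch-pattern matching mentioned above, here is a rough Python sketch. It is not the abandoned SCI Companion code (that is C++ and I am not reproducing it here). It only looks at one simple heuristic: what the last instruction before a bnt target is. The Instruction type and classify_bnt are hypothetical names, and real code would also have to handle switch statements, break, nested loops and compiler quirks.

```python
from typing import List, NamedTuple, Optional


class Instruction(NamedTuple):
    address: int
    mnemonic: str
    target: Optional[int] = None  # destination address for bnt/jmp, else None


def classify_bnt(instructions: List[Instruction], index: int) -> str:
    """Guess whether the bnt at instructions[index] guards an if, if-else or while."""
    bnt = instructions[index]
    if bnt.mnemonic != "bnt" or bnt.target is None:
        raise ValueError("expected a bnt instruction with a target")

    # The guarded block runs from just after the bnt up to (not including)
    # the bnt target. Remember the last instruction inside that block.
    last_in_block = None
    for instruction in instructions[index + 1:]:
        if instruction.address >= bnt.target:
            break
        last_in_block = instruction

    if (
        last_in_block is not None
        and last_in_block.mnemonic == "jmp"
        and last_in_block.target is not None
    ):
        if last_in_block.target <= bnt.address:
            return "while"    # block ends by jumping back to re-test the condition
        if last_in_block.target > bnt.target:
            return "if-else"  # block ends by skipping over an else part
    return "if"
```

The "judgement call" part shows up quickly: a forward jmp at the end of the block could just as well be a break or the end of a switch case, so the labels returned here are only a first guess.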

The one time I recall decompiling by hand was to create the Avoid script. Brian left this out of the original template game, even though many SCI games used "avoiders". I remember it being a pretty tedious process to do by hand.
Title: Re: SCI Decompiler?
Post by: lance.ewing on February 10, 2011, 12:58:20 PM
Thanks for that description and also the tip about the decompiler code in SCI Companion. It has occurred to me that a decompiler that might work for the original games may not work 100% out of the box for fan made games. The reason for this is that they are obviously different compilers. The Sierra compiler may choose slightly different VM instructions for the same basic source structures and vice versa.
Title: Re: SCI Decompiler?
Post by: troflip on February 11, 2011, 04:22:14 PM
Yes, that's true (and even SCI Studio and SCI Companion produce different compiled code). I don't think the differences are that great, but there certainly are some (probably mostly around how loops and conditionals are implemented). A good decompiler should be able to take any different "style" of compiled code and produce reasonable source code though - it's just more work :-).
Title: Re: SCI Decompiler?
Post by: lance.ewing on March 01, 2011, 01:10:45 PM
Before I write a line of code, I want to do a lot of research and investigation into SCI and especially the VM instruction set. So I've read over that section in the SCI specs, I've even printed it out and carry it around in my pocket just in case there is some spare time to read over it again and again and again... i.e. get very familiar with the instructions and what they do.

What I've done over the past couple of days is attempt to work out the most commonly used instructions. I did this by running Brian's SCI Disassembler on a few games (KQ4, COC, PQ2) and then, with some grep/sed/cut/uniq/sort magic, producing a CSV file with the counts of each instruction across those games. I then loaded that into Excel and with a simple calculation I've now got the percentage for each instruction, i.e. what share of all instructions encountered each instruction name accounts for.
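
For anyone without the grep/sed/Excel setup, the same counting can be sketched in a few lines of Python. This is only an illustration: it assumes the disassembly has already been reduced to one instruction per line with the mnemonic as the first word, which may not match the real output of Brian's disassembler, and the function names are made up.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def count_mnemonics(lines: Iterable[str]) -> Counter:
    """Count instructions, assuming the mnemonic is the first word on each line."""
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            counts[tokens[0]] += 1
    return counts


def percentage_table(counts: Counter) -> List[Tuple[str, int, float]]:
    """Return (mnemonic, count, percent of all instructions), most common first."""
    total = sum(counts.values())
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]


def to_csv(rows: List[Tuple[str, int, float]]) -> str:
    """Render the table as CSV text, ready to save or load into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["instruction", "count", "percent"])
    for name, n, percent in rows:
        writer.writerow([name, n, f"{percent:.2f}"])
    return buffer.getvalue()
```

Feeding it the combined listings for several games gives one table across all of them, which is what the CSV described above contains.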

It makes very interesting reading. At the top of the list by a long way is pushi. Next come push1, then bnt, push0, send, push, ldi, push2, jmp and lofsa. That makes up the top ten, and half of them are variations of push. The pushi instruction accounts for 25% of all instructions by itself; add in the counts for the other push instructions and the figure rises to nearly 50%. The top 10 instructions account for just over 70% of all instructions encountered, the top 20 for about 85% or so, and the top 30 for nearly 95%. What this tells me is which instructions I should spend most of my time researching. The others account for very little. Some of them don't even appear to have been used.
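
The "top N covers X%" and "all push variants together" figures above are simple to compute from a table of counts. A minimal Python sketch follows; it assumes the counts are held in a dict keyed by mnemonic, and that every push variant has a name starting with "push". Both function names are hypothetical.

```python
from typing import Dict


def top_n_share(counts: Dict[str, int], n: int) -> float:
    """Percent of all instructions covered by the n most frequent mnemonics."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    largest = sorted(counts.values(), reverse=True)[:n]
    return 100.0 * sum(largest) / total


def family_share(counts: Dict[str, int], prefix: str = "push") -> float:
    """Percent of all instructions whose mnemonic starts with the given prefix."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matching = sum(n for name, n in counts.items() if name.startswith(prefix))
    return 100.0 * matching / total
```

With the real counts, top_n_share(counts, 10) should correspond to the "just over 70%" figure and family_share(counts) to the "nearly 50%" figure quoted above.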

Now what I guess I'll do is look at how each of these top 10 (20 and 30) instructions have been used and manually deduce what the associated SCI source would have been. From that I'll hopefully see in what scenarios the SCI compiler used particular instructions.
Title: Re: SCI Decompiler?
Post by: gumby on May 11, 2011, 09:42:03 PM
Quote
Actually the better approach was to use UNLZEXE to obtain the original uncompressed SCIV.EXE file. After having done this, I can now clearly see the version number in the file:

0.000.685

So the interpreter packaged with SCI Companion is 0.000.685.

Just did the same for SCI Studio and it is also 0.000.685.

Maybe earlier versions of SCI Studio used 0.000.572. It seems like the SCI specs and potentially the template game could have been based on that version.


Just got the wonderful 'Oops' error from Companion that indicates that its version is indeed 0.000.685 (in case there was still a question).
Title: Re: SCI Decompiler?
Post by: OmerMor on July 26, 2011, 10:38:27 AM
lance, any progress so far?
I'd love to hear about it.
Title: Re: SCI Decompiler?
Post by: lance.ewing on July 27, 2011, 01:25:59 AM
Unfortunately I haven't made all that much progress. When I started to get back into the fanmade AGI/SCI community again, I decided I'd pick one thing so that I didn't spread myself too thinly. So I decided that that would be the Java version of PICEDIT. But then I got fascinated with the whole SCI area, especially regarding the language, and for a few months I didn't do anything on PICEDIT at all. This is because I was thinking about the SCI decompiler and doing investigation into how that would work.

I thought I'd start by putting together a web site to capture all my thoughts and discoveries. The main part of that web site was going to be a table with all of the SCI instructions on the left hand side and then various notes about them on the right hand side, with the right most column being some thoughts about how the original code might have looked. I only got part way through building that table when I realised I was neglecting PICEDIT. My ultimate goal for PICEDIT is to add SCI support, but before adding that support, I wanted to bring it up to speed with the other AGI picture editors. So I put the SCI decompiling investigation on hold while I focussed more on PICEDIT. That is where I'm at at the moment.

You can have a look at the web site that I was starting to put together. A lot of the thoughts on there are things I mentioned in various posts on this forum. This is the URL:

http://www.scriptinterpreter.com
Title: Re: SCI Decompiler?
Post by: Collector on July 27, 2011, 04:51:44 AM
It might be good to track down Jeff Stephenson. I did a little Googling to see if I could find any trace of him. I didn't look too long, but came across this from someone that worked at Sierra:

Quote
My first OO language was SCI--Sierra Creative Interpreter. This language was written by Jeff Stephenson for programming adventure games at Sierra On-Line. My brother-in-law Chris Smith, who got me a job with Sierra in late 1989, sent me some sketchy documentation for SCI. One of the highlights of my programming life was sitting on my bed reading that documentation the day I got it. SCI was a combination of Lisp, C and Smalltalk, but the message-passing was very much Smalltalkesque. I loved the language.

http://mwilden.com/smalltalk/index.htm

Also found this http://marketplace.publicradio.org/standard/display/slideshow.php?ftr_id=60233

At least we know what he looks like, now.
Title: Re: SCI Decompiler?
Post by: lance.ewing on July 28, 2011, 01:17:10 AM
I did try to track down Jeff back around the start of this year and also found both of the pages that you sent through. I actually spoke to Mark Wilden quite a bit at the time but he unfortunately didn't remember much of the detail of the syntax of the language and he didn't know how to get in contact with Jeff.

I did manage to get one step further towards tracking him down though. I found an almost certain match for him on a social networking web site. I think it was mylife.com. I tried sending him a message through that site but didn't receive a reply back. So that was about as far as I got.

Even if we were to track him down, chances are that he wouldn't be able to remember much, unless of course he has documentation or something still with him. But when you're working in IT, you're not really meant to hang on to code or documentation from a previous employer. So memory is often all that there is to go on. Mark Wilden programmed in SCI for quite some time and yet he struggles to remember much about the syntax. I think that that small code snippet from Police Quest SWAT is possibly more useful than anything former Sierra employees can remember from 20 years ago.
Title: Re: SCI Decompiler?
Post by: Collector on July 28, 2011, 04:06:46 AM
That snippet might help to jar memories. I would not completely dismiss the idea that someone kept code or documentation. After all, Mark Wilden had documentation sent to him, so something did get outside of Oakhurst.
Title: Re: SCI Decompiler?
Post by: OmerMor on July 28, 2011, 10:09:19 AM
BTW - I found that article about Jeff some time ago when I tried to look for information about Avis Durgan - Jeff's wife.
Her name is used as the encryption key in AGI. She is pictured with Jeff in that article.
Maybe you could track Avis down, and get in touch with Jeff through her. Just a thought.
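
For anyone wondering how a name ends up as an encryption key: as far as I know, AGI uses the text "Avis Durgan" as a repeating XOR key over some of its data. The Python sketch below shows only that general technique. Exactly which files and byte ranges AGI applies it to is not covered here and should be treated as an open assumption; the function name is made up.

```python
from itertools import cycle

# The key text mentioned above. Which AGI data it is applied to (and from
# which offset) is an assumption not shown in this sketch.
KEY = b"Avis Durgan"


def xor_with_key(data: bytes, key: bytes = KEY) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Because XOR is its own inverse, running the same function twice returns the original bytes, so one routine serves for both encrypting and decrypting.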
Title: Re: SCI Decompiler?
Post by: OmerMor on July 28, 2011, 10:47:15 AM
Found it on the archive for brian provinciano's site:
http://web.archive.org/web/20050205191150/http://www.classicgaming.com/agisci/swatcode.shtml
Title: Re: SCI Decompiler?
Post by: lance.ewing on July 28, 2011, 12:33:38 PM
Great detective work. I tried looking on the Wayback Machine 6 months ago when I was trying to find the code and couldn't find it. I remember specifically looking around Brian's web site but couldn't see anything. I was probably looking at the wrong years. I did talk to Brian 6 months ago and he remembered the code but couldn't remember the details of what it was or where it had come from, and he didn't think he had it anymore. So it was there to be found... I just had to keep digging.
Title: Re: SCI Decompiler?
Post by: lance.ewing on July 28, 2011, 12:35:49 PM
Quote
BTW - I found that article about Jeff some time ago when I tried to look for information about Avis Durgan - Jeff's wife.
Her name is used as the encryption key in AGI. She is pictured with Jeff in that article.
Maybe you could track Avis down, and get in touch with Jeff through her. Just a thought.

Actually it was the Avis Durgan name that led me to mylife.com. There is a Jeff Stephenson with a friend called Avis Durgan registered on mylife.com. They both live in the right part of the US as well, so I'm fairly sure it was them.
Title: Re: SCI Decompiler?
Post by: OmerMor on July 29, 2011, 06:14:15 AM
So the only thing left to do is try to contact Avis as well. Maybe she is more socially involved than Jeff and checks her emails and messages more often.
Here is a Facebook page for a woman called Avis Durgan: http://www.facebook.com/profile.php?id=100001115595747
I bet it's her. You should give it a try!
Title: Re: SCI Decompiler?
Post by: lance.ewing on July 30, 2011, 01:52:44 AM
Quote
Also found this http://marketplace.publicradio.org/standard/display/slideshow.php?ftr_id=60233

Reading that article again, I recall why I think that the mylife.com Avis and Jeff are the right people. The article mentions that Jeff previously worked at Microsoft, that he is in his late 50s, and that at the time the article was written they were living in Encinitas, California.

The mylife.com summary for both Avis Durgan and Jeff Stephenson has this list of places lived:

San Diego, CA
Redmond, WA
Encinitas, CA
Seattle, WA

Both Redmond and Seattle are linked with Microsoft. Encinitas is in the list, and San Diego is very close to Encinitas. The age mylife.com gives for Jeff is 59, which is late 50s. So everything seems to match up.

It is amazing what is on the net these days. I found this web site just by searching for the right combination of keywords:

http://www.ussearch.com/consumer/criminal-records/names/ca+san+diego/jeff/stephenson/002430043751.html

If you hover over the address history View More link, you get a list of all known towns/cities. The list not only includes those locations listed above but also Oakhurst and Coarsegold, both names associated with Sierra On-Line, and presumably the place names in between are everywhere he has been between his time at Sierra and his time at Microsoft.

San Diego, CA
Encinitas, CA
Redmond, WA
Bellevue, WA
Seattle, WA
Chapel Hill, NC
Durham, NC
Carrboro, NC
Ahwahnee, CA
Oakhurst, CA
Coarsegold, CA

And if we use the same web site to search for Avis Durgan, then all of the same places above appear for her, everything, that is, except for Oakhurst and Coarsegold.

Apparently Jeff's full name is Jeffrey Alan Stephenson if we can believe that web site.

You know what, I just realised that mylife.com has probably automatically created the pages for Jeff and Avis based on publicly available information such as what appears on the one I mentioned above. So my email has possibly gone to an inbox that no one is checking.
Title: Re: SCI Decompiler?
Post by: OmerMor on July 30, 2011, 02:35:36 PM
Nice catch, lance. Personally, I had no doubt it was them. Are you gonna email Avis now?

btw regarding obsolete sci sites:
Old tutorials by Lars Skovlund (lskovlund) (http://web.archive.org/web/20010817042146/http://sci.locuples.com/scituts.html)
Some snapshots from an old sci messageboard from insidetheweb (sadly we can't read the posts, they were not archived :():
    1999-11-04 (http://web.archive.org/web/19991104152729/http://www.insidetheweb.com/mbs.cgi/mb173941)
    2000-10-16 (http://web.archive.org/web/20001016135159/http://www.insidetheweb.com/mbs.cgi/mb173941)
The SCI Scriptorium (http://web.archive.org/web/20040930164626/http://lightning.prohosting.com/~scide/)
Lance Ewing's The Hidden Secrets of Sierra (http://web.archive.org/web/20030618215312/http://www.123.co.nz/lance/sierra.htm)

There were many more sites, but they used hosting services such as xoom.com which prevented archiving (through their robots.txt policy), and are now probably gone forever.
Title: Re: SCI Decompiler?
Post by: lance.ewing on August 04, 2011, 02:43:11 PM
Yeah, I have a link to the archived version of my old web site on my new web site (www.agifans.com). I was glad indeed when I found that archive.org had a copy of it. I also used to have an old VIC 20 web site but I suspect that one is a bit too old to have been archived. I wrote a VIC 20 emulator back around the same time I was writing the AGI tools. Actually I ended up writing it in Pascal, then C and finally in Java. It was another one of my spare time projects at that time. I had grand plans at the time of writing an AGI-like adventure game system for the VIC 20 to prove just how much you could do on the VIC 20 with the benefit of today's technology.

It is a shame that a lot of the old SCI and AGI web sites are no longer accessible, and that they weren't archived. Before the Internet, I used to log on to BBSs (bulletin board systems) and chat in adventure game forums there. I even started writing a series of text documents entitled "SCI Examined" that I uploaded to various BBSs in New Zealand. Obviously the exposure was fairly limited. It was a record of my attempts at the time to hack SCI. I ended up focussing more on AGI and left others to look into SCI.
Title: Re: SCI Decompiler?
Post by: OmerMor on August 15, 2011, 01:52:56 PM
Lance,
have you managed to track Avis / Jeff ?

P.S.
congrats on your PICEDIT milestone.
Title: Re: SCI Decompiler?
Post by: lance.ewing on August 16, 2011, 02:33:10 PM
No, not yet. I haven't really been focussing on that over the past few weeks. I've been trying to get another milestone release of PICEDIT out before I go on holiday for a week. I've probably still got another 3 or 4 milestone releases to go for 1.3, more than I initially thought I'd do. I decided to release more often so that at least something gets out there for people to look at. And that is just version 1.3. I still have grand plans to add SCI0 picture support for version 2.0 of PICEDIT, which would be the next release after 1.3. The 1.3 release is mostly about playing catch-up with the rest of the AGI and SCI picture tools. I think that a dual AGI/SCI picture tool might be unique, especially if it supports things like import and export between SCI0 and AGI.

So for now the SCI syntax quest is on hold. I did add that sample SCI code that you found to the end of my Syntax page on www.scriptinterpreter.com though:

http://www.scriptinterpreter.com/syntax

What I was attempting to do with the Instruction Set page (http://www.scriptinterpreter.com/instruction-set) was to build up a table of all the SCI instructions, work out their percentage of appearance within the games and then based on the conclusions of the Syntax page try to work out what original SCI source code would have compiled to each of the instructions. Some instructions are obviously used very often but others are used exclusively for particular scenarios.
Title: Re: SCI Decompiler?
Post by: Collector on October 29, 2021, 09:03:25 PM
As dead as this thread is, Jeff has a Facebook presence. He is now painting. Here is a self portrait he uses for his art page.

https://www.facebook.com/jeff.westcoast
https://www.facebook.com/jeffstephensonart/
Title: Re: SCI Decompiler?
Post by: OmerMor on October 31, 2021, 05:34:02 PM
Nice find!

In 2015 I managed to contact Jeff via LinkedIn.
Here's what he wrote:

Quote
I've been off-and-on aware that there are a number of folks out there reverse-engineering SCI. The stuff I've seen doesn't really resemble the original source language (which was syntactically like Lisp but with a message-passing kind of mechanism behind it that was kind of Smalltalk or Objective-C like). But then folks are basing the result on looking at the byte-codes rather than the source code, obviously.

I don't really remember whether I was around for SCI2 - I seem to have a sense that SCI2 was more about the graphics than the basic system, but that was all so long ago... Later stuff might well have incorporated real variables and more linkage between "rooms" in the game - SCI was originally designed to run on pretty small systems, and a lot more could be done on more capable machines.
Title: Re: SCI Decompiler?
Post by: Collector on October 31, 2021, 10:02:51 PM
He would have been talking about Studio script not being close to the original. Could be interesting to get his take on the current "Sierra script" that Companion uses.
Title: Re: SCI Decompiler?
Post by: Kawa on November 01, 2021, 06:36:53 AM
I've seen the original Sierra code, from SCI0 to 2. I don't know what Jeff Stephenson would say about Companion's scripts but I can honestly say it's this close, if you disregard extra stuff like the &getpoly and verbs macros.

Code:
(define X an entire snippet of code)

(class Example kindof Object
    ; "kindof" instead of "of", though "of" was used later on
    (properties
        ; exactly like in Companion, though some versions of SC allow one of two type markers
    )
    (methods
        ; list of forward-declared method names
        DoAThing
        DoAnotherThing
    )
    (method (DoAThing with &tmp i)
        ; exactly like in Companion
    )
)
; and maybe two or three other little gotchas.

Edit: in Companion, the (methods) block is supported but ignored, and kindof is considered synonymous with of. Also, sends can have commas, like (bluh foo: 42, bar: 4), but where Sierra's compiler allows trailing commas (bar: 4,), Companion does not.