Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Decompilation and Decompilers

News Recommended Books Recommended Links Compilation Decompilation Decompilers
Disassemblers Decomplation of control structures Legal Aspects Java decompilations Clipper and Foxpro  Peephole refactoring
Selected projects     History Humor Etc

Decompilation is actually is pretty underdeveloped area, but many methods developed for compilation and especially for the optimization of object code are directly applicable to the problem. 

Students that wish to study this area are strongly advised to learn basic theory of compilation and, especially, classic code optimization methods. Generally one needs to augment pattern-based constructs recovery with control flow and data flow analysis.

The program graph analysis is much better suited for recovery of control structures than pure pattern matching although even such a simple approach as peephole refactoring can recover some local control flow constructs.

Data flow analysis makes it possible to detect major structures and even somewhat understand their usage.

Another way to extract useful information is slicing.  Peephole optimization can used as decompilation method as well.


Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contain some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.
Google Search
Open directory

Research Index

Old News ;-)

[Jan 16, 2006] computer intelligence assembler disassembler ciasdis tool by Albert van der Horst

This page is about how the Post-It Fix-Up principle works out in practical program code in Forth. For the impatient: jump to the downloads

Actual assemblers

Applying the Post-It Fix-Up principle to a 8086 assembler led to the discovery of problems that had to be solved. It turns out that some types of fixups better be considered not relative to the start of the instruction, but relative to the end. Otherwise there would be diffent fixups for e.g. byte/cell indication (B| X|), dependant on the length of the opcode. It is still there in the fig-forth version of the opcodes, such as B| W| besides B1| and W1| . So a new class of fixup, the "fix up's from behind" or reverse fixups were added. It turned out that other fixup's are not needed for the Intel, up to the Pentium. Other processors require fixup's with build in data. These so called data fixups are needed for the 6809 and the DEC Alpha.
A program was added that generates a PostScript file with the first byte opcodes for 8080 as well as 8086 , and the 80386 , a so called quick reference card. Comparing that to Intels documentation led to the discovery of one more bug. I had to redesign the opcodes, so other people could have trouble using this beast without such a reference card and the `SHOW: MOV|SG,' that lists all forms allowed for the move segment instruction.

Is Sony in violation of the LGPL - Programming stuff

Update: Click here

I'm sure you've already heard about the Sony rootkit that was first revealed by Mark Russinovich of Sysinternals. After the Finnish hacker Matti Nikki (aka muzzy) found some revealing strings in one of the files (go.exe) that are part of the copy protection software, the rootkit is also suspected to be in violation of the open-source license LGPL. The strings indicate that code from the open-source project LAME was used in the copy protection software in a way that's not compatible with the LGPL license which is used by LAME.

On Slashot muzzy mentioned that he doesn't have access to Sabre BinDiff, a tool that can be used to compare binary files. I was in the opposite position as I have BinDiff but I didn't have the file in question (go.exe). I mailed muzzy and he hooked me up with the file.

I compared go.exe with a VC++-compiled version of lame_enc.dll but unfortunately BinDiff didn't find a single relevant matched function. A quick manual check didn't reveal any LAME functions in go.exe either.

Even though go.exe apparently does not contain any LAME code, a considerable amount of tables and constants from the LAME source files can be found in the go.exe file. Here's a list of the LAME tables I've been able to locate. The first column shows the hex address where the table can be found in the go.exe file, the second column shows the name of the table as it appears in the LAME source code and the third column shows the LAME source file where the table can be found.

I have to add though, that not a single table actually seems to be used by the go.exe code. What does that mean? I've asked random people and I've heard speculation ranging between "accidentaly linked" and "encrypted code in go.exe that uses the tables and can't be found in the disassembler". Further analysis needs to be made but at this point I'm leaning towards more or less accidental inclusion.

 
Comments
Display comments as (Linear | Threaded)
The code in GO.EXE could be compiled with another compiler. In that case your comparison would probably not find a match, but it may still be there.
#1 Rhialto on Nov 14 2005, 12:41 Reply
This idea is absolutely right and I've thought about it. go.exe was apparently compiled using VC++ 7 (debug build) while lame_enc.dll was compiled using VC++ 6 (release build). That's what PEiD ( http://peid.has.it/ ) says, at least.

In the past I've succesfully used BinDiff to match functions from files compiled with gcc to functions from files compiled with VC++ though. The question is now whether VC++ 7 is so much different from VC++ 6 that BinDiff is less likely (or even unable) to match them even though code produced from VC++ 6 and gcc seem to be similar enough for BinDiff to work.

Furthermore I think the main point of importance is that the tables in go.exe are not referenced by any code (at least not in a way that a static disassembler can detect). I think the reason for this might be the solution to the entire violation question.
#1.1 sp on Nov 14 2005, 12:52 Reply
They are very different, the VC7 compiler was a complete rewrite of the VC6 compiler and has many improvements.
#1.1.1 Anonymous on Nov 15 2005, 17:03 Reply
Heya,

I'm another reverser, and I'd be interested in taking a look at this myself (I also have IDA and BinDiff)... could you possibly send me a copy of the exe?

Cheers,

Will
#2 Will Whistler on Nov 14 2005, 13:06 Reply
This raises an interesting question: are tables originating from LGPLed code enough to make the LGPL apply to the final executable, even though it might not actually use the data?

After all, the tables also have been written and are part of the source code covered by the license. I don't think copyright law would make a difference between the source for executable code and that for the data needed by that code.
#3 Arend Lammertink (http://plone.vrijschrift.org) on Nov 14 2005, 14:38 Reply
Good observation. That's actually exactly why I didn't even make an attempt to answer the question posed in the topic. I don't know enough about license and copyright issues to make an educated guess.
#3.1 sp on Nov 14 2005, 15:27 Reply
Coming to think of it, it's not surprising at all you can't find any code if you compare a dll and a static linked executable on Windows.

Windows' dlls are designed in such a way that function calls between dlls are completely different from their static equivalents. Function calls are adressed using an offset table in the dll. The caller uses special access code. That's why dlls are accompanied by "import" libraries. Every function that can be used from outside of a dll has to be "exported" using some declspec macro's. I'm sure these will also influence name mangling, etc.

To make a long story short: try comparing the executable with a static Lame library...
#4 Arend Lammertink (http://plone.vrijschrift.org) on Nov 14 2005, 15:18 Reply
I assumed that this wouldn't matter because of the level of abstraction BinDiff uses to determine whether code from two files is equal or not. The calling convention shouldn't really matter here.

But alas, assumption is the mother of all fuck-ups. So I went back to check. As I expected I don't get any results from a statically linked LAME either.

I also want to draw attention to another issue. LAME is an application that uses a lot of FPU instructions. Go.exe barely uses any.

I've created an opcode distribution list for the files lame_enc.dll and go.exe. The former uses tens of thousands of FPU instructions with fld being the 2nd most used instruction (only mov is used more often). The latter file, on the other hand, uses only a few hundred FPU instructions and there are 26 more frequently used CPU instructions before the 1st FPU instruction comes in the list.
#4.1 sp on Nov 14 2005, 20:11 Reply
What relevant parts of the LGPL would be infringed if it does contain this? The LGPL doesn't require that things that link to it also be LGPL, unlike the GPL.
#5 Nick Johnson on Nov 14 2005, 21:04 Reply
They still have to offer the source code for any LGPL code they distribute, or modify and distribute, and they still have to include an LGPL license notice. They can link to LPGL code, but they can't hide it.
#5.1 Rodrin on Nov 15 2005, 16:22 Reply
Another, perhaps more logical explanation, given the lack of substantial similarity: Perhaps the Sony software includes LAME signatures so it can detect whether a user is running LAME to encode MP3s.
#6 Ansel (http://www.anselsbrain.com/) on Nov 14 2005, 21:22 Reply
Perhaps the tables in question aren't used to execute anything, but merely to detect LAME and/or programs that use it?
#7 HyperHacker (http://hypernova.amarok-shadow.com) on Nov 15 2005, 07:19 Reply
Let me quote some comment on a slashdot story (http://yro.slashdot.org/yro/05/11/15/1250229.shtml?tid=117&tid=188&tid=17) from muzzy:

That only concerns GO.EXE, and while the analysis is correct for that executable, I checked for LAME references against every binary in the compressed XCP.DAT file after I managed to unpack it (thanks to freedom-to-tinker.com guys for providing description of the format). Turns out, there's more binaries including references to LAME, and this time there's actually code that uses the data as well. And not just LAME, there's also Id3lib included in one dll, and bladeenc and mpglib distributed along with the DRM. All of this is LGPL, it's code, and it's being used.
#8 Cone on Nov 15 2005, 15:06 Reply
Yes, this is correct. We're right now working on the new files and we've already matched code manually. We're now in the process of developing a few tools to match code automatically because there's a lot of code to match.
#8.1 sp on Nov 15 2005, 15:08 Reply
Congratulations, guys!
#8.1.1 Arend Lammertink on Nov 15 2005, 15:26 Reply
What if the tables from LAME are there, to be used to detect a LAME encoder being used on the system? ie, if you try to rip the tracks, it will see that LAME is running, and perhaps corrupt the resulting ripped file?
#9 Ed Felton on Nov 15 2005, 16:30 Reply
Wonders if go.exe makes any systems calls to register itself.
#10 hawkeyeaz1 on Nov 15 2005, 16:44 Reply

[Oct 28, 2005] Program Transformation Wiki - Decompilation Resources

plain This page contains links to projects only peripherally related to decompilation. Links to the actual decompilers are located on the DecompilationGeneralApproach, DecompilationApplicationSpecificApproach (Java, Foxbase, etc) and DecompilationCompilerSpecific pages. An attempt is being made to reduce overlap with the other pages.

freshmeat.net Project details for uncc

uncc is a C decompiler which helps reverse engineers and programmers improve their understanding of assembly code.

Homepage:
http://www.uncc.info/
Tar/GZ:
http://www.uncc.info/uncc-0.1.2.1.tar.gz

Pyreverse v. 0.2

About: pyreverse is a set of tools for reverse engineering Python code. It features dependency analysis tools, documentation generation, and XMI generation for importation in a UML modeling tool. A special module can be used to generate files readable by Argo UML.

Changes: Support was added for links between classes, exceptions raised in functions, and package diagrams. Some documentation with a full example was added.

[Dec 28, 2001] Decompilation page and link to a decompiler by Satish Kumar. Contains a beta version of DisC - Decompiler for TurboC and a small intro to the problem of decompilation using Intel assembler fragments of small C programs as an example. Compare to Decompilation of Binary Programs - dcc

[Dec 18, 2000] Clipper and FoxPro decompilation

[Oct 10, 2000] Filter Factory Plug-in Decompiler   This little 16bit DOS program generates a ".afs" of any ".8bf" file compiled by the Filter Factory of Adobe Photoshop (PC version only).

[Sept 15, 2000] Java decompilers

C. Cifuentes and K. Gough. Decompilation of binary programs. Software -- Practice and Experience, 25(7):811--829, July 1995.

C. Cifuentes, Interprocedural data-flow decompilation. Journal of Programming Languages Vol 4 No 2 (1996) pp 77--99

Of Graphs and Coffi Grounds: Decompiling Java - Patrick Lam (1998)
  

......expressions, must be combined with the control-flow restructuring. Finally, phase III of decompilation must be implemented; we must eliminate goto for arbitrary control-flow graphs. This will probably be done using work previously done with the Compilers and Concurrency group, described in [Ero95] [EH94]. 6 Summary This paper discusses techniques used in the control-flow structuring phase of the Sculpt decompiler. In particular, the three main phases of control-flow structuring were described. Stage I recognized simple structures in the control-flow graph which could be locally reduced; among ......

[EH94] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the 1994 International Conference on Computer Languages, pages 229--240, May 1994.

Making Graphs Reducible with Controlled Node Splitting - Johan Janssen (1997)    H   (Correct)

......control flow graphs is the fact that data flow analysis, that is an essential part of any compiler, can be done more efficiently [3]. Related work: The problem of converting irreducible flow graphs to reducible flow graphs can be tackled at the front-end or at the back-end of the compiler. In [4] and [5] methods for normalizing the control flow graph of a program at the front-end are given. These methods rewrite an intermediate program in a normalized form. During normalization irreducible flow graphs are converted to reducible ones. To make a graph reducible, code has to be duplicated, ......

[4] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the 1994 International Conference on Computer Languages, pages 229--240, Toulouse, France, May 1994.

GOTO Removal Based On Regular Expressions - Paul Morris Ronald      

......our GOTO removal program served as a bridge to extend the range of the software. We have also applied this work to the process of "levitating" IBM 370 assembly language code to higher-level forms (Morris et al., 1996). Previous work (Ashcroft et al., 1972, Peterson et al., 1973, Ramshaw, 1988, Erosa et al., 1994), while highly developed and abstract in nature, treats GOTO removal as an isolated problem in the area of programming languages. This paper is based on the observation that GOTO removal for flowcharts resembles the problem of converting finite-state transition networks to regular expressions, and ......

......preferred over maintaining the existing structure. The algorithm presented here is applicable to nonreducible programs, and meets the lesser path-equivalence standard. Subjectively, this standard appears to produce more readable results than approaches that introduce auxiliary variables, such as (Erosa et al., 1994), and is easier to relate to the source presented for restructuring. These considerations are important for reverse-engineering. While the algorithm discussed in this paper has worked well in practical contexts, we feel its most significant feature is the link to important concepts in computer ......


Erosa, A. M. and Hendron, L. J. (1994). `Taming Control Flow: A structured Approach to Eliminating Goto Statements,' Proc. IEEE International Conference on Computer Languages, 229--240.


Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     

Program Transformation Wiki - Decompilation Resources  by Cristina Cifuentes. Her Tethys is available for download as a compressed Postscript (474K). She lists some interesting projects. Biased toward her own research. Does not cover Java at all.

Decompilers_and_Disassemblers

freshmeat.net Decompilers and Disassemblers -- 11 entries as of 01/14/2002

Decompiler - Room 42 Computers & The Internet Resource Center

Decompilers and Disassemblers

Java Code Engineering & Reverse Engineering - Miscellaneous - Meurrens 980131

Pin-Outs.Com Programming Languages Java Decompilers_and_Disassemblers

Java Virtual Machine (JVM) and Bytecodes Web Resources

Code Reversing for Beginners

Open Directory:

Sable Research Group

Cracking The Art of Reverse Engineering


Decompilation

Decompilers

Selected open source projects

Boomerang

This is an open source decompiler, with several front ends (two are well developed) and a C back end. It uses an internal representation based on the Static Single Assignment form, and pioneers dataflow-based type analysis. At the time of writing, it is still limited to quite small (toy) binary programs.

Desquirr

This is an IDA Pro Plugin, written by David Eriksson as part of his Master's thesis. It decompiles one function at a time to the IDA output window. While not intended to be a serious decompiler, it illustrates what can be done with the help of a powerful disassembler and about 5000 lines of C++ code. Because a disassembler does not carry semantics for machine instructions, each supported processor requires a module to decode instruction semantics and addressing modes. The X86 and ARM processors are supported. Conditionals and loops are emitted as gotos, there is some simple switch analysis, and some recovery of parameters and returns is implemented.

 

 

 


Legal Aspects


 Java decompilers

 


 

Clipper and FoxPro decompilation


Etc

SUBJECT-TABLE.html

Structuring Decompiled Graphs - Cifuentes (ResearchIndex)

Details   Context   1. F.E. Allen. Control flow analysis. SIGPLAN Notices, 5(7):1--19, July 1970.

Details   Context   2. F.E. Allen. A basis for program optimization. In Proc. IFIP Congress, pages 385--390, Amsterdam, Holland, 1972. North-Holland Pub.Co.

Details   Context   3. F.E. Allen. Interprocedural data flow analysis. In Proc. IFIP Congress, pages 398--402, Amsterdam, Holland, 1974. North-Holland Pub.Co.

Details   Context   4. F.E. Allen and J. Cocke. Graph theoretic constructs for program control flow analysis. Technical Report RC 3923 (No. 17789), IBM, Thomas J. Watson Research Center, Yorktown Heights, New York, July 1972.

Details   Context   5. F.E. Allen and J. Cocke. A program data flow analysis procedure. Communications of the ACM, 19(3):137--147, March 1976.

Details   Context   6. Z. Ammarguellat. A control-flow normalization algorithm and its complexity. IEEE Transactions on Software Engineering, 18(3):237--251, March 1992.

Details   Context   7. B.S. Baker. An algorithm for structuring flowgraphs. Journal of the ACM, 24(1):98--120, January 1977.

Details   Context   8. C. Bohm and G. Jacopini. Flow diagrams, Turing machines and languages with only two formation rules. Communications of the ACM, 9(5):366--371, May 1966.

Details   Context   9. C. Cifuentes. Reverse Compilation Techniques. PhD dissertation, Queensland University of Technology, School of Computing Science, July 1994.

Details   Context   10. C. Cifuentes. Interprocedural dataflow decompilation. In print: Journal of Programming Languages, 1996.

Details   Context   11. C. Cifuentes and K.J. Gough. Decompilation of binary programs. Software -- Practice and Experience, 25(7):811--829, July 1995.

Details   Context   12. J. Cocke. Global common subexpression elimination. SIGPLAN Notices, 5(7):20-- 25, July 1970.

Details   Context   13. A.M. Erosa and L.J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the International Conference on Computer Languages, Universit'e Paul Sabatier, Toulouse, France, May 1994. IEEE Computer Society.

Details   Context   14. M.S. Hecht. Flow Analysis of Computer Programs. Elsevier North-Holland, Inc, 52 Vanderbilt Avenue, New York, New York 10017, 1977.

Details   Context   15. B.C. Housel. A Study of Decompiling Machine Languages into High-Level Machine Independent Languages. PhD dissertation, Purdue University, Computer Science, August 1973.

Details   Context   16. G.L. Steele Jr. and G.J. Sussman. Design of a LISP-based microprocessor. Communications of the ACM, 23(11):628--645, November 1980.

Details   Context   17. D.E. Knuth and R.W. Floyd. Notes on avoiding go to statements. Information Processing Letters, 1(1):23--31, 1971.

Details   Context   18. S.R. Kosaraju. Analysis of structured programs. Journal of Computer and System Sciences, 9(3):232--255, 1974.

Details   Context   19. U. Lichtblau. Decompilation of control structures by means of graph transformations. In Proceedings of the International Joint Conference on Theory and Practice of Software Development (TAPSOFT), Berlin, 1985.

Details   Context   20. U. Lichtblau. Recognizing rooted context-free flowgraph languages in polynomial time. In G. Rozenberg H. Ehrig, H.J. Kreowski, editor, Graph Grammars and their application to Computer Science, number 532 in Lecture Notes in Computer Science, pages 538--548. Springer-Verlag, 1991.

Details   Context   21. J. McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3(4):184--195, April 1960.

Details   Context   22. G. Oulsnam. Unravelling unstructured programs. The Computer Journal, 25(3):379--387, 1982.

Details   Context   23. D.J. Pavey and L.A. Winsborrow. Demonstrating equivalence of source code and PROM contents. The Computer Language, 36(7):654--667, 1993.

Details   Context   24. L. Ramshaw. Eliminating go to's while preserving program structure. Journal of the ACM, 35(4):893--920, October 1988.

Details   Context   25. M. Sharir. Structural analysis: A new approach to flow analysis in optimizing compilers. Computer Languages, 5:141--153, 1980.

Details   Context   26. R.L. Sites, A. Chernoff, M.B. Kirk, M.P. Marks, and S.G. Robinson. Binary translation. Communications of the ACM, 36(2):69--81, February 1993.

Details   Context   27. M.H. Williams. Generating structured flow diagrams: the nature of unstructuredness. The Computer Journal, 20(1):45--50, 1977.

Details   Context   28. M.H. Williams and G.Chen. Restructuring Pascal programs containing goto statements. The Computer Journal, 28(2):134--137, 1985.

Details   Context   29. M.H. Williams and H.L. Ossher. Conversion of unstructured flow diagrams to structured form. The Computer Journal, 21(2):161--167, 1978. This article was processed using the L A T E X macro package with LLNCS style

Details   Context   [AKPW83] J.R. Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. Conver- sion of control dependence to data dependence. pages 177--189, 1983.

Details   Context   [AM75] E. Ashcroft and Z. Manna. Translating programs schemas to while-schemas. SIAM Journal of Computing, 4(2):125-- 146, 1975.

Details   Context   [Amm92] Zahira Ammarguellat. A control-flow normalization algorithm and its complexity. IEEE Transactions on Software Engineering, 18(2):237--250, 1992.

Details   Context   [ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1986.

Details   Context   [Bak77] Brenda S. Baker. An algorithm for structuring flowgraphs. Journal of the Association for Computing Machinery, 24(1):98--120, January 1977.

Details   Context   [Cif93] Cristina Cifuentes. A structuring algorithm for decompilation. In Proceedings of the XIX Conferencia Latinoamericana de Informatica, pages 267--276, Buenos Aires, Argentina, August 1993.

Details   Context   [EH94] Ana M. Erosa and Laurie J. Hendren. Taming control flow: A structured approach to eliminating goto statements. pages 229--240. International Conference on Computer Languages, May 1994.

Details   Context   [LY97] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. The Java Series. Addison-Wesley, 1997.

Details   Context   [PKT73] W.W. Peterson, T. Kasami, and N. Tokura. On the capabilities of while, repeat and exit statements. Communications of the ACM, 16(8):503--512, 1973.

Details   Context   [Ram88] Lyle Ramshaw. Eliminating go to's while preserving program structure. Journal of the Association for Computing Machinery, 35(4):893--920, October 1988.

Details   Context   [Sri96] KB Sriram. Hashjava. url: http://www.sbktech.org/hashjava.html, 1996.

Details   Context   [vV96] Hanpeter van Vliet. Mocha. current url: http://www.brouhaha.com/~eric/ computers/mocha-b1.zip, 1996.

Details   Context   [Wil77] M.H. Williams. Generating structured flow diagrams: The nature of unstructuredness. Computer Journal, 20(1):45--50, 1977.

Details   Context   [Wil97] U. G. Wilhelm. Cryptographically protected objects, May 1997. A french version appeared in the Proceedings of RenPar'9, Lausanne, CH. http://lsewww.epfl.ch/~wilhelm/ CryPO.html.

Details   Context   [WO78] M.H. Williams and H.L. Ossher. Conversion of unstructured flow diagrams to structured. Computer Journal, 21(2):161--167, 1978. A Additional Rewriting Rules We anticipate using a few other tree rewriting rules that might improve readability of our code. The anticipated rules build more natural for-loops.


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified:face="Arial" June 05, 2008