Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Softpanorama University Reverse Engineering Links

News

See also

Recommended Links

Articles Legal issues Bibliography Copyright Links Decompilation page (contains Java section)

Slicing

Understanding

Differential Testing

Code Reviews Watermarking Obfuscation Commercial Vendors Conferences

Reverse engineering is a very broad term. On the high end it includes design recovery; on the low end, decompilation and disassembly. But the essence of all these different activities is understanding a particular program when something is missing (design documentation, source code, etc.). It might be useful to distinguish "reverse engineering in the small" from "reverse engineering in the large", just as we distinguish "programming in the small" from "programming in the large". Design recovery and program renovation are generally connected with research on program understanding, while lower-level reverse engineering activities (decompilation and disassembly) are more connected with compiler design and machine architecture issues.

In the United States, once you own a copy of a program, you can back it up, compile it, run it, and even modify it as necessary, without permission from the copyright holder. See 17 USC 117. According to the CONTU Final Report, which courts generally treat as legislative history, "the right to add features to the program that were not present at the time of rightful acquisition" falls within the owner's rights of modification under section 117.

What does all this mean? Once you've legally downloaded or bought a program, you can run it, you can modify it, and you can distribute your patches for other people to use. If you think you need a license from the copyright holder, you've been bamboozled by Microsoft. As long as you're not distributing the software itself, you have nothing to worry about unless you are trying to defeat some protection mechanism in the original software. Please browse my Copyright issues page for more information.

IMHO we need to support public interests against current abuse. If you own a website, please consider joining the opposition to copyright extensions, No Cense, or another similar opposition group. UCITA didn't even make the Federal Trade Commission happy.

The other threat to reverse engineering comes from pseudo-scientists, for example all those "object-oriented no matter what" zealots who do not understand the difference between programming in the large and programming in the small ;-). There are a lot of pseudoscientific papers and even books on the problem of modernizing existing software systems, and a small army of successful snake oil sellers makes good money selling more or less useless (and sometimes outright harmful) methodologies. There was a big splash of their activity related to the Y2K problem, but now it's over. Actually, all these Y2K efforts had one positive side effect: along with the snake oil, some useful research was conducted and some useful tools were polished and/or developed thanks to the huge influx of money into the area at that time (see for example Open Directory - Computers Software Year 2000 Products).

Actually, reverse engineering is more about understanding a product than about producing a clone that benefits from the original code. Documentation for many software products is notoriously bad, source code is unavailable or too expensive to obtain, and if you need to develop a product that interfaces with such a software package, reverse engineering is your only option. On this page I mainly try to help this latter category of developers.

Please note that I am no longer involved in research in this area and many links below may be dead or outdated.

Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contains some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.

Research Index

Old News ;-)

[Mar 25, 2006] Headway Software - Products - Structure101 -- an interesting static byte code analyzer for Java

Structure101 for Java parses your byte code and creates an implementation model of all the dependencies mapped up through the compositional hierarchy. It does this at a rate of mega-SLOCs per minute. You can browse the model and view dependency diagrams at any level - method, class, package or jar.

We consider structure to be important through the life of an application - not just something that gets fixed in an expensive 'Big Bang'. At the same time, we realize that many of our customers only begin looking at structure when they get the feeling it is out of control.

...Structure101TM, currently available for Java only, is designed for live, evolving, imperfect, real projects, where ongoing development must continue. We have focused on making sense of large, difficult code-bases. Structure101 lets you keep a lid on the structural complexity so that it doesn't get any worse, and enables you to gradually streamline the structure while still working to hard delivery schedules.

We have been doing structure since 1999. The core engine of Structure101, the Higraph, is on its 3rd incarnation, lightning fast and massively scalable. It is our passion to continually find new ways to understand and control structure - to make structure simple.

It is very common for packages and classes to outgrow themselves. Big fat packages or classes tend to be difficult to work with because they lack the structure that helps to guide human understanding. Structure101 helps by letting you view even very large dependency graphs of the package or class contents. To help further, Structure101 can perform an Auto-partition on the graph, to reveal the hidden, inherent structure. As well as helping you understand what you've got, seeing the inherent structure may help you to decide how to add structure by creating sub-packages or classes.
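Structure101's own Higraph engine and Auto-partition algorithm are not described here, so the sketch below swaps in a well-known textbook technique instead: Tarjan's strongly connected components algorithm, which finds groups of packages that all depend on each other (dependency cycles, the usual first obstacle when splitting a fat package). The package names and the `deps` map are hypothetical stand-ins for dependencies extracted from byte code.

```python
from typing import Dict, List, Set


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tarjan's algorithm; every component with more than one member is a cycle."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    result: List[List[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, set()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    # Include packages that only ever appear as dependency targets.
    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return result


# Hypothetical package-level dependencies.
deps = {
    "app.ui": {"app.model", "app.util"},
    "app.model": {"app.dao", "app.util"},
    "app.dao": {"app.model"},  # back-edge: model <-> dao form a tangle
    "app.util": set(),
}
tangles = [c for c in strongly_connected_components(deps) if len(c) > 1]
print(tangles)  # [['app.dao', 'app.model']]
```

The recursive version is fine for a sketch; for graphs with tens of thousands of nodes an iterative formulation avoids Python's recursion limit.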

[Jan 16, 2006] computer intelligence assembler disassembler ciasdis tool by Albert van der Horst

computer_intelligence_assembler_disassembler

This page is about how the Post-It Fix-Up principle works out in practical program code in Forth. For the impatient: jump to the downloads

Actual assemblers

Applying the Post-It Fix-Up principle to an 8086 assembler led to the discovery of problems that had to be solved. It turns out that some types of fixups are better considered not relative to the start of the instruction, but relative to the end. Otherwise there would be different fixups for, e.g., the byte/cell indication (B| X|), depending on the length of the opcode. This is still visible in the fig-forth version of the opcodes, such as B| W| besides B1| and W1|. So a new class of fixup, the "fixups from behind" or reverse fixups, was added. It turned out that no other kinds of fixups are needed for Intel processors, up to the Pentium. Other processors require fixups with built-in data. These so-called data fixups are needed for the 6809 and the DEC Alpha.
A program was added that generates a PostScript file with the first-byte opcodes for the 8080 as well as the 8086 and the 80386, a so-called quick reference card. Comparing that to Intel's documentation led to the discovery of one more bug. I had to redesign the opcodes, so other people might have trouble using this beast without such a reference card and the `SHOW: MOV|SG,` that lists all forms allowed for the move segment instruction.
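The point about reverse fixups can be illustrated with a minimal sketch, assuming a simplified model in which an instruction starts as a base byte pattern and each fixup ORs bits into one byte. The fixup names below (`WORD_SIZE`, `modrm_regs`) are hypothetical and are not ciasdis's own Forth words. Because the 8086 operand-size "w" bit always sits in the byte just before the ModR/M byte, addressing it from the end (offset -2) lets one fixup serve both a one-byte opcode (`88`) and a two-byte opcode (`0F B6`, MOVZX, an 80386 instruction). Encodings assume 16-bit code.

```python
from typing import List, Tuple

# A fixup is (offset_from_end, bits): OR `bits` into the byte lying
# `offset_from_end` bytes from the end of the instruction (-1 = last byte).
Fixup = Tuple[int, int]


def assemble(base: List[int], fixups: List[Fixup]) -> bytes:
    code = bytearray(base)
    for offset_from_end, bits in fixups:
        code[offset_from_end] |= bits  # negative index counts from the end
    return bytes(code)


# Hypothetical fixup names, not ciasdis's own words.
WORD_SIZE: Fixup = (-2, 0x01)  # 8086 'w' bit: lowest bit of the byte before ModR/M


def modrm_regs(reg: int, rm: int) -> Fixup:
    """ModR/M byte for the register-to-register form (mod = 11)."""
    return (-1, 0xC0 | (reg << 3) | rm)


AX, BX = 0, 3  # same register numbers as AL, BL when the w bit is clear

MOV_RM_REG = [0x88, 0x00]          # MOV r/m, reg: one opcode byte + ModR/M
MOVZX_REG_RM = [0x0F, 0xB6, 0x00]  # MOVZX reg, r/m: two opcode bytes + ModR/M

mov_al_bl = assemble(MOV_RM_REG, [modrm_regs(BX, AX)])
mov_ax_bx = assemble(MOV_RM_REG, [WORD_SIZE, modrm_regs(BX, AX)])
movzx_ax_bl = assemble(MOVZX_REG_RM, [modrm_regs(AX, BX)])
movzx_ax_bx = assemble(MOVZX_REG_RM, [WORD_SIZE, modrm_regs(AX, BX)])

for name, code in [("mov al,bl", mov_al_bl), ("mov ax,bx", mov_ax_bx),
                   ("movzx ax,bl", movzx_ax_bl), ("movzx ax,bx", movzx_ax_bx)]:
    print(f"{name:<12} {code.hex()}")
```

Counting from the start, the w bit would be at offset 0 for `88` but offset 1 for `0F B6`, which is exactly the duplication of fixups the quoted text describes.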

[Jun 18, 2005] Reverse Engineering and Program Understanding

[Jan 5, 2002] Architecture Recovery of Dynamically Linked Applications- A Case Study [PDF] by Igor Ivkovic and Michael W. Godfrey

Most previously published case studies in architecture recovery have been performed on statically linked software systems. Due to the increase in use of middleware technologies, such as CORBA, and OOP concepts, such as polymorphism, there is an opportunity and a need to analyze architectures of these dynamically linked systems. This paper presents the results of software architecture extraction of the Nautilus file manager, which employs CORBA in its implementation. A combination of existing static analysis and use-case modeling architecture recovery techniques was used with the expectation of complex but complete architecture extraction of a system such as Nautilus. We have found that this combined approach named Dynamo-1 presented in this paper provided successful focused architecture recovery and guidance for the future work in complete architecture recovery of dynamically linked applications.

Keywords:

Nautilus, GNOME, program comprehension

[Dec 30, 2001] The Law & Economics of Reverse Engineering by Prof. Pamela Samuelson -- one of the best legal papers on the subject. Highly recommended.

[Dec 28, 2001] Decompilation page and link to a decompiler by Satish Kumar. Contains a beta version of DisC - Decompiler for TurboC and a short introduction to the problem of decompilation, using Intel assembler fragments of small C programs as examples. Compare to Decompilation of Binary Programs - dcc. See Decompilation and Decompilers Page

[Feb 06, 2001] Slashdot: Brief Analysis On Reverse Engineering Software -- "An article on PlanetIT.com discusses a court ruling that establishes the reverse-engineering of hardware and software as legal, under the "fair use" umbrella. What ramifications does this have in the industry? Can I reverse-engineer MS Word and write a word processor that can read and save .DOC files?"

Well, that clears that up, then. (Score:3, Insightful)
by Anoriymous Coward on Sunday February 04, @03:14PM EST (#18)
I'm confused. Possibly so is the author of this article. He seems to imply that UCITA is a pending piece of federal legislation, rather than state legislation. As it is, UCITA appears to be dead and buried in most states (hooray!).

He draws a line between the Reimerdes and Connectix cases by quoting that Reimerdes "didn't have a right to the DVD". Did he steal it? More confusion.

Anyway, it seems the 9th Circuit gets overturned all the time, so I wouldn't get too hopeful about this being a positive sign.
Re:Well, that clears that up, then. (Score:1)
by edwardames
(edwardames at hotmail dot com) on Sunday February 04, @08:57PM EST (#166)
That assertion by the journalist also took me aback for a second. I have no doubt the software industry would likely try to get legislation through Congress to "correct" a court ruling such as this one, but that's just my suspicion. UCITA, though it would impact cases like the one in the story, certainly has nothing to do with the U.S. Congress. UCITA's going through the legislatures, even if it is going slowly.

Despite your opinion of the current status of UCITA, I think that it is far from dead. Take a look at this map to see where UCITA lobbying activities are underway. Check out anti-UCITA ucita.com and pro-UCITA ucitaonline.com. It's still an issue that has to be followed or it'll take us all by surprise one day, by becoming the law of the land.

Ed

Reverse Engineering file formats (Score:5, Insightful)
by Pedrito
(pdavis68@hotmail.com) on Sunday February 04, @03:28PM EST (#31)
I reverse engineered quite a few MS file formats (see my out-of-print book Undocumented Windows File Formats) and never had any hassles from MS regarding the reverse engineering.

In fact, MS tried to hire me to provide them with the specs for one of their file formats. Apparently the author of the code never documented the file format. MS had released specs for it, but they were completely wrong.

After being told by several friends that MS was notorious for delaying payment with contractors, I asked for half the money up-front. They refused and I never did the work.

But I digress. I reverse engineered a number of file formats that were "proprietary" Microsoft files. If they're going to go after anyone for it, surely they would have gone after me since I was publishing them left and right in magazines and my book.

I've figured ever since then that MS must have known that the whole thing about reverse engineering in their licenses was unenforceable.

You can also look at all the work Andrew Schulman and Matt Pietrek did reverse engineering Windows code and the PE file format and neither of them ever got hassled either, as far as I know.

Pete Davis


-- "Suppose you were an idiot. And suppose you were a member of congress. But I repeat myself." - Mark Twain
DeCSS Reverse Engineering? No proof (Score:3, Informative)
by joneshenry on Sunday February 04, @03:44PM EST (#47)
I urge everyone who thinks that DeCSS was reverse engineering to actually read materials such as the transcript of Johansen's testimony. There is simply no evidence that DeCSS was the product of legitimate reverse engineering. Not just once but twice anonymous information was contributed to crack the problem in a form that does not resemble what one would get from treating the system as a black box. Johansen testified: "Yes, I believe the CSS authentication had been posted anonymously in Assembler language on the Internet, and Derek Fawcus had picked that up and rewritten it in C language and posted it on his website." Note the word "Assembler". Johansen also testified that he was given further information from a complete stranger on IRC. On the Livid-dev mailing list on Saturday, October 02, 1999 Eric Smith had posted: "The specific issue WRT the CSS code is that the x86 code was apparently simply ripped out of a working commerical implementation (which was presumably copyrighted)" to which Derek Fawcus had replied "Well I guess it might have been, but I don't _know_ that." (Fawcus went on to explain how he had "worked to understand the algorithm underlying the x86 code.") Why the developers didn't run away as fast as they could once there were questions is something I cannot understand. Didn't anyone learn from previous examples such as Compaq's reverse engineering of the IBM PC BIOS? Compaq set up their reverse engineering effort so that at every stage they could prove the source of information using engineers whom they could assert did not have prior exposure to IBM IP.
Reverse engineering in non-US jurisdictions (Score:3, Informative)
by Wills on Sunday February 04, @04:42PM EST (#87)

Two years ago an Australian court ruled reverse engineering to be lawful (Slashdot story, October 1999) . Other jurisdictions outside the US have given similar positive decisions.

Drivers (Score:2, Insightful)
by tzoompy on Sunday February 04, @09:19PM EST (#171)
A lot of the *nix drivers are done by reverse-engineering. I know of a couple of Linmodem driver projects that started with a copy of the binary of the corresponding Winmodem driver. Reverse engineering for the purpose of getting hardware specs out of the driver is OK with most of the driver companies. The Win/Lin modem manufacturers care mostly about their signal processing (SP) algorithms rather than the DSP specs. The problem with reverse engineering these modems is that, together with the hardware specs, the driver reveals enough about which SP algorithm is used that a sufficiently knowledgeable person can reverse engineer everything out of it.

[Dec 18, 2000] Clipper and FoxPro decompilation

[Oct 10, 2000] Filter Factory Plug-in Decompiler -- This little 16-bit DOS program generates an ".afs" file from any ".8bf" file compiled by the Filter Factory of Adobe Photoshop (PC version only).

[Sept 15, 2000] Java decompilers

[Aug 19, 2000] Open Visualization Data Explorer

The wonderful ferroconcrete world we live in has more lawyers than rats. There are patents underlying the most obvious software designs (yes, a simple lawsuit showing prior art will defeat three quarters of them, but I for one won't spend my life savings on them, and companies with deep enough pockets prefer not to invalidate competitors' patents for fear of getting blasted themselves).

Patent issues aside, there's the legal debate about licenses. If we (the Open Source developers) cannot put our legal squabbles aside (my license is more free than yours -- no, mine is), how would anyone expect big business to put theirs aside? Besides ego, they've got shareholders to take into account.

I've been mighty impressed with IBM's venture into the Open Source arena. I think they've taken the boldest steps of all. It's not just half-baked Java stuff (with tremendous investments behind them) or stuff without direct revenue potential (like jfs, which they couldn't sell as long as competitors think their mouse trap is better). If you search for "IBM Visual Data Explorer" on www.ibm.com, you'll get a price list with a rather hefty price tag (and if you dig deeper, you'll find an impressive array of Fortune 500 companies and research institutes that paid those prices and got their money's worth). If you look at opendx.org, you'll see the same software, free. The stuff is awesome!

Whatever their motivation, I rate IBM highly for its commitment to Open Source. It's a rather stunning move, given their revenue streams and the fact that they spearheaded the move from free to paid-for software eons ago.

Maintenance Understanding Metrics and Documentation Tools for Ada C C++ and FORTRAN -- "Our tools help developers understand, document, and maintain impossibly large or complex amounts of source code."

Linux Today Upside R.I.P. reverse engineering

Reverse engineering is a powerful tool that allows innovators to improve upon someone else's design or make compatible products.

Or ... reverse engineering is a way for freeloaders to gain unauthorized access to proprietary products and infringe upon someone's intellectual property rights.

Either way you look at it, keep in mind that some forms of software reverse engineering are restricted by legal rules of contract and copyright law.

Latest battle
Last month Mattel (MAT) filed a lawsuit against two software programmers, alleging both copyright and software license violations.

Mattel and its subsidiary, Microsystems Software Inc., filed documents in Massachusetts federal court claiming that software programmers from Sweden and Canada reverse-engineered its CyberPatrol Internet filtering software. According to Mattel, the programmers then created a utility known as "cphack.exe" or "CP4break.zip" that allows people to see a list of those sites considered off-limits by CyberPatrol.

Mattel said that by reverse engineering CyberPatrol, "developing source code and binaries to bypass" CyberPatrol's protections, and then posting the utilities on the Internet, the programmers had violated Mattel's copyrights and the terms of the CyberPatrol license.

Last Friday, the court hearing this case agreed that Mattel's claims have merit, and issued a temporary restraining order prohibiting the distribution of cphack.exe and CP4break.zip.

SecurityFocus.com: The Fine Print in UCITA (Mar 17, 2000)
Cyber Patrol sues codebreakers (the AP story is *wrong*) (Mar 16, 2000)
Wired: Furor Over Virginia E-Biz Law [UCITA] (Mar 16, 2000)
SJ Mercury/AP: Software filter firm sues hackers (Mar 16, 2000)
SJ Mercury: Greed undermines benefits of digital technology (Mar 05, 2000)
ZDNet: Hollywood's war on open source (Feb 28, 2000)
Freshmeat: The Dangers of UCITA (Feb 24, 2000)
osOpinion: Cronus Overthrown: a perspective on CSS and SDMI (Feb 07, 2000)
UPDATED: Richard Stallman -- Why We Must Fight UCITA (Feb 06, 2000)
Arne Flones -- The Digital Millennium Copyright Act: A Corporate Bully Bludgeon (Jan 25, 2000)
Copyright Office: Exemption to Prohibition on Circumvention of Copyright Protection Systems... (Jan 21, 2000)
Linux Journal: Copyright Strikes Back (Nov 23, 1999)

[fm] objdump-beautifier

objdump-beautifier is a Perl script to make objdump output more useful. It traces function calls and jumps, locates string constants, removes leading zeroes, and corrects objdump's annoying habit of printing negative numbers as large positive ones.
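The last of those fixes is easy to sketch. The snippet below is not objdump-beautifier's code; it is a minimal Python illustration, assuming 32-bit x86 AT&T-syntax output in which a sign-extended immediate such as `$0xfffffff0` is printed as an unsigned full-width constant. It reinterprets full-width (8 hex digit) constants with the sign bit set as two's-complement negatives.

```python
import re


def to_signed(value: int, bits: int = 32) -> int:
    """Reinterpret an unsigned two's-complement value of the given width as signed."""
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


# Only full-width (8 hex digit) constants are considered, to reduce false hits.
FULL_WIDTH_HEX = re.compile(r"\b0x([0-9a-fA-F]{8})(?![0-9a-fA-F])")


def fix_negative_hex(line: str) -> str:
    """Rewrite 32-bit hex constants with the sign bit set, e.g. $0xfffffff0 -> $-0x10."""
    def repl(m):
        value = to_signed(int(m.group(1), 16))
        return f"-0x{-value:x}" if value < 0 else m.group(0)
    return FULL_WIDTH_HEX.sub(repl, line)


print(fix_negative_hex(" 8048354:\t83 e4 f0\tand    $0xfffffff0,%esp"))
```

A purely textual rewrite like this cannot tell a negative immediate from a high address (for example a kernel address such as `0xc0100000`), so a real tool would need to use the instruction's operand type, which is presumably why it traces calls and jumps separately.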

webreview.com - A New law Affects Innovation and Compatibility

But when it comes to reverse engineering, Nebergall says a prohibition in a shrink-wrap license would probably be binding under UCITA. In the absence of UCITA, according to software engineer and lawyer Cem Kaner, "No court has ever upheld a ban on reverse engineering for mass-market software."

Like other provisions in UCITA, a prohibition could be overturned by law. But current law is not too strong. The flagship copyright law of the decade, the 1998 Digital Millennium Copyright Act, permits reverse engineering "for achieving interoperability." Sounds good; if Corel wants to create a word processing product that accepts .DOC files, that's all for the benefit of interoperability, isn't it?

But an aggrieved company could plausibly claim that reverse-engineered products constitute competition, not just interoperability, so a prohibition on reverse-engineering might stand. And as Kaner points out, reverse engineering has many legitimate purposes that might be squelched by a shrink-wrap license.

Reverse engineering raises many of the same questions as the user interface or "look-and-feel" copyright suits ten years ago. Both issues raise the questions of what is the true intellectual property in software, and how important it is to promote new innovations or competition in comparison to protecting earlier innovations. Where your sympathies fall will determine whether you think UCITA is fair.

Design Recovery and Program Understanding

Design Recovery involves the examination of legacy code in order to reconstruct design decisions taken by the original implementors. Artifacts in both the source code and in executable images are examined and analyzed. Software tools are necessary because such systems are often very large and the goal is the understanding of significant global and diffuse features of that code. In partnership with the IBM Centre for Advanced Studies, the University of Toronto, the University of Victoria, and McGill University, the group is developing novel tools and techniques that will leverage the human learning process when applied to the understanding of legacy source.

A prototype tool (ART for Analysis of Redundancy in Text) based on exact matching of text has successfully identified useful structural information in a 40 MB source tree and has demonstrated a potential to scale up to 500 MB. Work is underway on integrating the tools of the research partners to produce a system with a shared repository, visualization tools, and access to a major commercial tool.
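ART's actual algorithm is not described beyond "exact matching of text", so the following is only a generic sketch of that idea, not ART itself: every run of `window` consecutive non-blank lines (with indentation stripped) is used as a dictionary key, and any run found in more than one place is reported as redundant text. The function name, the window size and the sample files are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (file name, first line number of the run)


def find_duplicate_windows(files: Dict[str, str],
                           window: int = 3) -> Dict[Tuple[str, ...], List[Location]]:
    """Map each run of `window` consecutive non-blank lines that occurs in more
    than one place to the list of places where it occurs."""
    index: Dict[Tuple[str, ...], List[Location]] = defaultdict(list)
    for name, text in files.items():
        # Keep original line numbers, but ignore blank lines and indentation.
        lines = [(no, ln.strip())
                 for no, ln in enumerate(text.splitlines(), start=1) if ln.strip()]
        for start in range(len(lines) - window + 1):
            chunk = tuple(ln for _, ln in lines[start:start + window])
            index[chunk].append((name, lines[start][0]))  # the dict hashes the chunk
    return {chunk: locs for chunk, locs in index.items() if len(locs) > 1}


files = {
    "a.c": "int x;\nx = 0;\nx++;\nreturn x;\n",
    "b.c": "void f() {\n  x = 0;\n  x++;\n  return x;\n}\n",
}
for chunk, locs in find_duplicate_windows(files).items():
    print(locs, chunk)
```

Because matching is by hash lookup, one pass over the tree is linear in its size, which is the property that lets this family of tools scale to source trees of hundreds of megabytes; a longer duplicated region simply shows up as several overlapping windows that can be merged afterwards.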

[August 2, 1999] Andys Binary Folding Editor -- Andys Binary Folding Editor is primarily designed for structured browsing, although it also provides minimal editing facilities. This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the definitions (structures or unions) within them. BE is particularly suited to displaying non-variable length definitions within the files. This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps. BE is often used as the data navigation half of a debugger. 
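The core of such definition-driven browsing can be sketched with Python's standard `struct` module. This is not BE's initialisation-file syntax; the `HEADER_DEF` layout and the sample bytes are hypothetical, and the sketch only shows how a fixed-length definition turns a raw byte buffer into named fields with offsets.

```python
import struct
from typing import List, Tuple

# Hypothetical record definition: (field name, struct format code).
HEADER_DEF: List[Tuple[str, str]] = [
    ("magic", "4s"),        # 4 raw bytes
    ("version", "H"),       # unsigned 16-bit
    ("flags", "H"),         # unsigned 16-bit
    ("record_count", "I"),  # unsigned 32-bit
]


def decode(defn: List[Tuple[str, str]], data: bytes, offset: int = 0,
           endian: str = "<") -> List[Tuple[int, str, object]]:
    """Decode one fixed-length definition at `offset`; return (offset, name, value)."""
    fields = []
    pos = offset
    for name, code in defn:
        fmt = endian + code  # explicit byte order, no alignment padding
        (value,) = struct.unpack_from(fmt, data, pos)
        fields.append((pos, name, value))
        pos += struct.calcsize(fmt)
    return fields


blob = b"DEMO" + struct.pack("<HHI", 2, 0x0001, 3) + b"...payload..."
for pos, name, value in decode(HEADER_DEF, blob):
    print(f"{pos:08x}  {name:<13} {value!r}")
```

Fixed-length definitions are the easy case, as the description notes: each field's offset follows from the sizes of the fields before it, so the same definition can be applied at any offset of a file or memory dump.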

[July 25, 1999] cgvg -- Tools for convenient grepping through code. cgvg is a pair of Perl scripts ("cg" and "vg") which act as wrappers for find and grep. The main idea is to act as a temporary replacement for cscope until there is a good GPLed cscope available. "cg" does the grep through code, storing the info in a text file in the user's home directory, and "vg" lets the user open an editor at the line in the file where a particular match was found. Some features include color highlighting, human-readable output, resizing to screen width, support for many editors, etc.

[July 17, 1999] code2html -- Converts a program's source code to syntax highlighted HTML (stable: 0.6.2, May 8, 1999; devel: none; license: Freeware)

[July 17, 1999] legdoc is a Perl script to document C source file trees. It uses tags often found in legacy C code to provide documentation for that code. It can either convert a list of named files or an entire directory. The documentation will be stored in index.html.

[June 20, 1999] Reverse Engineering the LEGO RCX

[June 20, 1999] Harald Gall, PhD 

[June 20, 1999] Software Reuse and Reverse Engineering

 

 


Recommended Links


In case of broken links please try to use Google search. If you find the page, please notify us about its new location.

New:

[Dec 30, 2001] The Law & Economics of Reverse Engineering by Prof. Pamela Samuelson -- one of the best legal papers on the subject. Highly recommended. See also Professor Samuelson

[Dec 28, 2001] Decompilation page and link to a decompiler by Satish Kumar. Contains a beta version of DisC - Decompiler for TurboC and a short introduction to the problem of decompilation, using Intel assembler fragments of small C programs as examples. See Decompilation and Decompilers Page

Internal

External:


Articles


Legal issues

See also Softpanorama Copyright Links

Etc


Bibliography


Commercial Vendors


Differential Testing 

DIGITAL Technical Journal - Differential Testing for Software -- a technique useful in reverse engineering. If your reimplementation is "almost" complete, it can be tested against the original (the reference implementation) by feeding randomly modified test cases to both; differences in behavior can lead to interesting insights. IMHO this is especially useful for reimplementations of the Microsoft Office suite...
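The loop behind differential testing is short enough to sketch. In the Python below, both "implementations" are hypothetical stand-ins: `reference_word_count` plays the original program and `reimpl_word_count` plays an almost-complete reimplementation that only treats spaces as separators. Seed inputs are mutated at random, fed to both, and every input on which they disagree is recorded.

```python
import random
from typing import Callable, List, Tuple


def reference_word_count(text: str) -> int:
    # Stand-in for the original program: any whitespace separates words.
    return len(text.split())


def reimpl_word_count(text: str) -> int:
    # Stand-in for an "almost complete" reimplementation: only spaces separate words.
    return len([w for w in text.split(" ") if w])


ALPHABET = "ab \t\n"


def mutate(text: str, rng: random.Random) -> str:
    """Apply 1-3 random insert/delete/replace edits to a seed input."""
    chars = list(text)
    for _ in range(rng.randint(1, 3)):
        op = rng.choice(("insert", "delete", "replace"))
        pos = rng.randrange(len(chars) + 1)
        if op == "insert":
            chars.insert(pos, rng.choice(ALPHABET))
        elif chars and pos < len(chars):
            if op == "delete":
                del chars[pos]
            else:
                chars[pos] = rng.choice(ALPHABET)
    return "".join(chars)


def differential_test(original: Callable[[str], int], candidate: Callable[[str], int],
                      seeds: List[str], trials: int = 500,
                      seed: int = 1) -> List[Tuple[str, int, int]]:
    """Return (input, original result, candidate result) for every disagreement."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    diffs = []
    for _ in range(trials):
        case = mutate(rng.choice(seeds), rng)
        expected, actual = original(case), candidate(case)
        if expected != actual:
            diffs.append((case, expected, actual))
    return diffs


diffs = differential_test(reference_word_count, reimpl_word_count,
                          ["the quick brown fox", "a b"])
print(len(diffs), "disagreements; first:", diffs[0] if diffs else None)
```

No expected answers are needed: the original program is the oracle, which is what makes the technique attractive when the behavior being reimplemented is undocumented. The recorded inputs (here, ones containing tabs or newlines) point straight at the unhandled case.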

Application Note -- Software Test Tools Considered Harmful

A failure to synchronize during playback causes the test recording to abort and typically signals that the application has changed in a way that the test has detected. The problem is that in some cases, when the application hasn't changed, the failed test would incorrectly imply that it had. Too many such false-negative results would tend to lead to the view that the test suite is unreliable.

Testing Strategies and Methods


Obfuscation


Watermarking


Conferences

WCRE The Working Conference on Reverse Engineering (WCRE) is the premier research conference on the theory and practice of recovering information from existing software and systems. WCRE explores innovative methods of extracting the many kinds of information that can be recovered from software, software engineering documents, and systems artifacts, and examines innovative ways of using this information in system renovation and program understanding. WCRE proceedings are available from IEEE Computer Society Press
(phone +1-714-821-8380 or +1-800-CS-BOOKS; cs.books@computer.org). See CS Press Catalog site.


Etc

 


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. Submit comments. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright for original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: June 05, 2008