Source Code Repositories: Authenticating Production of Source Code

August 15, 2017 - from DisputeSoft's Josh Siegel

The reliable record of changes maintained by source code repositories makes them among the best evidence an expert can be provided for the purpose of authenticating a source code production. A source code repository tracks the development of a program by maintaining native source code files that can be examined as they existed throughout the history of the program’s development. Each change made to a source code file can be recorded by the repository, and it is difficult to alter the data within a repository without leaving traces of the alteration. For example, source code repositories often tie a unique, sequential ID number to each update of code. A gap in the sequence of code updates may indicate that the repository has been altered. Similarly, if a program was purportedly developed over the course of several years, but all of the code contained in a produced repository was added to the repository on the same day, the produced repository is probably not the repository used during the development of the software.
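The two checks described above can be sketched in a few lines of Python. This is only an illustration, assuming the repository log has already been exported as pairs of sequential integer update IDs and commit dates; the function names and the sample log are hypothetical, and repositories that do not number updates sequentially would need a different gap check.

```python
from datetime import date


def find_id_gaps(revision_ids):
    """Return the update IDs missing from an otherwise unbroken sequence."""
    present = set(revision_ids)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def development_span_days(commit_dates):
    """Return the number of days between the earliest and latest update."""
    return (max(commit_dates) - min(commit_dates)).days


# Hypothetical exported log: (update ID, commit date). ID 4 is absent.
log = [
    (1, date(2015, 3, 2)),
    (2, date(2015, 3, 9)),
    (3, date(2015, 4, 1)),
    (5, date(2015, 6, 20)),
    (6, date(2016, 1, 11)),
]

gaps = find_id_gaps([rev for rev, _ in log])
span = development_span_days([d for _, d in log])

print("Missing update IDs:", gaps)
print("Days between first and last update:", span)
```

A non-empty list of missing IDs, or a span of zero days for a program said to have been developed over years, does not prove tampering by itself. Either result simply tells the expert where to look more closely.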

When an expert lacks access to a source code repository, he or she can still potentially authenticate produced source code if provided with individual files in native format. Although files produced outside of a repository are more easily altered, files in native format still contain metadata that may allow an expert to authenticate evidence. Metadata is data about data, and experts review metadata that records such information as the date of creation and last date of modification for computer files produced as evidence. For example, if all three of the following conditions were true, they would be strong indicators that produced code is authentic:

– The party producing the code states that the year of completion for a version of code is 2005.
– The produced code contains only files with last modified dates prior to 2005.
– There are no obvious omissions from the produced code.

However, even where code is produced in native format as in the example above, individual source code files may be drawn from several different versions of a program. Access to a source code repository allows an expert to verify whether produced code constitutes a true and accurate copy of a version of a program as it existed at a certain time, or if the produced code was reconstructed from several different versions. Additionally, source code in native format but produced outside of a repository can omit files containing evidence of copying. Access to the repository allows an expert to evaluate the completeness of a production of source code.
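The date condition in the example above lends itself to a simple automated check. The following Python sketch is one possible approach, assuming the produced files sit in an ordinary directory tree and that file-system timestamps survived the production intact; the function name, file names, and cutoff date are hypothetical. The demonstration builds its own temporary files so that it can run anywhere.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def files_modified_on_or_after(root, cutoff):
    """Return (path, last-modified time) for files under root dated at or after cutoff."""
    late = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            if mtime >= cutoff:
                late.append((path, mtime))
    return late


def demo():
    """Build two sample files with known dates and report the ones that look too new."""
    with tempfile.TemporaryDirectory() as root:
        old_file = Path(root, "ledger.c")
        new_file = Path(root, "report.c")
        old_file.write_text("/* older file */\n")
        new_file.write_text("/* newer file */\n")

        ts_old = datetime(2004, 6, 1, tzinfo=timezone.utc).timestamp()
        ts_new = datetime(2009, 2, 1, tzinfo=timezone.utc).timestamp()
        os.utime(old_file, (ts_old, ts_old))  # set access and modified times
        os.utime(new_file, (ts_new, ts_new))

        cutoff = datetime(2005, 1, 1, tzinfo=timezone.utc)
        return [p.name for p, _ in files_modified_on_or_after(root, cutoff)]


print(demo())
```

The demonstration itself shows why this check is weaker than a repository: a single call to `os.utime` rewrites a file's last modified date, so clean timestamps are consistent with authenticity but cannot establish it alone.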

Because converting a file out of its native format may alter or delete data that might have been used to authenticate the file, source code produced in a non-native file format is the most difficult to verify. Unless provided with more information, an expert may be unable to authenticate code produced in static formats such as paper printouts or text images (e.g., PDFs). Copyright registration records can sometimes supply that information. When registering a copyright, only the first 25 and last 25 pages of a program must be submitted to the Copyright Office. When this “deposit copy” of a program contains the program in its entirety, an expert can compare it to produced code for the purposes of authenticating the produced code. However, this method is limited to small programs and cannot rule out the possibilities that the copyright filing itself contains errors or that the documentation submitted to the Copyright Office is a reconstruction.

For all of these reasons, the source code repository is instrumental for verifying the completeness, authenticity, and validity of a source code production.

Source Code Repositories: Reviewing the Right Version of a Program

July 14, 2017 - from DisputeSoft's Josh Siegel

When examining software for evidence of copying in a misappropriation case, an expert attempts to examine the allegedly infringing program as it existed on or about the date of alleged copying. Programs evolve constantly due to regulatory changes, new operating system requirements, customer feedback, bug fixes, and many other external demands. Such updates may result in substantial alteration to a program over time, and the code that comprises a program on the date of alleged copying may differ significantly from the program’s code at the time of litigation. The code of the program at the time of litigation may contain little or no indication of copying, while previous versions of that same program may show significant evidence of copying. In some cases, a group or individual who has copied code may attempt to delete and rewrite some of the copied code over time in order to hide the fact that the program began as a copy of an existing work.

Using the change management features of a source code repository, an independent expert can “roll back” all of the updates made to a program to a specific date. This technique allows the expert to review and compare two programs as close to the date of alleged copying as possible, when the programs would likely be the most similar. If a plaintiff alleges that copying occurred on more than one date, an expert can use the version management feature of a code repository to analyze and compare code as close in time as possible to each of the dates in question.
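Conceptually, rolling back means selecting the last update recorded at or before the date of interest. A minimal Python sketch of that selection follows, assuming the repository history has been exported as a list of (timestamp, update ID) pairs sorted from oldest to newest; the function name, IDs, and dates are hypothetical, and a real repository tool would perform this step itself.

```python
from bisect import bisect_right
from datetime import datetime


def revision_as_of(history, target):
    """Return the latest (timestamp, update ID) at or before target, or None."""
    timestamps = [stamp for stamp, _ in history]  # must be sorted ascending
    idx = bisect_right(timestamps, target)
    return history[idx - 1] if idx else None


# Hypothetical exported history, oldest first.
history = [
    (datetime(2014, 1, 10), "r101"),
    (datetime(2014, 5, 3), "r102"),
    (datetime(2015, 2, 17), "r103"),
    (datetime(2016, 8, 30), "r104"),
]

# Several alleged copying dates can be handled one after another.
alleged_dates = [datetime(2015, 1, 1), datetime(2016, 9, 1)]
for alleged in alleged_dates:
    match = revision_as_of(history, alleged)
    print(alleged.date(), "->", match[1] if match else "no update yet")
```

With the sample history, the first alleged date resolves to `r102` and the second to `r104`. A date earlier than the first update returns `None`, which signals that the produced repository does not reach back far enough.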

Once an expert has access to the accused program’s code as it existed on the date of alleged copying, he or she can form opinions about whether copying occurred. Programmer comments on updates, comparisons between software programs, and even a single unusually large addition of source code are all important clues that help a trained eye recognize software misappropriation.
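As one illustration of the last clue, the Python sketch below flags updates that added far more lines than is typical for the project. It assumes the number of lines added by each update has already been extracted from the repository log; the function name, the sample figures, and the factor of 10 are hypothetical choices, not an established threshold.

```python
from statistics import median


def flag_large_additions(commits, factor=10):
    """Return the updates whose added-line count exceeds factor times the median."""
    typical = median(added for _, added in commits)
    return [(rev, added) for rev, added in commits if added > factor * typical]


# Hypothetical per-update counts of added lines.
commits = [
    ("r1", 40),
    ("r2", 12),
    ("r3", 65),
    ("r4", 18500),
    ("r5", 30),
    ("r6", 22),
]

for rev, added in flag_large_additions(commits):
    print(f"{rev}: {added} lines added in a single update")
```

The median is used rather than the mean so that one enormous update does not distort the baseline it is measured against. A flagged update is only a starting point for review, since large additions also have innocent explanations such as importing a licensed library.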

Source Code Repositories: What is a Source Code Repository?

June 15, 2017 - from DisputeSoft's Josh Siegel

Software development is a competitive business, and disputes over intellectual property can arise when software engineers move to new companies that compete with their former employers. Should the dispute result in litigation, a source code repository can help an expert witness determine whether a former employee copied a previous employer’s proprietary source code on the way out the door.

When developing software for a business purpose, many software developers employ a source code control mechanism, such as a source code repository. Using a source code repository has many potential benefits for an organization, including:

– Concurrent Development: Repositories usually allow multiple developers to make edits to different parts of the same program simultaneously. Developers can then merge their changes back into the main program.

– Increased Transparency: Most source code repositories require a developer to check out, edit, and then check back in the part of the program he or she was editing. The repository records which developer made changes and when, resulting in a log of updates made to the program over time.

– Version Control: When developers make enough changes to a program stored in a source code repository, they can designate the updated program as a new “version” of the software. A repository also stores previous versions of a program, a feature which allows companies to restore a previous version if, for example, an update introduces a harmful bug.

Regardless of the nature of the dispute, evidence uncovered from a source code repository can provide a robust factual record of a program’s development history, which an expert can use to arrive at supportable and peer-reviewable opinions.