A new database, Project Research Openness for Validation with Empirical Data (PROVEDIt), can help bring more reliability to the interpretation of complex DNA evidence and reduce the risk of misinterpreting a DNA profile. The PROVEDIt database is described in the journal Forensic Science International: Genetics.

Forensic DNA evidence is a valuable tool in criminal investigations for linking a suspect to the scene of a crime, but making that determination is not simple, because the genetic material found at a crime scene often comes from more than one person. Right now, there is no standardization of tests. Crime labs are accredited, but that is different from having standards that require labs to meet some critical threshold for a match statistic.

In analyzing DNA mixtures, scientists will often find partial matches, so part of the determination of whether a suspect contributed to an item of evidence depends on interpretations by forensic scientists. The team developed computational algorithms that sorted through possible DNA signal combinations in a piece of evidence, taking into account their prevalence in the general population to determine the likelihood that the genetic material came from one, two, three, four, or five people.
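To make the idea of weighing allele combinations by their population prevalence concrete, here is a minimal sketch in Python. It is not the team's algorithm: it ignores signal intensity, allele dropout, and other laboratory artifacts, and simply asks how probable an observed set of alleles is if one to five people contributed. The function names, locus names, and allele frequencies are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def prob_allele_set(observed, freqs, n_contributors):
    """Probability that 2 * n_contributors independent allele draws at one
    locus show exactly the observed set of alleles (inclusion-exclusion)."""
    draws = 2 * n_contributors
    alleles = list(observed)
    total = 0.0
    for k in range(len(alleles) + 1):
        sign = (-1) ** (len(alleles) - k)
        for subset in combinations(alleles, k):
            # Probability that every draw falls inside this subset.
            total += sign * sum(freqs[a] for a in subset) ** draws
    return max(total, 0.0)  # guard against tiny negative rounding errors


def contributor_likelihoods(profile, freqs_by_locus, max_n=5):
    """Likelihood of the whole profile for 1..max_n contributors,
    treating loci as independent."""
    result = {}
    for n in range(1, max_n + 1):
        likelihood = 1.0
        for locus, observed in profile.items():
            likelihood *= prob_allele_set(observed, freqs_by_locus[locus], n)
        result[n] = likelihood
    return result


# Hypothetical allele frequencies and a hypothetical mixture profile.
FREQS = {
    "locus1": {"A": 0.2, "B": 0.3, "C": 0.5},
    "locus2": {"X": 0.5, "Y": 0.5},
}
PROFILE = {"locus1": {"A", "B", "C"}, "locus2": {"X", "Y"}}

if __name__ == "__main__":
    for n, lik in contributor_likelihoods(PROFILE, FREQS).items():
        print(n, lik)
```

In this toy profile, three distinct alleles at one locus cannot come from a single person, so the likelihood for one contributor is zero and the comparison is between two or more contributors. Real interpretation systems work with far richer models of the measured signal.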

Information from the PROVEDIt database could be used to test software systems and interpretation protocols, and could serve as a benchmark for future developments in DNA analysis. The database, which consists of approximately 25,000 samples, is freely accessible to anyone. The team wanted to provide these data to the community so that others could test their own probabilistic systems. Other academics and researchers might also develop their own systems for interpreting these very complex types of samples.

The website's files contain data that can be used to develop new or compare existing interpretation or analysis strategies. Forensic laboratories could use the database for validating or testing new or existing forensic DNA interpretation protocols. Researchers requiring data to test newly developed methodologies, technologies, ideas, developments, hypotheses, or prototypes can use the database to advance their own work.

Lun, a computer science professor at Rutgers-Camden, led the way in developing the software systems, doing the number crunching to determine the likely number of contributors in a DNA sample, and calculating statistics to determine the likelihood that a person did or did not contribute to a sample. In developing these methods, the team considered it very important that they be empirically driven: that real experimental data be used both to train or calibrate the methods and to validate them.
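As a rough illustration of what a statistic about whether a person contributed to a sample can look like, the sketch below computes a textbook likelihood ratio for the simplest possible case: a single-source sample that matches the person at every locus, with genotype probabilities taken from Hardy-Weinberg proportions. This is an assumption-laden simplification, not the software described here, which handles complex mixtures. The function names and allele frequencies are hypothetical.

```python
def genotype_probability(genotype, freqs):
    """Hardy-Weinberg probability of a two-allele genotype at one locus."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2          # homozygote: p^2
    return 2 * freqs[a] * freqs[b]    # heterozygote: 2pq


def single_source_lr(person_profile, freqs_by_locus):
    """Likelihood ratio for 'this person is the source' versus 'an unrelated
    random person is the source', assuming a clean single-source sample that
    matches the person at every locus and independence between loci."""
    lr = 1.0
    for locus, genotype in person_profile.items():
        lr *= 1.0 / genotype_probability(genotype, freqs_by_locus[locus])
    return lr


# Hypothetical frequencies and genotypes, for illustration only.
EXAMPLE_FREQS = {
    "locus1": {"A": 0.1, "B": 0.2},
    "locus2": {"A": 0.1, "B": 0.2},
}
EXAMPLE_PERSON = {"locus1": ("A", "B"), "locus2": ("A", "A")}

if __name__ == "__main__":
    print(single_source_lr(EXAMPLE_PERSON, EXAMPLE_FREQS))
```

The rarer the matching genotype is in the population, the larger the ratio. For mixtures, both hypotheses must also account for the other contributors, which is where empirically calibrated models become essential.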