PRODUCT INFORMATION

Name of the drug

Fluvoxamine maleate is chemically identified as (E)-5-methoxy-4'-trifluoromethylvalerophenone O-2-aminoethyloxime maleate. It has the following chemical structure:

CAS No. 61718-82-9
MW = 434.4

Description

Fluvoxamine maleate is a white to slightly off-white, odourless, crystalline powder, sparingly soluble in water, freely soluble in etha
Perhaps the scariest part of this situation is that we don’t completely understand why we are failing. We have identifiable problems: unapplied patches, out-of- [...]

[...] antivirus scanner. Ultimately, though, we don’t care about usability as determined in the laboratory—we care about actual use: Do administrators misconfigure firewalls in practice? How often does user confusion over proper virus scanner use actually lead to compromise?

To measure the use of security technologies in real-world circumstances, we have to account for how a given technology will interact with a huge variety of software, systems, users, uses, and attack profiles. The full complexity of the computational world cannot be captured in any lab setting or theoretical model—there are too many variables, and many of them change over timeframes (months or years) that cannot be practically measured in a laboratory setting using humans. As an alternative, we propose that the performance of security technologies be measured “in the field.” Specifically, we propose that security technologies be tested using the same methodology as used in medical clinical trials. In essence, we propose that we use the same measures of outcome, side effects, and user tolerance and compliance that regulatory bodies use to demonstrate that the benefit of a drug or medical device outweighs its risks. Clinical trials come in many forms depending upon the specific questions they are designed to address; what they all have in common, though, is that the test subjects live in the “real” world, not a laboratory.

Clinical trials were originally developed because medical practitioners faced challenges analogous to those faced by today’s security professionals: they knew a lot about health problems, but they didn’t know what worked to prevent or fix them. Clinical trials provided a methodology for separating “snake oil” from penicillin. As we will explain, clinical trials have a number of limitations as a testing methodology; our hope, though, is that clinical trials of security technologies will allow us to separate ineffective and dangerous technologies from those that provide significant security benefits.

The evaluation problem exists broadly in computer security, for both academic research and commercial products. The most egregious type of improperly evaluated security technology is often referred to as “snake oil”. The ultimate question in computer security evaluation is, how do we differentiate effective security mechanisms from such quackery, particularly in the eyes of a lay audience?

Such differentiation is becoming more important because, almost always, even the best commercial systems cannot detect many of the most recent threats. This limitation arises because new threats emerge much more frequently than before, and meanwhile some of them aim for economic profits and use very complex technologies in order to bypass security mechanisms. Even though many security companies have started using more flexible techniques such as heuristics to respond to new threats, in this arms race attackers always have an important advantage—the public availability of security products. Highly-skilled attackers can keep modifying their newly created malicious codes until they can bypass all current defenses, forcing every security vendor to constantly update their products. Given this situation, how can a regular user know that their vendor is providing adequate protection against the latest threats? The obvious answer is that users should check published benchmarks; unfortunately, according to those tests, virtually every major product appears to be equivalent—they all “pass” or catch virtually all tested threats.

In the antimalware field, researchers and industry members are currently working on developing better testing standards; this task is extremely difficult, however, because vendors and evaluators disagree regarding basic testing practices. For example, there is no consensus on how to construct a collection of malware for testing purposes. A major point of contention is whether such collections may contain new viruses, rather than just ones not observed “in the wild”.

While there are certainly ethical issues involved with creating new computer viruses, we believe there is a more fundamental issue: if you create malware from scratch for testing purposes, how do you know you’ve created the right kinds? In other words, how will you determine whether detection performance on synthetic test cases will correlate with performance on malware observed in practice? This issue is just one part of a much larger issue: how can you take into account all of the factors—detection mechanisms, relative frequencies of different kinds of malware, user behavior, host and network environment, changing attacker strategies and goals—that affect a product’s real world performance in a set of standardized lab tests?

We believe the simple answer is that you can’t—the task is impossible. There are simply too many variables. Researchers and companies will continue to argue about proper lab testing procedures because there is no single right answer: every test incorporates assumptions about the real world, and these assumptions cannot be evaluated [...]

Is there a way beyond this impasse? Perhaps, but only if we can test security technologies “in the field”—in the contexts in which they are used. Of course, such testing would involve attempting to protect real users from real threats while measuring relative performance. This approach is technically difficult, expensive, ethically challenging, and potentially very risky. We believe, however, that such testing is feasible based on experiences from the field of medicine, in the form of clinical trials.
While computers and humans are very different systems, the medical field has long faced evaluation problems analogous to those of computer security. Specifically, before the 20th century there existed many potential “defenses”—treatments that promised to ensure or repair health—but people continued to be attacked and compromised (suffer and die prematurely from disease). While modern medicine has a variety of limitations, current medical practice has treatments that can reliably prevent or cure many conditions that before were debilitating or even fatal. What is remarkable about these treatments is that, in general, we don’t understand how they work: our understanding of living systems is still primitive in many ways. Despite this lack of knowledge, however, we are now able to differentiate treatments that work from those that do not. The primary methodology for drawing such conclusions is the clinical trial.

The key insight behind clinical trials is that when studying systems (such as the human body) that are complicated, diverse, and tightly coupled with a dynamic environment, individual variables cannot be isolated and so cause and effect relationships cannot be inferred from individual observations: correlations can occur without causation, and observed effects can originate from unidentified causes. Clinical trials are an experimental methodology designed to identify causal relationships in [...]

In medicine, clinical trials, or randomized control trials (RCTs), are planned experiments that are designed to compare treatments for a given medical condition. They use results based on a limited sample of patients to make inferences about how treatments should be conducted in the general population of patients. While the majority of clinical trials are concerned with evaluating drugs, they can also be used to evaluate other interventions such as surgical procedures, radiotherapy, physical therapy, and [...]

To account for variations in genetic makeup, lifestyle, life history, and environment, clinical trials are designed [...]

Selected populations: At risk or afflicted individuals are studied, rather than the general population.

Extended duration: Experiments are performed for months or, ideally, years in order to evaluate longer [...]

Random samples: Subjects are randomly recruited from [...]

Comparable Treatments: Subjects are given one of a small selection of treatments, each of which is in- [...]

Randomly Chosen Treatments: Subjects or doctors do not choose their treatment; instead, the treatment is [...]

Control Groups: Some subjects do not receive any treatment or are given a placebo (e.g., a sugar pill).

Blinding: In a single blind study, subjects do not know which treatment they are receiving. In a double-blind study, the treating doctors do not know either.

Indicators: Often the condition studied evolves over a [...] end (e.g., wait until the subject is cured or dead), progress is measured by observing indicators that are known to correlate with the final outcome. For example, insulin and blood sugar levels of diabetes patients are monitored in diabetes-related trials. Note that it is often hard to find a reliable indicator (e.g., a cancer recurs even when all tests indicate the treatment was successful); thus, longer term studies are always required to assess the reliability [...]

Due to the constraints of particular experiments, not all clinical trials will include all of these features; the more that are used, however, the greater the statistical power of the results. In other words, each of these mechanisms helps with determining causal relationships. The fewer that are used, the more likely the study will only show [...]

While clinical trials are very powerful tools for determining cause-effect relationships, they are not able to tell why those relationships exist. Clinical trials do not themselves provide explanations or models; what they can do, however, is test the validity and completeness of models. For example, in medicine drugs that work well in lab experiments routinely fail to work in clinical trials on people. This failure happens even when the precise molecular mechanism of the drug is known. Quite simply, we cannot capture the full complexity of the human body in any current model or lab. With clinical trials, however, we can make sure that regular patients get effective treatments—even if we don’t understand how those treatments [...]

Because computers are engineered systems, we are much better able to determine cause and effect in computer security than in medicine. However, while it is relatively straightforward to understand a given vulnerability and devise a patch that fixes it, as we explained in Section 2, it is not nearly so easy to determine what produces the ultimate result of more secure systems. So, here we ask, is it potentially feasible to adapt the clinical trial methodology [...]

The key constraint to the feasibility question is to realize that clinical trials cannot be used to address the same questions as standard security evaluation techniques. We cannot use a clinical trial to analyze malware, expose a new software vulnerability, or test a new cryptographic protocol. However, we can use clinical trials to address [...]

• What is the security benefit of running an antivirus program on a personal computer in a typical home?
• Do personal firewalls provide additional protection for technically advanced users on their home machines [...]
• Does user training protect organizations from social [...]

Note the key feature of these questions is that, because they involve interactions between computers and their users in specific environments, they cannot be answered in a controlled laboratory setting; nevertheless, they are precisely the kinds of questions we need to answer if we are to improve security in practice.

It takes a team of people to develop a medical clinical trial design: experts in the specific treatment must work with general clinicians, statisticians, experts in patient recruitment, ethicists, and others. Given that computer security clinical trials will also deal with human populations (along with computer populations), many of the same technical, legal, ethical, and logistical issues will need to be addressed. For these reasons, we cannot hope to present a complete trial design here; however, we can give an outline for a plausible computer security clinical trial. Here we present a sketch of a trial addressing the first question: the benefit of antivirus programs.

It is generally recommended that all personal computer users (at least, those running a version of Microsoft Windows) run an up-to-date antivirus scanner. A clinical trial designed to test their relative benefits could have the [...]

Population: Users running (at the start of the trial) Microsoft Windows Vista SP2 on a home machine connected to the Internet via a large home internet service [...]

Duration: Three years, with preliminary results reported [...]

Sample: 1000 ISP subscribers would be randomly recruited to participate in the trial. Each subscriber would be given the following incentives to participate: free technical support and automatic offsite backups for all machines enrolled in the trial and their users. In return, they would have to agree to researchers monitoring their computer usage (subject to appropriate privacy and other controls). Users would be allowed to drop out of the trial at any time.

Treatments: Three major antivirus programs would be selected for the trial and randomly assigned to dif- [...] antivirus programs would be allowed to be installed; otherwise, only the standard security software that comes with Windows Vista would be allowed to be used. Compliance would be verified by scanning off-site backups.

Note that all provided software would be kept automatically up to date, including updates to the latest [...] subscription model.) Other upgrades (software and hardware) and new installations would be permitted at the user’s discretion (e.g., upgrades from Windows Vista to Windows 7 and the installation of new [...]

Control: A control group would receive no antivirus program and would be prohibited from running any host-based antivirus program. To ensure that users were still protected, unobtrusive non-host based defenses (e.g., scanning disk backups, cloud-based an- [...] protection could not be provided with these other mechanisms, we would then have to omit a control group. This case is analogous to a medical clinical trial where it is unethical to omit treatment for [...]

Blinding: The antivirus programs would be modified to remove any obvious corporate insignia or other advertising. Color schemes would also be modified to make them as similar as possible. Otherwise, however, their interfaces would remain the same. Such uniformity would help minimize the effect of a product’s brand on user behavior, e.g., a new product [...]

In addition, if we have a control group, the control group computers would run a program that mimicked the appearance and behavior of an antivirus program. It would provide a Windows tray icon and it periodically would report that its signatures were updated. In addition, it would check and report a variety of relatively innocuous, common problems such as tracking cookies. This program would do no proper scanning and it would provide no protection [...]

Indicators: A variety of measures would be required to monitor the users and computers involved in the study. Primary measures would classify the efficacy of the tested systems based on scans of off-site backups for examples of known malware. To maximize accuracy, such scans would use a large number of commercial scanners (including those not part of the test). Further, supplementary software would record CPU, disk, and network usage. Periodically, a small subset of machines would be inspected manually by security experts to evaluate computer health and other characteristics. Finally, technical support records would give direct measures [...]

The primary goal of such measurements would be to evaluate the “health” of the subject machines. Of course, we cannot ever be completely sure that a seemingly healthy machine is not infected. We do not need to know “ground truth” in this situation, however—we just need to measure relative performance. Thus, simplistic measures should suffice for [...]

While there are a variety of logistical, technological, and financial challenges implicit in the above description, it should be clear that it would be possible to run this trial given the right resources. While we could speculate on what results we might find from such a study, the fact is that we don’t know what would be found. Indeed, that is the key point of clinical trials: they can reveal interactions and behaviors that are not observed in laboratories nor predicted by theoretical models.

There are many potential objections to the use of the clinical trial methodology in a computer security context. Here we address some of the ones that have arisen in our discussions.

One significant objection is that computer security is fundamentally different from medicine because the adversaries we face are not microorganisms but people—intelligent, motivated people. While many have debated the merits of the biological metaphor for computer security, we believe that debate is not relevant to the question of computer security clinical trials because the underlying methodology is applicable in any circumstance where one is performing experiments outside of a controlled lab setting. Randomization, selected populations, controls, blinding—these are just techniques for isolating one variable of interest from a complex background [...]

Of course, it is true that clinical trials are backwards looking; thus, it is always possible that new attacks could render previously effective defenses obsolete—something that happens much less frequently in medicine. However, virtually all modern security tools also adapt to new attacks via automated update mechanisms. Thus, clinical trials of security software will, implicitly, be testing the software and the organization behind it. In practice, then, we would really be comparing humans (attackers) versus humans (defenders), as mediated by a computational battlefield.

But even if we are talking about human institutions, as with many financial products, past performance is not indicative of future results. Given that we cannot predict the future of security technologies using any current technique (including formal models), however, past performance is all we have to go on when choosing security solutions. Clinical trials are merely a formal methodology for rigorously assessing that past performance.

Even if adopted, a clinical trial methodology will not be a panacea with respect to security. While the approach should demonstrate the real world effectiveness of products, it will not explain why differences exist. For example, consider two virus scanners. Our trial would perhaps show that one product provides statistically better protection than the other—but it would not (directly) provide any explanation for their differential performance. Is it the accuracy of virus detection? The speed or ease of update? While individual users may be able to say what they liked about the product they were given, such opinions only provide clues as to the cause. As such, the results produced by the trial may be both unexpected and, [...]

Because of these limitations, clinical trials should be seen as a complement to, not a replacement of, lab testing of security technologies. We also believe better methodologies are needed for lab evaluations. Our purpose here, though, is to point out that lab testing cannot be expected to address all of the issues that arise in deploying security solutions. Clinical trials provide a rigorous way to determine to what extent solutions developed in the lab [...]

To be sure, clinical trials are an expensive and complicated way to evaluate systems. Aren’t there feasible alternatives? We have already discussed the limitations of lab experiments; however, there is an alternative. Rather than deal with the overhead of blinding, controls, screening populations, and the like, why not just observe real users with the defenses they already have?

Such experiments are known as observational trials. They are used frequently in medicine, particularly when researchers are searching for effects that show up over long periods of time (e.g., decades). Unfortunately, observational trials are very limited in their ability to establish causal relationships. Thus, virtually any interesting correlation found in an observational trial is later subject [...]

While the cost of a security clinical trial can be mitigated through appropriate automation, a clinical trial will always be at least an order of magnitude more expensive than a simple lab comparison because of labor costs, particularly for technical support, subject recruitment, and ongoing observation. For example, assume that a trial required a 10:1 ratio of subjects to study personnel. Then, to run a trial with 1000 subjects we would need 100 study employees. If they are paid $100,000 on average, this [...]

We believe this estimate is a worst case scenario—effective security clinical trials should be feasible for a tenth this cost ($1,000,000/year) or less. But even this pessimistic estimate is potentially feasible: computer security is a multi-billion dollar market, and $10 million/year is well within the funding capabilities of governments or NGOs (non-profits). Further, this cost is justified by the importance of the problem. Organizations are now being required by regulation to implement security solutions. Such implementations can be very expensive. To date, we have no way of determining whether those solutions provide concrete benefits in practice.

If clinical trials are shown to work for computer security, it is likely they will become mandated by regulation, much as they have been for medicine. Such regulations would mean that changes in security practice would first need to be experimentally evaluated—for their security benefit in practice—before being adopted. We think such a change would be to the benefit of the computer security industry. Before medical practice was regulated, there was a vigorous but relatively small trade in patent medicines—unregulated preparations that claimed to cure people’s ills. Despite being pioneers in marketing and advertising, patent medicines were widely maligned and mistrusted, largely because in general they didn’t actually work. In contrast, modern medicine is an extremely large, lucrative, and well-respected enterprise. If our community can, as a group, recommend solutions for which we have scientific evidence of their efficacy, perhaps computer security will also see a transformation in [...]

In order for the field of computer security to progress, we need better ways to measure the relative benefits of different techniques and tools as they are used in practice. To this end, we have proposed applying the proven techniques used in medical clinical trials to security. Given the importance of information assurance in the modern world and the increasing regulatory requirements for operational security, we believe the cost and complexity of clinical trials are justified. While the ultimate value of security clinical trials will only be known in retrospect, we are optimistic that clinical trials will help the development and deployment of effective security technologies.

The authors wish to thank Tim Furlong for first thinking of the computer security clinical trial in a lab brainstorming meeting in the summer of 2006. AS, YL, and HI acknowledge support from Canada’s NSERC, through the Discovery Grants program and the Internetworked Systems Security Network (ISSNet), and MITACS.

AMTSO. Anti-Malware Testing Standards Organization [...]

DEFCON. The Race to Zero Contest. http://www.racetozero.net/, August 8–10, 2008.

FOSSI, M., Ed. Symantec Global Internet Security Threat Report, Volume XIV. Symantec, 2009.

[...] DEMETS, D. L. Fundamentals of Clinical Trials, [...]

LARKIN, E. Storm Worm’s virulence may change tactics. Network World (August 2, 2007).

OBERHEIDE, J., COOKE, E., AND JAHANIAN, [...] work Cloud. In 17th USENIX Security Symposium [...]

SCHNEIER, B. Snake oil. Crypto-Gram Newsletter (February 15, 1999). http://schneier.com.

SOMAYAJI, A., LOCASTO, M., AND FEYEREISL, J. Panel: The Future of Biologically-Inspired Security: Is There Anything Left to Learn? In 2007 Workshop on New Security (2008), ACM.

STYLES, J. Product Innovation in Early Modern London. Past & Present 168, 1 (2000), 124–169.
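As an illustration of the random assignment of treatments described in the trial sketch (1000 recruited subscribers, three antivirus programs, and a control group), the following is a minimal Python sketch. The arm labels, the subscriber identifiers, the fixed seed, and the choice of equal-sized arms are all assumptions made for illustration; the paper does not specify how the allocation would be performed, and a real trial would rely on a statistician-designed allocation procedure.

```python
import random
from collections import Counter

# Hypothetical arm labels: three antivirus products plus a placebo control.
# The paper names no products, so these are placeholders.
ARMS = ["antivirus_A", "antivirus_B", "antivirus_C", "control_placebo"]


def assign_arms(subject_ids, arms=ARMS, seed=2009):
    """Randomly assign each subject to exactly one arm with balanced arm sizes.

    Neither subjects nor study staff choose the arm: the order is shuffled
    by a seeded random number generator, then subjects are dealt round-robin.
    """
    rng = random.Random(seed)      # fixed seed (an assumption) so the allocation can be audited
    shuffled = list(subject_ids)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Dealing round-robin keeps arm sizes within one subject of each other.
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}


# Hypothetical identifiers for the 1000 recruited ISP subscribers.
subjects = [f"subscriber-{n:04d}" for n in range(1000)]
allocation = assign_arms(subjects)
arm_sizes = Counter(allocation.values())
print(arm_sizes)
```

If a control group had to be omitted on ethical grounds, as the trial sketch allows, the same function could be called with only the three antivirus arms.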
The Woodlands Institute for Health & Wellness Newsletter Greetings! IN THIS ISSUE DISEASES • Reference Ranges - Why Your NORMAL Lab Results May be What is thyroid: • Melissa Langton Crowned Mrs. United States 2006 • Testimonial of the Month by Melissa Langton • Recipe of the Month - Ceci Dip (Yeast-free) Reference Ranges - Why Your NORMAL Lab Results May be