ANTI-VIRUS SCANNER ANALYSIS BY USING THE "IN THE WILD" TEST SET

Marko Helenius
Virus Research Unit, University of Tampere, Department of Computer Science
P.O.BOX 607, 33100 TAMPERE, FINLAND
Tel: +358 31 215 7139, Fax: +358 31 215 070, E-mail: cshema@uta.fi

This paper briefly introduces our testing methods and the results of a test prepared for MikroPC magazine in August 1994. The test was performed with the DOS, Windows, Netware and memory resident versions of the scanners, using the "in the wild" test set. I have also tried to consider what a reader of the results should be aware of.

ACKNOWLEDGEMENTS

Permission is granted to distribute copies of this information, provided that the contents of the files and the information are not changed in any way and the source of the information is clearly mentioned. To republish the information, permission must be obtained from both the Virus Research Unit and MikroPC magazine.

INTRODUCTION

Too often it is a mystery how an anti-virus tester works, what viruses he/she uses in the tests, what viruses he/she possesses and what kind of tools he/she uses. This is not how it should be. I believe the public should know how anti-virus testers work and what viruses they use in their tests. I also believe that a tester should admit the shortcomings of his/her tests to avoid giving misleading information.

I further believe that co-operation between anti-virus researchers is essential to provide more resources for fighting viruses. All anti-virus researchers have special knowledge and tools of their own, and exchanging these would help everyone in the field. It would also be wise to have an advanced virus exchange between trustworthy anti-virus researchers. I believe that if all anti-virus testers considered these methods, testing could take a great step towards the professional anti-virus testing described by Vesselin Bontchev [Bontchev 1994].
To give a more exact picture of our work, I briefly present our testing methods, the problems of collecting the "in the wild" test set and some things a tester should be aware of. This paper presents a classification of viruses from the viewpoint of an anti-virus tester, the maintenance of our virus collection, the problems of collecting the "in the wild" test set, how we carry out the memory resident scanner and boot sector virus tests, how we test whether viruses are capable of spreading, the results of the test performed with the "in the wild" test set and what a reader should be aware of when reading the results.

CLASSIFICATION OF VIRUSES

A virus is often defined as a self-replicating program that attaches a self-replicating copy of itself to other programs. An anti-virus tester should obviously separate non-viruses from the test set. If he/she decides to include Trojan horses or joke programs, he/she should run a separate test for these and clearly state the objects used in each part of the testing.

But what about droppers, i.e. programs that release viruses? Droppers should be excluded from the test set, although one could argue that they are viruses because they are capable of spreading and of attaching a self-replicating copy of themselves to other programs. Droppers should nevertheless be distinguished from viruses, because searching for them with a scanner is different from searching for the actual viruses they launch.

What about first generation viruses, i.e. original sample files created by virus authors? If the contents of the virus change so that later generation replicates do not match the first generation virus, the first generation virus should be treated as a kind of dropper. Otherwise the tests may give false results, because some scanners may detect the first generation virus but fail to detect the later generation replicates.
To avoid this kind of problem, it is wise not to use first generation viruses in the tests.

MAINTENANCE OF A VIRUS COLLECTION AND A TEST TREE

In the Virus Research Unit, viruses are preserved in two large directory trees that correspond to each other. Samples of each virus are stored in the leaf directories of the trees. One tree contains the original sample files, which may include first generation viruses; the other contains files that have been infected with the original viruses. The latter directory tree, called the test tree, is used for testing anti-virus products. This division keeps first generation viruses out of the test sets while still preserving the original sample files.

Whenever we receive a new set of viruses, we do the following. First, we separate Trojan horses, joke programs, droppers and other non-viruses from the new set. Only obvious cases are separated at this stage. Next, duplicates and already existing viruses are separated with a specific tool called VIRSAMPL, which checks whether the new set contains viruses that already exist in the collection. Usually it does; new samples of existing viruses are moved into the same directories as the existing viruses, and duplicates are deleted. The next step is to check whether the remaining files are capable of spreading. This spread testing is done automatically with a specific boot disk called "Spread test" (see section below). Original samples of viruses capable of spreading are moved into the collection, and the infected files are stored in the test tree. New viruses are moved into new subdirectories with a specific tool called NEWBAT. NEWBAT creates a batch file that automatically creates the directories and copies the files. The rules for this copying process must be defined in a separate file.

COLLECTING THE "IN THE WILD" TEST SET

For a virus to be included in the "in the wild" test set, it must have been found in the "field" at least once.
This is not, however, as obvious as it sounds. How do we know that a virus has been found in the field at least once? Someone must have reported to some anti-virus researcher that the virus has been found in the field. There is still one problem: how do we know that someone has reported the virus to some anti-virus researcher? One solution is to use Joe Wells' list [Wells], which includes viruses that have been reported as found in the field according to the main anti-virus researchers. It does not, however, contain all the viruses found in the field, because not all cases are reported to Joe Wells. For example, in Finland we have many viruses found in the field that have been reported to anti-virus researchers and/or to the Central Criminal Police but are not in Joe Wells' list. I also have reports from other anti-virus researchers of viruses found in the field that are not in the list. However, the viruses mentioned in Joe Wells' list should at least be included in the test set.

My solution was to include, besides the viruses mentioned in Joe Wells' list, the viruses found in Finland, and to ask anti-virus researchers for comments on viruses found in the field. (Thank you to those of you who co-operated.)

The problems were not over after this solution; they were just about to begin. None of the "in the wild" listings had exact information on which variants of the viruses were found in the field. Sometimes the exact variant could be identified directly, but in most cases further examination was needed. Luckily, most of the viruses found in the field in Finland were available to Finnish anti-virus researchers, so I could be certain of the correct variant. After the basic work was done, I had to compare several sources of information with each other to determine which variant of each virus was "in the wild".
I compared listings of our virus collection, listings of CARO's naming standards, the virus information provided with anti-virus products, listings of viruses I had already determined to be "in the wild" and several listings of viruses "in the wild" (Joe Wells' list, listings from Virus Bulletin and listings from other anti-virus researchers). In most cases this comparison produced results and I could almost certainly identify the correct variant, but only after a lot of work and time. Still, I cannot be absolutely certain that all variants were chosen correctly.

BOOT SECTOR VIRUS TESTS

It would be too laborious to do the boot sector virus testing by changing the diskette by hand for each virus. This is why we use a program that copies diskette images from files to diskettes automatically. To record which type of diskette each image originally came from, we name the files as follows: *.IMG means 5.25" DD, *.IBD means 5.25" HD, *.IHD means 3.5" HD and *.IDD means 3.5" DD. After the image has been copied, the path and file name of the image file are appended to the test reports and then the scanners are executed. All of this is done automatically by a batch file, which performs the image copying and scanning for each tested image file separately.

TESTING MEMORY RESIDENT SCANNERS

We have decided to perform the memory resident scanner tests of file viruses by copying infected files while the tested scanner is memory resident and set to check copied files. First of all, there must be a program or a batch file that copies all sample files included in the test set to another location. A command like "XCOPY C:\TESTSET D:\NOTFOUND /S" is not enough, because most scanners deny the command access and copying stops after the first infected file has been found. It is not, however, difficult to create such a batch file.
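
The idea of one copy command per file can be pictured with the following minimal Python sketch. It is an illustration only, not the tooling used in the tests: the function name build_copy_batch and the sample file names are hypothetical, and the roots C:\TESTSET and D:\NOTFOUND are borrowed from the XCOPY example above. It turns a bare file listing into one XCOPY line per file, so that a copy blocked by the resident scanner should affect only that one command.

```python
import ntpath  # DOS/Windows path rules, usable on any host system


def build_copy_batch(listing_lines, source_root, target_root):
    """Turn a bare file listing into one XCOPY command per file.

    Hypothetical helper for illustration; not part of the original test tools.
    """
    commands = []
    for line in listing_lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the listing
        # Path of the sample relative to the test set root, e.g. "DIR1\A.COM".
        relative = ntpath.relpath(path, source_root)
        # Mirror the same relative path under the target directory.
        target = ntpath.join(target_root, relative)
        commands.append(f"XCOPY {path} {target}")
    return commands


# Assumed sample listing; real listings would come from the test tree.
listing = [
    "C:\\TESTSET\\DIR1\\A.COM",
    "C:\\TESTSET\\DIR2\\B.EXE",
]
batch = build_copy_batch(listing, "C:\\TESTSET", "D:\\NOTFOUND")
print("\n".join(batch))
```

Because every file gets its own command, the files that remain uncopied afterwards can be compared against the listing to see which samples the resident scanner stopped.
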
The command "DIR /S /B>TEST.BAT" or the commands "ATTRIB -A /S" and "ATTRIB /S>TEST.BAT" are enough to write the listing of files into the batch file. Now each line of the batch file must be edited into the form "XCOPY [FILE NAME] [TARGET FILE NAME]