Relevant Papers:
Diaconis,P. & Efron,B. (1983). Computer-Intensive Methods in Statistics. Scientific American, Volume 248.
Cestnik,B., Kononenko,I., & Bratko,I. (1987). Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In I.Bratko & N.Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.
Index of hepatitis
02 Dec 1996 147 Index
02 Dec 1996 dir costs
11 May 1990 3098 hepatitis.names
26 Feb 1990 7545 hepatitis.data
Index of costs
02 Dec 1996 230 Index
05 Dec 1995 2109 hepatitis.README
05 Dec 1995 301 hepatitis.cost
05 Dec 1995 405 hepatitis.delay
05 Dec 1995 415 hepatitis.expense
05 Dec 1995 75 hepatitis.group
Test Costs for the hepatitis Data
---------------------------------
Peter Turney
June 7, 1995
There are four files, in a C4.5-like format, that contain information
related to cost:
1. hepatitis.cost
2. hepatitis.delay
3. hepatitis.expense
4. hepatitis.group
For more information on the use and meaning of these files, see:
http://www.cs.washington.edu/research/jair/volume2/turney95a-html/title.html
The remainder of this file describes the format of the above four
files.
hepatitis.cost
--------------
Each row has the format "<test name>: <cost>". The cost is in Canadian
dollars. The cost information is from the Ontario Health Insurance
Program's fee schedule. The costs in this file are for individual
tests, considered in isolation. When tests are performed in groups,
there may be discounts, due to shared common costs. Groups of tests
with common costs are identified in the file "hepatitis.group". Costs
with discounts are in the file "hepatitis.expense".
hepatitis.delay
---------------
Each row has the format "<test name>: <immediate/delayed>". Tests with
immediate results are marked "immediate". Tests with delayed results
are marked "delayed". Delayed tests are typically blood tests, which
are usually shipped to a laboratory. The results are sent back to the
doctor the next day.
hepatitis.expense
-----------------
Each row has the format "<test name>: <full cost>, <discount cost>". The
full cost is charged when the given test is the first test of its group
that has been ordered for a given patient. The discount cost is charged
when the given test is the second or later test of its group that has
been ordered. Typically the difference between the full cost and the
discount cost is $2.10, which is the common (shared) cost of collecting
blood from the patient.
hepatitis.group
---------------
The first row lists the groups. The remaining rows have the format
"<test name>: <group>". The symbols used for groups are arbitrary. The
information in this file is meant to be used together with the
information in "hepatitis.expense". The tests in a group share a
common cost.
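The row formats described above can be read with a few lines of Python. The following is a minimal sketch and not part of the original distribution: the function name parse_test_file is hypothetical, and it assumes every data row is "name: value" or "name: value, value", optionally ending in a period as in the .delay and .group files.

```python
def parse_test_file(text):
    """Parse rows of the form 'name: v1[, v2]' into {name: [v1, v2, ...]}.

    A trailing period on the row is dropped, and rows without a colon
    (such as the list of groups on the first row of the .group file)
    are skipped.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        rest = rest.strip()
        if rest.endswith("."):
            rest = rest[:-1]
        table[name.strip()] = [value.strip() for value in rest.split(",")]
    return table


# Small samples in the formats of hepatitis.expense and hepatitis.delay.
sample_expense = "bilirubin: 7.27, 5.17\nprotime: 8.30, 6.20\n"
sample_delay = "age: immediate.\nbilirubin: delayed.\n"

print(parse_test_file(sample_expense)["protime"])   # ['8.30', '6.20']
print(parse_test_file(sample_delay)["bilirubin"])   # ['delayed']
```

Values are returned as strings so that the same function serves all four files; convert costs with float() where arithmetic is needed.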
hepatitis.cost
age: 1.00
sex: 1.00
steroid: 1.00
antviral: 1.00
fatigue: 1.00
malaise: 1.00
anorexia: 1.00
liver_big: 1.00
liver_firm: 1.00
spleen_palpable: 1.00
spiders: 1.00
ascites: 1.00
varices: 1.00
bilirubin: 7.27
alk_phosphate: 7.27
sgot: 7.27
albumin: 7.27
protime: 8.30
histology: 1.00
hepatitis.delay
age: immediate.
sex: immediate.
steroid: immediate.
antviral: immediate.
fatigue: immediate.
malaise: immediate.
anorexia: immediate.
liver_big: immediate.
liver_firm: immediate.
spleen_palpable: immediate.
spiders: immediate.
ascites: immediate.
varices: immediate.
bilirubin: delayed.
alk_phosphate: delayed.
sgot: delayed.
albumin: delayed.
protime: delayed.
histology: immediate.
hepatitis.expense
age: 1.00, 1.00
sex: 1.00, 1.00
steroid: 1.00, 1.00
antviral: 1.00, 1.00
fatigue: 1.00, 1.00
malaise: 1.00, 1.00
anorexia: 1.00, 1.00
liver_big: 1.00, 1.00
liver_firm: 1.00, 1.00
spleen_palpable: 1.00, 1.00
spiders: 1.00, 1.00
ascites: 1.00, 1.00
varices: 1.00, 1.00
bilirubin: 7.27, 5.17
alk_phosphate: 7.27, 5.17
sgot: 7.27, 5.17
albumin: 7.27, 5.17
protime: 8.30, 6.20
histology: 1.00, 1.00
hepatitis.group
A.
bilirubin: A.
alk_phosphate: A.
sgot: A.
albumin: A.
protime: A.
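The interaction between the expense and group listings can be illustrated with a short calculation. This is a sketch under stated assumptions, not the procedure used in the cited paper: the function name total_cost is hypothetical, the values are copied from the listings above, and any test not in the table is assumed to cost 1.00 with no discount, as the other rows of the expense listing show.

```python
# Full and discount costs copied from the expense listing (Canadian dollars).
EXPENSE = {
    "bilirubin": (7.27, 5.17),
    "alk_phosphate": (7.27, 5.17),
    "sgot": (7.27, 5.17),
    "albumin": (7.27, 5.17),
    "protime": (8.30, 6.20),
}
# Group membership copied from the group listing: all five tests are in A.
GROUP = {name: "A" for name in EXPENSE}


def total_cost(tests, expense=EXPENSE, group=GROUP, default=(1.00, 1.00)):
    """Sum the cost of tests ordered in sequence for one patient.

    The first test of a group is charged the full cost; later tests of
    the same group are charged the discount cost.
    """
    seen_groups = set()
    total = 0.0
    for test in tests:
        full, discount = expense.get(test, default)
        test_group = group.get(test)
        if test_group is not None and test_group in seen_groups:
            total += discount
        else:
            total += full
        if test_group is not None:
            seen_groups.add(test_group)
    return round(total, 2)


# 1.00 + 7.27 (full) + 5.17 (discount) + 6.20 (discount)
print(total_cost(["age", "bilirubin", "sgot", "protime"]))  # 19.64
```

Because only the first test in a group pays the shared blood-collection cost, the total depends on which group test is ordered first only when full and discount costs differ between tests, as they do for protime.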
Data
2,30,2,1,2,2,2,2,1,2,2,2,2,2,1.00,85,18,4.0,?,1
2,50,1,1,2,1,2,2,1,2,2,2,2,2,0.90,135,42,3.5,?,1
2,78,1,2,2,1,2,2,2,2,2,2,2,2,0.70,96,32,4.0,?,1
2,31,1,?,1,2,2,2,2,2,2,2,2,2,0.70,46,52,4.0,80,1
2,34,1,2,2,2,2,2,2,2,2,2,2,2,1.00,?,200,4.0,?,1
2,34,1,2,2,2,2,2,2,2,2,2,2,2,0.90,95,28,4.0,75,1
1,51,1,1,2,1,2,1,2,2,1,1,2,2,?,?,?,?,?,1
2,23,1,2,2,2,2,2,2,2,2,2,2,2,1.00,?,?,?,?,1
2,39,1,2,2,1,2,2,2,1,2,2,2,2,0.70,?,48,4.4,?,1
2,30,1,2,2,2,2,2,2,2,2,2,2,2,1.00,?,120,3.9,?,1
2,39,1,1,1,2,2,2,1,1,2,2,2,2,1.30,78,30,4.4,85,1
2,32,1,2,1,1,2,2,2,1,2,1,2,2,1.00,59,249,3.7,54,1
2,41,1,2,1,1,2,2,2,1,2,2,2,2,0.90,81,60,3.9,52,1
2,30,1,2,2,1,2,2,2,1,2,2,2,2,2.20,57,144,4.9,78,1
2,47,1,1,1,2,2,2,2,2,2,2,2,2,?,?,60,?,?,1
2,38,1,1,2,1,1,1,2,2,2,2,1,2,2.00,72,89,2.9,46,1
2,66,1,2,2,1,2,2,2,2,2,2,2,2,1.20,102,53,4.3,?,1
2,40,1,1,2,1,2,2,2,1,2,2,2,2,0.60,62,166,4.0,63,1
2,38,1,2,2,2,2,2,2,2,2,2,2,2,0.70,53,42,4.1,85,2
2,38,1,1,1,2,2,2,1,1,2,2,2,2,0.70,70,28,4.2,62,1
2,22,2,2,1,1,2,2,2,2,2,2,2,2,0.90,48,20,4.2,64,1
2,27,1,2,2,1,1,1,1,1,1,1,2,2,1.20,133,98,4.1,39,1
2,31,1,2,2,2,2,2,2,2,2,2,2,2,1.00,85,20,4.0,100,1
2,42,1,2,2,2,2,2,2,2,2,2,2,2,0.90,60,63,4.7,47,1
2,25,2,1,1,2,2,2,2,2,2,2,2,2,0.40,45,18,4.3,70,1
2,27,1,1,2,1,1,2,2,2,2,2,2,2,0.80,95,46,3.8,100,1
2,49,1,1,1,1,1,1,2,1,2,1,2,2,0.60,85,48,3.7,?,1
2,58,2,2,2,1,2,2,2,1,2,1,2,2,1.40,175,55,2.7,36,1
2,61,1,1,2,1,2,2,1,1,2,2,2,2,1.30,78,25,3.8,100,1
2,51,1,1,1,1,1,2,2,2,2,2,2,2,1.00,78,58,4.6,52,1
1,39,1,1,1,1,1,2,2,1,2,2,2,2,2.30,280,98,3.8,40,1
1,62,1,1,2,1,1,2,?,?,2,2,2,2,1.00,?,60,?,?,1
2,41,2,2,1,1,1,1,2,2,2,2,2,2,0.70,81,53,5.0,74,1
2,26,2,1,2,2,2,2,2,1,2,2,2,2,0.50,135,29,3.8,60,1
2,35,1,2,2,1,2,2,2,2,2,2,2,2,0.90,58,92,4.3,73,1
1,37,1,2,2,1,2,2,2,2,2,1,2,2,0.60,67,28,4.2,?,1
2,23,1,2,2,1,1,1,2,2,1,2,2,2,1.30,194,150,4.1,90,1
2,20,2,1,2,1,1,1,1,1,1,1,2,2,2.30,150,68,3.9,?,1
2,42,1,1,2,2,2,2,2,2,2,2,2,2,1.00,85,14,4.0,100,1
2,65,1,2,2,1,1,2,2,1,1,1,1,2,0.30,180,53,2.9,74,2
2,52,1,1,1,2,2,2,2,2,2,2,2,2,0.70,75,55,4.0,21,1
2,23,1,2,2,2,2,2,?,?,?,?,?,?,4.60,56,16,4.6,?,1
2,33,1,2,2,2,2,2,2,2,2,2,2,2,1.00,46,90,4.4,60,1
2,56,1,1,2,1,2,2,2,2,2,2,2,2,0.70,71,18,4.4,100,1
2,34,1,2,2,2,2,2,2,2,2,2,2,2,?,?,86,?,?,1
2,28,1,2,2,1,1,2,2,2,2,2,2,2,0.70,74,110,4.4,?,1
2,37,1,1,2,2,2,2,2,1,2,1,2,2,0.60,80,80,3.8,?,1
2,28,2,2,2,1,1,2,2,1,2,2,2,2,1.80,191,420,3.3,46,1
2,36,1,1,2,2,2,2,2,2,1,2,2,2,0.80,85,44,4.2,85,1
2,38,1,2,1,1,1,1,2,2,2,1,2,2,0.70,125,65,4.2,77,1
2,39,1,1,2,2,2,2,2,2,2,2,2,2,0.90,85,60,4.0,?,1
2,39,1,2,2,2,2,2,2,2,2,2,2,2,1.00,85,20,4.0,?,1
2,44,1,2,2,2,2,2,2,2,2,2,2,2,0.60,110,145,4.4,70,1
2,40,1,2,1,1,2,2,2,1,1,2,2,2,1.20,85,31,4.0,100,1
2,30,1,2,2,1,2,2,2,2,2,2,2,2,0.70,50,78,4.2,74,1
2,37,1,1,2,1,1,1,2,2,2,2,2,2,0.80,92,59,?,?,1
2,34,1,1,2,?,?,?,?,?,?,?,?,?,?,?,?,?,?,1
2,30,1,2,1,2,2,2,2,2,2,2,2,2,0.70,52,38,3.9,52,1
2,64,1,2,1,1,1,2,1,1,2,2,2,2,1.00,80,38,4.3,74,1
2,45,2,1,2,1,1,2,2,2,1,2,2,2,1.00,85,75,?,?,1
2,37,1,2,2,2,2,2,2,2,2,2,2,2,0.70,26,58,4.5,100,1
2,32,1,2,2,2,2,2,2,2,2,2,2,2,0.70,102,64,4.0,90,1
2,32,1,2,2,1,1,1,2,2,2,1,2,1,3.50,215,54,3.4,29,1
2,36,1,1,2,2,2,2,1,1,1,2,2,2,0.70,164,44,3.1,41,1
2,49,1,2,2,1,1,2,2,2,2,2,2,2,0.80,103,43,3.5,66,1
2,27,1,2,2,2,2,2,2,2,2,2,2,2,0.80,?,38,4.2,?,1
2,56,1,1,2,2,2,2,2,2,2,2,2,2,0.70,62,33,3.0,?,1
1,57,1,2,2,1,1,1,2,2,2,1,1,2,4.10,?,48,2.6,73,1
2,39,1,2,2,1,2,2,2,2,2,2,2,2,1.00,34,15,4.0,54,1
2,44,1,1,2,1,1,2,2,2,2,2,2,2,1.60,68,68,3.7,?,1
2,24,1,2,2,2,2,2,2,2,2,2,2,2,0.80,82,39,4.3,?,1
1,34,1,1,2,1,1,2,1,1,2,1,2,2,2.80,127,182,?,?,1
2,51,1,2,2,1,1,1,?,?,?,?,?,?,0.90,76,271,4.4,?,1
2,36,1,1,2,1,1,1,2,1,2,2,2,2,1.00,?,45,4.0,57,1
2,50,1,2,2,2,2,2,2,2,2,2,2,2,1.50,100,100,5.3,?,1
2,32,1,1,1,1,1,2,2,2,2,2,2,2,1.00,55,45,4.1,56,1
1,58,1,2,2,1,2,2,1,1,1,1,2,2,2.00,167,242,3.3,?,1
2,34,2,1,1,2,2,2,2,1,2,2,2,2,0.60,30,24,4.0,76,1
2,34,1,1,2,1,2,2,1,1,2,1,2,2,1.00,72,46,4.4,57,1
2,28,1,2,2,2,2,2,2,2,2,2,2,2,0.70,85,31,4.9,?,1
2,23,1,2,2,1,1,1,2,2,2,2,2,2,0.80,?,14,4.8,?,1
2,36,1,2,2,2,2,2,2,2,2,2,2,2,0.70,62,224,4.2,100,1
2,30,1,1,2,2,2,2,2,2,2,2,2,2,0.70,100,31,4.0,100,1
2,67,2,1,2,1,1,2,2,2,?,?,?,?,1.50,179,69,2.9,?,1
2,62,2,2,2,1,1,2,2,1,2,1,2,2,1.30,141,156,3.9,58,1
2,28,1,1,2,1,1,1,2,1,2,2,2,2,1.60,44,123,4.0,46,1
1,44,1,1,2,1,1,2,2,2,1,2,2,1,0.90,135,55,?,41,2
1,30,1,2,2,1,1,1,2,1,2,1,1,1,2.50,165,64,2.8,?,2
1,38,1,1,2,1,1,1,2,1,2,1,1,1,1.20,118,16,2.8,?,2
2,38,1,1,2,1,1,1,1,1,2,2,2,2,0.60,76,18,4.4,84,2
2,50,2,1,2,1,2,2,1,1,1,1,2,2,0.90,230,117,3.4,41,2
1,42,1,1,2,1,1,1,2,2,1,1,2,1,4.60,?,55,3.3,?,2
2,33,1,2,2,2,2,2,?,?,2,2,2,2,1.00,?,60,4.0,?,2
2,52,1,1,2,2,2,2,2,2,2,2,2,2,1.50,?,69,2.9,?,2
1,59,1,1,2,1,1,2,2,1,1,1,2,2,1.50,107,157,3.6,38,2
2,40,1,1,1,1,1,1,1,1,2,2,2,2,0.60,40,69,4.2,67,2
2,30,1,1,2,1,1,2,2,1,2,1,2,2,0.80,147,128,3.9,100,2
2,44,1,1,2,1,1,2,1,1,2,1,2,2,3.00,114,65,3.5,?,2
1,47,1,2,2,2,2,2,2,2,2,1,2,1,2.00,84,23,4.2,66,2
2,60,1,1,2,1,2,2,1,1,1,1,2,2,?,?,40,?,?,2
1,48,1,1,2,1,1,2,2,1,2,1,1,1,4.80,123,157,2.7,31,2
2,22,1,2,2,2,2,2,2,2,2,2,2,2,0.70,?,24,?,?,2
2,27,1,1,2,1,2,2,2,1,2,2,2,2,2.40,168,227,3.0,66,2
2,51,1,1,2,1,1,1,2,1,1,1,2,1,4.60,215,269,3.9,51,2
1,47,1,2,2,1,1,2,2,1,2,2,1,1,1.70,86,20,2.1,46,2
2,25,1,2,2,2,2,2,2,2,2,2,2,2,0.60,?,34,6.4,?,2
1,35,1,1,2,1,2,2,?,?,1,1,1,2,1.50,138,58,2.6,?,2
2,45,1,1,2,1,1,1,2,2,2,2,2,2,2.30,?,648,?,?,2
2,54,1,1,1,2,2,2,1,1,2,2,2,2,1.00,155,225,3.6,67,2
1,33,1,1,2,1,1,2,2,2,2,2,1,2,0.70,63,80,3.0,31,2
2,7,1,2,2,2,2,2,2,1,1,2,2,2,0.70,256,25,4.2,?,2
1,42,1,1,1,1,1,2,2,2,2,1,2,2,0.50,62,68,3.8,29,2
2,52,1,1,2,1,2,2,2,2,2,2,2,2,1.00,85,30,4.0,?,2
2,45,1,1,2,1,2,2,2,1,1,2,2,2,1.20,81,65,3.0,?,1
2,36,1,1,2,2,2,2,2,2,2,2,2,2,1.10,141,75,3.3,?,2
2,69,2,2,2,1,2,2,2,2,2,2,2,2,3.20,119,136,?,?,2
2,24,1,1,2,1,2,2,2,2,2,2,2,2,1.00,?,34,4.1,?,2
2,50,1,2,2,2,2,2,2,2,2,2,2,2,1.00,139,81,3.9,62,2
1,61,1,1,2,1,1,2,?,?,2,1,2,2,?,?,?,?,?,2
2,54,1,2,2,1,2,2,1,1,2,2,2,2,3.20,85,28,3.8,?,2
1,56,1,1,2,1,1,1,1,1,2,1,2,2,2.90,90,153,4.0,?,2
2,20,1,1,2,1,1,1,2,2,2,1,1,2,1.00,160,118,2.9,23,2
2,42,1,2,2,2,2,2,2,2,1,2,2,2,1.50,85,40,?,?,2
2,37,1,1,2,1,2,2,2,2,2,1,2,2,0.90,?,231,4.3,?,2
2,50,1,2,2,2,2,2,2,1,1,1,2,2,1.00,85,75,4.0,72,2
2,34,2,2,2,1,1,1,1,1,2,1,2,2,0.70,70,24,4.1,100,2
2,28,1,2,2,1,1,1,?,?,2,1,1,2,1.00,?,20,4.0,?,2
1,50,1,2,2,1,2,2,2,1,1,2,1,1,2.80,155,75,2.4,32,2
2,54,1,1,2,1,1,2,2,2,2,2,1,2,1.20,85,92,3.1,66,2
1,57,1,1,2,1,1,2,2,2,2,1,1,2,4.60,82,55,3.3,30,2
2,54,1,2,2,2,2,2,2,2,2,2,2,2,1.00,85,30,4.5,0,2
1,31,1,1,2,1,1,1,2,2,1,2,2,2,8.00,?,101,2.2,?,2
2,48,1,2,2,1,1,1,2,1,2,1,2,2,2.00,158,278,3.8,?,2
2,72,1,2,1,1,2,2,2,1,2,2,2,2,1.00,115,52,3.4,50,2
1,38,1,1,2,2,2,2,2,1,2,2,2,2,0.40,243,49,3.8,90,2
2,25,1,2,2,1,2,2,1,1,1,1,1,1,1.30,181,181,4.5,57,2
2,51,1,2,2,2,2,2,1,1,2,1,2,2,0.80,?,33,4.5,?,2
2,38,1,2,2,2,2,2,2,1,2,1,2,1,1.60,130,140,3.5,56,2
1,47,1,2,2,1,1,2,2,1,2,1,1,1,1.00,166,30,2.6,31,2
2,45,1,2,1,2,2,2,2,2,2,2,2,2,1.30,85,44,4.2,85,2
2,36,1,1,2,1,1,1,1,1,2,1,2,1,1.70,295,60,2.7,?,2
1,54,1,1,2,1,1,2,?,?,1,2,1,2,3.90,120,28,3.5,43,2
2,51,1,2,2,1,2,2,2,1,1,1,2,1,1.00,?,20,3.0,63,2
1,49,1,1,2,1,1,2,2,2,1,1,2,2,1.40,85,70,3.5,35,2
1,45,1,2,2,1,1,1,2,2,2,1,1,2,1.90,?,114,2.4,?,2
2,31,1,1,2,1,2,2,2,2,2,2,2,2,1.20,75,173,4.2,54,2
1,41,1,2,2,1,2,2,2,1,1,1,2,1,4.20,65,120,3.4,?,2
1,70,1,1,2,1,1,1,?,?,?,?,?,?,1.70,109,528,2.8,35,2
2,20,1,1,2,2,2,2,2,?,2,2,2,2,0.90,89,152,4.0,?,2
2,36,1,2,2,2,2,2,2,2,2,2,2,2,0.60,120,30,4.0,?,2
1,46,1,2,2,1,1,1,2,2,2,1,1,1,7.60,?,242,3.3,50,2
2,44,1,2,2,1,2,2,2,1,2,2,2,2,0.90,126,142,4.3,?,2
2,61,1,1,2,1,1,2,1,1,2,1,2,2,0.80,75,20,4.1,?,2
2,53,2,1,2,1,2,2,2,2,1,1,2,1,1.50,81,19,4.1,48,2
1,43,1,2,2,1,2,2,2,2,1,1,1,2,1.20,100,19,3.1,42,2
hepatitis.names
1. Title: Hepatitis Domain
2. Sources:
(a) unknown
(b) Donor: G.Gong (Carnegie-Mellon University) via
Bojan Cestnik
Jozef Stefan Institute
Jamova 39
61000 Ljubljana
Yugoslavia (tel.: (38)(+61) 214-399 ext.287)
(c) Date: November, 1988
3. Past Usage:
1. Diaconis,P. & Efron,B. (1983). Computer-Intensive Methods in
Statistics. Scientific American, Volume 248.
-- Gail Gong reported an 80% classification accuracy
2. Cestnik,B., Kononenko,I., & Bratko,I. (1987). Assistant-86: A
Knowledge-Elicitation Tool for Sophisticated Users. In I.Bratko
& N.Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.
-- Assistant-86: 83% accuracy
4. Relevant Information:
Please ask Gail Gong for further information on this database.
5. Number of Instances: 155
6. Number of Attributes: 20 (including the class attribute)
7. Attribute information:
1. Class: DIE, LIVE
2. AGE: 10, 20, 30, 40, 50, 60, 70, 80
3. SEX: male, female
4. STEROID: no, yes
5. ANTIVIRALS: no, yes
6. FATIGUE: no, yes
7. MALAISE: no, yes
8. ANOREXIA: no, yes
9. LIVER BIG: no, yes
10. LIVER FIRM: no, yes
11. SPLEEN PALPABLE: no, yes
12. SPIDERS: no, yes
13. ASCITES: no, yes
14. VARICES: no, yes
15. BILIRUBIN: 0.39, 0.80, 1.20, 2.00, 3.00, 4.00
-- see the note below
16. ALK PHOSPHATE: 33, 80, 120, 160, 200, 250
17. SGOT: 13, 100, 200, 300, 400, 500
18. ALBUMIN: 2.1, 3.0, 3.8, 4.5, 5.0, 6.0
19. PROTIME: 10, 20, 30, 40, 50, 60, 70, 80, 90
20. HISTOLOGY: no, yes
The BILIRUBIN attribute appears to be continuously-valued. I checked
this with the donor, Bojan Cestnik, who replied:
About the hepatitis database and BILIRUBIN problem I would like to say
the following: BILIRUBIN is a continuous attribute (= the number of its
"values" in the ASDOHEPA.DAT file is negative!!!); "values" are quoted
because when speaking about the continuous attribute there is no such
thing as all possible values. However, they represent so called
"boundary" values; according to these "boundary" values the attribute
can be discretized. At the same time, because of the continuous
attribute, one can perform some other test since the continuous
information is preserved. I hope that these lines have at least roughly
answered your question.
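The "boundary" values mentioned in the reply can be used to bin a raw measurement. The sketch below is one possible reading, not a procedure given in this file: the function name discretize is hypothetical, and the convention that a value equal to a boundary falls into the upper interval is an assumption, since the file does not say which side is inclusive.

```python
from bisect import bisect_right

# Boundary values listed for attribute 15 (BILIRUBIN).
BILIRUBIN_BOUNDARIES = [0.39, 0.80, 1.20, 2.00, 3.00, 4.00]


def discretize(value, boundaries):
    """Return the index of the interval that `value` falls into.

    Index 0 means below the first boundary; index len(boundaries) means
    at or above the last one. Missing values ("?") map to None.
    """
    if value == "?":
        return None
    return bisect_right(boundaries, float(value))


for raw in ["0.70", "1.00", "4.60", "?"]:
    print(raw, discretize(raw, BILIRUBIN_BOUNDARIES))
# 0.70 1
# 1.00 2
# 4.60 6
# ? None
```

The raw value stays available alongside the bin index, which matches the point in the reply that the continuous information is preserved.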
8. Missing Attribute Values: (indicated by "?")
Attribute Number: Number of Missing Values:
1: 0
2: 0
3: 0
4: 1
5: 0
6: 1
7: 1
8: 1
9: 10
10: 11
11: 5
12: 5
13: 5
14: 5
15: 6
16: 29
17: 4
18: 16
19: 67
20: 0
9. Class Distribution:
DIE: 32
LIVE: 123
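The missing-value counts and class distribution above can be recomputed from the data rows. The following is a minimal sketch using only the standard library; the column names are taken from the attribute list and cost files, the function name summarize is hypothetical, and the three sample rows are copied from the data listing. The class column is coded 1 and 2 in the data; reading 1 as DIE and 2 as LIVE follows the order of the attribute list and is an assumption.

```python
import csv
import io
from collections import Counter

COLUMNS = [
    "class", "age", "sex", "steroid", "antivirals", "fatigue", "malaise",
    "anorexia", "liver_big", "liver_firm", "spleen_palpable", "spiders",
    "ascites", "varices", "bilirubin", "alk_phosphate", "sgot", "albumin",
    "protime", "histology",
]


def summarize(rows):
    """Count class labels and '?' entries per attribute."""
    class_counts = Counter()
    missing = Counter()
    for row in rows:
        record = dict(zip(COLUMNS, row))
        class_counts[record["class"]] += 1
        for name, value in record.items():
            if value == "?":
                missing[name] += 1
    return class_counts, missing


# Three rows copied from the data listing, for illustration only.
SAMPLE = """\
2,30,2,1,2,2,2,2,1,2,2,2,2,2,1.00,85,18,4.0,?,1
2,31,1,?,1,2,2,2,2,2,2,2,2,2,0.70,46,52,4.0,80,1
1,51,1,1,2,1,2,1,2,2,1,1,2,2,?,?,?,?,?,1
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
classes, missing = summarize(rows)
print(classes["1"], classes["2"])             # 1 2
print(missing["protime"], missing["steroid"])  # 2 1
```

To run it on the full data set, replace the sample with the contents of hepatitis.data, for example by passing an open file object to csv.reader.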