| Dataset | DTNB | DT | C4.5 | PT | FR | RDR | CBA | SA | J&B |
|---|---|---|---|---|---|---|---|---|---|
| Breast.Can | 122 | 22 | 10 | 20 | 13 | 13 | 63 | 20 | 47 |
| Balance | 31 | 35 | 35 | 27 | 44 | 22 | 77 | 45 | 79 |
| Car.Evn | 144 | 432 | 123 | 62 | 100 | 119 | 72 | 160 | 41 |
| Vote | 270 | 24 | 11 | 8 | 17 | 7 | 22 | 30 | 13 |
| Tic-Tac-Toe | 258 | 121 | 88 | 37 | 21 | 13 | 23 | 60 | 14 |
| Nursery | 1240 | 804 | 301 | 172 | 288 | 141 | 141 | 175 | 109 |
| Hayes-root | 5 | 8 | 22 | 14 | 11 | 10 | 34 | 45 | 34 |
| Lymp | 129 | 19 | 20 | 10 | 17 | 11 | 23 | 60 | 29 |
| Spect.H | 145 | 2 | 9 | 13 | 17 | 12 | 4 | 50 | 11 |
| Adult | 737 | 1571 | 279 | 571 | 150 | 175 | 126 | 130 | 97 |
| Chess | 507 | 101 | 31 | 29 | 29 | 30 | 12 | 120 | 24 |
| Connect4 | 3826 | 4952 | 3973 | 3973 | 403 | 341 | 349 | 600 | 273 |
| Average | 618 | 674 | 409 | 411 | 93 | 75 | 79 | 125 | 64 |
The statistically significant win/loss counts of J&B against the other rule-based classification models, compared on the number of classification rules, are shown in Table 9.
Table 9. Statistically significant win/loss counts of the J&B method on the number of rules.
|  | DTNB | DT | C4.5 | PT | FR | RDR | CBA | SA |
|---|---|---|---|---|---|---|---|---|
| W (wins) | 10 | 7 | 6 | 6 | 8 | 5 | 7 | 10 |
| L (losses) | 2 | 5 | 4 | 6 | 4 | 5 | 3 | 2 |
| Ties | 0 | 0 | 2 | 0 | 0 | 2 | 2 | 0 |
Table 9 shows that J&B produced a statistically significantly smaller classifier than the DTNB and SA methods on 10 out of 12 datasets, and than the DT, FR and CBA methods on at least 7 of the 12 datasets. Most importantly, J&B generated statistically smaller classifiers than all other models on the larger datasets, which was our main goal in this research. Experimental evaluations on the larger datasets (over 10,000 samples) are shown in Figure 7.
Figure 7. Comparison of rule-based classification methods on the average number of rules.
Figure 7 illustrates the advantage of the J&B method: it produced the smallest classifier among all rule-based classification models on the selected datasets.
Our experimental results on the relevance measures "Precision", "Recall" and "F-measure" (averaged over all datasets) are summarized in Figure 8. The detailed results for each dataset can be found in Appendix A.
Figure 8. Comparison of the J&B classifier on Accuracy, Precision, Recall and F-measure.
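For reference, these measures are defined per class in the standard way, where TP, FP and FN denote the numbers of true positives, false positives and false negatives:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$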
Example 1. Let us assume that we have the class association rules shown in Table 3.7, generated from a dataset and satisfying the user-specified minimum support and confidence thresholds. We set the minimum coverage threshold to 80%; that is, once the intended classifier covers at least 80% of the training examples, we stop.
The learning (training) dataset is used to build the model.
In the first step, we sort the class association rules in descending order of confidence and support; the result is shown in Table 3.9.
In the next step, we form our classifier by selecting strong rules. We keep only those strong rules that improve the overall coverage, and we continue until the intended training dataset coverage is reached. Table 3.10 illustrates our final classifier.
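As a rough sketch of this selection step (not the actual implementation), the following Python fragment sorts candidate rules and greedily keeps those that cover new training examples until the requested coverage is reached. The `Rule` fields and the dictionary representation of examples are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: dict    # e.g. {"a1": 1, "a3": 5}
    consequent: int     # predicted class value
    support: float
    confidence: float

def matches(rule, example):
    """A rule covers an example if all antecedent attribute-value pairs agree."""
    return all(example.get(attr) == val for attr, val in rule.antecedent.items())

def build_classifier(rules, training_set, coverage_threshold=0.8):
    """Greedily select strong rules until the requested share of training examples is covered."""
    # Step 1: sort by confidence, then support, both descending.
    rules = sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)
    classifier, covered = [], set()
    target = coverage_threshold * len(training_set)
    for rule in rules:
        newly_covered = {i for i, ex in enumerate(training_set)
                         if i not in covered and matches(rule, ex)}
        # Step 2: keep a rule only if it covers at least one not-yet-covered example.
        if newly_covered:
            classifier.append(rule)
            covered |= newly_covered
        # Stopping criterion: intended training dataset coverage reached.
        if len(covered) >= target:
            break
    return classifier
```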
Our classifier includes 6 rules. In this example, the intended coverage is 80%, and the 6 classification rules in the classifier cover 80% of the learning set. Since our training dataset contains some examples with missing values, our classifier in fact covers the whole training dataset (all examples without missing values). Other rules might also cover so-far-unclassified examples, but once the user-defined training dataset coverage threshold is reached we stop: this is our stopping criterion, and no further rules are added to the classifier. We also do not include classification rules that cover only already-covered examples, since they do not contribute to improving the overall coverage. Now, we classify the following unseen example:
{a1=1,a2=5,a3=5,a4=4,a5=5} ?
This example is covered by the third and fourth classification rules. The class values of the rules that cover the new example are 3 and 3, so our classifier predicts class value 3 for the new example (the majority class value among the covering rules).
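Prediction for an unseen example therefore reduces to a majority vote among the covering rules. The short sketch below illustrates this; the two rules are hypothetical stand-ins for the third and fourth rules of the example classifier (both predicting class 3), not the actual rules from Table 3.10.

```python
from collections import Counter

def classify(rules, example, default_class=None):
    """Predict the majority class among the rules whose antecedents the example satisfies."""
    votes = [cls for antecedent, cls in rules
             if all(example.get(attr) == val for attr, val in antecedent.items())]
    if not votes:
        return default_class   # no rule covers the example: fall back to a default class
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for the 3rd and 4th rules (both predict class 3).
demo_rules = [
    ({"a2": 5, "a3": 5}, 3),
    ({"a1": 1, "a5": 5}, 3),
]
print(classify(demo_rules, {"a1": 1, "a2": 5, "a3": 5, "a4": 4, "a5": 5}))  # -> 3
```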
Example 2.
| № | Outlook | Temperature | Humidity | Windy | Play |
|---|---|---|---|---|---|
| 1 | sunny | hot | high | FALSE | no |
| 2 | sunny | hot | high | TRUE | no |
| 3 | overcast | hot | high | FALSE | yes |
| 4 | rainy | mild | high | FALSE | yes |
| 5 | rainy | cool | normal | FALSE | yes |
| 6 | rainy | cool | normal | TRUE | no |
| 7 | overcast | cool | normal | TRUE | yes |
| 8 | sunny | mild | high | FALSE | no |
| 9 | sunny | cool | normal | FALSE | yes |
| 10 | rainy | mild | normal | FALSE | yes |
| 11 | sunny | mild | normal | TRUE | yes |
| 12 | overcast | mild | high | TRUE | yes |
| 13 | overcast | hot | normal | FALSE | yes |
| 14 | rainy | mild | high | TRUE | no |
We use the Apriori algorithm to find the class association rules, with the following settings:
Minimum support: 10%
Minimum confidence: 80%
Class association rules (car): true
If we reduce the minimum confidence we obtain more rules, and if we increase it we obtain fewer; it is one of the most important parameters. Lowering it produces more (and often unnecessary) rules, while raising it may lose good rules, so it has to be chosen with care. With a minimum support of 0.1 and a minimum confidence of 0.8 we obtain 21 rules; lowering the support to 0.05 yields 72 rules. Both rule sets, together with the confidence of each rule, are listed below.
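As a rough illustration of how such rule sets can be reproduced (a simplified stand-in for the Apriori run, not the tool we actually used), the sketch below brute-forces all class association rules over the 14 weather examples, keeping the class attribute on the right-hand side and filtering by minimum support and confidence. Exact rule counts may differ slightly from the tool's output depending on its rule-limit and ordering settings.

```python
from itertools import combinations

# The 14 weather examples from the table above (class attribute: "play").
data = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": "FALSE", "play": "no"},
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": "TRUE",  "play": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": "TRUE",  "play": "no"},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": "TRUE",  "play": "yes"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "high",   "windy": "FALSE", "play": "no"},
    {"outlook": "sunny",    "temperature": "cool", "humidity": "normal", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "normal", "windy": "FALSE", "play": "yes"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "normal", "windy": "TRUE",  "play": "yes"},
    {"outlook": "overcast", "temperature": "mild", "humidity": "high",   "windy": "TRUE",  "play": "yes"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "normal", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": "TRUE",  "play": "no"},
]

def mine_class_association_rules(data, min_support=0.1, min_confidence=0.8, class_attr="play"):
    """Brute-force miner: antecedents over non-class attributes, the class attribute on the right."""
    attrs = [a for a in data[0] if a != class_attr]
    n, rules = len(data), []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Candidate antecedents: every value combination on this attribute subset that occurs in the data.
            for antecedent in {tuple((a, row[a]) for a in subset) for row in data}:
                matching = [row for row in data if all(row[a] == v for a, v in antecedent)]
                for cls in {row[class_attr] for row in matching}:
                    hits = sum(1 for row in matching if row[class_attr] == cls)
                    support, confidence = hits / n, hits / len(matching)
                    if support >= min_support and confidence >= min_confidence:
                        rules.append((antecedent, cls, round(confidence, 2)))
    return rules

rules = mine_class_association_rules(data, min_support=0.1, min_confidence=0.8)
print(len(rules))   # number of rules passing both thresholds
for antecedent, cls, conf in rules[:5]:
    print(" ".join(f"{a}={v}" for a, v in antecedent), "==> play =", cls, f"(conf: {conf})")
```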
With a minimum support of 0.05 and a minimum confidence of 0.8, 72 rules are generated (confidence in parentheses):
1. outlook=overcast ==> play=yes (conf: 1)
2. humidity=normal windy=FALSE ==> play=yes (conf: 1)
3. outlook=sunny humidity=high ==> play=no (conf: 1)
4. outlook=rainy windy=FALSE ==> play=yes (conf: 1)
5. outlook=sunny humidity=normal ==> play=yes (conf: 1)
6. outlook=sunny temperature=hot ==> play=no (conf: 1)
7. outlook=overcast temperature=hot ==> play=yes (conf: 1)
8. outlook=overcast humidity=high ==> play=yes (conf: 1)
9. outlook=overcast humidity=normal ==> play=yes (conf: 1)
10. outlook=overcast windy=TRUE ==> play=yes (conf: 1)
11. outlook=overcast windy=FALSE ==> play=yes (conf: 1)
12. outlook=rainy windy=TRUE ==> play=no (conf: 1)
13. temperature=mild humidity=normal ==> play=yes (conf: 1)
14. temperature=cool windy=FALSE ==> play=yes (conf: 1)
15. outlook=sunny temperature=hot humidity=high ==> play=no (conf: 1)
16. outlook=sunny humidity=high windy=FALSE ==> play=no (conf: 1)
17. outlook=overcast temperature=hot windy=FALSE ==> play=yes (conf: 1)
18. outlook=rainy temperature=mild windy=FALSE ==> play=yes (conf: 1)
19. outlook=rainy humidity=normal windy=FALSE ==> play=yes (conf: 1)
20. temperature=cool humidity=normal windy=FALSE ==> play=yes (conf: 1)
21. outlook=sunny temperature=cool ==> play=yes (conf: 1)
22. outlook=overcast temperature=mild ==> play=yes (conf: 1)
23. outlook=overcast temperature=cool ==> play=yes (conf: 1)
24. temperature=hot humidity=normal ==> play=yes (conf: 1)
25. temperature=hot windy=TRUE ==> play=no (conf: 1)
26. outlook=sunny temperature=mild humidity=normal ==> play=yes (conf: 1)
27. outlook=sunny temperature=mild windy=TRUE ==> play=yes (conf: 1)
28. outlook=sunny temperature=cool humidity=normal ==> play=yes (conf: 1)
29. outlook=sunny temperature=cool windy=FALSE ==> play=yes (conf: 1)
30. outlook=sunny humidity=normal windy=TRUE ==> play=yes (conf: 1)
31. outlook=sunny humidity=normal windy=FALSE ==> play=yes (conf: 1)
32. outlook=sunny temperature=hot windy=TRUE ==> play=no (conf: 1)
33. outlook=sunny temperature=hot windy=FALSE ==> play=no (conf: 1)
34. outlook=sunny temperature=mild humidity=high ==> play=no (conf: 1)
35. outlook=sunny temperature=mild windy=FALSE ==> play=no (conf: 1)
36. outlook=sunny humidity=high windy=TRUE ==> play=no (conf: 1)
37. outlook=overcast temperature=hot humidity=high ==> play=yes (conf: 1)
38. outlook=overcast temperature=hot humidity=normal ==> play=yes (conf: 1)
39. outlook=overcast temperature=mild humidity=high ==> play=yes (conf: 1)
40. outlook=overcast temperature=mild windy=TRUE ==> play=yes (conf: 1)
41. outlook=overcast temperature=cool humidity=normal ==> play=yes (conf: 1)
42. outlook=overcast temperature=cool windy=TRUE ==> play=yes (conf: 1)
43. outlook=overcast humidity=high windy=TRUE ==> play=yes (conf: 1)
44. outlook=overcast humidity=high windy=FALSE ==> play=yes (conf: 1)
45. outlook=overcast humidity=normal windy=TRUE ==> play=yes (conf: 1)
46. outlook=overcast humidity=normal windy=FALSE ==> play=yes (conf: 1)
47. outlook=rainy temperature=mild humidity=normal ==> play=yes (conf: 1)
48. outlook=rainy temperature=cool windy=FALSE ==> play=yes (conf: 1)
49. outlook=rainy humidity=high windy=FALSE ==> play=yes (conf: 1)
50. outlook=rainy temperature=mild windy=TRUE ==> play=no (conf: 1)
51. outlook=rainy temperature=cool windy=TRUE ==> play=no (conf: 1)
52. outlook=rainy humidity=high windy=TRUE ==> play=no (conf: 1)
53. outlook=rainy humidity=normal windy=TRUE ==> play=no (conf: 1)
54. temperature=hot humidity=normal windy=FALSE ==> play=yes (conf: 1)
55. temperature=hot humidity=high windy=TRUE ==> play=no (conf: 1)
56. temperature=mild humidity=normal windy=TRUE ==> play=yes (conf: 1)
57. temperature=mild humidity=normal windy=FALSE ==> play=yes (conf: 1)
58. outlook=sunny temperature=mild humidity=normal windy=TRUE ==> play=yes (conf: 1)
59. outlook=sunny temperature=cool humidity=normal windy=FALSE ==> play=yes (conf: 1)
60. outlook=sunny temperature=hot humidity=high windy=TRUE ==> play=no (conf: 1)
61. outlook=sunny temperature=hot humidity=high windy=FALSE ==> play=no (conf: 1)
62. outlook=sunny temperature=mild humidity=high windy=FALSE ==> play=no (conf: 1)
63. outlook=overcast temperature=hot humidity=high windy=FALSE ==> play=yes (conf: 1)
64. outlook=overcast temperature=hot humidity=normal windy=FALSE ==> play=yes (conf: 1)
65. outlook=overcast temperature=mild humidity=high windy=TRUE ==> play=yes (conf: 1)
66. outlook=overcast temperature=cool humidity=normal windy=TRUE ==> play=yes (conf: 1)
67. outlook=rainy temperature=mild humidity=high windy=FALSE ==> play=yes (conf: 1)
68. outlook=rainy temperature=mild humidity=normal windy=FALSE ==> play=yes (conf: 1)
69. outlook=rainy temperature=cool humidity=normal windy=FALSE ==> play=yes (conf: 1)
70. outlook=rainy temperature=mild humidity=high windy=TRUE ==> play=no (conf: 1)
71. outlook=rainy temperature=cool humidity=normal windy=TRUE ==> play=no (conf: 1)
72. humidity=normal ==> play=yes (conf: 0.86)
With a minimum support of 0.1 and a minimum confidence of 0.8, 21 rules are generated (confidence in parentheses):
1. outlook=overcast ==> play=yes (conf: 1)
2. humidity=normal windy=FALSE ==> play=yes (conf: 1)
3. outlook=sunny humidity=high ==> play=no (conf: 1)
4. outlook=rainy windy=FALSE ==> play=yes (conf: 1)
5. outlook=sunny humidity=normal ==> play=yes (conf: 1)
6. outlook=sunny temperature=hot ==> play=no (conf: 1)
7. outlook=overcast temperature=hot ==> play=yes (conf: 1)
8. outlook=overcast humidity=high ==> play=yes (conf: 1)
9. outlook=overcast humidity=normal ==> play=yes (conf: 1)
10. outlook=overcast windy=TRUE ==> play=yes (conf: 1)
11. outlook=overcast windy=FALSE ==> play=yes (conf: 1)
12. outlook=rainy windy=TRUE ==> play=no (conf: 1)
13. temperature=mild humidity=normal ==> play=yes (conf: 1)
14. temperature=cool windy=FALSE ==> play=yes (conf: 1)
15. outlook=sunny temperature=hot humidity=high ==> play=no (conf: 1)
16. outlook=sunny humidity=high windy=FALSE ==> play=no (conf: 1)
17. outlook=overcast temperature=hot windy=FALSE ==> play=yes (conf: 1)
18. outlook=rainy temperature=mild windy=FALSE ==> play=yes (conf: 1)
19. outlook=rainy humidity=normal windy=FALSE ==> play=yes (conf: 1)
20. temperature=cool humidity=normal windy=FALSE ==> play=yes (conf: 1)
21. humidity=normal ==> play=yes (conf: 0.86)
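As a sanity check on the last rule: humidity=normal holds in 7 examples (5, 6, 7, 9, 10, 11 and 13), of which 6 have play=yes, so its confidence is 6/7 ≈ 0.86 and its support is 6/14 ≈ 0.43, which is why it passes both threshold settings while all the other listed rules have confidence 1.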
Conclusion
Our experiments on accuracy and number of rules show that our method produces a compact and accurate classifier that is comparable with 8 other well-known classification methods. Although it did not achieve the best average classification accuracy, it produced significantly smaller rule sets on the larger datasets than the other classification algorithms. Our proposed classifier also achieved reasonably high average coverage.
Statistical significance testing shows that our method was statistically better than or equal to the other classification methods on some datasets, while it obtained worse results on others. The most important achievement of this research is that J&B obtained significantly better results in terms of the average number of classification rules than all other classification methods, while remaining comparable to them in accuracy.
This research is the first and main step toward our future goal: we plan to cluster class association rules by their similarity and thus further reduce their number and increase the accuracy and understandability of the classifier.
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: VLDB 94 Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499. Chile (1994).
Ali, K., Manganaris, S., Srikant, R.: Partial Classification Using Association Rules. In: Proceedings of KDD-97, pp. 115-118, U.S.A (1997).
Baralis, E., Cagliero, L., Garza, P.: A novel pattern-based Bayesian classifier. IEEE Transactions on Knowledge and Data Engineering 25(12), 2780–2795 (2013).
Bayardo, R. J.: Brute-force mining of high-confidence classification rules. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 123-126, U.S.A (1997).
Breiman L.: Random Forests. Machine Learning 45(1), pp. 5-32 (2001).
Cendrowska J.: PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies 27(4), pp. 349-370 (1987).
Chen, G., Liu, H., Yu, L., Wei, Q., Zhang, X.: A new approach to classification based on association rule mining. Decision Support Systems 42(2), 674–689 (2006).
Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning, 3(4), 261–283 (1989).
Cohen, W., W.: Fast Effective Rule Induction. In: ICML'95 Proceedings of the Twelfth International Conference on Machine Learning, pp. 115-123, Tahoe City, California (1995).
Dua, D., Graff, C.: UCI Machine Learning Repository, Irvine, CA: University of California (2019).
Frank, E., Witten, I.: Generating Accurate Rule Sets Without Global Optimization. In: Fifteenth International Conference on Machine Learning, pp. 144-151. USA (1998).
Holte, R.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11(1), pp. 63-91 (1993).
Kohavi, R.: The Power of Decision Tables. In: 8th European Conference on Machine Learning, pp. 174-189, Heraclion, Crete, Greece (1995).
Lent, B., Swami, A., Widom, J.: Clustering association rules. In: ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering, pp. 220-231. England (1997).
Li, W., Han, J., Pei, J.: CMAR: accurate and efficient classification based on multiple class-association rules. in Proceedings of the 1st IEEE International Conference on Data Mining (ICDM ’01), pp. 369–376, San Jose, California, USA (2001).
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD ’98), pp. 80–86, New York, USA (1998).
Quinlan, J.: C4.5: Programs for Machine Learning, Machine Learning 16(3), 235-240 (1993).
Yin, X., Han, J.: CPAR: Classification based on Predictive Association Rules. In: Proceedings of the SIAM International Conference on Data Mining, pp. 331-335, San Francisco, U.S.A (2003).
Zhang, M., Zhou Z.: A k-nearest neighbor based algorithm for multi-label classification. In: Proceedings of the 1st IEEE International Conference on Granular Computing (GrC’05), vol. 2, pp. 718–721, Beijing, China (2005).
Zhou, Z., Liu, X.: Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 18(1), pp. 63–77 (2006).