Table of Contents:
Chapter One
1- Introduction
1-1 Introduction
1-2 The concept of emerging patterns
1-3 The concept of streaming features
1-4 Challenges in mining emerging patterns
1-5 Algorithms for mining emerging patterns
1-6 Main idea of the research
1-7 Overview of the thesis chapters
Chapter Two
2- Literature review
2-1 Introduction
2-2 Rule-based methods
2-2-1 Classification Based on Associations (CBA)
2-2-2 Classification based on Multiple Class-Association Rules (CMAR)
2-2-3 Classification based on Predictive Association Rules (CPAR)
2-3 Pattern mining methods
2-3-1 Border-based method
2-3-2 Constraint-based method
2-3-3 Contrast pattern tree mining algorithm (CP-tree)
2-3-4 Mining with zero-suppressed binary decision diagrams (ZBDD Miner)
2-3-5 Mining discriminative emerging patterns (DP-Miner)
2-4 Classification methods based on emerging patterns
2-4-1 Classification by aggregating emerging patterns (CAEP)
2-4-2 Information-theory-based classification algorithm (iCAEP)
2-4-3 Classification based on jumping emerging patterns (JEPs-classifier)
2-4-4 Classification based on strong jumping emerging patterns
2-4-5 Instance-based decision method (DeEPs)
2-4-6 Classification by collective likelihood (PCL)
Chapter Three
3- Preliminaries
3-1 Emerging patterns
3-2 Dynamic frequent pattern tree (DFP-tree)
Chapter Four
4- Proposed approaches for mining strong emerging patterns based on streaming features
4-1 Introduction
4-2 Unordered dynamic frequent pattern tree (Unordered Dynamic FP-tree)
4-3 Ordered dynamic frequent pattern tree (Ordered Dynamic FP-tree)
4-4 Pattern mining method (SEP-Miner)
Chapter Five
5- Experiments
5-1 Introduction
5-2 Classifiers
5-2-1 C4.5 decision tree classifier
5-2-2 SVM classifier
5-2-3 Naive Bayes classifier
5-2-4 Nearest neighbor classifier
5-2-5 AdaBoost algorithm
5-3 Statistical tests
5-3-1 Paired t-test
5-3-2 Wilcoxon test
5-3-3 Friedman test
5-4 Experimental setup
5-5 Comparison of predictive accuracy
5-6 Comparison of the number of patterns
5-7 Comparison of running time
5-8 Analysis of the effect of ordering when building the dynamic frequent pattern tree
5-9 How to determine the minimum relative frequency threshold
5-10 Sensitivity analysis on the minimum growth rate thresholds
5-11 Performance comparison of DFP-SEPSF without knowing the entire feature space
5-12 Summary of experimental results
Chapter Six
6- Conclusion and future work
Abbreviations
Persian-to-English glossary
English-to-Persian glossary
List of references