Bilingual Text Document Summarization as Feature Reduction for Classification Using the K-NN Algorithm

Authors

  • Rahmawan Bagus Trianto Universitas An Nuur
  • Agus Susilo Nugroho Universitas An Nuur

DOI:

https://doi.org/10.28918/logiclink.v1i1.7801

Abstract

Text summarization extracts the essence of a text document, producing a version no more than half the length of the original. It plays an important role in presenting a document's core information more concisely. Summarization can also serve as feature reduction for text document classification, because it removes features considered irrelevant. In this study, text documents are summarized using the Term Frequency-Inverse Document Frequency (TF-IDF) method and then classified using the K-Nearest Neighbor (K-NN) algorithm. One weakness of K-NN is that classification is suboptimal when the k value or the distance calculation method is poorly chosen. Testing various k values and using Euclidean distance as the distance measure can increase the accuracy of text document classification. The proposed TF-IDF summarization is shown to increase accuracy when classification is carried out with K-NN. The results show that classification accuracy increased at a compression rate of 50%, reaching 95.33% for k values of 6 to 8. This indicates that text document summarization as feature reduction has a positive effect on classification with the K-NN algorithm.
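
The pipeline described in the abstract (TF-IDF summarization at a fixed compression rate, followed by K-NN classification with Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the tokenizer, the sentence-scoring formula, or the feature representation, so the choices below are assumptions. In particular, each sentence is treated as a "document" when computing IDF, vectors are raw term counts, and the function names (`tokenize`, `summarize`, `classify`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (no stemming or stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(document, compression_rate=0.5):
    """Keep the highest-scoring sentences, preserving their original order.

    Assumption: a sentence's score is the sum of TF * IDF over its terms,
    with IDF computed across the sentences of this one document.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Number of sentences in which each term occurs.
    df = Counter(term for toks in tokens for term in set(toks))
    scores = [
        sum(count * math.log(n / df[term]) for term, count in Counter(toks).items())
        for toks in tokens
    ]
    # Flooring keeps at most half of the sentences at a rate of 0.5.
    keep = max(1, int(n * compression_rate))
    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:keep])
    return " ".join(sentences[i] for i in top)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(train_texts, train_labels, query_text, k):
    """Majority vote among the k nearest training texts (term-count vectors)."""
    counts = [Counter(tokenize(t)) for t in train_texts + [query_text]]
    vocab = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocab] for c in counts]
    query = vectors.pop()
    nearest = sorted(
        zip(vectors, train_labels), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    texts = [
        "the match ended with a late goal",
        "the striker scored a goal",
        "the bank raised interest rates",
        "markets fell as rates rose",
    ]
    labels = ["sport", "sport", "finance", "finance"]
    for k in (1, 3):  # the study reports testing a range of k values
        print(k, classify(texts, labels, "a goal in the final match", k))
```

In a real experiment, each document would first be passed through `summarize` before vectorization, and k would be chosen by comparing accuracy on held-out data rather than fixed in advance.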

Keywords:

summarization, document, TF-IDF, K-NN

Published

2024-06-23

Section

Articles

How to Cite

Peringkasan Dokumen Teks Bilingual sebagai Reduksi Fitur untuk Klasifikasi Menggunakan Algoritma K-NN. (2024). LogicLink, 1(1), 37-49. https://doi.org/10.28918/logiclink.v1i1.7801