Location Entity Recognition in Instagram Captions Using Support Vector Machine Algorithm

Cut Hilma Arifa, Rizal Tjut Adek, Yesy Afrillia

Abstract



The rapid advancement of digital technology has significantly influenced productivity and facilitated access to information in daily life, particularly through the widespread use of social media. Instagram is one of the most popular platforms, and its captions often contain location-related information that can be used for spatial analysis. This study aims to identify and classify location entities in Instagram captions using the Support Vector Machine (SVM) algorithm combined with a rule-based Named Entity Recognition (NER) approach. The method involves linguistic feature extraction based on explicit spatial context, data labeling, model training, and performance evaluation using standard classification metrics: accuracy, precision, recall, and F1-score. The dataset consists of 400 captions written primarily in Indonesian, although some contain mixed-language elements such as foreign terms or regional dialects. The dataset is divided into 70% training data and 30% testing data. Experimental results show that the model achieved an accuracy of 90.83%, precision of 97.01%, recall of 87.84%, and F1-score of 92.90%. Evaluation of three NER rules (exact-match keyword, prepositional pattern, and descriptive structure) indicates that combining all rules yields the highest F1-score (89%), while the best-performing individual rule is the prepositional pattern (74%). These results demonstrate strong performance on varied and unstructured Instagram captions. The combination of SVM and rule-based NER proves effective in identifying and classifying spatial information into two classes: Contains Location and No Location. This approach shows potential for text-based spatial analysis systems such as location-based recommendation systems, geographic mapping, and location-based decision support systems.
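The abstract describes three NER rules (exact-match keyword, prepositional pattern, descriptive structure) working together with an SVM that assigns each caption to Contains Location or No Location, but this page does not publish the rules or the feature set. The Python sketch below is therefore only an illustration under stated assumptions: the keyword list, the prepositions `di`/`ke`/`dari`, the location head nouns, the toy captions and every function name are hypothetical, and the classifier is a plain linear SVM trained with the Pegasos stochastic sub-gradient method, not necessarily the implementation the authors used.

```python
import random
import re

# Hypothetical resources: the study's real keyword list, prepositions and
# descriptive patterns are not given in the abstract.
LOCATION_KEYWORDS = ("banda aceh", "meulaboh", "jakarta", "bali")
# Rule 2 (assumed form): preposition followed by a capitalised word, e.g. "di Meulaboh".
PREPOSITION_PATTERN = re.compile(r"\b(?:di|ke|dari)\s+[A-Z][a-z]+")
# Rule 3 (assumed form): location head noun followed by a capitalised name, e.g. "Pantai Lampuuk".
DESCRIPTIVE_PATTERN = re.compile(r"\b(?i:jalan|pantai|kota|desa|pulau|taman)\s+[A-Z][a-z]+")


def rule_features(caption):
    """One binary feature per NER rule: [exact keyword, preposition, descriptive]."""
    lowered = caption.lower()
    exact = any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in LOCATION_KEYWORDS)
    prep = PREPOSITION_PATTERN.search(caption) is not None
    desc = DESCRIPTIVE_PATTERN.search(caption) is not None
    return [float(exact), float(prep), float(desc)]


def rule_label(caption):
    """Rule-only baseline: a caption contains a location if any rule fires."""
    return "Contains Location" if any(rule_features(caption)) else "No Location"


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient on the hinge loss).

    Labels must be +1 / -1. A constant 1.0 is appended to every x so the last
    weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active: move towards y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def svm_predict(w, x):
    """+1 = Contains Location, -1 = No Location."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Toy captions (illustrative only, not from the study's dataset).
captions = [
    ("Sunset cantik di Pantai Lampuuk", 1),
    ("Ngopi santai ke Banda Aceh", 1),
    ("Jalan-jalan sore di Meulaboh", 1),
    ("Selamat pagi semua", -1),
    ("Hari ini capek sekali", -1),
    ("Terima kasih atas doanya", -1),
]
X = [rule_features(text) for text, _ in captions]
y = [label for _, label in captions]
w = train_linear_svm(X, y)
predictions = [svm_predict(w, x) for x in X]
print(predictions)
```

Each rule contributes one binary feature, so in this sketch "combining all rules" simply means giving the SVM all three features at once. The toy data is trivially separable; a realistic model would need richer linguistic features and the study's labelled captions.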

 

Abstract (Indonesian version)

The rapid development of digital technology has significantly increased productivity and ease of access to information in daily life, one example being the increasingly widespread use of social media. Instagram is one of the most widely used platforms, and the text in its captions contains location-related information that can be used for spatial analysis. This study aims to identify and classify location entities in Instagram captions using the Support Vector Machine (SVM) algorithm with a rule-based Named Entity Recognition (NER) approach. The method comprises linguistic feature extraction based on explicit spatial context, data labeling, model training, and evaluation of model performance using classification metrics: accuracy, precision, recall, and F1-score. The dataset consists of 400 captions, mostly in Indonesian, although some contain mixed-language elements such as foreign terms or regional languages. The main focus of the study is on processing and understanding Indonesian-language text. The dataset is split into 70% training data and 30% testing data. Test results show that the model achieved an accuracy of 90.83%, precision of 97.01%, recall of 87.84%, and F1-score of 92.90%. Evaluation of the three NER rules (exact-match keyword, prepositional pattern, and descriptive structure) shows that entity recognition based on the combination of all rules gives the highest F1-score (89%), while the best individual rule is the prepositional pattern (74%). These values indicate fairly good performance in processing varied and unstructured Instagram captions. The combination of SVM and rule-based NER proved effective in identifying and classifying spatial information into two classes: Contains Location and No Location. This approach could be applied to text-based spatial analysis systems, such as location recommendation systems, geographic mapping, and location-based decision support.
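The 70/30 split and the four evaluation metrics follow standard definitions and can be computed from a confusion matrix. A minimal Python sketch, assuming a fixed shuffling seed and hypothetical helper names; the predictions below are toy values, not the study's results:

```python
import random

POSITIVE = "Contains Location"
NEGATIVE = "No Location"


def split_70_30(items, seed=42):
    """Shuffle a copy of the data and cut it into 70% training / 30% testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(y_true, y_pred, positive=POSITIVE):
    """Count (TP, FP, FN, TN) with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


train, test = split_70_30(range(400))
print(len(train), len(test))  # 280 120

# Toy labels and predictions (illustrative only).
y_true = [POSITIVE] * 5 + [NEGATIVE] * 5
y_pred = [POSITIVE, POSITIVE, POSITIVE, NEGATIVE, NEGATIVE,
          POSITIVE, NEGATIVE, NEGATIVE, NEGATIVE, NEGATIVE]
counts = confusion_counts(y_true, y_pred)  # (3, 1, 2, 4)
metrics = classification_metrics(*counts)
print(metrics)
```

For the 400-caption dataset, a 70/30 split gives 280 training and 120 testing captions. F1 here is the harmonic mean of precision and recall, 2PR/(P + R).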


Keywords


Instagram; Machine Learning; Named Entity Recognition; Support Vector Machine; Text Mining






DOI: https://doi.org/10.38038/vocatech.v7i1.238



VOCATECH: Vocational Education and Technology Journal
Unit Penelitian dan Pengabdian Masyarakat & Penjaminan Mutu
Akademi Komunitas Negeri Aceh Barat
Komplek STTU Alue Peunyareng, Ujong Tanoh Darat, Meureubo, Kabupaten Aceh Barat, Aceh 23615
Tel.: (0655) 7110271
Email: vocatech@aknacehbarat.ac.id




Indexed by: GS2, Crossref, Garuda, SINTA 5

VOCATECH: Vocational Education and Technology Journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.