An organized method for criminal networks web mining from unstructured text documents




Digital data collected for analysis during an interrogation often contain valuable information about the suspect's social network. Most collected records, such as emails, chats, and text documents, are unstructured text. A user must manually extract useful information from this text and gather the key parts into a structured database so that the criminal network can be investigated further with analysis tools. This manual data-mining process is tedious and error-prone, and the quality of the analysis depends on the experience and skills of the human user. This paper presents an organized method that applies web mining techniques to a set of text documents obtained from suspects' devices in order to discover criminal networks automatically and extract useful information about the suspected criminal network. The proposed method discovers both direct and indirect relationships between members of criminal groups.
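The abstract does not say how relationships are derived. The Python sketch below illustrates the general idea under stated assumptions and is not the paper's method. It assumes that two people are directly related when the same document mentions both of them, and indirectly related when they are connected only through intermediaries. The name watch list, the regular-expression matching (a crude stand-in for proper named-entity extraction), and the functions `extract_members`, `build_network`, and `indirect_relations` are all hypothetical.

```python
import re
from collections import defaultdict, deque
from itertools import combinations


def extract_members(text, known_names):
    """Return the set of known names mentioned in one document.

    Assumption: a case-insensitive whole-word match against a hypothetical
    watch list stands in for a real named-entity recognizer.
    """
    found = set()
    for name in known_names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            found.add(name)
    return found


def build_network(documents, known_names):
    """Create a direct link between every pair of members that co-occur in a document.

    Each edge weight counts how many documents mention both members.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for text in documents:
        members = sorted(extract_members(text, known_names))
        for a, b in combinations(members, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def indirect_relations(graph, source):
    """Run breadth-first search from `source` and return members reachable only
    through intermediaries, mapped to their hop distance (2 or more)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # .get avoids inserting unknown people into the defaultdict.
        for neighbor in graph.get(node, {}):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {n: d for n, d in dist.items() if d >= 2}


# Usage on toy documents (invented for illustration only).
docs = [
    "Alice called Bob about the shipment.",
    "Bob met Carol at the warehouse.",
    "Carol emailed Dave the account details.",
    "Eve posted on a forum.",
]
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
net = build_network(docs, names)
print(dict(net["Bob"]))                 # direct: {'Alice': 1, 'Carol': 1}
print(indirect_relations(net, "Alice"))  # indirect: {'Carol': 2, 'Dave': 3}
```

Document co-occurrence is one of the simplest ways to define a "direct" link. A real system would likely use finer-grained context, such as the same sentence or email thread, and weight links by frequency. Breadth-first search then finds indirect links and gives each one the length of the shortest chain of intermediaries.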


Keywords: Digital data, web mining, criminal networks, text documents



