Text Mining the Works of Christopher Marlowe

Ruben Thoplan


This paper explores the application of statistical techniques to literature by applying text data mining algorithms to the works of Christopher Marlowe. A tag cloud is used as a visualization technique to identify patterns in the words Marlowe uses in his plays and poem. A cluster analysis using the complete linkage method with squared Euclidean distance is then applied to identify groupings among Marlowe’s texts. The findings show that Marlowe uses the words “king”, “death”, “love”, “heaven”, “crown” and “soul” relatively often in his plays and poem. In addition, three main clusters of Marlowe’s texts can be identified.
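The following is a minimal Python sketch of the two techniques named above. It is not the paper's own implementation. It counts term frequencies across a set of texts, which are the values a tag cloud would map to font sizes, and then groups the texts by complete-linkage agglomerative clustering on squared Euclidean distances between raw term-count vectors. The four toy strings, the short stopword list and all function names are hypothetical. The sketch also assumes raw counts with no stemming or weighting, which may differ from the preprocessing used in the paper.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "for", "his", "my", "of", "shall", "the", "to"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def tag_cloud_weights(texts, top_n=6):
    """Return the top_n most frequent terms over all texts (tag-cloud sizes)."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    return counts.most_common(top_n)


def term_vectors(texts):
    """Build one raw term-count vector per text over a shared, sorted vocabulary."""
    tokens = {name: Counter(tokenize(t)) for name, t in texts.items()}
    vocab = sorted(set().union(*tokens.values()))
    return {name: [c[w] for w in vocab] for name, c in tokens.items()}


def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def complete_linkage(vectors, k):
    """Merge clusters until k remain; cluster distance = largest pairwise member distance."""
    clusters = [[name] for name in vectors]

    def dist(c1, c2):
        return max(sq_euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy "texts"; these are not quotations from Marlowe.
texts = {
    "play_a": "the king and his crown, the king shall wear the crown",
    "play_b": "a crown for the king, death to the king",
    "play_c": "my soul to heaven, heaven and death",
    "poem_d": "love, sweet love, my love",
}

if __name__ == "__main__":
    print(tag_cloud_weights(texts, top_n=3))
    print(complete_linkage(term_vectors(texts), k=3))
```

Complete linkage defines the distance between two clusters as the largest pairwise distance between their members, so a cluster only absorbs a text that is close to every text already in it. On the toy data, the two king-and-crown strings merge first.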


This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2251-1563