    num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
    print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid)

    5 25 26 austen-emma.txt
    5 26 17 austen-persuasion.txt
    5 28 22 austen-sense.txt
    4 34 79 bible-kjv.txt
    5 19 5 blake-poems.txt
    4 19 14 bryant-stories.txt
    4 18 12 burgess-busterbrown.txt
    4 20 13 carroll-alice.txt
    5 20 12 chesterton-ball.txt
    5 23 11 chesterton-brown.txt
    5 18 11 chesterton-thursday.txt
    4 21 25 edgeworth-parents.txt
    5 26 15 melville-moby_dick.txt
    5 52 11 milton-paradise.txt
    4 12 9 shakespeare-caesar.txt
    4 12 8 shakespeare-hamlet.txt
    4 12 7 shakespeare-macbeth.txt
    5 36 12 whitman-leaves.txt

This program displays three statistics for each text: average word length, average sentence length, and the number of times each vocabulary item appears in the text on average (our lexical diversity score). Observe that average word length appears to be a general property of English, since it is 4 or 5 for every text. (In fact, the average word length is really 3 not 4, since the num_chars variable counts space characters.) By contrast, average sentence length and lexical diversity appear to be characteristics of particular authors.

The previous example also showed how we can access the "raw" text of the book, not split up into tokens. The raw() function gives us the contents of the file without any linguistic processing. So, for example, len(gutenberg.raw('blake-poems.txt')) tells us how many letters occur in the text, including the spaces between words. The sents() function divides the text up into its sentences, where each sentence is a list of words.
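The three statistics can be sketched as a small helper that takes the raw text, the word list, and the sentence list. This is a minimal sketch, not the original program: the names `text_stats` and `gutenberg_report` are hypothetical, and `gutenberg_report` assumes NLTK is installed and that the Gutenberg corpus data (plus the sentence tokenizer models that sents() relies on) has already been downloaded.

```python
def text_stats(raw, words, sents):
    """Return (avg word length, avg sentence length, lexical diversity score).

    raw   -- the text as one string (spaces included)
    words -- list of word tokens
    sents -- list of sentences, each a list of word tokens
    """
    num_chars = len(raw)                          # counts spaces too
    num_words = len(words)
    num_sents = len(sents)
    num_vocab = len(set(w.lower() for w in words))  # case-insensitive vocabulary
    return (round(num_chars / num_words),
            round(num_words / num_sents),
            round(num_words / num_vocab))


def gutenberg_report():
    """Print the three statistics for every Gutenberg text (hypothetical wrapper).

    Assumption: nltk is installed and its 'gutenberg' corpus is available.
    """
    from nltk.corpus import gutenberg  # imported lazily so text_stats works without NLTK

    for fileid in gutenberg.fileids():
        stats = text_stats(gutenberg.raw(fileid),
                           gutenberg.words(fileid),
                           gutenberg.sents(fileid))
        print(*stats, fileid)
```

Keeping the arithmetic in `text_stats` means it can be tried on any tokenized text, while `gutenberg_report()` only needs to be called once the corpus is available.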