Word frequency analyzer

World's simplest crypto tool
This browser-based utility counts the occurrences of each word in the given text and returns a detailed statistical analysis of the word distribution. It finds the frequency of each word in the input data and displays the totals and percentages. It can also split text into pairs of words, triplets of words, and larger word groups and analyze bigram, trigram, and n-gram word statistics. Created by cryptonerds from team Browserling.
Word Groups
Set the length of a word group.
Set to 1 to analyze word unigrams. Set to 2 to analyze word bigrams. Set to n to analyze word n-grams.
Generate n-grams from individual sentences.
This checkbox takes effect only when the group length is 2 or greater.
Word Punctuation
Remove punctuation characters or replace them with a space.
Punctuation marks to remove.
Punctuation marks to replace with a space.
Analysis Options
Choose how to display the word frequency statistics.
Choose the word statistics sorting method.
Ignore the letter case of words.

What is a word frequency analyzer?

This online program analyzes the frequency of words in the given plaintext or ciphertext. It counts how many times each word appears in the text and prints the word counts to the screen. Each count can be printed as a single number, as a fraction of the total word count, or as a percentage of the total word count. The output statistics can be sorted by frequency of occurrence or alphabetically by word.
Knowing how often certain words appear in a text can help you determine the language it is written in. In every written language, certain words are used more often than others. For example, in English, the most popular word is "the", in Dutch, it's the word "de", and in French, the word "le". The most popular words also give you a rough idea of what the text is about. For example, if there are many sports-related words, such as "touchdown", "player", and "punt", then the text is most likely about football.
Our algorithm can also calculate the frequency of word combinations. Combinations of 2 words are called "bigrams", combinations of 3 words are called "trigrams", and combinations of more words are called "n-grams". To find the distribution of all word pairs in the text, enter "2" in the word group length field; to find the distribution of word triples, set the word group length to "3", and so on. Word groups are formed in the linear order in which the words appear in the text. By default, the algorithm respects sentence boundaries and doesn't combine words from adjacent sentences in the same group. If you disable the "Stop at the End of a Sentence" option, then the last word of the current sentence will be grouped with the first word of the next sentence.
To extract all words from the input text, the tool splits it on whitespace characters.
If the words contain punctuation marks that you don't want to include in the analysis, you can use the "Exclude Punctuation" option. For example, to strip the apostrophe from the contraction "it's", enter this character in the "punctuation marks to remove" field and the word "it's" will become "its". Similarly, the "punctuation marks to replace with space" field lets you split words that contain punctuation marks into multiple words. For example, if you have the word "thirty-two" and you enter a dash in this field, then this word will become two words: "thirty" and "two". With these two fields, you can remove or replace with a space any other punctuation characters as well.
Before the analysis, the entire text is converted to lowercase. This way, words that differ only in letter case (such as the same word at the beginning and in the middle of a sentence) are counted as the same word. If the letter case is important to you, then you can disable the "Case-insensitive Analysis" option. Cryptabulous!
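The processing steps described above (punctuation cleanup, optional lowercasing, splitting on whitespace, counting) can be approximated in a few lines of Python. This is only a minimal sketch of the described behavior, not the tool's actual source code; the function name `count_words` and its parameter names are hypothetical.

```python
from collections import Counter

def count_words(text, remove="", replace_with_space="", ignore_case=True):
    """Count whitespace-separated words after optional punctuation cleanup.

    Hypothetical helper: `remove` and `replace_with_space` mirror the two
    punctuation fields described above; the names are made up for this sketch.
    """
    # Characters in this set split a word in two ("thirty-two" -> "thirty two").
    for ch in replace_with_space:
        text = text.replace(ch, " ")
    # Characters in this set are deleted outright ("it's" -> "its").
    for ch in remove:
        text = text.replace(ch, "")
    # Case-insensitive analysis: compare words in lowercase.
    if ignore_case:
        text = text.lower()
    # str.split() with no arguments splits on any run of whitespace.
    return Counter(text.split())

counts = count_words("It's thirty-two. Its THIRTY two!",
                     remove="'.!", replace_with_space="-")
for word in sorted(counts):
    print(f"{word}: {counts[word]}")
```

With the apostrophe removed and the dash replaced by a space, "It's" and "Its" are counted as the same word, and "thirty-two" contributes to both "thirty" and "two".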

Word frequency analyzer examples

Word Statistics
This example counts words in an English text about whales and returns detailed word distribution statistics in the output. To make sure the letter case doesn't change the statistics, we enable the "Case-insensitive Analysis" option, which internally converts the text to lowercase. Also, to make sure punctuation marks such as full stops, commas, and parentheses aren't part of the analysis, we activate the "Exclude Punctuation" option and enter the characters "(),.:" in the punctuation removal field. These punctuation symbols are removed from the text and aren't counted as part of words. In the output, the words are sorted alphabetically and their frequencies are printed next to them.
Whale is the common name for various marine mammals of the order Cetacea. Whales breathe air and are not fish. They are mammals that spend their entire lives in the water. Whales are of two types: toothed (Odontoceti) and baleen (Mysticeti). The largest whales are blue whales. In fact, the blue whale is the largest known animal. These huge animals eat about four tons of tiny krill every day, obtained by filter feeding through baleen.
about: 1 air: 1 and: 2 animal: 1 animals: 1 are: 4 baleen: 2 blue: 2 breathe: 1 by: 1 cetacea: 1 common: 1 day: 1 eat: 1 entire: 1 every: 1 fact: 1 feeding: 1 filter: 1 fish: 1 for: 1 four: 1 huge: 1 in: 2 is: 2 known: 1 krill: 1 largest: 2 lives: 1 mammals: 2 marine: 1 mysticeti: 1 name: 1 not: 1 obtained: 1 odontoceti: 1 of: 3 order: 1 spend: 1 that: 1 the: 6 their: 1 these: 1 they: 1 through: 1 tiny: 1 tons: 1 toothed: 1 two: 1 types: 1 various: 1 water: 1 whale: 2 whales: 4
Word Pair Analysis
In this example, we enter the value "2" in the option that controls how many adjacent words form a unit of analysis. With the value "2", the words are grouped into pairs and the program runs a frequency analysis of word bigrams (also known as digrams). The word pairs are formed only within the boundaries of each sentence because we're using the option "Stop at End of Sentence". If this option were turned off, then the entire text would be treated as a single sentence. We remove all commas and dots, and replace the hyphen symbol with a space so that compound words such as "rain-drop" are split into two words, "rain" and "drop". The statistics are sorted by frequency of bigram occurrence and the bigram counts are displayed together with their percentage values.
The Rain-Drop The rain-drop, the rain drop, Its soft and tiny feet Keep up a pleasant pattering Along the dusty street. The rain drop, the rain drop, It falls on the stream, And floats in gladsomeness along Beneath the sunny beam. By Richard Coe
the rain: 5 (11.63%) rain drop: 5 (11.63%) drop the: 3 (6.98%) drop its: 1 (2.33%) its soft: 1 (2.33%) soft and: 1 (2.33%) and tiny: 1 (2.33%) tiny feet: 1 (2.33%) feet keep: 1 (2.33%) keep up: 1 (2.33%) up a: 1 (2.33%) a pleasant: 1 (2.33%) pleasant pattering: 1 (2.33%) pattering along: 1 (2.33%) along the: 1 (2.33%) the dusty: 1 (2.33%) dusty street: 1 (2.33%) drop it: 1 (2.33%) it falls: 1 (2.33%) falls on: 1 (2.33%) on the: 1 (2.33%) the stream: 1 (2.33%) stream and: 1 (2.33%) and floats: 1 (2.33%) floats in: 1 (2.33%) in gladsomeness: 1 (2.33%) gladsomeness along: 1 (2.33%) along beneath: 1 (2.33%) beneath the: 1 (2.33%) the sunny: 1 (2.33%) sunny beam: 1 (2.33%) by richard: 1 (2.33%) richard coe: 1 (2.33%)
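The bigram counts above can be reproduced with a short Python sketch. It assumes that a sentence ends at ".", "!", or "?" (the tool's exact sentence-boundary rule is not stated on this page), and the function name `ngram_counts` and its parameters are hypothetical.

```python
import re
from collections import Counter

POEM = ("The Rain-Drop The rain-drop, the rain drop, Its soft and tiny feet "
        "Keep up a pleasant pattering Along the dusty street. The rain drop, "
        "the rain drop, It falls on the stream, And floats in gladsomeness "
        "along Beneath the sunny beam. By Richard Coe")

def ngram_counts(text, n=2, stop_at_sentence_end=True,
                 remove=",.", replace_with_space="-"):
    """Count word n-grams, optionally without crossing sentence boundaries."""
    # Assumption: ".", "!" and "?" end a sentence.
    chunks = re.split(r"[.!?]+", text) if stop_at_sentence_end else [text]
    counts = Counter()
    for chunk in chunks:
        for ch in replace_with_space:
            chunk = chunk.replace(ch, " ")
        for ch in remove:
            chunk = chunk.replace(ch, "")
        words = chunk.lower().split()
        # Slide a window of n words over the chunk in reading order.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

bigrams = ngram_counts(POEM)
total = sum(bigrams.values())
for pair, count in bigrams.most_common(3):
    print(f"{pair}: {count} ({100 * count / total:.2f}%)")
```

For this poem the sketch finds 43 bigrams in total, which matches the percentages in the output above (5 out of 43 is 11.63%). Calling `ngram_counts(POEM, stop_at_sentence_end=False)` additionally produces the cross-sentence pairs "street the" and "beam by".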
Word Frequency in French Text
In this example, the input text is written in French and describes the Alps. We calculate the word frequencies in this text and print each word's count together with its fraction of the total word count. The text contains 102 words in total, and the results are sorted by frequency of occurrence. The most common words are the articles "la" (8 times) and "les" (7 times), followed by "alpes" and "en" (5 times each).
Dans les Alpes La région des Alpes est située à l'Est de la France. Les Alpes sont la chaîne de montagnes la plus haute d'Europe. La montagne la plus haute des Alpes s'appelle le Mont Blanc, on peut y monter grâce à un téléphérique. Les français aiment beaucoup cette région car ils peuvent y passer leurs vacances en été et en hiver. En effet, en hiver il y a beaucoup de neige dans les Alpes et les touristes peuvent pratiquer le ski, la luge ou le snowboard. En été, les touristes aiment se balader dans les montagnes, ils font de la randonnée.
la: 8 (8/102) les: 7 (7/102) alpes: 5 (5/102) en: 5 (5/102) de: 4 (4/102) dans: 3 (3/102) le: 3 (3/102) y: 3 (3/102) région: 2 (2/102) des: 2 (2/102) à: 2 (2/102) montagnes: 2 (2/102) plus: 2 (2/102) haute: 2 (2/102) aiment: 2 (2/102) beaucoup: 2 (2/102) ils: 2 (2/102) peuvent: 2 (2/102) été: 2 (2/102) et: 2 (2/102) hiver: 2 (2/102) touristes: 2 (2/102) est: 1 (1/102) située: 1 (1/102) l'est: 1 (1/102) france: 1 (1/102) sont: 1 (1/102) chaîne: 1 (1/102) d'europe: 1 (1/102) montagne: 1 (1/102) s'appelle: 1 (1/102) mont: 1 (1/102) blanc: 1 (1/102) on: 1 (1/102) peut: 1 (1/102) monter: 1 (1/102) grâce: 1 (1/102) un: 1 (1/102) téléphérique: 1 (1/102) français: 1 (1/102) cette: 1 (1/102) car: 1 (1/102) passer: 1 (1/102) leurs: 1 (1/102) vacances: 1 (1/102) effet: 1 (1/102) il: 1 (1/102) a: 1 (1/102) neige: 1 (1/102) pratiquer: 1 (1/102) ski: 1 (1/102) luge: 1 (1/102) ou: 1 (1/102) snowboard: 1 (1/102) se: 1 (1/102) balader: 1 (1/102) font: 1 (1/102) randonnée: 1 (1/102)
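The three display styles (plain count, fraction of the total, percentage of the total) and the frequency sort can be sketched as follows. The mode names `"count"`, `"fraction"`, and `"percentage"` are hypothetical labels for this sketch. Ties are kept in order of first appearance, which is consistent with the ordering in the output above but is not a documented guarantee of the tool.

```python
from collections import Counter

def format_stats(words, mode="count"):
    """Return one formatted line per word, most frequent first."""
    counts = Counter(words)          # keeps first-appearance order of keys
    total = sum(counts.values())
    # sorted() is stable, so words with equal counts stay in
    # first-appearance order.
    rows = sorted(counts.items(), key=lambda item: -item[1])
    lines = []
    for word, count in rows:
        if mode == "fraction":
            lines.append(f"{word}: {count} ({count}/{total})")
        elif mode == "percentage":
            lines.append(f"{word}: {count} ({100 * count / total:.2f}%)")
        else:
            lines.append(f"{word}: {count}")
    return lines

words = "dans les alpes la région des alpes".split()
print("\n".join(format_stats(words, mode="fraction")))
```

To sort alphabetically instead, the `sorted` call could use `key=lambda item: item[0]`.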
Pro tips: master online crypto tools
You can pass input to this tool via the ?input query argument and it will automatically compute the output. Here's how to type it in your browser's address bar:
https://onlinecryptotools.com/analyze-word-frequency?input=Whale%20is%20the%20common%20name%20for%20various%20marine%20mammals%20of%20the%20order%20Cetacea.%20Whales%20breathe%20air%20and%20are%20not%20fish.%20They%20are%20mammals%20that%20spend%20their%20entire%20lives%20in%20the%20water.%20Whales%20are%20of%20two%20types%3A%20toothed%20%28Odontoceti%29%20and%20baleen%20%28Mysticeti%29.%20The%20largest%20whales%20are%20blue%20whales.%20In%20fact%2C%20the%20blue%20whale%20is%20the%20largest%20known%20animal.%20These%20huge%20animals%20eat%20about%20four%20tons%20of%20tiny%20krill%20every%20day%2C%20obtained%20by%20filter%20feeding%20through%20baleen.&group-length=1&stop-at-boundary=true&remove-punctuation=true&punct-to-remove=%28%29%2C.%3A&punct-to-replace=&type=print-count&sort=sort-by-alphabet&ignore-case=true
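A link like the one above can also be assembled programmatically. The sketch below uses only the parameter names and values that are visible in the example URL; which other values each parameter accepts is not documented on this page, so treat everything beyond those shown as unknown. The helper name `build_tool_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://onlinecryptotools.com/analyze-word-frequency"

def build_tool_url(text):
    """Build a tool link with the same options as the example URL above."""
    params = {
        "input": text,
        "group-length": 1,
        "stop-at-boundary": "true",
        "remove-punctuation": "true",
        "punct-to-remove": "(),.:",
        "punct-to-replace": "",
        "type": "print-count",
        "sort": "sort-by-alphabet",
        "ignore-case": "true",
    }
    # quote_via=quote encodes spaces as %20 (as in the example URL)
    # instead of the "+" that urlencode produces by default.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)

print(build_tool_url("Whale is the common name"))
```

The punctuation list "(),.:" is percent-encoded as `%28%29%2C.%3A`, the same form that appears in the example link.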