Instructions
Objective
Write a program in Python that generates the input for a word-cloud generator.
Requirements and Specifications
Word-Cloud Generator
We will write a Python program that generates the input to a word-cloud generator program. A word cloud is a representation of a collection of words in which the most frequently used words appear in a larger font. This example shows where students are physically located this term:
We will not be doing the fun part: generating the words in pretty colours and making it look cool. Instead, we’ll be writing the code that reads a text file containing some text from a speech or a blog (or whatever...) and from this we will generate an output file consisting of the words in the text and their frequencies.
For instance:
- Kingston 20
- Toronto 45
- Ottawa 36
This file could then be read by a word-cloud program that would generate the picture above.
For this assignment, I am not giving you complete skeleton code, but instead an outline of the functions that are required, their parameters and what they should return. Your program should follow this outline and not deviate. Deviations from the outline (that is, changing the parameters or returns, adding functions, leaving out functions etc.) will result in a lower mark. Please follow the outline.
You will also need to write appropriate docstrings (comments at the start of each function). For the proper format you can refer to the skeleton code provided in Assignment 8. Each function should be defined with a complete docstring to describe the functionality, the parameters and the return values.
Here are the functions that you need to write:
readFile() – this function takes no parameters and it returns a list where each element is a word in the file. The function will open the file called “cisc101WordCloudFile.txt” (provided as an attachment to this assignment), read the contents into a string and convert the string to a list of words. Note that some words may have “\n” characters on the end or contain punctuation. The \n characters should be removed. Punctuation can stay. You might find the .split() method useful to split a string into a list of words. (Try this: "the quick brown fox".split() and see what you end up with). The file should be found in the same location as your program, so no absolute path should be indicated in the open() function. Be sure to check that the file is found and opened properly using exceptions. If not, inform your user and end the program. DO NOT CHANGE THE NAME OF THE FILE.
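As a quick illustration of the .split() hint above, here is a minimal sketch. The sample text is made up for the demonstration and is not taken from the assignment file. Calling .split() with no arguments splits on any run of whitespace, newlines included, so no word keeps a trailing "\n", while punctuation stays attached.

```python
# Made-up sample standing in for the contents of a text file.
sample_text = "the quick brown fox\njumps over\nthe lazy dog.\n"

# split() with no arguments splits on spaces, tabs and newlines alike,
# so no word keeps a trailing "\n". Punctuation is left untouched.
words = sample_text.split()
print(words)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.']
```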
isValid(word) – this function takes one parameter – a string (a single word). It returns True if the word should be kept in the list, False otherwise. Words are considered valid if they are 4 or more characters in length, they do not start or end with a digit and they do not contain any punctuation marks. You will find the isdigit() string method useful for this function. You will find this site useful when it comes to trying to figure out whether or not a word contains punctuation marks.
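The building blocks for these checks can be tried out on their own first. The sketch below assumes that "punctuation" means the ASCII characters in string.punctuation; contains_punctuation is a hypothetical helper used only for exploration here. Since the outline does not allow extra functions, the same logic would have to live inside isValid() in a submission.

```python
import string

# string.punctuation is a constant holding the ASCII punctuation characters.
print(string.punctuation)


def contains_punctuation(word):
    """Hypothetical helper: True if any character of word is ASCII punctuation."""
    return any(ch in string.punctuation for ch in word)


print("7abcd"[0].isdigit())            # True: the first character is a digit
print("abcd7"[-1].isdigit())           # True: the last character is a digit
print(contains_punctuation("world."))  # True: the full stop counts
print(contains_punctuation("world"))   # False
```

Note that string.punctuation covers ASCII characters only, so typographic quotes such as ’ would not be detected by this check.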
cleanseWords(listOfWords) – this function will remove any words that we don’t want to keep in our word-cloud. It takes as a parameter a list of words and modifies the list to remove some words. Nothing is returned, but the list may be modified in the function. Here is how this function should be structured. Note that you want to traverse from the end of the list to the beginning so that we can safely remove words from the end of the list. (If you go the other way and remove things from the start of the list while looping through the values, you run into problems with indexing). How do you go from the end to the beginning? for i in range(len(words)-1,-1, -1) will do this for you.
For each word starting at the end of the list to the beginning:
Make the word all lower case
Check to see if the word is valid (by calling isValid())
If not a valid word, remove it from the list.
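To see why the direction of traversal matters, the sketch below removes short words from the same list in both directions. The keep() function is a simplified stand-in for isValid() (it only checks the length), and the word list is invented for the demonstration.

```python
def keep(word):
    """Stand-in for isValid(): keeps words of 4 or more characters."""
    return len(word) >= 4


# Forward traversal: removing an item shifts the later items left,
# so the item right after each removed one is never examined.
forward = ["to", "be", "question", "or", "not"]
for word in forward:
    if not keep(word):
        forward.remove(word)
print(forward)   # ['be', 'question', 'not'] - two short words survive

# Backward traversal: removing an item only shifts items that were
# already examined, so every index is visited exactly once.
backward = ["to", "be", "question", "or", "not"]
for i in range(len(backward) - 1, -1, -1):
    if not keep(backward[i]):
        del backward[i]
print(backward)  # ['question']
```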
countFrequencies(listOfWords) – this function takes the list of words and creates a dictionary consisting of word: frequency elements. So, for instance, if our list of words was [“to”, “be”, “or”, “not”, “to”, “be”], we would return a dictionary consisting of the following: {“to”: 2, “be”: 2, “or”: 1, “not”: 1}. Approach this in the following way:
For each word in the list:
If it appears in the dictionary, increment dictionary[word] by 1.
Else:
Add the word to the dictionary with a count of 1.
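The steps above can be sketched on the example list as follows. The comparison with collections.Counter from the standard library is only a convenient sanity check while testing; it is not part of the assignment outline and is not meant to replace the required function.

```python
from collections import Counter

words = ["to", "be", "or", "not", "to", "be"]

# Manual counting, following the steps above.
frequencies = {}
for word in words:
    if word in frequencies:
        frequencies[word] += 1
    else:
        frequencies[word] = 1

print(frequencies)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}

# Counter is a dict subclass, so the two results can be compared directly.
print(frequencies == Counter(words))  # True
```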
writeFile(dictionaryOfFrequencies) – this function takes as a parameter the dictionary consisting of the word: frequency elements and writes the contents to a file called “outputForWordCloud101.txt”. There should be no path provided – the file should be written where the code is located. Each word: frequency pair should be on a separate line in the file. So for the example given above, the output file would look like this:
to 2
be 2
or 1
not 1
Be sure to check that all file I/O operations succeed by using an exception handler. If not, inform your user and end the program.
main() – main will call the functions in the following order (with appropriate parameters)
readFile()
cleanseWords()
countFrequencies()
writeFile()
Suggestions
Write each function and test it individually. You can, for instance, start with the function isValid(). It takes a string and returns True or False depending on the conditions.
So, to test this function, you don’t need to have read the file using your code – you can just make up inputs. For instance, I can put the following code in my program to test this function:
print(isValid("7abcd"), "This should produce a False result since it starts with a number")
print(isValid("and"), "This should produce a False result since it is too short")
etc …..
You could write the function writeFile() without any other functions. Simply pass it in a dictionary of fake data and check that the file is created properly.
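One way to try this out is sketched below. To keep the example self-contained, write_frequencies is a hypothetical stand-in for writeFile() that takes the file name as a second parameter (the real writeFile() takes only the dictionary), and the file is written to a temporary directory so the check leaves nothing behind.

```python
import os
import tempfile


def write_frequencies(frequencies, filename):
    """Hypothetical stand-in for writeFile() that accepts the file name."""
    with open(filename, "w") as out_file:
        for word, count in frequencies.items():
            out_file.write(word + " " + str(count) + "\n")


fake_data = {"to": 2, "be": 2, "or": 1, "not": 1}

# Write into a temporary directory, then read the file back to inspect it.
with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "outputForWordCloud101.txt")
    write_frequencies(fake_data, path)
    with open(path) as in_file:
        lines = in_file.read().splitlines()

print(lines)  # ['to 2', 'be 2', 'or 1', 'not 1']
```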
You do not need to show this testing – it is just to give you an idea of how to build up your program. Build it one function at a time. Doing this will allow you to get credit for what you do get working even if you do not get the entire program to work together.
The file that I have given you is long! Test your code by creating a smaller test file where you know exactly what the input is. Make sure it works on this file first, then run it on the longer file.
Challenges
There are many extensions that you could add to this assignment. This section is optional (and not for marks). Hand in only what I have asked you to do, but if you want some challenges, you could do the following:
Remove the punctuation from words (right now the words "world." and "them." show up as unique words). Remove all punctuation before processing.
Write the output file with the word frequencies sorted, so the most frequently used word is at the top.
Remove any words that appear only once in the file.
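The three challenges could be approached along the following lines. This is only a sketch under stated assumptions: strip_punctuation is a hypothetical helper, "punctuation" is taken to mean the ASCII characters in string.punctuation, and the word list is invented for the demonstration.

```python
import string


def strip_punctuation(word):
    """Hypothetical helper: return word without ASCII punctuation characters."""
    return word.translate(str.maketrans("", "", string.punctuation))


words = ["World.", "them.", "world", "them", "world!", "once"]

# Challenge 1: strip punctuation (and lower-case) before counting.
frequencies = {}
for word in words:
    cleaned = strip_punctuation(word).lower()
    frequencies[cleaned] = frequencies.get(cleaned, 0) + 1

# Challenge 3: drop words that appear only once.
repeated = {word: count for word, count in frequencies.items() if count > 1}

# Challenge 2: sort by frequency, most frequent first.
ranked = sorted(repeated.items(), key=lambda pair: pair[1], reverse=True)
for word, count in ranked:
    print(word, count)
```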
Source Code
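import string


def readFile():
    """
    Returns a list where each element is a word in the file.
    :return: List of words
    """
    words = []
    try:
        # Relative file name: the file must sit next to the program
        with open("cisc101WordCloudFile.txt", "r") as inFile:
            for line in inFile:
                # split() with no arguments also discards the trailing "\n"
                words.extend(line.split())
    except OSError as e:
        # Inform the user and end the program, as the requirements ask
        print("Error: could not read cisc101WordCloudFile.txt: " + str(e))
        raise SystemExit(1)
    return words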


def isValid(word):
    """
    Returns true if the word should be kept in the list, false otherwise.
    Words are considered valid if they are 4 or more characters in length,
    they do not start or end with a digit and they do not contain any
    punctuation marks.
    :param word: Word to check
    :return: Boolean
    """
    if len(word) < 4 or word[0].isdigit() or word[-1].isdigit():
        return False
    for letter in word:
        if letter in string.punctuation:
            return False
    return True
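

def cleanseWords(listOfWords):
    """
    Remove any words that we don’t want to keep in our word-cloud.
    The list is modified in place; kept words are stored in lower case.
    :param listOfWords: List to be cleansed
    :return: Nothing
    """
    # Stop value of -1 so that index 0 is also checked
    for i in range(len(listOfWords) - 1, -1, -1):
        # Store the lower-case form back into the list
        listOfWords[i] = listOfWords[i].lower()
        if not isValid(listOfWords[i]):
            del listOfWords[i]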


def countFrequencies(listOfWords):
    """
    Creates a dictionary consisting of word: frequency elements
    :param listOfWords: List to be processed
    :return: Dictionary of words and their frequencies
    """
    frequencies = {}
    for word in listOfWords:
        word = word.lower()
        if word in frequencies:
            frequencies[word] += 1
        else:
            frequencies[word] = 1
    return frequencies


def writeFile(dictionaryOfFrequencies):
    """
    Writes the contents to a file called "outputForWordCloud101.txt".
    :param dictionaryOfFrequencies: Dictionary to write to file
    :return: Nothing
    """
    try:
        # The with statement closes the file even if a write fails
        with open("outputForWordCloud101.txt", "w") as outFile:
            for word in dictionaryOfFrequencies:
                outFile.write(word + " " + str(dictionaryOfFrequencies[word]) + "\n")
    except OSError as e:
        # Inform the user and end the program, as the requirements ask
        print("Error: " + str(e))
        raise SystemExit(1)


def main():
    """
    Entry point of the program to process the words.
    :return: None
    """
    listOfWords = readFile()
    cleanseWords(listOfWords)
    dictionaryOfFrequencies = countFrequencies(listOfWords)
    writeFile(dictionaryOfFrequencies)


if __name__ == '__main__':
    main()