
Create a Program to Implement Word-Cloud-Generator in Python Assignment Solution

July 15, 2024
Alberta M. Braud
🇬🇧 United Kingdom
Python
Alberta M. Braud, Ph.D., University of Technology Sydney, brings 9 years of experience and has completed 776 advanced Python assignments. Her focus areas include cybersecurity and cloud computing, contributing significantly to these fields.
Key Topics
  • Instructions
  • Requirements and Specifications

Instructions

Objective

Write a program to implement a word-cloud generator in Python.

Requirements and Specifications

Word-Cloud Generator

We will write a Python program that generates the input to a word-cloud generator program. A word cloud is a representation of a collection of words in which the most frequently used words appear in a larger font. This example shows where students are physically located this term:

We will not be doing the fun part – generating the words in pretty colours and making it look cool. Instead, we’ll be writing the code that reads a text file containing some text from a speech or a blog (or whatever….) and from this we will generate an output file consisting of the words in the text and their frequencies.

For instance:

  • Kingston 20
  • Toronto 45
  • Ottawa 36

This file could then be read by a word-cloud program that would generate the picture above.

For this assignment, I am not giving you complete skeleton code, but instead an outline of the functions that are required, their parameters and what they should return. Your program should follow this outline and not deviate. Deviations from the outline (that is, changing the parameters or returns, adding functions, leaving out functions etc.) will result in a lower mark. Please follow the outline.

You will also need to write appropriate docstrings (comments at the start of each function). For the proper format you can refer to the skeleton code provided in Assignment 8. Each function should be defined with a complete docstring to describe the functionality, the parameters and the return values.

Here are the functions that you need to write:

readFile() – this function takes no parameters and it returns a list where each element is a word in the file. The function will open the file called “cisc101WordCloudFile.txt” (provided as an attachment to this assignment), read the contents into a string and convert the string to a list of words. Note that some words may have “\n” characters on the end or contain punctuation. The \n characters should be removed. Punctuation can stay. You might find the .split() method useful to split a string into a list of words. (Try this: "the quick brown fox".split() and see what you end up with). The file should be found in the same location as your program, so no absolute path should be indicated in the open() function. Be sure to check that the file is found and opened properly using exceptions. If not, inform your user and end the program. DO NOT CHANGE THE NAME OF THE FILE.
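To see what .split() does with newlines and punctuation, here is a minimal sketch. The sample string and the helper name wordsFromText are made up for illustration; the real readFile() must read its text from the file named above.

```python
def wordsFromText(text):
    """Hypothetical helper: split a block of text into a list of words."""
    # split() with no arguments splits on any run of whitespace,
    # including "\n", so no word keeps a trailing newline.
    return text.split()


sample = "the quick brown fox\njumps over the lazy dog.\n"
words = wordsFromText(sample)
print(words)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.']
```

Note that "dog." keeps its full stop, which matches the requirement that punctuation can stay at this stage.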

isValid(word) – this function takes one parameter – a string (a single word). It returns True if the word should be kept in the list, False otherwise. Words are considered valid if they are 4 or more characters in length, they do not start or end with a digit and they do not contain any punctuation marks. You will find the isdigit() string method useful for this function. You will find this site useful when it comes to trying to figure out whether or not a word contains punctuation marks.
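One possible way to check for punctuation, assuming ASCII punctuation marks are all that matter, is the standard library constant string.punctuation. The helper name containsPunctuation below is hypothetical and is not part of the required outline; it only illustrates the check that isValid() needs to perform.

```python
import string


def containsPunctuation(word):
    """Hypothetical helper: True if any character of word is ASCII punctuation."""
    return any(ch in string.punctuation for ch in word)


print(string.punctuation)               # the characters treated as punctuation
print(containsPunctuation("world."))    # True
print(containsPunctuation("world"))     # False
print("7abcd"[0].isdigit())             # True: the word starts with a digit
print("abcd7"[-1].isdigit())            # True: the word ends with a digit
```

Be aware that string.punctuation covers only ASCII characters, so typographic quotes such as “ and ” would not be caught by this check.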

cleanseWords(listOfWords) – this function will remove any words that we don’t want to keep in our word-cloud. It takes as a parameter a list of words and modifies the list to remove some words. Nothing is returned, but the list may be modified in the function. Here is how this function should be structured. Note that you want to traverse from the end of the list to the beginning so that we can safely remove words from the end of the list. (If you go the other way and remove things from the start of the list while looping through the values, you run into problems with indexing). How do you go from the end to the beginning? for i in range(len(words)-1,-1, -1) will do this for you.

For each word starting at the end of the list to the beginning:

Make the word all lower case

Check to see if the word is valid (by calling isValid())

If not a valid word, remove it from the list.
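The sketch below illustrates why the traversal direction matters. It uses a simple length test as a stand-in for isValid(), and both function names are invented for this demonstration only.

```python
def removeShortForward(words):
    """Buggy on purpose: removes items while walking the list front to back."""
    for word in words:
        if len(word) < 4:
            # Removing shifts later items left, so the next item is skipped.
            words.remove(word)


def removeShortBackward(words):
    """Removes items while walking from the last index down to index 0."""
    for i in range(len(words) - 1, -1, -1):
        if len(words[i]) < 4:
            del words[i]


forward = ["a", "to", "be", "house", "or", "not", "tree"]
backward = list(forward)

removeShortForward(forward)
removeShortBackward(backward)

print(forward)    # ['to', 'house', 'not', 'tree']  (short words slipped through)
print(backward)   # ['house', 'tree']
```

Going backward, a deletion only shifts items that have already been checked, so nothing is skipped.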

countFrequencies(listOfWords) – this function takes the list of words and creates a dictionary consisting of word: frequency elements. So, for instance, if our list of words was [“to”, “be”, “or”, “not”, “to”, “be”], we would return a dictionary consisting of the following: {“to”: 2, “be”: 2, “or”: 1, “not”: 1}. Approach this in the following way:

For each word in the list:

If it appears in the dictionary, increment dictionary[word] by 1.

Else:

Add the word to the dictionary with a count of 1.
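The steps above can be sketched as follows, using the "to be or not to be" example. This is one straightforward reading of the outline, not the only way to write it.

```python
def countFrequencies(listOfWords):
    """Return a dictionary mapping each word to the number of times it occurs."""
    frequencies = {}
    for word in listOfWords:
        if word in frequencies:
            # Seen before: add one to its count.
            frequencies[word] += 1
        else:
            # First occurrence: start the count at 1.
            frequencies[word] = 1
    return frequencies


result = countFrequencies(["to", "be", "or", "not", "to", "be"])
print(result)
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The standard library class collections.Counter produces the same counts, but the assignment asks for the explicit loop.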

writeFile(dictionaryOfFrequencies) – this function takes as a parameter the dictionary consisting of the word: frequency elements and writes the contents to a file called “outputForWordCloud101.txt”. There should be no path provided – the file should be written where the code is located. Each word: frequency pair should be on a separate line in the file. So for the example given above, the output file would look like this:

to 2

be 2

or 1

not 1

Be sure to check that all file I/O operations succeed by using an exception handler. If not, inform your user and end the program.
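Here is a hedged sketch of the output format and the exception handling. To keep it runnable anywhere, it uses a hypothetical variant, writeFrequencies, that takes the file name as a second parameter and writes to a temporary directory; the required writeFile() takes only the dictionary and always writes to the fixed file name given above. The sketch also returns False on failure instead of ending the program, purely so that it can be tested.

```python
import os
import tempfile


def writeFrequencies(frequencies, fileName):
    """Hypothetical variant of writeFile(): write one 'word count' pair per line."""
    try:
        with open(fileName, "w") as outFile:
            for word, count in frequencies.items():
                outFile.write(word + " " + str(count) + "\n")
    except OSError as e:
        # Inform the user; the real writeFile() should end the program here.
        print("Error: could not write " + fileName + ": " + str(e))
        return False
    return True


# Demonstration path in a fresh temporary directory (an assumption of this sketch).
demoPath = os.path.join(tempfile.mkdtemp(), "demoOutput.txt")
ok = writeFrequencies({"to": 2, "be": 2, "or": 1, "not": 1}, demoPath)

with open(demoPath) as inFile:
    contents = inFile.read()
print(contents)
```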

main() – main will call the functions in the following order (with appropriate parameters)

readFile()

cleanseWords()

countFrequencies()

writeFile()

Suggestions

Write each function and test it individually. You can, for instance, start with the function isValid(). It takes a string and returns True or False depending on the conditions.

So, to test this function, you don’t need to have read the file using your code – you can just make up inputs. For instance, I can put the following code in my program to test this function:

print(isValid("7abcd"), "This should produce a False result since it starts with a number")

print(isValid("and"), "This should produce a False result since it is too short")

etc …..

You could write the function writeFile() without any other functions. Simply pass it in a dictionary of fake data and check that the file is created properly.

You do not need to show this testing – it is just to give you an idea of how to build up your program. Build it one function at a time. Doing this will allow you to get credit for what you do get working even if you do not get the entire program to work together.

The file that I have given you is long! Test your code by creating a smaller test file where you know exactly what the input is. Make sure it works on this file first, then run it on the longer file.

Challenges

There are many extensions that you could add to this assignment. This section is optional (and not for marks). Hand in only what I have asked you to do, but if you want some challenges, you could do the following:

Remove the punctuation from words (right now the words "world." and "them." show up as unique words). Remove all punctuation before processing.

Write the output file with the word frequencies sorted, so that the most frequently used word is at the top.

Remove any words that appear only once in the file.
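The three optional challenges could be approached along these lines. All three helper names are hypothetical and sit outside the required outline, so they should not appear in the submitted program.

```python
import string


def stripPunctuation(word):
    """Return word with every ASCII punctuation character removed."""
    return word.translate(str.maketrans("", "", string.punctuation))


def sortedByFrequency(frequencies):
    """Return (word, count) pairs with the most frequent word first."""
    return sorted(frequencies.items(), key=lambda pair: pair[1], reverse=True)


def dropSingletons(frequencies):
    """Return a new dictionary without the words that appear only once."""
    return {word: count for word, count in frequencies.items() if count > 1}


counts = {"world": 3, "them": 1, "hello": 5}

print(stripPunctuation("world."))    # world
print(sortedByFrequency(counts))     # [('hello', 5), ('world', 3), ('them', 1)]
print(dropSingletons(counts))        # {'world': 3, 'hello': 5}
```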

Source Code

import string
import sys


def readFile():
    """
    Returns a list where each element is a word in the file.
    Informs the user and ends the program if the file cannot be read.
    :return: List of words
    """
    words = []
    try:
        # No path is given: the file must sit next to the program.
        with open("cisc101WordCloudFile.txt", "r") as inFile:
            for line in inFile:
                # strip() removes the trailing "\n"; split() separates the words.
                for word in line.strip().split():
                    words.append(word)
    except OSError as e:
        print("Error: " + str(e))
        sys.exit(1)
    return words


def isValid(word):
    """
    Returns True if the word should be kept in the list, False otherwise.
    Words are considered valid if they are 4 or more characters in length,
    they do not start or end with a digit and they do not contain any
    punctuation marks.
    :param word: Word to check
    :return: Boolean
    """
    if len(word) < 4 or word[0].isdigit() or word[-1].isdigit():
        return False
    for letter in word:
        if letter in string.punctuation:
            return False
    return True


def cleanseWords(listOfWords):
    """
    Remove any words that we don't want to keep in our word-cloud.
    :param listOfWords: List to be cleansed
    :return: Nothing
    """
    # Walk from the last index down to 0 (inclusive) so that deletions do
    # not shift the positions of words that have not been checked yet.
    for i in range(len(listOfWords) - 1, -1, -1):
        word = listOfWords[i].lower()
        if isValid(word):
            listOfWords[i] = word  # keep the lower-case form
        else:
            del listOfWords[i]


def countFrequencies(listOfWords):
    """
    Creates a dictionary consisting of word: frequency elements
    :param listOfWords: List to be processed
    :return: Dictionary of words and their frequencies
    """
    frequencies = {}
    for word in listOfWords:
        word = word.lower()
        if word in frequencies:
            frequencies[word] += 1
        else:
            frequencies[word] = 1
    return frequencies


def writeFile(dictionaryOfFrequencies):
    """
    Writes the contents to a file called "outputForWordCloud101.txt".
    Informs the user and ends the program if the file cannot be written.
    :param dictionaryOfFrequencies: Dictionary to write to file
    :return: Nothing
    """
    try:
        with open("outputForWordCloud101.txt", "w") as outFile:
            for word in dictionaryOfFrequencies:
                outFile.write(word + " " + str(dictionaryOfFrequencies[word]) + "\n")
    except OSError as e:
        print("Error: " + str(e))
        sys.exit(1)


def main():
    """
    Entry point of the program to process the words.
    :return: None
    """
    listOfWords = readFile()
    cleanseWords(listOfWords)
    dictionaryOfFrequencies = countFrequencies(listOfWords)
    writeFile(dictionaryOfFrequencies)


if __name__ == '__main__':
    main()
