In the spirit of a particular Facebook group I'm in, "Breaking Into Tech", I figured I'd refactor my project to structure the output at each step of the information extraction.
The main way of structuring the output is with Pydantic. Major kudos, also, to a friend who had a work problem they were trying to solve!
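A minimal sketch of what a Pydantic schema for this kind of output could look like. The class name `SkillsAndTech`, its two fields, and the `parse_extraction` helper are assumptions for illustration, not the project's actual code. It assumes the LLM replies with a JSON string.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


class SkillsAndTech(BaseModel):
    """Hypothetical schema for what gets extracted from one job description."""

    skills: List[str] = Field(default_factory=list, description="Skills mentioned")
    technologies: List[str] = Field(default_factory=list, description="Technologies mentioned")


def parse_extraction(raw_json: str) -> SkillsAndTech:
    """Validate a raw JSON string (e.g. an LLM reply) against the schema."""
    return SkillsAndTech(**json.loads(raw_json))


# A well-formed reply parses into typed fields.
raw = '{"skills": ["communication"], "technologies": ["AWS", "Python"]}'
parsed = parse_extraction(raw)
print(parsed.technologies)

# A malformed reply (number where a list is expected) is rejected.
try:
    parse_extraction('{"skills": 42, "technologies": []}')
except ValidationError:
    print("rejected malformed output")
```

The point of the schema is that a badly shaped reply fails loudly at the step that produced it instead of quietly polluting the later steps.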
How it works:
1. Download the scraped data as a CSV file.
2. Extract "Skills and Technologies" from the 'description' field of the DataFrame and store them in a text file.
3. Combine all of the extracted text.
4. Create JSON output of the top 5 skills and technologies mentioned.
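The last two steps (combine, then top 5 as JSON) can be sketched with the standard library alone. This is only an illustration under assumptions: the `top_terms` function, the `name` and `mentions` keys, and the sample data are made up, and in the actual project this ranking may be done by the LLM rather than by counting.

```python
import json
from collections import Counter
from typing import Dict, List


def top_terms(extractions: List[List[str]], n: int = 5) -> List[Dict[str, object]]:
    """Combine per-posting term lists and return the n most mentioned terms."""
    counts = Counter()
    for terms in extractions:
        # Count each term at most once per posting, ignoring case and padding.
        seen = {t.strip().lower() for t in terms if t.strip()}
        counts.update(seen)
    # Sort by mentions (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"name": name, "mentions": c} for name, c in ranked[:n]]


# Made-up extraction results, one list per job posting.
per_posting = [
    ["Python", "AWS", "SQL"],
    ["python", "Docker"],
    ["AWS", "Python ", "Kubernetes"],
]

top5 = top_terms(per_posting)
print(json.dumps(top5, indent=2))
```

Lowercasing makes "Python" and "python" count as one term, at the cost of losing the original casing in the output.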
So, I tried it, and the results aren't much better than what I was getting before. For the next iteration, I'm going to look into adding some kind of classification step for each skill or technology. Skills and tech can apparently be the same thing, so the output includes things like "AWS" under both skills and tech.
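One possible shape for that classification step is a simple rule-based pass that puts every term in exactly one bucket. This is a sketch, not something the project does yet: `KNOWN_TECHNOLOGIES` is a hypothetical hand-maintained list, and `classify_terms` is an assumed helper name. An LLM-based classifier would be another option.

```python
from typing import Dict, Iterable, List

# Hypothetical, hand-maintained list; a real one would be much longer.
KNOWN_TECHNOLOGIES = {"aws", "docker", "kubernetes", "python", "sql"}


def classify_terms(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Assign each distinct term to exactly one of the two categories."""
    result: Dict[str, List[str]] = {"skills": [], "technologies": []}
    seen = set()
    for term in terms:
        key = term.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and repeats
        seen.add(key)
        bucket = "technologies" if key in KNOWN_TECHNOLOGIES else "skills"
        result[bucket].append(term.strip())
    return result


# "AWS" shows up twice in the combined text but lands in one bucket, once.
combined = ["AWS", "communication", "AWS", "Python", "project management"]
classified = classify_terms(combined)
print(classified)
```

Anything not on the list falls through to skills, so the list has to be kept current for this to be useful.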
Here are other links that I found useful along the way:
Learning to use Pydantic with PydanticOutputParser
Example of a Pydantic class: <--- Inspiration
Learning to use the LLMTextCompletionProgram (Pydantic output parser as input)