Jul '24 • CrewAI
How to leverage CrewAI for scraping?
Hey everyone, I'm working on a really cool project that I shared with some of you on yesterday's call, and I got some really useful input from that call. Thank you, everyone! Now I want to scrape some data from Google.
A quick reminder about the project that I am working on:
So, we're building a platform that makes it easier for students to choose the right university based on different factors, and for that we need a lot of data and a lot of automation. There are about 5,400 universities in the US, and so far I've collected basic data (such as names and aliases) for 1,700 of them. What I want to do now is build a scraper that takes each university's name and searches for the following query: "What programs does {name} offer?" The results show all of the programs that university offers, and I want that data!
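The first part of this (take a name, fill in the query) could look roughly like the Python sketch below. It is a minimal illustration, not a finished scraper: `build_query` and `google_search_url` are hypothetical helper names, the sample university list stands in for the 1,700 collected names, and the URL uses only Google's standard `q=` parameter. Fetching Google result pages directly is often blocked or rate-limited, which may be part of why the target URL is hard to get.

```python
from urllib.parse import urlencode

# Query template from the post; "{name}" is replaced with each university name.
QUERY_TEMPLATE = "What programs does {name} offer?"


def build_query(name: str) -> str:
    """Fill the query template with one university name (surrounding spaces removed)."""
    return QUERY_TEMPLATE.format(name=name.strip())


def google_search_url(query: str) -> str:
    """Build a plain Google search URL; urlencode escapes spaces and '?' in the query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical sample standing in for the ~1,700 names already collected.
    universities = ["Stanford University", "University of Michigan"]
    for name in universities:
        query = build_query(name)
        print(query, "->", google_search_url(query))
```

Keeping the query template in one constant makes it easy to try alternative phrasings without touching the loop.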
Here's the flow as I understand it:
  • Get the name
  • Search for the query
  • Get the URL of the relevant web page from the search results.
  • Use bs4 (BeautifulSoup) to parse the HTML. From there it's easy, because the same <div> class names are used on the pages for every university.
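The bs4 step in the flow above might be sketched like this, assuming the program names sit in `<div>` elements that share one class name. The class `"program-name"` and the function `extract_programs` are hypothetical placeholders; the real class name has to be read off one of the target pages in the browser's inspector.

```python
from bs4 import BeautifulSoup


def extract_programs(html: str, div_class: str = "program-name") -> list:
    """Return the text of every <div> carrying the shared class name.

    "program-name" is a hypothetical placeholder; replace it with the class
    actually used on the target pages.
    """
    soup = BeautifulSoup(html, "html.parser")
    programs = []
    for div in soup.find_all("div", class_=div_class):
        text = div.get_text(strip=True)
        # Skip empty divs and programs that are listed more than once.
        if text and text not in programs:
            programs.append(text)
    return programs


if __name__ == "__main__":
    sample_html = (
        '<div class="program-name">Computer Science</div>'
        '<div class="program-name"> Biology </div>'
        '<div class="sidebar">Apply now</div>'
    )
    print(extract_programs(sample_html))
```

Passing the class name as a parameter keeps the function reusable if a few universities turn out to use different markup.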
I've also attached a screenshot so you can see exactly what I'm talking about.
My questions:
  • How can I use CrewAI to do this for me?
  • Should I use CrewAI just to get the URL of the web page, or should I have it extract all of the programs too?
  • If I use CrewAI, should I use OpenAI or an open-source LLM (maybe a 7B model) to reduce costs?
If any of you could give me a roadmap, that would be amazing, because right now I have to use Selenium just to get the target URL. That's my main concern at the moment: the target URL is hard to get.
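Because the target URL is the sticking point, one alternative to driving a browser with Selenium is a search API that returns results as JSON. The sketch below assumes Google's Custom Search JSON API, which needs an API key and a Programmable Search Engine ID (`cx`). The helpers `search` and `pick_target_url`, and the rule "take the first `.edu` link", are hypothetical heuristics rather than anything from the post. CrewAI's `crewai_tools` package also ships search tools such as `SerperDevTool` that wrap a similar search API, so the same idea could sit behind a CrewAI agent.

```python
import json
from typing import Optional
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def search(query: str, api_key: str, engine_id: str, num: int = 5) -> dict:
    """Call the Custom Search JSON API and return the decoded JSON response."""
    params = urlencode({"key": api_key, "cx": engine_id, "q": query, "num": num})
    with urlopen(f"{API_ENDPOINT}?{params}", timeout=10) as resp:
        return json.load(resp)


def pick_target_url(response: dict) -> Optional[str]:
    """Return the first result link hosted on a .edu domain, or None.

    Preferring .edu hosts is an assumed heuristic: it skips ranking and
    aggregator sites so the scraper lands on the university's own pages.
    """
    for item in response.get("items", []):
        link = item.get("link", "")
        host = urlparse(link).hostname or ""
        if host.endswith(".edu"):
            return link
    return None
```

Usage would look like `pick_target_url(search("What programs does Stanford University offer?", API_KEY, ENGINE_ID))`, with the returned link then fetched and parsed with bs4. Keeping `pick_target_url` separate from the network call makes it easy to test the selection rule on saved responses.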
Afaq Khan
AI Developer Accelerator
skool.com/ai-developer-accelerator