You don't realize this until it's too late.
Here's the most annoying thing I had to deal with when it comes to n8n: scraping data. And I'm not talking about 10 items or even 100. I'm talking about scraping 33,000 zip codes in the US. When I first received this project, I thought, easy peasy: use Apify, connect it to n8n, and start scraping, right? How wrong I was.

To set the scene: I built an entire system of 20 flows to scrape the data, clean it, process it, and deliver it to the client by email. And it worked while I was scraping at low volume. But once we increased it? Well, take a look at this.

If you didn't know, each node in n8n holds all of its data in memory until the entire flow is done. So when I brought in 1,000 rows of zip codes, every single node held 1,000 items in memory. And it got worse: each zip code could return anywhere from 10 to 1,000 results. We weren't holding back at this point either, because we were scraping every detail of every single business. At times the first flow was holding 100,000 items in every single node, so much data that my client's entire n8n instance would tank.

Not knowing any better, I figured, well, let's lower the volume until it works. We kept lowering it, and I redesigned the entire system, until we could run 100 items per run. Which worked fine, until he decided he wanted to add another scraper, this time for Instagram. I'll spare you the details on that one, but long story short: running two massive scrapers was not a good idea, and we could only run one full system at a time.

After some deep digging into the issue and how to fix it, I realized that n8n is simply not built for large-scale scraping. Even upgrading to the best server available would have made no difference, and queue mode with Redis would not have helped either. I mean, I had already built my own queuing system inside his cloud.
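The core problem above, every node retaining every item until the whole flow finishes, goes away in plain code if you process zip codes in fixed-size batches and flush each batch's results to disk before moving on. Here's a minimal sketch of that idea; `fetch_businesses` is a hypothetical stand-in for whatever scraper or API you'd actually call:

```python
import csv

def fetch_businesses(zip_code):
    # Hypothetical stand-in for the real scraper/API call.
    # In the real project, a single zip code returned anywhere
    # from 10 to 1,000 business records.
    return [{"zip": zip_code, "name": f"Business {i}"} for i in range(10)]

def scrape_in_batches(zip_codes, out_path, batch_size=100):
    """Process zip codes in fixed-size batches, writing each batch's
    results to disk immediately, so memory stays bounded at one batch
    instead of the entire 33,000-zip run."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["zip", "name"])
        writer.writeheader()
        for start in range(0, len(zip_codes), batch_size):
            batch = zip_codes[start:start + batch_size]
            for zc in batch:
                writer.writerows(fetch_businesses(zc))
            # Everything from this batch is now on disk;
            # nothing accumulates across batches.

scrape_in_batches([f"{i:05d}" for i in range(1000)], "businesses.csv")
```

Unlike an n8n flow, the peak memory here is one batch of results, no matter how many zip codes you feed in.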
The lesson I learned: use n8n for simple things, because it was not meant to handle large amounts of data. Think of n8n like Zapier or Airtable: you wouldn't try to scrape data with Zapier. The better option is code, something like Python, for example.
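To make "just use Python" concrete, here's a minimal sketch of what a 33,000-zip-code run can look like in code: a resumable loop that appends results as it goes and logs each completed zip code, so a crashed run picks up where it left off instead of tanking the whole system. `scrape_one` is a hypothetical placeholder for the actual scraping call:

```python
import json
import os

def scrape_one(zip_code):
    # Hypothetical placeholder for the real scraping call.
    return [{"zip": zip_code}]

def run(zip_codes, results_path="results.jsonl", done_path="done.txt"):
    """Resumable scrape: append results as JSON lines and log each
    finished zip code, so re-running skips completed work."""
    done = set()
    if os.path.exists(done_path):
        with open(done_path) as f:
            done = {line.strip() for line in f}
    with open(results_path, "a") as out, open(done_path, "a") as log:
        for zc in zip_codes:
            if zc in done:
                continue
            for row in scrape_one(zc):
                out.write(json.dumps(row) + "\n")
            log.write(zc + "\n")
            log.flush()  # progress survives a crash or restart

run(["90210", "10001", "60601"])
run(["90210", "10001", "60601"])  # second run skips everything already done
```

None of this needs a Redis queue or a bigger server; the results live on disk, not in the memory of every step of a flow.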