A Complex Example
Hi All,
I want to share a couple of sample projects I have been using to evaluate multi-agent tools.
I've found that the samples I come across are very simple, even the ones labeled advanced.
I've used these projects to evaluate several combinations of tool + LLM + environment.
For tools, I've used AutoGPT, AutoGen, ChatDev, MetaAI, and CrewAI. To be honest, none of them was able to complete either project, even with some intervention on my part.
As for LLMs, I've used (when available) GPT-4o, Claude 3 Opus, Mixtral 8x22B, and Llama 3.
For environments, I've tried Windows x64 (with PowerShell 7), Ubuntu 22.04 (the real deal, with zsh), and WSL2 (running Ubuntu 22.04 with zsh).
Full disclosure: the best result I've gotten so far was with AutoGPT + GPT-4o + WSL2.
But I think CrewAI is the tool that offers the most potential.
So I suspect I am not setting up the agents, tools, and tasks correctly.
I would love to get some help with this exercise.
Below are the specifications of two of the projects I am using. Both are for experimentation only; they are not production code.
We wanted to test two types of scenarios:
  1. A wide project with low coding complexity (shallow) that involves many interconnected parts (frontend, backend, tests, etc.). That project was named FacePuppy.
  2. A narrow project with high coding complexity (deep) that involves heavy computation and strict performance requirements in a language that is not very "popular". That project was named SoundSpectrumRT.
These samples are very close to real projects that could be given to our dev team.
Here are their specs, which were used in all the tests across the different combinations of Tools+LLM+Env:
```
# FacePuppy
## Scope
The FacePuppy web application aims to provide a platform for users to create profiles for their pets, specifically focusing on puppies. Users will be able to register, create and manage the puppy's profile, upload the puppy's photos with an optional caption into a gallery, write blog posts about the puppy, search for the puppies of other users, view the blog posts of the puppies of other users, and, if the user is an administrator, access an admin dashboard. The web application must be mobile-responsive to cater to a wide range of devices.
## Technical
1. The project must be developed using Flutter Web for the frontend and Node.js for the backend.
2. IMPORTANT!! NEVER run any flutter process without setting the server to exit after a short timeout. Otherwise, the server will run forever, and your process will hang. Update the server to accept the timeout as an optional argument.
3. The project must be documented with a README file containing the project name, description, scope, technical details, and instructions for running the project.
4. The project must be tested thoroughly to ensure all features work as expected.
5. The project must be reviewed and refined based on feedback from stakeholders and users.
## Requirements
1. Develop a user-friendly web application for puppy owners to manage and share their puppies' profiles, photos, and blog posts.
2. Implement secure user registration and login processes using OAuth2.
3. When registering, the user must provide a name, email, and password.
4. When logging in, the user must provide an email and password.
5. Passwords should be stored hashed in the user's file, located in that user's folder.
6. Users can have up to 2 roles: User or Administrator.
7. Allow users to create and maintain puppy profiles with detailed information about the puppy, such as name, breed, age, gender, size (ranging from XXS to XL), date of birth, neutering status, and more.
8. A user can have 0, 1 or more puppies.
9. Enable users to create and maintain a photo gallery of their puppies.
10. Enable users to create and maintain blog posts about their puppies, with embedded images and a rich text editor.
11. All images used for the puppy must be kept in the gallery, including the photos used in the blog posts and the profile picture.
12. Implement a search functionality for users to find and view other puppy profiles.
13. A user can only view (not edit) the profiles and blog posts of puppies belonging to other users.
14. A user can select puppies belonging to other users to follow.
15. Develop an admin dashboard for managing all users' content and profiles.
16. The admin dashboard is only accessible to users with the role of Administrator.
17. The frontend of the application must be a web application with a clean and modern design.
    1. The main palette colors should be (but not restricted to) light blue and light green.
    2. The navigation must be intuitive and easy to use across all pages, including main features and standard pages, ensuring users can effortlessly access all parts of the application.
    3. The gallery must have a grid layout for photo display, with a lightbox feature for an enlarged view.
    4. The blog page must have a traditional blog layout.
    5. The search bar must be prominently displayed for easy access.
    6. There must be a colorful landing page for guests (users who are not logged in), rich with puppy stock photos, sections, and call-to-action buttons.
    7. The landing page for logged-in users must show a card for each of the user's puppies, the last few images added to the gallery across all of their puppies, the last few blog posts added across all of their puppies, and the latest updates of the puppies the user follows.
    8. There must be a colorful about page, also rich with puppy stock photos.
    9. There must be a traditional contact page with a form that captures user queries and sends them by email.
    10. The admin dashboard must have a clean and intuitive layout with data and visualizations.
18. All data should be stored in JSON files organized under a folder named 'data' as part of the backend.
    1. Under the 'data' folder, there should be two main subfolders: 'users' and 'puppies'.
    2. Each user should have a dedicated folder under 'users', named by their user id.
    3. Each puppy should have a dedicated folder under 'puppies', named by their puppy id.
    4. Under each individual puppy's folder, there should be two subfolders: 'gallery' for photos and 'blog' for blog posts.
    5. Directly under the 'data' folder, there should be a file named 'user_email.json' containing a mapping from user email to user id, which facilitates locating the user's folder by email.
    6. Directly under the 'data' folder, there should be a file named 'puppy_owner.json' containing a mapping from puppy id to user id, which facilitates locating the puppy's owner.
```
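Requirement 18 fully determines FacePuppy's on-disk layout, so it can be pinned down as code and used as a yardstick for what the agents generate. Below is a minimal sketch in Rust, used here only to keep one language across this post; the spec requires a Node.js backend, so treat it as logic to port. `DataLayout`, `user_file_for_email`, `can_edit`, and the per-user file name `user.json` are hypothetical: the spec names the folders and the two index files, but not the file that holds the hashed password. Letting administrators edit any puppy is my reading of requirement 15, not something the spec states.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Resolves paths inside FacePuppy's 'data' folder (requirement 18).
/// HYPOTHETICAL: the per-user file name "user.json" is not given by the spec.
struct DataLayout {
    root: PathBuf,
}

impl DataLayout {
    fn new(root: impl AsRef<Path>) -> Self {
        DataLayout { root: root.as_ref().to_path_buf() }
    }

    /// data/user_email.json: maps user email -> user id.
    fn user_email_index(&self) -> PathBuf {
        self.root.join("user_email.json")
    }

    /// data/puppy_owner.json: maps puppy id -> user id.
    fn puppy_owner_index(&self) -> PathBuf {
        self.root.join("puppy_owner.json")
    }

    /// data/users/{user_id}/user.json: the user record, including the hashed password.
    fn user_file(&self, user_id: &str) -> PathBuf {
        self.root.join("users").join(user_id).join("user.json")
    }

    /// data/puppies/{puppy_id}/gallery: all of the puppy's images (requirement 11).
    fn gallery_dir(&self, puppy_id: &str) -> PathBuf {
        self.root.join("puppies").join(puppy_id).join("gallery")
    }

    /// data/puppies/{puppy_id}/blog: the puppy's blog posts.
    fn blog_dir(&self, puppy_id: &str) -> PathBuf {
        self.root.join("puppies").join(puppy_id).join("blog")
    }
}

/// Login lookup: email -> user id (from user_email.json) -> user file.
fn user_file_for_email(
    layout: &DataLayout,
    email_index: &HashMap<String, String>,
    email: &str,
) -> Option<PathBuf> {
    email_index.get(email).map(|user_id| layout.user_file(user_id))
}

/// Edit check built on puppy_owner.json: the owner may edit (requirement 13),
/// and so may administrators (ASSUMPTION, based on requirement 15).
/// Unknown puppy ids are never editable.
fn can_edit(
    owner_index: &HashMap<String, String>,
    user_id: &str,
    is_admin: bool,
    puppy_id: &str,
) -> bool {
    match owner_index.get(puppy_id) {
        Some(owner) => is_admin || owner.as_str() == user_id,
        None => false,
    }
}

fn main() {
    let layout = DataLayout::new("data");
    let email_index = HashMap::from([("ann@example.com".to_string(), "u1".to_string())]);
    let owner_index = HashMap::from([("p7".to_string(), "u1".to_string())]);

    println!("{}", layout.user_email_index().display());
    println!("{}", layout.puppy_owner_index().display());
    if let Some(path) = user_file_for_email(&layout, &email_index, "ann@example.com") {
        println!("{}", path.display());
    }
    println!("{}", layout.gallery_dir("p7").display());
    println!("{}", layout.blog_dir("p7").display());
    println!("owner can edit p7: {}", can_edit(&owner_index, "u1", false, "p7"));
    println!("u2 can edit p7: {}", can_edit(&owner_index, "u2", false, "p7"));
}
```

Because the two index files are the only route from an email or a puppy id to a folder, every operation that creates or deletes users or puppies has to keep them in sync; that may be worth stating explicitly in the spec.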
```
# SoundSpectrumRT
## Scope
SoundSpectrumRT is a console application designed to perform real-time Fourier analysis of streaming audio data.
The project MUST be developed in Rust for high performance and reliability.
You MUST assume that all the software required for Rust development is installed in the environment.
You CAN execute build (cargo build), fix (cargo fix --allow-dirty), test (cargo test), fix tests (cargo fix --tests --allow-dirty), and code coverage (cargo tarpaulin --out HTML) to ensure the app has no errors and is thoroughly tested.
VERY IMPORTANT!! Treat all warnings as errors in both the main code and the test code (use '#![deny(warnings)]').
This is a console application and has NO front-end.
You MUST create a comprehensive set of unit tests with good assertions and good code coverage to guarantee system behavior.
## Requirements:
1. The application MUST receive, from the app arguments, the WebSocket address of an audio input stream in WAV format.
2. The application MUST connect to the WebSocket stream and start receiving audio to be processed.
3. The application MUST apply IN REAL TIME a Fast Fourier Transform (FFT) algorithm and emit the analysis results directly to a file.
4. The generated file must be named '{current_date_time}.fft', where current_date_time is in an internationally sortable format (e.g., ISO 8601) down to the second.
5. The FFT supports the following parameters, configurable via a 'settings.json' file located in the same folder as the executable:
   1. Window Size: the number of samples processed in each FFT operation. (Integer value ranging from 256 to 4096 in steps of 256.)
   2. Overlap: how much successive FFT windows overlap each other. (Integer PERCENTAGE value from 25% to 75% in steps of 5%.)
   3. Sample Rate: the number of audio samples per second. (An enum with the permitted values CD, PRO, and HIGH.) CD represents 44.1 kHz (CD quality), PRO represents 48 kHz (professional audio), and HIGH represents 96 kHz (high-resolution audio).
   4. Normalization: a boolean flag indicating whether a post-FFT normalization should be applied.
   5. Silence Time In Seconds: an integer number of seconds of continuous silence after which the application stops (default 60; 0 means it never stops).
6. If the WebSocket address received is invalid, the application finishes with ERROR.
7. If the application can't connect to the WebSocket, the application finishes with ERROR.
8. If the application can't create the output file, the application finishes with ERROR.
9. If the WebSocket connection is lost after the application has started to process audio, the application finishes with SUCCESS.
10. If, at any time after stream processing has started, the audio is silent for more than 60 seconds (configurable in the settings file), the output file is closed and the application finishes with SUCCESS.
11. IMPORTANT! For the purpose of unit testing, set the silence timeout to 5 seconds.
12. The application MUST keep processing latency below 20 milliseconds and focus on performance, which is crucial for real-time analysis.
13. The application MUST log ONLY the exceptions.
```
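The settings in requirement 5 interact with the latency target in requirement 12 and with the file name in requirement 4 in ways that are easy to miss, so here is a minimal std-only Rust sketch (no audio or WebSocket code) that validates the ranges, derives the hop size and per-window timings, and builds the output file name. `FftSettings`, `SampleRate`, `civil_from_days`, and `output_file_name` are hypothetical names, loading `settings.json` is omitted, and reading "international sortable format" as ISO 8601 basic format (`20010909T014640Z.fft`) is an assumption; I picked the basic form because `:` is not allowed in Windows file names. The date conversion follows Howard Hinnant's days-to-civil algorithm; a crate such as `chrono` would normally handle this.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Sample-rate presets from requirement 5.3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SampleRate {
    Cd,   // 44.1 kHz
    Pro,  // 48 kHz
    High, // 96 kHz
}

impl SampleRate {
    fn hz(self) -> u32 {
        match self {
            SampleRate::Cd => 44_100,
            SampleRate::Pro => 48_000,
            SampleRate::High => 96_000,
        }
    }
}

/// In-memory form of settings.json; parsing the JSON (e.g. with serde) is omitted.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FftSettings {
    window_size: usize,
    overlap_percent: usize,
    sample_rate: SampleRate,
    normalize: bool,
    silence_secs: u32, // 0 = never stop on silence
}

impl FftSettings {
    /// Enforces the ranges and step sizes of requirements 5.1 and 5.2.
    fn validate(&self) -> Result<(), String> {
        if !(256..=4096).contains(&self.window_size) || self.window_size % 256 != 0 {
            return Err(format!("window size {} not in 256..=4096 (step 256)", self.window_size));
        }
        if !(25..=75).contains(&self.overlap_percent) || self.overlap_percent % 5 != 0 {
            return Err(format!("overlap {}% not in 25..=75 (step 5)", self.overlap_percent));
        }
        Ok(())
    }

    /// Samples the window advances between consecutive FFTs (rounded down).
    fn hop_size(&self) -> usize {
        self.window_size * (100 - self.overlap_percent) / 100
    }

    /// Milliseconds of audio covered by one window.
    fn window_ms(&self) -> f64 {
        self.window_size as f64 * 1000.0 / f64::from(self.sample_rate.hz())
    }

    /// Milliseconds between consecutive FFT results: the compute budget per FFT.
    fn hop_ms(&self) -> f64 {
        self.hop_size() as f64 * 1000.0 / f64::from(self.sample_rate.hz())
    }
}

/// Days since 1970-01-01 -> (year, month, day), following Howard Hinnant's
/// days-to-civil algorithm (proleptic Gregorian calendar).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year era
    let doe = z.rem_euclid(146_097); // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month_i = if mp < 10 { mp + 3 } else { mp - 9 };
    let month = month_i as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// '{current_date_time}.fft' as UTC ISO 8601 basic format, e.g. 20010909T014640Z.fft.
/// ASSUMPTION: basic format (no ':') because ':' is not allowed in Windows file names.
fn output_file_name(unix_secs: u64) -> String {
    let (y, m, d) = civil_from_days((unix_secs / 86_400) as i64);
    let s = unix_secs % 86_400; // seconds into the UTC day
    format!("{:04}{:02}{:02}T{:02}{:02}{:02}Z.fft", y, m, d, s / 3_600, s % 3_600 / 60, s % 60)
}

fn main() {
    // Example values; a real build would load these from settings.json.
    let settings = FftSettings {
        window_size: 4096, overlap_percent: 75, sample_rate: SampleRate::Cd,
        normalize: true, silence_secs: 60,
    };
    settings.validate().expect("invalid settings.json");
    println!(
        "hop {} samples, window {:.1} ms, hop {:.1} ms, normalize {}, silence {} s",
        settings.hop_size(), settings.window_ms(), settings.hop_ms(),
        settings.normalize, settings.silence_secs
    );
    let now = SystemTime::now().duration_since(UNIX_EPOCH).expect("clock before 1970");
    println!("output file: {}", output_file_name(now.as_secs()));
}
```

With the example values it reports a 1024-sample hop (about 23.2 ms) and a 92.9 ms window. Two consequences may be worth spelling out in the spec: since one 4096-sample window at CD rate already spans more than 20 ms, the 20 ms target can presumably only mean processing time after a window's last sample arrives; and at the other extreme (256 samples, 75% overlap, HIGH) the hop is 64 samples, roughly 0.67 ms, which is how quickly each FFT must finish just to keep up with the stream.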
The requirements below were given for both projects.
```
## Required Steps:
1. Research and Requirement Gathering
   1. Add more details to the project requirements and scope.
2. Design and Prototyping
   1. Create detailed documentation of the application.
   2. Plan the tasks required for the development and testing of the application.
   3. Prototype the user interface and design elements.
   4. Review and refine the design based on user feedback.
3. Development and Testing
   1. Implement all the features according to the design and requirements.
   2. Conduct thorough testing using unit and integration tests to ensure the application functions correctly.
   3. A minimum of 80% code coverage is required for the tests.
   4. Address any bugs or issues identified during testing.
## Constraints
- IMPORTANT! You are in a {XXX} environment, set the shell commands accordingly.
- IMPORTANT! You can only act proactively but are unable to start background jobs or set up webhooks for yourself.
- IMPORTANT! You CANNOT start any long running process like a node.js server, flutter web app, dotnet server, cargo run, etc. These activities will block your terminal and you will not be able to continue with your task. So NEVER use those commands to test your application.
- Do NOT integrate the project with GitHub unless asked to do so.
- If a command or tool fails, or you are not able to fulfill a task, objective, or step, you MUST ask the user for help executing the operation.
## Resources
- You are a Senior Professional, trained on millions of pages of text, including a lot of factual knowledge. Make use of this factual knowledge to avoid unnecessary gathering of information.
- You can search the web using GOOGLE for information, but you can only use the information you find if it is necessary to complete a task or objective.
- You can ask the user for answers or help to execute tasks at any time.
## Best Practices
- DO NOT MAKE ASSUMPTIONS. If you need more information beyond the project description and the information gathered using your abilities, you MUST ask the user for it.
- Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
- Constructively self-criticize your big-picture behavior constantly.
- Reflect on past decisions and strategies to refine your approach.
- Always try to solve the tasks in a step-by-step approach.
- Write code in a clear and clean way with lots of comments to help other developers to understand clearly what is happening in it.
- Whenever you write some code, also write the corresponding unit tests to cover the new code or behavior.
```
Any advice is welcome.
I would also love to work with someone until we can make these examples work.
Cheers
Andre
Andre Vianna
AI Developer Accelerator
skool.com/ai-developer-accelerator