Retko Okter

AI Developer Accelerator

1 contribution to AI Developer Accelerator

Retko Okter

Mar '25 •

CrewAI

RAG with Structural / Tabular Data

Hi guys, has anyone attempted structured RAG over tabular data? The data could be in CSV or SQLite as a basic example. Any best practices would also help. Ideally there would be several tables that can be joined together (via a custom tool), and the table schemas would be provided to the LLM. Thanks for any input, suggestions, and ideas.
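To make the idea above concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. The two tables, their columns, and the helper names (`build_demo_db`, `describe_schema`, `run_readonly_query`, `build_prompt`) are made up for illustration and are not part of CrewAI or any other library. In a real setup the query function would be wrapped as a custom tool and the SQL would be written by the LLM.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE regions (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE population (
            region_id INTEGER REFERENCES regions(region_id),
            year INTEGER NOT NULL,
            inhabitants INTEGER NOT NULL
        );
        INSERT INTO regions VALUES (1, 'North'), (2, 'South');
        INSERT INTO population VALUES
            (1, 2023, 100), (1, 2024, 110), (2, 2023, 200), (2, 2024, 190);
        """
    )
    return conn


def describe_schema(conn):
    """Return the CREATE TABLE statements, to be pasted into the LLM prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def run_readonly_query(conn, sql):
    """Body of a hypothetical custom tool: run one SELECT and return the rows."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


def build_prompt(schema, question):
    """Combine the schema and the user question into one prompt string."""
    return (
        "You can query a SQLite database with this schema:\n\n"
        f"{schema}\n\n"
        f"Write one SELECT statement that answers: {question}"
    )


conn = build_demo_db()
prompt = build_prompt(describe_schema(conn), "Population per region in 2024?")

# SQL that a model might return for the question above (written by hand here).
sql = (
    "SELECT r.name, SUM(p.inhabitants) FROM population p "
    "JOIN regions r ON r.region_id = p.region_id "
    "WHERE p.year = 2024 GROUP BY r.name ORDER BY r.name"
)
result = run_readonly_query(conn, sql)
```

The `SELECT`-only check is a very rough guard, not a security boundary; opening the database in read-only mode would be a safer choice for real data.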

New comment Apr '25

Retko Okter

0 likes • Apr '25

Hi @Brandon Hancock & @Bastian Venegas. Here is the use case I am working on now. The Czech Statistical Office has an API that gives access to its data, and I would like to build a CrewAI crew that answers questions about it. Note that some parts are in English and some in Czech.

The API description is here (in Czech): https://csu.gov.cz/zakladni-informace-pro-pouziti-api-datastatu

From that description, these are the most important parts:

1. There is a list of datasets: https://data.csu.gov.cz/api/katalog/v1/sady
2. Each dataset has its own information page: https://data.csu.gov.cz/api/katalog/v1/sady/OBY01PD (the dataset code is at the end of the URL).
3. Each dataset has several predefined tables with metadata: https://data.csu.gov.cz/api/katalog/v1/sady/OBY01PD/vybery
4. The tables are in JSON-stat format; here is a link to one of them: https://data.csu.gov.cz/api/dotaz/v1/data/vybery/OBY01PDT01

I can load these into pandas (or export them to CSV) easily using pyjstat, and in general the tables are not that big. The data come as a LONG table: there are columns for the dimensions, one column naming the metric, and a last column with the value (see picture). I think all tables look like this. Also note that the first 10 rows in the picture are totals, for the whole country in this example; after that the data are split by region.

My thought was to prepare all those CSVs up front, which is quite easy, so that the LLM does not have to build these links itself (probably also possible, but error-prone). This is what I am working on right now.

What I currently struggle with is the CrewAI side: how should I provide information about this data to the model so that, based on the question, it finds the correct table? I can reduce the descriptive JSONs above to only the useful parts, such as the name and description of each dataset, plus the available tables in each dataset and their descriptions. So I would have quite a few descriptive JSONs and quite a few data CSVs.

Should all those descriptive JSONs be provided with each prompt, or should I think about fine-tuning on them?
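One option in between those two, sketched here under assumptions, is to keep a compact catalog of table descriptions and put only the few entries that match the question into the prompt. The catalog entries, the table codes (`DEMO_...`), and the function names below are hypothetical and not taken from the API. The word-overlap score is a simple stand-in for embedding search or for letting the LLM pick from the catalog.

```python
import re

# Hypothetical catalog: one reduced entry per predefined table.
CATALOG = [
    {"table": "DEMO_POP_T01", "dataset": "Population",
     "description": "Population by region, sex and year"},
    {"table": "DEMO_WAGE_T01", "dataset": "Wages",
     "description": "Average monthly wage by region and year"},
    {"table": "DEMO_AGRI_T01", "dataset": "Agriculture",
     "description": "Crop harvest by crop type and year"},
]


def tokens(text):
    """Lowercase word set; \\w also matches letters with diacritics."""
    return set(re.findall(r"\w+", text.lower()))


def shortlist(question, catalog, top_k=2):
    """Return up to top_k catalog entries sharing the most words with the question."""
    question_words = tokens(question)
    scored = []
    for entry in catalog:
        entry_words = tokens(entry["dataset"] + " " + entry["description"])
        overlap = len(question_words & entry_words)
        if overlap:
            scored.append((overlap, entry))
    # Sort by score only, so ties keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def build_prompt(question, candidates):
    """Put only the shortlisted table descriptions into the prompt."""
    lines = [f"- {c['table']}: {c['dataset']} / {c['description']}" for c in candidates]
    return (
        "Pick the table that best answers the question.\n"
        "Candidate tables:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
    )


question = "What was the average wage in each region?"
candidates = shortlist(question, CATALOG)
prompt = build_prompt(question, candidates)
```

With this kind of pre-filter the prompt grows with `top_k`, not with the number of datasets, so the full set of descriptive JSONs never has to be sent at once.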

Retko Okter

0 likes • Apr '25

Hi @Mohamed Juma, and thank you. Unfortunately, semantic search over chunked CSV is not enough, because you cannot run analytical queries on it, that is, queries that need aggregations or combinations of tables.
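As a small illustration of the kind of query meant here, the sketch below sums a LONG table by year with pandas. The column names, the `"Total"` label, and all numbers are invented for the example. Retrieving text chunks of the CSV would only return rows, while the answer has to be computed, and the total rows must be excluded so they are not counted twice.

```python
import pandas as pd

# Made-up LONG table: dimension columns, a metric column, and a value column.
long_df = pd.DataFrame(
    {
        "region": ["Total", "Total", "North", "North", "South", "South"],
        "year": [2023, 2024, 2023, 2024, 2023, 2024],
        "metric": ["population"] * 6,
        "value": [300, 300, 100, 110, 200, 190],
    }
)


def regional_sum_by_year(df, metric, total_label="Total"):
    """Sum one metric over regions per year, skipping the country-level total rows."""
    rows = df[(df["metric"] == metric) & (df["region"] != total_label)]
    return rows.groupby("year")["value"].sum()


sums = regional_sum_by_year(long_df, "population")
```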

Retko Okter

@retko-okter-7299

Joined Mar 27, 2025
