
Memberships

Learn Microsoft Fabric

16.4k members • Free

AI Developer Accelerator

11.2k members • Free

1 contribution to AI Developer Accelerator
RAG with Structural / Tabular Data
Hi guys, has anyone attempted structured RAG over tabular data, for example CSV files or SQLite as a basic case? I'd also welcome any best practices. Ideally the setup would handle multiple tables, be able to join them (via a custom tool), and possibly provide the table schemas to the LLM. Thanks for any input, suggestions, and ideas.
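One possible shape for the "schema to the LLM plus a custom join tool" idea, as a minimal sketch using only Python's built-in `sqlite3`. The table names, the toy rows, and the helpers `describe_schema` and `run_select` are all hypothetical, and the SELECT check is a deliberately naive guard rather than real sandboxing.

```python
import sqlite3

def describe_schema(conn):
    """Collect the CREATE TABLE statements so they can be pasted into a prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(sql for (sql,) in rows)

def run_select(conn, query):
    """Hypothetical tool body: run one SELECT and return the rows as dicts."""
    # Naive guard (assumption): accept only statements that start with SELECT.
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

# Toy tables, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE population (region_id INTEGER, year INTEGER, value INTEGER);
INSERT INTO regions VALUES (1, 'North'), (2, 'South');
INSERT INTO population VALUES (1, 2023, 100), (1, 2024, 110),
                              (2, 2023, 200), (2, 2024, 190);
""")

# Text the model would see before it writes a query.
schema_text = describe_schema(conn)

# A query the model might produce: join two tables, then aggregate.
result = run_select(conn, """
    SELECT r.name AS region, SUM(p.value) AS total
    FROM population p JOIN regions r ON r.region_id = p.region_id
    GROUP BY r.name ORDER BY r.name
""")
```

In an agent framework, `schema_text` would go into the agent's context and `run_select` would be wrapped as the custom tool; how that wrapping looks depends on the framework and is not shown here.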
0 likes • Apr '25
Hi @Brandon Hancock & @Bastian Venegas. The use case I'm working on now is this. The Czech Statistical Office has an API that gives access to its data, and I would like to create a CrewAI crew to ask questions about it. Note that some parts here are in English and some in Czech. The API description is here (in Czech): https://csu.gov.cz/zakladni-informace-pro-pouziti-api-datastatu
From this API description I took these as the most important points:
1. There is a list of datasets: https://data.csu.gov.cz/api/katalog/v1/sady
2. For each dataset there is information about it: https://data.csu.gov.cz/api/katalog/v1/sady/OBY01PD (the dataset code is at the end of the URL).
3. Each dataset has several predefined tables with their metadata: https://data.csu.gov.cz/api/katalog/v1/sady/OBY01PD/vybery
4. These tables are in JSON-stat format; here is a link to one of them: https://data.csu.gov.cz/api/dotaz/v1/data/vybery/OBY01PDT01
I can load these into pandas (or CSV) easily using pyjstat, and in general the tables are not that big. The data is in long format: there are columns for the dimensions, one column naming the metric, and a last column with the value (see picture). I think all tables are like this. Also note that the first 10 rows in the picture are totals, for the whole country in this example; after that the data is split by region.
My thought was that I can prepare all those CSVs quite easily, so that the LLM doesn't have to build these links itself (which is probably also possible, but can be error-prone). This is what I am working on right now.
What I currently struggle with is the CrewAI side: how should I provide information about this data to the model so that, based on the question, it finds the correct table? I can reduce the descriptive JSONs above to only the useful parts, such as the name and description of each dataset, and then the available tables in each dataset with their descriptions. So I would have quite a lot of descriptive JSONs and quite a lot of data CSVs.
Will all those descriptive JSONs be provided with each prompt, or should I think about some fine-tuning with them?
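One middle ground between sending every description with each prompt and fine-tuning is to shortlist a few catalog entries per question and include only those. The sketch below is just an illustration of that idea under stated assumptions: the catalog entries, table codes, and the helpers `shortlist` and `build_prompt` are hypothetical, and plain word overlap stands in for whatever matching would really be used (mixed Czech and English text would likely need something stronger, such as embeddings).

```python
import re

# Hypothetical catalog entries; real ones would be reduced from the API's
# descriptive JSON. Codes and descriptions here are invented.
CATALOG = [
    {"table": "TABLE_A", "description": "Population by region and year"},
    {"table": "TABLE_B", "description": "Average wages by industry and quarter"},
    {"table": "TABLE_C", "description": "Consumer price index by month"},
]

def tokens(text):
    """Lowercase word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shortlist(question, catalog, k=2):
    """Return up to k entries whose description shares words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(entry["description"])), entry) for entry in catalog]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]

def build_prompt(question, entries):
    """Put only the shortlisted table descriptions in front of the question."""
    lines = [f"- {e['table']}: {e['description']}" for e in entries]
    return "Candidate tables:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"

question = "What was the population of each region in 2024?"
picked = shortlist(question, CATALOG)
prompt = build_prompt(question, picked)
```

With this split, the prompt size depends on `k` rather than on the number of datasets, and the catalog can grow without retraining anything.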
0 likes • Apr '25
Hi @Mohamed Juma, and thank you. Unfortunately, working with CSVs through semantic search over chunks is not enough, because you can't run analytical queries on them that require some kind of aggregation or a combination of tables.
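To make the point concrete, here is a small sketch of the kind of question that chunk retrieval cannot answer but a query engine can. It loads a long-format CSV into an in-memory SQLite table and aggregates it. The sample rows, the column names, the `Whole country` label, and the helper `load_long_csv` are invented for illustration; only the long-table shape (dimension columns, a metric column, a value column, with total rows mixed in) follows the description earlier in the thread.

```python
import csv
import io
import sqlite3

# Invented long-format sample: two dimension columns, a metric, and a value.
LONG_CSV = """region,year,metric,value
Whole country,2024,population,300
North,2024,population,110
South,2024,population,190
Whole country,2024,births,30
North,2024,births,12
South,2024,births,18
"""

def load_long_csv(conn, table, text):
    """Create an untyped table from the CSV header and insert every data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

conn = sqlite3.connect(":memory:")
load_long_csv(conn, "facts", LONG_CSV)

# Sum per metric across regions. The total rows must be excluded,
# otherwise every figure would be counted twice.
rows = conn.execute("""
    SELECT metric, SUM(CAST(value AS INTEGER)) AS total
    FROM facts
    WHERE region <> 'Whole country'
    GROUP BY metric ORDER BY metric
""").fetchall()
```

The answer here is computed from all matching rows at once, which is why the total rows need explicit handling; a retrieved text chunk would only ever show some of the rows.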
Retko Okter
@retko-okter-7299
Just Me

Active 349d ago
Joined Mar 27, 2025