Just finished a Python project that uses Hugging Face Transformers to clean up municipal code violation data. Many of you can find this kind of data easily by googling "your city's code violation open data".
(Sorry, it gets a bit technical now.) I’m running a DistilBART zero-shot classification model locally to process thousands of code violation records. So far I think it has done a great job, and we'll probably roll it out to our larger datasets. (See screenshot)
The script iterates through raw text descriptions (complaints/code violations) and classifies them into categories like 'Absentee Owner,' 'Structural Damage,' or 'Probate', without needing a labeled training dataset. It even includes a logic layer that prioritizes 'Probate' only when probate keywords appear alongside a high confidence score, because we were getting false positives on records like "Abandoned Vehicles", "Dead cats/dogs", etc.
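For anyone curious, here is a minimal sketch of how that classify-then-check flow could look. It is not our exact script: the checkpoint name (valhalla/distilbart-mnli-12-1), the keyword list, the 0.80 threshold, and the function names are all assumptions for illustration.

```python
# Hypothetical label set, keyword list, and threshold; tune these on your own data.
LABELS = ["Absentee Owner", "Structural Damage", "Probate"]
PROBATE_KEYWORDS = ("probate", "deceased", "estate of")
PROBATE_MIN_SCORE = 0.80


def load_classifier():
    """Load a local zero-shot pipeline (the model is downloaded on first run)."""
    # Imported lazily so the logic layer below can be tested without transformers.
    from transformers import pipeline

    # Assumed checkpoint; any NLI-style zero-shot model should work here.
    return pipeline(
        "zero-shot-classification",
        model="valhalla/distilbart-mnli-12-1",
    )


def classify_violation(text, classifier, labels=LABELS):
    """Return (label, score) for one record, applying the probate check."""
    result = classifier(text, candidate_labels=list(labels))
    scores = dict(zip(result["labels"], result["scores"]))

    # Accept 'Probate' only if a keyword is present AND the model is confident.
    has_keyword = any(kw in text.lower() for kw in PROBATE_KEYWORDS)
    probate_score = scores.get("Probate", 0.0)
    if has_keyword and probate_score >= PROBATE_MIN_SCORE:
        return "Probate", probate_score

    # Otherwise fall back to the best-scoring non-probate category.
    others = [(label, s) for label, s in scores.items() if label != "Probate"]
    if not others:
        return "Unclassified", 0.0
    return max(others, key=lambda pair: pair[1])
```

To run it over real records, call load_classifier() once and pass the result to classify_violation for each description. Keeping the keyword check separate from the model call makes it easy to tweak the rules without re-running inference.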
It’s cool how accessible NLP (Natural Language Processing) has become for solving niche business problems like identifying distressed property inventory! The best part is that we don't spend $ on API calls to OpenAI/Gemini, since everything runs locally.