Gemini image generation is inconsistent: has anyone dealt with this before?
Hey everyone, I'm looking for some expert advice on a project I'm building.

It's a wheel visualizer: users upload a photo of their car, pick a wheel from a catalogue, and Google Gemini generates what the car would look like with those wheels fitted. The idea is for car shops to embed it on their website so customers can visualise wheels before buying.

The stack: Node.js + Express backend, vanilla JS frontend, single index.html. No framework, no database. Six wheel images are stored on the server. Both the car photo and the wheel reference get resized with Sharp, base64-encoded, and sent to Gemini as inline image parts in a single prompt. The model is gemini-2.5-flash-image, with up to 3 retries if no image comes back.

The problem: generation is inconsistent. Sometimes it works great; other times it fails, or the wheels in the output don't match the reference image at all, even with identical inputs. I'm pretty sure this isn't a hosting issue.

I think these are the core issues (Claude Code pointed them out):
- Gemini interprets rather than copies, so the wheel reference isn't being replicated accurately
- Both images go in as raw base64 blobs with no clear instruction distinguishing which is the reference and which is the target
- The wheel images are standalone product shots, so the model has to guess scale, perspective and angle
- No seed or temperature is set, so the output is different every time

Has anyone dealt with inconsistent image generation with Gemini specifically (or any other model)? Is there a better way to structure the prompt or payload so the model reliably uses the reference wheel? Would a different approach work better here altogether?

Happy to share server.js if anyone wants to look at it properly.
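One way to act on the second and fourth points is to put a text part in front of each image saying what it is, and to pin the sampling settings. Below is a minimal, dependency-free sketch (not the actual server.js). It assumes the public Gemini REST `generateContent` request/response shape (`contents[].parts[]` with `inlineData`, `generationConfig`, `candidates[].finishReason`). The names `buildRequestBody`, `extractImage`, `generateWithRetry`, `callGemini`, the `GEMINI_API_KEY` environment variable and the prompt wording are all hypothetical. It also assumes, without having verified it, that gemini-2.5-flash-image honours `temperature` and `seed`. Even if it does, that probably narrows the variation rather than making output reproducible.

```javascript
// Sketch only: label each image with a text part, request an image back,
// and retry only when the response contains no image part.
// Assumption: gemini-2.5-flash-image respects temperature/seed (unverified).

function buildRequestBody(carBase64, wheelBase64, options = {}) {
  const { mimeType = "image/jpeg", temperature = 0.2, seed } = options;
  const generationConfig = {
    responseModalities: ["TEXT", "IMAGE"], // ask for an image in the reply
    temperature,
  };
  // Only send a seed when one is given; it may reduce variation, not remove it.
  if (seed !== undefined) generationConfig.seed = seed;

  return {
    contents: [
      {
        role: "user",
        parts: [
          // Hypothetical prompt wording: each image is introduced before it appears.
          { text: "IMAGE 1 is the TARGET: a photo of the customer's car. Keep the car, paint, background, camera angle and lighting unchanged." },
          { inlineData: { mimeType, data: carBase64 } },
          { text: "IMAGE 2 is the REFERENCE: a product shot of one wheel. Copy its spoke design, finish and colour exactly." },
          { inlineData: { mimeType, data: wheelBase64 } },
          { text: "Edit IMAGE 1 so every visible wheel is replaced by the wheel from IMAGE 2, matched to the perspective and size of the original wheels. Return one edited image." },
        ],
      },
    ],
    generationConfig,
  };
}

// Return the first inline image part of the first candidate, or null.
function extractImage(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((p) => p.inlineData && p.inlineData.data);
  return imagePart ? imagePart.inlineData : null;
}

// Call the model up to maxAttempts times until a response contains an image.
async function generateWithRetry(callModel, body, maxAttempts = 3) {
  let lastResponse = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastResponse = await callModel(body);
    const image = extractImage(lastResponse);
    if (image) return { image, attempts: attempt };
  }
  // Report finishReason so a safety block can be told apart from a text-only reply.
  const reason = lastResponse?.candidates?.[0]?.finishReason ?? "unknown";
  throw new Error(`No image after ${maxAttempts} attempts (finishReason: ${reason})`);
}

// Hypothetical transport using Node 18+ global fetch and the REST endpoint.
async function callGemini(body, model = "gemini-2.5-flash-image") {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": process.env.GEMINI_API_KEY, // assumed env var name
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Gemini HTTP ${res.status}: ${await res.text()}`);
  return res.json();
}
```

Usage would look something like `generateWithRetry(callGemini, buildRequestBody(carB64, wheelB64, { seed: 42 }))`. Logging `finishReason` on failure is a cheap way to find out whether the "fails" cases are the model refusing, or just answering with text instead of an image.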