We know that LLMs are stateless. However, an AI often needs context, sometimes from information already discussed earlier in the chat. So when building an app that interacts with an AI, it is recommended to keep track of the conversation and include some historical context in each prompt. This raises the following questions:
- How do we decide how much context is adequate?
- Should the prompting program filter the conversation and include only the relevant parts in the prompt?
- How does all this additional historical/contextual information affect the token cost of the prompt?
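To make the question concrete, here is a minimal sketch of one common approach: a sliding window that keeps only the newest messages fitting a token budget. The token count here is a rough word-based approximation (an assumption for illustration; a real tokenizer such as tiktoken would be more accurate), and the `trim_history` helper and message format are hypothetical.

```python
# Assumption: ~1.3 tokens per whitespace-separated word, a crude stand-in
# for a real tokenizer.
def approx_tokens(text: str) -> int:
    return max(1, round(len(text.split()) * 1.3))

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages whose combined token estimate fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk newest-to-oldest
        cost = approx_tokens(msg["content"])
        if total + cost > budget:
            break                       # budget exhausted; drop older messages
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "content": "What is a vector database?"},
    {"role": "assistant", "content": "A store optimized for similarity search."},
    {"role": "user", "content": "How do embeddings relate to it?"},
]
print(len(trim_history(history, budget=15)))
```

Even a simple cutoff like this bounds per-prompt token cost, but it can silently drop the earlier turns the model actually needs, which is exactly the trade-off I'm asking about.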
Has anyone run into escalating costs because of this? If so, please share how you managed it.