GenAI: Reducing costs by up to 85% with routing and moderation using Llama 3.1 405B
In this project I will create a Python class that uses filtering to reduce costs. On top of that, I will add a moderation layer using semantic router.

GitHub Repository

Description

As explained before, we want to reduce the cost of our AI generation, and to do that we will use filtering. But what exactly is this? When a GenAI application is in production, a common situation arises: we are using a mighty LLM, and that costs us money. If the user asks a simple question, in other words one that does not require so much power, we might prefer to use a smaller, and therefore cheaper, LLM to answer....
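To make the filtering idea concrete, here is a minimal sketch of a router that sends simple questions to a cheaper model and everything else to the large one. It is not the project's actual class: the model labels `small-llm` and `large-llm`, the `CostAwareRouter` name, and the word-count heuristic are all assumptions made for illustration. In particular, the heuristic is only a stand-in for whatever classifier decides that a question is "simple".

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model labels, used only for illustration.
SMALL_MODEL = "small-llm"
LARGE_MODEL = "large-llm"


def short_question_heuristic(question: str, max_words: int = 12) -> bool:
    """Placeholder filter (an assumption, not the project's method):
    a question counts as simple if it is short and contains no
    keyword that hints at multi-step reasoning."""
    hard_markers = ("explain", "prove", "analyze", "compare", "step by step")
    text = question.lower()
    is_short = len(text.split()) <= max_words
    has_hard_marker = any(marker in text for marker in hard_markers)
    return is_short and not has_hard_marker


@dataclass
class CostAwareRouter:
    """Picks a model label for each question based on a pluggable filter."""
    is_simple: Callable[[str], bool] = short_question_heuristic
    small_model: str = SMALL_MODEL
    large_model: str = LARGE_MODEL

    def route(self, question: str) -> str:
        # Simple questions go to the cheaper model, the rest to the large one.
        return self.small_model if self.is_simple(question) else self.large_model


router = CostAwareRouter()
easy_choice = router.route("What is the capital of France?")
hard_choice = router.route("Compare quicksort and mergesort and prove their bounds.")
```

Because the filter is passed in as a plain function, the placeholder heuristic can be swapped for a stronger classifier without touching the routing logic. The savings depend entirely on how many questions the filter can safely send to the smaller model.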