How I effectively integrated LLM capability into my App without any prompt engineering

As a full-stack application developer (among other things), I knew there had to be a better way to use LLMs to add value to my users’ experience with my App. Users of modern applications can benefit immensely from built-in LLM capabilities such as translations, real-time sentiment analysis, and recommendations based on historical transactions.
As I navigated this relatively new and evolving field of incorporating LLMs into applications, I quickly realized how incredibly steep the learning curve is to work with LLMs effectively. Asking ChatGPT casual (and sometimes meaningless) questions is one thing; methodically crafting prompts as an application developer and working with LLM provider APIs, for example, OpenAI’s API for their GPT-3.5-Turbo model, is quite another.
Eventually, I did learn various ways to write prompts, and through a lot of hands-on trial and error, I now understand that there is a method to this madness! This method (or madness) is called “Prompt Engineering”. With that, I sincerely thank the folks at OpenAI for developing the chat interface application (i.e., ChatGPT), without which I doubt Generative AI would have received such hype!
Purpose (The “Why”)
My purpose with this blog post is to share the approach I took to simplify my future application development with Generative AI, and to offer it to the countless developers who want to use LLMs effectively in their applications.
Prompting techniques such as zero-shot and few-shot are popular among everyday users of chat applications built on top of such LLMs. That said, even when the quality of a response looks excellent, how much can we trust it? How do we know the model isn’t making things up (a.k.a. hallucinating) on the fly?
As a result, grounding LLMs by providing contextual data, in combination with proper prompting techniques, is very important. Prompts that include grounded information as context help the LLM generate better responses.
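For illustration, a grounded prompt can be as simple as prepending the retrieved context to the user’s question before sending it to the model. Below is a minimal sketch; the buildGroundedPrompt helper is hypothetical and not part of any library.
/**
A minimal sketch of a grounded prompt (buildGroundedPrompt is a hypothetical helper)
*/
function buildGroundedPrompt(context, question) {
  return "Answer the question using ONLY the context below. " +
    "If the context is insufficient, say that you do not know.\n\n" +
    "Context:\n" + context + "\n\n" +
    "Question: " + question;
}
// Example: grounding a question about store hours in known data
const prompt = buildGroundedPrompt(
  "Store hours: Mon-Fri 9am-6pm; Sat 10am-4pm; closed Sun.",
  "Is the store open on Sunday?"
);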
One such approach is Retrieval-Augmented Generation (RAG), which relies on storing and searching text embeddings that are provided to the LLM along with the prompt. However, RAG relies on static information converted into text embeddings and stored in vector databases, so that relevant information can be retrieved and used to ground the prompt when generating a response.
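In rough pseudocode, a typical RAG flow looks like the sketch below, reusing the buildGroundedPrompt sketch from above. The embedText, vectorDb, and callLLM names are hypothetical placeholders for your embedding model, vector-database client, and LLM call.
/**
A conceptual sketch of the RAG flow (embedText, vectorDb, and callLLM are hypothetical placeholders)
*/
async function answerWithRAG(question) {
  const queryVector = await embedText(question);                   // 1. embed the user's question
  const matches = await vectorDb.similaritySearch(queryVector, 3); // 2. retrieve the closest stored embeddings
  const context = matches.map((m) => m.text).join("\n");           // 3. assemble the grounding context
  return callLLM(buildGroundedPrompt(context, question));          // 4. generate a grounded response
}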
The RAG pattern raises the question of whether real-time data from sources such as APIs, rather than pre-computed text embeddings in vector databases, can also produce compelling and reliable responses. The simple answer is, “Of course, yes!”. But that means additional responsibilities for application developers: identifying and calling the appropriate API endpoints, coordinating them with well-engineered prompts that generate reliable responses, and handling application errors swiftly without compromising the user’s experience.
AI-Dapter (The “What”)
That’s where AI-Dapter (or AI Adapter) comes into play. AI-Dapter was developed as an open-source project to accelerate LLM-based application development, allowing developers to focus on their applications. AI-Dapter handles the burden of the following activities on behalf of the application developer:
- identifying the right API endpoints from a pre-defined API repository,
- acquiring real-time data from the identified API endpoints, and
- generating a response using the LLM model of choice.
Suppose an application is expected to respond to its users’ questions in plain language. In that case, AI-Dapter provides an intelligent API determination feature that also fills in parts of the API endpoint with appropriate values based on the context. For example, if the API endpoint were https://worldtimeapi.org/api/timezone/|area_location| and the user asked about the current time in Mumbai, AI-Dapter’s intelligent API determination would rewrite the endpoint as https://worldtimeapi.org/api/timezone/Asia/Kolkata.
Further, AI-Dapter can call the above API and obtain the response. This response is then passed to the LLM-generation feature, which uses the data to ground the context of the user’s question and generate a reliable answer. The idea with grounding, again, is to prevent the LLM’s hallucination effects.
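To make the substitution mechanics concrete, the snippet below replays the worldtimeapi example by hand. It only illustrates the string replacement; the actual mapping from “Mumbai” to a timezone value is done by AI-Dapter through the LLM and is not shown here.
/**
Replaying the placeholder substitution from the example above (manual sketch, not AI-Dapter's internals)
*/
const urlTemplate = "https://worldtimeapi.org/api/timezone/|area_location|";
// Suppose the LLM resolves "Mumbai" to the IANA timezone "Asia/Kolkata" per the validation criteria
const resolvedUrl = urlTemplate.replace("|area_location|", "Asia/Kolkata");
console.log(resolvedUrl); // https://worldtimeapi.org/api/timezone/Asia/Kolkata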
A live use case is available to try on AI-Dapter’s homepage (scroll down to the “AI-Dapter in action!” section).
Usage (The “How”)
Start by installing the AI-Dapter module in your NodeJS project from the command line.
npm install ai-dapter --save
Prepare a repository of API endpoints whose real-time data will be used to generate grounded responses.
/**
API Repository must follow this structure
*/
const apiRepository = [
  {
    "api_info": {
      "title": "Current time",
      "description": "Identify the appropriate Timezone area and location for a given location and get time at that location."
    },
    "api_endpoint": {
      "method": "GET",
      "url": "https://worldtimeapi.org/api/timezone/|area_location|",
      "headers": {
        "Content-Type": "application/json"
      }
    },
    "placeholders": [
      {
        "placeholder": "|area_location|",
        "validation_criteria": "An example of area_location is: America/New_York or Europe/London. Based on the valid location provided, determine appropriate area_location.",
        "default": "America/New_York"
      }
    ]
  },
  {
    "api_info": {
      "title": "Tesla charging stations",
      "description": "Get EV charging locations at a given address. Information provided by National Renewable Energy Laboratory (NREL)."
    },
    "api_endpoint": {
      "method": "GET",
      "url": "https://developer.nrel.gov/api/alt-fuel-stations/v1/nearest.json?fuel_type=ELEC&ev_network=Tesla&status=E&access=public&location=|address|&radius=|radius|&limit=5",
      "headers": {
        "Content-Type": "application/json",
        "X-Api-Key": "XxYyXYxy"
      }
    },
    "placeholders": [
      {
        "placeholder": "|address|",
        "validation_criteria": "Address that may include a landmark, Street, City and State or Zip Code."
      },
      {
        "placeholder": "|radius|",
        "validation_criteria": "radius within miles of a given address.",
        "default": "10"
      }
    ]
  }
  // << add more such API endpoints to this repository >>
];
Then, import (or require) AI-Dapter and initialize the framework in your project.
/**
Import AIDapter
*/
import AIDapter from "ai-dapter";

/**
Initialize AIDapter with minimal configuration
*/
const ai = new AIDapter({
  "app_name": << give an appropriate app name >>,
  "provider": "OpenAI",
  "model_name": "gpt-3.5-turbo-16k",
  "endpoint": "https://api.openai.com/v1/chat/completions",
  "authentication": {
    "api_key": << Your OPENAI Key >>,
    "org_id": << Your OPENAI Org ID >>
  }
});
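As a general Node.js practice (my suggestion, not an AI-Dapter requirement), you may prefer to read credentials from environment variables instead of hard-coding them. The app name below is a made-up example.
/**
Optional: the same initialization, reading credentials from environment variables
*/
const ai = new AIDapter({
  "app_name": "ev-charging-demo",            // hypothetical app name
  "provider": "OpenAI",
  "model_name": "gpt-3.5-turbo-16k",
  "endpoint": "https://api.openai.com/v1/chat/completions",
  "authentication": {
    "api_key": process.env.OPENAI_API_KEY,   // e.g., set via: export OPENAI_API_KEY=sk-...
    "org_id": process.env.OPENAI_ORG_ID
  }
});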
Besides configuring data-management parameters, you can give your agent a role, a personality, and other traits.
/**
Agent and data configuration helps optimize LLM responses
*/
let aiDapterOptions = {
  "agentConfig": {
    "role": "virtual assistant",
    "personality": "honest and humble"
  },
  "dataConfig": { "max_records": 5 }
};
Finally, call the all-in-one method, getLLMResponseFromRealtimeSources. Based on the user’s question, it will identify the appropriate APIs, rewrite their endpoints with suitable values, call them to obtain real-time data, and use that data to get a proper response from the LLM model.
let question = "I'm traveling to New York City soon. Where can I find EV charging stations there?";

/**
Call the all-in-one method
*/
ai.getLLMResponseFromRealtimeSources(question, apiRepository, aiDapterOptions)
  .then((resp) => console.log(resp))
  .catch((err) => console.error(err));
/**
Notice that the appropriate API endpoint was picked from the repository,
called to obtain real-time data, and
passed to the LLM for generating a contextual response.
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
{
  "ai_response": "There are several EV charging stations in New York City. Here are a few options:\n\n1. Brookfield Place - Tesla Supercharger\n - Address: 250 Vesey Street, New York City\n - Phone: 877-798-3752\n - Charging Network: Tesla\n - Charging Types: Tesla Supercharger\n - Distance: 0.78 km\n\n2. Mott Street Parking - Tesla Supercharger\n - Address: 106 Mott Street, New York\n - Phone: 877-798-3752\n - Charging Network: Tesla\n - Charging Types: Tesla Supercharger\n - Distance: 0.89 km\n\n3. 59 Allen Street - Tesla Supercharger\n - Address: 59 Allen Street, New York\n - Phone: 877-798-3752\n - Charging Network: Tesla\n - Charging Types: Tesla Supercharger\n - Distance: 1.32 km\n\nThese are just a few options, and there are more EV charging stations available in New York City. You can find more stations and their details on the AFDC Energy Station Locator website: [AFDC Energy Station Locator](https://afdc.energy.gov/stations/).",
  "ai_status": "OK",
  "ai_context": {
    "entities": [
      {
        "Location": [
          "New York City"
        ]
      }
    ],
    "sources": [
      "developer.nrel.gov"
    ],
    "original_question": "I'm traveling to New York City soon. Where can I find EV charging stations there?",
    "response_summary": "The user is asking for EV charging stations in New York City. The assistant provides a list of charging stations along with their details and suggests using the AFDC Energy Station Locator website for more options."
  },
  "tokens": {
    "api_identification": {
      "prompt_tokens": 2430,
      "completion_tokens": 191,
      "total_tokens": 2621
    },
    "llm_response": {
      "prompt_tokens": 3395,
      "completion_tokens": 353,
      "total_tokens": 3748
    },
    "prompt_tokens": 5825,
    "completion_tokens": 544,
    "total_tokens": 6369
  }
}
*/
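In a real application, you would do more than console.log the whole object. Based on the field names in the sample output above, a minimal handler might look like the sketch below; the treatment of non-OK statuses is my assumption, inferred from the presence of the ai_status field.
/**
A sketch of handling the response (field names follow the sample output above)
*/
ai.getLLMResponseFromRealtimeSources(question, apiRepository, aiDapterOptions)
  .then((resp) => {
    if (resp.ai_status === "OK") {
      console.log(resp.ai_response);                          // show the grounded answer to the user
      console.log("Tokens used:", resp.tokens.total_tokens);  // optionally track usage for cost control
    } else {
      console.warn("Response status:", resp.ai_status);       // handle non-OK statuses gracefully (assumption)
    }
  })
  .catch((err) => console.error(err));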
Conclusion
The developer experience improves tremendously when you plug AI-Dapter into your application code and use it seamlessly as a black box that generates LLM-based responses to users’ questions. Note that such a user question may be little more than a zero-shot prompt, but AI-Dapter builds a dynamic framework around that single question and helps deliver a more trustworthy experience to your end users.