Introduction

GenAI Agents (hereafter simply Agents) are Services which combine application logic with the use of a large language model (LLM), either directly or via other resources such as Semantic Indexes. When it comes to their basic implementation, including issues such as state management, Agents are just standard services. However, the use of LLMs and the nature of the functionality they typically provide do present some unique issues. Those issues are the focus of this document.

When it comes to leveraging an LLM, you may be interested in the SubmitPrompt, AnswerQuestion, and GenAIFlow activity patterns in the Visual Event Handler Guide. For practice adding GenAI functionality to your services, see the GenAI Builder Tutorial.

Conversations

LLM interactions are stateless. This means that for every request, the LLM only has access to its own “knowledge” and the information in the current request. If you want the LLM to know about previous requests and its responses to them, this information must be presented as part of the current request. Doing this is the role of a conversation.

The word “conversation” gets used in a variety of contexts when talking about GenAI applications. This isn’t surprising given that LLMs appear to “converse” with users and it is a convenient word to describe many application behaviors. However, in the Vantiq platform the term has a very specific meaning and anytime you see it in relation to Vantiq Agents and GenAI Applications, this is what we mean.

A conversation consists of an ordered sequence of messages which capture the history of an interaction between an Agent and an LLM. The messages in a conversation have a type and associated content. A message’s type tells the LLM how to interpret the content and must be one of the following:

  • system – Instructions to the LLM about how to interpret the conversation or “behave” in general. Typically there is only a single system message which is added automatically by the application. Vantiq supports including a system message as part of an LLM definition to ensure that it is always present.
  • human – Content provided by the user/client.
  • ai – Content generated by the LLM in response to a request. May be a direct response or instructions for some further action (e.g. the invocation of an LLM “tool”).
  • tool – Content produced by the execution of an LLM tool.

The structure of a message’s content depends on its type. See ChatMessage for more details.
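
For illustration, the history of a short conversation might look something like this (the field names shown are illustrative only; see ChatMessage for the actual structure):

[
    { type: "system", content: "You are a helpful assistant for a banking application." },
    { type: "human", content: "What is my current balance?" },
    { type: "ai", content: "Your checking account balance is $1,250.00." }
]

Note how the single system message comes first, followed by alternating human and ai messages that record the interaction in order.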

Transient Conversations

Conversations are managed using the Conversation Memory service, which supports the creation and manipulation of conversations. The conversations managed by this service can be referenced when sending an LLM request via the built-in submitPrompt and answerQuestion procedures or when invoking a GenAI procedure. Doing so causes the conversation to be automatically updated based on the underlying LLM interactions. On the client side, the Conversation Widget facilitates the user’s participation in a conversation (use its conversationId property to refer to a specific conversation).
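
As a minimal sketch of the flow (the parameter shapes shown here are assumptions for illustration, not documented signatures):

// conversationId refers to a conversation managed by the Conversation Memory service.
// The parameter shape below is an illustrative assumption, not the documented signature.
var answer = submitPrompt("What is my current balance?", conversationId)
// A follow-up request in the same conversation can rely on the earlier exchange,
// since the conversation history is included in the LLM request automatically.
var followUp = submitPrompt("And my savings account?", conversationId)

The key point is that the caller never assembles the message history by hand; referencing the conversation is enough for the service to supply and update it.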

As its name implies, the resulting conversation state is stored in memory. This means that it has a limited lifetime and can be subject to loss in certain failure cases. We refer to these as “transient” conversations. Transient conversations are suitable for interactions that will last minutes to maybe an hour or so and which do not need to be saved for any reason (such as auditing).

Persistent Conversations

For cases where a conversation must be available over much longer time frames (days or even weeks) or when it needs to be recorded for some reason, conversations can be persisted as part of a collaboration managed by the Agent. This can be accomplished in a variety of ways. Using the SubmitPrompt, AnswerQuestion, or GenAIFlow activity patterns in a visual event handler will automatically bind a conversation to the current collaboration instance (or create one as needed). The Agent may also choose to manage its collaborations more explicitly. If the conversation is associated with some application “entity”, then the entity role procedures are a natural fit. Alternatively, the Agent can directly manage one or more persistent conversations using the collaboration management procedures.

Whichever approach is used, once bound to a collaboration instance, persistent conversations are automatically saved along with their associated collaboration instance and loaded back into memory when the collaboration instance is retrieved. Once loaded, they can be accessed through the Conversation Memory service as described above. This is all done using standard partitioned state and fully supports service replication for stricter reliability guarantees.

When using persistent conversations, clients should use Client.setCollaborationContext to bind the client to the active collaboration instance. This triggers the binding of any conversation widgets to the appropriate conversation (including support for named conversations).
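
In the client, the binding might be established with something like the following (the argument and surrounding code are illustrative assumptions; only the Client.setCollaborationContext entry point itself is taken from this document):

// Illustrative: bind this client to the active collaboration instance.
// The argument shape is an assumption -- consult the Client API documentation.
client.setCollaborationContext(collaborationId)
// Any conversation widgets in the client are now bound to the collaboration's
// conversation(s), including named conversations.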

Agent to Human Interaction

The use of an LLM can make Agents very powerful, allowing them to make decisions dynamically rather than relying solely on predetermined pathways. To guard against the Agent doing something not just unexpected, but inappropriate, it can be necessary to keep the “human in the loop”, requiring that the Agent obtain permission prior to acting. This requires that the Agent be able to initiate an interaction with the user. The same mechanism can also be used for other purposes, such as allowing the Agent to gather additional information it might need to complete its task. Vantiq supports two ways to accomplish this: direct communication and notifications.

Direct Communication

The direct communication model requires that the Agent and the user be actively engaged in a conversation via the Conversation Widget. Whenever the conversation widget loads a conversation (either directly or via an enclosing collaboration instance), it registers a “callback” with the Callback service, using the id of the current conversation as the callbackId. This allows the Agent to contact the user via the io.vantiq.Callback.invoke procedure like this:

var userPrompt = "I'm about to withdraw money from your account, is that OK?"
var userResponse = io.vantiq.Callback.invoke(conversationId, userPrompt, 5 minutes)
// ... do something with the response ...

The data sent will be displayed to the user in the conversation widget and then the user’s response will be returned as the result of the invocation (unless the user takes longer than 5 minutes to respond). The advantage of this approach is that the Agent’s communication will appear to the user where they are likely already engaged. This avoids the need to pop up additional UI elements or distract them from the current task. The disadvantage is that it won’t work if the user isn’t actively using a client with a conversation widget.

Notification

If the user cannot be reached directly, then the Agent must instead use a notification-based approach to contact them. The Notify activity pattern already provides robust support for sending a notification to one or more users and then managing their responses. The primary limitation is that it works in the context of a visual event handler, which typically operates asynchronously and does not provide a means to produce a result. Rather than invent an alternate notification mechanism, we chose to address this limitation.

To do this we added the ability to “invoke” a service event handler. The VAIL PUBLISH statement can obviously be used to trigger a handler, but it assumes a fully asynchronous execution model, so it cannot “wait” for a reply. Therefore, we have added the Event.request procedure to the built-in event processing service. This procedure triggers the handler for a specified service event. It must be called from a service procedure belonging to the same service as the target event type. The behavior of the handler is unrestricted, but it is assumed that at some point it will provide a response using the Reply activity pattern. The net result is a request/response execution model which uses an event handler as its implementation.

For example, suppose we have the following event handler:

Notify Handler

We can “invoke” it from the Agent using code like this:

var event = {collaborationId: collaborationId}
var userResponse = Event.request("NotifySessionEVT", event, 5 minutes)
// ... do something with the response ...

When run, the target user will receive the notification and be presented with the associated client. Once the user provides the requested input, the result from the client is sent back to the Agent as the return value of Event.request and processing continues from that point. The advantage of this approach is that it allows the Agent to contact users on an “interrupt” basis, not just when they are actively in a conversation. The disadvantage is that the Vantiq Notify pattern is limited to use on a mobile device, so this approach isn’t appropriate for browser-only applications.