Generates a text response based on a sequence of messages. The callback function allows tokens to be streamed as they are generated.
An array of message objects representing the conversation history.
A function that is called with each new token generated by the LLM.
A promise that resolves to the complete generated string.
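A minimal sketch of how such a method could be used, under assumed names: `generate`, the `Message` shape, and the `onToken` callback are illustrative, not taken from a specific library. The mock streams each token through the callback and resolves with the complete string, mirroring the contract described above.

```typescript
// Hypothetical shapes; the real API's names and fields may differ.
type Message = { role: "system" | "user" | "assistant"; content: string };
type TokenCallback = (token: string) => void;

// Mock generate(): streams tokens via the callback, resolves to the full text.
async function generate(
  messages: Message[],
  onToken: TokenCallback
): Promise<string> {
  const tokens = ["Hello", ", ", "world", "!"]; // stand-in for real LLM output
  let result = "";
  for (const t of tokens) {
    onToken(t); // stream each token as it is "generated"
    result += t;
  }
  return result; // the complete generated string
}

// Usage: collect streamed tokens and compare with the resolved value.
const streamed: string[] = [];
generate(
  [{ role: "user", content: "Say hello" }],
  (t) => streamed.push(t)
).then((full) => {
  console.log(full); // the concatenation of all streamed tokens
  console.log(streamed.join("") === full); // true
});
```

The resolved promise and the joined stream carry the same text, so callers can either render tokens incrementally or await the final string.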
Interrupts any ongoing text generation. This is useful for stopping a long-running generation early.
A promise that resolves once the interruption is complete.
Loads the LLM model resources (e.g., weights, tokenizer) into memory. This should be called before attempting to generate text.
Unloads the LLM and its associated resources from memory. This is typically used to free up system resources when the model is no longer needed.
A promise that resolves once the model unloading is complete.
Defines the essential operations for a Large Language Model (LLM). This interface provides a standardized way to interact with various LLM implementations, covering model lifecycle (loading, unloading) and core text generation capabilities. It supports streaming of generated tokens for interactive applications.
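The operations described above can be sketched as a TypeScript interface together with a minimal in-memory mock. All names here (`LLM`, `MockLLM`, `ChatMessage`, the method signatures) are assumptions for illustration; a real implementation would load actual weights and run inference.

```typescript
// Illustrative message shape; a real API may define this differently.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Sketch of the interface described above.
interface LLM {
  load(): Promise<void>; // load weights/tokenizer into memory
  unload(): Promise<void>; // free the model's resources
  generate(
    messages: ChatMessage[],
    onToken: (token: string) => void
  ): Promise<string>; // streams tokens, resolves to the full text
  interrupt(): Promise<void>; // stop an ongoing generation
}

// Minimal mock demonstrating the lifecycle contract.
class MockLLM implements LLM {
  private loaded = false;
  private interrupted = false;

  async load(): Promise<void> {
    this.loaded = true; // a real implementation would load weights here
  }

  async unload(): Promise<void> {
    this.loaded = false; // a real implementation would release memory here
  }

  async interrupt(): Promise<void> {
    this.interrupted = true; // checked between tokens during generation
  }

  async generate(
    messages: ChatMessage[],
    onToken: (token: string) => void
  ): Promise<string> {
    if (!this.loaded) throw new Error("call load() before generate()");
    this.interrupted = false;
    let out = "";
    for (const token of ["ok", "!"]) { // stand-in for real model output
      if (this.interrupted) break; // honor interrupt() between tokens
      onToken(token);
      out += token;
    }
    return out;
  }
}
```

Typical call order follows the lifecycle: `load()` once, then any number of `generate()` calls (optionally cut short by `interrupt()`), then `unload()` when the model is no longer needed.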