New and Powerful Components for AI and LLMs

Wolfram
4 Sept 2024 · 92:40

TLDR: In the final session of the Wolfram 14.1 webinar series, R&D Manager Shoty Ashnai and Machine Learning Manager Julio Alisandrini introduce new AI and LLM components. They discuss prompt engineering for LLMs, connecting to different LLMs, and updates in audio, video, and image computation. The webinar highlights the ability to interact with LLMs through simple functions, dynamic prompting, and service connections. It also showcases the power of the new semantic search and vector database functionality, and the potential of local LLMs. The session concludes with a Q&A, addressing audience queries and providing insights into the future of AI and LLM integration in Wolfram Language.

Takeaways

  • Shoty Ashnai, R&D manager of Sound and Vision, welcomes attendees to the final session of the Wolfram 14.1 webinar series.
  • The webinar series covered a range of topics including mathematical computations, compiler systems, and data visualization in various domains.
  • The session focused on new and powerful components for AI and Large Language Models (LLMs), with presentations by Shoty Ashnai and Julio Alisandrini.
  • Updates and upgrades in audio, video, and image computation were discussed, along with new service connections and future updates.
  • Interaction with LLMs was simplified through functions like `LLMSynthesize` and `LLMFunction`, allowing for easy API requests and output processing.
  • Image expressions are now directly supported, eliminating the need for manual encoding and decoding of images in interactions with LLMs.
  • `LLMFunction` enables delayed evaluation, allowing for the creation of customizable functions backed by LLM APIs.
  • New service connections were introduced, and a sneak peek into upcoming updates and versions was provided.
  • Significant improvements in video computation were highlighted, including quality-of-life updates and new functions for video generation.
  • Speech recognition was enhanced with the adoption of the Whisper model, offering faster and more accurate transcriptions.
  • People segmentation in video frames was improved, allowing for better foreground and background separation.

Q & A

  • What is the main focus of the Wolfram 14.1 release discussed in the webinar?

    -The main focus of the Wolfram 14.1 release discussed in the webinar is the introduction of new and powerful components for AI and Large Language Models (LLMs), including updates in audio, video, and image computation.

  • Who are the presenters of the webinar?

    -The presenters of the webinar are Shoty Ashnai, R&D manager of Sound and Vision, and Julio Alisandrini, manager of Machine Learning at Wolfram.

  • What is the significance of the `LLMSynthesize` function in interacting with LLMs?

    -`LLMSynthesize` is significant because it simplifies interaction with Large Language Models (LLMs), requiring minimal setup: it handles the connection, sends the API request, interprets parameters, and processes the output on the fly.
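As a sketch of how little setup is needed (assuming an LLM service such as OpenAI is already authenticated, e.g. via `ServiceConnect`; the prompt is illustrative):

```wolfram
(* one-shot generation: the connection, API request, and output
   parsing are all handled behind the scenes *)
LLMSynthesize["Summarize what a Large Language Model is in one sentence."]
```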

  • How can users customize their interaction with LLMs in Wolfram Language?

    -Users can customize their interaction with LLMs in Wolfram Language through supporting functions and options, such as specifying parameters like temperature to control randomness, using `LLMFunction` for delayed evaluation, and employing `LLMPrompt` for more structured prompting.

  • What is the new feature introduced in Wolfram 14.1 that supports image inputs for LLMs?

    -In Wolfram 14.1, the new feature supporting image inputs for LLMs is direct support for image expressions, which automates the conversion of images to base64-encoded strings, allowing easier interaction with LLMs that accept image inputs.

  • What is the role of `LLMFunction` in creating parametrized inputs for LLMs?

    -`LLMFunction` lets users create parametrized inputs for LLMs: an interaction is set up once as a template, and the result can then be applied to user input like an ordinary function, with normal Wolfram Language syntax.
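A minimal sketch of this template-then-apply pattern (prompt texts and inputs are illustrative):

```wolfram
(* a template with `` slots becomes a reusable, LLM-backed function *)
haiku = LLMFunction["Write a haiku about ``."];
haiku["autumn rain"]

(* an interpreter coerces the raw text reply into a typed result *)
population = LLMFunction["Estimate the population of ``.", "Number"];
population["Iceland"]
```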

  • How does the `ChatObject` facilitate persistent conversations with LLMs?

    -A `ChatObject` in Wolfram Language facilitates persistent conversations with LLMs: users start and continue a conversation with the `ChatEvaluate` function, which returns an updated `ChatObject` containing the new input and output while preserving the conversation history.
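The conversational round trip can be sketched like this (messages are illustrative):

```wolfram
chat = ChatObject[];                           (* empty conversation *)
chat = ChatEvaluate[chat, "My name is Ada."];
chat = ChatEvaluate[chat, "What is my name?"]
(* each ChatEvaluate returns an updated ChatObject carrying the full
   history, so the model can refer back to earlier turns *)
```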

  • What improvements have been made to the video computation capabilities in Wolfram 14.1?

    -Wolfram 14.1 includes several improvements to video computation capabilities, such as support for the latest version of FFmpeg, the addition of an image-resolution option for videos, and new functions including `VideoSummaryPlot`, `ManipulateVideo`, `VideoFrameFold`, and `ReapVideo`.

  • What is the new speech recognition model used in Wolfram 14.1 and what are its advantages?

    -The new speech recognition model used in Wolfram 14.1 is the Whisper model. Its advantages include significantly faster processing speeds, improved punctuation and capitalization, recognition of non-speech tokens, and the ability to return timestamps for recognized speech.
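In code the transcription remains a one-liner, with Whisper used under the hood (the file name here is hypothetical):

```wolfram
audio = Import["lecture.wav"];   (* hypothetical recording of speech *)
SpeechRecognize[audio]           (* transcript, with punctuation and capitalization *)
```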

  • How can users take advantage of the upgraded OCR functions in Wolfram 14.1?

    -Users can take advantage of the upgraded OCR functions in Wolfram 14.1 through the improved `TextRecognize` and `FindImageText` functions on macOS, which now leverage the operating system's APIs for better recognition accuracy, especially on low-resolution and otherwise challenging text.
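Usage is unchanged; only the backend improved. A sketch with a hypothetical scan:

```wolfram
img = Import["receipt-scan.png"];  (* hypothetical low-resolution scan *)
TextRecognize[img]                 (* the recognized text as a string *)
FindImageText[img]                 (* bounding boxes of detected text regions *)
```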

Outlines

00:00

Introduction to the Wolfram 14.1 Webinar

Shoty Ashnai, the R&D manager of Sound and Vision, warmly welcomes attendees to the final session of the Wolfram 14.1 webinar series. The series has covered a wide range of topics including new features, mathematical computations, compiler systems, and various data visualizations. Recordings of past sessions are available on the webinar landing page. Today's focus is on AI and LLMs, with presentations by Julio Alisandrini on prompt engineering and updates by Shoty Ashnai on audio, video, and image computation. The session also includes a Q&A with Bradley Ashby and Timo Berrier. The audience is encouraged to participate in a poll and ask questions through the chat.

05:04

Accessing Wolfram 14.1 and Q&A Coordination

The script instructs participants to access version 14.1 of the Wolfram app for the webinar examples and mentions a free trial offer. A poll question is asked regarding the upgrade to version 14.1. The webinar audience and YouTube live-stream viewers are encouraged to submit questions and comments through the chat. The technical staff is present to coordinate Q&A between the webinar and live-stream platforms. The script also explains the webinar interface, highlighting the download link for presentation notebooks and previous session materials.

10:07

Dynamic Prompting and Connecting to LLMs

Julio Alisandrini, the manager of the machine learning group, discusses dynamic prompting and connecting to Large Language Models (LLMs). He provides an overview of the LLM stack functionality in Wolfram Language, including new features in version 14.1. The presentation covers top-level functions for interacting with LLMs, supporting functions, options, and parameters. It also explores new service connections and future updates. Julio demonstrates how to use simple strings and complex instructions for interacting with LLMs and introduces support for image expressions. The session concludes with a Q&A segment.

15:08

πŸ–ŒοΈ Image Expressions and Multimodal Input

The script explains the support for image expressions in LLM interactions, allowing for direct image inputs without the need for base64 encoding. It showcases how to start a conversation with an LLM using an image and receive generic information about the image content. The ability to mix text and images in inputs is demonstrated, highlighting the multimodal input feature. The script also discusses the internal representation of different input types and the potential for future model capabilities.
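Mixing text and images is a matter of passing a list of expressions; the image source below is hypothetical:

```wolfram
img = Import["holiday-photo.png"];  (* hypothetical image *)
LLMSynthesize[{"Describe what is happening in this picture: ", img}]
(* the image is encoded and sent to the multimodal model automatically,
   with no manual base64 handling *)
```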

20:08

LLMFunction for Delayed Evaluation

The script introduces `LLMFunction`, which allows for delayed evaluation and user input in interactions with LLMs. It demonstrates how to set up interactions and reuse them with different inputs. The use of string templates for creating parametrized inputs is explained, along with the ability to customize behavior via `LLMFunction`'s interpreter argument. The script also shows how to create self-documented code using named arguments and how to mix multimodal inputs for image captioning.

25:10

πŸ—£οΈ Conversational Interfaces with LLMs

The script describes how to create conversational interfaces with LLMs using chart objects and the chart evaluate function. It shows how to start a persistent conversation by modifying chart objects and preserving conversation history. The use of multimodality in conversations is demonstrated, along with the ability to export conversations as chart objects for later manipulation.

30:13

βš™οΈ Customizing LLM Interactions with Options

The script explains how to customize interactions with LLMs using various options and parameters. It demonstrates how to control the interaction via the LM evaluator option and how to use the llm configuration option to set multiple parameters. The script also shows how to set default configurations for the LM evaluator and how to use service connections to specify different LLM services.
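The pattern can be sketched as follows (the model name and parameter values are illustrative):

```wolfram
config = LLMConfiguration[<|
    "Model" -> <|"Service" -> "OpenAI", "Name" -> "gpt-4o"|>,
    "Temperature" -> 0.2,   (* low randomness *)
    "MaxTokens" -> 200
  |>];

(* per-call override via the LLMEvaluator option ... *)
LLMSynthesize["Explain overfitting briefly.", LLMEvaluator -> config]

(* ... or a session-wide default *)
$LLMEvaluator = config;
```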

35:15

Prompt Engineering and Tool Integration

The script delves into prompt engineering, explaining its importance in refining LLM interactions. It introduces the `LLMPrompt` function and the Wolfram Prompt Repository, which contains well-crafted prompts. The script shows how to use prompts to simplify answers, change languages, and integrate tools for live information retrieval. It also demonstrates how to use tool prompting to enhance LLM responses with external tools and how to create new documents with plots and other information.
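Repository-backed prompting can be sketched like this; "ELI5" ("explain like I'm five") is assumed to be one of the prompts available in the Wolfram Prompt Repository:

```wolfram
(* prepend a curated prompt to reshape the answer style *)
LLMSynthesize[{LLMPrompt["ELI5"], "How does public-key cryptography work?"}]
```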

40:16

Retrieval-Augmented Generation in LLMs

The script introduces retrieval-augmented generation (RAG) in LLMs, which allows for on-the-fly augmentation of prompts based on conversation context. It demonstrates how to use `LLMPromptGenerator` to enrich prompts with contextual information. The script also explains the use of semantic search and vector database functionality to provide relevant context to LLMs based on the semantic sense of the input text.

45:18

Vector Databases and Semantic Search

The script provides an overview of vector databases and their use in finding similar content numerically. It explains how to create and search vector databases and how to use feature extractors to convert data into vectors for searching. The script also discusses the use of semantic search to convert text into vectors that represent semantic content and how to perform semantic searches using indexes created from various text sources.
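The index-then-query flow can be sketched as follows (the text snippets are illustrative):

```wolfram
index = CreateSemanticSearchIndex[{
    "Cats are small domesticated felines.",
    "The Eiffel Tower is in Paris.",
    "Photosynthesis converts light into chemical energy."}];

(* matches by meaning, not keywords: the query shares no
   significant word with the matching snippet *)
SemanticSearch[index, "Which landmark is in France?"]
```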

50:21

Updates in Video, Audio, and Image Computation

Shoty Ashnai discusses updates in video, audio, and image computation in Wolfram Language version 14.1. She highlights quality-of-life improvements for video, support for the latest FFmpeg version, and the addition of image-resolution options for videos. The script also covers new functions like `VideoSummaryPlot` for quick content display and updates to subtitle handling. The presentation includes demonstrations of video-generation functions and the new `ManipulateVideo` function, which allows for detailed control over how variables change over time in video creation.
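The summary view is a single call (the file name is hypothetical):

```wolfram
video = Import["clip.mp4"];   (* hypothetical video file *)
VideoSummaryPlot[video]       (* a few representative frames plus the audio waveform *)
```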

55:24

Advanced Video Manipulation Techniques

The script explains advanced video-manipulation techniques introduced in Wolfram Language version 14.1. It covers the use of key-frame actions in `Manipulate` and `ManipulateVideo`, allowing for precise control over how variables change over time. The script also introduces `VideoFrameFold`, which operates on video frames similarly to `FoldList`, and the use of `SowVideo` and `ReapVideo` for conditional frame extraction. The presentation includes examples of how these functions can be used to create motion blur, remove backgrounds, and perform other video manipulations.

1:00:25

Video Generation and Speech Recognition Enhancements

The script highlights new video-generation functions in version 14.1, such as creating a constant video from a single image and `VideoTranscribe`, which adds transcribed subtitles to video objects. It also discusses enhancements to speech recognition with the introduction of the Whisper model, which significantly improves recognition speed and accuracy. The script covers updates to subtitle handling, including the ability to extract subtitle strings and rules, and the addition of language identification and translation capabilities for video objects.
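Transcription-to-subtitles is likewise a single call (the file name is hypothetical):

```wolfram
video = Import["talk.mp4"];   (* hypothetical video with a speech track *)
VideoTranscribe[video]        (* the same video with a transcribed subtitle track attached *)
```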

1:05:27

πŸ–ΌοΈ Image Computation and People Segmentation

The script discusses updates in image computation, particularly the upgrade of OCR functions on Mac OS to utilize operating system APIs, improving text recognition in various scenarios. It also covers the introduction of people segmentation in the remove background function, which can be applied to both single images and video frames to target and remove backgrounds effectively.
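The basic call is one line (the image source is hypothetical; in 14.1 people can additionally be targeted via a segmentation specification, whose exact spelling is not reproduced here):

```wolfram
img = Import["street-scene.png"];   (* hypothetical photo with people *)
RemoveBackground[img]               (* keeps the salient foreground, drops the rest *)
```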

1:10:28

πŸ“ Closing Remarks and Q&A Summary

The script concludes the webinar with closing remarks, thanking attendees and the behind-the-scenes staff. It mentions the availability of a survey for feedback and certificates of attendance. The presenters encourage participants to explore and provide feedback on the new features introduced in Wolfram Language version 141. They also address common questions from the Q&A chat, summarizing key points and providing additional insights on vector-based functionality and semantic search.


Keywords

AI and LLMs

AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. LLMs, or Large Language Models, are a subset of AI that are designed to understand and generate human-like text based on the input they receive. In the context of the video, AI and LLMs are central to discussing new components and updates in the Wolfram Language, showcasing how these technologies can interact and provide enhanced computational capabilities.

Prompt Engineering

Prompt engineering is the art of crafting input prompts that effectively guide LLMs to produce desired outputs. It involves understanding how to structure questions or instructions to elicit the most accurate or relevant responses from AI systems. In the video, prompt engineering is highlighted as a key aspect of working with LLMs, where the presenter discusses techniques to refine prompts for better results.

LLMFunction

`LLMFunction` is a feature in the Wolfram Language that allows users to set up interactions with LLMs. It defines a function that can be evaluated later, which is particularly useful for creating dynamic and adaptable AI-driven applications. The video explains how `LLMFunction` can be customized with various parameters to control the behavior of the LLMs.

Multimodal Input

Multimodal input refers to the ability of a system to accept and process multiple types of input data, such as text, images, and audio. In the video, multimodal input is showcased as a feature that allows for a more comprehensive interaction with LLMs, enabling the AI to understand and respond to a broader range of queries by combining different forms of data.

Semantic Search

Semantic search is a technique in information retrieval that focuses on the meaning and intent behind user queries, rather than just keyword matching. The video discusses how semantic search can be utilized within the Wolfram Language to enhance the functionality of LLMs, allowing them to provide more contextually relevant responses by understanding the semantics of the input.

ChatObject

A `ChatObject`, in the context of the video, represents a persistent conversation with an LLM. It maintains the context and history of the interaction, allowing for a more natural and continuous dialogue with the AI. The video demonstrates how `ChatObject` expressions can be manipulated and used to create interactive and engaging AI conversations.

LLMConfiguration

`LLMConfiguration` is a set of options and parameters that define the behavior of an LLM during its interaction with users. It includes settings like temperature, which affects the randomness of the AI's responses. The video explains how `LLMConfiguration` can be used to fine-tune the AI's behavior to suit specific needs or preferences.

Tool Calling

Tool calling is the process of enabling LLMs to access external tools or functions to enhance their capabilities. It allows the AI to perform tasks that go beyond its inherent knowledge, such as live data retrieval or complex computations. The video demonstrates the integration of tool calling within the Wolfram Language, showing how LLMs can leverage additional resources to provide more comprehensive answers.

VideoSummaryPlot

`VideoSummaryPlot` allows users to quickly visualize the content of a video or audio file by displaying a small number of representative frames or an audio waveform. The video script mentions this feature as part of the updates in the Wolfram Language, which can be useful for quickly assessing or presenting media content.

ManipulateVideo

`ManipulateVideo` is a function that enables the creation of videos from dynamic systems defined in the Wolfram Language. It allows users to record changes in variables over time, such as in simulations or data visualizations, and compile them into a video. The video script discusses how this function has been updated to provide more control over the video-creation process, including specifying key frames and actions.

Highlights

Introduction to the webinar series on new features in Wolfram Language 14.1 by Shoty Ashnai, R&D manager.

Overview of previous webinar sessions covering mathematical computations, compiler systems, and data visualization.

Focus on new and powerful components for AI and Large Language Models (LLMs) in Wolfram Language 14.1.

Presentation by Julio Alisandrini on prompt engineering for LLMs and connecting to different LLMs.

Updates and upgrades in audio, video, and image computation by Shoty Ashnai.

Introduction to the concept of dynamic prompting and connecting to LLMs.

Explanation of the LLM stack functionality in Wolfram Language, including new features in version 14.1.

Demonstration of the simplicity of interacting with LLMs using `LLMSynthesize`.

Discussion on the plumbing of supporting functions and options for customizing interactions with LLMs.

Introduction of new service connections and a sneak peek at upcoming updates.

Explanation of the top-level functionality for handling interaction with large language models.

Demonstration of more complex interactions using instructions within the input string.

Introduction of image expressions directly supported in LLM interactions.

Discussion on the `LLMFunction` for delayed evaluation and setting up interactions with LLMs.

Presentation of the `ChatObject` for representing persistent conversations with LLMs programmatically.

Introduction of the `LLMEvaluator` option for controlling the interaction with LLMs.

Explanation of the `LLMConfiguration` option for setting parameters like temperature and the max-token value.

Demonstration of service connection framework for creating connections to different LLM APIs.

Discussion on the new retrieval-augmented generation (RAG) feature in Wolfram Language 14.1.

Introduction to semantic search and vector database functionality for providing context to LLMs.

Explanation of the `LLMPromptGenerator` for generating prompts based on the semantic sense of the input.

Discussion on the updates to video, audio, and image computation in Wolfram Language 14.1.

Introduction of the `VideoSummaryPlot` function for displaying video and audio content.

Explanation of updates to subtitle handling and improvements in video generation.

Discussion on the new `ManipulateVideo` function for creating videos from `Manipulate` objects.

Introduction of the `VideoFrameFold` function for applying functions over video frames.

Explanation of the `ReapVideo` function for conditionally collecting video frames of interest.

Discussion on the extended support for audio functions directly on video objects.

Introduction of the `SpeechRecognize` function upgrade to the Whisper model for improved speed and accuracy.

Explanation of the new `VideoTranscribe` function for transcribing spoken audio tracks in videos.

Discussion on incremental updates in image computation, including improved OCR functions.

Introduction of the `RemoveBackground` function update for people segmentation in images and videos.