Building AI Applications with Large Language Models
Download the AI model guide to learn more → https://ibm.biz/BdabCK
Learn more about AI solutions → https://ibm.biz/BdabCa
Unlock the power of Large Language Models and discover how to build AI applications that can understand and respond to complex questions. Join Roy Derks as he explores the technical aspects of building AI applications with LLMs, including the use of vector databases and APIs. Get a guide to the concept of agents and multi-agent frameworks, and learn how to build complex AI applications that can drive business value.
AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://www.ibm.com/account/reg/us-en/signup?formid=news-urx-52120
#largelanguagemodels #aiapplications #artificialintelligence
A lot of web developers are using AI applications such as chat assistants or code assistants. Building an application like that yourself can sound like a scary task, but it's not as complex as you might think. In this video, I'm going to walk you through how you get from asking a question to retrieving an answer from a large language model, and the couple of patterns in between.

So let's break down what a typical application looks like. It often starts with a user interface where you can ask your questions. The user interface then connects to a library or a framework, which could be open source or a cloud product. This library or framework interacts with an API, and this API is typically provided by an LLM provider.

As mentioned, there are a couple of patterns I'd like to highlight. Usually people start with basic prompting: you ask a question to a large language model, the question is put inside a prompt, the prompt is sent to the large language model, and you retrieve your final answer. In your prompt, you put both your question and a set of instructions for the LLM, such as "be a helpful assistant", "don't hallucinate", or "don't offend anyone".

Next to basic prompting there's also a more complex pattern, and this is what we call RAG, or retrieval augmented generation. Again, it starts with a question. This time your question isn't put directly inside a prompt; instead it is turned into an embedding. This embedding is then used to query a vector database for relevant context, which we call the top N matches. These top N matches are put inside a prompt, and this prompt of course also contains your question. What the LLM sees is your prompt, which includes both the question and the top N matches, and based on this it returns your final answer. RAG also has a stage where you upload your data into the vector database; this is important, because otherwise the vector database won't be able to retrieve any relevant context. Basic prompting is typically done via an API or an SDK provided by the LLM provider, while RAG is typically implemented via a library or a framework.

The final pattern you can implement as a web developer building your application is AI agents. With AI agents you still have a question and a final answer, but this time there's an agent in the middle that helps you answer the question. We start again with a question, and this time the question is sent to the agent. There are multiple patterns for implementing agents, and typically you use a framework or a library to do this. The agent will plan based on your question and the available tools, act (or react) based on the tool calls, and finally reflect to check whether the answer matches the question. For the plan and act stages it needs a set of tools, and these tools could be APIs, databases, or code, for example to crawl the web. The LLM uses those tools to plan and execute, and finally provides you with your final answer. Next to a single agent, you can also have a multi-agent setup. This typically involves a supervisor agent, which determines which agent should be called to answer your question.
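To make the basic prompting pattern concrete, here is a minimal sketch in Python. It assumes the OpenAI SDK purely as an example provider, and the model name is illustrative; any LLM provider's API or SDK follows the same shape of instructions plus question in, answer out.

```python
# Basic prompting: the prompt combines a set of instructions with the user's
# question, is sent to the LLM provider's API, and the answer comes back.
from openai import OpenAI  # example SDK; any LLM provider works similarly

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            # the "instructions" part of the prompt
            {"role": "system", "content": "You are a helpful assistant. Don't offend anyone."},
            # the user's question
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is retrieval augmented generation?"))
```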
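Here is a rough sketch of the RAG flow described above. To stay self-contained it uses a tiny in-memory list with cosine similarity in place of a real vector database, and the OpenAI embeddings endpoint as an example; the documents, model names, and question are made up for illustration. In a real application the "upload" step and the similarity search would be handled by a vector database behind a library or framework.

```python
# RAG sketch: embed the question, find the top-N most similar documents,
# and put those matches into the prompt alongside the question.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

# Upload stage: embed your documents ahead of time and store the vectors.
documents = ["Our refund policy lasts 30 days.", "Support is available 24/7 via chat."]
doc_vectors = [embed(doc) for doc in documents]

def top_n_matches(question: str, n: int = 2) -> list[str]:
    q = embed(question)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in doc_vectors]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:n]]

def ask_with_rag(question: str) -> str:
    context = "\n".join(top_n_matches(question))
    # The prompt the LLM sees contains both the retrieved context and the question.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask_with_rag("How long do I have to request a refund?"))
```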
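And a minimal sketch of a single agent with one tool. In practice you would reach for an agent framework or library; here the plan/act/reflect loop is hand-rolled around the OpenAI tool-calling API, and the get_weather tool is a hypothetical stand-in for an API, a database, or code that crawls the web.

```python
# Single-agent sketch: the model plans which tool to call, we execute the tool,
# feed the result back, and the model reflects before giving the final answer.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Stub tool; a real app would call a weather API or database here."""
    return f"It is 18 degrees and sunny in {city}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Do I need an umbrella in Amsterdam today?"}]

for _ in range(5):  # cap the plan/act/reflect loop for safety
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools
    )
    message = response.choices[0].message
    if not message.tool_calls:       # no more tool calls: this is the final answer
        print(message.content)
        break
    messages.append(message)         # keep the planning step in the history
    for call in message.tool_calls:  # act: execute each requested tool call
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

A multi-agent setup extends the same loop: a supervisor agent receives the question first and routes it to whichever specialised agent (each with its own tools) is best placed to answer.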
So in this video we looked at three different patterns for implementing AI applications. The first one was basic prompting, where you use a prompt that includes your question and a set of instructions. The second one was RAG, retrieval augmented generation, where we use a vector database to make the LLM context-aware of your data. And the final one was agents, where an agent looks at a set of tools and, based on those tools, answers your question. So with this, I hope you can start building your applications today.
Very nicely explained !!
They really need to make inferencing optimized for and by AVX-512 and AVX-10. Can't put all eggs in the GPU basket.
the screech! :S
I have yet to see any serious application created this way.
Why do we need RAG?
What is the difference between AI Agents and Assistants API? For example, Claude (Personal) Projects vs the new OpenAI Projects / Tasks? For example, would creating your own assistant/co-pilot (ChatGPT MyGPT?) work around the rate limit, but then you'd have to pay the API token cost?
And I understand the models you download are already trained, but how do you train it on your own data.. can my "tokens" be pieces of text AND video? And depending on the model the download size is proportional to the number of parameters (ex. 6B Parameters = 5GB) not the size of all my training data, correct? i.e. my data will just change the weights of the model I'm using? And how do you train on the cloud vs locally? Can you still use OpenWebUI as the interface?
I'd like to see if AI can do any of the following:
1) Point an AI tool to a 3-hour YT interview (JRE #2237) that DOESN'T have a transcript, and have it summarize and answer questions about it
2) Create a video using several yt video clips
3) Open PDFs of scanned handwritten-notes containing tables of numbers, and have it extract (OCR) the numbers
4) Create a BINGO-like AR app that overlays a coin on a specific square in a grid. For example, if A4 is said/entered, the app would use some marker/fiducial on the page to know how far into the grid it needs to place a coin.. scaling the coin size depending on the angle and how close the camera is to the paper
BTW, any thoughts on TinyGrad?
Brilliant video!
Everything depends on value we wanna provide. Simple question simple solution, complex question complex solution, or with some variations.
do you really think that breakthrough technologies are created this way?
no
Nice.
This will take some real good times to learn and to understand and it will take lots of data too. Can this be done with Androids?
Love it
This breakdown of AI application patterns makes it clear how tools like RAG and agents simplify complex tasks. Great for developers looking to dive into building smarter solutions!