What are Generative AI models?
Learn more about watsonx → https://ibm.biz/BdvxDz
Generative AI has stunned the world with its ability to create realistic images, code, and dialogue. Here, IBM expert Kate Soule explains how a popular form of generative AI, large language models, works and what it can do for enterprise.
#LLMs #GenerativeAI #FoundationModels #EnterpriseAI #Watsonx
Over the past couple of months, large language models, or LLMs, such as ChatGPT, have taken the world by storm. Whether it's writing poetry or helping plan your upcoming vacation, we are seeing a step change in the performance of AI and its potential to drive enterprise value. My name is Kate Soule. I'm a senior manager of business strategy at IBM Research, and today I'm going to give a brief overview of this new field of AI that's emerging and how it can be used in a business setting to drive value. Now, large language models are actually part of a broader class of models called foundation models. The term "foundation models" was first coined by a team from Stanford when they saw that the field of AI was converging on a new paradigm. Before, AI applications were built by training a library of different AI models, where each model was trained on very task-specific data to perform a very specific task. They predicted that we would start moving to a new paradigm, where a single foundational capability, or foundation model, would drive all of those same use cases and applications. So the same exact applications we were envisioning before with conventional AI, and the same model could drive any number of additional applications. The point is that this model can be transferred to any number of tasks. What gives this model the superpower to transfer to multiple different tasks and perform multiple different functions is that it's been trained, in an unsupervised manner, on a huge amount of unstructured data. In the language domain, that basically means I'll feed in a bunch of sentences (and I'm talking terabytes of data here) to train this model. The start of my sentence might be "no use crying over spilled" and the end of my sentence might be "milk", and I'm trying to get my model to predict the last word of the sentence based on the words it saw before.
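The "predict the next word from the words before it" idea can be sketched in a few lines. This is not how an LLM is actually implemented (real models use neural networks over terabytes of text); it is a toy bigram counter over a tiny made-up corpus, just to make the training objective concrete:

```python
from collections import Counter, defaultdict

# Tiny toy corpus standing in for the terabytes of unstructured text
# a real foundation model is trained on.
corpus = (
    "no use crying over spilled milk . "
    "she spilled milk on the floor . "
    "no use crying about spilled milk ."
).split()

# Count which word follows which (a bigram model): the simplest
# possible version of learning to predict the next word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("spilled"))  # milk
```

In the toy corpus, "milk" follows "spilled" every time, so the model completes "no use crying over spilled" with "milk", the same behavior the transcript describes, just learned from counts instead of a neural network.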
And it's this generative capability of the model, predicting and generating the next word based on the words it has seen before, that makes foundation models part of the field of AI called generative AI: we're generating something new, in this case the next word in a sentence. And even though these models are trained to perform, at their core, a generation task, predicting the next word in a sentence, we can actually take these models and, by introducing a small amount of labeled data into the equation, tune them to perform traditional NLP tasks, things like classification or named-entity recognition, things you don't normally associate with a generative model or capability. This process is called tuning: you tune your foundation model by introducing a small amount of data, you update the parameters of your model, and it can now perform a very specific natural language task. If you don't have data, or have only very few data points, you can still take these foundation models, and they actually work very well in low-labeled-data domains. Through a process called prompting, or prompt engineering, you can apply these models to some of those same exact tasks. An example of prompting a model to perform a classification task might be: you give the model a sentence and then ask it a question, "Does this sentence have a positive sentiment or a negative sentiment?" The model is going to try to finish generating the words in that sentence, and the next natural word in that sentence would be the answer to your classification problem: it would respond either "positive" or "negative", depending on how it estimated the sentiment of the sentence. And these models work surprisingly well when applied to these new settings and domains. Now, this is where a lot of the advantages of foundation models come into play. So if we talk about the advantages, the chief advantage is performance.
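The prompting pattern described above can be sketched as follows. The `generate` function here is a mocked stand-in for a real LLM completion call (an API or local model), so the example is self-contained; the point is the shape of the prompt and how the model's next word is read back as the class label, with no labeled training data involved:

```python
def generate(prompt: str) -> str:
    # Stand-in for an LLM continuing the prompt. A real model would
    # predict the next word from its pre-training; this crude keyword
    # mock exists only so the sketch runs on its own.
    text = prompt.lower()
    return "positive" if "loved" in text or "great" in text else "negative"

def classify_sentiment(sentence: str) -> str:
    # The task is expressed entirely in the prompt (prompt engineering):
    # the model's natural continuation after "Answer:" is the label.
    prompt = (
        f'Sentence: "{sentence}"\n'
        "Question: Does this sentence have a positive or negative sentiment?\n"
        "Answer:"
    )
    return generate(prompt).strip()

print(classify_sentiment("I loved this movie"))  # positive
print(classify_sentiment("The food was awful"))  # negative
```

Swapping the mock for a real completion endpoint turns this same template into the zero-shot classification the transcript describes.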
These models have seen so much data (again, data with a capital D, terabytes of it) that by the time they're applied to small tasks, they can drastically outperform a model trained on only a few data points. The second advantage of these models is the productivity gains. As I said earlier, through prompting or tuning, you need far less labeled data to get to a task-specific model than if you had to start from scratch, because your model takes advantage of all the unlabeled data it saw in its pre-training on this generative task. With these advantages come some disadvantages that are important to keep in mind. The first of those is the compute cost. The penalty for having these models see so much data is that they're very expensive to train, making it difficult for smaller enterprises to train a foundation model on their own. By the time they reach a huge size, a couple of billion parameters, they're also very expensive to run inference on. You might require multiple GPUs at a time just to host these models and run inference, making them a more costly method than traditional approaches. The second disadvantage of these models is on the trustworthiness side. Just as data is a huge advantage for these models, the fact that they've seen so much unstructured data also comes at a cost, especially in a domain like language. A lot of these models are trained on language data that's been scraped from the Internet, and there's so much of it that even if you had a whole team of human annotators, you wouldn't be able to go through and actually vet every single data point to make sure it wasn't biased and didn't contain hate speech or other toxic information. And that's just assuming you actually know what the data is.
Often, for a lot of these open-source models that have been posted, we don't even know what the exact datasets are that the models were trained on, leading to trustworthiness issues. So IBM recognizes the huge potential of these technologies, and my partners in IBM Research are working on multiple different innovations to improve both the efficiency of these models and their trustworthiness and reliability, to make them more relevant in a business setting. All of the examples I've talked through so far have been on the language side. But the reality is, there are a lot of other domains that foundation models can be applied to. Famously, we've seen foundation models for vision, such as DALL-E 2, which takes text data and uses it to generate a custom image. We've seen models for code, with products like Copilot, that can help complete code as it's being authored. And IBM is innovating across all of these domains. So whether it's language models that we're building into products like Watson Assistant and Watson Discovery, vision models that we're building into products like Maximo Visual Inspection, or Ansible code models that we're building with our partners at Red Hat under Project Wisdom, we're innovating across all of these domains and more. We're working on chemistry: for example, we just published and released MoLFormer, a foundation model to promote molecule discovery for different targeted therapeutics. And we're working on models for climate change, building Earth-science foundation models that use geospatial data to improve climate research. I hope you found this video both informative and helpful. If you're interested in learning more, particularly how IBM is working to address some of these disadvantages, making foundation models more trustworthy and more efficient, please take a look at the links below. Thank you.
Well explained
Bring restrictions to data collection and set a process to take data only from trusted sources; otherwise, generative models have every possibility of giving a wrong outcome for a given task. Thank you for making this deep, model-wise explanation.
Excellent presentation
Complex subject of AI explained so simply
❤
Kate, can you give us a simple AI example for comparison that is solved using classical and modern DL-based AI techniques? For example, how would you generate the next word to complete a sentence using the classical techniques, i.e. perform GenAI using classical techniques?
Thanks for teaching this Generative AI model in such a brief time.
This is exactly what I am thinking about AI as a computer science student. Thanks a lot for all the explanations and the encouragement this video gave me.
Very nice 🎉
You just gave me a good idea: since Apple encodes left to right and Android right to left, could we somehow program Apple to be like the front of a mirror with Android behind it? That would line them up identically, each in the same one-dimensional direction of view, even as they move around.
When creating educational content on YouTube, it's essential to remember that your audience is largely comprised of the general public, who may not have prior knowledge of the subject matter. To effectively educate them, you should use simple, everyday language that's easy to understand. Additionally, using relatable examples that are familiar to your viewers can help illustrate complex concepts and make your content more accessible and engaging.
I want to know how you are writing backwards…!!!
Why work harder when AI works smarter? 💪
superb man thank you
Generative AI, foundation models, and LLMs are very powerful and great innovations, but ultimately they are trained on data. My question is: how do they determine whether an underlying fact hidden in the data is 100% true or a lie? For example, there is a lot of fake news on the internet, generated daily and spreading false information. Will these models learn this fake news as well? How do they penalize such data to bring it closer to reality before training the LLMs? How is this preprocessing done? Anyone, please help me understand. Thanks in advance!
Are the code models a subset of LLMs, or are they considered their own class? Is there a hierarchy? Thanks for the great content.
very informative
She is not writing inverted, she is left handed.
I love it. Why do you use your left hand to write?
Very interesting education – Thanks for this presentation
I only just began watching your video, and I am already impressed by your ability to legibly write backwards on the transparent wall in front of you.
This should be reviewed, as it is not in sync with their other videos. For instance, she puts GenAI on top of the Foundation Model, while most of their (IBM) videos put GenAI under the FM. This creates confusion.
It doesn't make sense at first, but at the end it all links. Thank you IBM.
TY