Episode 397 – Local LLMs: Why Every Microsoft 365 & Azure Pro Should Explore Them

March 13, 2025

Episode Transcript

Welcome to Episode 397 of the Microsoft Cloud IT Pro Podcast. In this episode, Scott and Ben dive into the world of local LLMs—large language models that run entirely on your device. We’re going to explore why more IT pros and developers are experimenting with them, the kinds of models you can run, and how you can integrate them directly into your workflow, including in Visual Studio Code for AI-assisted coding.

Your support makes this show possible! Please consider becoming a premium member for access to live shows and more. Check out our membership options.

Show Notes

About the sponsors

Would you like to become the irreplaceable Microsoft 365 resource for your organization? Let us know!

See full episode transcriptTranscript is autogenerated by AI

1 00:00:03,520 --> 00:00:06,080 Welcome to episode 397 2 00:00:06,080 --> 00:00:09,279 of the Microsoft Cloud IT Pro podcast recorded 3 00:00:09,279 --> 00:00:11,939 live on 03/10/2025. 4 00:00:12,160 --> 00:00:14,480 This is a show about Microsoft three sixty 5 00:00:14,480 --> 00:00:16,554 five and Azure from the perspective of IT 6 00:00:16,554 --> 00:00:18,714 pros and end users, where we discuss the 7 00:00:18,714 --> 00:00:20,875 topic or recent news and how it relates 8 00:00:20,875 --> 00:00:23,114 to you. We've been talking a lot about 9 00:00:23,114 --> 00:00:24,494 AI recently, particularly 10 00:00:24,954 --> 00:00:26,175 Microsoft Copilots. 11 00:00:26,875 --> 00:00:28,714 But what if you want to play around 12 00:00:28,714 --> 00:00:31,960 with AI outside of Copilot or chat GPT 13 00:00:31,960 --> 00:00:34,600 or any other hosted AI tool? In today's 14 00:00:34,600 --> 00:00:35,100 episode, 15 00:00:35,479 --> 00:00:37,559 Scott and Ben dive into the world of 16 00:00:37,559 --> 00:00:38,539 local LLMs, 17 00:00:39,000 --> 00:00:41,659 large language models, that run entirely 18 00:00:42,039 --> 00:00:44,295 on your device. We look at what models 19 00:00:44,295 --> 00:00:45,975 you can run, how you can integrate them 20 00:00:45,975 --> 00:00:47,914 into your workflow, and more. 21 00:00:49,975 --> 00:00:52,375 Oh, Scott. Here we are back in the 22 00:00:52,375 --> 00:00:55,495 stormy South. Stormy South. It has been stormy, 23 00:00:55,495 --> 00:00:57,539 but it's bright and sunny now. So I'll 24 00:00:57,539 --> 00:00:58,579 take it while I can get it. I 25 00:00:58,579 --> 00:01:00,579 don't have anything to go with Nordic. From 26 00:01:00,579 --> 00:01:02,740 the Nordic North, I'm back to the stormy 27 00:01:02,740 --> 00:01:03,240 South. 28 00:01:04,579 --> 00:01:06,659 From sea to shining sea and everything in 29 00:01:06,659 --> 00:01:08,260 between the seas? As long as you count 30 00:01:08,260 --> 00:01:10,005 Lake Michigan as the sea, which if you're 31 00:01:10,005 --> 00:01:12,084 from Michigan, you do. Like, East Coast, West 32 00:01:12,084 --> 00:01:13,924 Coast in Michigan are Lake Michigan and Lake 33 00:01:13,924 --> 00:01:16,245 Huron. We don't really count oceans in Michigan. 34 00:01:16,245 --> 00:01:18,084 Some of those lakes are kinda big, so, 35 00:01:18,484 --> 00:01:20,084 you might even say they're great. They could 36 00:01:20,084 --> 00:01:21,704 be great. It's always interesting. 37 00:01:22,005 --> 00:01:24,165 Side topic, coming down to Florida and talking 38 00:01:24,165 --> 00:01:26,549 to people about lakes and being from Michigan 39 00:01:26,549 --> 00:01:27,370 and Lake Michigan 40 00:01:27,750 --> 00:01:29,530 and Lake Superior and 41 00:01:29,829 --> 00:01:31,109 they're like, but it's a lake. And I'm 42 00:01:31,109 --> 00:01:32,790 like, yeah, but you can't see across it. 43 00:01:32,790 --> 00:01:34,629 So it kinda looks like an ocean when 44 00:01:34,629 --> 00:01:36,549 you're standing on the shore, and we get 45 00:01:36,549 --> 00:01:38,134 waves that are like, well, I think the 46 00:01:38,134 --> 00:01:40,614 biggest waves I've ever recorded in Lake Michigan 47 00:01:40,614 --> 00:01:42,774 were like 25, 20 six feet, and Lake 48 00:01:42,774 --> 00:01:44,774 Superior was up to 32 49 00:01:44,774 --> 00:01:46,854 foot waves. It's like these are not just 50 00:01:46,854 --> 00:01:49,414 like little lakes. These are massive bodies of 51 00:01:49,414 --> 00:01:49,914 water. 52 00:01:50,900 --> 00:01:53,159 They're really, really big ponds, you know. 53 00:01:53,700 --> 00:01:55,060 Joshua Foer: So now we're going across the 54 00:01:55,060 --> 00:01:56,579 pond and that means across Lake Michigan. Joshua 55 00:01:56,579 --> 00:01:58,019 Foer: An LLM what it thinks. Joshua Foer: 56 00:01:58,019 --> 00:02:00,500 We should ask an LLM because we're going 57 00:02:00,500 --> 00:02:02,099 to talk about LLMs. Joshua Foer: We're back 58 00:02:02,099 --> 00:02:04,579 to those things again. So I wanted to 59 00:02:04,579 --> 00:02:05,640 have a chat today 60 00:02:06,019 --> 00:02:06,519 about 61 00:02:07,034 --> 00:02:07,534 LLMs 62 00:02:07,994 --> 00:02:08,495 and 63 00:02:08,955 --> 00:02:11,514 running them locally. Like, I've been doing this 64 00:02:11,514 --> 00:02:13,675 more and more, and I think there's an 65 00:02:13,675 --> 00:02:16,155 interesting set of use cases and workflows. And 66 00:02:16,155 --> 00:02:17,614 I was having a chat with you, 67 00:02:17,914 --> 00:02:20,340 and this isn't something that you do in 68 00:02:20,340 --> 00:02:22,099 your kinda day today from what it sounds 69 00:02:22,099 --> 00:02:23,699 like, but maybe I can, like, get you 70 00:02:23,699 --> 00:02:25,219 in and and and hook you in a 71 00:02:25,219 --> 00:02:27,060 little bit along the way. Oh, you've already 72 00:02:27,060 --> 00:02:28,340 got me hooked. You sent me a few 73 00:02:28,340 --> 00:02:30,340 YouTube videos, and I started watching it, and 74 00:02:30,340 --> 00:02:31,639 the wheels started clicking. 75 00:02:31,955 --> 00:02:34,194 And I have one of the browser tabs 76 00:02:34,194 --> 00:02:35,634 up here. We'll put a link to it, 77 00:02:35,634 --> 00:02:37,555 Scott, about a use case that I already 78 00:02:37,555 --> 00:02:39,254 have for a local LLM. 79 00:02:39,715 --> 00:02:41,574 And you definitely got my wheels 80 00:02:41,875 --> 00:02:42,935 turning about 81 00:02:43,235 --> 00:02:45,715 what possibilities there are about how some of 82 00:02:45,715 --> 00:02:48,310 this works. In Microsoft three sixty five, I 83 00:02:48,310 --> 00:02:50,469 have played around with Copilot. I know a 84 00:02:50,469 --> 00:02:52,229 fair amount, but I've never really looked at 85 00:02:52,229 --> 00:02:54,650 running them locally and bewet my appetite 86 00:02:54,949 --> 00:02:57,370 for this. So this will be an interesting 87 00:02:57,430 --> 00:03:00,229 discussion, and I'm curious to see where it 88 00:03:00,229 --> 00:03:01,849 goes, your thoughts, 89 00:03:02,284 --> 00:03:03,025 my new 90 00:03:03,405 --> 00:03:06,444 thoughts. And my expanding list, Scott, you added 91 00:03:06,444 --> 00:03:08,444 something new to my list. I was doing 92 00:03:08,444 --> 00:03:10,444 so good. It's been a hot minute, but 93 00:03:10,444 --> 00:03:12,044 I I I think this is an important 94 00:03:12,044 --> 00:03:14,064 one. So as we talk about 95 00:03:14,540 --> 00:03:15,040 the 96 00:03:15,659 --> 00:03:16,879 kinda growth 97 00:03:17,259 --> 00:03:17,759 of 98 00:03:18,219 --> 00:03:19,199 generative AI 99 00:03:19,819 --> 00:03:22,800 and models along the way for, 100 00:03:23,580 --> 00:03:26,139 you know, certainly the the copilots of the 101 00:03:26,139 --> 00:03:26,639 world, 102 00:03:27,094 --> 00:03:27,754 the OpenAI's, 103 00:03:28,775 --> 00:03:30,474 Anthropic with Claude, 104 00:03:30,854 --> 00:03:33,034 DeepSeek with r one, 105 00:03:33,655 --> 00:03:35,655 all all these different kinds of things that 106 00:03:35,655 --> 00:03:37,514 exist out there. So 107 00:03:38,134 --> 00:03:39,655 they're they're nice that you can run them 108 00:03:39,655 --> 00:03:40,474 in a service. 109 00:03:40,909 --> 00:03:43,389 And I think most of us have kind 110 00:03:43,389 --> 00:03:46,030 of grown accustomed to that, and and it's 111 00:03:46,030 --> 00:03:47,550 it's a place that most of us are 112 00:03:47,550 --> 00:03:49,629 comfortable. Like, we know how to sign in 113 00:03:49,629 --> 00:03:51,650 to chat GPT on the web and maybe 114 00:03:51,870 --> 00:03:52,370 either 115 00:03:52,915 --> 00:03:56,034 have a chat with an LLM and and 116 00:03:56,034 --> 00:03:58,275 do some structured prompting and and try and 117 00:03:58,275 --> 00:03:59,895 get some responses out of it 118 00:04:00,194 --> 00:04:00,694 versus 119 00:04:01,395 --> 00:04:03,955 things like ChatGPT web search. And it's great. 120 00:04:03,955 --> 00:04:05,655 Right? It's it's all cloud based. 121 00:04:06,030 --> 00:04:07,870 Some of them are free. Some of them 122 00:04:07,870 --> 00:04:08,610 cost money. 123 00:04:09,069 --> 00:04:10,930 They really only start to get powerful 124 00:04:11,229 --> 00:04:14,189 when they do cost money. So now you're 125 00:04:14,189 --> 00:04:16,029 in the world where you're relying on this 126 00:04:16,029 --> 00:04:18,769 external service. You're gonna pay per request. 127 00:04:19,175 --> 00:04:22,055 And probably most importantly, there's a privacy angle 128 00:04:22,055 --> 00:04:24,714 here where you're sending your data out into 129 00:04:24,935 --> 00:04:27,095 the wild. Like, when you're chatting with Chat 130 00:04:27,095 --> 00:04:28,475 GPT in the web interface, 131 00:04:28,775 --> 00:04:30,855 you're passing that data to them. We saw 132 00:04:30,855 --> 00:04:33,095 this with DeepSeq. When DeepSeq kinda came out 133 00:04:33,095 --> 00:04:33,754 of the woodwork 134 00:04:34,099 --> 00:04:35,860 a couple weeks ago and the market freaked 135 00:04:35,860 --> 00:04:37,139 out. You know, they were about a month 136 00:04:37,139 --> 00:04:38,740 behind freaking out when it had actually been 137 00:04:38,740 --> 00:04:42,039 released. But that said, you know, DeepSeek immediately 138 00:04:42,099 --> 00:04:44,680 had a data leak and people broke in 139 00:04:44,819 --> 00:04:46,740 and they got all the usernames, they got 140 00:04:46,740 --> 00:04:48,795 the passwords, they got the prompts that were 141 00:04:48,795 --> 00:04:50,714 flowing through that system, things like that. So 142 00:04:50,714 --> 00:04:52,394 I think one of the most powerful things 143 00:04:52,394 --> 00:04:54,654 here is the ability to 144 00:04:55,514 --> 00:04:56,894 run a local LLM 145 00:04:57,274 --> 00:04:57,774 with 146 00:04:58,154 --> 00:05:00,329 data privacy in mind. So I'm gonna run 147 00:05:00,329 --> 00:05:02,329 these things locally. They're only going to be 148 00:05:02,329 --> 00:05:04,649 on my machine. They're not gonna communicate with 149 00:05:04,649 --> 00:05:06,810 the outside world. And then if you're in 150 00:05:06,810 --> 00:05:09,289 that world of, you know, being a little 151 00:05:09,289 --> 00:05:10,509 bit more cost conscious, 152 00:05:11,129 --> 00:05:13,370 you might wanna try some of these things 153 00:05:13,370 --> 00:05:15,915 out without paying per request in a service 154 00:05:15,915 --> 00:05:18,394 like chat, GPT, or Claude, or or something 155 00:05:18,394 --> 00:05:21,035 like that. And in that world, you're gonna 156 00:05:21,035 --> 00:05:23,134 also have a cost savings angle. 157 00:05:23,514 --> 00:05:25,214 You're gonna have offline capabilities. 158 00:05:25,675 --> 00:05:28,095 So the ability to chat with these models 159 00:05:28,154 --> 00:05:30,470 locally can be a little bit interesting 160 00:05:30,930 --> 00:05:32,470 and and how all that composes. 161 00:05:32,930 --> 00:05:35,110 And, you know, I think the kicker is 162 00:05:35,490 --> 00:05:37,490 most of us are geeks, and we run 163 00:05:37,490 --> 00:05:40,129 around with these really powerful computers. You know, 164 00:05:40,129 --> 00:05:42,689 you've got a laptop with gobs and gobs 165 00:05:42,689 --> 00:05:45,214 of RAM on it, and it's running a 166 00:05:45,214 --> 00:05:48,254 modern processor, it's got a GPU, it's got 167 00:05:48,254 --> 00:05:48,995 an MP, 168 00:05:49,694 --> 00:05:51,134 you know, you might be sitting there at 169 00:05:51,134 --> 00:05:52,895 home and you're like a PC gamer and 170 00:05:52,895 --> 00:05:54,274 that's how you, 171 00:05:54,574 --> 00:05:56,574 you know, just relax at the end of 172 00:05:56,574 --> 00:05:58,319 the day. Well, guess what? You got that 173 00:05:58,319 --> 00:06:00,639 monster GPU, you know, that fifty ninety or 174 00:06:00,639 --> 00:06:02,399 whatever that you can potentially use during the 175 00:06:02,399 --> 00:06:04,319 day with these things. And it turns out 176 00:06:04,319 --> 00:06:06,479 that you might actually chat with local LMs, 177 00:06:06,479 --> 00:06:08,560 like, more than you think. You know, like 178 00:06:08,720 --> 00:06:10,740 so we've talked about how we're Apple users, 179 00:06:11,305 --> 00:06:14,185 so iOS, things like that. The predictive text 180 00:06:14,185 --> 00:06:17,004 on iOS is all based on an LLM. 181 00:06:17,144 --> 00:06:18,764 It's based on a transformer. 182 00:06:19,384 --> 00:06:21,144 So that thing is running a local model. 183 00:06:21,144 --> 00:06:23,589 Well, you can run those similar models on 184 00:06:23,589 --> 00:06:25,669 your side. So it gives you this really 185 00:06:25,669 --> 00:06:29,689 interesting opportunity to kinda take advantage of AI 186 00:06:30,149 --> 00:06:32,789 while maintaining the privacy aspects, maybe letting you 187 00:06:32,789 --> 00:06:34,229 play with new things. Like, if you wanna 188 00:06:34,229 --> 00:06:36,389 play with DeepSeek without signing up for the 189 00:06:36,389 --> 00:06:37,370 DeepSeek service, 190 00:06:37,685 --> 00:06:39,525 like, hey, that that that that's a great 191 00:06:39,525 --> 00:06:40,725 way to do it. So we'll talk a 192 00:06:40,725 --> 00:06:42,085 little bit about that and kind of some 193 00:06:42,085 --> 00:06:43,925 of the advantages and what you can get 194 00:06:43,925 --> 00:06:46,904 on with. We should also talk about what 195 00:06:47,525 --> 00:06:49,064 folks can actually run, 196 00:06:49,365 --> 00:06:52,439 like, what's useful useful that can run locally 197 00:06:52,439 --> 00:06:53,879 for you. So we're gonna talk a little 198 00:06:53,879 --> 00:06:56,779 bit about, like, parameter size in a model 199 00:06:56,839 --> 00:06:59,800 and how big these things are. So turns 200 00:06:59,800 --> 00:07:02,300 out there's a big difference between a 201 00:07:02,680 --> 00:07:06,044 1,000,000,000 parameter model, a 7,000,000,000 parameter model, a 202 00:07:06,044 --> 00:07:06,925 65,000,000,000 203 00:07:06,925 --> 00:07:09,004 parameter model, or, you know, like I said, 204 00:07:09,004 --> 00:07:10,384 if you wanna play around with DeepSeq, 205 00:07:10,764 --> 00:07:12,444 I was watching some videos on YouTube of 206 00:07:12,444 --> 00:07:15,264 people who are playing around with some clustered 207 00:07:15,644 --> 00:07:16,144 servers 208 00:07:16,604 --> 00:07:18,285 to do, like, 400,000,000,000 209 00:07:18,285 --> 00:07:20,660 parameter model runs. And, you know, you can't 210 00:07:20,660 --> 00:07:22,899 run, like, 400,000,000,000 parameters locally. You need, like, 211 00:07:22,899 --> 00:07:25,220 a distributed system, and, you you know, you 212 00:07:25,220 --> 00:07:27,860 can potentially do it across a series of 213 00:07:27,860 --> 00:07:30,339 servers within your premises. But that said, like, 214 00:07:30,339 --> 00:07:32,339 those aren't for everybody. They're gonna be too 215 00:07:32,339 --> 00:07:35,060 slow, cost a bunch for the GPUs, things 216 00:07:35,060 --> 00:07:36,555 like that. So we'll talk a little bit 217 00:07:36,555 --> 00:07:38,954 about that, about like parameters and, you know, 218 00:07:38,954 --> 00:07:42,235 maybe where more parameters doesn't always mean, like, 219 00:07:42,235 --> 00:07:44,735 better results. I think that's important too. 220 00:07:45,035 --> 00:07:46,714 There there's a little bit of nuance and 221 00:07:46,714 --> 00:07:48,735 kind of trade off here between 222 00:07:49,240 --> 00:07:51,720 speed of response, like how many tokens can 223 00:07:51,720 --> 00:07:53,639 an LLM respond back to you with, what's 224 00:07:53,639 --> 00:07:56,519 the accuracy of that, and probably most importantly, 225 00:07:56,519 --> 00:07:58,279 like what are the compute requirements on your 226 00:07:58,279 --> 00:08:00,039 end. So like the things that I'm gonna 227 00:08:00,039 --> 00:08:01,800 talk about that I run today, so I 228 00:08:01,800 --> 00:08:03,819 rock an m MaxBook Pro 229 00:08:04,295 --> 00:08:06,455 most of the time, and that's kind of 230 00:08:06,455 --> 00:08:07,975 like what I'm running on. And I've got, 231 00:08:07,975 --> 00:08:09,735 you know, 32 gigs of RAM in there, 232 00:08:09,735 --> 00:08:11,595 and and I'm all set in my world. 233 00:08:11,895 --> 00:08:14,214 You have a a different model on a 234 00:08:14,214 --> 00:08:15,035 different processor 235 00:08:15,574 --> 00:08:19,009 with more memory and potentially more GPUs, so 236 00:08:19,009 --> 00:08:20,529 you'll be able to run, like, maybe even, 237 00:08:20,529 --> 00:08:22,870 like, bigger things than I can run here. 238 00:08:22,930 --> 00:08:24,310 And that's okay. And then, 239 00:08:24,770 --> 00:08:27,089 you know, your mileage may vary. But it's 240 00:08:27,089 --> 00:08:28,930 kind of like anybody can get started with 241 00:08:28,930 --> 00:08:31,435 these things, even on, like, a little, 242 00:08:31,975 --> 00:08:34,294 you know, off the shelf NUC kind of 243 00:08:34,294 --> 00:08:37,254 PC or things like that. So beyond chatting 244 00:08:37,254 --> 00:08:38,075 with these things, 245 00:08:38,774 --> 00:08:40,615 you can also use them to empower your 246 00:08:40,615 --> 00:08:41,115 workflows. 247 00:08:41,654 --> 00:08:44,375 So you can use local AI models with 248 00:08:44,375 --> 00:08:46,830 Visual Studio Code. Like, you might sit out 249 00:08:46,830 --> 00:08:47,809 and go and say, 250 00:08:48,590 --> 00:08:50,929 I'm coding a dot net application. 251 00:08:51,470 --> 00:08:54,830 Let me go find the best LLM model 252 00:08:54,830 --> 00:08:56,210 for dot net applications, 253 00:08:56,669 --> 00:08:58,074 but I don't wanna pay for it. I 254 00:08:58,074 --> 00:09:00,074 I don't wanna, like, go to OpenAI or 255 00:09:00,074 --> 00:09:02,154 Anthropic and and do the cloud thing, anything 256 00:09:02,154 --> 00:09:03,995 like that. Well, maybe you can go out 257 00:09:03,995 --> 00:09:05,754 and actually just download a model and run 258 00:09:05,754 --> 00:09:07,595 it locally, and we'll kind of talk about 259 00:09:07,595 --> 00:09:10,074 the hosting engines for these things that expose 260 00:09:10,074 --> 00:09:12,730 things like standard OpenAI endpoints. So you can 261 00:09:12,730 --> 00:09:15,629 literally point Versus Code at a local LLM 262 00:09:15,769 --> 00:09:17,529 and have it write you PowerShell and all 263 00:09:17,529 --> 00:09:19,210 those things that are, like, private just to 264 00:09:19,210 --> 00:09:19,870 your machine 265 00:09:20,170 --> 00:09:23,210 without having to go out to the Internet 266 00:09:23,210 --> 00:09:25,154 and get those kinds of things done. So 267 00:09:25,235 --> 00:09:26,915 I think that's a fun little way to 268 00:09:26,915 --> 00:09:29,254 kind of think about integrating these things 269 00:09:29,715 --> 00:09:31,555 into your life and how they come together. 270 00:09:31,555 --> 00:09:33,154 So we just kind of want to go 271 00:09:33,154 --> 00:09:35,475 end to end and full circle between, can 272 00:09:35,475 --> 00:09:37,975 you run your own chat GPT 273 00:09:38,675 --> 00:09:39,175 like 274 00:09:39,600 --> 00:09:40,259 thing, model 275 00:09:41,039 --> 00:09:43,600 locally? And the answer is yes. So, yeah, 276 00:09:43,600 --> 00:09:45,440 like we should just kind of have a 277 00:09:45,440 --> 00:09:45,940 conversation 278 00:09:46,799 --> 00:09:47,700 about that. So 279 00:09:48,399 --> 00:09:50,639 why don't we start with like the whole 280 00:09:50,639 --> 00:09:53,360 data and privacy cost efficiency thing and all 281 00:09:53,360 --> 00:09:54,904 that stuff? I think that's one of the 282 00:09:54,904 --> 00:09:56,924 ones that can be super important 283 00:09:58,024 --> 00:10:00,105 that people think about. And kinda like you 284 00:10:00,105 --> 00:10:02,345 said, the deep sea click exposed to millions 285 00:10:02,345 --> 00:10:02,845 sensitive 286 00:10:03,225 --> 00:10:05,544 data records. One thing I've heard even when 287 00:10:05,544 --> 00:10:07,565 you start looking at things like ChatGPT 288 00:10:08,105 --> 00:10:08,605 versus 289 00:10:09,304 --> 00:10:11,759 Copilot and Microsoft three sixty five and going 290 00:10:11,759 --> 00:10:14,580 back to the local ones or doing OpenAI 291 00:10:14,720 --> 00:10:16,420 in Azure or 292 00:10:16,960 --> 00:10:19,680 something in AWS is it it very much 293 00:10:19,680 --> 00:10:21,279 goes back to where does that data go. 294 00:10:21,279 --> 00:10:23,680 Some people see rolling out Copilot as a 295 00:10:23,680 --> 00:10:26,365 security benefit because then they're not taking all 296 00:10:26,365 --> 00:10:27,585 that data from 297 00:10:27,965 --> 00:10:30,845 SharePoint, from Teams, from their Microsoft three sixty 298 00:10:30,845 --> 00:10:33,424 five tenant, sending it out into ChatGPT 299 00:10:33,804 --> 00:10:36,524 where it's escaping that Microsoft three sixty five 300 00:10:36,524 --> 00:10:39,470 boundary. OpenAI and Azure, same thing. If all 301 00:10:39,470 --> 00:10:42,210 your data's up in Azure somewhere, if you're 302 00:10:42,509 --> 00:10:45,309 working with Scott to store petabytes of data 303 00:10:45,309 --> 00:10:47,309 in blob storage and you want that to 304 00:10:47,309 --> 00:10:49,470 be used for OpenAI, you can do that. 305 00:10:49,470 --> 00:10:50,909 But then you do get into this local 306 00:10:50,909 --> 00:10:52,129 thing. All your data's 307 00:10:52,595 --> 00:10:54,835 local. Or one of the scenarios I have 308 00:10:54,835 --> 00:10:56,514 that we can put a link to is 309 00:10:56,514 --> 00:10:58,754 I use Home Assistant for all my smart 310 00:10:58,754 --> 00:10:59,894 home stuff because 311 00:11:00,355 --> 00:11:01,254 I like everything 312 00:11:01,555 --> 00:11:03,235 local. I don't want it all going out 313 00:11:03,235 --> 00:11:05,495 to relying on Samsung or 314 00:11:05,795 --> 00:11:08,370 any of those. What if you wanna integrate 315 00:11:08,590 --> 00:11:11,789 AI into your local smart home stuff and, 316 00:11:11,789 --> 00:11:14,129 again, you wanna keep it all internal? You're 317 00:11:14,190 --> 00:11:15,809 in an industry where 318 00:11:16,269 --> 00:11:18,590 you need to keep things on premises for 319 00:11:18,590 --> 00:11:21,115 some reason or certain regulations around that. I 320 00:11:21,115 --> 00:11:23,215 think there's a huge benefit to doing 321 00:11:23,595 --> 00:11:26,095 local AI, whether it's at that small 322 00:11:26,475 --> 00:11:27,375 in your house, 323 00:11:27,915 --> 00:11:29,855 you and I type scenario of 324 00:11:30,235 --> 00:11:33,215 smart home or something here or large enterprises 325 00:11:33,754 --> 00:11:38,220 that have very stringent data requirements and need 326 00:11:38,220 --> 00:11:40,460 to run it locally, maybe in their own 327 00:11:40,460 --> 00:11:43,519 data centers in clusters that they build internally 328 00:11:43,580 --> 00:11:45,820 and stuff. Home Assistant is a fun one. 329 00:11:45,820 --> 00:11:48,294 So if you think about AI and Home 330 00:11:48,294 --> 00:11:49,894 Assistant and what they're doing with like Home 331 00:11:49,894 --> 00:11:52,054 Assistant voice and some of those things, it 332 00:11:52,054 --> 00:11:55,414 relies on two paths. One is text to 333 00:11:55,414 --> 00:11:58,054 speech. So can I have Home Assistant talk 334 00:11:58,054 --> 00:11:59,735 to me? So some text goes in and 335 00:11:59,735 --> 00:12:01,195 can I have it talk back to me? 336 00:12:01,419 --> 00:12:03,980 And then it's also speech to text in 337 00:12:03,980 --> 00:12:05,600 the form of things maybe 338 00:12:05,980 --> 00:12:06,879 like Whisper, 339 00:12:07,259 --> 00:12:09,259 which is, you know, typically what I see 340 00:12:09,259 --> 00:12:11,660 integrated with most on that side. In fact, 341 00:12:11,660 --> 00:12:15,179 we use Whisper for generating transcripts sometimes for 342 00:12:15,179 --> 00:12:17,205 the show. So it's not just LLMs. 343 00:12:17,665 --> 00:12:19,665 It could be things like text to speech, 344 00:12:19,665 --> 00:12:22,165 speech to text. Could also be image generation. 345 00:12:22,225 --> 00:12:24,384 Like, if somebody is looking to, like, play 346 00:12:24,384 --> 00:12:26,945 around with stable diffusion, that that runs pretty 347 00:12:26,945 --> 00:12:27,764 well locally 348 00:12:28,269 --> 00:12:29,870 on most of these things as well. It 349 00:12:29,870 --> 00:12:31,549 could be a little bit slow, but, hey, 350 00:12:31,549 --> 00:12:33,629 that that's okay. That's that's part of the 351 00:12:33,629 --> 00:12:35,950 trade off of not having to pay and 352 00:12:35,950 --> 00:12:38,110 and push these things through. But I I 353 00:12:38,110 --> 00:12:39,950 think the most important thing is just when 354 00:12:39,950 --> 00:12:44,274 you're running an LLM locally, you're basically mitigating 355 00:12:44,274 --> 00:12:45,495 a bunch of that risk 356 00:12:46,115 --> 00:12:48,834 of having to worry about compliance, having to 357 00:12:48,834 --> 00:12:51,735 worry about legal concerns. Like, hey, I'm submitting, 358 00:12:51,875 --> 00:12:54,115 like, this thing that's important to me. Like, 359 00:12:54,115 --> 00:12:56,274 I'm never like, for example, I'm never going 360 00:12:56,274 --> 00:12:57,495 to chat with my taxes 361 00:12:58,039 --> 00:13:00,439 with anything other than, like, a local LLM 362 00:13:00,439 --> 00:13:01,879 to help me break some of that stuff 363 00:13:01,879 --> 00:13:02,379 down. 364 00:13:03,159 --> 00:13:05,240 But, you know, somebody else might be out 365 00:13:05,240 --> 00:13:07,000 there, but good good luck when you're when 366 00:13:07,000 --> 00:13:08,759 you're in the next data breach or or 367 00:13:08,759 --> 00:13:11,345 or whatever happens. So there's things like that. 368 00:13:11,345 --> 00:13:12,945 I think the other one that's important to 369 00:13:12,945 --> 00:13:15,424 consider is kind of the cost angle of 370 00:13:15,424 --> 00:13:17,904 things. Like, I'll be the first to admit 371 00:13:17,904 --> 00:13:20,225 that I'm pretty frugal. So if you're thinking 372 00:13:20,225 --> 00:13:22,544 about maybe like OpenAI and having to go 373 00:13:22,544 --> 00:13:24,945 out and pay for OpenAI, and you're either 374 00:13:24,945 --> 00:13:27,129 paying per request or you're on one of 375 00:13:27,129 --> 00:13:28,889 the monthly plans. And those can get pretty 376 00:13:28,889 --> 00:13:30,250 expensive. Right? If you wanna get up there, 377 00:13:30,250 --> 00:13:31,769 you can spend up to, like, $200 a 378 00:13:31,769 --> 00:13:34,169 month. But typically, they're on the order of, 379 00:13:34,169 --> 00:13:34,669 like, 380 00:13:35,129 --> 00:13:38,990 you know, 1¢ US per 1,000 tokens. 381 00:13:39,365 --> 00:13:40,964 And then you're like, Well, what's a token? 382 00:13:40,964 --> 00:13:43,125 Like, how many words comprise a token? Like, 383 00:13:43,125 --> 00:13:44,164 it can be a little bit weird to 384 00:13:44,164 --> 00:13:45,845 figure out the pricing. So sometimes you just 385 00:13:45,845 --> 00:13:48,105 want to play around with these things locally 386 00:13:48,884 --> 00:13:50,824 without having that cost constraint, 387 00:13:51,204 --> 00:13:53,959 because costs can run away from you pretty 388 00:13:53,959 --> 00:13:56,620 quickly, especially if you're being like super chatty 389 00:13:57,000 --> 00:14:00,120 and doing longer chat threads and things like 390 00:14:00,120 --> 00:14:02,199 that. Or the other place they tend to 391 00:14:02,199 --> 00:14:03,179 get pretty expensive 392 00:14:03,559 --> 00:14:05,339 is if you're integrating 393 00:14:06,134 --> 00:14:06,634 these 394 00:14:07,095 --> 00:14:07,595 AIs 395 00:14:08,254 --> 00:14:08,754 into, 396 00:14:09,415 --> 00:14:11,815 like, your coding workflows, like, hey, you're you're 397 00:14:11,815 --> 00:14:12,855 out there and you're sitting there and you're 398 00:14:12,855 --> 00:14:15,095 like, I want a vibe code. Well, great. 399 00:14:15,095 --> 00:14:17,654 When you're like vibe coding across 10,000 lines 400 00:14:17,654 --> 00:14:18,929 of code, it starts 401 00:14:19,309 --> 00:14:21,790 to add up and get pretty expensive. So 402 00:14:21,790 --> 00:14:24,350 you already bought this, you know, honking computer. 403 00:14:24,350 --> 00:14:26,370 You got a GPU. It's got CPU. 404 00:14:26,830 --> 00:14:28,990 It's got a fast disk. You might as 405 00:14:28,990 --> 00:14:30,850 well use it for a little bit more 406 00:14:31,154 --> 00:14:33,794 than just writing your PowerShell scripts. Like, why 407 00:14:33,794 --> 00:14:35,235 why are you sitting there writing in Versus 408 00:14:35,235 --> 00:14:37,074 Code by hand when, you know, you could 409 00:14:37,074 --> 00:14:39,074 be just vibing your way through that stuff? 410 00:14:39,074 --> 00:14:41,014 For sure. And I think that's one thing. 411 00:14:41,074 --> 00:14:42,834 I guess I kind of always realized it 412 00:14:42,834 --> 00:14:44,834 in the back of my head, comparing local 413 00:14:44,834 --> 00:14:45,334 LLM 414 00:14:45,679 --> 00:14:48,559 to JetGPT to Copilot to cloud based, it 415 00:14:48,559 --> 00:14:50,100 kinda struck me that 416 00:14:50,720 --> 00:14:53,120 from a pricing perspective, when you're using cloud 417 00:14:53,120 --> 00:14:55,759 based LLMs, you're not paying for the models. 418 00:14:55,759 --> 00:14:58,319 Like, these companies, these models are all out 419 00:14:58,319 --> 00:14:58,819 there, 420 00:14:59,120 --> 00:14:59,940 whether it's 421 00:15:00,424 --> 00:15:00,924 DeepSeek 422 00:15:01,945 --> 00:15:02,764 or Lama 423 00:15:03,144 --> 00:15:05,225 or any of those. What you're really paying 424 00:15:05,225 --> 00:15:06,764 for is the compute to 425 00:15:07,225 --> 00:15:09,784 process the request to these models, and that's 426 00:15:09,784 --> 00:15:11,225 where that cost comes in. Do you wanna 427 00:15:11,225 --> 00:15:13,625 spend it in on premises hardware and hardware 428 00:15:13,625 --> 00:15:15,790 running in your house, or are you giving 429 00:15:15,790 --> 00:15:18,350 it to these cloud providers for the hardware 430 00:15:18,350 --> 00:15:21,470 out there running models that maybe you don't 431 00:15:21,470 --> 00:15:24,429 physically have the capability of running on your 432 00:15:24,429 --> 00:15:26,269 compute that you own? It is an interesting 433 00:15:26,269 --> 00:15:28,190 one. The other thing that, you know, like, 434 00:15:28,190 --> 00:15:29,629 once you get a little bit more advanced 435 00:15:29,629 --> 00:15:30,910 and you start going down the path of 436 00:15:30,910 --> 00:15:32,664 some of this stuff, if you really get 437 00:15:32,664 --> 00:15:35,065 into it, you start looking at things like 438 00:15:35,065 --> 00:15:35,804 fine tuning 439 00:15:36,184 --> 00:15:39,304 and doing RAG or retrieval augmented generation against 440 00:15:39,304 --> 00:15:41,144 things. So we'll put a link in the 441 00:15:41,144 --> 00:15:44,105 show notes to a Network Chuck episode where 442 00:15:44,105 --> 00:15:46,610 he talks about running local LMs. And one 443 00:15:46,610 --> 00:15:47,970 of the things that he does, he has 444 00:15:47,970 --> 00:15:50,370 this really interesting use case where when he 445 00:15:50,370 --> 00:15:53,110 attends church, all the sermons are transcribed, 446 00:15:53,649 --> 00:15:56,610 and he uses local LLMs to summarize the 447 00:15:56,610 --> 00:15:58,929 sermons for himself. Like, he doesn't always get 448 00:15:58,929 --> 00:16:00,769 to attend live, but he still wants to 449 00:16:00,769 --> 00:16:02,534 get the messaging out of it. So he 450 00:16:02,534 --> 00:16:05,095 does all that stuff like local LLM, and 451 00:16:05,095 --> 00:16:07,174 it's just all there ready to go. It 452 00:16:07,174 --> 00:16:08,154 does the transcription, 453 00:16:08,934 --> 00:16:10,855 like pulls it all off a YouTube thing, 454 00:16:10,855 --> 00:16:12,934 transcribes it, runs it through an LLM, gives 455 00:16:12,934 --> 00:16:14,730 him the summary, and then that summary is 456 00:16:14,730 --> 00:16:17,289 written back as a markdown file where it 457 00:16:17,289 --> 00:16:18,269 lands in Obsidian, 458 00:16:18,809 --> 00:16:21,289 and then he can just use his network 459 00:16:21,289 --> 00:16:23,450 brain in Obsidian to go and figure some 460 00:16:23,450 --> 00:16:25,049 of that stuff out too. So you can 461 00:16:25,049 --> 00:16:27,164 get pretty rich with these things if you 462 00:16:27,164 --> 00:16:29,644 start to kinda, run through the use cases. 463 00:16:29,644 --> 00:16:31,804 So we're, like, Network Chuck might be doing 464 00:16:31,965 --> 00:16:34,044 I might be working on coding a new 465 00:16:34,044 --> 00:16:34,544 application, 466 00:16:35,004 --> 00:16:37,004 and I just want it to learn off 467 00:16:37,004 --> 00:16:39,009 maybe an existing code base from, like, the 468 00:16:39,009 --> 00:16:41,029 previous two versions or iterations 469 00:16:41,409 --> 00:16:43,089 or things like that that I did along 470 00:16:43,089 --> 00:16:44,529 the way. So you can also do these 471 00:16:44,529 --> 00:16:46,850 things like fine tuning and get up and 472 00:16:46,850 --> 00:16:47,350 running 473 00:16:47,889 --> 00:16:50,529 pretty pretty quickly. It's actually, like, turns out 474 00:16:50,529 --> 00:16:51,970 a lot of the tooling's already out there. 475 00:16:51,970 --> 00:16:53,350 Like, these things are 476 00:16:53,730 --> 00:16:56,115 not the hardest thing to stand up. But 477 00:16:56,115 --> 00:16:57,875 before we stand them up, we should also 478 00:16:57,875 --> 00:16:59,955 probably talk a little bit about, like, what 479 00:16:59,955 --> 00:17:02,455 kinds of models you can run 480 00:17:02,914 --> 00:17:05,494 because your mileage may vary here based on 481 00:17:05,795 --> 00:17:08,375 your your hardware and what's available to you, 482 00:17:08,650 --> 00:17:11,049 your your network bandwidths, and a couple other 483 00:17:11,049 --> 00:17:11,549 things. 484 00:17:15,450 --> 00:17:17,690 Do you feel overwhelmed by trying to manage 485 00:17:17,690 --> 00:17:19,929 your Office three sixty five environment? Are you 486 00:17:19,929 --> 00:17:23,230 facing unexpected issues that disrupt your company's productivity? 487 00:17:23,529 --> 00:17:25,474 Intelligink is here to help. Much like you 488 00:17:25,474 --> 00:17:27,394 take your car to the mechanic that has 489 00:17:27,394 --> 00:17:29,474 specialized knowledge on how to best keep your 490 00:17:29,474 --> 00:17:32,454 car running, Intelligink helps you with your Microsoft 491 00:17:32,515 --> 00:17:34,774 cloud environment because that's their expertise. 492 00:17:35,154 --> 00:17:37,470 Intelligink keeps up with the latest updates in 493 00:17:37,470 --> 00:17:39,630 the Microsoft cloud to help keep your business 494 00:17:39,630 --> 00:17:41,869 running smoothly and ahead of the curve. Whether 495 00:17:41,869 --> 00:17:43,869 you are a small organization with just a 496 00:17:43,869 --> 00:17:46,349 few users up to an organization of several 497 00:17:46,349 --> 00:17:47,329 thousand employees, 498 00:17:47,710 --> 00:17:49,710 they want to partner with you to implement 499 00:17:49,710 --> 00:17:52,450 and administer your Microsoft cloud technology. 500 00:17:53,204 --> 00:17:56,744 Visit them at inteliginc.com/podcast. 501 00:17:56,964 --> 00:18:03,704 That's intelligink.com/podcast 502 00:18:04,085 --> 00:18:06,244 for more information or to schedule a thirty 503 00:18:06,244 --> 00:18:08,240 minute call to get started with them today. 504 00:18:08,539 --> 00:18:11,900 Remember, Intelligink focuses on the Microsoft cloud so 505 00:18:11,900 --> 00:18:13,680 you can focus on your business. 506 00:18:15,820 --> 00:18:17,900 So talking hardware, do you wanna drive into 507 00:18:17,900 --> 00:18:20,545 hardware or models? Where should we go? It's 508 00:18:20,545 --> 00:18:21,924 kinda like a both conversation. 509 00:18:22,305 --> 00:18:23,744 So I think we can cover kind of 510 00:18:23,744 --> 00:18:24,725 the whole parameterization 511 00:18:25,505 --> 00:18:26,005 question 512 00:18:26,545 --> 00:18:29,105 and how big these things are to run 513 00:18:29,105 --> 00:18:29,605 locally 514 00:18:29,984 --> 00:18:31,605 along with some of the hardware 515 00:18:31,904 --> 00:18:32,404 constraints 516 00:18:32,865 --> 00:18:33,365 that 517 00:18:33,769 --> 00:18:35,929 come along with them. So when you think 518 00:18:35,929 --> 00:18:37,849 about the models that you can run, one 519 00:18:37,849 --> 00:18:39,849 of the first things that's gonna happen is 520 00:18:39,849 --> 00:18:42,269 you might go out and grab Ollama, 521 00:18:42,569 --> 00:18:45,129 you might grab LM Studio. You're you're gonna 522 00:18:45,129 --> 00:18:47,845 grab some system that's going to let you 523 00:18:47,845 --> 00:18:48,345 basically 524 00:18:48,884 --> 00:18:51,684 run that model and be able to run 525 00:18:51,684 --> 00:18:54,644 prompts against it. So though those models are 526 00:18:54,644 --> 00:18:55,704 gonna have different 527 00:18:56,005 --> 00:18:56,505 sizes, 528 00:18:57,044 --> 00:19:00,299 and those sizes equate back to parameters. So 529 00:19:00,460 --> 00:19:01,580 you're gonna go out and you're gonna see 530 00:19:01,580 --> 00:19:04,320 things like, oh, I wanna run llama three. 531 00:19:04,940 --> 00:19:06,960 And llama three might have, 532 00:19:07,660 --> 00:19:10,460 you know, a 7,000,000,000 parameter model. It might 533 00:19:10,460 --> 00:19:12,380 have a 300,000,000,000 534 00:19:12,380 --> 00:19:15,144 parameter model. It could have a 1,000,000,000 parameter. 535 00:19:15,144 --> 00:19:16,985 It could have something that's even smaller than 536 00:19:16,985 --> 00:19:19,144 that. So these things start to kind of 537 00:19:19,144 --> 00:19:21,384 become important. So if you're thinking about, like, 538 00:19:21,384 --> 00:19:23,865 parameters, number of parameters in a model, which 539 00:19:23,865 --> 00:19:25,404 is going to equate to 540 00:19:25,705 --> 00:19:26,924 kind of functionality 541 00:19:27,384 --> 00:19:28,445 within that model, 542 00:19:28,840 --> 00:19:31,400 In some place like a 7,000,000,000 parameter model, 543 00:19:31,400 --> 00:19:33,480 if you're looking at, like, LAMA two seven 544 00:19:33,480 --> 00:19:36,600 b, you're looking at Mistral seven b, like, 545 00:19:36,600 --> 00:19:38,519 those are pretty good starting points, and you 546 00:19:38,519 --> 00:19:41,160 don't need a super monster laptop or desktop 547 00:19:41,160 --> 00:19:43,184 to do it, just something decent. So if 548 00:19:43,184 --> 00:19:45,585 you have about 16 gigs of RAM and 549 00:19:45,585 --> 00:19:47,904 some CPU, you're good. Like, you don't need 550 00:19:47,904 --> 00:19:50,464 a dedicated GPU. You can absolutely do this 551 00:19:50,464 --> 00:19:51,444 stuff on CPU. 552 00:19:51,904 --> 00:19:54,144 I hesitate to say fast. It'll be fast 553 00:19:54,144 --> 00:19:56,150 ish. It might feel a little bit slow, 554 00:19:56,150 --> 00:19:58,069 like you'll see, like, the words typing out 555 00:19:58,069 --> 00:20:00,230 on screen, but that's okay. That that kind 556 00:20:00,230 --> 00:20:01,750 of equates to the experience that you might 557 00:20:01,750 --> 00:20:03,589 have in a chat GPT or or a 558 00:20:03,589 --> 00:20:05,369 Claude or things like that. 559 00:20:05,829 --> 00:20:07,450 But they're also super lightweight. 560 00:20:07,845 --> 00:20:10,565 So you you can get models that potentially 561 00:20:10,565 --> 00:20:12,404 when you download the model, they're measured in, 562 00:20:12,404 --> 00:20:13,704 like, hundreds of bags. 563 00:20:14,005 --> 00:20:15,845 Some are in the gigabyte range. Like, if 564 00:20:15,845 --> 00:20:17,224 you're in, like, a 7,000,000,000 565 00:20:17,444 --> 00:20:19,605 parameter model, you're talking about maybe, like, two 566 00:20:19,605 --> 00:20:22,390 to three gigs of downloading a quantized model 567 00:20:22,529 --> 00:20:25,569 and being able to track against it. And 568 00:20:25,569 --> 00:20:28,470 with 7,000,000,000 parameters, you'll probably find 569 00:20:28,769 --> 00:20:31,910 that they're good enough for most tasks, 570 00:20:32,529 --> 00:20:33,109 for most 571 00:20:33,464 --> 00:20:36,744 personal tasks. Hey. Summarize this for me. Hey. 572 00:20:36,744 --> 00:20:38,444 Give me a quick idea of this. 573 00:20:38,904 --> 00:20:41,544 Translate this to this. Like, those kinds of 574 00:20:41,544 --> 00:20:43,644 things, it's perfect. Hey. I wanna pump in 575 00:20:43,865 --> 00:20:46,424 the transcript from a YouTube video and have 576 00:20:46,424 --> 00:20:48,569 a local model summarize it for me. That's 577 00:20:48,569 --> 00:20:50,970 an awesome job for, like, a 3,000,000,007 578 00:20:50,970 --> 00:20:52,029 parameter model, 579 00:20:52,409 --> 00:20:54,169 things like that. You can get a little 580 00:20:54,169 --> 00:20:56,569 bit bigger, and a little bit bigger is 581 00:20:56,569 --> 00:20:58,809 typically gonna be in the something of, like, 582 00:20:58,809 --> 00:21:00,109 10 to 30,000,000,000 583 00:21:00,169 --> 00:21:01,230 parameter range. 584 00:21:01,755 --> 00:21:02,255 So 585 00:21:02,634 --> 00:21:04,815 now you're getting a little bit more honking. 586 00:21:04,954 --> 00:21:07,994 You're actually gonna need some GPU here, and 587 00:21:07,994 --> 00:21:10,234 you're probably gonna need more RAM as well. 588 00:21:10,234 --> 00:21:11,914 So, like, 16 gigs of RAM isn't gonna 589 00:21:11,914 --> 00:21:13,994 cut it. You're probably gonna need something closer 590 00:21:13,994 --> 00:21:15,615 to 32 gigs of RAM. 591 00:21:16,000 --> 00:21:18,480 You're gonna need some kind of GPU to 592 00:21:18,480 --> 00:21:19,380 drive that. 593 00:21:20,000 --> 00:21:21,359 You know, I think you could maybe get 594 00:21:21,359 --> 00:21:23,919 by on, like, an RTX thirty ninety or 595 00:21:23,919 --> 00:21:25,759 something like that. You'd probably wanna be in, 596 00:21:25,759 --> 00:21:27,519 like, a a a 40 series, like, a 597 00:21:27,519 --> 00:21:29,975 forty sixty, 40 70. Or if you're all 598 00:21:29,975 --> 00:21:31,174 on board and, like I said, you're a 599 00:21:31,174 --> 00:21:33,095 PC gamer and you've got that fifty ninety 600 00:21:33,095 --> 00:21:35,674 sitting in there, like, go ahead. Use it. 601 00:21:35,815 --> 00:21:37,575 It's ready to go. Nobody has the 50 602 00:21:37,575 --> 00:21:39,095 series. There were only, like, 10 of them 603 00:21:39,095 --> 00:21:40,695 produced and nobody could buy them. Well, and 604 00:21:40,695 --> 00:21:41,975 out of the 10 that were produced, 10 605 00:21:41,975 --> 00:21:43,894 out of 10 were broken, so the the 606 00:21:43,894 --> 00:21:46,049 yields are great. And melted power cables. Okay. 607 00:21:46,049 --> 00:21:48,210 Anyways, sidetracked. Yes. But you're gonna need one 608 00:21:48,210 --> 00:21:49,890 of those high end GPUs. Yeah. Well, you're 609 00:21:49,890 --> 00:21:51,730 gonna need a GPU. Like, I think the 610 00:21:51,730 --> 00:21:54,369 difference between, like, a 3,000,000,000, seven parameter model 611 00:21:54,369 --> 00:21:55,890 and then you get up to those, like, 612 00:21:55,890 --> 00:21:57,255 10 to 30 range 613 00:21:57,575 --> 00:21:59,494 is, do I need a GPU or do 614 00:21:59,494 --> 00:22:00,634 I not need a GPU? 615 00:22:00,934 --> 00:22:02,775 So you can do the smaller models just 616 00:22:02,775 --> 00:22:04,535 with CPU as long as you have enough 617 00:22:04,535 --> 00:22:06,934 RAM. At some point, you're gonna want GPU 618 00:22:06,934 --> 00:22:10,234 as well to go ahead and offload those. 619 00:22:10,460 --> 00:22:12,220 So if you're thinking like, hey, my use 620 00:22:12,220 --> 00:22:14,880 case for running a local LM is doing 621 00:22:15,019 --> 00:22:17,759 advanced coding, like, I'm I'm beyond, like, summarization, 622 00:22:17,900 --> 00:22:19,579 and I want this thing to help me 623 00:22:19,579 --> 00:22:20,319 write applications, 624 00:22:20,619 --> 00:22:23,924 PowerShell scripts, bash scripts, anything like that, you're 625 00:22:24,085 --> 00:22:26,244 probably gonna wanna be in that range where 626 00:22:26,244 --> 00:22:28,244 you've got a little bit more RAM and 627 00:22:28,244 --> 00:22:29,225 you've got a GPU, 628 00:22:29,765 --> 00:22:31,365 and then you kinda find the model that 629 00:22:31,365 --> 00:22:33,205 you like, and and that ends up being 630 00:22:33,205 --> 00:22:35,605 your sweet spot there. After that, you get 631 00:22:35,605 --> 00:22:37,700 into, like, the big, big models. So you're 632 00:22:37,700 --> 00:22:40,099 into, like, 65. I think, I was watching 633 00:22:40,099 --> 00:22:42,099 another NetworkChuck video. He ran one on a 634 00:22:42,099 --> 00:22:42,599 cluster 635 00:22:42,900 --> 00:22:44,980 of Those studios. I think it was the 636 00:22:44,980 --> 00:22:46,900 m one studios. It was, like, a cluster 637 00:22:46,900 --> 00:22:48,500 of, like, six of those where he was 638 00:22:48,500 --> 00:22:50,660 able to run, like, a 400,000,000,000 parameter model, 639 00:22:50,660 --> 00:22:52,599 but it was only able to output context 640 00:22:53,164 --> 00:22:54,544 you know, like one 641 00:22:55,085 --> 00:22:55,585 word, 642 00:22:55,884 --> 00:22:58,684 a second. Like, it's just so slow that 643 00:22:58,684 --> 00:23:01,005 it's that it's not actually useful. Right. So 644 00:23:01,005 --> 00:23:02,605 slow. A few times it looked like it 645 00:23:02,605 --> 00:23:04,625 even got stuck and, 646 00:23:05,005 --> 00:23:07,940 yeah, it was it was interesting. We'll put 647 00:23:07,940 --> 00:23:08,980 a link to that video in the show 648 00:23:08,980 --> 00:23:10,340 notes too. Yeah. So the way I think 649 00:23:10,340 --> 00:23:13,400 about that, the really big models, they're basically 650 00:23:13,940 --> 00:23:16,100 not there for, like, the faint of heart. 651 00:23:16,100 --> 00:23:17,460 They're there if you know what you're doing, 652 00:23:17,460 --> 00:23:19,619 if you've got the hardware to back it, 653 00:23:19,619 --> 00:23:20,440 both CPU, 654 00:23:20,980 --> 00:23:21,480 RAM, 655 00:23:22,394 --> 00:23:23,295 and and GPU. 656 00:23:23,755 --> 00:23:25,275 So if you think about it, like, there's 657 00:23:25,275 --> 00:23:26,394 kinda like a way that you can just 658 00:23:26,394 --> 00:23:27,755 break it down into a simple set of, 659 00:23:27,755 --> 00:23:29,755 like, pros and cons. So when you're sitting 660 00:23:29,755 --> 00:23:32,234 out there, you're in that, like, three, five, 661 00:23:32,234 --> 00:23:33,535 seven billion range, 662 00:23:34,075 --> 00:23:36,269 that's gonna be fast. You can do it 663 00:23:36,269 --> 00:23:37,649 on simple low hardware, 664 00:23:37,950 --> 00:23:39,470 or you can even do it on beefier 665 00:23:39,470 --> 00:23:41,069 hardware. Like in my case, like when I'm 666 00:23:41,069 --> 00:23:43,710 on my M1 Max, typically, I'm also running 667 00:23:43,710 --> 00:23:46,109 Windows in a virtual machine. So that's typically 668 00:23:46,109 --> 00:23:48,109 got half my RAM already. And then I've 669 00:23:48,109 --> 00:23:49,470 got a little bit of RAM that's going 670 00:23:49,470 --> 00:23:50,714 to the OS and things like that as 671 00:23:50,714 --> 00:23:52,474 well. So even if I could run a 672 00:23:52,474 --> 00:23:55,194 bigger model, I'm not going to because I'm 673 00:23:55,194 --> 00:23:57,755 still having resource contention and other things. Like, 674 00:23:57,755 --> 00:23:59,194 sometimes I don't wanna shut down my VM 675 00:23:59,194 --> 00:24:00,634 or I don't wanna shut down Versus Code 676 00:24:00,634 --> 00:24:02,154 because I'm I'm using those things. Right. You 677 00:24:02,154 --> 00:24:03,980 know, smaller models, fast, 678 00:24:04,359 --> 00:24:05,420 commodity hardware, 679 00:24:06,119 --> 00:24:07,500 good enough for 680 00:24:07,799 --> 00:24:11,000 easy tasks. Like, sum summarize that transcript for 681 00:24:11,000 --> 00:24:13,500 me thing, they're gonna be great for that. 682 00:24:13,559 --> 00:24:15,640 You get into that middle range, probably your 683 00:24:15,640 --> 00:24:16,920 sweet spot, like, if you do have a 684 00:24:16,920 --> 00:24:18,779 little GPU to drive these things, 685 00:24:19,125 --> 00:24:19,865 good accuracy, 686 00:24:20,244 --> 00:24:21,545 more context awareness, 687 00:24:22,164 --> 00:24:25,125 and kinda longer context windows. So as you're 688 00:24:25,125 --> 00:24:26,984 chatting with these things, they can remember, 689 00:24:27,285 --> 00:24:29,684 quote, unquote, big air quotes here. They can 690 00:24:29,684 --> 00:24:32,149 remember what you previously typed with them. So 691 00:24:32,149 --> 00:24:34,710 having bigger context windows and and more RAM 692 00:24:34,710 --> 00:24:36,869 and VRAM from your GPUs to host those 693 00:24:36,869 --> 00:24:39,529 context windows in becomes a little bit important. 694 00:24:39,750 --> 00:24:41,210 And then, like, if you're, 695 00:24:41,990 --> 00:24:43,829 you know, a monster gamer, you've got just 696 00:24:43,829 --> 00:24:45,190 a bunch of these things laying around and 697 00:24:45,190 --> 00:24:47,414 you wanna network them all together, it's super 698 00:24:47,414 --> 00:24:48,774 easy to do that too if you got 699 00:24:48,774 --> 00:24:50,154 enough hardware running around, 700 00:24:50,615 --> 00:24:52,075 and and you can go and, 701 00:24:52,855 --> 00:24:55,335 make that happen. So once you've kinda figured 702 00:24:55,335 --> 00:24:57,575 out your your hardware and you've got a 703 00:24:57,575 --> 00:24:59,014 sense for what you wanna do and what 704 00:24:59,014 --> 00:25:01,190 you're gonna be able to run locally, well, 705 00:25:01,250 --> 00:25:03,349 then you need a way to 706 00:25:03,730 --> 00:25:04,470 run these 707 00:25:04,849 --> 00:25:05,349 things 708 00:25:05,809 --> 00:25:08,289 locally, which, you know, it's not a little 709 00:25:08,289 --> 00:25:10,210 decision to make. Right. And another thing about 710 00:25:10,210 --> 00:25:12,529 the hardware that I found interesting watching the 711 00:25:12,529 --> 00:25:15,644 NetworkChuck videos as well was because we talked 712 00:25:15,644 --> 00:25:17,904 about the Macs, Macs have, like, that shared 713 00:25:17,965 --> 00:25:20,945 memory. They don't have dedicated video memory and 714 00:25:21,005 --> 00:25:23,404 system memory. So one thing he was doing 715 00:25:23,404 --> 00:25:25,325 was when he was running these models, like, 716 00:25:25,325 --> 00:25:28,200 all the memory was going to process the 717 00:25:28,200 --> 00:25:31,419 model because it doesn't have, like, those physical 718 00:25:31,480 --> 00:25:31,980 boundaries 719 00:25:32,279 --> 00:25:32,779 between 720 00:25:33,400 --> 00:25:35,000 physical and system memory. So I think that 721 00:25:35,000 --> 00:25:36,679 was another thing to watch out for. And 722 00:25:36,679 --> 00:25:38,539 the other thing, because you mentioned networking, 723 00:25:38,919 --> 00:25:40,759 he also found, like, running a 10 gig 724 00:25:40,759 --> 00:25:43,115 network. Something I didn't realize because I've never 725 00:25:43,115 --> 00:25:45,515 done this locally, how chatty these are if 726 00:25:45,515 --> 00:25:47,674 you're running a cluster over a network. Super 727 00:25:47,674 --> 00:25:50,815 chatty. He'd, like, saturated his 10 gig network, 728 00:25:51,115 --> 00:25:53,355 and that appeared I would say, I don't 729 00:25:53,355 --> 00:25:55,035 know that it was definitive in his videos, 730 00:25:55,035 --> 00:25:56,494 but appeared to be the bottleneck 731 00:25:56,849 --> 00:25:57,670 using these, 732 00:25:58,450 --> 00:25:59,429 clustered studios. 733 00:25:59,730 --> 00:26:02,609 So then he switched to Thunderbolt, which gave 734 00:26:02,609 --> 00:26:05,490 him a 40 gig network essentially. And even 735 00:26:05,490 --> 00:26:07,809 that, he managed to saturate, get a little 736 00:26:07,809 --> 00:26:09,829 bit more speed out of it using Thunderbolt 737 00:26:09,890 --> 00:26:12,194 as opposed to a 10 gig network. But 738 00:26:12,194 --> 00:26:14,534 if you do start thinking of clustering 739 00:26:14,994 --> 00:26:15,894 larger models, 740 00:26:16,274 --> 00:26:19,075 networking is also huge when it comes into 741 00:26:19,075 --> 00:26:20,994 the hardware for these things. I don't really 742 00:26:20,994 --> 00:26:23,234 get into the network model kind of thing. 743 00:26:23,234 --> 00:26:25,075 Like, I just don't have enough hardware running 744 00:26:25,075 --> 00:26:26,890 around here at home to do it. I 745 00:26:26,890 --> 00:26:29,130 I certainly think it's interesting if you can 746 00:26:29,130 --> 00:26:31,450 get there. So, yeah, we can kinda talk 747 00:26:31,450 --> 00:26:33,369 about that maybe with, like, more advanced stuff. 748 00:26:33,369 --> 00:26:35,609 Yes. So on to software. So you got 749 00:26:35,609 --> 00:26:38,009 your hardware. You got your software. I keep 750 00:26:38,009 --> 00:26:39,865 seeing you sent LM Studio. I've not looked 751 00:26:39,865 --> 00:26:41,865 at LM Studio. The one that always seems 752 00:26:41,865 --> 00:26:43,804 to pop up for me both in 753 00:26:44,105 --> 00:26:45,704 the Home Assistant as well as in a 754 00:26:45,704 --> 00:26:47,865 lot of the network check is Ollama for 755 00:26:47,865 --> 00:26:50,765 running these locally. Your decision here is 756 00:26:51,144 --> 00:26:53,144 how geeky do you wanna be and and 757 00:26:53,144 --> 00:26:54,204 what is your workflow? 758 00:26:54,679 --> 00:26:56,619 So if your primary workflow 759 00:26:57,079 --> 00:26:57,579 is 760 00:26:58,119 --> 00:26:59,259 you just want to 761 00:26:59,720 --> 00:27:02,119 chat with a a chatbot, like, you wanna 762 00:27:02,119 --> 00:27:03,960 hop right in, you wanna download a model, 763 00:27:03,960 --> 00:27:05,400 and you wanna be able to chat right 764 00:27:05,400 --> 00:27:07,480 away in, like, a nice GUI and a 765 00:27:07,480 --> 00:27:08,539 graphical interface, 766 00:27:08,924 --> 00:27:11,245 LM Studio is great for that. There's like, 767 00:27:11,245 --> 00:27:12,285 if you go out and you look this 768 00:27:12,285 --> 00:27:14,045 stuff up and you hop on Reddit or 769 00:27:14,045 --> 00:27:15,025 things like that, 770 00:27:15,485 --> 00:27:17,485 there there's going to be that set of 771 00:27:17,485 --> 00:27:19,825 folks out there who hate LM Studio 772 00:27:20,205 --> 00:27:22,369 because it's closed source, 773 00:27:22,670 --> 00:27:24,349 but, you know, I I'm just looking to 774 00:27:24,349 --> 00:27:26,190 play with these things. So for what I 775 00:27:26,190 --> 00:27:27,809 wanna do, it certainly 776 00:27:28,509 --> 00:27:29,009 works 777 00:27:29,309 --> 00:27:31,390 works great. Comes together, does what I need 778 00:27:31,390 --> 00:27:33,950 it to do. That said, you can also 779 00:27:33,950 --> 00:27:35,089 do Ollama, 780 00:27:35,695 --> 00:27:38,414 and Ollama is gonna be more command line 781 00:27:38,414 --> 00:27:40,734 driven, like you're gonna do more installations from 782 00:27:40,734 --> 00:27:42,335 the command line, you're even gonna download your 783 00:27:42,335 --> 00:27:44,355 models from the command line, so you're kinda 784 00:27:44,575 --> 00:27:46,975 trading off ease of use there. There's pros 785 00:27:46,975 --> 00:27:48,654 and cons to both depending on what you're 786 00:27:48,654 --> 00:27:50,639 doing. LM Studio is great if you just 787 00:27:50,639 --> 00:27:52,639 want to chat, you want to immediately have 788 00:27:52,639 --> 00:27:54,179 OpenAI spec ed endpoints 789 00:27:54,880 --> 00:27:57,039 exposed maybe to things like Versus Code locally, 790 00:27:57,039 --> 00:27:58,399 and you just don't want to wire anything 791 00:27:58,399 --> 00:27:59,919 up. You're looking for just like a one 792 00:27:59,919 --> 00:28:01,919 shot install, and you're going to be one 793 00:28:01,919 --> 00:28:03,244 and done. The other way you can do 794 00:28:03,244 --> 00:28:05,565 it is you can go to Ollama, and 795 00:28:05,565 --> 00:28:07,804 you can find your model that you wanna 796 00:28:07,804 --> 00:28:09,884 run on there. So, you know, I wanna 797 00:28:09,884 --> 00:28:12,524 run llama two seven billion, and you'll go 798 00:28:12,524 --> 00:28:14,764 download that, and you're gonna do all this 799 00:28:14,764 --> 00:28:16,919 from the command line. Now you wanna chat 800 00:28:16,919 --> 00:28:17,740 with that thing. 801 00:28:18,200 --> 00:28:18,700 Well, 802 00:28:19,000 --> 00:28:20,919 you can certainly chat with it from the 803 00:28:20,919 --> 00:28:23,480 command line. That that's totally a possibility. If 804 00:28:23,480 --> 00:28:25,659 if that's your jam or your jelly, awesome. 805 00:28:25,720 --> 00:28:27,880 Go for it. But if you want to 806 00:28:27,880 --> 00:28:29,559 chat with it in a GUI, now you 807 00:28:29,559 --> 00:28:31,424 gotta go install something else. Like, you might 808 00:28:31,424 --> 00:28:32,644 have to go install 809 00:28:33,184 --> 00:28:34,244 open web UI 810 00:28:34,704 --> 00:28:36,865 to to to get that piece going and 811 00:28:36,865 --> 00:28:38,944 and stand all that up. So it's not 812 00:28:38,944 --> 00:28:41,345 like it's hard to do. It's just your 813 00:28:41,345 --> 00:28:43,345 your flavor and and and where you sit 814 00:28:43,345 --> 00:28:44,565 and where you wanna land. 815 00:28:44,950 --> 00:28:46,309 You know, if I'm looking to just do 816 00:28:46,309 --> 00:28:48,150 things quickly and, like, I'm just in there 817 00:28:48,150 --> 00:28:50,549 to maybe, like, oh, hey. I see Microsoft 818 00:28:50,549 --> 00:28:52,970 released a new model for 05/04, 819 00:28:53,269 --> 00:28:55,590 and they they were they, you know, just 820 00:28:55,590 --> 00:28:57,590 pushed new models for 05/03 and '5 '4, 821 00:28:57,590 --> 00:28:59,690 and I I wanna compare those two things. 822 00:29:00,025 --> 00:29:02,265 I'll probably just spin those up in LM 823 00:29:02,265 --> 00:29:04,664 Studio. Super easy. Next, next, next my way 824 00:29:04,664 --> 00:29:06,025 through it. I don't have to remember a 825 00:29:06,025 --> 00:29:08,365 bunch of command line parameters, things like that. 826 00:29:08,424 --> 00:29:11,065 If I'm doing more like application development and 827 00:29:11,065 --> 00:29:13,059 I'm thinking about, like, hey. I want to 828 00:29:13,059 --> 00:29:14,340 stand this thing up. I wanna have it 829 00:29:14,340 --> 00:29:16,200 running in the background. I want some endpoints 830 00:29:16,259 --> 00:29:17,779 that are exposed. Maybe I can build, like, 831 00:29:17,779 --> 00:29:19,779 an app that's doing, like, some light rag 832 00:29:19,779 --> 00:29:21,700 or some fine tuning on top of it, 833 00:29:21,700 --> 00:29:23,460 and I've got, like, a Python script over 834 00:29:23,460 --> 00:29:24,920 here that needs to talk to the model. 835 00:29:25,059 --> 00:29:28,440 Awesome. Great. Like, that's that's where Ollama sits, 836 00:29:28,795 --> 00:29:31,595 and it has its space ready to go 837 00:29:31,595 --> 00:29:34,154 for you. So much like picking a model 838 00:29:34,154 --> 00:29:36,075 size, you're you're just doing a pros and 839 00:29:36,075 --> 00:29:37,355 cons and a little bit of a trade 840 00:29:37,355 --> 00:29:39,595 off thing. So Ollama, if you want a 841 00:29:39,595 --> 00:29:42,394 simple command line experience and you're comfortable at 842 00:29:42,394 --> 00:29:45,609 the terminal, go for it. Windows, macOS, Linux, 843 00:29:45,609 --> 00:29:47,930 it's all there. LM Studio, if you're not 844 00:29:47,930 --> 00:29:51,049 opposed to closed source and you just want 845 00:29:51,049 --> 00:29:52,809 a GUI from the start for all the 846 00:29:52,809 --> 00:29:55,390 things, for downloading, for chatting, for, 847 00:29:56,009 --> 00:29:58,934 all all that stuff. Again, macOS, Windows, Linux, 848 00:29:59,015 --> 00:30:01,335 ready to go. It's just closed source versus 849 00:30:01,335 --> 00:30:03,174 open source is really how I think about 850 00:30:03,174 --> 00:30:05,575 it. And then if you really do go 851 00:30:05,575 --> 00:30:08,055 down the Ollama path, you're probably gonna end 852 00:30:08,055 --> 00:30:09,735 up in a space where you wanna run 853 00:30:09,735 --> 00:30:12,089 a local chat UI, like a web based 854 00:30:12,490 --> 00:30:13,789 chatbot style thing, 855 00:30:14,169 --> 00:30:14,669 and 856 00:30:15,049 --> 00:30:17,609 then you'll just use something like Open Web 857 00:30:17,609 --> 00:30:19,849 UI for that. And, again, super easy to 858 00:30:19,849 --> 00:30:22,589 install. You're just basically hosting a little 859 00:30:22,970 --> 00:30:25,369 a little web server locally that knows how 860 00:30:25,369 --> 00:30:25,869 to 861 00:30:26,394 --> 00:30:29,454 chat with chat with that model. And then 862 00:30:29,674 --> 00:30:31,035 it could be a little bit different depending 863 00:30:31,035 --> 00:30:32,715 on like the extension tooling that you're going 864 00:30:32,715 --> 00:30:34,075 to use from there. So I talked about 865 00:30:34,075 --> 00:30:36,234 maybe like integrating Versus Code with one of 866 00:30:36,234 --> 00:30:36,974 these locally. 867 00:30:37,434 --> 00:30:38,974 So if you're doing 868 00:30:39,490 --> 00:30:41,730 Versus Code, you're gonna typically go grab an 869 00:30:41,730 --> 00:30:43,909 extension. So there's things like CodeGPT, 870 00:30:44,369 --> 00:30:45,909 there's continue dot dev, 871 00:30:46,210 --> 00:30:48,929 there's an Ollama extension, which can actually just 872 00:30:48,929 --> 00:30:50,869 talk natively to your Ollama endpoint. 873 00:30:51,329 --> 00:30:54,309 Or like I said, LM Studio exposes OpenAI 874 00:30:55,005 --> 00:30:55,505 compatible 875 00:30:56,045 --> 00:30:58,224 endpoints. So that's kind of a known, like, 876 00:30:58,845 --> 00:31:00,605 you know, web interface that you can throw 877 00:31:00,605 --> 00:31:02,464 a request at in a structured way, 878 00:31:02,765 --> 00:31:04,765 and it will respond in a in a 879 00:31:04,765 --> 00:31:06,464 way that most of the extensions 880 00:31:07,085 --> 00:31:09,265 are going to understand 881 00:31:09,644 --> 00:31:11,140 and get you ramped up for and ready 882 00:31:11,140 --> 00:31:13,240 to go with. Yeah, looking through this and 883 00:31:13,539 --> 00:31:14,900 most of the videos I saw, and again, 884 00:31:14,900 --> 00:31:17,539 were all Olamae, even the command line based 885 00:31:17,539 --> 00:31:18,839 looked really 886 00:31:19,299 --> 00:31:22,200 simple, lots of guides to just walk through, 887 00:31:22,579 --> 00:31:24,524 type this in, this is how you tie 888 00:31:24,524 --> 00:31:25,884 that in, this is how you go stand 889 00:31:25,884 --> 00:31:26,704 up the WebUI, 890 00:31:27,164 --> 00:31:29,825 point WebUI, to all of those. 891 00:31:30,204 --> 00:31:31,424 So none of this 892 00:31:31,964 --> 00:31:34,444 really seemed that complicated in everything I watched 893 00:31:34,444 --> 00:31:36,605 and, again, made me excited, like, I need 894 00:31:36,605 --> 00:31:38,980 to go try this out and go find 895 00:31:38,980 --> 00:31:40,279 a computer that I can 896 00:31:40,660 --> 00:31:43,220 absolutely bury with a model. See what I 897 00:31:43,220 --> 00:31:44,660 can do. See what damage I can do 898 00:31:44,660 --> 00:31:46,900 to my computer, Scott. It is not hard 899 00:31:46,900 --> 00:31:48,420 to do. So the other thing that you 900 00:31:48,420 --> 00:31:50,180 can do, if you're comfortable on the command 901 00:31:50,180 --> 00:31:52,954 line, there's another project out there that's called 902 00:31:52,954 --> 00:31:53,454 Fabric. 903 00:31:53,835 --> 00:31:55,775 So Fabric is kind of a 904 00:31:56,794 --> 00:31:59,454 it it allows you to easily network and 905 00:31:59,835 --> 00:32:02,075 distribute traffic across multiple nodes, but you can 906 00:32:02,075 --> 00:32:03,539 also do it on a single node. So 907 00:32:03,619 --> 00:32:05,299 So I was talking earlier about, like, that, 908 00:32:05,299 --> 00:32:07,940 you know, sermon summarization thing. Yep. And that's 909 00:32:07,940 --> 00:32:10,259 all based on Fabric. So Fabric, again, command 910 00:32:10,259 --> 00:32:12,660 line, it can run with local LLMs. It's 911 00:32:12,660 --> 00:32:13,720 a little kinda 912 00:32:14,100 --> 00:32:15,859 opaque for for how it does it. So, 913 00:32:15,859 --> 00:32:17,299 you know, make sure you download one of 914 00:32:17,299 --> 00:32:19,825 the the newer versions of it, And Fabric 915 00:32:19,825 --> 00:32:21,424 is all run from the command line as 916 00:32:21,424 --> 00:32:24,065 well. But then you can super easily integrate 917 00:32:24,065 --> 00:32:27,105 Fabric into things like bash scripts. So, like, 918 00:32:27,105 --> 00:32:29,505 I use it for the same thing. Like, 919 00:32:29,505 --> 00:32:31,264 if I think about the the podcast, I 920 00:32:31,264 --> 00:32:33,450 just have a bash script that runs Whispir 921 00:32:33,450 --> 00:32:35,230 locally. So Whispir is 922 00:32:36,089 --> 00:32:38,349 a speech to text model Yep. That OpenAI, 923 00:32:38,410 --> 00:32:39,929 and I can run that locally. Like, that 924 00:32:39,929 --> 00:32:41,929 runs on my hardware just fine. So I've 925 00:32:41,929 --> 00:32:43,529 just got a little bash script that takes 926 00:32:43,529 --> 00:32:45,884 that, generates the transcript, and then I just 927 00:32:45,964 --> 00:32:48,765 pipe the summaries out into Fabric to have 928 00:32:48,765 --> 00:32:49,585 those for myself 929 00:32:49,964 --> 00:32:51,644 in just my notes on the side. Right? 930 00:32:51,644 --> 00:32:53,164 Like, hey, here's the things we talked about 931 00:32:53,164 --> 00:32:55,825 and and how they're coming together. So 932 00:32:56,285 --> 00:32:58,684 very, very, very easy to get on with 933 00:32:58,684 --> 00:33:00,039 this stuff. And I think for most of 934 00:33:00,039 --> 00:33:01,559 our audience as well, like you folks are 935 00:33:01,559 --> 00:33:03,160 all comfortable on the command line. You don't 936 00:33:03,160 --> 00:33:04,680 need a GUI for this stuff. You can 937 00:33:04,680 --> 00:33:05,740 follow some instructions 938 00:33:06,119 --> 00:33:08,039 and wire these up. And we're not talking 939 00:33:08,039 --> 00:33:10,840 like super complicated things. We're basically talking the 940 00:33:10,840 --> 00:33:13,274 equivalent of like a brew or a chocolatey 941 00:33:13,274 --> 00:33:15,674 install or a Winget install, like just little 942 00:33:15,674 --> 00:33:17,194 one liners to get all this stuff up 943 00:33:17,194 --> 00:33:18,875 and running. Absolutely. You don't need to go 944 00:33:18,875 --> 00:33:21,115 write 50 line PowerShell scripts or pipe a 945 00:33:21,115 --> 00:33:24,174 bunch of things. It's really straightforward from everything 946 00:33:24,394 --> 00:33:26,474 I saw. Super easy to get up and 947 00:33:26,474 --> 00:33:28,559 going with that. I would say, like, the 948 00:33:28,559 --> 00:33:30,160 other thing you might wanna do a little 949 00:33:30,160 --> 00:33:30,980 bit is 950 00:33:31,440 --> 00:33:32,980 when you're exploring models. 951 00:33:33,440 --> 00:33:35,759 So if you go into, like, LM Studio 952 00:33:35,759 --> 00:33:37,920 and you're going through their model catalog or 953 00:33:37,920 --> 00:33:40,799 you're on, Ollama and you're exploring their model 954 00:33:40,799 --> 00:33:42,755 catalog, you might wanna just start with, like, 955 00:33:42,755 --> 00:33:45,474 some of the more popular ones to get 956 00:33:45,474 --> 00:33:47,954 up and running. So, you know, there there 957 00:33:47,954 --> 00:33:49,974 are differences between these things, 958 00:33:50,355 --> 00:33:52,115 you know, depending on what you're doing. Like, 959 00:33:52,115 --> 00:33:53,575 you can't go ask DeepSeek 960 00:33:54,119 --> 00:33:56,920 what happened in Tiananmen Square. Like, that is 961 00:33:56,920 --> 00:33:58,380 not programmed into that model, 962 00:33:58,839 --> 00:34:00,359 e even in the one that you you 963 00:34:00,359 --> 00:34:01,500 download and 964 00:34:01,799 --> 00:34:03,640 you run locally, but, you know, you can 965 00:34:03,640 --> 00:34:06,279 do that with, other stuff. So these models 966 00:34:06,279 --> 00:34:07,720 all vary. The other thing that you can 967 00:34:07,720 --> 00:34:09,000 do is you can go through the model 968 00:34:09,000 --> 00:34:09,500 catalogs, 969 00:34:09,855 --> 00:34:12,414 and you can find models that are purpose 970 00:34:12,414 --> 00:34:13,875 built for certain things. 971 00:34:14,335 --> 00:34:16,974 So there are models that are generated within 972 00:34:16,974 --> 00:34:19,215 these families. So you talk about, like, LAMA. 973 00:34:19,215 --> 00:34:21,215 There's gonna be versions of the LAMA model 974 00:34:21,215 --> 00:34:23,775 that are better for doing coding assistance things 975 00:34:23,775 --> 00:34:26,039 with it than there are for doing just 976 00:34:26,039 --> 00:34:28,460 straight one shot text summarization, 977 00:34:29,320 --> 00:34:30,619 stuff like that. So 978 00:34:30,920 --> 00:34:32,280 you you have to think through that a 979 00:34:32,280 --> 00:34:35,019 little bit too, like, just what's your workflow 980 00:34:35,480 --> 00:34:35,980 and 981 00:34:36,360 --> 00:34:38,300 what are you trying to 982 00:34:38,840 --> 00:34:39,579 get at 983 00:34:40,074 --> 00:34:41,054 along the way? 984 00:34:41,355 --> 00:34:42,954 And then be prepared for a little bit 985 00:34:42,954 --> 00:34:44,094 of latency 986 00:34:44,394 --> 00:34:46,315 and maybe differences in perf when you're running 987 00:34:46,315 --> 00:34:48,315 with these things. I think lots of people 988 00:34:48,315 --> 00:34:49,675 set out and they say, oh, I'm gonna 989 00:34:49,675 --> 00:34:50,875 be able to run that model locally, and 990 00:34:50,875 --> 00:34:52,315 it's gonna be so much faster because it 991 00:34:52,315 --> 00:34:53,594 doesn't need to go out and talk to 992 00:34:53,594 --> 00:34:55,430 the Internet. Like, it doesn't need to talk 993 00:34:55,430 --> 00:34:57,430 to Claude. It it it doesn't need to 994 00:34:57,430 --> 00:35:00,390 talk to chat GPT, anything like that. Yeah. 995 00:35:00,390 --> 00:35:03,110 Like, absolutely. You've eliminated the latency of that 996 00:35:03,110 --> 00:35:05,269 whole, like, request response thing having to traverse 997 00:35:05,269 --> 00:35:05,930 the Internet, 998 00:35:06,309 --> 00:35:08,309 but you still have to have the hardware 999 00:35:08,309 --> 00:35:10,150 that's capable of running this and standing it 1000 00:35:10,150 --> 00:35:12,224 all up. So you might wanna even, like, 1001 00:35:12,224 --> 00:35:13,744 play around before you integrate these things. Like, 1002 00:35:13,744 --> 00:35:15,764 if you're interested in, like, a coding workflow 1003 00:35:16,144 --> 00:35:18,304 with or integrating with Versus Code, things like 1004 00:35:18,304 --> 00:35:20,065 that, you'll probably wanna play around with the 1005 00:35:20,065 --> 00:35:21,744 the models a little bit locally to find 1006 00:35:21,744 --> 00:35:23,424 the one that's got the the sweet spot 1007 00:35:23,424 --> 00:35:24,724 for you based on 1008 00:35:25,089 --> 00:35:27,409 number of parameters, your hardware, things like that 1009 00:35:27,409 --> 00:35:29,089 before you go down the path of integrating 1010 00:35:29,089 --> 00:35:30,869 it in Versus Code and then being disappointed 1011 00:35:30,929 --> 00:35:32,929 that it's too slow or or things like 1012 00:35:32,929 --> 00:35:34,690 that. There's a lot of blogs out there 1013 00:35:34,690 --> 00:35:37,010 that'll just tell you, like, oh, running AI 1014 00:35:37,010 --> 00:35:39,889 locally, like, it's super fast. It's it's super 1015 00:35:39,889 --> 00:35:41,775 easy. It is super easy. It's not always 1016 00:35:41,775 --> 00:35:43,295 super fast. So you so you do have 1017 00:35:43,295 --> 00:35:44,815 to be prepared for that depending on your 1018 00:35:44,815 --> 00:35:47,135 hardware. Yeah. Along with the model, Scott, this 1019 00:35:47,135 --> 00:35:49,214 is another thing again, being fairly new to 1020 00:35:49,214 --> 00:35:50,355 this, have you 1021 00:35:50,734 --> 00:35:52,974 compared at all? Because another thing you can 1022 00:35:52,974 --> 00:35:55,139 run into is quantization of these models. Right? 1023 00:35:55,139 --> 00:35:57,299 And this is something else Network Chuck talked 1024 00:35:57,299 --> 00:35:59,139 about in one of his where some of 1025 00:35:59,139 --> 00:36:01,799 these larger models, they quantize. 1026 00:36:02,260 --> 00:36:03,699 I don't know if that's the word. They 1027 00:36:03,699 --> 00:36:06,579 quantize them down, and it sounds like it's 1028 00:36:06,579 --> 00:36:07,719 essentially taking 1029 00:36:08,184 --> 00:36:10,444 different aspects of the model. And inside 1030 00:36:10,904 --> 00:36:14,025 models, they have model weights with, like, 32 1031 00:36:14,025 --> 00:36:16,424 bit precision, and they reduce these down to 1032 00:36:16,424 --> 00:36:18,444 eight bit, four bit precision, 1033 00:36:18,904 --> 00:36:20,984 which makes them not as accurate but makes 1034 00:36:20,984 --> 00:36:23,644 them smaller so you can run a 1035 00:36:24,239 --> 00:36:26,320 larger model. Some of those bigger ones we 1036 00:36:26,320 --> 00:36:28,179 talked about like 65,000,000,000 1037 00:36:28,639 --> 00:36:29,460 plus parameters 1038 00:36:30,159 --> 00:36:31,300 on less hardware, 1039 00:36:31,920 --> 00:36:33,380 but with more 1040 00:36:34,239 --> 00:36:35,219 not the accuracy 1041 00:36:35,679 --> 00:36:38,420 versus running maybe a model with less parameters, 1042 00:36:38,719 --> 00:36:40,474 but you get the full 1043 00:36:40,855 --> 00:36:42,695 the full model weights in there where you're 1044 00:36:42,695 --> 00:36:45,195 running the 32 bit precision instead of quantasize 1045 00:36:45,255 --> 00:36:47,894 them down. Again, when you're downloading models, definitely 1046 00:36:47,894 --> 00:36:50,215 something to watch out for because if these 1047 00:36:50,215 --> 00:36:51,434 are quantasized 1048 00:36:52,630 --> 00:36:54,389 and they have smaller, they can be less 1049 00:36:54,389 --> 00:36:56,389 accurate, you can run them. Like, have you 1050 00:36:56,389 --> 00:36:58,789 ever compared those of let's go run a 1051 00:36:58,789 --> 00:37:03,050 30,000,000,000 parameter model on local hardware versus a, 1052 00:37:03,829 --> 00:37:05,050 65,000,000,000 1053 00:37:05,109 --> 00:37:08,974 model or parameter model that's quantized down to 1054 00:37:08,974 --> 00:37:10,735 eight bit instead of 32 bit? I don't 1055 00:37:10,735 --> 00:37:13,295 think many folks are running 32 bit. Most 1056 00:37:13,295 --> 00:37:14,595 are probably running 1057 00:37:15,135 --> 00:37:17,855 Four bit. Some kind of like well, something 1058 00:37:17,855 --> 00:37:21,329 like 16 or lower, so like four, eight, 1059 00:37:21,730 --> 00:37:22,849 16. I think when you go out and, 1060 00:37:22,849 --> 00:37:24,710 like, you watch a lot of YouTube videos 1061 00:37:24,849 --> 00:37:25,590 and and, 1062 00:37:25,969 --> 00:37:27,250 you know, if if you do go down 1063 00:37:27,250 --> 00:37:28,369 this path and you start getting into it, 1064 00:37:28,369 --> 00:37:29,890 I think YouTube is a great place to 1065 00:37:29,890 --> 00:37:31,730 go to and start to see. You'll see 1066 00:37:31,730 --> 00:37:34,469 lots of people playing around with massive models, 1067 00:37:35,025 --> 00:37:36,484 but with a 1068 00:37:37,025 --> 00:37:39,105 like, only, like, four bits. Right. So they're 1069 00:37:39,105 --> 00:37:40,864 doing that just so they can run it, 1070 00:37:40,864 --> 00:37:42,704 not so they can run it effectively to 1071 00:37:42,704 --> 00:37:44,704 drive a workflow. Like, they're just trying to 1072 00:37:44,704 --> 00:37:46,304 try it out and see how many tokens 1073 00:37:46,304 --> 00:37:47,744 a second they can get out of it 1074 00:37:47,744 --> 00:37:49,599 or something like that. So a four bit 1075 00:37:49,599 --> 00:37:50,820 model is 1076 00:37:51,440 --> 00:37:52,980 absolutely going to 1077 00:37:53,280 --> 00:37:56,019 run on, like, consumer grade GPUs, CPUs. 1078 00:37:56,800 --> 00:37:58,320 Like, you're gonna be all good, ready to 1079 00:37:58,320 --> 00:38:00,260 go there, but you have to know that 1080 00:38:00,320 --> 00:38:03,295 it's been extremely compressed. So it can get 1081 00:38:03,295 --> 00:38:05,394 it down to a smaller download size, 1082 00:38:05,695 --> 00:38:08,574 and thus, it's going to take less memory 1083 00:38:08,574 --> 00:38:10,974 and less processing power to go ahead and 1084 00:38:10,974 --> 00:38:13,235 run it. So you might be running like, 1085 00:38:13,295 --> 00:38:14,894 you know, like if I think about, like, 1086 00:38:14,894 --> 00:38:17,074 the transformer that's running in iOS, 1087 00:38:17,489 --> 00:38:18,389 that's probably 1088 00:38:19,090 --> 00:38:20,769 a a a four bit model. Right? Like, 1089 00:38:20,769 --> 00:38:22,849 it's sitting there. It's running on commodity hardware 1090 00:38:22,929 --> 00:38:24,690 Right. And it's just doing what it needs 1091 00:38:24,690 --> 00:38:27,269 to do. Now, if I'm on my desktop 1092 00:38:27,409 --> 00:38:29,110 or or my m one 1093 00:38:29,565 --> 00:38:31,585 MacBook, you know, I might be thinking about 1094 00:38:31,964 --> 00:38:33,964 an eight bit model, and I'm okay with 1095 00:38:33,964 --> 00:38:36,045 the performance trade off. Like, I'm I'm okay 1096 00:38:36,045 --> 00:38:37,804 if it chats with me at, like, you 1097 00:38:37,804 --> 00:38:40,364 know, like, two tokens a second kinda thing. 1098 00:38:40,364 --> 00:38:42,659 Like, it can be super slow. It's it's 1099 00:38:42,659 --> 00:38:44,500 okay. But you're not gonna run these, like, 1100 00:38:44,500 --> 00:38:48,039 massive models because those are absolutely running in 1101 00:38:48,099 --> 00:38:48,599 those 1102 00:38:49,059 --> 00:38:51,619 massive data centers and and that set of 1103 00:38:51,619 --> 00:38:54,099 infrastructure. Like, I I just wanna be clear. 1104 00:38:54,099 --> 00:38:56,339 Like, you can't do the things that, like, 1105 00:38:56,339 --> 00:38:56,839 ChatGPT 1106 00:38:57,139 --> 00:38:58,875 can do with, like, o one running in 1107 00:38:58,875 --> 00:38:59,695 their data center 1108 00:38:59,994 --> 00:39:01,755 locally at your house. Like, that's just not 1109 00:39:01,755 --> 00:39:03,434 the way these things work. It's it's not 1110 00:39:03,434 --> 00:39:05,114 how they come together. So if you think 1111 00:39:05,114 --> 00:39:07,675 about, like, the the whole quantization thing, it's 1112 00:39:07,675 --> 00:39:08,414 all about 1113 00:39:08,715 --> 00:39:11,820 packing things down and basically, like, archiving them, 1114 00:39:11,820 --> 00:39:13,420 right? Put a tar or zip together of 1115 00:39:13,420 --> 00:39:13,920 this 1116 00:39:14,220 --> 00:39:17,039 thing and reduce the size, reduce the computational 1117 00:39:17,340 --> 00:39:17,840 requirements, 1118 00:39:18,780 --> 00:39:21,019 all that kind of stuff. So you're going 1119 00:39:21,019 --> 00:39:23,180 to get small models. Hey, that's great. They're 1120 00:39:23,180 --> 00:39:24,480 going to use less memory, 1121 00:39:24,974 --> 00:39:26,414 and you might be able to run a 1122 00:39:26,414 --> 00:39:28,655 larger model. Like, you could run a four 1123 00:39:28,655 --> 00:39:31,934 bit, you know, 30,000,000,000 parameter model, but it's 1124 00:39:31,934 --> 00:39:34,575 gonna be less accurate. And is accuracy important 1125 00:39:34,575 --> 00:39:36,255 to you? Well, you might wanna go to, 1126 00:39:36,255 --> 00:39:39,075 like, an eight bit like, 7,000,000,000 parameter model, 1127 00:39:39,690 --> 00:39:41,369 something like that. So it it's gonna be 1128 00:39:41,369 --> 00:39:43,150 very dependent on, like, your workflow 1129 00:39:43,609 --> 00:39:44,109 and 1130 00:39:44,570 --> 00:39:45,070 your 1131 00:39:45,449 --> 00:39:47,150 use case for these things. 1132 00:39:47,449 --> 00:39:49,130 I think the biggest thing you miss out 1133 00:39:49,130 --> 00:39:50,510 on is accuracy. 1134 00:39:51,210 --> 00:39:53,164 So, you know, like, if I'm summarizing 1135 00:39:53,625 --> 00:39:55,864 the podcast transcripts, I want those to be 1136 00:39:55,864 --> 00:39:57,465 kind of accurate. Like, I I don't want 1137 00:39:57,465 --> 00:39:59,305 them to just be hallucinating all over the 1138 00:39:59,305 --> 00:39:59,805 place. 1139 00:40:00,344 --> 00:40:00,844 But, 1140 00:40:01,305 --> 00:40:04,344 you know, if I'm doing something else, like, 1141 00:40:04,344 --> 00:40:06,744 hey. Help me write a poem about, you 1142 00:40:06,744 --> 00:40:09,309 know, iPads. Like, whatever. Do it with all 1143 00:40:09,309 --> 00:40:10,210 the least accuracy 1144 00:40:10,829 --> 00:40:12,289 that you want out there 1145 00:40:12,750 --> 00:40:14,829 along the way. I think the most common 1146 00:40:14,829 --> 00:40:16,269 thing, like so the other thing you run 1147 00:40:16,269 --> 00:40:17,170 into with quantization 1148 00:40:17,710 --> 00:40:18,210 is 1149 00:40:18,589 --> 00:40:20,510 there there's a bunch of different methods for 1150 00:40:20,510 --> 00:40:21,764 this. So 1151 00:40:22,144 --> 00:40:22,644 there's 1152 00:40:23,025 --> 00:40:23,525 Q, 1153 00:40:23,985 --> 00:40:26,324 which is basically like four bit quantization. 1154 00:40:27,105 --> 00:40:28,885 There's another format called 1155 00:40:29,344 --> 00:40:32,304 g g u f. So that's kind of, 1156 00:40:32,304 --> 00:40:34,724 like, the standard for running these things 1157 00:40:35,099 --> 00:40:36,539 efficiently. So you'll see a lot of these 1158 00:40:36,539 --> 00:40:38,140 things when you go in like, what's the 1159 00:40:38,140 --> 00:40:40,380 format of the model? Oh, it's a g 1160 00:40:40,380 --> 00:40:41,660 g u f. I don't even know how 1161 00:40:41,660 --> 00:40:42,239 it's pronounced. 1162 00:40:42,539 --> 00:40:44,300 But, you know, you can go in and 1163 00:40:44,300 --> 00:40:45,440 and grab those things 1164 00:40:46,140 --> 00:40:48,174 and and figure those out. So you can 1165 00:40:48,174 --> 00:40:50,434 think of, like, quantization maybe as, like, another 1166 00:40:50,974 --> 00:40:52,815 weight that you can put on that scale 1167 00:40:52,815 --> 00:40:54,914 when you're trying to find that balance between 1168 00:40:55,534 --> 00:40:58,114 model size, parameter count, quantization, 1169 00:40:58,574 --> 00:41:00,809 and the hardware that you run and the 1170 00:41:00,809 --> 00:41:02,489 workload that you wanna do. So how does 1171 00:41:02,489 --> 00:41:04,090 that scale tip and where do you wanna 1172 00:41:04,090 --> 00:41:06,650 land? It just becomes an another consideration in 1173 00:41:06,650 --> 00:41:09,610 there for you. Sounds good. Anything else before 1174 00:41:09,610 --> 00:41:12,090 wrapping this episode up? So a couple things. 1175 00:41:12,090 --> 00:41:13,930 If folks haven't done this yet, like Go 1176 00:41:13,930 --> 00:41:16,074 do it. You should totally go out and 1177 00:41:16,074 --> 00:41:18,315 just try and play around with Ollama LM 1178 00:41:18,315 --> 00:41:18,815 Studio. 1179 00:41:19,114 --> 00:41:20,795 If you're already doing it today, come back 1180 00:41:20,795 --> 00:41:22,315 and give us some feedback. Let let us 1181 00:41:22,315 --> 00:41:24,635 know what you're using it for. I think 1182 00:41:24,635 --> 00:41:27,594 there's all sorts of interesting use cases for 1183 00:41:27,594 --> 00:41:29,880 this stuff. We're just getting Ben started on 1184 00:41:29,880 --> 00:41:31,800 his list. Let's make his list a lot 1185 00:41:31,800 --> 00:41:34,519 longer for things that he is missing out 1186 00:41:34,519 --> 00:41:36,679 in his life. Home assistant and AI. He 1187 00:41:36,679 --> 00:41:38,539 needs to do to run a 1188 00:41:38,920 --> 00:41:40,839 chat model locally. And then if you're doing 1189 00:41:40,839 --> 00:41:42,519 other things besides chat models, like I said, 1190 00:41:42,519 --> 00:41:45,265 there's the stable diffusions of the world, there's 1191 00:41:45,265 --> 00:41:48,065 image generation, there's whisper, there's all these other 1192 00:41:48,065 --> 00:41:50,625 things out there. I was very surprised at 1193 00:41:50,625 --> 00:41:52,704 how approachable they are. I always thought this 1194 00:41:52,704 --> 00:41:54,885 was going to be like mystical dark arts 1195 00:41:55,025 --> 00:41:57,184 and magic and not for mere mortals kind 1196 00:41:57,184 --> 00:41:59,400 of thing. It's very much for mere mortals, 1197 00:41:59,480 --> 00:42:01,799 Like, super easy to get started with, super 1198 00:42:01,799 --> 00:42:02,299 turnkey, 1199 00:42:02,679 --> 00:42:04,839 and I would guarantee that almost anybody who 1200 00:42:04,839 --> 00:42:06,920 listens to this podcast probably has the hardware 1201 00:42:06,920 --> 00:42:08,359 to run this stuff and make it happen. 1202 00:42:08,359 --> 00:42:10,940 I'm actually excited to go try this out 1203 00:42:11,000 --> 00:42:13,005 and play around with it. I did find 1204 00:42:13,005 --> 00:42:15,085 an article too on a Raspberry Pi cluster 1205 00:42:15,085 --> 00:42:16,364 for AI. I don't know if I'm gonna 1206 00:42:16,364 --> 00:42:18,204 try that or use an extra Mac mini 1207 00:42:18,204 --> 00:42:20,385 I have sitting around here to start, but 1208 00:42:20,444 --> 00:42:22,045 I, like you, I would love to hear 1209 00:42:22,045 --> 00:42:23,404 what other people are doing. If you are 1210 00:42:23,404 --> 00:42:25,404 running them locally, what are you using them 1211 00:42:25,404 --> 00:42:26,144 for locally, 1212 00:42:26,764 --> 00:42:27,824 different use cases, 1213 00:42:28,364 --> 00:42:29,699 where have you found a good place to 1214 00:42:29,699 --> 00:42:31,000 start, all the things. 1215 00:42:31,300 --> 00:42:33,380 So if you do want to join us 1216 00:42:33,380 --> 00:42:34,679 and discuss these things, 1217 00:42:34,980 --> 00:42:37,059 we need to redo our outro, Scott, because 1218 00:42:37,059 --> 00:42:38,659 I think that has changed. I think we 1219 00:42:38,659 --> 00:42:40,659 actually still have Twitter in it. Let's not 1220 00:42:40,659 --> 00:42:43,414 say Twitter. Let's say probably Blue Sky. Are 1221 00:42:43,414 --> 00:42:44,855 you more active on Blue Sky right now 1222 00:42:44,855 --> 00:42:46,375 than any other one? Pick one anyway. Anyone 1223 00:42:46,375 --> 00:42:48,054 that's not Twitter, you can find Scott on, 1224 00:42:48,054 --> 00:42:49,494 except that I can never find you on 1225 00:42:49,494 --> 00:42:51,655 Blue Sky because you chose a weird handle 1226 00:42:51,655 --> 00:42:53,655 that isn't the same as any of your 1227 00:42:53,655 --> 00:42:55,894 other social media. You need to go grab 1228 00:42:55,894 --> 00:42:57,494 a new handle on Blue Sky that matches 1229 00:42:57,494 --> 00:42:59,789 everything else. I would say Blue Sky is 1230 00:42:59,789 --> 00:43:01,650 probably where I'm the most active 1231 00:43:02,269 --> 00:43:04,190 as of late and where I feel like 1232 00:43:04,190 --> 00:43:04,849 the biggest 1233 00:43:05,309 --> 00:43:06,849 tech community has 1234 00:43:07,309 --> 00:43:09,630 moved to. So go chat with us on 1235 00:43:09,630 --> 00:43:11,789 Blue Sky. LinkedIn is another good one. I'm 1236 00:43:11,789 --> 00:43:12,769 always on LinkedIn. 1237 00:43:13,085 --> 00:43:15,005 So if you wanna chat, give us feedback 1238 00:43:15,005 --> 00:43:15,744 on LinkedIn, 1239 00:43:16,045 --> 00:43:17,484 you can do that. If you wanna sign 1240 00:43:17,484 --> 00:43:19,085 up for membership, we still have our membership 1241 00:43:19,085 --> 00:43:21,744 at mscloud, I t pro Com / membership. 1242 00:43:22,204 --> 00:43:23,264 Todd's in 1243 00:43:23,565 --> 00:43:25,644 Discord today. He got a new laptop that 1244 00:43:25,644 --> 00:43:27,424 he's gonna go try to run some LLMs 1245 00:43:27,484 --> 00:43:29,460 on. So if you wanna join us, chat 1246 00:43:29,460 --> 00:43:31,380 with us during the recording. You can go 1247 00:43:31,380 --> 00:43:32,359 check out our membership 1248 00:43:32,659 --> 00:43:33,159 options 1249 00:43:33,940 --> 00:43:36,179 there as well and join us in Discord 1250 00:43:36,179 --> 00:43:37,480 for these. So 1251 00:43:37,940 --> 00:43:40,260 looking forward to hearing from people how you 1252 00:43:40,260 --> 00:43:42,359 use LLMs, what you're gonna do with LLMs, 1253 00:43:42,885 --> 00:43:45,844 and how they run locally. Who can bury 1254 00:43:45,844 --> 00:43:47,784 their computer first and 1255 00:43:48,405 --> 00:43:50,744 crash it? Super easy to do. Yeah. 1256 00:43:51,204 --> 00:43:53,364 Anything else? I think that's it. As always, 1257 00:43:53,364 --> 00:43:55,710 thanks, Ben. Alright. Thank you, Scott. We will 1258 00:43:55,710 --> 00:43:56,690 talk to you later. 1259 00:43:58,670 --> 00:44:00,909 If you enjoyed the podcast, go leave us 1260 00:44:00,909 --> 00:44:03,150 a five star rating in iTunes. It helps 1261 00:44:03,150 --> 00:44:04,829 to get the word out so more IT 1262 00:44:04,829 --> 00:44:06,989 pros can learn about Office three sixty five 1263 00:44:06,989 --> 00:44:07,650 and Azure. 1264 00:44:08,190 --> 00:44:09,855 If you have any questions you want us 1265 00:44:09,855 --> 00:44:12,014 to address on the show, or feedback about 1266 00:44:12,014 --> 00:44:14,414 the show, feel free to reach out via 1267 00:44:14,414 --> 00:44:16,514 our website, Twitter, or Facebook. 1268 00:44:16,815 --> 00:44:18,735 Thanks again for listening, and have a great 1269 00:44:18,735 --> 00:44:19,235 day.

Digital Dispatch Podcast Podcast Artwork Image

Microsoft Cloud IT Pro Podcast

Ben Stegink, Scott Hoag

On the MS Cloud IT Pro Podcast, Scott and Ben discuss the Microsoft Cloud with a focus on IT Pros. They'll discuss the latest in Microsoft 365 and Office 365 News, Azure news and talk about their experiences with managing the Microsoft Cloud as well as interview industry experts on various cloud technology. They'll cover things such as SharePoint, Exchange, Microsoft Teams, PowerShell, Azure, Azure AD, Security, Networking, Storage, and the many other technologies and products that have made their way into the Microsoft 365 suite and Azure. To stay up-to-date on the latest in Microsoft Cloud news and gain some valuable knowledge as you deploy it within your own organization, make sure to tune in every week! Find out more at msclouditpropodcast.com.

Contact Show