1
00:00:03,520 --> 00:00:06,080
Welcome to episode 397
2
00:00:06,080 --> 00:00:09,279
of the Microsoft Cloud IT Pro podcast recorded
3
00:00:09,279 --> 00:00:11,939
live on 03/10/2025.
4
00:00:12,160 --> 00:00:14,480
This is a show about Microsoft 365
5
00:00:14,480 --> 00:00:16,554
and Azure from the perspective of IT
6
00:00:16,554 --> 00:00:18,714
pros and end users, where we discuss the
7
00:00:18,714 --> 00:00:20,875
topic or recent news and how it relates
8
00:00:20,875 --> 00:00:23,114
to you. We've been talking a lot about
9
00:00:23,114 --> 00:00:24,494
AI recently, particularly
10
00:00:24,954 --> 00:00:26,175
Microsoft Copilots.
11
00:00:26,875 --> 00:00:28,714
But what if you want to play around
12
00:00:28,714 --> 00:00:31,960
with AI outside of Copilot or ChatGPT
13
00:00:31,960 --> 00:00:34,600
or any other hosted AI tool? In today's
14
00:00:34,600 --> 00:00:35,100
episode,
15
00:00:35,479 --> 00:00:37,559
Scott and Ben dive into the world of
16
00:00:37,559 --> 00:00:38,539
local LLMs,
17
00:00:39,000 --> 00:00:41,659
large language models, that run entirely
18
00:00:42,039 --> 00:00:44,295
on your device. We look at what models
19
00:00:44,295 --> 00:00:45,975
you can run, how you can integrate them
20
00:00:45,975 --> 00:00:47,914
into your workflow, and more.
21
00:00:49,975 --> 00:00:52,375
Oh, Scott. Here we are back in the
22
00:00:52,375 --> 00:00:55,495
stormy South. Stormy South. It has been stormy,
23
00:00:55,495 --> 00:00:57,539
but it's bright and sunny now. So I'll
24
00:00:57,539 --> 00:00:58,579
take it while I can get it. I
25
00:00:58,579 --> 00:01:00,579
don't have anything to go with Nordic. From
26
00:01:00,579 --> 00:01:02,740
the Nordic North, I'm back to the stormy
27
00:01:02,740 --> 00:01:03,240
South.
28
00:01:04,579 --> 00:01:06,659
From sea to shining sea and everything in
29
00:01:06,659 --> 00:01:08,260
between the seas? As long as you count
30
00:01:08,260 --> 00:01:10,005
Lake Michigan as the sea, which if you're
31
00:01:10,005 --> 00:01:12,084
from Michigan, you do. Like, East Coast, West
32
00:01:12,084 --> 00:01:13,924
Coast in Michigan are Lake Michigan and Lake
33
00:01:13,924 --> 00:01:16,245
Huron. We don't really count oceans in Michigan.
34
00:01:16,245 --> 00:01:18,084
Some of those lakes are kinda big, so,
35
00:01:18,484 --> 00:01:20,084
you might even say they're great. They could
36
00:01:20,084 --> 00:01:21,704
be great. It's always interesting.
37
00:01:22,005 --> 00:01:24,165
Side topic, coming down to Florida and talking
38
00:01:24,165 --> 00:01:26,549
to people about lakes and being from Michigan
39
00:01:26,549 --> 00:01:27,370
and Lake Michigan
40
00:01:27,750 --> 00:01:29,530
and Lake Superior and
41
00:01:29,829 --> 00:01:31,109
they're like, but it's a lake. And I'm
42
00:01:31,109 --> 00:01:32,790
like, yeah, but you can't see across it.
43
00:01:32,790 --> 00:01:34,629
So it kinda looks like an ocean when
44
00:01:34,629 --> 00:01:36,549
you're standing on the shore, and we get
45
00:01:36,549 --> 00:01:38,134
waves that are like, well, I think the
46
00:01:38,134 --> 00:01:40,614
biggest waves I've ever recorded in Lake Michigan
47
00:01:40,614 --> 00:01:42,774
were like 25, 26 feet, and Lake
48
00:01:42,774 --> 00:01:44,774
Superior was up to 32
49
00:01:44,774 --> 00:01:46,854
foot waves. It's like these are not just
50
00:01:46,854 --> 00:01:49,414
like little lakes. These are massive bodies of
51
00:01:49,414 --> 00:01:49,914
water.
52
00:01:50,900 --> 00:01:53,159
They're really, really big ponds, you know.
53
00:01:53,700 --> 00:01:55,060
So now we're going across the
54
00:01:55,060 --> 00:01:56,579
pond and that means across Lake Michigan? Ask
55
00:01:56,579 --> 00:01:58,019
an LLM what it thinks.
56
00:01:58,019 --> 00:02:00,500
We should ask an LLM because we're going
57
00:02:00,500 --> 00:02:02,099
to talk about LLMs. We're back
58
00:02:02,099 --> 00:02:04,579
to those things again. So I wanted to
59
00:02:04,579 --> 00:02:05,640
have a chat today
60
00:02:06,019 --> 00:02:06,519
about
61
00:02:07,034 --> 00:02:07,534
LLMs
62
00:02:07,994 --> 00:02:08,495
and
63
00:02:08,955 --> 00:02:11,514
running them locally. Like, I've been doing this
64
00:02:11,514 --> 00:02:13,675
more and more, and I think there's an
65
00:02:13,675 --> 00:02:16,155
interesting set of use cases and workflows. And
66
00:02:16,155 --> 00:02:17,614
I was having a chat with you,
67
00:02:17,914 --> 00:02:20,340
and this isn't something that you do in
68
00:02:20,340 --> 00:02:22,099
your kinda day to day from what it sounds
69
00:02:22,099 --> 00:02:23,699
like, but maybe I can, like, get you
70
00:02:23,699 --> 00:02:25,219
in and hook you in a
71
00:02:25,219 --> 00:02:27,060
little bit along the way. Oh, you've already
72
00:02:27,060 --> 00:02:28,340
got me hooked. You sent me a few
73
00:02:28,340 --> 00:02:30,340
YouTube videos, and I started watching it, and
74
00:02:30,340 --> 00:02:31,639
the wheels started clicking.
75
00:02:31,955 --> 00:02:34,194
And I have one of the browser tabs
76
00:02:34,194 --> 00:02:35,634
up here. We'll put a link to it,
77
00:02:35,634 --> 00:02:37,555
Scott, about a use case that I already
78
00:02:37,555 --> 00:02:39,254
have for a local LLM.
79
00:02:39,715 --> 00:02:41,574
And you definitely got my wheels
80
00:02:41,875 --> 00:02:42,935
turning about
81
00:02:43,235 --> 00:02:45,715
what possibilities there are about how some of
82
00:02:45,715 --> 00:02:48,310
this works. In Microsoft 365, I
83
00:02:48,310 --> 00:02:50,469
have played around with Copilot. I know a
84
00:02:50,469 --> 00:02:52,229
fair amount, but I've never really looked at
85
00:02:52,229 --> 00:02:54,650
running them locally, and you whetted my appetite
86
00:02:54,949 --> 00:02:57,370
for this. So this will be an interesting
87
00:02:57,430 --> 00:03:00,229
discussion, and I'm curious to see where it
88
00:03:00,229 --> 00:03:01,849
goes, your thoughts,
89
00:03:02,284 --> 00:03:03,025
my new
90
00:03:03,405 --> 00:03:06,444
thoughts. And my expanding list, Scott, you added
91
00:03:06,444 --> 00:03:08,444
something new to my list. I was doing
92
00:03:08,444 --> 00:03:10,444
so good. It's been a hot minute, but
93
00:03:10,444 --> 00:03:12,044
I think this is an important
94
00:03:12,044 --> 00:03:14,064
one. So as we talk about
95
00:03:14,540 --> 00:03:15,040
the
96
00:03:15,659 --> 00:03:16,879
kinda growth
97
00:03:17,259 --> 00:03:17,759
of
98
00:03:18,219 --> 00:03:19,199
generative AI
99
00:03:19,819 --> 00:03:22,800
and models along the way for,
100
00:03:23,580 --> 00:03:26,139
you know, certainly the the copilots of the
101
00:03:26,139 --> 00:03:26,639
world,
102
00:03:27,094 --> 00:03:27,754
the OpenAI's,
103
00:03:28,775 --> 00:03:30,474
Anthropic with Claude,
104
00:03:30,854 --> 00:03:33,034
DeepSeek with R1,
105
00:03:33,655 --> 00:03:35,655
all all these different kinds of things that
106
00:03:35,655 --> 00:03:37,514
exist out there. So
107
00:03:38,134 --> 00:03:39,655
they're they're nice that you can run them
108
00:03:39,655 --> 00:03:40,474
in a service.
109
00:03:40,909 --> 00:03:43,389
And I think most of us have kind
110
00:03:43,389 --> 00:03:46,030
of grown accustomed to that, and and it's
111
00:03:46,030 --> 00:03:47,550
it's a place that most of us are
112
00:03:47,550 --> 00:03:49,629
comfortable. Like, we know how to sign in
113
00:03:49,629 --> 00:03:51,650
to ChatGPT on the web and maybe
114
00:03:51,870 --> 00:03:52,370
either
115
00:03:52,915 --> 00:03:56,034
have a chat with an LLM and and
116
00:03:56,034 --> 00:03:58,275
do some structured prompting and and try and
117
00:03:58,275 --> 00:03:59,895
get some responses out of it
118
00:04:00,194 --> 00:04:00,694
versus
119
00:04:01,395 --> 00:04:03,955
things like ChatGPT web search. And it's great.
120
00:04:03,955 --> 00:04:05,655
Right? It's it's all cloud based.
121
00:04:06,030 --> 00:04:07,870
Some of them are free. Some of them
122
00:04:07,870 --> 00:04:08,610
cost money.
123
00:04:09,069 --> 00:04:10,930
They really only start to get powerful
124
00:04:11,229 --> 00:04:14,189
when they do cost money. So now you're
125
00:04:14,189 --> 00:04:16,029
in the world where you're relying on this
126
00:04:16,029 --> 00:04:18,769
external service. You're gonna pay per request.
127
00:04:19,175 --> 00:04:22,055
And probably most importantly, there's a privacy angle
128
00:04:22,055 --> 00:04:24,714
here where you're sending your data out into
129
00:04:24,935 --> 00:04:27,095
the wild. Like, when you're chatting with ChatGPT
130
00:04:27,095 --> 00:04:28,475
in the web interface,
131
00:04:28,775 --> 00:04:30,855
you're passing that data to them. We saw
132
00:04:30,855 --> 00:04:33,095
this with DeepSeek. When DeepSeek kinda came out
133
00:04:33,095 --> 00:04:33,754
of the woodwork
134
00:04:34,099 --> 00:04:35,860
a couple weeks ago and the market freaked
135
00:04:35,860 --> 00:04:37,139
out. You know, they were about a month
136
00:04:37,139 --> 00:04:38,740
behind freaking out when it had actually been
137
00:04:38,740 --> 00:04:42,039
released. But that said, you know, DeepSeek immediately
138
00:04:42,099 --> 00:04:44,680
had a data leak and people broke in
139
00:04:44,819 --> 00:04:46,740
and they got all the usernames, they got
140
00:04:46,740 --> 00:04:48,795
the passwords, they got the prompts that were
141
00:04:48,795 --> 00:04:50,714
flowing through that system, things like that. So
142
00:04:50,714 --> 00:04:52,394
I think one of the most powerful things
143
00:04:52,394 --> 00:04:54,654
here is the ability to
144
00:04:55,514 --> 00:04:56,894
run a local LLM
145
00:04:57,274 --> 00:04:57,774
with
146
00:04:58,154 --> 00:05:00,329
data privacy in mind. So I'm gonna run
147
00:05:00,329 --> 00:05:02,329
these things locally. They're only going to be
148
00:05:02,329 --> 00:05:04,649
on my machine. They're not gonna communicate with
149
00:05:04,649 --> 00:05:06,810
the outside world. And then if you're in
150
00:05:06,810 --> 00:05:09,289
that world of, you know, being a little
151
00:05:09,289 --> 00:05:10,509
bit more cost conscious,
152
00:05:11,129 --> 00:05:13,370
you might wanna try some of these things
153
00:05:13,370 --> 00:05:15,915
out without paying per request in a service
154
00:05:15,915 --> 00:05:18,394
like ChatGPT or Claude or something
155
00:05:18,394 --> 00:05:21,035
like that. And in that world, you're gonna
156
00:05:21,035 --> 00:05:23,134
also have a cost savings angle.
157
00:05:23,514 --> 00:05:25,214
You're gonna have offline capabilities.
158
00:05:25,675 --> 00:05:28,095
So the ability to chat with these models
159
00:05:28,154 --> 00:05:30,470
locally can be a little bit interesting
160
00:05:30,930 --> 00:05:32,470
and and how all that composes.
161
00:05:32,930 --> 00:05:35,110
And, you know, I think the kicker is
162
00:05:35,490 --> 00:05:37,490
most of us are geeks, and we run
163
00:05:37,490 --> 00:05:40,129
around with these really powerful computers. You know,
164
00:05:40,129 --> 00:05:42,689
you've got a laptop with gobs and gobs
165
00:05:42,689 --> 00:05:45,214
of RAM on it, and it's running a
166
00:05:45,214 --> 00:05:48,254
modern processor, it's got a GPU, it's got
167
00:05:48,254 --> 00:05:48,995
an NPU,
168
00:05:49,694 --> 00:05:51,134
you know, you might be sitting there at
169
00:05:51,134 --> 00:05:52,895
home and you're like a PC gamer and
170
00:05:52,895 --> 00:05:54,274
that's how you,
171
00:05:54,574 --> 00:05:56,574
you know, just relax at the end of
172
00:05:56,574 --> 00:05:58,319
the day. Well, guess what? You got that
173
00:05:58,319 --> 00:06:00,639
monster GPU, you know, that 5090 or
174
00:06:00,639 --> 00:06:02,399
whatever that you can potentially use during the
175
00:06:02,399 --> 00:06:04,319
day with these things. And it turns out
176
00:06:04,319 --> 00:06:06,479
that you might actually chat with local LLMs,
177
00:06:06,479 --> 00:06:08,560
like, more than you think. You know, like
178
00:06:08,720 --> 00:06:10,740
so we've talked about how we're Apple users,
179
00:06:11,305 --> 00:06:14,185
so iOS, things like that. The predictive text
180
00:06:14,185 --> 00:06:17,004
on iOS is all based on an LLM.
181
00:06:17,144 --> 00:06:18,764
It's based on a transformer.
182
00:06:19,384 --> 00:06:21,144
So that thing is running a local model.
183
00:06:21,144 --> 00:06:23,589
Well, you can run those similar models on
184
00:06:23,589 --> 00:06:25,669
your side. So it gives you this really
185
00:06:25,669 --> 00:06:29,689
interesting opportunity to kinda take advantage of AI
186
00:06:30,149 --> 00:06:32,789
while maintaining the privacy aspects, maybe letting you
187
00:06:32,789 --> 00:06:34,229
play with new things. Like, if you wanna
188
00:06:34,229 --> 00:06:36,389
play with DeepSeek without signing up for the
189
00:06:36,389 --> 00:06:37,370
DeepSeek service,
190
00:06:37,685 --> 00:06:39,525
like, hey, that's a great
191
00:06:39,525 --> 00:06:40,725
way to do it. So we'll talk a
192
00:06:40,725 --> 00:06:42,085
little bit about that and kind of some
193
00:06:42,085 --> 00:06:43,925
of the advantages and what you can get
194
00:06:43,925 --> 00:06:46,904
on with. We should also talk about what
195
00:06:47,525 --> 00:06:49,064
folks can actually run,
196
00:06:49,365 --> 00:06:52,439
like, what's useful that can run locally
197
00:06:52,439 --> 00:06:53,879
for you. So we're gonna talk a little
198
00:06:53,879 --> 00:06:56,779
bit about, like, parameter size in a model
199
00:06:56,839 --> 00:06:59,800
and how big these things are. So turns
200
00:06:59,800 --> 00:07:02,300
out there's a big difference between a
201
00:07:02,680 --> 00:07:06,044
1 billion parameter model, a 7 billion parameter model, a
202
00:07:06,044 --> 00:07:06,925
65 billion
203
00:07:06,925 --> 00:07:09,004
parameter model, or, you know, like I said,
204
00:07:09,004 --> 00:07:10,384
if you wanna play around with DeepSeek,
205
00:07:10,764 --> 00:07:12,444
I was watching some videos on YouTube of
206
00:07:12,444 --> 00:07:15,264
people who are playing around with some clustered
207
00:07:15,644 --> 00:07:16,144
servers
208
00:07:16,604 --> 00:07:18,285
to do, like, 400 billion
209
00:07:18,285 --> 00:07:20,660
parameter model runs. And, you know, you can't
210
00:07:20,660 --> 00:07:22,899
run, like, 400 billion parameters locally. You need, like,
211
00:07:22,899 --> 00:07:25,220
a distributed system, and, you you know, you
212
00:07:25,220 --> 00:07:27,860
can potentially do it across a series of
213
00:07:27,860 --> 00:07:30,339
servers within your premises. But that said, like,
214
00:07:30,339 --> 00:07:32,339
those aren't for everybody. They're gonna be too
215
00:07:32,339 --> 00:07:35,060
slow, cost a bunch for the GPUs, things
216
00:07:35,060 --> 00:07:36,555
like that. So we'll talk a little bit
217
00:07:36,555 --> 00:07:38,954
about that, about like parameters and, you know,
218
00:07:38,954 --> 00:07:42,235
maybe where more parameters doesn't always mean, like,
219
00:07:42,235 --> 00:07:44,735
better results. I think that's important too.
220
00:07:45,035 --> 00:07:46,714
There there's a little bit of nuance and
221
00:07:46,714 --> 00:07:48,735
kind of trade off here between
222
00:07:49,240 --> 00:07:51,720
speed of response, like how many tokens can
223
00:07:51,720 --> 00:07:53,639
an LLM respond back to you with, what's
224
00:07:53,639 --> 00:07:56,519
the accuracy of that, and probably most importantly,
225
00:07:56,519 --> 00:07:58,279
like what are the compute requirements on your
226
00:07:58,279 --> 00:08:00,039
end. So like the things that I'm gonna
227
00:08:00,039 --> 00:08:01,800
talk about that I run today, so I
228
00:08:01,800 --> 00:08:03,819
rock an M-series MacBook Pro
229
00:08:04,295 --> 00:08:06,455
most of the time, and that's kind of
230
00:08:06,455 --> 00:08:07,975
like what I'm running on. And I've got,
231
00:08:07,975 --> 00:08:09,735
you know, 32 gigs of RAM in there,
232
00:08:09,735 --> 00:08:11,595
and and I'm all set in my world.
233
00:08:11,895 --> 00:08:14,214
You have a different model on a
234
00:08:14,214 --> 00:08:15,035
different processor
235
00:08:15,574 --> 00:08:19,009
with more memory and potentially more GPUs, so
236
00:08:19,009 --> 00:08:20,529
you'll be able to run, like, maybe even,
237
00:08:20,529 --> 00:08:22,870
like, bigger things than I can run here.
238
00:08:22,930 --> 00:08:24,310
And that's okay. And then,
239
00:08:24,770 --> 00:08:27,089
you know, your mileage may vary. But it's
240
00:08:27,089 --> 00:08:28,930
kind of like anybody can get started with
241
00:08:28,930 --> 00:08:31,435
these things, even on, like, a little,
242
00:08:31,975 --> 00:08:34,294
you know, off the shelf NUC kind of
243
00:08:34,294 --> 00:08:37,254
PC or things like that. So beyond chatting
244
00:08:37,254 --> 00:08:38,075
with these things,
245
00:08:38,774 --> 00:08:40,615
you can also use them to empower your
246
00:08:40,615 --> 00:08:41,115
workflows.
247
00:08:41,654 --> 00:08:44,375
So you can use local AI models with
248
00:08:44,375 --> 00:08:46,830
Visual Studio Code. Like, you might sit out
249
00:08:46,830 --> 00:08:47,809
and go and say,
250
00:08:48,590 --> 00:08:50,929
I'm coding a .NET application.
251
00:08:51,470 --> 00:08:54,830
Let me go find the best LLM model
252
00:08:54,830 --> 00:08:56,210
for .NET applications,
253
00:08:56,669 --> 00:08:58,074
but I don't wanna pay for it. I
254
00:08:58,074 --> 00:09:00,074
I don't wanna, like, go to OpenAI or
255
00:09:00,074 --> 00:09:02,154
Anthropic and and do the cloud thing, anything
256
00:09:02,154 --> 00:09:03,995
like that. Well, maybe you can go out
257
00:09:03,995 --> 00:09:05,754
and actually just download a model and run
258
00:09:05,754 --> 00:09:07,595
it locally, and we'll kind of talk about
259
00:09:07,595 --> 00:09:10,074
the hosting engines for these things that expose
260
00:09:10,074 --> 00:09:12,730
things like standard OpenAI endpoints. So you can
261
00:09:12,730 --> 00:09:15,629
literally point VS Code at a local LLM
262
00:09:15,769 --> 00:09:17,529
and have it write you PowerShell and all
263
00:09:17,529 --> 00:09:19,210
those things that are, like, private just to
264
00:09:19,210 --> 00:09:19,870
your machine
265
00:09:20,170 --> 00:09:23,210
without having to go out to the Internet
266
00:09:23,210 --> 00:09:25,154
and get those kinds of things done. So
267
00:09:25,235 --> 00:09:26,915
I think that's a fun little way to
268
00:09:26,915 --> 00:09:29,254
kind of think about integrating these things
269
00:09:29,715 --> 00:09:31,555
into your life and how they come together.
270
00:09:31,555 --> 00:09:33,154
So we just kind of want to go
271
00:09:33,154 --> 00:09:35,475
end to end and full circle between, can
272
00:09:35,475 --> 00:09:37,975
you run your own ChatGPT
273
00:09:38,675 --> 00:09:39,175
like
274
00:09:39,600 --> 00:09:40,259
thing, model
275
00:09:41,039 --> 00:09:43,600
locally? And the answer is yes. So, yeah,
276
00:09:43,600 --> 00:09:45,440
like we should just kind of have a
277
00:09:45,440 --> 00:09:45,940
conversation
278
00:09:46,799 --> 00:09:47,700
about that. So
279
00:09:48,399 --> 00:09:50,639
why don't we start with like the whole
280
00:09:50,639 --> 00:09:53,360
data and privacy cost efficiency thing and all
281
00:09:53,360 --> 00:09:54,904
that stuff? I think that's one of the
282
00:09:54,904 --> 00:09:56,924
ones that can be super important
283
00:09:58,024 --> 00:10:00,105
that people think about. And kinda like you
284
00:10:00,105 --> 00:10:02,345
said, the DeepSeek leak exposed millions of
285
00:10:02,345 --> 00:10:02,845
sensitive
286
00:10:03,225 --> 00:10:05,544
data records. One thing I've heard even when
287
00:10:05,544 --> 00:10:07,565
you start looking at things like ChatGPT
288
00:10:08,105 --> 00:10:08,605
versus
289
00:10:09,304 --> 00:10:11,759
Copilot and Microsoft 365 and going
290
00:10:11,759 --> 00:10:14,580
back to the local ones or doing OpenAI
291
00:10:14,720 --> 00:10:16,420
in Azure or
292
00:10:16,960 --> 00:10:19,680
something in AWS is it very much
293
00:10:19,680 --> 00:10:21,279
goes back to where does that data go.
294
00:10:21,279 --> 00:10:23,680
Some people see rolling out Copilot as a
295
00:10:23,680 --> 00:10:26,365
security benefit because then they're not taking all
296
00:10:26,365 --> 00:10:27,585
that data from
297
00:10:27,965 --> 00:10:30,845
SharePoint, from Teams, from their Microsoft 365
298
00:10:30,845 --> 00:10:33,424
tenant, sending it out into ChatGPT
299
00:10:33,804 --> 00:10:36,524
where it's escaping that Microsoft 365
300
00:10:36,524 --> 00:10:39,470
boundary. OpenAI and Azure, same thing. If all
301
00:10:39,470 --> 00:10:42,210
your data's up in Azure somewhere, if you're
302
00:10:42,509 --> 00:10:45,309
working with Scott to store petabytes of data
303
00:10:45,309 --> 00:10:47,309
in blob storage and you want that to
304
00:10:47,309 --> 00:10:49,470
be used for OpenAI, you can do that.
305
00:10:49,470 --> 00:10:50,909
But then you do get into this local
306
00:10:50,909 --> 00:10:52,129
thing. All your data's
307
00:10:52,595 --> 00:10:54,835
local. Or one of the scenarios I have
308
00:10:54,835 --> 00:10:56,514
that we can put a link to is
309
00:10:56,514 --> 00:10:58,754
I use Home Assistant for all my smart
310
00:10:58,754 --> 00:10:59,894
home stuff because
311
00:11:00,355 --> 00:11:01,254
I like everything
312
00:11:01,555 --> 00:11:03,235
local. I don't want it all going out
313
00:11:03,235 --> 00:11:05,495
and relying on Samsung or
314
00:11:05,795 --> 00:11:08,370
any of those. What if you wanna integrate
315
00:11:08,590 --> 00:11:11,789
AI into your local smart home stuff and,
316
00:11:11,789 --> 00:11:14,129
again, you wanna keep it all internal? You're
317
00:11:14,190 --> 00:11:15,809
in an industry where
318
00:11:16,269 --> 00:11:18,590
you need to keep things on premises for
319
00:11:18,590 --> 00:11:21,115
some reason or certain regulations around that. I
320
00:11:21,115 --> 00:11:23,215
think there's a huge benefit to doing
321
00:11:23,595 --> 00:11:26,095
local AI, whether it's at that small
322
00:11:26,475 --> 00:11:27,375
in your house,
323
00:11:27,915 --> 00:11:29,855
you and I type scenario of
324
00:11:30,235 --> 00:11:33,215
smart home or something here or large enterprises
325
00:11:33,754 --> 00:11:38,220
that have very stringent data requirements and need
326
00:11:38,220 --> 00:11:40,460
to run it locally, maybe in their own
327
00:11:40,460 --> 00:11:43,519
data centers in clusters that they build internally
328
00:11:43,580 --> 00:11:45,820
and stuff. Home Assistant is a fun one.
329
00:11:45,820 --> 00:11:48,294
So if you think about AI and Home
330
00:11:48,294 --> 00:11:49,894
Assistant and what they're doing with like Home
331
00:11:49,894 --> 00:11:52,054
Assistant voice and some of those things, it
332
00:11:52,054 --> 00:11:55,414
relies on two paths. One is text to
333
00:11:55,414 --> 00:11:58,054
speech. So can I have Home Assistant talk
334
00:11:58,054 --> 00:11:59,735
to me? So some text goes in and
335
00:11:59,735 --> 00:12:01,195
can I have it talk back to me?
336
00:12:01,419 --> 00:12:03,980
And then it's also speech to text in
337
00:12:03,980 --> 00:12:05,600
the form of things maybe
338
00:12:05,980 --> 00:12:06,879
like Whisper,
339
00:12:07,259 --> 00:12:09,259
which is, you know, typically what I see
340
00:12:09,259 --> 00:12:11,660
integrated with most on that side. In fact,
341
00:12:11,660 --> 00:12:15,179
we use Whisper for generating transcripts sometimes for
342
00:12:15,179 --> 00:12:17,205
the show. So it's not just LLMs.
343
00:12:17,665 --> 00:12:19,665
It could be things like text to speech,
344
00:12:19,665 --> 00:12:22,165
speech to text. Could also be image generation.
345
00:12:22,225 --> 00:12:24,384
Like, if somebody is looking to, like, play
346
00:12:24,384 --> 00:12:26,945
around with Stable Diffusion, that runs pretty
347
00:12:26,945 --> 00:12:27,764
well locally
348
00:12:28,269 --> 00:12:29,870
on most of these things as well. It
349
00:12:29,870 --> 00:12:31,549
could be a little bit slow, but, hey,
350
00:12:31,549 --> 00:12:33,629
that that's okay. That's that's part of the
351
00:12:33,629 --> 00:12:35,950
trade off of not having to pay and
352
00:12:35,950 --> 00:12:38,110
and push these things through. But I I
353
00:12:38,110 --> 00:12:39,950
think the most important thing is just when
354
00:12:39,950 --> 00:12:44,274
you're running an LLM locally, you're basically mitigating
355
00:12:44,274 --> 00:12:45,495
a bunch of that risk
356
00:12:46,115 --> 00:12:48,834
of having to worry about compliance, having to
357
00:12:48,834 --> 00:12:51,735
worry about legal concerns. Like, hey, I'm submitting,
358
00:12:51,875 --> 00:12:54,115
like, this thing that's important to me. Like,
359
00:12:54,115 --> 00:12:56,274
I'm never like, for example, I'm never going
360
00:12:56,274 --> 00:12:57,495
to chat with my taxes
361
00:12:58,039 --> 00:13:00,439
with anything other than, like, a local LLM
362
00:13:00,439 --> 00:13:01,879
to help me break some of that stuff
363
00:13:01,879 --> 00:13:02,379
down.
364
00:13:03,159 --> 00:13:05,240
But, you know, somebody else might be out
365
00:13:05,240 --> 00:13:07,000
there, but good good luck when you're when
366
00:13:07,000 --> 00:13:08,759
you're in the next data breach or or
367
00:13:08,759 --> 00:13:11,345
or whatever happens. So there's things like that.
368
00:13:11,345 --> 00:13:12,945
I think the other one that's important to
369
00:13:12,945 --> 00:13:15,424
consider is kind of the cost angle of
370
00:13:15,424 --> 00:13:17,904
things. Like, I'll be the first to admit
371
00:13:17,904 --> 00:13:20,225
that I'm pretty frugal. So if you're thinking
372
00:13:20,225 --> 00:13:22,544
about maybe like OpenAI and having to go
373
00:13:22,544 --> 00:13:24,945
out and pay for OpenAI, and you're either
374
00:13:24,945 --> 00:13:27,129
paying per request or you're on one of
375
00:13:27,129 --> 00:13:28,889
the monthly plans. And those can get pretty
376
00:13:28,889 --> 00:13:30,250
expensive. Right? If you wanna get up there,
377
00:13:30,250 --> 00:13:31,769
you can spend up to, like, $200 a
378
00:13:31,769 --> 00:13:34,169
month. But typically, they're on the order of,
379
00:13:34,169 --> 00:13:34,669
like,
380
00:13:35,129 --> 00:13:38,990
you know, 1¢ US per 1,000 tokens.
381
00:13:39,365 --> 00:13:40,964
And then you're like, Well, what's a token?
382
00:13:40,964 --> 00:13:43,125
Like, how many words comprise a token? Like,
383
00:13:43,125 --> 00:13:44,164
it can be a little bit weird to
384
00:13:44,164 --> 00:13:45,845
figure out the pricing. So sometimes you just
385
00:13:45,845 --> 00:13:48,105
want to play around with these things locally
386
00:13:48,884 --> 00:13:50,824
without having that cost constraint,
387
00:13:51,204 --> 00:13:53,959
because costs can run away from you pretty
388
00:13:53,959 --> 00:13:56,620
quickly, especially if you're being like super chatty
389
00:13:57,000 --> 00:14:00,120
and doing longer chat threads and things like
390
00:14:00,120 --> 00:14:02,199
that. Or the other place they tend to
391
00:14:02,199 --> 00:14:03,179
get pretty expensive
392
00:14:03,559 --> 00:14:05,339
is if you're integrating
393
00:14:06,134 --> 00:14:06,634
these
394
00:14:07,095 --> 00:14:07,595
AIs
395
00:14:08,254 --> 00:14:08,754
into,
396
00:14:09,415 --> 00:14:11,815
like, your coding workflows, like, hey, you're you're
397
00:14:11,815 --> 00:14:12,855
out there and you're sitting there and you're
398
00:14:12,855 --> 00:14:15,095
like, I wanna vibe code. Well, great.
399
00:14:15,095 --> 00:14:17,654
When you're like vibe coding across 10,000 lines
400
00:14:17,654 --> 00:14:18,929
of code, it starts
401
00:14:19,309 --> 00:14:21,790
to add up and get pretty expensive. So
402
00:14:21,790 --> 00:14:24,350
you already bought this, you know, honking computer.
403
00:14:24,350 --> 00:14:26,370
You got a GPU. It's got CPU.
404
00:14:26,830 --> 00:14:28,990
It's got a fast disk. You might as
405
00:14:28,990 --> 00:14:30,850
well use it for a little bit more
406
00:14:31,154 --> 00:14:33,794
than just writing your PowerShell scripts. Like, why
407
00:14:33,794 --> 00:14:35,235
why are you sitting there writing in VS
408
00:14:35,235 --> 00:14:37,074
Code by hand when, you know, you could
409
00:14:37,074 --> 00:14:39,074
be just vibing your way through that stuff?
410
00:14:39,074 --> 00:14:41,014
For sure. And I think that's one thing.
411
00:14:41,074 --> 00:14:42,834
I guess I kind of always realized it
412
00:14:42,834 --> 00:14:44,834
in the back of my head, comparing local
413
00:14:44,834 --> 00:14:45,334
LLM
414
00:14:45,679 --> 00:14:48,559
to ChatGPT to Copilot to cloud based, it
415
00:14:48,559 --> 00:14:50,100
kinda struck me that
416
00:14:50,720 --> 00:14:53,120
from a pricing perspective, when you're using cloud
417
00:14:53,120 --> 00:14:55,759
based LLMs, you're not paying for the models.
418
00:14:55,759 --> 00:14:58,319
Like, these companies, these models are all out
419
00:14:58,319 --> 00:14:58,819
there,
420
00:14:59,120 --> 00:14:59,940
whether it's
421
00:15:00,424 --> 00:15:00,924
DeepSeek
422
00:15:01,945 --> 00:15:02,764
or Llama
423
00:15:03,144 --> 00:15:05,225
or any of those. What you're really paying
424
00:15:05,225 --> 00:15:06,764
for is the compute to
425
00:15:07,225 --> 00:15:09,784
process the request to these models, and that's
426
00:15:09,784 --> 00:15:11,225
where that cost comes in. Do you wanna
427
00:15:11,225 --> 00:15:13,625
spend it on on-premises hardware and hardware
428
00:15:13,625 --> 00:15:15,790
running in your house, or are you giving
429
00:15:15,790 --> 00:15:18,350
it to these cloud providers for the hardware
430
00:15:18,350 --> 00:15:21,470
out there running models that maybe you don't
431
00:15:21,470 --> 00:15:24,429
physically have the capability of running on your
432
00:15:24,429 --> 00:15:26,269
compute that you own? It is an interesting
433
00:15:26,269 --> 00:15:28,190
one. The other thing that, you know, like,
434
00:15:28,190 --> 00:15:29,629
once you get a little bit more advanced
435
00:15:29,629 --> 00:15:30,910
and you start going down the path of
436
00:15:30,910 --> 00:15:32,664
some of this stuff, if you really get
437
00:15:32,664 --> 00:15:35,065
into it, you start looking at things like
438
00:15:35,065 --> 00:15:35,804
fine tuning
439
00:15:36,184 --> 00:15:39,304
and doing RAG or retrieval augmented generation against
440
00:15:39,304 --> 00:15:41,144
things. So we'll put a link in the
441
00:15:41,144 --> 00:15:44,105
show notes to a Network Chuck episode where
442
00:15:44,105 --> 00:15:46,610
he talks about running local LLMs. And one
443
00:15:46,610 --> 00:15:47,970
of the things that he does, he has
444
00:15:47,970 --> 00:15:50,370
this really interesting use case where when he
445
00:15:50,370 --> 00:15:53,110
attends church, all the sermons are transcribed,
446
00:15:53,649 --> 00:15:56,610
and he uses local LLMs to summarize the
447
00:15:56,610 --> 00:15:58,929
sermons for himself. Like, he doesn't always get
448
00:15:58,929 --> 00:16:00,769
to attend live, but he still wants to
449
00:16:00,769 --> 00:16:02,534
get the messaging out of it. So he
450
00:16:02,534 --> 00:16:05,095
does all that stuff like local LLM, and
451
00:16:05,095 --> 00:16:07,174
it's just all there ready to go. It
452
00:16:07,174 --> 00:16:08,154
does the transcription,
453
00:16:08,934 --> 00:16:10,855
like pulls it all off a YouTube thing,
454
00:16:10,855 --> 00:16:12,934
transcribes it, runs it through an LLM, gives
455
00:16:12,934 --> 00:16:14,730
him the summary, and then that summary is
456
00:16:14,730 --> 00:16:17,289
written back as a markdown file where it
457
00:16:17,289 --> 00:16:18,269
lands in Obsidian,
458
00:16:18,809 --> 00:16:21,289
and then he can just use his network
459
00:16:21,289 --> 00:16:23,450
brain in Obsidian to go and figure some
460
00:16:23,450 --> 00:16:25,049
of that stuff out too. So you can
461
00:16:25,049 --> 00:16:27,164
get pretty rich with these things if you
462
00:16:27,164 --> 00:16:29,644
start to kinda run through the use cases.
463
00:16:29,644 --> 00:16:31,804
So where, like, NetworkChuck might be doing that,
464
00:16:31,965 --> 00:16:34,044
I might be working on coding a new
465
00:16:34,044 --> 00:16:34,544
application,
466
00:16:35,004 --> 00:16:37,004
and I just want it to learn off
467
00:16:37,004 --> 00:16:39,009
maybe an existing code base from, like, the
468
00:16:39,009 --> 00:16:41,029
previous two versions or iterations
469
00:16:41,409 --> 00:16:43,089
or things like that that I did along
470
00:16:43,089 --> 00:16:44,529
the way. So you can also do these
471
00:16:44,529 --> 00:16:46,850
things like fine tuning and get up and
472
00:16:46,850 --> 00:16:47,350
running
473
00:16:47,889 --> 00:16:50,529
pretty quickly. It actually, like, turns out
474
00:16:50,529 --> 00:16:51,970
a lot of the tooling's already out there.
475
00:16:51,970 --> 00:16:53,350
Like, these things are
476
00:16:53,730 --> 00:16:56,115
not the hardest thing to stand up. But
477
00:16:56,115 --> 00:16:57,875
before we stand them up, we should also
478
00:16:57,875 --> 00:16:59,955
probably talk a little bit about, like, what
479
00:16:59,955 --> 00:17:02,455
kinds of models you can run
480
00:17:02,914 --> 00:17:05,494
because your mileage may vary here based on
481
00:17:05,795 --> 00:17:08,375
your hardware and what's available to you,
482
00:17:08,650 --> 00:17:11,049
your network bandwidth, and a couple other
483
00:17:11,049 --> 00:17:11,549
things.
484
00:17:15,450 --> 00:17:17,690
Do you feel overwhelmed by trying to manage
485
00:17:17,690 --> 00:17:19,929
your Office 365 environment? Are you
486
00:17:19,929 --> 00:17:23,230
facing unexpected issues that disrupt your company's productivity?
487
00:17:23,529 --> 00:17:25,474
Intelligink is here to help. Much like you
488
00:17:25,474 --> 00:17:27,394
take your car to the mechanic that has
489
00:17:27,394 --> 00:17:29,474
specialized knowledge on how to best keep your
490
00:17:29,474 --> 00:17:32,454
car running, Intelligink helps you with your Microsoft
491
00:17:32,515 --> 00:17:34,774
cloud environment because that's their expertise.
492
00:17:35,154 --> 00:17:37,470
Intelligink keeps up with the latest updates in
493
00:17:37,470 --> 00:17:39,630
the Microsoft cloud to help keep your business
494
00:17:39,630 --> 00:17:41,869
running smoothly and ahead of the curve. Whether
495
00:17:41,869 --> 00:17:43,869
you are a small organization with just a
496
00:17:43,869 --> 00:17:46,349
few users up to an organization of several
497
00:17:46,349 --> 00:17:47,329
thousand employees,
498
00:17:47,710 --> 00:17:49,710
they want to partner with you to implement
499
00:17:49,710 --> 00:17:52,450
and administer your Microsoft cloud technology.
500
00:17:53,204 --> 00:17:56,744
Visit them at intelligink.com/podcast.
501
00:17:56,964 --> 00:18:03,704
That's intelligink.com/podcast
502
00:18:04,085 --> 00:18:06,244
for more information or to schedule a thirty
503
00:18:06,244 --> 00:18:08,240
minute call to get started with them today.
504
00:18:08,539 --> 00:18:11,900
Remember, Intelligink focuses on the Microsoft cloud so
505
00:18:11,900 --> 00:18:13,680
you can focus on your business.
506
00:18:15,820 --> 00:18:17,900
So talking hardware, do you wanna dive into
507
00:18:17,900 --> 00:18:20,545
hardware or models? Where should we go? It's
508
00:18:20,545 --> 00:18:21,924
kinda like a both conversation.
509
00:18:22,305 --> 00:18:23,744
So I think we can cover kind of
510
00:18:23,744 --> 00:18:24,725
the whole parameterization
511
00:18:25,505 --> 00:18:26,005
question
512
00:18:26,545 --> 00:18:29,105
and how big these things are to run
513
00:18:29,105 --> 00:18:29,605
locally
514
00:18:29,984 --> 00:18:31,605
along with some of the hardware
515
00:18:31,904 --> 00:18:32,404
constraints
516
00:18:32,865 --> 00:18:33,365
that
517
00:18:33,769 --> 00:18:35,929
come along with them. So when you think
518
00:18:35,929 --> 00:18:37,849
about the models that you can run, one
519
00:18:37,849 --> 00:18:39,849
of the first things that's gonna happen is
520
00:18:39,849 --> 00:18:42,269
you might go out and grab Ollama,
521
00:18:42,569 --> 00:18:45,129
you might grab LM Studio. You're gonna
522
00:18:45,129 --> 00:18:47,845
grab some system that's going to let you
523
00:18:47,845 --> 00:18:48,345
basically
524
00:18:48,884 --> 00:18:51,684
run that model and be able to run
525
00:18:51,684 --> 00:18:54,644
prompts against it. So those models are
526
00:18:54,644 --> 00:18:55,704
gonna have different
527
00:18:56,005 --> 00:18:56,505
sizes,
528
00:18:57,044 --> 00:19:00,299
and those sizes equate back to parameters. So
529
00:19:00,460 --> 00:19:01,580
you're gonna go out and you're gonna see
530
00:19:01,580 --> 00:19:04,320
things like, oh, I wanna run Llama 3.
531
00:19:04,940 --> 00:19:06,960
And Llama 3 might have,
532
00:19:07,660 --> 00:19:10,460
you know, a 7,000,000,000 parameter model. It might
533
00:19:10,460 --> 00:19:12,380
have a 300,000,000,000
534
00:19:12,380 --> 00:19:15,144
parameter model. It could have a 1,000,000,000 parameter.
535
00:19:15,144 --> 00:19:16,985
It could have something that's even smaller than
536
00:19:16,985 --> 00:19:19,144
that. So these things start to kind of
537
00:19:19,144 --> 00:19:21,384
become important. So if you're thinking about, like,
538
00:19:21,384 --> 00:19:23,865
parameters, number of parameters in a model, which
539
00:19:23,865 --> 00:19:25,404
is going to equate to
540
00:19:25,705 --> 00:19:26,924
kind of functionality
541
00:19:27,384 --> 00:19:28,445
within that model,
542
00:19:28,840 --> 00:19:31,400
In some place like a 7,000,000,000 parameter model,
543
00:19:31,400 --> 00:19:33,480
if you're looking at, like, Llama 2
544
00:19:33,480 --> 00:19:36,600
7B, you're looking at Mistral 7B, like,
545
00:19:36,600 --> 00:19:38,519
those are pretty good starting points, and you
546
00:19:38,519 --> 00:19:41,160
don't need a super monster laptop or desktop
547
00:19:41,160 --> 00:19:43,184
to do it, just something decent. So if
548
00:19:43,184 --> 00:19:45,585
you have about 16 gigs of RAM and
549
00:19:45,585 --> 00:19:47,904
some CPU, you're good. Like, you don't need
550
00:19:47,904 --> 00:19:50,464
a dedicated GPU. You can absolutely do this
551
00:19:50,464 --> 00:19:51,444
stuff on CPU.
552
00:19:51,904 --> 00:19:54,144
I hesitate to say fast. It'll be
553
00:19:54,144 --> 00:19:56,150
fast-ish. It might feel a little bit slow,
554
00:19:56,150 --> 00:19:58,069
like you'll see, like, the words typing out
555
00:19:58,069 --> 00:20:00,230
on screen, but that's okay. That that kind
556
00:20:00,230 --> 00:20:01,750
of equates to the experience that you might
557
00:20:01,750 --> 00:20:03,589
have in a ChatGPT or a
558
00:20:03,589 --> 00:20:05,369
Claude or things like that.
559
00:20:05,829 --> 00:20:07,450
But they're also super lightweight.
560
00:20:07,845 --> 00:20:10,565
So you you can get models that potentially
561
00:20:10,565 --> 00:20:12,404
when you download the model, they're measured in,
562
00:20:12,404 --> 00:20:13,704
like, hundreds of megs.
563
00:20:14,005 --> 00:20:15,845
Some are in the gigabyte range. Like, if
564
00:20:15,845 --> 00:20:17,224
you're in, like, a 7,000,000,000
565
00:20:17,444 --> 00:20:19,605
parameter model, you're talking about maybe, like, two
566
00:20:19,605 --> 00:20:22,390
to three gigs of downloading a quantized model
567
00:20:22,529 --> 00:20:25,569
and being able to chat against it. And
568
00:20:25,569 --> 00:20:28,470
with 7,000,000,000 parameters, you'll probably find
569
00:20:28,769 --> 00:20:31,910
that they're good enough for most tasks,
570
00:20:32,529 --> 00:20:33,109
for most
571
00:20:33,464 --> 00:20:36,744
personal tasks. Hey. Summarize this for me. Hey.
572
00:20:36,744 --> 00:20:38,444
Give me a quick idea of this.
573
00:20:38,904 --> 00:20:41,544
Translate this to this. Like, those kinds of
574
00:20:41,544 --> 00:20:43,644
things, it's perfect. Hey. I wanna pump in
575
00:20:43,865 --> 00:20:46,424
the transcript from a YouTube video and have
576
00:20:46,424 --> 00:20:48,569
a local model summarize it for me. That's
577
00:20:48,569 --> 00:20:50,970
an awesome job for, like, a 3 to 7 billion
578
00:20:50,970 --> 00:20:52,029
parameter model,
579
00:20:52,409 --> 00:20:54,169
things like that. You can get a little
580
00:20:54,169 --> 00:20:56,569
bit bigger, and a little bit bigger is
581
00:20:56,569 --> 00:20:58,809
typically gonna be in the something of, like,
582
00:20:58,809 --> 00:21:00,109
10 to 30,000,000,000
583
00:21:00,169 --> 00:21:01,230
parameter range.
584
00:21:01,755 --> 00:21:02,255
So
585
00:21:02,634 --> 00:21:04,815
now you're getting a little bit more honking.
586
00:21:04,954 --> 00:21:07,994
You're actually gonna need some GPU here, and
587
00:21:07,994 --> 00:21:10,234
you're probably gonna need more RAM as well.
588
00:21:10,234 --> 00:21:11,914
So, like, 16 gigs of RAM isn't gonna
589
00:21:11,914 --> 00:21:13,994
cut it. You're probably gonna need something closer
590
00:21:13,994 --> 00:21:15,615
to 32 gigs of RAM.
591
00:21:16,000 --> 00:21:18,480
You're gonna need some kind of GPU to
592
00:21:18,480 --> 00:21:19,380
drive that.
593
00:21:20,000 --> 00:21:21,359
You know, I think you could maybe get
594
00:21:21,359 --> 00:21:23,919
by on, like, an RTX 3090 or
595
00:21:23,919 --> 00:21:25,759
something like that. You'd probably wanna be in,
596
00:21:25,759 --> 00:21:27,519
like, a 40 series, like, a
597
00:21:27,519 --> 00:21:29,975
4060, 4070. Or if you're all
598
00:21:29,975 --> 00:21:31,174
on board and, like I said, you're a
599
00:21:31,174 --> 00:21:33,095
PC gamer and you've got that 5090
600
00:21:33,095 --> 00:21:35,674
sitting in there, like, go ahead. Use it.
601
00:21:35,815 --> 00:21:37,575
It's ready to go. Nobody has the 50
602
00:21:37,575 --> 00:21:39,095
series. There were only, like, 10 of them
603
00:21:39,095 --> 00:21:40,695
produced and nobody could buy them. Well, and
604
00:21:40,695 --> 00:21:41,975
out of the 10 that were produced, 10
605
00:21:41,975 --> 00:21:43,894
out of 10 were broken, so the
606
00:21:43,894 --> 00:21:46,049
yields are great. And melted power cables. Okay.
607
00:21:46,049 --> 00:21:48,210
Anyways, sidetracked. Yes. But you're gonna need one
608
00:21:48,210 --> 00:21:49,890
of those high end GPUs. Yeah. Well, you're
609
00:21:49,890 --> 00:21:51,730
gonna need a GPU. Like, I think the
610
00:21:51,730 --> 00:21:54,369
difference between, like, a 3 to 7 billion parameter model
611
00:21:54,369 --> 00:21:55,890
and then you get up to those, like,
612
00:21:55,890 --> 00:21:57,255
10 to 30 range
613
00:21:57,575 --> 00:21:59,494
is, do I need a GPU or do
614
00:21:59,494 --> 00:22:00,634
I not need a GPU?
615
00:22:00,934 --> 00:22:02,775
So you can do the smaller models just
616
00:22:02,775 --> 00:22:04,535
with CPU as long as you have enough
617
00:22:04,535 --> 00:22:06,934
RAM. At some point, you're gonna want GPU
618
00:22:06,934 --> 00:22:10,234
as well to go ahead and offload those.
619
00:22:10,460 --> 00:22:12,220
So if you're thinking like, hey, my use
620
00:22:12,220 --> 00:22:14,880
case for running a local LLM is doing
621
00:22:15,019 --> 00:22:17,759
advanced coding, like, I'm beyond, like, summarization,
622
00:22:17,900 --> 00:22:19,579
and I want this thing to help me
623
00:22:19,579 --> 00:22:20,319
write applications,
624
00:22:20,619 --> 00:22:23,924
PowerShell scripts, bash scripts, anything like that, you're
625
00:22:24,085 --> 00:22:26,244
probably gonna wanna be in that range where
626
00:22:26,244 --> 00:22:28,244
you've got a little bit more RAM and
627
00:22:28,244 --> 00:22:29,225
you've got a GPU,
628
00:22:29,765 --> 00:22:31,365
and then you kinda find the model that
629
00:22:31,365 --> 00:22:33,205
you like, and that ends up being
630
00:22:33,205 --> 00:22:35,605
your sweet spot there. After that, you get
631
00:22:35,605 --> 00:22:37,700
into, like, the big, big models. So you're
632
00:22:37,700 --> 00:22:40,099
into, like, 65. I think, I was watching
633
00:22:40,099 --> 00:22:42,099
another NetworkChuck video. He ran one on a
634
00:22:42,099 --> 00:22:42,599
cluster
635
00:22:42,900 --> 00:22:44,980
of Mac Studios. I think it was the
636
00:22:44,980 --> 00:22:46,900
M1 Studios. It was, like, a cluster
637
00:22:46,900 --> 00:22:48,500
of, like, six of those where he was
638
00:22:48,500 --> 00:22:50,660
able to run, like, a 400,000,000,000 parameter model,
639
00:22:50,660 --> 00:22:52,599
but it was only able to output context
640
00:22:53,164 --> 00:22:54,544
you know, like one
641
00:22:55,085 --> 00:22:55,585
word,
642
00:22:55,884 --> 00:22:58,684
a second. Like, it's just so slow that
643
00:22:58,684 --> 00:23:01,005
it's not actually useful. Right. So
644
00:23:01,005 --> 00:23:02,605
slow. A few times it looked like it
645
00:23:02,605 --> 00:23:04,625
even got stuck and,
646
00:23:05,005 --> 00:23:07,940
yeah, it was it was interesting. We'll put
647
00:23:07,940 --> 00:23:08,980
a link to that video in the show
648
00:23:08,980 --> 00:23:10,340
notes too. Yeah. So the way I think
649
00:23:10,340 --> 00:23:13,400
about that, the really big models, they're basically
650
00:23:13,940 --> 00:23:16,100
not there for, like, the faint of heart.
651
00:23:16,100 --> 00:23:17,460
They're there if you know what you're doing,
652
00:23:17,460 --> 00:23:19,619
if you've got the hardware to back it,
653
00:23:19,619 --> 00:23:20,440
both CPU,
654
00:23:20,980 --> 00:23:21,480
RAM,
655
00:23:22,394 --> 00:23:23,295
and GPU.
656
00:23:23,755 --> 00:23:25,275
So if you think about it, like, there's
657
00:23:25,275 --> 00:23:26,394
kinda like a way that you can just
658
00:23:26,394 --> 00:23:27,755
break it down into a simple set of,
659
00:23:27,755 --> 00:23:29,755
like, pros and cons. So when you're sitting
660
00:23:29,755 --> 00:23:32,234
out there, you're in that, like, three, five,
661
00:23:32,234 --> 00:23:33,535
seven billion range,
662
00:23:34,075 --> 00:23:36,269
that's gonna be fast. You can do it
663
00:23:36,269 --> 00:23:37,649
on simple low hardware,
664
00:23:37,950 --> 00:23:39,470
or you can even do it on beefier
665
00:23:39,470 --> 00:23:41,069
hardware. Like in my case, like when I'm
666
00:23:41,069 --> 00:23:43,710
on my M1 Max, typically, I'm also running
667
00:23:43,710 --> 00:23:46,109
Windows in a virtual machine. So that's typically
668
00:23:46,109 --> 00:23:48,109
got half my RAM already. And then I've
669
00:23:48,109 --> 00:23:49,470
got a little bit of RAM that's going
670
00:23:49,470 --> 00:23:50,714
to the OS and things like that as
671
00:23:50,714 --> 00:23:52,474
well. So even if I could run a
672
00:23:52,474 --> 00:23:55,194
bigger model, I'm not going to because I'm
673
00:23:55,194 --> 00:23:57,755
still having resource contention and other things. Like,
674
00:23:57,755 --> 00:23:59,194
sometimes I don't wanna shut down my VM
675
00:23:59,194 --> 00:24:00,634
or I don't wanna shut down VS Code
676
00:24:00,634 --> 00:24:02,154
because I'm using those things. Right. You
677
00:24:02,154 --> 00:24:03,980
know, smaller models, fast,
678
00:24:04,359 --> 00:24:05,420
commodity hardware,
679
00:24:06,119 --> 00:24:07,500
good enough for
680
00:24:07,799 --> 00:24:11,000
easy tasks. Like, the summarize that transcript for
681
00:24:11,000 --> 00:24:13,500
me thing, they're gonna be great for that.
682
00:24:13,559 --> 00:24:15,640
You get into that middle range, probably your
683
00:24:15,640 --> 00:24:16,920
sweet spot, like, if you do have a
684
00:24:16,920 --> 00:24:18,779
little GPU to drive these things,
685
00:24:19,125 --> 00:24:19,865
good accuracy,
686
00:24:20,244 --> 00:24:21,545
more context awareness,
687
00:24:22,164 --> 00:24:25,125
and kinda longer context windows. So as you're
688
00:24:25,125 --> 00:24:26,984
chatting with these things, they can remember,
689
00:24:27,285 --> 00:24:29,684
quote, unquote, big air quotes here. They can
690
00:24:29,684 --> 00:24:32,149
remember what you previously typed with them. So
691
00:24:32,149 --> 00:24:34,710
having bigger context windows and more RAM
692
00:24:34,710 --> 00:24:36,869
and VRAM from your GPUs to host those
693
00:24:36,869 --> 00:24:39,529
context windows in becomes a little bit important.
694
00:24:39,750 --> 00:24:41,210
And then, like, if you're,
695
00:24:41,990 --> 00:24:43,829
you know, a monster gamer, you've got just
696
00:24:43,829 --> 00:24:45,190
a bunch of these things laying around and
697
00:24:45,190 --> 00:24:47,414
you wanna network them all together, it's super
698
00:24:47,414 --> 00:24:48,774
easy to do that too if you got
699
00:24:48,774 --> 00:24:50,154
enough hardware running around,
700
00:24:50,615 --> 00:24:52,075
and you can go and
701
00:24:52,855 --> 00:24:55,335
make that happen. So once you've kinda figured
702
00:24:55,335 --> 00:24:57,575
out your hardware and you've got a
703
00:24:57,575 --> 00:24:59,014
sense for what you wanna do and what
704
00:24:59,014 --> 00:25:01,190
you're gonna be able to run locally, well,
705
00:25:01,250 --> 00:25:03,349
then you need a way to
706
00:25:03,730 --> 00:25:04,470
run these
707
00:25:04,849 --> 00:25:05,349
things
708
00:25:05,809 --> 00:25:08,289
locally, which, you know, it's not a little
709
00:25:08,289 --> 00:25:10,210
decision to make. Right. And another thing about
710
00:25:10,210 --> 00:25:12,529
the hardware that I found interesting watching the
711
00:25:12,529 --> 00:25:15,644
NetworkChuck videos as well was because we talked
712
00:25:15,644 --> 00:25:17,904
about the Macs, Macs have, like, that shared
713
00:25:17,965 --> 00:25:20,945
memory. They don't have dedicated video memory and
714
00:25:21,005 --> 00:25:23,404
system memory. So one thing he was doing
715
00:25:23,404 --> 00:25:25,325
was when he was running these models, like,
716
00:25:25,325 --> 00:25:28,200
all the memory was going to process the
717
00:25:28,200 --> 00:25:31,419
model because it doesn't have, like, those physical
718
00:25:31,480 --> 00:25:31,980
boundaries
719
00:25:32,279 --> 00:25:32,779
between
720
00:25:33,400 --> 00:25:35,000
video and system memory. So I think that
721
00:25:35,000 --> 00:25:36,679
was another thing to watch out for. And
722
00:25:36,679 --> 00:25:38,539
the other thing, because you mentioned networking,
723
00:25:38,919 --> 00:25:40,759
he also found, like, running a 10 gig
724
00:25:40,759 --> 00:25:43,115
network. Something I didn't realize because I've never
725
00:25:43,115 --> 00:25:45,515
done this locally, how chatty these are if
726
00:25:45,515 --> 00:25:47,674
you're running a cluster over a network. Super
727
00:25:47,674 --> 00:25:50,815
chatty. He'd, like, saturated his 10 gig network,
728
00:25:51,115 --> 00:25:53,355
and that appeared I would say, I don't
729
00:25:53,355 --> 00:25:55,035
know that it was definitive in his videos,
730
00:25:55,035 --> 00:25:56,494
but appeared to be the bottleneck
731
00:25:56,849 --> 00:25:57,670
using these,
732
00:25:58,450 --> 00:25:59,429
clustered Mac Studios.
733
00:25:59,730 --> 00:26:02,609
So then he switched to Thunderbolt, which gave
734
00:26:02,609 --> 00:26:05,490
him a 40 gig network essentially. And even
735
00:26:05,490 --> 00:26:07,809
that, he managed to saturate, get a little
736
00:26:07,809 --> 00:26:09,829
bit more speed out of it using Thunderbolt
737
00:26:09,890 --> 00:26:12,194
as opposed to a 10 gig network. But
738
00:26:12,194 --> 00:26:14,534
if you do start thinking of clustering
739
00:26:14,994 --> 00:26:15,894
larger models,
740
00:26:16,274 --> 00:26:19,075
networking is also huge when it comes to
741
00:26:19,075 --> 00:26:20,994
the hardware for these things. I don't really
742
00:26:20,994 --> 00:26:23,234
get into the network model kind of thing.
743
00:26:23,234 --> 00:26:25,075
Like, I just don't have enough hardware running
744
00:26:25,075 --> 00:26:26,890
around here at home to do it. I
745
00:26:26,890 --> 00:26:29,130
I certainly think it's interesting if you can
746
00:26:29,130 --> 00:26:31,450
get there. So, yeah, we can kinda talk
747
00:26:31,450 --> 00:26:33,369
about that maybe with, like, more advanced stuff.
748
00:26:33,369 --> 00:26:35,609
Yes. So on to software. So you got
749
00:26:35,609 --> 00:26:38,009
your hardware. You got your software. I keep
750
00:26:38,009 --> 00:26:39,865
seeing you mention LM Studio. I've not looked
751
00:26:39,865 --> 00:26:41,865
at LM Studio. The one that always seems
752
00:26:41,865 --> 00:26:43,804
to pop up for me both in
753
00:26:44,105 --> 00:26:45,704
Home Assistant as well as in a
754
00:26:45,704 --> 00:26:47,865
lot of the NetworkChuck videos is Ollama for
755
00:26:47,865 --> 00:26:50,765
running these locally. Your decision here is
756
00:26:51,144 --> 00:26:53,144
how geeky do you wanna be and
757
00:26:53,144 --> 00:26:54,204
what is your workflow?
758
00:26:54,679 --> 00:26:56,619
So if your primary workflow
759
00:26:57,079 --> 00:26:57,579
is
760
00:26:58,119 --> 00:26:59,259
you just want to
761
00:26:59,720 --> 00:27:02,119
chat with a chatbot, like, you wanna
762
00:27:02,119 --> 00:27:03,960
hop right in, you wanna download a model,
763
00:27:03,960 --> 00:27:05,400
and you wanna be able to chat right
764
00:27:05,400 --> 00:27:07,480
away in, like, a nice GUI and a
765
00:27:07,480 --> 00:27:08,539
graphical interface,
766
00:27:08,924 --> 00:27:11,245
LM Studio is great for that. There's like,
767
00:27:11,245 --> 00:27:12,285
if you go out and you look this
768
00:27:12,285 --> 00:27:14,045
stuff up and you hop on Reddit or
769
00:27:14,045 --> 00:27:15,025
things like that,
770
00:27:15,485 --> 00:27:17,485
there's going to be that set of
771
00:27:17,485 --> 00:27:19,825
folks out there who hate LM Studio
772
00:27:20,205 --> 00:27:22,369
because it's closed source,
773
00:27:22,670 --> 00:27:24,349
but, you know, I'm just looking to
774
00:27:24,349 --> 00:27:26,190
play with these things. So for what I
775
00:27:26,190 --> 00:27:27,809
wanna do, it certainly
776
00:27:28,509 --> 00:27:29,009
works
777
00:27:29,309 --> 00:27:31,390
great. Comes together, does what I need
778
00:27:31,390 --> 00:27:33,950
it to do. That said, you can also
779
00:27:33,950 --> 00:27:35,089
do Ollama,
780
00:27:35,695 --> 00:27:38,414
and Ollama is gonna be more command line
781
00:27:38,414 --> 00:27:40,734
driven, like you're gonna do more installations from
782
00:27:40,734 --> 00:27:42,335
the command line, you're even gonna download your
783
00:27:42,335 --> 00:27:44,355
models from the command line, so you're kinda
784
00:27:44,575 --> 00:27:46,975
trading off ease of use there. There's pros
785
00:27:46,975 --> 00:27:48,654
and cons to both depending on what you're
786
00:27:48,654 --> 00:27:50,639
doing. LM Studio is great if you just
787
00:27:50,639 --> 00:27:52,639
want to chat, you want to immediately have
788
00:27:52,639 --> 00:27:54,179
OpenAI-spec'd endpoints
789
00:27:54,880 --> 00:27:57,039
exposed maybe to things like VS Code locally,
790
00:27:57,039 --> 00:27:58,399
and you just don't want to wire anything
791
00:27:58,399 --> 00:27:59,919
up. You're looking for just like a one
792
00:27:59,919 --> 00:28:01,919
shot install, and you're going to be one
793
00:28:01,919 --> 00:28:03,244
and done. The other way you can do
794
00:28:03,244 --> 00:28:05,565
it is you can go to Ollama, and
795
00:28:05,565 --> 00:28:07,804
you can find your model that you wanna
796
00:28:07,804 --> 00:28:09,884
run on there. So, you know, I wanna
797
00:28:09,884 --> 00:28:12,524
run Llama 2 7B, and you'll go
798
00:28:12,524 --> 00:28:14,764
download that, and you're gonna do all this
799
00:28:14,764 --> 00:28:16,919
from the command line. Now you wanna chat
800
00:28:16,919 --> 00:28:17,740
with that thing.
801
00:28:18,200 --> 00:28:18,700
Well,
802
00:28:19,000 --> 00:28:20,919
you can certainly chat with it from the
803
00:28:20,919 --> 00:28:23,480
command line. That's totally a possibility. If
804
00:28:23,480 --> 00:28:25,659
that's your jam or your jelly, awesome.
805
00:28:25,720 --> 00:28:27,880
Go for it. But if you want to
806
00:28:27,880 --> 00:28:29,559
chat with it in a GUI, now you
807
00:28:29,559 --> 00:28:31,424
gotta go install something else. Like, you might
808
00:28:31,424 --> 00:28:32,644
have to go install
809
00:28:33,184 --> 00:28:34,244
Open WebUI
810
00:28:34,704 --> 00:28:36,865
to get that piece going and
811
00:28:36,865 --> 00:28:38,944
stand all that up. So it's not
812
00:28:38,944 --> 00:28:41,345
like it's hard to do. It's just
813
00:28:41,345 --> 00:28:43,345
your flavor and where you sit
814
00:28:43,345 --> 00:28:44,565
and where you wanna land.
815
00:28:44,950 --> 00:28:46,309
You know, if I'm looking to just do
816
00:28:46,309 --> 00:28:48,150
things quickly and, like, I'm just in there
817
00:28:48,150 --> 00:28:50,549
to maybe, like, oh, hey. I see Microsoft
818
00:28:50,549 --> 00:28:52,970
released a new model for Phi-4,
819
00:28:53,269 --> 00:28:55,590
and they, you know, just
820
00:28:55,590 --> 00:28:57,590
pushed new models for Phi-3 and Phi-4,
821
00:28:57,590 --> 00:28:59,690
and I I wanna compare those two things.
822
00:29:00,025 --> 00:29:02,265
I'll probably just spin those up in LM
823
00:29:02,265 --> 00:29:04,664
Studio. Super easy. Next, next, next my way
824
00:29:04,664 --> 00:29:06,025
through it. I don't have to remember a
825
00:29:06,025 --> 00:29:08,365
bunch of command line parameters, things like that.
826
00:29:08,424 --> 00:29:11,065
If I'm doing more like application development and
827
00:29:11,065 --> 00:29:13,059
I'm thinking about, like, hey. I want to
828
00:29:13,059 --> 00:29:14,340
stand this thing up. I wanna have it
829
00:29:14,340 --> 00:29:16,200
running in the background. I want some endpoints
830
00:29:16,259 --> 00:29:17,779
that are exposed. Maybe I can build, like,
831
00:29:17,779 --> 00:29:19,779
an app that's doing, like, some light RAG
832
00:29:19,779 --> 00:29:21,700
or some fine tuning on top of it,
833
00:29:21,700 --> 00:29:23,460
and I've got, like, a Python script over
834
00:29:23,460 --> 00:29:24,920
here that needs to talk to the model.
835
00:29:25,059 --> 00:29:28,440
Awesome. Great. Like, that's where Ollama sits,
836
00:29:28,795 --> 00:29:31,595
and it has its space ready to go
837
00:29:31,595 --> 00:29:34,154
for you. So much like picking a model
838
00:29:34,154 --> 00:29:36,075
size, you're just doing a pros and
839
00:29:36,075 --> 00:29:37,355
cons and a little bit of a trade
840
00:29:37,355 --> 00:29:39,595
off thing. So Ollama, if you want a
841
00:29:39,595 --> 00:29:42,394
simple command line experience and you're comfortable at
842
00:29:42,394 --> 00:29:45,609
the terminal, go for it. Windows, macOS, Linux,
843
00:29:45,609 --> 00:29:47,930
it's all there. LM Studio, if you're not
844
00:29:47,930 --> 00:29:51,049
opposed to closed source and you just want
845
00:29:51,049 --> 00:29:52,809
a GUI from the start for all the
846
00:29:52,809 --> 00:29:55,390
things, for downloading, for chatting, for,
847
00:29:56,009 --> 00:29:58,934
all that stuff. Again, macOS, Windows, Linux,
848
00:29:59,015 --> 00:30:01,335
ready to go. It's just closed source versus
849
00:30:01,335 --> 00:30:03,174
open source is really how I think about
850
00:30:03,174 --> 00:30:05,575
it. And then if you really do go
851
00:30:05,575 --> 00:30:08,055
down the Ollama path, you're probably gonna end
852
00:30:08,055 --> 00:30:09,735
up in a space where you wanna run
853
00:30:09,735 --> 00:30:12,089
a local chat UI, like a web based
854
00:30:12,490 --> 00:30:13,789
chatbot style thing,
855
00:30:14,169 --> 00:30:14,669
and
856
00:30:15,049 --> 00:30:17,609
then you'll just use something like Open Web
857
00:30:17,609 --> 00:30:19,849
UI for that. And, again, super easy to
858
00:30:19,849 --> 00:30:22,589
install. You're just basically hosting a little
859
00:30:22,970 --> 00:30:25,369
web server locally that knows how
860
00:30:25,369 --> 00:30:25,869
to
861
00:30:26,394 --> 00:30:29,454
chat with that model. And then
862
00:30:29,674 --> 00:30:31,035
it could be a little bit different depending
863
00:30:31,035 --> 00:30:32,715
on like the extension tooling that you're going
864
00:30:32,715 --> 00:30:34,075
to use from there. So I talked about
865
00:30:34,075 --> 00:30:36,234
maybe like integrating VS Code with one of
866
00:30:36,234 --> 00:30:36,974
these locally.
867
00:30:37,434 --> 00:30:38,974
So if you're doing
868
00:30:39,490 --> 00:30:41,730
VS Code, you're gonna typically go grab an
869
00:30:41,730 --> 00:30:43,909
extension. So there's things like CodeGPT,
870
00:30:44,369 --> 00:30:45,909
there's Continue (continue.dev),
871
00:30:46,210 --> 00:30:48,929
there's an Ollama extension, which can actually just
872
00:30:48,929 --> 00:30:50,869
talk natively to your Ollama endpoint.
873
00:30:51,329 --> 00:30:54,309
Or like I said, LM Studio exposes OpenAI
874
00:30:55,005 --> 00:30:55,505
compatible
875
00:30:56,045 --> 00:30:58,224
endpoints. So that's kind of a known, like,
876
00:30:58,845 --> 00:31:00,605
you know, web interface that you can throw
877
00:31:00,605 --> 00:31:02,464
a request at in a structured way,
878
00:31:02,765 --> 00:31:04,765
and it will respond in a
879
00:31:04,765 --> 00:31:06,464
way that most of the extensions
880
00:31:07,085 --> 00:31:09,265
are going to understand
881
00:31:09,644 --> 00:31:11,140
and get you ramped up for and ready
882
00:31:11,140 --> 00:31:13,240
to go with. Yeah, looking through this and
883
00:31:13,539 --> 00:31:14,900
most of the videos I saw, and again,
884
00:31:14,900 --> 00:31:17,539
were all Ollama, even the command line based
885
00:31:17,539 --> 00:31:18,839
looked really
886
00:31:19,299 --> 00:31:22,200
simple, lots of guides to just walk through,
887
00:31:22,579 --> 00:31:24,524
type this in, this is how you tie
888
00:31:24,524 --> 00:31:25,884
that in, this is how you go stand
889
00:31:25,884 --> 00:31:26,704
up the WebUI,
890
00:31:27,164 --> 00:31:29,825
point WebUI to all of those.
891
00:31:30,204 --> 00:31:31,424
So none of this
892
00:31:31,964 --> 00:31:34,444
really seemed that complicated in everything I watched
893
00:31:34,444 --> 00:31:36,605
and, again, made me excited, like, I need
894
00:31:36,605 --> 00:31:38,980
to go try this out and go find
895
00:31:38,980 --> 00:31:40,279
a computer that I can
896
00:31:40,660 --> 00:31:43,220
absolutely bury with a model. See what I
897
00:31:43,220 --> 00:31:44,660
can do. See what damage I can do
898
00:31:44,660 --> 00:31:46,900
to my computer, Scott. It is not hard
899
00:31:46,900 --> 00:31:48,420
to do. So the other thing that you
900
00:31:48,420 --> 00:31:50,180
can do, if you're comfortable on the command
901
00:31:50,180 --> 00:31:52,954
line, there's another project out there that's called
902
00:31:52,954 --> 00:31:53,454
Fabric.
903
00:31:53,835 --> 00:31:55,775
So Fabric is kind of a
904
00:31:56,794 --> 00:31:59,454
it allows you to easily network and
905
00:31:59,835 --> 00:32:02,075
distribute traffic across multiple nodes, but you can
906
00:32:02,075 --> 00:32:03,539
also do it on a single node. So
907
00:32:03,619 --> 00:32:05,299
I was talking earlier about, like, that,
908
00:32:05,299 --> 00:32:07,940
you know, sermon summarization thing. Yep. And that's
909
00:32:07,940 --> 00:32:10,259
all based on Fabric. So Fabric, again, command
910
00:32:10,259 --> 00:32:12,660
line, it can run with local LLMs. It's
911
00:32:12,660 --> 00:32:13,720
a little kinda
912
00:32:14,100 --> 00:32:15,859
opaque for how it does it. So,
913
00:32:15,859 --> 00:32:17,299
you know, make sure you download one of
914
00:32:17,299 --> 00:32:19,825
the newer versions of it. And Fabric
915
00:32:19,825 --> 00:32:21,424
is all run from the command line as
916
00:32:21,424 --> 00:32:24,065
well. But then you can super easily integrate
917
00:32:24,065 --> 00:32:27,105
Fabric into things like bash scripts. So, like,
918
00:32:27,105 --> 00:32:29,505
I use it for the same thing. Like,
919
00:32:29,505 --> 00:32:31,264
if I think about the podcast, I
920
00:32:31,264 --> 00:32:33,450
just have a bash script that runs Whisper
921
00:32:33,450 --> 00:32:35,230
locally. So Whisper is
922
00:32:36,089 --> 00:32:38,349
a speech-to-text model. Yep. It's from OpenAI,
923
00:32:38,410 --> 00:32:39,929
and I can run that locally. Like, that
924
00:32:39,929 --> 00:32:41,929
runs on my hardware just fine. So I've
925
00:32:41,929 --> 00:32:43,529
just got a little bash script that takes
926
00:32:43,529 --> 00:32:45,884
that, generates the transcript, and then I just
927
00:32:45,964 --> 00:32:48,765
pipe the summaries out into Fabric to have
928
00:32:48,765 --> 00:32:49,585
those for myself
929
00:32:49,964 --> 00:32:51,644
in just my notes on the side. Right?
930
00:32:51,644 --> 00:32:53,164
Like, hey, here's the things we talked about
931
00:32:53,164 --> 00:32:55,825
and how they're coming together. So
932
00:32:56,285 --> 00:32:58,684
very, very, very easy to get on with
933
00:32:58,684 --> 00:33:00,039
this stuff. And I think for most of
934
00:33:00,039 --> 00:33:01,559
our audience as well, like you folks are
935
00:33:01,559 --> 00:33:03,160
all comfortable on the command line. You don't
936
00:33:03,160 --> 00:33:04,680
need a GUI for this stuff. You can
937
00:33:04,680 --> 00:33:05,740
follow some instructions
938
00:33:06,119 --> 00:33:08,039
and wire these up. And we're not talking
939
00:33:08,039 --> 00:33:10,840
like super complicated things. We're basically talking the
940
00:33:10,840 --> 00:33:13,274
equivalent of, like, a brew or a Chocolatey
941
00:33:13,274 --> 00:33:15,674
install or a Winget install, like just little
942
00:33:15,674 --> 00:33:17,194
one liners to get all this stuff up
943
00:33:17,194 --> 00:33:18,875
and running. Absolutely. You don't need to go
944
00:33:18,875 --> 00:33:21,115
write 50 line PowerShell scripts or pipe a
945
00:33:21,115 --> 00:33:24,174
bunch of things. It's really straightforward from everything
946
00:33:24,394 --> 00:33:26,474
I saw. Super easy to get up and
947
00:33:26,474 --> 00:33:28,559
going with that. I would say, like, the
948
00:33:28,559 --> 00:33:30,160
other thing you might wanna do a little
949
00:33:30,160 --> 00:33:30,980
bit is
950
00:33:31,440 --> 00:33:32,980
when you're exploring models.
951
00:33:33,440 --> 00:33:35,759
So if you go into, like, LM Studio
952
00:33:35,759 --> 00:33:37,920
and you're going through their model catalog or
953
00:33:37,920 --> 00:33:40,799
you're on Ollama and you're exploring their model
954
00:33:40,799 --> 00:33:42,755
catalog, you might wanna just start with, like,
955
00:33:42,755 --> 00:33:45,474
some of the more popular ones to get
956
00:33:45,474 --> 00:33:47,954
up and running. So, you know, there
957
00:33:47,954 --> 00:33:49,974
are differences between these things,
958
00:33:50,355 --> 00:33:52,115
you know, depending on what you're doing. Like,
959
00:33:52,115 --> 00:33:53,575
you can't go ask DeepSeek
960
00:33:54,119 --> 00:33:56,920
what happened in Tiananmen Square. Like, that is
961
00:33:56,920 --> 00:33:58,380
not programmed into that model,
962
00:33:58,839 --> 00:34:00,359
even in the one that you
963
00:34:00,359 --> 00:34:01,500
download and
964
00:34:01,799 --> 00:34:03,640
you run locally, but, you know, you can
965
00:34:03,640 --> 00:34:06,279
do that with other stuff. So these models
966
00:34:06,279 --> 00:34:07,720
all vary. The other thing that you can
967
00:34:07,720 --> 00:34:09,000
do is you can go through the model
968
00:34:09,000 --> 00:34:09,500
catalogs,
969
00:34:09,855 --> 00:34:12,414
and you can find models that are purpose
970
00:34:12,414 --> 00:34:13,875
built for certain things.
971
00:34:14,335 --> 00:34:16,974
So there are models that are generated within
972
00:34:16,974 --> 00:34:19,215
these families. So you talk about, like, Llama.
973
00:34:19,215 --> 00:34:21,215
There's gonna be versions of the Llama model
974
00:34:21,215 --> 00:34:23,775
that are better for doing coding assistance things
975
00:34:23,775 --> 00:34:26,039
with it than there are for doing just
976
00:34:26,039 --> 00:34:28,460
straight one shot text summarization,
977
00:34:29,320 --> 00:34:30,619
stuff like that. So
978
00:34:30,920 --> 00:34:32,280
you have to think through that a
979
00:34:32,280 --> 00:34:35,019
little bit too, like, just what's your workflow
980
00:34:35,480 --> 00:34:35,980
and
981
00:34:36,360 --> 00:34:38,300
what are you trying to
982
00:34:38,840 --> 00:34:39,579
get at
983
00:34:40,074 --> 00:34:41,054
along the way?
984
00:34:41,355 --> 00:34:42,954
And then be prepared for a little bit
985
00:34:42,954 --> 00:34:44,094
of latency
986
00:34:44,394 --> 00:34:46,315
and maybe differences in perf when you're running
987
00:34:46,315 --> 00:34:48,315
with these things. I think lots of people
988
00:34:48,315 --> 00:34:49,675
set out and they say, oh, I'm gonna
989
00:34:49,675 --> 00:34:50,875
be able to run that model locally, and
990
00:34:50,875 --> 00:34:52,315
it's gonna be so much faster because it
991
00:34:52,315 --> 00:34:53,594
doesn't need to go out and talk to
992
00:34:53,594 --> 00:34:55,430
the Internet. Like, it doesn't need to talk
993
00:34:55,430 --> 00:34:57,430
to Claude. It doesn't need to
994
00:34:57,430 --> 00:35:00,390
talk to ChatGPT, anything like that. Yeah.
995
00:35:00,390 --> 00:35:03,110
Like, absolutely. You've eliminated the latency of that
996
00:35:03,110 --> 00:35:05,269
whole, like, request response thing having to traverse
997
00:35:05,269 --> 00:35:05,930
the Internet,
998
00:35:06,309 --> 00:35:08,309
but you still have to have the hardware
999
00:35:08,309 --> 00:35:10,150
that's capable of running this and standing it
1000
00:35:10,150 --> 00:35:12,224
all up. So you might wanna even, like,
1001
00:35:12,224 --> 00:35:13,744
play around before you integrate these things. Like,
1002
00:35:13,744 --> 00:35:15,764
if you're interested in, like, a coding workflow
1003
00:35:16,144 --> 00:35:18,304
with or integrating with VS Code, things like
1004
00:35:18,304 --> 00:35:20,065
that, you'll probably wanna play around with the
1005
00:35:20,065 --> 00:35:21,744
the models a little bit locally to find
1006
00:35:21,744 --> 00:35:23,424
the one that's got the sweet spot
1007
00:35:23,424 --> 00:35:24,724
for you based on
1008
00:35:25,089 --> 00:35:27,409
number of parameters, your hardware, things like that
1009
00:35:27,409 --> 00:35:29,089
before you go down the path of integrating
1010
00:35:29,089 --> 00:35:30,869
it in VS Code and then being disappointed
1011
00:35:30,929 --> 00:35:32,929
that it's too slow or things like
1012
00:35:32,929 --> 00:35:34,690
that. There's a lot of blogs out there
1013
00:35:34,690 --> 00:35:37,010
that'll just tell you, like, oh, running AI
1014
00:35:37,010 --> 00:35:39,889
locally, like, it's super fast. It's super
1015
00:35:39,889 --> 00:35:41,775
easy. It is super easy. It's not always
1016
00:35:41,775 --> 00:35:43,295
super fast. So you do have
1017
00:35:43,295 --> 00:35:44,815
to be prepared for that depending on your
1018
00:35:44,815 --> 00:35:47,135
hardware. Yeah. Along with the model, Scott, this
1019
00:35:47,135 --> 00:35:49,214
is another thing. Again, being fairly new to
1020
00:35:49,214 --> 00:35:50,355
this, have you
1021
00:35:50,734 --> 00:35:52,974
compared at all? Because another thing you can
1022
00:35:52,974 --> 00:35:55,139
run into is quantization of these models. Right?
1023
00:35:55,139 --> 00:35:57,299
And this is something else NetworkChuck talked
1024
00:35:57,299 --> 00:35:59,139
about in one of his videos, where some of
1025
00:35:59,139 --> 00:36:01,799
these larger models, they quantize.
1026
00:36:02,260 --> 00:36:03,699
I don't know if that's the word. They
1027
00:36:03,699 --> 00:36:06,579
quantize them down, and it sounds like it's
1028
00:36:06,579 --> 00:36:07,719
essentially taking
1029
00:36:08,184 --> 00:36:10,444
different aspects of the model. And inside
1030
00:36:10,904 --> 00:36:14,025
models, they have model weights with, like, 32
1031
00:36:14,025 --> 00:36:16,424
bit precision, and they reduce these down to
1032
00:36:16,424 --> 00:36:18,444
eight-bit or four-bit precision,
1033
00:36:18,904 --> 00:36:20,984
which makes them not as accurate but makes
1034
00:36:20,984 --> 00:36:23,644
them smaller so you can run a
1035
00:36:24,239 --> 00:36:26,320
larger model. Some of those bigger ones we
1036
00:36:26,320 --> 00:36:28,179
talked about like 65,000,000,000
1037
00:36:28,639 --> 00:36:29,460
plus parameters
1038
00:36:30,159 --> 00:36:31,300
on less hardware,
1039
00:36:31,920 --> 00:36:33,380
but with less
1040
00:36:34,239 --> 00:36:35,219
accuracy,
1041
00:36:35,679 --> 00:36:38,420
versus running maybe a model with fewer parameters,
1042
00:36:38,719 --> 00:36:40,474
but you get
1043
00:36:40,855 --> 00:36:42,695
the full model weights in there where you're
1044
00:36:42,695 --> 00:36:45,195
running the 32-bit precision instead of quantizing
1045
00:36:45,255 --> 00:36:47,894
them down. Again, when you're downloading models, definitely
1046
00:36:47,894 --> 00:36:50,215
something to watch out for because if these
1047
00:36:50,215 --> 00:36:51,434
are quantized
1048
00:36:52,630 --> 00:36:54,389
and smaller, they can be less
1049
00:36:54,389 --> 00:36:56,389
accurate, but you can run them. Like, have you
1050
00:36:56,389 --> 00:36:58,789
ever compared those of let's go run a
1051
00:36:58,789 --> 00:37:03,050
30,000,000,000 parameter model on local hardware versus a
1052
00:37:03,829 --> 00:37:05,050
65,000,000,000
1053
00:37:05,109 --> 00:37:08,974
parameter model that's quantized down to
1054
00:37:08,974 --> 00:37:10,735
eight bit instead of 32 bit? I don't
1055
00:37:10,735 --> 00:37:13,295
think many folks are running 32 bit. Most
1056
00:37:13,295 --> 00:37:14,595
are probably running
1057
00:37:15,135 --> 00:37:17,855
Four bit. Some kind of like well, something
1058
00:37:17,855 --> 00:37:21,329
like 16 or lower, so like four, eight,
1059
00:37:21,730 --> 00:37:22,849
16. I think when you go out and,
1060
00:37:22,849 --> 00:37:24,710
like, you watch a lot of YouTube videos
1061
00:37:24,849 --> 00:37:25,590
and,
1062
00:37:25,969 --> 00:37:27,250
you know, if you do go down
1063
00:37:27,250 --> 00:37:28,369
this path and you start getting into it,
1064
00:37:28,369 --> 00:37:29,890
I think YouTube is a great place to
1065
00:37:29,890 --> 00:37:31,730
go to and start to see. You'll see
1066
00:37:31,730 --> 00:37:34,469
lots of people playing around with massive models,
1067
00:37:35,025 --> 00:37:36,484
but with a
1068
00:37:37,025 --> 00:37:39,105
like, only, like, four bits. Right. So they're
1069
00:37:39,105 --> 00:37:40,864
doing that just so they can run it,
1070
00:37:40,864 --> 00:37:42,704
not so they can run it effectively to
1071
00:37:42,704 --> 00:37:44,704
drive a workflow. Like, they're just trying to
1072
00:37:44,704 --> 00:37:46,304
try it out and see how many tokens
1073
00:37:46,304 --> 00:37:47,744
a second they can get out of it
1074
00:37:47,744 --> 00:37:49,599
or something like that. So a four bit
1075
00:37:49,599 --> 00:37:50,820
model is
1076
00:37:51,440 --> 00:37:52,980
absolutely going to
1077
00:37:53,280 --> 00:37:56,019
run on, like, consumer grade GPUs, CPUs.
1078
00:37:56,800 --> 00:37:58,320
Like, you're gonna be all good, ready to
1079
00:37:58,320 --> 00:38:00,260
go there, but you have to know that
1080
00:38:00,320 --> 00:38:03,295
it's been extremely compressed. So it can get
1081
00:38:03,295 --> 00:38:05,394
it down to a smaller download size,
1082
00:38:05,695 --> 00:38:08,574
and thus, it's going to take less memory
1083
00:38:08,574 --> 00:38:10,974
and less processing power to go ahead and
1084
00:38:10,974 --> 00:38:13,235
run it. So you might be running like,
1085
00:38:13,295 --> 00:38:14,894
you know, like if I think about, like,
1086
00:38:14,894 --> 00:38:17,074
the transformer that's running in iOS,
1087
00:38:17,489 --> 00:38:18,389
that's probably
1088
00:38:19,090 --> 00:38:20,769
a four-bit model. Right? Like,
1089
00:38:20,769 --> 00:38:22,849
it's sitting there. It's running on commodity hardware
1090
00:38:22,929 --> 00:38:24,690
Right. And it's just doing what it needs
1091
00:38:24,690 --> 00:38:27,269
to do. Now, if I'm on my desktop
1092
00:38:27,409 --> 00:38:29,110
or my M1
1093
00:38:29,565 --> 00:38:31,585
MacBook, you know, I might be thinking about
1094
00:38:31,964 --> 00:38:33,964
an eight bit model, and I'm okay with
1095
00:38:33,964 --> 00:38:36,045
the performance trade-off. Like, I'm okay
1096
00:38:36,045 --> 00:38:37,804
if it chats with me at, like, you
1097
00:38:37,804 --> 00:38:40,364
know, like, two tokens a second kinda thing.
1098
00:38:40,364 --> 00:38:42,659
Like, it can be super slow. It's
1099
00:38:42,659 --> 00:38:44,500
okay. But you're not gonna run these, like,
1100
00:38:44,500 --> 00:38:48,039
massive models because those are absolutely running in
1101
00:38:48,099 --> 00:38:48,599
those
1102
00:38:49,059 --> 00:38:51,619
massive data centers and that set of
1103
00:38:51,619 --> 00:38:54,099
infrastructure. Like, I just wanna be clear.
1104
00:38:54,099 --> 00:38:56,339
Like, you can't do the things that, like,
1105
00:38:56,339 --> 00:38:56,839
ChatGPT
1106
00:38:57,139 --> 00:38:58,875
can do with, like, o1 running in
1107
00:38:58,875 --> 00:38:59,695
their data center
1108
00:38:59,994 --> 00:39:01,755
locally at your house. Like, that's just not
1109
00:39:01,755 --> 00:39:03,434
the way these things work. It's not
1110
00:39:03,434 --> 00:39:05,114
how they come together. So if you think
1111
00:39:05,114 --> 00:39:07,675
about, like, the whole quantization thing, it's
1112
00:39:07,675 --> 00:39:08,414
all about
1113
00:39:08,715 --> 00:39:11,820
packing things down and basically, like, archiving them,
1114
00:39:11,820 --> 00:39:13,420
right? Put a tar or zip together of
1115
00:39:13,420 --> 00:39:13,920
this
1116
00:39:14,220 --> 00:39:17,039
thing and reduce the size, reduce the computational
1117
00:39:17,340 --> 00:39:17,840
requirements,
1118
00:39:18,780 --> 00:39:21,019
all that kind of stuff. So you're going
1119
00:39:21,019 --> 00:39:23,180
to get small models. Hey, that's great. They're
1120
00:39:23,180 --> 00:39:24,480
going to use less memory,
1121
00:39:24,974 --> 00:39:26,414
and you might be able to run a
1122
00:39:26,414 --> 00:39:28,655
larger model. Like, you could run a four
1123
00:39:28,655 --> 00:39:31,934
bit, you know, 30,000,000,000 parameter model, but it's
1124
00:39:31,934 --> 00:39:34,575
gonna be less accurate. And is accuracy important
1125
00:39:34,575 --> 00:39:36,255
to you? Well, you might wanna go to,
1126
00:39:36,255 --> 00:39:39,075
like, an eight bit like, 7,000,000,000 parameter model,
1127
00:39:39,690 --> 00:39:41,369
something like that. So it's gonna be
1128
00:39:41,369 --> 00:39:43,150
very dependent on, like, your workflow
1129
00:39:43,609 --> 00:39:44,109
and
1130
00:39:44,570 --> 00:39:45,070
your
1131
00:39:45,449 --> 00:39:47,150
use case for these things.
1132
00:39:47,449 --> 00:39:49,130
I think the biggest thing you miss out
1133
00:39:49,130 --> 00:39:50,510
on is accuracy.
1134
00:39:51,210 --> 00:39:53,164
So, you know, like, if I'm summarizing
1135
00:39:53,625 --> 00:39:55,864
the podcast transcripts, I want those to be
1136
00:39:55,864 --> 00:39:57,465
kind of accurate. Like, I don't want
1137
00:39:57,465 --> 00:39:59,305
them to just be hallucinating all over the
1138
00:39:59,305 --> 00:39:59,805
place.
1139
00:40:00,344 --> 00:40:00,844
But,
1140
00:40:01,305 --> 00:40:04,344
you know, if I'm doing something else, like,
1141
00:40:04,344 --> 00:40:06,744
hey. Help me write a poem about, you
1142
00:40:06,744 --> 00:40:09,309
know, iPads. Like, whatever. Do it with all
1143
00:40:09,309 --> 00:40:10,210
the least accuracy
1144
00:40:10,829 --> 00:40:12,289
that you want out there
1145
00:40:12,750 --> 00:40:14,829
along the way. I think the most common
1146
00:40:14,829 --> 00:40:16,269
thing, like so the other thing you run
1147
00:40:16,269 --> 00:40:17,170
into with quantization
1148
00:40:17,710 --> 00:40:18,210
is
1149
00:40:18,589 --> 00:40:20,510
there's a bunch of different methods for
1150
00:40:20,510 --> 00:40:21,764
this. So
1151
00:40:22,144 --> 00:40:22,644
there's
1152
00:40:23,025 --> 00:40:23,525
Q4,
1153
00:40:23,985 --> 00:40:26,324
which is basically like four bit quantization.
1154
00:40:27,105 --> 00:40:28,885
There's another format called
1155
00:40:29,344 --> 00:40:32,304
GGUF. So that's kind of,
1156
00:40:32,304 --> 00:40:34,724
like, the standard for running these things
1157
00:40:35,099 --> 00:40:36,539
efficiently. So you'll see a lot of these
1158
00:40:36,539 --> 00:40:38,140
things when you go in like, what's the
1159
00:40:38,140 --> 00:40:40,380
format of the model? Oh, it's a
1160
00:40:40,380 --> 00:40:41,660
GGUF. I don't even know how
1161
00:40:41,660 --> 00:40:42,239
it's pronounced.
1162
00:40:42,539 --> 00:40:44,300
But, you know, you can go in and
1163
00:40:44,300 --> 00:40:45,440
grab those things
1164
00:40:46,140 --> 00:40:48,174
and figure those out. So you can
1165
00:40:48,174 --> 00:40:50,434
think of, like, quantization maybe as, like, another
1166
00:40:50,974 --> 00:40:52,815
weight that you can put on that scale
1167
00:40:52,815 --> 00:40:54,914
when you're trying to find that balance between
1168
00:40:55,534 --> 00:40:58,114
model size, parameter count, quantization,
1169
00:40:58,574 --> 00:41:00,809
and the hardware that you run and the
1170
00:41:00,809 --> 00:41:02,489
workload that you wanna do. So how does
1171
00:41:02,489 --> 00:41:04,090
that scale tip and where do you wanna
1172
00:41:04,090 --> 00:41:06,650
land? It just becomes another consideration in
1173
00:41:06,650 --> 00:41:09,610
there for you. Sounds good. Anything else before
1174
00:41:09,610 --> 00:41:12,090
wrapping this episode up? So a couple things.
1175
00:41:12,090 --> 00:41:13,930
If folks haven't done this yet, like, go
1176
00:41:13,930 --> 00:41:16,074
do it. You should totally go out and
1177
00:41:16,074 --> 00:41:18,315
just try and play around with Ollama or LM
1178
00:41:18,315 --> 00:41:18,815
Studio.
1179
00:41:19,114 --> 00:41:20,795
If you're already doing it today, come back
1180
00:41:20,795 --> 00:41:22,315
and give us some feedback. Let us
1181
00:41:22,315 --> 00:41:24,635
know what you're using it for. I think
1182
00:41:24,635 --> 00:41:27,594
there's all sorts of interesting use cases for
1183
00:41:27,594 --> 00:41:29,880
this stuff. We're just getting Ben started on
1184
00:41:29,880 --> 00:41:31,800
his list. Let's make his list a lot
1185
00:41:31,800 --> 00:41:34,519
longer for things that he is missing out
1186
00:41:34,519 --> 00:41:36,679
in his life. Home Assistant and AI. He
1187
00:41:36,679 --> 00:41:38,539
needs to run a
1188
00:41:38,920 --> 00:41:40,839
chat model locally. And then if you're doing
1189
00:41:40,839 --> 00:41:42,519
other things besides chat models, like I said,
1190
00:41:42,519 --> 00:41:45,265
there's the Stable Diffusions of the world, there's
1191
00:41:45,265 --> 00:41:48,065
image generation, there's Whisper, there's all these other
1192
00:41:48,065 --> 00:41:50,625
things out there. I was very surprised at
1193
00:41:50,625 --> 00:41:52,704
how approachable they are. I always thought this
1194
00:41:52,704 --> 00:41:54,885
was going to be like mystical dark arts
1195
00:41:55,025 --> 00:41:57,184
and magic and not for mere mortals kind
1196
00:41:57,184 --> 00:41:59,400
of thing. It's very much for mere mortals,
1197
00:41:59,480 --> 00:42:01,799
Like, super easy to get started with, super
1198
00:42:01,799 --> 00:42:02,299
turnkey,
1199
00:42:02,679 --> 00:42:04,839
and I would guarantee that almost anybody who
1200
00:42:04,839 --> 00:42:06,920
listens to this podcast probably has the hardware
1201
00:42:06,920 --> 00:42:08,359
to run this stuff and make it happen.
1202
00:42:08,359 --> 00:42:10,940
I'm actually excited to go try this out
1203
00:42:11,000 --> 00:42:13,005
and play around with it. I did find
1204
00:42:13,005 --> 00:42:15,085
an article too on a Raspberry Pi cluster
1205
00:42:15,085 --> 00:42:16,364
for AI. I don't know if I'm gonna
1206
00:42:16,364 --> 00:42:18,204
try that or use an extra Mac mini
1207
00:42:18,204 --> 00:42:20,385
I have sitting around here to start, but
1208
00:42:20,444 --> 00:42:22,045
I, like you, would love to hear
1209
00:42:22,045 --> 00:42:23,404
what other people are doing. If you are
1210
00:42:23,404 --> 00:42:25,404
running them locally, what are you using them
1211
00:42:25,404 --> 00:42:26,144
for locally,
1212
00:42:26,764 --> 00:42:27,824
different use cases,
1213
00:42:28,364 --> 00:42:29,699
where have you found a good place to
1214
00:42:29,699 --> 00:42:31,000
start, all the things.
1215
00:42:31,300 --> 00:42:33,380
So if you do want to join us
1216
00:42:33,380 --> 00:42:34,679
and discuss these things,
1217
00:42:34,980 --> 00:42:37,059
we need to redo our outro, Scott, because
1218
00:42:37,059 --> 00:42:38,659
I think that has changed. I think we
1219
00:42:38,659 --> 00:42:40,659
actually still have Twitter in it. Let's not
1220
00:42:40,659 --> 00:42:43,414
say Twitter. Let's say probably Bluesky. Are
1221
00:42:43,414 --> 00:42:44,855
you more active on Bluesky right now
1222
00:42:44,855 --> 00:42:46,375
than any other one? Pick one anyway. Anyone
1223
00:42:46,375 --> 00:42:48,054
that's not Twitter, you can find Scott on,
1224
00:42:48,054 --> 00:42:49,494
except that I can never find you on
1225
00:42:49,494 --> 00:42:51,655
Bluesky because you chose a weird handle
1226
00:42:51,655 --> 00:42:53,655
that isn't the same as any of your
1227
00:42:53,655 --> 00:42:55,894
other social media. You need to go grab
1228
00:42:55,894 --> 00:42:57,494
a new handle on Bluesky that matches
1229
00:42:57,494 --> 00:42:59,789
everything else. I would say Bluesky is
1230
00:42:59,789 --> 00:43:01,650
probably where I'm the most active
1231
00:43:02,269 --> 00:43:04,190
as of late and where I feel like
1232
00:43:04,190 --> 00:43:04,849
the biggest
1233
00:43:05,309 --> 00:43:06,849
tech community has
1234
00:43:07,309 --> 00:43:09,630
moved to. So go chat with us on
1235
00:43:09,630 --> 00:43:11,789
Bluesky. LinkedIn is another good one. I'm
1236
00:43:11,789 --> 00:43:12,769
always on LinkedIn.
1237
00:43:13,085 --> 00:43:15,005
So if you wanna chat, give us feedback
1238
00:43:15,005 --> 00:43:15,744
on LinkedIn,
1239
00:43:16,045 --> 00:43:17,484
you can do that. If you wanna sign
1240
00:43:17,484 --> 00:43:19,085
up for membership, we still have our membership
1241
00:43:19,085 --> 00:43:21,744
at msclouditpro.com/membership.
1242
00:43:22,204 --> 00:43:23,264
Todd's in
1243
00:43:23,565 --> 00:43:25,644
Discord today. He got a new laptop that
1244
00:43:25,644 --> 00:43:27,424
he's gonna go try to run some LLMs
1245
00:43:27,484 --> 00:43:29,460
on. So if you wanna join us, chat
1246
00:43:29,460 --> 00:43:31,380
with us during the recording. You can go
1247
00:43:31,380 --> 00:43:32,359
check out our membership
1248
00:43:32,659 --> 00:43:33,159
options
1249
00:43:33,940 --> 00:43:36,179
there as well and join us in Discord
1250
00:43:36,179 --> 00:43:37,480
for these. So
1251
00:43:37,940 --> 00:43:40,260
looking forward to hearing from people how you
1252
00:43:40,260 --> 00:43:42,359
use LLMs, what you're gonna do with LLMs,
1253
00:43:42,885 --> 00:43:45,844
and how they run locally. Who can bury
1254
00:43:45,844 --> 00:43:47,784
their computer first and
1255
00:43:48,405 --> 00:43:50,744
crash it? Super easy to do. Yeah.
1256
00:43:51,204 --> 00:43:53,364
Anything else? I think that's it. As always,
1257
00:43:53,364 --> 00:43:55,710
thanks, Ben. Alright. Thank you, Scott. We will
1258
00:43:55,710 --> 00:43:56,690
talk to you later.
1259
00:43:58,670 --> 00:44:00,909
If you enjoyed the podcast, go leave us
1260
00:44:00,909 --> 00:44:03,150
a five star rating in iTunes. It helps
1261
00:44:03,150 --> 00:44:04,829
to get the word out so more IT
1262
00:44:04,829 --> 00:44:06,989
pros can learn about Office 365
1263
00:44:06,989 --> 00:44:07,650
and Azure.
1264
00:44:08,190 --> 00:44:09,855
If you have any questions you want us
1265
00:44:09,855 --> 00:44:12,014
to address on the show, or feedback about
1266
00:44:12,014 --> 00:44:14,414
the show, feel free to reach out via
1267
00:44:14,414 --> 00:44:16,514
our website, Twitter, or Facebook.
1268
00:44:16,815 --> 00:44:18,735
Thanks again for listening, and have a great
1269
00:44:18,735 --> 00:44:19,235
day.