Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1 - which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to appear. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

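Something like this, assuming Ollama's hf.co model syntax - the Q8_0 quantization tag is a guess that lines up with the 8.5GB download mentioned below:

```bash
# Fetch the Unsloth GGUF build from Hugging Face and drop into a chat session.
# The Q8_0 quantization tag is an assumption - it matches the ~8.5GB size.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
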
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

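A couple of commands for digging into that log database afterwards - a minimal sketch using LLM's built-in logs commands:

```bash
# Print the location of the SQLite database where responses are logged
llm logs path
# Show the three most recent logged prompts and responses
llm logs -n 3
```
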
I installed the plugin and ran it like this:

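Presumably along these lines - the model identifier here is assumed to mirror the Ollama tag above:

```bash
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama
# Run a prompt through the model Ollama just fetched (identifier assumed)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
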
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic situations.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:

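Again the quantization tag is an assumption - Q3_K_M is the level that comes in at roughly 34GB for a 70B model:

```bash
# Fetch and run the 70B distill; the Q3_K_M tag is a guess that
# matches the ~34GB download size mentioned above.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
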
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

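Run with something like this - the model reference assumes the 70B distill pulled above:

```bash
# Prompt the local model for SVG output (model identifier assumed)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
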
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

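A sketch of that configuration, assuming LLM's standard keys for extra OpenAI-style models:

```yaml
# Registers DeepSeek's OpenAI-compatible endpoint with LLM.
# api_key_name points at the key stored via "llm keys set deepseek".
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
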
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

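Spelled out:

```bash
# Store the API key under the name referenced by api_key_name above
llm keys set deepseek
# Run a prompt against DeepSeek's hosted reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
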
This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.

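In the meantime you can see them by calling the API directly - a minimal sketch, assuming DeepSeek's OpenAI-compatible chat completions endpoint and its documented reasoning_content field:

```bash
# Ask the API for a completion; the response JSON includes both
# "content" (the final answer) and "reasoning_content" (the thinking tokens).
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
  }'
```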