DeepSeek's v3 model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

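Ollama can pull GGUF files directly from Hugging Face, so the fetch-and-run step looks something like this, assuming Unsloth's repository and a Q8_0 quantization (which matches the roughly 8.5GB download mentioned below):

```bash
# Fetch the 8B distill from Hugging Face and open an interactive chat
# (the quantization tag is an assumption - pick whichever quant suits your RAM)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```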
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

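A sketch of how to get at those logs afterwards, using LLM's built-in logging commands:

```bash
# Print the location of the SQLite database that stores every prompt/response
llm logs path

# Show the most recently logged response
llm logs -n 1
```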
I installed the plugin and ran it like this:

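Roughly the following, with the model ID assumed to match the name Ollama registered for the download above:

```bash
llm install llm-ollama

# Model ID assumed to match the Ollama download above
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```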
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up line introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it funny while fitting the theme.

+
So yeah, the joke is [dreadful](https://parkerandmcdaniel.com). But the [process](http://183.238.195.7710081) of [arriving](https://me.eng.kmitl.ac.th) was such a [fascinating insight](https://getpowdercoated.com) into how these [brand-new models](https://onezenplace.com) work.
+
This was a fairly little 8B model. I'm [anticipating attempting](https://vesinhnhaxuongbinhduong.com) out the Llama 70B variation, which isn't yet available in a GGUF I can run with Ollama. Given the [strength](https://wolfinspectionsllc.com) of Llama 3.3 70B-currently my favourite GPT-4 class design that I have actually [operated](https://kombiflex.com) on my own [machine-I](https://gorod-lugansk.com) have high [expectations](https://gildasmorvan.niji.fr).
+
Update 21st January 2025: I got this [quantized variation](https://cuuhoxe247.com) of that Llama 3.3 70B R1 [distilled model](http://bluo.net) working like this-a 34GB download:
+
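Again via Ollama's Hugging Face support; the Q3_K_M quantization tag here is an assumption, chosen because it lands close to that 34GB figure:

```bash
# ~34GB download of the 70B distill (quantization tag assumed)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```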
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

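The prompt runs the same way as the joke did; a sketch, with the model ID again an assumption:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```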
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:

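The entry looks something like this, following the model_id/api_base conventions LLM uses for registering OpenAI-compatible APIs (deepseek-reasoner is DeepSeek's model ID for R1):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```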
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

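As a shell session:

```bash
# Store the API key under the name the YAML entry references
llm keys set deepseek

# Then run prompts against the R1 API
llm -m deepseek-reasoner 'prompt'
```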
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.