That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a remarkable flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
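The command itself was lost from this copy of the post; a minimal sketch of what it likely looked like, pulling the GGUF straight from Hugging Face via Ollama (the unsloth repo path and the Q8_0 quantization tag are assumptions):

```
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```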
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
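Again the exact commands were dropped from this copy; a plausible sketch, reusing the same model reference as the Ollama step above (the model ID passed to `-m` is an assumption):

```
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```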
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:

- Setup: A pelican and a walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less ridiculous. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
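The commands were stripped from this copy; a sketch of one way to do it with the llm-gguf plugin (the Hugging Face URL and the Q3_K_M quantization, which is roughly the right size for a 34GB 70B quant, are both assumptions):

```
llm install llm-gguf
llm gguf download-model \
  https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF/resolve/main/DeepSeek-R1-Distill-Llama-70B-Q3_K_M.gguf
```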
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
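The YAML itself was lost from this copy; a sketch consistent with the `llm keys set deepseek` and `-m deepseek-reasoner` usage described below (the exact field values are assumptions, following LLM's extra-openai-models schema):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```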
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
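Spelled out as commands, reusing the earlier example prompt:

```
llm keys set deepseek
# paste in your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```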
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.