Add Simon Willison's Weblog

Alba Chew 2025-02-11 08:28:51 +00:00
parent 0a82a41ca1
commit 1b417c3058
1 changed files with 42 additions and 0 deletions

@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
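A minimal sketch of that invocation, assuming the Q8_0 quantization tag Unsloth published on Hugging Face:

```bash
# Pull the GGUF straight from Hugging Face and start an interactive chat
# (the Q8_0 tag is an assumption; substitute whichever quantization you want)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```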
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
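Those SQLite logs can be pulled back up later with the `llm logs` command, for example:

```bash
# Show the most recent prompt/response pair from LLM's SQLite log
llm logs -n 1
```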
I installed the plugin and ran it like this:
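Something along these lines, assuming the plugin picks up the same model ID that Ollama registered:

```bash
# Install the Ollama plugin for LLM, then prompt the local model through it
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```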
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
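Presumably a lower-bit quantization to fit 70B into a 34GB download; the Q3_K_M tag below is my assumption:

```bash
# Run the 70B distill; the Q3_K_M tag is an assumption consistent with ~34GB
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```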
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything, you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
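A sketch of that YAML, following the pattern LLM's documentation uses for OpenAI-compatible endpoints (the exact field values here are assumptions):

```yaml
# extra-openai-models.yaml: register DeepSeek's OpenAI-compatible endpoint
# (field values are assumptions based on the standard llm pattern)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```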
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
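Spelled out as commands (the prompt is just an example):

```bash
# Store the DeepSeek API key, then run a prompt against deepseek-reasoner
llm keys set deepseek
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```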
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.