Simon Willison's Weblog
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those over.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
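A sketch of that fetch-and-run step, assuming Ollama can pull the Unsloth GGUF directly from its Hugging Face repo (the repo path and the Q8_0 quantization tag are assumptions, though Q8_0 matches the roughly 8.5GB download mentioned below):

```shell
# Pull the GGUF from Hugging Face and start an interactive chat session
# (repo path and quantization tag are assumptions, not confirmed here)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```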
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
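Those logs live in a SQLite database that LLM manages for you; a couple of commands for poking at them (a sketch, assuming a standard LLM install):

```shell
# Show where LLM keeps its SQLite log database
llm logs path
# Review the most recently logged prompt and response
llm logs -n 1
```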
I installed the plugin and ran it like this:
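Something like this, assuming the same Unsloth model tag as above (the llm-ollama plugin exposes any model Ollama has pulled through LLM's -m option):

```shell
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```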
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
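The shape of that output can be sketched in a couple of lines of shell: the chain of thought sits between the opening and closing tags, and everything after the closing tag is the answer itself (the response text here is a made-up single-line example):

```shell
# A made-up, single-line stand-in for a model response:
response='<think>Okay, the user wants a joke about a pelican and a walrus...</think>The Joke: ...'
# Strip everything through the closing tag to keep just the final answer:
answer="${response#*</think>}"
echo "$answer"   # prints: The Joke: ...
```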
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
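A sketch of that command, on the assumption that Unsloth publishes the 70B distill under a matching repo name (the Q3_K_M tag is also an assumption: a 3-bit quantization of a 70B model is in the right ballpark for a 34GB file):

```shell
# Repo path and quantization tag are assumptions, not confirmed here
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```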
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
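The entry needs a model ID, the DeepSeek API base URL and the name of the stored key; a sketch of what that YAML might look like (field names follow LLM's extra-openai-models.yaml conventions, and the exact values here are assumptions):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```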
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
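Spelled out as commands (the prompt here is just an example):

```shell
llm keys set deepseek
# paste in your API key when prompted
llm -m deepseek-reasoner \
  'a joke about a pelican and a walrus who run a tea room together'
```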
This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.
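If you want a look at them in the meantime you can hit the endpoint directly; a hedged sketch with curl and jq, assuming the standard chat completions path and that the reasoning comes back in a reasoning_content field on the message (check DeepSeek's API documentation before relying on either):

```shell
# Sketch only: endpoint path and reasoning_content field are assumptions
curl -s https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican"}]
  }' | jq -r '.choices[0].message.reasoning_content'
```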