That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself.
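Something along these lines, where the Q8_0 quantization tag is an assumption (any of the published GGUF variants should work the same way):

    # fetch and run the model straight from Hugging Face via Ollama
    ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0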
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
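For example, the most recent logged prompt and response can be pulled back out afterwards (the -n option caps how many entries come back):

    # show the single most recent logged exchange
    llm logs -n 1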
I installed the plugin and ran the model through it.
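Roughly like this, with the same assumed model reference as the Ollama step:

    llm install llm-ollama
    # the quantization tag must match whatever Ollama pulled above
    llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
      'a joke about a pelican and a walrus who run a tea room together'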
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and Webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working (a 34GB download).
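A sketch of those steps, assuming the matching Unsloth GGUF; the Q3_K_M quantization tag is a guess based on the 34GB size:

    llm install llm-ollama
    # assumed model reference; pick the roughly 34GB build
    ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
    llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
      'a joke about a pelican and a walrus who run a tea room together'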
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well.
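For reference, that prompt run through LLM against the 70B distill looks like this (the model reference, including its quantization tag, is assumed as in the earlier sketch):

    # assumed model reference, matching the 70B pull above
    llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
      'Generate an SVG of a pelican riding a bicycle'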
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping a stanza into your extra-openai-models.yaml configuration file.
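Something like this, where the api_base URL and the key name are my assumptions about DeepSeek's OpenAI-compatible endpoint:

    # extra-openai-models.yaml
    - model_id: deepseek-reasoner
      model_name: deepseek-reasoner
      api_base: "https://api.deepseek.com/v1"
      api_key_name: deepseek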
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
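Spelled out as a session, reusing the earlier joke prompt as the example:

    llm keys set deepseek
    # paste your DeepSeek API key when prompted
    llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'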
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
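If you want to see them you can call the API directly. A curl sketch, assuming the OpenAI-style chat completions endpoint and that the reasoning comes back in a reasoning_content field on the message (both details are assumptions about DeepSeek's API):

    # assumes DEEPSEEK_API_KEY is set in the environment
    curl https://api.deepseek.com/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
      -d '{
            "model": "deepseek-reasoner",
            "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
          }'
    # the reply message should carry "reasoning_content" alongside "content"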