That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
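
Ollama can pull GGUF builds directly from Hugging Face; a sketch of that command follows (the Q8_0 quantization tag is my assumption, picked to match the roughly 8.5GB download mentioned below):

```bash
# Pull and run the Unsloth GGUF build straight from Hugging Face.
# The quantization tag (Q8_0) is an assumption, not confirmed by this post.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```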

This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
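
A sketch of those two steps, assuming the same Hugging Face model ID as the Ollama pull above:

```bash
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt against the model (ID assumed to match the pull above)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```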

Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
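
A sketch of that command; a 34GB download suggests roughly a 3-bit quantization of the 70B model, but the exact tag here is an assumption:

```bash
# Quantization tag (Q3_K_M) is a guess matching the ~34GB size
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```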

Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
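
For reference, running that prompt through LLM against a local model looks something like this (the model ID is assumed to match the earlier pulls):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```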

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
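
A sketch of that configuration: the api_base is DeepSeek's OpenAI-compatible endpoint, and the key name is chosen to match the llm keys set deepseek step below; treat the exact values as assumptions.

```yaml
# extra-openai-models.yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```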

Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
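
In full, that looks something like this (the prompt is just an example):

```bash
# Store the DeepSeek API key under the name referenced in the YAML above
llm keys set deepseek

# Run a prompt against the hosted model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```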

This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.