Machine Learning

machinelearning@lemmy.ml

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 12 days ago

DeepSeek open-sources DeepEP, a library for MoE training and inference

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 18 days ago

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

transformer-circuits.pub

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 18 days ago

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

transformer-circuits.pub

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Neurosymbolic AI -- Why, What, and How

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Genie 2: A large-scale foundation world model

deepmind.google

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

A good primer on what to expect running local LLMs

nullprogram.com

Shamar@feddit.it · English · 4 months ago

A community statement supporting the Open Source Definition (OSD)

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

How ‘Embeddings’ Encode What Words Mean

www.quantamagazine.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

New AI model “learns” how to simulate Super Mario Bros. from video footage

arstechnica.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o)

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI

www.lifeiscomputation.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

The Difference Between Speaking and Thinking

www.theatlantic.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 6 months ago

Diffusion Models Are Real-Time Game Engines

gamengen.github.io

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

Transformer Explainer

poloclub.github.io

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

Alibaba claims no. 1 spot in AI math models with Qwen2-Math

venturebeat.com

yboutros@infosec.pub · English · 7 months ago

How to convert a positionally encoded predicted embedding from a decoder to its matching token?

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 7 months ago

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow
