| Thing | 
|---|
| Poker Tournament for LLMs | 
| I built the same app 10 times: Evaluating frameworks for mobile performance | 
| Show HN: I tracked the adoption of AI coding extensions in VS Code since 2022 | 
| Altindex - Alternative financial data | 
| Koyfin - financial data | 
| GuruFocus - financial data | 
| Stockanalysis - financial data | 
| Finviz - financial data | 
| Finbox - financial data | 
| EQ-Bench - AI writing benchmarks | 
| Rust template engine comparisons by Askama | 
| LLM comparison in Register-Transfer Level generation for hardware design | 
| Artificial Analysis LLM Leaderboard | 
| @techfren Coding LLM Benchmarks | 
| CadEval - CAD performance of the LLMs | 
| LiveSWEBench - A Challenging, Contamination-Free Benchmark for AI Software Engineers | 
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | 
| Vellum LLM Leaderboard | 
| ProLLM Leaderboards | 
| Humanity's Last Exam - AI benchmark | 
| BigCodeBench Leaderboard - Evaluates LLMs with practical and challenging programming tasks | 
| Open LLM Leaderboard | 
| LLM Explorer - A Curated Large Language Model Directory and Analytics | 
| Shadeform - compare on-demand GPU services | 
| EVKX - Electric vehicle information site | 
| Tranco - A Research-Oriented Top Sites Ranking Hardened Against Manipulation | 
| Claude 3.5 Sonnet vs GPT-4o: Does Claude outperform GPT-4o? | 
| SWE-bench - Can Language Models Resolve Real-World GitHub Issues? | 
| NYT Connections LLM Benchmark | 
| StackUnseen AI benchmark | 
| Database-like ops benchmark | 
| LiveCodeBench - AI Benchmark | 
| Artificial Analysis - AI comparison | 
| SciCode AI benchmark | 
| SEAL LLM Leaderboards | 
| DB Performance tests | 
| ARC Prize for AGI | 
| Aider AI leaderboard | 
| LLM benchmark | 
| OpenRouter AI router | 
| List of certain products | 
| LiveBench - A Challenging, Contamination-Free LLM Benchmark | 
| LMSYS Chatbot Arena |