Experimenting with a new LLM: Qwen
We have established an Alibaba Cloud account to evaluate their model offerings. Initial impressions are consistent with expectations: their models trail the frontier but perform comparably to Llama 3.1 and similar large open-source models — perhaps six to twelve months behind the leading providers. This week I plan to run direct comparisons against our existing Cognosa use cases, which is straightforward given the platform's ability to route identical queries across models and compare outputs side by side.
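The fan-out pattern behind that kind of side-by-side comparison is simple to sketch. The snippet below is a minimal, hypothetical illustration (not Cognosa's actual implementation): it sends one prompt to several model callables concurrently and collects the outputs keyed by model name. The `models` dict of stand-in lambdas is purely illustrative; in practice each entry would wrap a real API client.

```python
from concurrent.futures import ThreadPoolExecutor

def compare_models(prompt, models):
    """Route one prompt to every model callable; return outputs keyed by name."""
    with ThreadPoolExecutor() as pool:
        # Submit all calls up front so the models run concurrently.
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: f.result() for name, f in futures.items()}

# Hypothetical stand-ins for real API clients (Claude, Qwen, etc.).
models = {
    "claude": lambda p: f"[claude] {p}",
    "qwen":   lambda p: f"[qwen] {p}",
}

results = compare_models("Explain DNS CNAME records.", models)
for name, output in results.items():
    print(f"{name}: {output}")
```

Swapping the lambdas for actual client calls is the only change needed to benchmark real providers, since each entry is just a function from prompt to response text.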
I have reduced the frequency of these comparisons, however, for two reasons. First, my daily satisfaction with Claude continues to increase. Second, ongoing disappointment with OpenAI — not only in coding tasks but across technical domains including DNS, Docker Compose, and infrastructure troubleshooting — reduces my incentive to benchmark regularly. Gemini has emerged as my preferred backup assistant and likely outperforms ChatGPT in most of my workflows, though I continue to alternate between them. I have also discontinued running large open-source models locally on my Mac Studio: inference is too slow, output quality is insufficient, and my API spend across frontier providers remains well below the point where local hosting would pay for itself — at least for now.