Engineering on 墨然

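https://moran.is-a.dev/tags/%E5%B7%A5%E7%A8%8B/
Recent content in Engineering on 墨然
Generator: Hugo. Language: zh-cn. Last updated: Sun, 30 Nov 2025 14:06:00 +0800

Don't Just Watch the Model When Serving Inference: Three Small Lessons From the Pitfalls I Hit
https://moran.is-a.dev/posts/llm-serving-basics/
Sun, 30 Nov 2025 14:06:00 +0800
When users feel that "the model is unstable", the real cause is often the gateway, the queue, and the timeout policy quietly fighting each other.

The Context Window: How I Keep a Large Model From "Forgetting Too Fast"
https://moran.is-a.dev/posts/llm-context-window/
Fri, 12 Sep 2025 10:05:00 +0800
I used to assume the model simply had a "bad memory". Later I realized that much of the forgetting came from the messy content I was feeding it myself.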