Topic: benchmark

IBM Research creates new benchmark for measuring AI

IBM Research has created AGENT, a benchmark for evaluating an AI model’s core psychological reasoning ability, or common sense, to help users build and test AI models that reason the way humans do. “We’re making progress toward building AI agents that can infer mental states, predict future actions, and even work with human partners. …”