When tech gets smarter
Yeah, yeah — of course
a computer won at a math competition. That’s not the point. This story,
which concerns a rather amazing program called GeoS from the Allen
Institute for Artificial Intelligence (AI2), is about the ability of AI
to usefully engage with the world. To a computer, with a brain literally
structured for these sorts of operations, the math SAT is not a test of calculation but of reading comprehension. That’s why this story is so interesting: GeoS isn’t as good as the average American at geometry; it’s as good as the average American at the SAT itself.
Specifically, this AI program
was able to score 49% accuracy on official SAT geometry questions and 61% on practice questions. The 49% figure is basically identical to the
average for real human test-takers. The program was not given digitized
or specially labeled versions of the test, but looked at the exact same
question layout as real students. It read the writing. It interpreted
the diagrams. It figured out what the question was asking, and then it
solved the problem. It only got the answer about half the time — which
makes it roughly as fallible as a human being.
Of course, GeoS makes errors
for different reasons than high-schoolers. A human being might correctly
interpret the question, then apply the wrong formula, or muck up the
calculation. GeoS, being a computer, will virtually always get the
correct answer so long as it truly understands the question. It might
not be able to read a word correctly, or the grammar of a question might
be too alien for the computer to parse. Regardless, what we’re really
measuring here is the computer’s ability to understand human
communication in a form that’s deliberately (pardon the pun) obtuse.
To
do this, the researchers had to smash together a whole array of
different software technologies. GeoS uses optical character recognition
(OCR) algorithms to read the text, and custom language processing to
try to understand what it reads. Geometry questions are structured to be
difficult to parse, hiding important information as inferences and
implications.
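To make that parse-then-solve split concrete, here’s a toy Python sketch. It is not AI2’s code and bears no resemblance to GeoS’s actual text-and-diagram machinery; it handles exactly one rigidly phrased question pattern, just to show that once the English has been turned into a structured fact, the math that follows is mechanical.

    import re

    # Toy illustration only (not GeoS): map one narrow pattern of geometry
    # prose to a structured fact, then compute the answer. The hard part is
    # the parse; the arithmetic afterward is mechanical.

    def parse(question):
        """Extract a structured fact from a rigidly phrased question."""
        m = re.search(r"legs of length (\d+) and (\d+).*?length of the hypotenuse",
                      question, re.IGNORECASE | re.DOTALL)
        if m is None:
            raise ValueError("question form not recognized")
        a, b = int(m.group(1)), int(m.group(2))
        return {"relation": "right_triangle", "legs": (a, b), "ask": "hypotenuse"}

    def solve(fact):
        """Once the question is understood, getting the answer is easy."""
        if fact["relation"] == "right_triangle" and fact["ask"] == "hypotenuse":
            a, b = fact["legs"]
            return (a ** 2 + b ** 2) ** 0.5
        raise NotImplementedError(str(fact))

    question = ("A right triangle has legs of length 3 and 4. "
                "What is the length of the hypotenuse?")
    print(solve(parse(question)))  # prints 5.0

A real system has to handle open-ended phrasing and actual diagrams rather than one brittle pattern, which is where GeoS’s remaining errors come from.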
The other side of the coin is
that though geometry questions are dense and hard to tease apart,
they’re also extremely uniform in structure and subject matter. The AI’s
programmers can plan for the strict design principles that go into
writing the questions.
The same programming couldn’t be applied directly to calculus problems, for instance, because those use somewhat different language and mathematical symbols to describe the problem. But a good GeometryBot would be relatively easy to adapt
to those few distinguishing rules. Each successive new area of
competence would make the next one easier to acquire.
One intriguing implication of
this research is that someday, we might have algorithms
quality-checking SAT questions. We could have different AI
programs intended to achieve different levels of success on average
questions, perhaps even for different reasons. Run proposed new
questions through them, and their relative performance could not only weed out bad questions but also point to the source of the problem.
BadAtReadingAI and BadAtLogicAI did as expected on the question, but
BadAtDiagramsAI did terribly — maybe the drawing simply needs to be a
little clearer.
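As a purely hypothetical illustration of that idea (none of these bots exist, and the numbers are made up), a screening harness might look something like this in Python: each bot is a stub solver with a known historical accuracy, and a proposed question gets flagged whenever a bot strays too far from its usual rate.

    # Hypothetical screening harness; the bot names come from the thought
    # experiment above, and a real version would wrap actual solvers like GeoS.

    def vet_question(question, bots, tolerance=0.15):
        """Return the bots whose score on this question strays from their norm."""
        flagged = {}
        for name, (solver, expected) in bots.items():
            observed = solver(question)
            if abs(observed - expected) > tolerance:
                flagged[name] = (observed, expected)
        return flagged

    bots = {
        # name: (stub solver, historical accuracy on average questions)
        "BadAtReadingAI":  (lambda q: 0.30, 0.30),
        "BadAtLogicAI":    (lambda q: 0.35, 0.35),
        "BadAtDiagramsAI": (lambda q: 0.05, 0.40),  # tanks on this question
    }
    print(vet_question("proposed geometry question", bots))
    # {'BadAtDiagramsAI': (0.05, 0.4)}  -> the diagram is the likely culprit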
This isn’t a sign of the coming AI-pocalypse, or
at least not a particularly immediate sign; as dense as geometry
questions might be, they’re homogeneous and nowhere near as complex as
something like conversational speech. But this study shows how the
individual tools available to AI researchers can be assembled to create
rather full-featured artificial intelligences. Things will really take off when those same researchers start snapping those amalgamations together into something far more versatile, something not entirely unlike a real biological mind.
Source: extremetech
Your VB Kid
Psypher


