Download the original paper (https://www.sciencedirect.com/science/article/pii/S0045653524022173) and, on the ChatGPT website, try to reproduce Ethan Mollick's results with his prompt: "carefully check the math in this paper".
Play around with alternative prompts. We have two goals at this stage: get terse output that quickly indicates whether the model found errors, and check for other kinds of errors besides math. Off the top of my head, you might try things like:
- Carefully check the math in this paper. When you are done, if you identified anything that looks like an error, say 'THIS PAPER CONTAINS A POSSIBLE ERROR' and then explain the error. Otherwise, say 'NO ERRORS FOUND'.
- Same prompt, but the first sentence is "Carefully check this paper for any factual misstatements or mistaken assumptions".
- Same prompt, but the first sentence is "Carefully check this paper for any errors in reasoning".
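The three prompt variants above differ only in their first sentence, so they can be generated from one template, and the fixed verdict phrases make the reply easy to classify. A minimal sketch, assuming the category names `math`, `fact`, and `logic` and the helper names `build_prompt` and `parse_verdict` (all illustrative, not part of any existing tool):

```python
# Verdict phrases taken verbatim from the prompt above.
VERDICT_ERROR = "THIS PAPER CONTAINS A POSSIBLE ERROR"
VERDICT_CLEAN = "NO ERRORS FOUND"

# First sentence for each category; the category keys are an assumption.
FIRST_SENTENCES = {
    "math": "Carefully check the math in this paper.",
    "fact": "Carefully check this paper for any factual misstatements or mistaken assumptions.",
    "logic": "Carefully check this paper for any errors in reasoning.",
}


def build_prompt(category: str) -> str:
    """Return the full prompt for one category of check."""
    return (
        f"{FIRST_SENTENCES[category]} When you are done, if you identified "
        f"anything that looks like an error, say '{VERDICT_ERROR}' and then "
        f"explain the error. Otherwise, say '{VERDICT_CLEAN}'."
    )


def parse_verdict(reply: str) -> str:
    """Classify a model reply as 'error', 'clean', or 'unclear'."""
    text = reply.upper()
    has_error = VERDICT_ERROR in text
    has_clean = VERDICT_CLEAN in text
    if has_error and not has_clean:
        return "error"
    if has_clean and not has_error:
        return "clean"
    # Neither phrase, or both: a human should look at this reply.
    return "unclear"
```

Replies that contain neither phrase, or both, come back as `unclear` rather than being guessed at, which should make prompt problems visible early.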
Try the same thing with a few other papers, and fix any obvious problems with the prompts.
Move over to the API and try this on 1, then 10, then 100 papers. Produce a report that indicates the number of papers reviewed, the number of (alleged) errors found in each category (math, fact, logic), the API fees spent, and then lists the papers and the errors reported for each paper.
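The report described above could be assembled along these lines. This is only a sketch: the `Review` record, its field names, and `build_report` are assumptions for illustration, and the API call that would produce each record (and its cost figure) is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    """One model call on one paper (hypothetical record layout)."""
    paper: str        # identifier or title of the paper
    category: str     # "math", "fact", or "logic"
    verdict: str      # "error", "clean", or "unclear"
    explanation: str  # the model's explanation, if any
    cost_usd: float   # API fee for this call, however it is measured


def build_report(reviews: list[Review]) -> str:
    """Summarize papers reviewed, alleged errors per category, fees, and per-paper errors."""
    papers = sorted({r.paper for r in reviews})
    flagged = [r for r in reviews if r.verdict == "error"]
    per_category = Counter(r.category for r in flagged)
    total_cost = sum(r.cost_usd for r in reviews)

    lines = [
        f"Papers reviewed: {len(papers)}",
        f"API fees spent: ${total_cost:.2f}",
        "Alleged errors by category:",
    ]
    for category in ("math", "fact", "logic"):
        lines.append(f"  {category}: {per_category[category]}")

    lines.append("Alleged errors by paper:")
    for paper in papers:
        lines.append(f"  {paper}")
        found = [r for r in flagged if r.paper == paper]
        if not found:
            lines.append("    (none reported)")
        for r in found:
            lines.append(f"    [{r.category}] {r.explanation}")
    return "\n".join(lines)
```

Keeping one record per (paper, category) call means the same structure works unchanged for 1, 10, or 100 papers.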
Come up with some good way of taking a random sample across a large repository of papers, or perhaps a few different repositories from different fields.
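One simple option is a seeded random sample drawn separately from each repository, so that the sample is reproducible and no single field dominates. A rough sketch, assuming we already have a list of paper identifiers per repository (how to obtain those lists is not covered here, and the name `sample_papers` is illustrative):

```python
import random


def sample_papers(
    repositories: dict[str, list[str]], per_repo: int, seed: int = 0
) -> dict[str, list[str]]:
    """Draw up to `per_repo` paper IDs at random from each repository."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = {}
    for name in sorted(repositories):   # sorted for a stable draw order
        ids = sorted(repositories[name])
        k = min(per_repo, len(ids))      # small repositories contribute what they have
        sample[name] = rng.sample(ids, k)
    return sample
```

Recording the seed alongside the report would let anyone re-draw the same set of papers later.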
At this point we might go in different directions, depending on what sort of results we're seeing. Further experiments might include:
- Continue on to larger numbers of papers
- If the model claims to find errors that, on inspection, often turn out not to be real errors, see whether we can get it to self-correct, e.g. by asking it "are you sure this is an error?" or "is there any interpretation under which this would not be an error?"
- Try the o1-pro model
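The self-correction experiment in the list above might look something like the sketch below. The `ask` argument is a hypothetical callable that sends a prompt to whatever chat API we end up using and returns the reply text; it is not a real library function. The sketch only collects the follow-up replies for human inspection and does not try to judge them automatically.

```python
from typing import Callable

# Follow-up questions taken from the list above.
FOLLOW_UPS = [
    "Are you sure this is an error?",
    "Is there any interpretation under which this would not be an error?",
]


def recheck(claimed_error: str, ask: Callable[[str], str]) -> dict:
    """Ask each follow-up question about one claimed error and keep the replies."""
    replies = []
    for question in FOLLOW_UPS:
        prompt = f"You reported this error:\n{claimed_error}\n\n{question}"
        replies.append(ask(prompt))  # `ask` is a stand-in for the real API call
    return {
        "claimed_error": claimed_error,
        "follow_ups": list(zip(FOLLOW_UPS, replies)),
    }
```

For a dry run without any API, `recheck("units mismatch in Table 2", lambda prompt: "stub reply")` exercises the same code path.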

Prompts

h/t Este:

Please carefully review the following research article (provided in full text below). I want you to focus on verifying the accuracy of all arithmetic calculations, unit conversions, numerical comparisons, and quantitative interpretations presented. For each identified issue or area of concern, explain in detail:

What the calculation, unit conversion, or numeric claim is supposed to represent.

The step-by-step reasoning or arithmetic that the authors appear to have used.

Any errors, inconsistencies, or suspicious values you find, along with the correct calculation or a more appropriate method.

Whether the numeric results are reasonable in the context of the data, methods, and known standards.

If no errors are found, summarize the checks you performed and confirm that the numbers appear consistent and plausible. Aim to be as thorough, clear, and evidence-based as possible in your analysis.