  1. Download the original paper (https://www.sciencedirect.com/science/article/pii/S0045653524022173) and, using the ChatGPT website, try to reproduce Ethan Mollick's results, using his prompt: "carefully check the math in this paper".
  2. Play around with alternative prompts. We have two goals at this stage: generate terse output that quickly indicates whether or not the model found errors; and check for other kinds of errors besides just math. Off the top of my head, you might try things like:
  3. Try the same thing with a few other papers, and fix any obvious problems with the prompts.
  4. Move over to the API and try this on 1, then 10, then 100 papers. Produce a report that indicates the number of papers reviewed, the number of (alleged) errors found in each category (math, fact, logic), the API fees spent, and then lists the papers and the errors reported for each paper.
  5. Come up with some good way of taking a random sample across a large repository of papers, or perhaps a few different repositories from different fields.
  6. At this point we might go in different directions, depending on what sort of results we're seeing. Further experiments might include:
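
As a rough illustration of the report described in step 4, here is a minimal sketch of how per-paper results might be aggregated. It assumes each review has already been reduced to a title, an API cost, and a list of (category, description) pairs; the `PaperReview` and `build_report` names and that data shape are hypothetical, not part of any existing tool.

```python
from collections import Counter
from dataclasses import dataclass, field

# Error categories named in step 4 of the plan.
CATEGORIES = ("math", "fact", "logic")


@dataclass
class PaperReview:
    """Hypothetical record of one paper's review (assumed shape)."""
    title: str
    cost_usd: float
    # Each error is a (category, description) pair.
    errors: list = field(default_factory=list)


def build_report(reviews):
    """Summarize counts and fees, then list alleged errors per paper."""
    counts = Counter(cat for r in reviews for cat, _ in r.errors)
    total_cost = sum(r.cost_usd for r in reviews)
    lines = [f"Papers reviewed: {len(reviews)}"]
    for cat in CATEGORIES:
        lines.append(f"Alleged {cat} errors: {counts.get(cat, 0)}")
    lines.append(f"API fees spent: ${total_cost:.2f}")
    lines.append("")
    for r in reviews:
        lines.append(r.title)
        if not r.errors:
            lines.append("  (no errors reported)")
        for cat, desc in r.errors:
            lines.append(f"  [{cat}] {desc}")
    return "\n".join(lines)
```

Calling `build_report` on the list of reviews from a 1-, 10-, or 100-paper run would give the same layout each time, which should make it easier to compare runs as the sample grows.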

Prompts

h/t Este:

Please carefully review the following research article (provided in full text below). I want you to focus on verifying the accuracy of all arithmetic calculations, unit conversions, numerical comparisons, and quantitative interpretations presented. For each identified issue or area of concern, explain in detail:

  1. What the calculation, unit conversion, or numeric claim is supposed to represent.
  2. The step-by-step reasoning or arithmetic that the authors appear to have used.
  3. Any errors, inconsistencies, or suspicious values you find, along with the correct calculation or a more appropriate method.
  4. Whether the numeric results are reasonable in the context of the data, methods, and known standards.

If no errors are found, summarize the checks you performed and confirm that the numbers appear consistent and plausible. Aim to be as thorough, clear, and evidence-based as possible in your analysis.
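
When moving to the API, a prompt like the one above has to be combined with each paper's full text, and the "terse output" goal from step 2 suggests asking for a machine-readable verdict as well. The sketch below shows one possible way to do both. The `VERDICT:` line convention, the article delimiters, and the function names are all assumptions made for illustration; none of them come from the original prompt.

```python
import re

# Assumed convention (not part of the original prompt): ask the model to
# end its reply with a one-line verdict that is easy to parse.
VERDICT_SUFFIX = (
    "Finish with a single line of the form "
    "'VERDICT: ERRORS_FOUND' or 'VERDICT: NO_ERRORS'."
)


def build_prompt(instructions: str, paper_text: str) -> str:
    """Combine review instructions, the verdict request, and the paper text."""
    return (
        f"{instructions.strip()}\n\n{VERDICT_SUFFIX}\n\n"
        f"--- BEGIN ARTICLE ---\n{paper_text.strip()}\n--- END ARTICLE ---"
    )


def parse_verdict(reply: str):
    """Return True if errors were reported, False if not, None if unparseable."""
    match = re.search(
        r"^VERDICT:\s*(ERRORS_FOUND|NO_ERRORS)\s*$", reply, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1) == "ERRORS_FOUND"
```

Returning `None` for replies without a verdict line keeps unparseable responses separate from genuine "no errors" results, so they can be counted or retried rather than silently treated as clean.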