The text generative AI produces can seem like magic, but it is prone to significant errors based on how it actually functions behind the chat window you use to prompt it. Tab through the options here to learn more about what you should keep in mind when evaluating the results produced by the generative AI services you use.
Hallucinations, in the context of generative AI, occur when a chatbot provides a convincing answer to a question that is partially or entirely incorrect. Researchers dislike the term "hallucination" because it suggests that a chatbot has an "intent" behind the error, when the error can in fact be explained by how the model works. However, since it is still the common term for this kind of error, you should be aware of what it means.
Generative AI is designed to be helpful and affirming, and it will try to satisfy you at all costs, even if that means providing answers that are incorrect. A more technical explanation of this behaviour is that “Hallucinations often occur when the model fills in gaps based on similar contexts from its training data, or when it is built using biased or incomplete training data. This leads to incorrect guesses.”
If generative AI doesn’t know the answer, it will guess rather than admit it doesn’t know. If information you’ve been given by generative AI seems false, verify it on your own: it could very well be a friendly and convincing hallucination.
Bias is a huge problem with the results provided by generative AI. Since generative AI services train their models on existing sources of human knowledge found on the internet, their services will reflect the bias present in that data. To be more precise, “bias in GenAI manifests in multiple forms, including gender, racial, cultural, and ideological biases. These biases frequently arise from both non-human and human factors embedded in training data and algorithms, reflecting societal prejudices and inequities.”
Bias exists in all writing, art, and information, and since generative AI is trained on what we create, it will reflect those biases. This is particularly problematic when generative AI is used blindly for large-scale decision making, such as insurance claims or college admissions. The stakes are, fortunately, lower when you as a student are using it for ideas or research starters. Be diligent in assessing the results of your genAI queries for bias, just as you would anything written or created by another person.
Depending on which generative AI system you use, it may not be clear where the model is sourcing its information from. ChatGPT, for instance, does not always provide sources, whereas Perplexity always does. Even when these systems do provide their sources, you may be surprised to see that they’re often the first sources you might consult yourself: Wikipedia, other online encyclopedias, and news articles. This is because most academic research is not available on the open web and can’t be crawled by these models. Open-access research articles are, of course, the exception to this.
Using generative AI can be good for beginning your research in some instances, but you may find that you still need to do some searching yourself on OCtopus for those peer-reviewed sources.
Generative AI companies have disregarded copyright entirely in the training of their text and image models. Meta, for instance, trained its Llama 3 model on LibGen, a well-known online repository of pirated books and textbooks. Their defence, which has at times worked in their favour, is that training their models on copyrighted material constitutes fair use. This is still being legally disputed, however, and Disney and Universal have found enough compelling evidence of copyright infringement in Midjourney’s results that they have launched a lawsuit against the company, calling it a “bottomless pit of plagiarism.”
Because these companies haven’t been transparent about how their models are trained, the unauthorized use of copyrighted material could be more extensive than even these examples indicate. Generative AI services are operating in, at best, a legal grey area at the moment, with much to be decided through future legal precedent. While this has no bearing on generative AI’s accuracy, it is still something to consider before using these services.
This site is maintained by the librarians of Okanagan College Library.