Getting started with ai-assisted LLMs
chatGPT, copilot, Palm, POE
Ai-assisted coding tools such as OpenAI’s ChatGPT, Google’s Bard, and GitHub’s Copilot can be a used in code generation, code completion, and learning to code. R-specific approaches to assisted coding are convenient and available through RStudio addins such as {gptstudio}.
This information is fluid and ever-evolving. The AI field of Large Language Models (LLM) is in flux.
LLMs can save you time. They can be biased, inaccurate, and generate harmful content.
Be careful and circumspect about the data uploaded to these models.
Recommendation
For conceptual coding help
Use the free version of GPT4 via bing.com/chat. Set the conversation style to “More Precise”
Alternatively, Claude.ai
Either of the above can be very helpful, especially when focusing your questions (prompt engineering) on the Tidyverse approach to data science.
For code completion
- Signup for GitHub for education, to integrate the free code-completion GitHub Copilot into your RStudio or VSCode IDE
Quick Start
Bing Chat1 can access openAI’s ChatGPT. Google Bard works in any browser. Another option is poe.com. Poe is especially useful for experimenting with a variety of Large Language Models (LLM) including ChatGPT, Google-PaLM (Bard), Sage, and Claude. Each of the options mentioned above are provided free of charge.2
Below is from the Bard LLM when asked for advice on learning to code with ai-assited LLMs:
- Be specific in your requests
- Use clear and concise language
- Provide examples
- Be patient
- Use a variety of LLMs and experiment with different approaches
- Use LLMs as a supplement to other learning resources
- Verify your computational results
See Also: the warning, above, and the ethics section, below.
Tools or packages
Aside from the websites mentioned in the Quick Start section above, there are many approaches to integrating AI-assistance directly into an IDE such as RStudio or VSCode. These addins, or plugins, are highlighted for their seemless integration in a coding IDE.
gptstudio
The gptstudio can reference a variety of LLMs defaulting to ChatGPT. A big advantage to {gptstudio} is the ability to stay within the RStudio IDE, interacting seemlessly with notebooks or plain script files. From the Addins menu, get assistance writing code, or checking spelling and grammar. A companion package {gpttools} can extend {gptstudio}.
Preequisites
- OpenAI account
- OpenAI API key (requires a credit card but may not require payment)
- Set the API key in RStudio environment. For security, if using version control (such as GitHub), include the .Renviron in the .gitignore file
Setup details are explained at gptstudio. Pricing seems exceedingly low as of this writing but there are no garauntees. The API allows coder to set cost limits.
RTutor
There is a website or a package for integrating into RStudio. Does not require an API key, but paying your way is appreciated. This is a tutor designed to help teach about ai-assisted coding and learn about exploring datasets. A subsection of RTutor automatically runs datasets through various EDA packages.
Github Copilot
Copilot is an autocomplete feature and works well in some cases. RStudio offers tips for integrating the tool into the RStudio editor. Defaults to OpenAI’s codex LLM based on code found in GitHub. This tool is focused on code completion rather than conceptual computational thinking. Free for teachers and students.
Comparisons of models
gptstudio: Great for its deep integration into RStudio and ability to work within code-chunks or prose. Setting up the API key is a necessary configuration. The {gptstudio} documentation shows clearly how to protect API-keys linked to a credit card — a notable issue when using version control like GitHub.
Open Source LLMs: h2o (Apache) Llama 2 (restricted Open) are available. Early reports are that Llama 2 is nearly as powerful as GPT 4 and has a mostly-open, or more-open, license than the proprietary licenses of the more proprietary models.
Poe.com: Easy access to several LLM models: ChatGPT, PaLM, GPT-4, Claude, and Sage. Works well. Comparison of LLM responses is convenient. Log-out is found in the settings page.
Claude: Of the easily accessible and free LLMs, I’ve had the best luck with Claude.ai. Llama 2 may be as good or better.
Bard/PaLM: Google’s LLM. Fast, efficient and familiar.
ChatGPT: OpenAI’s LLM. This works well. Free website can slow down due to user congestion limits. Subscription models exist, presumably with less congestion.
Copilot: OpenAI’s codex LLM focused on code completion. In my experience copilot is less useful in understanding conceptual questions about computational data analysis. Uses the VSCode IDE; a nice app that works with many coding languages including R. VSCode also works with Quarto. The R set-up in VSCode is a bit cranky. If you’re a Pythonite, VSCode set-up is more convenient than R. If you’re an R coder, you might be happy sticking with RStudio.
Summary comparison
Claude and Llama 2 are my current favorites. Bard/PaLM and ChatGPT (via addins, poe.com, bing.com, or openai.com) have worked well in my tests. There seem to be some usage-congestion limitations, especially if you don’t subscribe or want to use the latest ChatGPT model.
Ethics
LLMs knowledge bases are private and lack transparency. There are important societal concerns about the fairness of equitable access to these tools. It’s unclear how developers or users of these models can be held accountable.