Chapter 1 Computing Tips

Below are some tips for working with your computer, which will be instrumental to getting things done in informatics and computational biology.

These are just opinionated suggestions for making your life easier. There are definitely other ways to do it.

1.1 Shell Terminals and Command Line Environments

You will have to get comfortable working in a terminal with a shell such as bash. Many scientific computing tools run only in the terminal, or are simply easier to use there. Like learning a new language, start simple and build your command toolbox incrementally. Don’t worry about not being fluent at the start.

Experiment with commands occasionally by reading the documentation, which can be accessed using the man command. For example, if you want to learn more about the ls command, you can type man ls to read more.
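A first session might look like the sketch below. It is only an illustration: the directory and file names (scratch_demo, samples.txt) are made up, and the man lookups are left as comments because they open an interactive pager.

```shell
# Read the manual page for ls (opens a pager; press q to quit):
#   man ls
# Search manual page summaries by keyword:
#   man -k directory

# Work inside a throwaway directory so nothing important is touched
mkdir -p scratch_demo
cd scratch_demo

# Show where we are, then create a small text file
pwd
printf 'sampleA\nsampleB\nsampleC\n' > samples.txt

# List the directory contents and view the file
ls -l
cat samples.txt

# Chain commands with a pipe: count the lines and strip any padding
line_count=$(wc -l < samples.txt | tr -d ' ')
echo "samples.txt has ${line_count} lines"

# Return to the starting directory
cd ..
```

Each of these commands has its own manual page, so man mkdir, man cat, and man wc are natural next lookups.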

Great places to find a good list of useful commands are the-art-of-command-line and The Grymoire.

If you don’t have access to a bash environment, or just want a quick one, take a look at Repl.it, where you can experiment not only with bash but also with a large list of other programming languages.

1.2 GitHub

If you do not already have one, set up a GitHub account (it’s free). Make sure that you get the education discount so you can set up private repos, plus some other goodies.

1.3 R Programming

If you don’t already have it, download R and, definitely, RStudio.

RStudio has an excellent collection of cheatsheets for different R packages.

Miscellaneous resources for R programming:

  • STAT545 - Data wrangling, exploration, and analysis with R
  • R for Data Science - R-specific, but covers general principles of doing data science, from transforming data to communicating results.
  • Advanced R - Dig deeper into R.
  • RStudio Resources - Webinars on various R packages and RStudio.

1.4 Python Programming

1.4.1 Anaconda

Anaconda is a software distribution that bundles a package manager, conda, with a variety of popular data science packages. This should cover most data analysis cases you may encounter.

1.4.2 Conda

conda is a package manager and environment manager that helps you manage your Python installation. Ideally, it lets you define your development environment so that you can recreate it reproducibly and share it with others.

You can get started with conda here.
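One common way to define a shareable environment is an environment file. The sketch below writes a small one and lists, as comments, the conda commands that would typically use it. The environment name (demo-env), the channel, and the package list are illustrative assumptions, not recommendations from this guide.

```shell
# Write a small environment file; name, channel, and packages are illustrative
cat > environment.yml <<'EOF'
name: demo-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
EOF

# Show the file that was just written
cat environment.yml

# With conda installed, the environment could then be created and activated:
#   conda env create -f environment.yml
#   conda activate demo-env
# An existing, activated environment can be exported for sharing:
#   conda env export > environment.yml
```

Keeping environment.yml under version control alongside your analysis code lets collaborators rebuild the same environment from the file.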

1.4.3 Jupyter Notebooks

Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.

Jupyter Notebooks should already be installed if you installed Anaconda (Section 1.4.1).
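A quick way to confirm this is to check whether the jupyter command is on your PATH. The snippet below is a minimal sketch; the variable name jupyter_status is just for illustration.

```shell
# Check whether the jupyter command is available on PATH
if command -v jupyter >/dev/null 2>&1; then
    jupyter_status="found"
    # Print the versions of the installed Jupyter components
    jupyter --version
else
    jupyter_status="missing"
    echo "jupyter not found on PATH; install Anaconda or add it with conda"
fi
echo "jupyter: ${jupyter_status}"

# If it is available, a notebook server is typically started with:
#   jupyter notebook
```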

1.5 Programming Environment Alternatives

Here are some options if you have issues setting up your development environment locally.

  • Repl.it - Online coding platform with many languages; great for Python but okay for R
  • Google Colaboratory - Run Jupyter Notebooks online

1.6 Windows

TODO

1.6.1 Path Management

TODO

1.7 macOS

TODO

1.8 Linux

TODO

1.9 State Server

Students in DMICE have access to a server, state.ohsu.edu, which can be accessed by

while replacing username with your school ID.
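The exact command is not reproduced in this text. As a minimal sketch, and assuming the server accepts standard SSH logins (an assumption, not something confirmed here), the login command would take roughly this shape. The snippet only builds and prints the command rather than connecting.

```shell
# Assumption: the server is reached over SSH.
# "username" is a placeholder; replace it with your school ID.
remote_user="username"
remote_host="state.ohsu.edu"

# Build the user@host target and print the command you would type
ssh_target="${remote_user}@${remote_host}"
echo "ssh ${ssh_target}"
```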

To reduce the headaches of managing software installations, use either Linuxbrew (see 1.8) or conda (see 1.4.2).

1.10 Exacloud Compute Cluster

Exacloud is the supercomputing cluster resource. If you have access to it, this tutorial will be invaluable when you run analyses on the cluster.