Chapter 1 Computing Tips
Below are some tips for working with your computer, which will be instrumental to getting things done in informatics and computational biology.
These are just opinionated suggestions for making your life easier. There are definitely other ways to do things.
1.1 Shell Terminals and Command Line Environments
You will have to get comfortable with a terminal, such as bash.
Lots of scientific computing tools run exclusively, or just more easily, in the terminal.
Like learning a new language, you’ll want to start simple and incrementally build your command toolbox.
Don’t worry about not being fluent at the start.
Experiment with commands occasionally by reading the documentation, which can be accessed using the man command.
For example, if you want to learn more about the ls command, you can type man ls to read more.
Great places to find a good list of useful commands are the-art-of-command-line and The Grymoire.
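As a starting point, here is a minimal sketch of a first terminal session. The directory and file names (project/data, sample_a.txt, sample_b.txt) are made up for illustration, and everything happens in a throwaway temporary directory so nothing on your machine is disturbed.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Show where we are, then create a small (hypothetical) project layout.
pwd
mkdir -p project/data
touch project/data/sample_a.txt project/data/sample_b.txt

# List the new files, then count them by piping ls into wc.
ls project/data
n_files=$(ls project/data | wc -l | tr -d ' ')
echo "files: $n_files"

# To read the documentation for any command used above, run for example:
#   man ls     (press q to quit the manual page)
```

Each command does one small thing, and the pipe (|) chains them together. Building up from pieces like these is usually easier than memorizing long commands.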
If you don’t have access to a bash environment or want easy access, take a look at Repl.it, which is a place to experiment with not only bash environments, but also a large list of other programming languages.
1.2 GitHub
If you do not already have one, set up a GitHub account (it’s free). Make sure that you get the education discount so you can set up private repos, plus some other goodies, too.
1.3 R Programming
If you don’t already have it, download R and, definitely, RStudio.
RStudio has an excellent collection of cheatsheets for different R packages.
Miscellaneous resources for R programming:
- STAT545 - Data wrangling, exploration, and analysis with R
- R for Data Science - R specific, but general principles on doing data science from transforming data to communicating.
- Advanced R - Dig deeper into R.
- RStudio Resources - Webinars on various R packages and RStudio.
1.4 Python Programming
1.4.1 Anaconda
Anaconda is a bundle of software with a package manager, conda, and it installs a variety of popular data science packages. Generally, this should cover most data analysis cases you may encounter.
1.4.2 Conda
conda is a package manager and environment manager that helps you manage your Python installation. Ideally, it should allow you to define your development environment so you can reproducibly create and share your specific development environment.
You can get started with conda here.
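The create-and-share workflow can be sketched roughly as follows. The environment name (tips-env), the package choices, and the environment.yml file name are all placeholders picked for illustration, not something this guide prescribes. The commands are wrapped in shell functions so that nothing is installed until you call one of them.

```shell
# Hypothetical environment name; pick whatever suits your project.
env_name="tips-env"

# Create an environment and record its contents in a shareable file.
create_and_share_env() {
    # The Python version and the pandas package are example choices only.
    conda create --yes --name "$env_name" python=3.11 pandas
    conda env export --name "$env_name" > environment.yml
}

# Rebuild the same environment on another machine from the shared file.
recreate_env() {
    conda env create --file environment.yml
}
```

After running create_and_share_env on your own machine, you would send environment.yml to a collaborator (or commit it to your repository), and they would run recreate_env. Use conda activate tips-env to start working inside the environment.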
1.4.3 Jupyter Notebooks
Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.
Jupyter Notebooks should already be installed if you installed Anaconda as described in Section 1.4.1.
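If you are unsure whether Jupyter actually made it onto your machine, a quick check along these lines may help. This is only a sketch: it assumes the jupyter command would be on your PATH after installation, and the jupyter_status variable is just an illustrative name.

```shell
# Check whether the jupyter command is available on the PATH.
if command -v jupyter >/dev/null 2>&1; then
    jupyter_status="installed"
    # Print the versions of the installed Jupyter components.
    jupyter --version
else
    jupyter_status="missing"
    echo "jupyter not found on PATH; installing Anaconda should provide it"
fi
echo "jupyter is $jupyter_status"
```

If the check reports that Jupyter is installed, running jupyter notebook from the terminal starts the notebook server and opens it in your browser.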
1.5 Programming Environment Alternatives
Here are some options if you have issues setting up your development environment locally.
- Repl.it - Online coding platform with many languages; great for Python but okay for R
- Google Colaboratory - Run Jupyter Notebooks online
1.6 Windows
TODO
1.6.1 Path Management
TODO
1.7 macOS
TODO
1.8 Linux
TODO
1.9 State Server
Students in DMICE have access to a server, state.ohsu.edu, which can be accessed by
while replacing username with your school ID.
To reduce the headaches of managing software installations, it is recommended to use either Linuxbrew (see Section 1.8) or conda (see Section 1.4.2).
1.10 Exacloud Compute Cluster
Exacloud is the supercomputing cluster resource. If you have access to it, this tutorial will be invaluable when you run analyses on the cluster.