Hello everyone.
Today we are going to learn how to automate a data science workflow, or any coding workflow in general, using Bash.
By the end of this article you should know:
- What Bash and the shell are,
- How to create folders in the terminal,
- How to create files in the terminal,
- How to run commands in the terminal,
- How to create and run Bash scripts.
So what is bash anyway?
Bash is a command language interpreter.
It is widely available on various operating systems and is the default command interpreter on most GNU/Linux systems. The name is an acronym for the ‘Bourne-Again Shell’, a pun on the name of the Bourne shell that it replaces and on the notion of being "born again."
What is a shell?
A shell lets you interact with your computer through commands: you can retrieve or store data, process information, and carry out many other tasks, from simple to complex.
How to open the terminal?
On Linux, press Ctrl+Alt+T, or search for and launch the terminal program on your computer.
Within the terminal, you can execute Bash scripts and do plenty of other magic stuff.
What is a bash script?
A bash script is a series of commands written in a file.
Think of a script for a play, or a movie, or a TV show. The script tells the actors what they should say and do. A script for a computer tells the computer what it should do or say. In the context of Bash scripts we are telling the Bash shell what it should do.
By convention, Bash scripts end with the .sh file extension, just as HTML files end with .html or .htm and JavaScript files end with .js.
Bash scripts are read and executed by the bash program.
Bash scripts execute line by line.
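To see the line-by-line behaviour for yourself, here is a minimal sketch. It writes a tiny script into a temporary folder and runs it with bash. The names workdir, hello.sh and output are made up for this illustration, and the sketch assumes the mktemp command is available on your system.

```bash
# Create a scratch directory so nothing in your real folders is touched.
workdir=$(mktemp -d)

# Write a small script; each line after the first is one command.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/bash
echo "first line"
echo "second line"
EOF

# Run the script with bash and capture what it prints.
output=$(bash "$workdir/hello.sh")
echo "$output"
```

The two messages come out in the same order as the lines in the file, because Bash reads and runs the script from top to bottom.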
Now that you have your terminal open, let us look at some basic commands.
- to view the current working directory: pwd
- to create a directory: mkdir mydirectory
- to change into the directory you created: cd mydirectory
- to create a file called demo1.sh: touch demo1.sh
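Here is how those commands might look when typed one after another. This is only a sketch: the first line moves into a temporary folder (an assumption added here, using mktemp) so that trying it out leaves your own files alone.

```bash
# Start in a throwaway directory so the example leaves your files alone.
cd "$(mktemp -d)"

pwd                   # print the current working directory
mkdir mydirectory     # create a directory called mydirectory
cd mydirectory        # move into it
touch demo1.sh        # create an empty file called demo1.sh
ls                    # list the contents of mydirectory: demo1.sh
```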
All of these commands can be executed in one go by writing them in a single Bash script file. Now let's create a Bash script file that does exactly that. Many Linux distributions come with a terminal-based text editor called nano.
A text editor is a program that allows you to create, open, and edit text files on your computer.
We will use this to type the commands above.
Let us write the above commands to our file.
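Run the command nano demo1.sh to open the file in the editor, then type the following lines. When you are done, press Ctrl+O and Enter to save, then Ctrl+X to exit.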
#!/bin/bash
pwd
mkdir mydirectory
cd mydirectory
touch example.sh
Bash scripts start with a shebang (#!/bin/bash) on the first line of the script.
The shebang tells the system to run the script with the Bash interpreter. What follows the #! is simply the absolute path to that interpreter.
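The path /bin/bash is the usual location, but Bash can live somewhere else on some systems. If you are unsure, a quick check like the sketch below prints where your Bash actually is. The variable name bash_path is made up for this example.

```bash
# Print the absolute path of the bash interpreter found on your PATH.
bash_path=$(command -v bash)
echo "$bash_path"

# Print the first line of that interpreter's version information.
"$bash_path" --version | head -n 1
```

If the path printed is not /bin/bash, you can put that path after the #! instead. Another common choice is #!/usr/bin/env bash, which looks Bash up on your PATH when the script starts.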
How do we execute bash scripts?
To run a script directly, the user running it needs execute permission on the file. A newly created file usually does not have it.
To check a file's permissions, run the command ls -l filename in the terminal, where filename is the name of your file.
To check the permissions of our Bash script file, we run the command ls -l demo1.sh
To grant execute permission, the command chmod +x is used. chmod stands for change mode, and +x adds execute permission.
To grant execute permission on our file, run the command chmod +x demo1.sh in the terminal.
Check the file permissions again using this command:
ls -l demo1.sh
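As a rough sketch, this is what the before and after check can look like. The -l option asks ls for the long listing, which begins with the permission string. The exact letters you see depend on your system's defaults, and the temporary folder is an assumption added so the example is safe to try.

```bash
# Work in a throwaway directory with a fresh, empty demo1.sh.
cd "$(mktemp -d)"
touch demo1.sh

ls -l demo1.sh        # permissions start with something like -rw-r--r--
chmod +x demo1.sh     # add execute permission
ls -l demo1.sh        # now something like -rwxr-xr-x
```

The x letters in the second listing are the execute permission that chmod +x added.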
To run a Bash script, use the command ./scriptname, where scriptname is the name of your script. You can also run it with the command bash scriptname.
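The two ways of running a script can be compared with a small sketch like this one. The script name greet.sh and the variables via_bash and via_direct are hypothetical, chosen only for the demonstration, and the temporary folder is assumed to allow running files from it.

```bash
cd "$(mktemp -d)"

# A tiny stand-in script with a shebang and one command.
printf '#!/bin/bash\necho "hello from greet.sh"\n' > greet.sh

# Way 1: hand the file to bash. No execute permission is needed.
via_bash=$(bash greet.sh)

# Way 2: run the file directly. This needs execute permission first.
chmod +x greet.sh
via_direct=$(./greet.sh)

echo "$via_bash"
echo "$via_direct"
```

Both ways print the same message. The difference is that ./greet.sh relies on the shebang and the execute permission, while bash greet.sh names the interpreter explicitly.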
How is this applicable to data science?
The good thing about programming is automation. Data scientists, and software developers too, repeat the same tasks of creating folders and virtual environments. A basic data science workflow mainly consists of:
- Creating folders,
- Creating & activating virtual environments,
- Installing packages,
- And opening a code editor
These steps can be automated in a single Bash script.
Let us implement it.
We will call it workflow.sh.
Note: We will use a variable in this script.
A Bash variable is assigned without a $ sign and with no spaces around the = sign.
To read its value, prefix the variable name with a $ sign, with no space between the $ sign and the name.
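Here is a small sketch of setting and reading a variable. Note that the $ sign appears only when reading the value; the assignment itself has no $ and no spaces around the = sign. The variable name message is made up for this illustration.

```bash
# Assign: no $ in front of the name, no spaces around the = sign.
directory="myworkflow"

# Read the value back by putting $ in front of the name.
echo "$directory"

# A variable can be used inside a double-quoted string.
message="Project folder: $directory"
echo "$message"
```

Wrapping "$directory" in double quotes is a good habit, because it keeps a value that contains spaces in one piece.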
Here is our script.
#!/bin/bash
directory="myworkflow"
cd Desktop/
mkdir $directory
cd $directory
python -m venv myvenv
source myvenv/bin/activate
python -m pip install --upgrade pip
pip install numpy pandas matplotlib streamlit
code .
Description
Note: When you open the terminal, the default directory is usually your home directory.
Line 1 - the shebang line, which tells the system to run the script with Bash
Line 2 - the variable directory stores the directory name myworkflow
Line 3 - navigate to your Desktop folder
Line 4 - create a directory called myworkflow, the name stored in the directory variable
Line 5 - navigate into your newly created directory
Line 6 - create a virtual environment called myvenv
Line 7 - activate your virtual environment
Line 8 - upgrade pip
Line 9 - install packages
Line 10 - open Visual Studio Code in the current folder (you should have it installed and configured)
Execute the script
First make the script executable:
chmod +x workflow.sh
If we then run the script with either ./workflow.sh or bash workflow.sh, the steps above are executed line by line and your Visual Studio Code editor will pop up.
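One thing to keep in mind: if you run workflow.sh a second time, mkdir reports an error because the myworkflow folder already exists. A possible tweak for the folder steps is sketched below. It uses a temporary folder in place of Desktop (the base variable is an assumption for this demo, not part of the script above) so it is safe to try anywhere.

```bash
# Stand-in for the Desktop folder so the sketch is safe to try anywhere.
base=$(mktemp -d)
directory="myworkflow"

# -p: no error if the folder already exists, so the steps can be repeated.
# &&: only change directory if mkdir succeeded.
mkdir -p "$base/$directory" && cd "$base/$directory"

# Repeating the same step does not fail.
mkdir -p "$base/$directory" && cd "$base/$directory"

pwd
```

In your own script you would keep cd Desktop/ and simply write mkdir -p "$directory" && cd "$directory" in place of the separate mkdir and cd lines.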
Conclusion
I hope this article has helped you learn how to automate your own workflow, or at least taught you something new.
Happy Coding!