Automate your Data Science Workflow with Bash


Hello Everyone.

Today we are going to learn how to automate a data science workflow, or any coding workflow in general, using Bash.

By the end of this article you should know:

  • What Bash and the shell are,
  • How to create folders in the terminal,
  • How to create files in the terminal,
  • How to run commands in the terminal,
  • How to create, make executable and run Bash scripts

So what is bash anyway?

Bash is a command language interpreter.

It is widely available on various operating systems and is the default command interpreter on most GNU/Linux systems. The name is an acronym for ‘Bourne-Again Shell’, a pun on the name of the Bourne shell that it replaces and on the notion of being "born again."

What is a shell?

A shell lets you interact with your computer through commands. With it you can retrieve or store data, process information, and carry out many other tasks, from simple to complex.

How to open the terminal?

On many Linux desktops, press Ctrl+Alt+T, or search for and launch the terminal program on your computer. Within the terminal, you can run commands, execute Bash scripts and do some other magic stuff.

What is a bash script?

A bash script is a series of commands written in a file.

Think of a script for a play, or a movie, or a TV show. The script tells the actors what they should say and do. A script for a computer tells the computer what it should do or say. In the context of Bash scripts we are telling the Bash shell what it should do.

By convention, Bash scripts end with a .sh file extension, just as HTML files end with .html or .htm and JavaScript files end with .js.

Bash scripts are read and executed by the bash program.

Bash scripts execute line by line.

Now that you have your terminal open, let us look at some basic commands.

to view the current working directory - pwd,

to create a directory - mkdir mydirectory,

to change into the directory you created - cd mydirectory,

to create a file called demo1.sh - touch demo1.sh
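Here is a short sketch of how these four commands look when typed one after another. The scratch directory created with mktemp is an assumption of this example, used only so that nothing in your real folders is touched.

```shell
# Work in a scratch directory (an assumption for this example).
scratch="$(mktemp -d)"
cd "$scratch"

pwd                 # print the current working directory
mkdir mydirectory   # create a new directory
cd mydirectory      # move into the new directory
touch demo1.sh      # create an empty file called demo1.sh
ls                  # list the directory contents: prints demo1.sh
```

In your own terminal you can skip the first two commands and start from pwd.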

All these commands can be executed at once by writing them in one Bash script file. Now let's create a Bash script file to do that. Most Linux distributions come with a terminal text editor called nano.

A text editor is a program that allows you to create, open, and edit text files on your computer.

We will use it to write the commands above into a file.

Run the command nano demo1.sh, type the following lines, then save with Ctrl+O (followed by Enter) and exit with Ctrl+X.

#! /bin/bash
pwd
mkdir mydirectory
cd mydirectory
touch example.sh

Bash scripts are identified by a shebang (#! /bin/bash), which is the first line of our script.

The shebang tells the system to run the script with the Bash interpreter. The part after #! is simply the absolute path to that interpreter.
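The location of Bash can differ from one system to another. As a small sketch, these two commands let you check where Bash lives on your machine and which version it is.

```shell
# Print the absolute path of the bash executable found on your PATH.
command -v bash

# Print the version of Bash, if the current shell is Bash.
echo "$BASH_VERSION"
```

If the path printed is not /bin/bash, you can use the printed path in your shebang line instead. Many scripts also use #!/usr/bin/env bash, which looks Bash up on your PATH.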

How do we execute bash scripts?

A newly created script usually does not have execution rights. To check a file's permissions, run the command ls -l filename in the terminal, where filename is the name of your file. To check the permissions of our Bash script file we execute the command ls -l demo1.sh

To provide execution rights the command chmod +x is used. chmod stands for change mode and +x adds execution rights. To grant execution rights to our file we should execute the command chmod +x demo1.sh in the terminal.

Check the file permissions again using this command.

ls -l demo1.sh

To run a Bash script, use the command ./scriptname, where scriptname is the name of your script. This requires execution rights on the file. Alternatively, use the command bash scriptname, which works even without execution rights.
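To see the whole cycle in one place, here is a minimal sketch that creates a tiny script, checks its permissions, makes it executable and runs it both ways. The file name hello.sh and the scratch directory are hypothetical, chosen only for this example, and the exact permission strings depend on your system settings.

```shell
# Work in a scratch directory (an assumption for this example).
cd "$(mktemp -d)"

# Create a two-line script called hello.sh (a hypothetical name).
printf '%s\n' '#!/bin/bash' 'echo "Hello from a script"' > hello.sh

ls -l hello.sh      # typically starts with -rw-, so not executable yet
chmod +x hello.sh   # add execution rights
ls -l hello.sh      # the permissions now contain x

./hello.sh          # run directly (needs execution rights)
bash hello.sh       # run through bash (works even without them)
```

Both of the last two commands print Hello from a script.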

How is this applicable to data science?

The good thing about programming is automation. Not just data scientists but software developers too run repetitive tasks such as creating folders and virtual environments. A basic data science workflow mainly consists of:

  • Creating folders,
  • Creating & activating virtual environments,
  • Installing packages,
  • And opening a code editor

These steps can be automated in a single Bash script.

Let us implement it.

We will call it workflow.sh.

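Note: We will use a variable in this script. A Bash variable is assigned by writing its name, an = sign and the value, with no spaces around the = sign. To read the value, prefix the name with a $ sign. There is no space between the $ sign and the variable name.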
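A quick sketch of assigning and reading a variable, using the same variable name as the script in this article:

```shell
directory="myworkflow"          # assign: no $ sign and no spaces around =
echo "$directory"               # read: prints myworkflow
echo "Project: ${directory}"    # braces mark where the name ends: prints Project: myworkflow
```

Wrapping "$directory" in double quotes keeps the value in one piece even if it contains spaces.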

Here is our script.

#!/bin/bash
directory="myworkflow"
cd Desktop/
mkdir "$directory"
cd "$directory"
python -m venv myvenv
source myvenv/bin/activate
python -m pip install --upgrade pip
pip install numpy pandas matplotlib streamlit
code .

Description

Note: When you open the terminal the default directory is usually your home directory.

Line 1 - Shebang line - tells the system to run the script with Bash.

Line 2 - The variable directory stores the directory name myworkflow

Line 3 - navigate to the desktop or change to your desktop folder

Line 4 - create a directory called myworkflow which is stored in your directory variable.

Line 5 - navigate into your newly created directory

Line 6 - create a virtual environment called myvenv

Line 7 - activate your virtual environment

Line 8 - upgrade pip

Line 9 - install packages

Line 10 - open Visual Studio Code in the current folder - you should have it installed, with its code command available in the terminal.

Execute the script

chmod +x workflow.sh

If we run this script using either ./workflow.sh or bash workflow.sh, the above steps will be executed line by line and your Visual Studio Code editor will pop up.
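If a step fails, for example because the folder already exists or the code command is missing, the script as written simply carries on with the next line. Below is one possible, more defensive sketch of the same workflow. The function name run_workflow and the DRY_RUN switch are hypothetical, introduced only for this example, and it assumes the same python and pip commands as the script above.

```shell
#!/bin/bash
# A more defensive sketch of workflow.sh.
# run_workflow and DRY_RUN are hypothetical names used for illustration.

run_workflow() {
    local base_dir="$1"     # where to create the project, e.g. "$HOME/Desktop"
    local directory="$2"    # project folder name, e.g. "myworkflow"

    # -p means no error if the folder already exists; stop if a step fails.
    mkdir -p "$base_dir/$directory" || return 1
    cd "$base_dir/$directory" || return 1

    # With DRY_RUN=1, only the folder steps run.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "Dry run: now in $PWD, skipping venv, pip and editor steps"
        return 0
    fi

    python -m venv myvenv || return 1
    source myvenv/bin/activate || return 1
    python -m pip install --upgrade pip
    pip install numpy pandas matplotlib streamlit

    # Open the editor only if the code command is available.
    if command -v code >/dev/null 2>&1; then
        code .
    else
        echo "The code command was not found, skipping the editor step"
    fi
}

# Example call: DRY_RUN=1 and a scratch folder, so nothing is installed.
DRY_RUN=1 run_workflow "$(mktemp -d)" "myworkflow"
```

To run the full workflow on your Desktop, you could call run_workflow "$HOME/Desktop" "myworkflow" without setting DRY_RUN.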

Conclusion

I hope this article has helped you learn how to automate your own workflow, or at least taught you something new.

Happy Coding!