We need to imports several libraries such as requests, csv, cx_Oracle, pandas, datetime, and sqlalchemy.
import requests import csv import cx_Oracle import pandas as pd from datetime import datetime, time from datetime import timedelta, date from sqlalchemy import types, create_engine
requests
is a Python library used to send HTTP requests to a server and receive a response.csv
is a built-in Python module for reading and writing CSV files.cx_Oracle
is a Python extension module that allows access to the Oracle Database using the Oracle Call Interface (OCI) API.pandas
is a popular data analysis library for Python that provides data structures and functions for working with structured data.datetime
andtime
are built-in Python modules for working with dates and times.timedelta
is a class in the datetime module used to represent a duration of time.date
is a class in the datetime module used to represent a date.sqlalchemy
is a Python SQL toolkit and Object-Relational Mapping (ORM) library that provides a set of high-level API for connecting to and interacting with relational databases.
These libraries are commonly used in data analysis and data science projects in Python. requests
is used to fetch data from a server, csv
is used to read and write CSV files, cx_Oracle
is used to interact with Oracle databases, pandas
is used for data manipulation and analysis, and datetime
and related modules are used for working with dates and times. Finally, sqlalchemy
is used for connecting to and interacting with relational databases.
cx_Oracle.init_oracle_client(lib_dir=r"C:\oracle\instantclient_19_10") con = cx_Oracle.connect('user_name', 'Password', 'localhost/orcl') conn = create_engine('oracle+cx_oracle://username:password@localhost/orcl')
These lines of code initialize the Oracle Instant Client and create connections to an Oracle database using the cx_Oracle
and sqlalchemy
libraries.
The cx_Oracle.init_oracle_client
function is used to initialize the Oracle Instant Client, which is a set of libraries that provides access to Oracle databases without the need for a full Oracle client installation. This function takes a parameter lib_dir
that specifies the directory path where the Oracle Instant Client libraries are installed.
The cx_Oracle.connect
function is used to create a connection to an Oracle database using the credentials user_name
, password
, and the service name orcl
. The localhost
parameter specifies the hostname of the database server.
The create_engine
function from the sqlalchemy
library is used to create an Engine object that represents a connection to the same Oracle database using the same credentials. The oracle+cx_oracle
parameter specifies the database dialect to use, and the user_name:password@localhost/orcl
parameter specifies the connection string. The resulting conn
object can be used to execute SQL statements against the database using pandas
or other tools that work with sqlalchemy
engines.
def daterange(start_date, end_date): for n in range(int ((end_date - start_date).days)): yield start_date + timedelta(n) timestamp = datetime.now().strftime("%H:%M:%S") now = datetime.now() start_date = date(2023, 3,3) end_date = date(2023, 3, 11)
The daterange
function is a Python generator function that generates a range of dates between a start date and an end date. It takes two arguments, start_date
and end_date
, which are both date
objects representing the start and end dates of the range.
The function uses a for
loop and the yield
statement to generate each date in the range. The range
function is used to generate a sequence of integers representing the number of days between the start date and end date, and the timedelta
function is used to create a time interval of one day that is added to the start date for each iteration of the loop.
The timestamp
variable is a string representing the current time in hours, minutes, and seconds using the format "%H:%M:%S"
. The now
variable is a datetime
object representing the current date and time.
The start_date
and end_date
variables are date
objects representing the start and end dates of the range of dates to generate. In this case, they represent March 3, 2023 and March 11, 2023, respectively.
for single_date in daterange(start_date, end_date): ttime = single_date.strftime("%d%m%Y") filename = 'sec_bhavdata_full_' filename = filename + ttime+'.csv' url = "https://archives.nseindia.com/products/content/"+filename url_oc = "https://www.nseindia.com" headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, ' 'like Gecko) ' 'Chrome/80.0.3987.149 Safari/537.36', 'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'} session = requests.Session() request = session.get(url_oc, headers=headers, timeout=30) cookies = dict(request.cookies) response = session.get(url, headers=headers, timeout=30, cookies=cookies) decoded_content = response.content.decode('utf-8') print(url) if "will be right back" in decoded_content: print('data not avilable ' + ttime) else: cr = csv.reader(decoded_content.splitlines(), delimiter=',') df = pd.read_csv(url,skipinitialspace = True) #df.rename(columns=lambda x: x.strip()) df = df[(df['SERIES'] == 'EQ')] df['DATE1'] = pd.to_datetime(df['DATE1']) df["DELIV_QTY"] = df["DELIV_QTY"].astype(float) df["DELIV_PER"] = df["DELIV_PER"].astype(float) #df.info() print(df) cur = con.cursor() df.to_sql('bhavdata', conn, if_exists='append', index=False) con.commit() cur.close() cur = con.cursor() cur.execute("delete bhavdata where rowid not in ( select min(rowid) from bhavdata group by symbol,date1)") con.commit() cur.close() con.close() print('done')
- Defines a function
daterange
that generates a sequence of dates between two given dates. - Defines start and end dates.
- Loops through the sequence of dates generated by the
daterange
function. - For each date, generates a URL for downloading a CSV file from the National Stock Exchange of India (NSE) website.
- Uses the
requests
library to download the CSV file and read it into a pandas dataframe. - Cleans the dataframe by keeping only rows where the ‘SERIES’ column equals ‘EQ’, converting the ‘DELIV_QTY’ and ‘DELIV_PER’ columns to floats, and converting the ‘DATE1’ column to a datetime object.
- Uses the
cx_Oracle
library to connect to an Oracle database and write the cleaned data to a table called ‘bhavdata’. - Deletes duplicate rows in the ‘bhavdata’ table, keeping only the row with the minimum ‘rowid’ for each combination of ‘symbol’ and ‘date1’.
- Closes the database connection and prints ‘done’.
Overall, this script downloads daily stock market data from the NSE website and writes it to an Oracle database for further analysis.
0 Comments