We need to imports several libraries such as requests, csv, cx_Oracle, pandas, datetime, and sqlalchemy.

import requests
import csv
import cx_Oracle
import pandas as pd
from datetime import datetime, time
from datetime import timedelta, date
from sqlalchemy import types, create_engine
  • requests is a Python library used to send HTTP requests to a server and receive a response.
  • csv is a built-in Python module for reading and writing CSV files.
  • cx_Oracle is a Python extension module that allows access to the Oracle Database using the Oracle Call Interface (OCI) API.
  • pandas is a popular data analysis library for Python that provides data structures and functions for working with structured data.
  • datetime and time are built-in Python modules for working with dates and times.
  • timedelta is a class in the datetime module used to represent a duration of time.
  • date is a class in the datetime module used to represent a date.
  • sqlalchemy is a Python SQL toolkit and Object-Relational Mapping (ORM) library that provides a set of high-level API for connecting to and interacting with relational databases.

These libraries are commonly used in data analysis and data science projects in Python. requests is used to fetch data from a server, csv is used to read and write CSV files, cx_Oracle is used to interact with Oracle databases, pandas is used for data manipulation and analysis, and datetime and related modules are used for working with dates and times. Finally, sqlalchemy is used for connecting to and interacting with relational databases.

cx_Oracle.init_oracle_client(lib_dir=r"C:\oracle\instantclient_19_10")
con = cx_Oracle.connect('user_name', 'Password', 'localhost/orcl')
conn = create_engine('oracle+cx_oracle://username:password@localhost/orcl')

These lines of code initialize the Oracle Instant Client and create connections to an Oracle database using the cx_Oracle and sqlalchemy libraries.

The cx_Oracle.init_oracle_client function is used to initialize the Oracle Instant Client, which is a set of libraries that provides access to Oracle databases without the need for a full Oracle client installation. This function takes a parameter lib_dir that specifies the directory path where the Oracle Instant Client libraries are installed.

The cx_Oracle.connect function is used to create a connection to an Oracle database using the credentials user_name, password, and the service name orcl. The localhost parameter specifies the hostname of the database server.

The create_engine function from the sqlalchemy library is used to create an Engine object that represents a connection to the same Oracle database using the same credentials. The oracle+cx_oracle parameter specifies the database dialect to use, and the user_name:password@localhost/orcl parameter specifies the connection string. The resulting conn object can be used to execute SQL statements against the database using pandas or other tools that work with sqlalchemy engines.

def daterange(start_date, end_date):
    for n in range(int ((end_date - start_date).days)):
        yield start_date + timedelta(n)
timestamp = datetime.now().strftime("%H:%M:%S")
now = datetime.now()
start_date = date(2023, 3,3)
end_date = date(2023, 3, 11)

The daterange function is a Python generator function that generates a range of dates between a start date and an end date. It takes two arguments, start_date and end_date, which are both date objects representing the start and end dates of the range.

The function uses a for loop and the yield statement to generate each date in the range. The range function is used to generate a sequence of integers representing the number of days between the start date and end date, and the timedelta function is used to create a time interval of one day that is added to the start date for each iteration of the loop.

The timestamp variable is a string representing the current time in hours, minutes, and seconds using the format "%H:%M:%S". The now variable is a datetime object representing the current date and time.

The start_date and end_date variables are date objects representing the start and end dates of the range of dates to generate. In this case, they represent March 3, 2023 and March 11, 2023, respectively.

for single_date in daterange(start_date, end_date):
    ttime = single_date.strftime("%d%m%Y")
    filename = 'sec_bhavdata_full_'
    filename = filename + ttime+'.csv'
    url = "https://archives.nseindia.com/products/content/"+filename
    url_oc = "https://www.nseindia.com"
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                             'like Gecko) '
                             'Chrome/80.0.3987.149 Safari/537.36',
               'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
    session = requests.Session()
    request = session.get(url_oc, headers=headers, timeout=30)
    cookies = dict(request.cookies)
    response = session.get(url, headers=headers, timeout=30, cookies=cookies)
    decoded_content = response.content.decode('utf-8')
    print(url)
    if "will be right back" in decoded_content:
     print('data not avilable ' + ttime)
    else:
        cr = csv.reader(decoded_content.splitlines(), delimiter=',')
        df = pd.read_csv(url,skipinitialspace = True)
        #df.rename(columns=lambda x: x.strip())
        df = df[(df['SERIES'] == 'EQ')]
        df['DATE1'] = pd.to_datetime(df['DATE1'])
        df["DELIV_QTY"] = df["DELIV_QTY"].astype(float)
        df["DELIV_PER"] = df["DELIV_PER"].astype(float)
        #df.info()
        print(df)
        cur = con.cursor()
        df.to_sql('bhavdata', conn, if_exists='append', index=False)
        con.commit()
        cur.close()
cur = con.cursor()
cur.execute("delete bhavdata  where  rowid not in ( select min(rowid) from bhavdata  group  by symbol,date1)")
con.commit()
cur.close()
con.close()
print('done')

  • Defines a function daterange that generates a sequence of dates between two given dates.
  • Defines start and end dates.
  • Loops through the sequence of dates generated by the daterange function.
  • For each date, generates a URL for downloading a CSV file from the National Stock Exchange of India (NSE) website.
  • Uses the requests library to download the CSV file and read it into a pandas dataframe.
  • Cleans the dataframe by keeping only rows where the ‘SERIES’ column equals ‘EQ’, converting the ‘DELIV_QTY’ and ‘DELIV_PER’ columns to floats, and converting the ‘DATE1’ column to a datetime object.
  • Uses the cx_Oracle library to connect to an Oracle database and write the cleaned data to a table called ‘bhavdata’.
  • Deletes duplicate rows in the ‘bhavdata’ table, keeping only the row with the minimum ‘rowid’ for each combination of ‘symbol’ and ‘date1’.
  • Closes the database connection and prints ‘done’.

Overall, this script downloads daily stock market data from the NSE website and writes it to an Oracle database for further analysis.


0 Comments

Leave a Reply

Avatar placeholder

Your email address will not be published. Required fields are marked *