Python IO

Input and output in Python

Python & txt (general text file)

Basic Python functions, i.e. reading and writing files.

Python & Saving an object

Standard library: JSON and Pickle

Python & Numpy

Input and output of NumPy arrays.

Python & CSV

Using the pandas module.

Python & MatLab

Using SciPy's io module.

This is more of a summary than a lecture. You don't need to understand everything in it; learn each part when you need it.

Python & txt (general text file)

Create a new file and write something into it.

Basically, we will discuss three functions: open, write and read.

Ref:

The official documentation for open

The official documentation for file objects

Let's start with a simple example.

Now let's see what's inside our file.

In [1]:
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!
1,0.235,0.645,0.457
1  0.2350000  0.6450000  0.4570000

The argument 'r' means read mode, so you cannot change anything in the file.

Now open the file and write something into it.

In [2]:
with open('./text/text.txt','w') as f:
    f.write('this is a test file.\n')

'w' means write mode, and '\n' is the newline character. Let's see how the file has changed.

In [3]:
with open('./text/text.txt','r') as f:
    print(f.read())
this is a test file.

As you may notice, the former content of the file (including "Hello World!") has been erased and replaced by the new text: opening a file in 'w' mode truncates it. So what if you just want to add something to the file?

You can use the append mode, i.e. change the argument from 'w' to 'a'.

In [4]:
with open('./text/text.txt','a') as f:
    f.write("Hello World!\n")
In [5]:
with open('./text/text.txt','r') as f:
    print(f.read())
this is a test file.
Hello World!

A detailed explanation of the modes can be found here (documentation for Python 2; Python 3 adds an exclusive-creation mode 'x', see the official documentation for more info).
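As a minimal sketch of the exclusive-creation mode 'x': it creates a new file and raises FileExistsError if the file already exists, which protects you from overwriting data by accident. The temporary directory is an assumption for illustration, used only so the example does not touch ./text/.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new.txt')

    # 'x' succeeds because the file does not exist yet
    with open(path, 'x') as f:
        f.write('first version\n')

    # A second 'x' open fails instead of silently truncating the file
    try:
        with open(path, 'x') as f:
            f.write('second version\n')
        overwritten = True
    except FileExistsError:
        overwritten = False

    with open(path, 'r') as f:
        content = f.read()

print(overwritten)      # False
print(content, end='')  # first version
```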

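You may also notice that each time we open a file, we start with a "with" statement. Written this way, Python closes the file automatically when the block ends, even if an error occurs inside it. Otherwise you need to close the file manually.

Never forget to close the file: an open file holds an operating-system file handle, and buffered writes may not reach the disk until the file is closed. Without "with", you can do it this way: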

In [6]:
f = open('./text/text.txt','a+')
f.write("Hello World!\n")
f.close()

f = open('./text/text.txt','r')
data = f.read()
print(data)
f.close()
this is a test file.
Hello World!
Hello World!
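The "with" statement behaves roughly like the try/finally pattern below: close() runs even if an exception is raised while writing. A minimal sketch; the temporary path is an assumption for illustration and is not cleaned up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Manual version: finally guarantees close() runs even if write() fails
f = open(path, 'w')
try:
    f.write('Hello World!\n')
finally:
    f.close()
manual_closed = f.closed

# with version: the file is closed automatically when the block ends
with open(path, 'a') as g:
    g.write('Hello again!\n')
with_closed = g.closed

with open(path) as h:
    lines = h.readlines()

print(manual_closed, with_closed)  # True True
print(lines)                       # ['Hello World!\n', 'Hello again!\n']
```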

You can also use the print function to write text.

In [7]:
with open('./text/text.txt', 'w') as f:
    print('Hello World!', file=f)
In [8]:
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!

The string method format is useful when you need to customize your output:

Official reference here

In [9]:
with open('./text/text.txt','a') as f:
    f.write('1,0.235,0.645,0.457\n')
    f.write('{:d}  {:3.7f}  {:3.7f}  {:3.7f}\n'.format(1,0.235,0.645,0.457))
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!
1,0.235,0.645,0.457
1  0.2350000  0.6450000  0.4570000

You can see the difference between the two outputs.

A self-explanatory example follows.

References can be found here (Chinese version) and here (English version).

In [10]:
a = "{:.2f} {:+.2f}  {:.0f}   {:.2%} \n".format(3.14,3.14,3.14,3.14)
b = "{:+d} {:+d} {:-d} {:-d} \n".format(+1,-1,+1,-1)
c = "{4} {3} {2} {1} {0} \n".format(1,2,3,4,5)
d = "{0:0>6d} {0:0<6d} {0:3>6d} {0} \n".format(5)
e = "{:5d} {:5d} {:5d} {:5d} {:5d} \n".format(1,2,3,4,5)
f = "{:5d} {:<5d} {:5d} {:^5d} {:5d} \n".format(1,2,3,4,5)
print(a,b,c,d,e,f)
3.14 +3.14  3   314.00% 
 +1 -1 1 -1 
 5 4 3 2 1 
 000005 500000 333335 5 
     1     2     3     4     5 
     1 2         3   4       5 
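Since Python 3.6 the same format-spec mini-language also works inside f-strings, where the value is written directly inside the braces. A short sketch reproducing some of the specs from the cell above:

```python
x = 3.14
n = 5

# Same specs as the .format() calls above, written as f-strings
a = f"{x:.2f} {x:+.2f}  {x:.0f}   {x:.2%}"
d = f"{n:0>6d} {n:0<6d} {n:3>6d} {n}"
e = f"{1:<5d}|{4:^5d}|{5:5d}"   # left, centre and right alignment

print(a)  # 3.14 +3.14  3   314.00%
print(d)  # 000005 500000 333335 5
print(e)  # 1    |  4  |    5
```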

Now we look at the "read" function.

There are three read functions, namely read, readline and readlines; you can also iterate over the file object directly. See their output below:

In [11]:
with open('./text/text.txt','r') as f:
    print(f.read())
    
with open('./text/text.txt','r') as f:
    print(f.readlines())
    
with open('./text/text.txt','r') as f:
    print(f.readline())

with open('./text/text.txt','r') as f:
    for line in f:
        print(line)
Hello World!
1,0.235,0.645,0.457
1  0.2350000  0.6450000  0.4570000

['Hello World!\n', '1,0.235,0.645,0.457\n', '1  0.2350000  0.6450000  0.4570000\n']
Hello World!

Hello World!

1,0.235,0.645,0.457

1  0.2350000  0.6450000  0.4570000

read() returns the whole content as a single string.

readlines() returns a list of lines (each keeps its trailing '\n').

readline() returns only the next line of the text.

The first two methods load the whole file into memory at once, while readline() reads one line at a time. If your file is very big (e.g. 4 GB, or larger than your memory), read() and readlines() are not recommended.

For a big file, you may read it line by line like this:

In [12]:
with open('./text/text.txt','r') as f:
    line = f.readline()
    while line:
        print(line)
        line = f.readline()
Hello World!

1,0.235,0.645,0.457

1  0.2350000  0.6450000  0.4570000
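Iterating over the file object directly is the idiomatic way to read a big file: Python reads it lazily, one line at a time, so memory use does not grow with the file size. A minimal sketch that sums one column of a comma-separated file; the file and its contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')
with open(path, 'w') as f:
    for i in range(1, 1001):
        f.write(f'{i},{i * 0.5}\n')

total = 0.0
count = 0
with open(path, 'r') as f:
    for line in f:                         # only one line in memory at a time
        fields = line.rstrip('\n').split(',')
        total += float(fields[1])
        count += 1

print(count, total)  # 1000 250250.0
```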

So far we have only introduced the very basics of reading and writing in Python. Note that everything above is based on strings. If you have a list or a dict, saving it to a text file is not a good choice, because you would have to reconstruct the list or dict from the text yourself.

Is there any other way of saving an object?

Python & Saving an object

First we're going to introduce Pickle.

Official documentation here

Usage of Pickle is simple:

In [13]:
import pickle

data = ['HKUST','PHYS',6810,'Rm 4402','Lift17-18']
with open('./pickle/data_pickle','wb') as f:
    pickle.dump(data,f)
In [14]:
with open('./pickle/data_pickle','rb') as f:
    data = pickle.load(f)
    for i in data:
        print(i)
HKUST
PHYS
6810
Rm 4402
Lift17-18

You can also store multiple objects in one file like this:

In [15]:
f = open('./pickle/somedata', 'wb')
pickle.dump([1, 2, 3, 4], f)
pickle.dump('hello', f)
pickle.dump({'Apple', 'Pear', 'Banana'}, f)
f.close()

f = open('./pickle/somedata', 'rb')
print(pickle.load(f))
print(pickle.load(f))
print(pickle.load(f))
f.close()
[1, 2, 3, 4]
hello
{'Pear', 'Banana', 'Apple'}

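As you can see, the objects are loaded back in the order in which they were dumped. (The elements inside the set print in arbitrary order, because a set is unordered.)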
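When you do not know how many objects were dumped into a file, a common pattern is to keep calling pickle.load until it raises EOFError at the end of the file. A minimal sketch; the temporary file stands in for ./pickle/somedata.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'somedata')

# Dump several objects one after another into the same file
with open(path, 'wb') as f:
    for obj in ([1, 2, 3, 4], 'hello', {'Apple', 'Pear', 'Banana'}):
        pickle.dump(obj, f)

# Load them back, in the same order, until the file is exhausted
objects = []
with open(path, 'rb') as f:
    while True:
        try:
            objects.append(pickle.load(f))
        except EOFError:
            break

print(len(objects))            # 3
print(objects[0], objects[1])  # [1, 2, 3, 4] hello
```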

Pickle can also save a function.

In [16]:
import math
with open('./pickle/function','wb') as f:
    pickle.dump(math.cos,f)
In [17]:
with open('./pickle/function','rb') as f:
    Cos = pickle.load(f)
    print(Cos(0))
1.0
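Pickle stores a function by reference, i.e. by its module and name, not by its code, so the same function must be importable when the file is loaded. Unpickling math.cos therefore gives back the very same object, while a lambda, which has no importable name, cannot be pickled with the standard pickle module. A short sketch:

```python
import math
import pickle

# The pickled bytes only name the function (module 'math', name 'cos')
data = pickle.dumps(math.cos)
same_object = pickle.loads(data) is math.cos

# An anonymous function cannot be looked up by name, so pickling fails
try:
    pickle.dumps(lambda x: x ** 2)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False

print(same_object, lambda_ok)  # True False
```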

As you may notice, we add "b" to the mode argument when we open the file. This indicates that pickle writes a binary file, which is not human readable.

If a human-readable file is needed, you may consider JSON.

JSON is short for "JavaScript Object Notation", which is a common data-interchange format.

The json module only supports None, bool, int, float and str, plus lists, tuples and dicts containing them.

Let's see some examples.

In [18]:
import json

data = {
    'name' : 'ACME',
    'shares' : 100,
    'price' : 542.23
}

with open('./pickle/data.json','w') as f:
    json.dump(data, f)
In [19]:
with open('./pickle/data.json', 'r') as f:
    data = json.load(f)
    for key,value in data.items():
        print(key,":",value)
name : ACME
shares : 100
price : 542.23

Lists and tuples can be saved the same way, although a tuple is read back as a list.
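JSON has fewer types than Python, so a round trip is not always exact: tuples come back as lists, non-string dict keys become strings, and unsupported types such as set raise TypeError. A short sketch using json.dumps/json.loads, the string counterparts of dump/load:

```python
import json

original = {'point': (1, 2), 1: 'one'}
text = json.dumps(original)
restored = json.loads(text)

print(text)      # {"point": [1, 2], "1": "one"}
print(restored)  # {'point': [1, 2], '1': 'one'}

# Sets have no JSON equivalent
try:
    json.dumps({'fruits': {'Apple', 'Pear'}})
    set_ok = True
except TypeError:
    set_ok = False
print(set_ok)    # False
```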

More sophisticated techniques can be found on the official website.

Differences between JSON and Pickle

JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format.

JSON is human-readable, while pickle is not.

JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific.

JSON, by default, can only represent a subset of the Python built-in types, and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever usage of Python’s introspection facilities; complex cases can be tackled by implementing specific object APIs).
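For example, json cannot serialize an instance of your own class out of the box, but json.dumps and json.dump accept a default function that converts unknown objects into something JSON can store. A minimal sketch; the Point class is hypothetical, used only for illustration.

```python
import json

class Point:  # hypothetical custom class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1.0, 2.5)

# Plain json.dumps refuses objects it does not know
try:
    json.dumps(p)
    plain_ok = True
except TypeError:
    plain_ok = False

# default is called for each unsupported object; vars(obj) returns its __dict__
text = json.dumps(p, default=vars)
back = Point(**json.loads(text))

print(plain_ok)        # False
print(text)            # {"x": 1.0, "y": 2.5}
print(back.x, back.y)  # 1.0 2.5
```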

Python & Numpy

NumPy, part of the SciPy ecosystem, is widely used in scientific computing, and together with Matplotlib it will be the most important module we learn.

Now let's see what we can do with NumPy.

In [20]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
In [21]:
data=np.load('./numpy/data.npy')
psi6 = np.load('./numpy/psi6.npy')
plt.figure(figsize=(10,10))
ax = plt.gca()
ax.set_aspect('equal')
ax.scatter(data[:,0],data[:,1],s=5,c=psi6,cmap=plt.cm.rainbow_r)
# plt.savefig('psi6.png',dpi=200)
plt.show()

The basic object in NumPy is numpy.ndarray, an N-dimensional array. NumPy already implements common matrix operations on ndarray (e.g. dot, transpose), which makes it very convenient to use.

You can find their website here. MatLab users may find this tutorial useful.

Let's see how to save arrays in NumPy.

NumPy provides many useful I/O functions, both for its own binary formats and for plain text files.

(First import numpy as np)

np.save np.load

np.savez np.savez_compressed

np.savetxt np.loadtxt

More info can be found here.

Save a single array into a .npy file.

In [22]:
# create a new array
m = np.random.rand(100,2)
np.save('./numpy/m.npy',m)
print(m[0:5,:])
[[0.63747681 0.27082748]
 [0.73047444 0.7092957 ]
 [0.99831309 0.4086804 ]
 [0.28199021 0.31725638]
 [0.10055243 0.81826568]]
In [23]:
l = np.load('./numpy/m.npy')
print(l[0:5,:])
[[0.63747681 0.27082748]
 [0.73047444 0.7092957 ]
 [0.99831309 0.4086804 ]
 [0.28199021 0.31725638]
 [0.10055243 0.81826568]]

Note that np.load on a .npy file reads the whole array and returns an ndarray; the file is not left open, so there is nothing to close.
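A quick way to confirm that .npy is lossless is to compare the loaded array with the original: values, shape and dtype all survive, including a non-default dtype. A sketch using a temporary path (an assumption for illustration) instead of ./numpy/:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'a.npy')
a = np.arange(12, dtype=np.int32).reshape(3, 4)

np.save(path, a)    # writes a small header (shape, dtype) plus the raw data
b = np.load(path)   # reads it back as a new ndarray

print(b.shape, b.dtype)      # (3, 4) int32
print(np.array_equal(a, b))  # True
```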

Save multiple arrays into a .npz file.

In [24]:
np.savez('./numpy/ml.npz',x=m,y=l)
In [25]:
n = np.load('./numpy/ml.npz')
print(n)
print((n['x']-n['y'])[0:5,:])
<numpy.lib.npyio.NpzFile object at 0x0000014DA3B039E8>
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]

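Unlike a .npy file, a .npz file is loaded lazily: np.load returns an NpzFile object that keeps the file open until you call n.close() (or use it in a with block), so remember to close it.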

The function np.savez_compressed saves the arrays in a compressed .npz file. It takes less disk space, and the file is loaded with the same np.load function.
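Because np.load on an .npz file returns an NpzFile that keeps the file open, a with block is a tidy way to read it. The sketch below also compares an uncompressed and a compressed archive of a highly compressible array (all zeros); for random data the saving would be much smaller. Paths are temporary, for illustration only.

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()
plain = os.path.join(tmp, 'plain.npz')
packed = os.path.join(tmp, 'packed.npz')

zeros = np.zeros((500, 500))
np.savez(plain, z=zeros)
np.savez_compressed(packed, z=zeros)

# NpzFile supports the context-manager protocol; the file closes on exit
with np.load(packed) as data:
    names = data.files   # names of the stored arrays
    z = data['z']        # read the array while the file is still open

print(names)                                             # ['z']
print(np.array_equal(z, zeros))                          # True
print(os.path.getsize(packed) < os.path.getsize(plain))  # True
```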

The loadtxt function is very handy when dealing with text files.

In [26]:
data = np.loadtxt('./numpy/loadtxt_example.txt',skiprows=2,usecols=(0,4))
plt.figure(figsize=(7,7))
ax = plt.gca()
ax.plot(data[:,0],data[:,1])
ax.set_xlabel("Timestep")
ax.set_ylabel("Total energy")
plt.show()

The documentation for the savetxt function can be found here.
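np.savetxt and np.loadtxt work as a pair: savetxt writes a 2-D array as text using a format string, and the optional header is written as a '# ' comment line, which loadtxt skips by default. A minimal round-trip sketch; the file name and column names are made up.

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'table.txt')
t = np.linspace(0.0, 1.0, 5)
table = np.column_stack([t, t ** 2])

# fmt controls how each number is written; header becomes a '# ...' line
np.savetxt(path, table, fmt='%.6f', header='t t_squared')

with open(path) as f:
    first_line = f.readline().rstrip('\n')

# Comment lines are skipped automatically; usecols picks columns
t_back = np.loadtxt(path, usecols=(0,))
both = np.loadtxt(path)

print(first_line)                # # t t_squared
print(t_back.shape, both.shape)  # (5,) (5, 2)
print(np.allclose(both, table))  # True
```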

Python & CSV

Although Python has a csv module in its standard library, it is relatively low level and inconvenient to use. Here we introduce pandas, a library from the SciPy ecosystem.

Official website here

pandas is known as the "Python Data Analysis Library".

Reference here

Now let's look at the basics of the module.

This is the standard way of importing pandas.

In [27]:
import pandas as pd
In [28]:
df = pd.read_csv('./csv/uk_rain_2014.csv', header=0)

df.head(n) gives you the first n rows of the DataFrame.

In [29]:
df.head(5)
Out[29]:
  | Water Year | Rain (mm) Oct-Sep | Outflow (m3/s) Oct-Sep | Rain (mm) Dec-Feb | Outflow (m3/s) Dec-Feb | Rain (mm) Jun-Aug | Outflow (m3/s) Jun-Aug
0 | 1980/81 | 1182 | 5408 | 292 | 7248 | 174 | 2212
1 | 1981/82 | 1098 | 5112 | 257 | 7316 | 242 | 1936
2 | 1982/83 | 1156 | 5701 | 330 | 8567 | 124 | 1802
3 | 1983/84 | 993 | 4265 | 391 | 8905 | 141 | 1078
4 | 1984/85 | 1182 | 5364 | 217 | 5813 | 343 | 4313

df.tail(n) gives you the last n rows of the DataFrame.

In [30]:
df.tail(5)
Out[30]:
   | Water Year | Rain (mm) Oct-Sep | Outflow (m3/s) Oct-Sep | Rain (mm) Dec-Feb | Outflow (m3/s) Dec-Feb | Rain (mm) Jun-Aug | Outflow (m3/s) Jun-Aug
28 | 2008/09 | 1139 | 4941 | 268 | 6690 | 323 | 3189
29 | 2009/10 | 1103 | 4738 | 255 | 6435 | 244 | 1958
30 | 2010/11 | 1053 | 4521 | 265 | 6593 | 267 | 2885
31 | 2011/12 | 1285 | 5500 | 339 | 7630 | 379 | 5261
32 | 2012/13 | 1090 | 5329 | 350 | 9615 | 187 | 1797

You can rename the columns by assigning a new list of labels to df.columns:

In [31]:
df.columns = ['Year','rain_octsep', 'outflow_octsep',
              'rain_decfeb', 'outflow_decfeb', 'rain_junaug', 'outflow_junaug']

df.head(5)
Out[31]:
Year rain_octsep outflow_octsep rain_decfeb outflow_decfeb rain_junaug outflow_junaug
0 1980/81 1182 5408 292 7248 174 2212
1 1981/82 1098 5112 257 7316 242 1936
2 1982/83 1156 5701 330 8567 124 1802
3 1983/84 993 4265 391 8905 141 1078
4 1984/85 1182 5364 217 5813 343 4313

You can set the display format of floats and get summary statistics with describe():

In [32]:
pd.options.display.float_format = '{:,.3f}'.format
df.describe()
Out[32]:
rain_octsep outflow_octsep rain_decfeb outflow_decfeb rain_junaug outflow_junaug
count 33.000 33.000 33.000 33.000 33.000 33.000
mean 1,129.000 5,019.182 325.364 7,926.545 237.485 2,439.758
std 101.900 658.588 69.995 1,692.800 66.168 1,025.914
min 856.000 3,479.000 206.000 4,578.000 103.000 1,078.000
25% 1,053.000 4,506.000 268.000 6,690.000 193.000 1,797.000
50% 1,139.000 5,112.000 309.000 7,630.000 229.000 2,142.000
75% 1,182.000 5,497.000 360.000 8,905.000 280.000 2,959.000
max 1,387.000 6,391.000 484.000 11,486.000 379.000 5,261.000

You can access a column in two ways: by indexing with its name, or as an attribute:

In [33]:
df['rain_octsep'][0:5]
Out[33]:
0    1182
1    1098
2    1156
3     993
4    1182
Name: rain_octsep, dtype: int64
In [34]:
df.rain_octsep[28:]
Out[34]:
28    1139
29    1103
30    1053
31    1285
32    1090
Name: rain_octsep, dtype: int64

Boolean filtering is also easy. Combine conditions with & (and) or | (or), wrapping each condition in parentheses:

In [35]:
df[df.rain_octsep < 1000]
Out[35]:
Year rain_octsep outflow_octsep rain_decfeb outflow_decfeb rain_junaug outflow_junaug
3 1983/84 993 4265 391 8905 141 1078
8 1988/89 976 4330 309 6465 200 1440
15 1995/96 856 3479 245 5515 172 1439
In [36]:
df[(df.rain_octsep < 1000) & (df.outflow_octsep < 4000)]
Out[36]:
Year rain_octsep outflow_octsep rain_decfeb outflow_decfeb rain_junaug outflow_junaug
15 1995/96 856 3479 245 5515 172 1439

To filter rows by a string pattern, use the .str accessor:

In [37]:
df[df.Year.str.startswith('199')]
Out[37]:
Year rain_octsep outflow_octsep rain_decfeb outflow_decfeb rain_junaug outflow_junaug
10 1990/91 1022 4418 305 7120 216 1923
11 1991/92 1151 4506 246 5493 280 2118
12 1992/93 1130 5246 308 8751 219 2551
13 1993/94 1162 5583 422 10109 193 1638
14 1994/95 1110 5370 484 11486 103 1231
15 1995/96 856 3479 245 5515 172 1439
16 1996/97 1047 4019 258 5770 256 2102
17 1997/98 1169 4953 341 7747 285 3206
18 1998/99 1268 5824 360 8771 225 2240
19 1999/00 1204 5665 417 10021 197 2166

Use df.loc to select specific rows and columns by label:

In [38]:
df.loc[11,['Year','rain_octsep']]
Out[38]:
Year           1991/92
rain_octsep       1151
Name: 11, dtype: object
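df.loc selects by label, while df.iloc selects by integer position. With the default index 0, 1, 2, ... the two look alike, but they differ once the index labels are something else. A small sketch on made-up data:

```python
import pandas as pd

# Made-up data with a non-default index (labels 10, 20, 30)
demo = pd.DataFrame({'Year': ['1990/91', '1991/92', '1992/93'],
                     'rain': [1022, 1151, 1130]},
                    index=[10, 20, 30])

by_label = demo.loc[20, 'rain']   # the row whose label is 20
by_position = demo.iloc[0, 1]     # first row, second column

# Label slices include the end point, position slices do not
print(len(demo.loc[10:20]), len(demo.iloc[0:1]))  # 2 1
print(by_label, by_position)                      # 1151 1022
```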

Now let's compare the rainfall in Hong Kong and the UK.

In [39]:
hkdf = pd.read_csv('./csv/hk_weather_data.csv',header=0)
In [40]:
hkdf.head(5)
Out[40]:
  | Year | Avg Pressure(100P) | Max Temp | Avg Temp(H) | Avg Temp | Avg Temp(L) | Min Temp | Rainfall(mm) | sunshine(hr)
0 | 1961 | 1,012.600 | 34.200 | 25.600 | 22.900 | 20.800 | 7.300 | 2,232.400 | 1,981.600
1 | 1962 | 1,013.200 | 35.500 | 25.800 | 22.700 | 20.400 | 6.000 | 1,741.000 | 2,395.400
2 | 1963 | 1,013.400 | 35.600 | 26.500 | 23.300 | 20.900 | 7.100 | 901.100 | 2,469.700
3 | 1964 | 1,012.700 | 33.900 | 25.700 | 22.900 | 20.500 | 7.000 | 2,432.100 | 2,029.600
4 | 1965 | 1,012.800 | 33.400 | 25.900 | 23.100 | 20.900 | 7.300 | 2,352.600 | 1,990.700

Let's combine the two datasets and plot the rainfall.

First, pick out the columns we need and keep only the years 1980 to 2012.

In [41]:
hk_rainfall = hkdf.loc[:,['Year','Rainfall(mm)']]
hk_rainfall = hk_rainfall[(hk_rainfall.Year<=2012)&(hk_rainfall.Year>=1980)]
hk_rainfall.columns = ['Year','rainfall']
uk_rainfall = df.loc[:,['Year','rain_octsep']]

The 'Year' column of uk_rainfall holds strings such as '1980/81', so we convert it to an integer by keeping the first four characters.

In [42]:
def str2int(year):
    year = int(year[:4])
    return year
In [43]:
print(type(uk_rainfall.Year[0]))
uk_rainfall.Year = uk_rainfall.Year.apply(str2int)
<class 'str'>
In [44]:
type(uk_rainfall.Year[0])
Out[44]:
numpy.int64
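As an alternative to apply with a Python function, the .str accessor can do the same conversion in a vectorized way: slice the first four characters of every string, then cast to int. A minimal sketch on a made-up Series:

```python
import pandas as pd

years = pd.Series(['1980/81', '1981/82', '1999/00'])

# .str[:4] slices every string; astype(int) converts the result
as_int = years.str[:4].astype(int)

print(as_int.tolist())  # [1980, 1981, 1999]
```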

Now we merge the two DataFrames on the 'Year' column.

In [45]:
hk_uk_data = uk_rainfall.merge(hk_rainfall,on='Year')
In [46]:
hk_uk_data.head(5)
Out[46]:
Year rain_octsep rainfall
0 1980 1182 1,710.600
1 1981 1098 1,659.500
2 1982 1156 3,247.500
3 1983 993 2,893.800
4 1984 1182 2,017.000
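merge performs an inner join by default (how='inner'), so only years present in both tables survive; how='outer' keeps every year and fills the gaps with NaN. A small sketch using a few of the numbers above:

```python
import pandas as pd

uk = pd.DataFrame({'Year': [1980, 1981, 1982], 'uk_rain': [1182, 1098, 1156]})
hk = pd.DataFrame({'Year': [1981, 1982, 1983], 'hk_rain': [1659.5, 3247.5, 2893.8]})

inner = uk.merge(hk, on='Year')               # how='inner' is the default
outer = uk.merge(hk, on='Year', how='outer')  # keep the union of years

print(inner.Year.tolist())              # [1981, 1982]
print(outer.Year.tolist())              # [1980, 1981, 1982, 1983]
print(int(outer.hk_rain.isna().sum()))  # 1
```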

Then the plot is easy to get.

In [47]:
hk_uk_data.plot(x='Year',y=['rain_octsep','rainfall'],figsize=(7,7))
plt.show()

Finally, we save the new DataFrame to a CSV file. Note that to_csv also writes the row index as the first column unless you pass index=False.

In [48]:
hk_uk_data.to_csv('./csv/hk_uk_rain.csv')
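The written index shows up as an extra 'Unnamed: 0' column when the file is read back. A sketch comparing both options, using an in-memory buffer instead of a real file (to_csv returns a string when no path is given):

```python
import io
import pandas as pd

frame = pd.DataFrame({'Year': [1980, 1981], 'rainfall': [1710.6, 1659.5]})

with_index = pd.read_csv(io.StringIO(frame.to_csv()))
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Year', 'rainfall']
print(list(without_index.columns))  # ['Year', 'rainfall']
```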

Python & MatLab

MatLab is an important tool for physics students.

SciPy's io module provides many useful APIs for reading data, including one for MatLab files. Conversely, you can also call Python from MatLab.

Let's see.

The reference can be found here.

We will mainly use three functions:

loadmat: load a MatLab-format file.

savemat: save data in MatLab format.

whosmat: list the variables inside a .mat file.

In [49]:
import scipy.io as sio
# import numpy as np
# import matplotlib.pyplot as plt
# %matplotlib inline

Make sure you have imported all the modules (numpy and matplotlib were imported earlier in this notebook).

Let's see whosmat first:

In [50]:
sio.whosmat('./mat/voro.mat')
Out[50]:
[('vx', (2, 579), 'double'),
 ('vy', (2, 579), 'double'),
 ('x', (200, 1), 'double'),
 ('y', (200, 1), 'double')]

Now load the mat file.

In [51]:
mat = sio.loadmat('./mat/voro.mat')
x = mat['x']
y = mat['y']
vx = mat['vx']
vy = mat['vy']

Plot the data with matplotlib.

In [52]:
plt.figure(figsize=(10,10))
plt.gca()
plt.plot(x,y,'bo',vx,vy)
plt.xlim(0,1)
plt.ylim(0,1)
plt.show()

Finally, we save a random array in MatLab format.

In [53]:
sio.savemat('./mat/np_rand.mat',{'position':np.random.rand(100,2)})
In [54]:
sio.whosmat('./mat/np_rand.mat')
Out[54]:
[('position', (100, 2), 'double')]
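MatLab has no 1-D arrays, so savemat stores a 1-D NumPy array as a 1-by-N row vector by default (oned_as='row'), and loadmat returns it as a 2-D array. loadmat also adds metadata entries such as '__header__' to the returned dict, and squeeze_me=True removes singleton dimensions. A minimal sketch with a temporary file:

```python
import os
import tempfile
import numpy as np
import scipy.io as sio

path = os.path.join(tempfile.mkdtemp(), 'vec.mat')
v = np.arange(5.0)             # 1-D array, shape (5,)

sio.savemat(path, {'v': v})

info = sio.whosmat(path)       # list of (name, shape, class) tuples
mat = sio.loadmat(path)
squeezed = sio.loadmat(path, squeeze_me=True)

print(info)                 # [('v', (1, 5), 'double')]
print(mat['v'].shape)       # (1, 5)
print(squeezed['v'].shape)  # (5,)
print('__header__' in mat)  # True
```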