Input and output of Python
Basic Python functions: reading and writing files.
Standard library: JSON and Pickle.
Input and output of NumPy arrays.
Using Pandas module.
Using scipy's io module.
Create a new file and write something into it.
Basically, we will discuss three functions: open, write, and read.
Ref:
Let's start from a simple example.
Now let's see what's inside our file.
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!
1,0.235,0.645,0.457
1 0.2350000 0.6450000 0.4570000
The argument 'r' means "read mode", so you cannot change anything in the file.
Now open the file and write something into it.
with open('./text/text.txt','w') as f:
    f.write('this is a test file.\n')
'w' means "write mode", and '\n' is the newline character that ends the line. Note that opening an existing file in 'w' mode erases its previous content. Let's see how the file changed.
with open('./text/text.txt','r') as f:
    print(f.read())
this is a test file.
As you may notice, the former "Hello World!" has been replaced by the new text. So what if I just want to add something to the file?
You can use "append" mode, namely change the argument from 'w' to 'a'.
with open('./text/text.txt','a') as f:
    f.write("Hello World!\n")
with open('./text/text.txt','r') as f:
    print(f.read())
this is a test file.
Hello World!
A detailed explanation of the modes can be found here (documentation for Python 2; Python 3 adds an "exclusive creation" mode, 'x'; refer to the official documentation for more information).
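To illustrate the exclusive mode mentioned above, here is a minimal sketch. It assumes a scratch file in a temporary directory (the name new.txt is hypothetical, not part of this tutorial's ./text folder) and shows that 'x' refuses to overwrite an existing file.

```python
import os
import tempfile

# Hypothetical scratch file; the tutorial's own ./text/text.txt is not touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new.txt')

    # 'x' creates the file and fails if it already exists.
    with open(path, 'x') as f:
        f.write('created\n')

    # Opening the same path with 'x' again raises FileExistsError.
    try:
        with open(path, 'x') as f:
            f.write('never written\n')
        second_open_failed = False
    except FileExistsError:
        second_open_failed = True

    with open(path, 'r') as f:
        content = f.read()

print(second_open_failed, repr(content))
```

This can be handy when you want to be sure that a run never silently destroys earlier results, which 'w' would do.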
You may also notice that each time we open a file, we start with a "with" statement. Done this way, Python closes the file automatically when the block ends. Otherwise you need to close the file manually.
Never forget to close the file: an open file holds system resources, and written data may not reach the disk until the file is closed. You can do it this way:
f = open('./text/text.txt','a+')
f.write("Hello World!\n")
f.close()
f = open('./text/text.txt','r')
data = f.read()
print(data)
f.close()
this is a test file.
Hello World!
Hello World!
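As a small check of the two closing styles discussed above, the sketch below inspects the file object's closed attribute. The file name demo.txt and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical scratch file

    # Style 1: the with statement closes the file when the block ends.
    with open(path, 'w') as f:
        f.write('Hello World!\n')
    closed_after_with = f.closed

    # Style 2: manual closing; try/finally makes sure close() runs
    # even if reading raises an exception.
    g = open(path, 'r')
    try:
        data = g.read()
    finally:
        g.close()
    closed_after_finally = g.closed

print(closed_after_with, closed_after_finally, repr(data))
```

The with form is usually preferable because it is shorter and cannot be forgotten.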
You can also use the print() function to write text by passing the file object as its "file" argument.
with open('./text/text.txt', 'w') as f:
    print('Hello World!', file=f)
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!
The string method "format" is useful when you need to customize your output:
Official ref here
with open('./text/text.txt','a') as f:
    f.write('1,0.235,0.645,0.457\n')
    f.write('{:d} {:3.7f} {:3.7f} {:3.7f}\n'.format(1,0.235,0.645,0.457))
with open('./text/text.txt','r') as f:
    print(f.read())
Hello World!
1,0.235,0.645,0.457
1 0.2350000 0.6450000 0.4570000
You can see the difference between the two outputs.
A self-explanatory example follows.
References can be found here (Chinese version) and here (English version).
a = "{:.2f} {:+.2f} {:.0f} {:.2%} \n".format(3.14,3.14,3.14,3.14)
b = "{:+d} {:+d} {:-d} {:-d} \n".format(+1,-1,+1,-1)
c = "{4} {3} {2} {1} {0} \n".format(1,2,3,4,5)
d = "{0:0>6d} {0:0<6d} {0:3>6d} {0} \n".format(5)
e = "{:5d} {:5d} {:5d} {:5d} {:5d} \n".format(1,2,3,4,5)
f = "{:5d} {:<5d} {:5d} {:^5d} {:5d} \n".format(1,2,3,4,5)
print(a,b,c,d,e,f)
3.14 +3.14 3 314.00%
 +1 -1 1 -1
 5 4 3 2 1
 000005 500000 333335 5
     1     2     3     4     5
     1 2         3   4       5
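The same format specifications also work in f-strings and in the built-in format() function. The sketch below is only an illustration; the variable names value and count are made up for this example.

```python
value = 3.14159
count = 42

# str.format with width, precision, alignment and fill characters
s1 = '{:8.3f}|{:<6d}|{:^6d}|{:0>6d}'.format(value, count, count, count)

# The same specifications written as an f-string (Python 3.6+)
s2 = f'{value:8.3f}|{count:<6d}|{count:^6d}|{count:0>6d}'

# The built-in format() applies a single specification to a single value
s3 = format(value, '.2e')

print(s1)
print(s2)
print(s3)
```

Both s1 and s2 come out as the same string, so the choice between them is mostly a matter of readability.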
Now let's look at the "read" functions.
There are three of them, namely "read", "readline" and "readlines". See their output below:
with open('./text/text.txt','r') as f:
    print(f.read())
with open('./text/text.txt','r') as f:
    print(f.readlines())
with open('./text/text.txt','r') as f:
    print(f.readline())
with open('./text/text.txt','r') as f:
    for line in f:
        print(line)
Hello World!
1,0.235,0.645,0.457
1 0.2350000 0.6450000 0.4570000
['Hello World!\n', '1,0.235,0.645,0.457\n', '1 0.2350000 0.6450000 0.4570000\n']
Hello World!
Hello World!
1,0.235,0.645,0.457
1 0.2350000 0.6450000 0.4570000
read() returns the whole content as a single string.
readlines() returns a list with one string per line.
readline() returns only one line of the text.
The first two methods load all the text into memory at once, while readline() loads the text one line at a time. Looping over the file object itself ("for line in f") also reads one line at a time. If your file is very big (e.g. 4 GB, or larger than your memory), read() and readlines() are not recommended.
For a big file, you may consider processing it this way:
with open('./text/text.txt','r') as f:
    line = f.readline()
    while line:
        print(line)
        line = f.readline()
Hello World!
1,0.235,0.645,0.457
1 0.2350000 0.6450000 0.4570000
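The line-by-line approach above can also be written by looping over the file object directly. The following sketch assumes a made-up two-column file (big.txt in a temporary directory) and accumulates a sum without ever holding the whole file in memory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'big.txt')  # hypothetical stand-in for a large file

    # Write 1000 lines of the form "<index> <value>".
    with open(path, 'w') as f:
        for i in range(1000):
            f.write('{:d} {:.3f}\n'.format(i, i * 0.5))

    # Read it back one line at a time; only the current line is in memory.
    n_lines = 0
    total = 0.0
    with open(path, 'r') as f:
        for line in f:
            fields = line.split()
            total += float(fields[1])
            n_lines += 1

print(n_lines, total)
```

The memory use stays roughly constant no matter how many lines the file has, which is the point of avoiding read() and readlines() here.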
We have just introduced the very basics of reading and writing in Python. But you may notice that all the discussion above is based on strings. If you have a list or a dict, saving it to a plain text file is not a good choice, because you need to reconstruct the list or dict from the text afterwards.
Is there another way of saving an object?
First we're going to introduce Pickle.
Official document here
Usage of Pickle is simple:
import pickle
data = ['HKUST','PHYS',6810,'Rm 4402','Lift17-18']
with open('./pickle/data_pickle','wb') as f:
    pickle.dump(data,f)
with open('./pickle/data_pickle','rb') as f:
    data = pickle.load(f)
for i in data:
    print(i)
HKUST
PHYS
6810
Rm 4402
Lift17-18
You can also store multiple objects in one file like this:
f = open('./pickle/somedata', 'wb')
pickle.dump([1, 2, 3, 4], f)
pickle.dump('hello', f)
pickle.dump({'Apple', 'Pear', 'Banana'}, f)
f.close()
f = open('./pickle/somedata', 'rb')
print(pickle.load(f))
print(pickle.load(f))
print(pickle.load(f))
f.close()
[1, 2, 3, 4]
hello
{'Pear', 'Banana', 'Apple'}
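You can see that the objects are loaded in the same order in which they were dumped. (The elements inside the set may print in a different order, because a set is unordered.)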
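When you do not know in advance how many objects a pickle file holds, one common pattern is to call pickle.load repeatedly until it raises EOFError. The sketch below uses an in-memory io.BytesIO buffer as a stand-in for a file opened in binary mode; that substitution is an assumption made to keep the example self-contained.

```python
import io
import pickle

# In-memory stand-in for a file opened with 'wb' and later 'rb'.
buffer = io.BytesIO()
for obj in ([1, 2, 3, 4], 'hello', {'Apple', 'Pear', 'Banana'}):
    pickle.dump(obj, buffer)

# Rewind and load objects until the stream is exhausted.
buffer.seek(0)
loaded = []
while True:
    try:
        loaded.append(pickle.load(buffer))
    except EOFError:
        break

print(loaded)
```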
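Pickle can also save a function. Functions are pickled by reference (their module and name), not by their code, so the function must be importable when the file is loaded.
import math
with open('./pickle/function','wb') as f:
    pickle.dump(math.cos,f)
with open('./pickle/function','rb') as f:
    Cos = pickle.load(f)
print(Cos(0))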
1.0
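The following sketch probes how pickle handles functions. It is only an illustration: it checks that the pickled bytes of math.cos contain the module and function names, and that an anonymous lambda, which has no importable name, cannot be pickled.

```python
import math
import pickle

# A named, importable function is stored as a reference to "math" and "cos".
payload = pickle.dumps(math.cos)
names_in_payload = b'math' in payload and b'cos' in payload

# Loading resolves the reference back to the very same function object.
restored = pickle.loads(payload)
same_object = restored is math.cos

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(names_in_payload, same_object, lambda_pickled)
```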
As you may notice, we add a "b" to the mode when we open the file. This indicates that the data pickle saves is binary, which is not human-readable.
If a human-readable file is needed, you may consider JSON.
JSON is short for "JavaScript Object Notation", which is a common data format.
JSON only supports the None, bool, int, float and str data types, plus lists, tuples and dicts that contain such data (tuples are written as JSON arrays, so they come back as lists).
Let's see an example.
import json
data = {
    'name' : 'ACME',
    'shares' : 100,
    'price' : 542.23
}
with open('./pickle/data.json','w') as f:
    json.dump(data, f)
with open('./pickle/data.json', 'r') as f:
    data = json.load(f)
for key,value in data.items():
    print(key,":",value)
name : ACME
shares : 100
price : 542.23
You can use the same approach to save a list or a tuple.
More sophisticated techniques can be found at the official website
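Saving lists and tuples with JSON has two small surprises worth seeing once. The sketch below uses json.dumps and json.loads on a made-up dictionary (the keys are illustrative only) to show that tuples come back as lists and that non-string keys come back as strings.

```python
import json

# Hypothetical data: a tuple value, a list value, and an integer key.
original = {'point': (1, 2), 'labels': ['a', 'b'], 1: 'int key'}

text = json.dumps(original)   # serialize to a JSON string
restored = json.loads(text)   # parse it back into Python objects

print(text)
print(restored)
```

If the exact types matter after loading, you need to convert them back yourself or choose a different format.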
Differences between JSON and Pickle
JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format.
JSON is human-readable, while pickle is not.
JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific.
JSON, by default, can only represent a subset of the Python built-in types, and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever usage of Python’s introspection facilities; complex cases can be tackled by implementing specific object APIs).
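The difference in supported types can be checked with a short experiment. This sketch uses a set and a complex number as examples of built-in types that JSON does not handle by default; the dictionary itself is made up for illustration.

```python
import json
import pickle

# Hypothetical data holding two types that JSON cannot represent by default.
data = {'tags': {'a', 'b'}, 'z': 1 + 2j}

# json.dumps raises TypeError for the set (and would for the complex number).
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False

# pickle round-trips the same object without any extra work.
restored = pickle.loads(pickle.dumps(data))

print(json_ok, restored == data)
```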
NumPy, a core package of the SciPy ecosystem, is widely used in scientific computing, and it will also be the most important module we will learn, along with Matplotlib.
Now let's see what we can do with NumPy.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
data=np.load('./numpy/data.npy')
psi6 = np.load('./numpy/psi6.npy')
plt.figure(figsize=(10,10))
ax = plt.gca()
ax.set_aspect('equal')
ax.scatter(data[:,0],data[:,1],s=5,c=psi6,cmap=plt.cm.rainbow_r)
# plt.savefig('psi6.png',dpi=200)
plt.show()
The most basic object in NumPy is numpy.ndarray, an N-dimensional array. NumPy already implements common matrix operations on the ndarray (e.g. dot, transpose, etc.), making it really convenient to use.
You can find their website here. A MATLAB user may find this tutorial useful.
Let's see how to save arrays in NumPy.
NumPy provides many useful functions, not only for its own array format but also for other file types.
(First import numpy as np)
np.save np.load
np.savez np.savez_compressed
np.savetxt np.loadtxt
One can find more info here
Save a single array into an 'npy' file.
# create a new array
m = np.random.rand(100,2)
np.save('./numpy/m.npy',m)
print(m[0:5,:])
[[0.63747681 0.27082748]
 [0.73047444 0.7092957 ]
 [0.99831309 0.4086804 ]
 [0.28199021 0.31725638]
 [0.10055243 0.81826568]]
l = np.load('./numpy/m.npy')
print(l[0:5,:])
[[0.63747681 0.27082748]
 [0.73047444 0.7092957 ]
 [0.99831309 0.4086804 ]
 [0.28199021 0.31725638]
 [0.10055243 0.81826568]]
Note that np.load does not leave a file open here: for an 'npy' file it simply returns an array object, so there is nothing to close.
Save multiple arrays into an 'npz' file.
np.savez('./numpy/ml.npz',x=m,y=l)
n = np.load('./numpy/ml.npz')
print(n)
print((n['x']-n['y'])[0:5,:])
<numpy.lib.npyio.NpzFile object at 0x0000014DA3B039E8>
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
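Unlike the 'npy' case, the NpzFile object returned for an 'npz' file keeps the archive open and reads each array only when you access it. In a short script this rarely matters, but to avoid leaking file handles you should call n.close() when you are done, or open the archive in a "with" statement.
The function np.savez_compressed saves the arrays in compressed form. The file takes less disk space and can be loaded with the same load function.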
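The sketch below compares np.savez with np.savez_compressed and loads the archive inside a with statement so that it is closed afterwards. The file names and the all-zeros array are assumptions chosen for illustration; zeros compress extremely well, so the size gap here is larger than you would see with random data.

```python
import os
import tempfile

import numpy as np

m = np.zeros((1000, 100))  # hypothetical, highly compressible data

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, 'plain.npz')
    packed = os.path.join(tmp, 'packed.npz')

    np.savez(plain, x=m)              # uncompressed archive
    np.savez_compressed(packed, x=m)  # compressed archive

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # The with statement closes the archive when the block ends.
    with np.load(packed) as archive:
        names = archive.files      # names of the stored arrays
        restored = archive['x']    # the array is read at this point

print(plain_size, packed_size, names)
```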
The loadtxt function is very handy when dealing with text files.
data = np.loadtxt('./numpy/loadtxt_example.txt',skiprows=2,usecols=(0,4))
plt.figure(figsize=(7,7))
ax = plt.gca()
ax.plot(data[:,0],data[:,1])
ax.set_xlabel("Timestep")
ax.set_ylabel("Total energy")
plt.show()
One may find doc for savetxt function here.
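As a counterpart to the loadtxt example, here is a minimal sketch of a savetxt and loadtxt round trip. The small table, the column names in the header, and the file name table.txt are all made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: a step counter and an energy value per row.
table = np.array([[0, 1.5], [1, 2.5], [2, 3.5]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'table.txt')

    # One format per column; the header line is written with a leading '# '.
    np.savetxt(path, table, fmt='%d %.3f', header='step energy')

    with open(path, 'r') as f:
        first_line = f.readline()

    # loadtxt skips lines starting with '#' by default.
    back = np.loadtxt(path)

print(first_line.strip())
print(back)
```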
Now let's see the basic part of the module.
This is the standard way of importing pandas.
import pandas as pd
df = pd.read_csv('./csv/uk_rain_2014.csv', header=0)
df.head(n) gives you the first n rows of the dataframe.
df.head(5)
| | Water Year | Rain (mm) Oct-Sep | Outflow (m3/s) Oct-Sep | Rain (mm) Dec-Feb | Outflow (m3/s) Dec-Feb | Rain (mm) Jun-Aug | Outflow (m3/s) Jun-Aug |
|---|---|---|---|---|---|---|---|
| 0 | 1980/81 | 1182 | 5408 | 292 | 7248 | 174 | 2212 |
| 1 | 1981/82 | 1098 | 5112 | 257 | 7316 | 242 | 1936 |
| 2 | 1982/83 | 1156 | 5701 | 330 | 8567 | 124 | 1802 |
| 3 | 1983/84 | 993 | 4265 | 391 | 8905 | 141 | 1078 |
| 4 | 1984/85 | 1182 | 5364 | 217 | 5813 | 343 | 4313 |
df.tail(n) gives you the last n rows of the dataframe.
df.tail(5)
| | Water Year | Rain (mm) Oct-Sep | Outflow (m3/s) Oct-Sep | Rain (mm) Dec-Feb | Outflow (m3/s) Dec-Feb | Rain (mm) Jun-Aug | Outflow (m3/s) Jun-Aug |
|---|---|---|---|---|---|---|---|
| 28 | 2008/09 | 1139 | 4941 | 268 | 6690 | 323 | 3189 |
| 29 | 2009/10 | 1103 | 4738 | 255 | 6435 | 244 | 1958 |
| 30 | 2010/11 | 1053 | 4521 | 265 | 6593 | 267 | 2885 |
| 31 | 2011/12 | 1285 | 5500 | 339 | 7630 | 379 | 5261 |
| 32 | 2012/13 | 1090 | 5329 | 350 | 9615 | 187 | 1797 |
You can change the column labels by assigning to df.columns.
df.columns = ['Year','rain_octsep', 'outflow_octsep',
              'rain_decfeb', 'outflow_decfeb', 'rain_junaug', 'outflow_junaug']
df.head(5)
| | Year | rain_octsep | outflow_octsep | rain_decfeb | outflow_decfeb | rain_junaug | outflow_junaug |
|---|---|---|---|---|---|---|---|
| 0 | 1980/81 | 1182 | 5408 | 292 | 7248 | 174 | 2212 |
| 1 | 1981/82 | 1098 | 5112 | 257 | 7316 | 242 | 1936 |
| 2 | 1982/83 | 1156 | 5701 | 330 | 8567 | 124 | 1802 |
| 3 | 1983/84 | 993 | 4265 | 391 | 8905 | 141 | 1078 |
| 4 | 1984/85 | 1182 | 5364 | 217 | 5813 | 343 | 4313 |
You can change the display format of the numbers and get a statistical summary:
pd.options.display.float_format = '{:,.3f}'.format
df.describe()
| | rain_octsep | outflow_octsep | rain_decfeb | outflow_decfeb | rain_junaug | outflow_junaug |
|---|---|---|---|---|---|---|
| count | 33.000 | 33.000 | 33.000 | 33.000 | 33.000 | 33.000 |
| mean | 1,129.000 | 5,019.182 | 325.364 | 7,926.545 | 237.485 | 2,439.758 |
| std | 101.900 | 658.588 | 69.995 | 1,692.800 | 66.168 | 1,025.914 |
| min | 856.000 | 3,479.000 | 206.000 | 4,578.000 | 103.000 | 1,078.000 |
| 25% | 1,053.000 | 4,506.000 | 268.000 | 6,690.000 | 193.000 | 1,797.000 |
| 50% | 1,139.000 | 5,112.000 | 309.000 | 7,630.000 | 229.000 | 2,142.000 |
| 75% | 1,182.000 | 5,497.000 | 360.000 | 8,905.000 | 280.000 | 2,959.000 |
| max | 1,387.000 | 6,391.000 | 484.000 | 11,486.000 | 379.000 | 5,261.000 |
You can access a column in two ways:
df['rain_octsep'][0:5]
0    1182
1    1098
2    1156
3     993
4    1182
Name: rain_octsep, dtype: int64
df.rain_octsep[28:]
28    1139
29    1103
30    1053
31    1285
32    1090
Name: rain_octsep, dtype: int64
It is also easy to select rows with a Boolean condition.
df[df.rain_octsep < 1000]
| | Year | rain_octsep | outflow_octsep | rain_decfeb | outflow_decfeb | rain_junaug | outflow_junaug |
|---|---|---|---|---|---|---|---|
| 3 | 1983/84 | 993 | 4265 | 391 | 8905 | 141 | 1078 |
| 8 | 1988/89 | 976 | 4330 | 309 | 6465 | 200 | 1440 |
| 15 | 1995/96 | 856 | 3479 | 245 | 5515 | 172 | 1439 |
df[(df.rain_octsep < 1000) & (df.outflow_octsep < 4000)]
| | Year | rain_octsep | outflow_octsep | rain_decfeb | outflow_decfeb | rain_junaug | outflow_junaug |
|---|---|---|---|---|---|---|---|
| 15 | 1995/96 | 856 | 3479 | 245 | 5515 | 172 | 1439 |
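The combined condition above can be unpacked into an explicit Boolean mask. The sketch below rebuilds four rows of the rainfall table by hand (so it runs without the CSV file) and also shows DataFrame.query as an alternative spelling; the variable names demo, mask and same are made up for this example.

```python
import pandas as pd

# Four rows copied from the tables above, so the CSV file is not needed.
demo = pd.DataFrame({
    'Year': ['1983/84', '1988/89', '1995/96', '1996/97'],
    'rain_octsep': [993, 976, 856, 1047],
    'outflow_octsep': [4265, 4330, 3479, 4019],
})

# Each comparison gives a Boolean Series; '&' combines them element-wise.
# The parentheses are required because '&' binds tighter than '<'.
mask = (demo.rain_octsep < 1000) & (demo.outflow_octsep < 4000)
selected = demo[mask]

# The same selection written as a query string.
same = demo.query('rain_octsep < 1000 and outflow_octsep < 4000')

print(mask.tolist())
print(selected)
```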
If we need to search for a string in the dataframe:
df[df.Year.str.startswith('199')]
| | Year | rain_octsep | outflow_octsep | rain_decfeb | outflow_decfeb | rain_junaug | outflow_junaug |
|---|---|---|---|---|---|---|---|
| 10 | 1990/91 | 1022 | 4418 | 305 | 7120 | 216 | 1923 |
| 11 | 1991/92 | 1151 | 4506 | 246 | 5493 | 280 | 2118 |
| 12 | 1992/93 | 1130 | 5246 | 308 | 8751 | 219 | 2551 |
| 13 | 1993/94 | 1162 | 5583 | 422 | 10109 | 193 | 1638 |
| 14 | 1994/95 | 1110 | 5370 | 484 | 11486 | 103 | 1231 |
| 15 | 1995/96 | 856 | 3479 | 245 | 5515 | 172 | 1439 |
| 16 | 1996/97 | 1047 | 4019 | 258 | 5770 | 256 | 2102 |
| 17 | 1997/98 | 1169 | 4953 | 341 | 7747 | 285 | 3206 |
| 18 | 1998/99 | 1268 | 5824 | 360 | 8771 | 225 | 2240 |
| 19 | 1999/00 | 1204 | 5665 | 417 | 10021 | 197 | 2166 |
Use df.loc to select specific rows and columns by label.
df.loc[11,['Year','rain_octsep']]
Year           1991/92
rain_octsep       1151
Name: 11, dtype: object
Now I want to compare the rainfall in HK and the UK.
hkdf = pd.read_csv('./csv/hk_weather_data.csv',header=0)
hkdf.head(5)
| | Year | Avg Pressure(100P) | Max Temp | Avg Temp(H) | Avg Temp | Avg Temp(L) | Min Temp | Rainfall(mm) | sunshine(hr) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1961 | 1,012.600 | 34.200 | 25.600 | 22.900 | 20.800 | 7.300 | 2,232.400 | 1,981.600 |
| 1 | 1962 | 1,013.200 | 35.500 | 25.800 | 22.700 | 20.400 | 6.000 | 1,741.000 | 2,395.400 |
| 2 | 1963 | 1,013.400 | 35.600 | 26.500 | 23.300 | 20.900 | 7.100 | 901.100 | 2,469.700 |
| 3 | 1964 | 1,012.700 | 33.900 | 25.700 | 22.900 | 20.500 | 7.000 | 2,432.100 | 2,029.600 |
| 4 | 1965 | 1,012.800 | 33.400 | 25.900 | 23.100 | 20.900 | 7.300 | 2,352.600 | 1,990.700 |
Combine the two datasets and plot the rainfall.
First pick up the columns we need.
hk_rainfall = hkdf.loc[:,['Year','Rainfall(mm)']]
hk_rainfall = hk_rainfall[(hk_rainfall.Year<=2012)&(hk_rainfall.Year>=1980)]
hk_rainfall.columns = ['Year','rainfall']
uk_rainfall = df.loc[:,['Year','rain_octsep']]
The 'Year' column of uk_rainfall holds strings, so we need to convert it to integers.
def str2int(year):
    # keep the first four characters, e.g. '1980/81' -> 1980
    year = int(year[:4])
    return year
print(type(uk_rainfall.Year[0]))
uk_rainfall.Year = uk_rainfall.Year.apply(str2int)
<class 'str'>
type(uk_rainfall.Year[0])
numpy.int64
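The same conversion can also be done without a helper function by using the vectorized string accessor. This is only an alternative sketch; the small Series below is typed in by hand from the first rows of the table rather than read from the CSV file.

```python
import pandas as pd

# First three 'Year' values from the UK table, entered by hand.
years = pd.Series(['1980/81', '1981/82', '1982/83'])

# .str[:4] slices every string at once; .astype(int) converts the result.
as_int = years.str[:4].astype(int)

print(as_int.tolist())
```

Whether apply or the .str accessor reads better is a matter of taste; for large columns the vectorized form is usually the more idiomatic choice.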
Now we merge the two dataframes.
hk_uk_data = uk_rainfall.merge(hk_rainfall,on='Year')
hk_uk_data.head(5)
| | Year | rain_octsep | rainfall |
|---|---|---|---|
| 0 | 1980 | 1182 | 1,710.600 |
| 1 | 1981 | 1098 | 1,659.500 |
| 2 | 1982 | 1156 | 3,247.500 |
| 3 | 1983 | 993 | 2,893.800 |
| 4 | 1984 | 1182 | 2,017.000 |
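By default merge keeps only the keys that appear in both dataframes (an inner join). The sketch below uses two tiny hand-made frames with partly overlapping years to show this, together with the how and indicator parameters; the frame names left and right are illustrative.

```python
import pandas as pd

# Tiny hand-made frames whose years overlap only partly.
left = pd.DataFrame({'Year': [1980, 1981, 1982],
                     'rain_octsep': [1182, 1098, 1156]})
right = pd.DataFrame({'Year': [1981, 1982, 1983],
                      'rainfall': [1659.5, 3247.5, 2893.8]})

# Default: inner join, only years present in both frames survive.
inner = left.merge(right, on='Year')

# Outer join keeps every year; indicator=True adds a '_merge' column
# saying where each row came from.
outer = left.merge(right, on='Year', how='outer', indicator=True)

print(inner)
print(outer)
```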
And we can get a plot easily.
hk_uk_data.plot(x='Year',y=['rain_octsep','rainfall'],figsize=(7,7))
plt.show()
Finally, we save the new dataframe to a CSV file.
hk_uk_data.to_csv('./csv/hk_uk_rain.csv')
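Note that to_csv also writes the row index as an unnamed first column unless told otherwise. The sketch below, which uses a small hand-made frame and an in-memory string instead of a file on disk, shows the effect of index=False and reads the result back.

```python
import io

import pandas as pd

# Small hand-made frame standing in for hk_uk_data.
frame = pd.DataFrame({'Year': [1980, 1981], 'rain_octsep': [1182, 1098]})

# Without a path, to_csv returns the CSV text as a string.
with_index = frame.to_csv()
without_index = frame.to_csv(index=False)

# Reading the index-free text back reproduces the original frame.
back = pd.read_csv(io.StringIO(without_index))

print(with_index)
print(without_index)
```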
MATLAB is an important tool for physics students.
The io module from SciPy provides many useful APIs for reading data, including an API for MATLAB files. Conversely, you can call Python from MATLAB, too.
Let's see.
A reference can be found here.
Three functions will mainly be used:
loadmat: load a file in MATLAB format.
savemat: save a file in MATLAB format.
whosmat: see what's inside a MAT file.
import scipy.io as sio
# import numpy as np
# import matplotlib.pyplot as plt
# %matplotlib inline
Make sure you have imported all of these modules.
Let's see whosmat first:
sio.whosmat('./mat/voro.mat')
[('vx', (2, 579), 'double'),
('vy', (2, 579), 'double'),
('x', (200, 1), 'double'),
('y', (200, 1), 'double')]
Now load the mat file.
mat = sio.loadmat('./mat/voro.mat')
x = mat['x']
y = mat['y']
vx = mat['vx']
vy = mat['vy']
Get a figure using matplotlib.
plt.figure(figsize=(10,10))
plt.gca()
plt.plot(x,y,'bo',vx,vy)
plt.xlim(0,1)
plt.ylim(0,1)
plt.show()
Finally, we save a random array in MATLAB format.
sio.savemat('./mat/np_rand.mat',{'position':np.random.rand(100,2)})
sio.whosmat('./mat/np_rand.mat')
[('position', (100, 2), 'double')]
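One detail worth knowing when round-tripping data through MAT files is that loadmat returns a dictionary with a few metadata entries and that one-dimensional arrays come back as two-dimensional row vectors by default. The sketch below demonstrates this with a hypothetical file demo.mat in a temporary directory.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

vector = np.arange(5.0)  # one-dimensional array of length 5

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.mat')  # hypothetical scratch file
    sio.savemat(path, {'position': vector})

    loaded = sio.loadmat(path)                      # default behaviour
    squeezed = sio.loadmat(path, squeeze_me=True)   # drop length-1 axes

shape_default = loaded['position'].shape
shape_squeezed = squeezed['position'].shape

# Besides the data, the dictionary holds metadata keys such as '__header__'.
meta_keys = sorted(k for k in loaded if k.startswith('__'))

print(shape_default, shape_squeezed, meta_keys)
```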