Fundamentals of Python Programming (3rd Edition): Exercises and Answers [ch10] Regular Expressions and Simple Crawlers

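Chapter 10: Regular Expressions and Simple Crawlers

1. Write a regular expression for each of the following kinds of strings, based on how the string is structured, and try using the relevant functions of the re library to match, search, split, group, and replace against a test string.

(1) E-mail address.

Answer:

    import re

    email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    test_string = 'example@example.com'
    if re.match(email_pattern, test_string):
        print('Match succeeded:', test_string)
    else:
        print('Match failed:', test_string)

(2) IPv4 address.

Answer:

    import re

    # Checks the dotted four-field shape only; it does not limit each field to 0-255.
    ipv4_pattern = r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$'
    test_string = '192.168.1.1'
    if re.match(ipv4_pattern, test_string):
        print('Match succeeded:', test_string)
    else:
        print('Match failed:', test_string)

(3) Mainland China mobile phone number.

Answer:

    import re

    phone_pattern = r'^1[3-9]\d{9}$'
    test_string = '13812345678'  # example value for testing
    if re.match(phone_pattern, test_string):
        print('Match succeeded:', test_string)
    else:
        print('Match failed:', test_string)

(4) Mainland China landline number (an area code of 3 to 4 digits starting with 0, a hyphen, then a number of 5 to 8 digits).

Answer:

    import re

    telephone_pattern = r'^0\d{2,3}-\d{5,8}$'
    test_string = '010-12345678'
    if re.match(telephone_pattern, test_string):
        print('Match succeeded:', test_string)
    else:
        print('Match failed:', test_string)

(5) 18-digit ID card number (ignoring month lengths, leap months, and the check-digit rule).

Answer:

    import re

    # 17 digits followed by a digit or the letter X (either case).
    id_card_pattern = r'^\d{17}[\dXx]$'
    test_string = ''  # supply an 18-character ID number to test
    if re.match(id_card_pattern, test_string):
        print('Match succeeded:', test_string)
    else:
        print('Match failed:', test_string)

2. Create a simple crawler program that automatically downloads news titles, JPG, PNG, or GIF images, MP3 files, or similar material from a static web page (for example, your school's department homepage, 昵图网 (Nipic), 知乎日报 (Zhihu Daily), or 淘宝网 (Taobao)).

Answer: The following simple example downloads the images on a web page. It uses Python's requests library to fetch the page and the image files, the BeautifulSoup library to parse the page, and urljoin from urllib.parse to turn relative image paths into absolute URLs.

First, make sure the required libraries are installed. You can install them with the following command:

    pip install requests beautifulsoup4

Next, we implement a simple image downloader.

    import os

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin


    def download_images(url, save_folder):
        # Fetch the page and parse its HTML.
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')

        if not os.path.exists(save_folder):
            os.makedirs(save_folder)

        # Download every <img> that has a src attribute.
        for img_tag in soup.find_all('img'):
            img_url = img_tag.get('src')
            if img_url:
                img_url = urljoin(url, img_url)
                filename = os.path.join(save_folder, os.path.basename(img_url))
                download_file(img_url, filename)


    def download_file(url, save_path):
        # Stream the response so large files are written in chunks.
        with requests.get(url, stream=True) as response:
            response.raise_for_status()
            with open(save_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)


    if __name__ == '__main__':
        url = 'https://example.com'  # replace with the URL of the page to crawl
        save_folder = 'images'  # directory for saved images; change as needed
        download_images(url, save_folder)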
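Exercise 1 also asks for searching, splitting, grouping, and replacing, while the answers above only call re.match. The following is a minimal sketch of those other operations, not part of the original answer key. It reuses the e-mail and landline patterns with the ^ and $ anchors removed so that they can match inside longer text; the sample text and the variable names are assumptions made up for illustration.

```python
import re

# Patterns from the answers above, without the ^...$ anchors so they can be
# found inside longer text (an adaptation for this sketch).
EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
PHONE = r'(0\d{2,3})-(\d{5,8})'  # group 1: area code, group 2: local number

# Hypothetical sample text, invented for illustration.
text = 'Contact: alice@example.com, bob@example.org; office 010-12345678'

# Search: find the first landline number anywhere in the text.
found = re.search(PHONE, text)

# Group: read the captured parts of that match.
area_code = found.group(1)
local_number = found.group(2)

# Find all: collect every e-mail address in the text.
emails = re.findall(EMAIL, text)

# Split: cut the text at commas or semicolons plus any following spaces.
parts = re.split(r'[,;]\s*', text)

# Replace: mask every e-mail address.
masked = re.sub(EMAIL, '<email>', text)

print(area_code, local_number)
print(emails)
print(parts)
print(masked)
```

re.match only tests at the start of the string, which is why the answers above pair it with fully anchored patterns; re.search, re.findall, and re.sub scan the whole text, so the unanchored forms are the ones to use with them.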

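The downloader in exercise 2 saves every img src under os.path.basename(img_url), so a query string ends up in the file name and nothing is filtered by type. One possible refinement is sketched below using only the standard library. The helper names filename_from_url and select_media_urls, the extension set (including .jpeg), and the sample links are assumptions for illustration and do not come from the original answer.

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Extensions named in the exercise (JPG, PNG, GIF, MP3); '.jpeg' is added
# here as an assumption because it is a common spelling of the same format.
WANTED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.mp3'}


def filename_from_url(url):
    """Return the last path segment of url, ignoring query and fragment."""
    return posixpath.basename(urlparse(url).path)


def select_media_urls(page_url, links):
    """Resolve links against page_url and keep those with a wanted extension."""
    selected = []
    for link in links:
        absolute = urljoin(page_url, link)
        extension = posixpath.splitext(filename_from_url(absolute))[1].lower()
        # Skip other file types and links that were already selected.
        if extension in WANTED_EXTENSIONS and absolute not in selected:
            selected.append(absolute)
    return selected


# Hypothetical links, such as src or href values collected from a page.
sample_links = [
    'img/a.JPG?size=big',
    '/media/song.mp3',
    'page.html',
    'img/a.JPG?size=big',
    'https://cdn.example.com/b.png',
]
media_urls = select_media_urls('https://example.com/news/index.html', sample_links)
print(media_urls)
print([filename_from_url(u) for u in media_urls])
```

In the downloader, the list passed as links would be the src (or href) values gathered with soup.find_all, and filename_from_url(img_url) could take the place of os.path.basename(img_url) when building the save path.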