1. Background
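Call Baidu Translate's web interface so that, when the user types Chinese text at the prompt, the program returns the English translation.
One key problem in this article is still unsolved. If anyone knows how to fix it, please reply with the solution. Thanks.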
2. Original code:
import requests
# Define the URL, form data and headers
inputStr = input("请输入要翻译的中文:")  # prompt: "Enter the Chinese text to translate:"
urlStr = "https://fanyi.baidu.com/v2transapi?from=zh&to=en"
# Pretend to be desktop Chrome
headersStr = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"}
dataStr = {
    "from": "zh",
    "to": "en",
    "query": inputStr
}
result = requests.post(url=urlStr, headers=headersStr, data=dataStr)
print(result.content)
Output:
请输入要翻译的中文:你好
b'{"errno":997,"errmsg":"\\u672a\\u77e5\\u9519\\u8bef","query":"\\u4f60\\u597d","from":"zh","to":"en","error":997}'
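The server rejects the request with error 997 ("unknown error"). This is Baidu's anti-scraping protection at work, so the next attempt pretends to be the mobile version of Baidu Translate.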
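The raw bytes printed above are hard to read. Instead of eyeballing them, a small helper can parse the body with the standard json module and turn a non-zero errno into an exception. This is a minimal sketch: the names check_baidu_error and BaiduTranslateError are hypothetical, and the only response fields it relies on (errno, errmsg) are the ones visible in the output above.

```python
import json


class BaiduTranslateError(Exception):
    """Raised when a response body carries a non-zero errno (hypothetical helper)."""

    def __init__(self, errno, errmsg):
        super().__init__(f"errno={errno}: {errmsg}")
        self.errno = errno
        self.errmsg = errmsg


def check_baidu_error(body):
    """Parse a JSON response body (bytes or str) and raise on a non-zero errno.

    Assumption: an error response looks like the one shown in this post,
    e.g. {"errno": 997, "errmsg": "...", ...}; a body without "errno"
    is treated as a success and returned as a dict.
    """
    payload = json.loads(body)  # json.loads accepts UTF-8 bytes as well as str
    errno = payload.get("errno", 0)
    if errno != 0:
        raise BaiduTranslateError(errno, payload.get("errmsg", ""))
    return payload


if __name__ == "__main__":
    sample = b'{"errno":997,"errmsg":"\\u672a\\u77e5\\u9519\\u8bef","query":"\\u4f60\\u597d"}'
    try:
        check_baidu_error(sample)
    except BaiduTranslateError as exc:
        print(exc.errno)  # prints: 997
        # exc.errmsg is the decoded text 未知错误 ("unknown error")
```

With this in place, printing result.content can be replaced by check_baidu_error(result.content), which fails loudly with the error code instead of dumping escaped bytes.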
3. Modified code:
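Change the request headers to impersonate a mobile client, and add sign and token fields (values copied from a request the browser sent) to the request data, as follows: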
import requests
# Define the URL, form data and headers
inputStr = input("请输入要翻译的中文:")  # prompt: "Enter the Chinese text to translate:"
urlStr = "https://fanyi.baidu.com/v2transapi?from=zh&to=en"
# Pretend to be Safari on an iPhone (mobile client)
headersStr = {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"}
dataStr = {
    "from": "zh",
    "to": "en",
    "query": inputStr,
    "transtype": "translang",
    "simple_means_flag": 3,
    # sign and token were copied from a request sent by the browser
    "sign": "667486.969839",
    "token": "0e1eadb721c5ff09625ec67405c1916b",
    "domain": "common"
}
result = requests.post(url=urlStr, headers=headersStr, data=dataStr)
print(result.content.decode())
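For a flat dict like dataStr, requests form-encodes the data= argument into an application/x-www-form-urlencoded body, which as far as I know is essentially what the standard library's urllib.parse.urlencode produces. Printing that body is a quick way to check exactly what the server receives, for example that the Chinese query goes out percent-encoded as UTF-8 and that sign is sent as the text 667486.969839. This sketch uses only the standard library and sends nothing; the field values are the ones from the code above, with 你好 as the query.

```python
from urllib.parse import parse_qs, urlencode

# Same fields as dataStr above, with a fixed query instead of input().
form = {
    "from": "zh",
    "to": "en",
    "query": "你好",
    "transtype": "translang",
    "simple_means_flag": 3,
    "sign": "667486.969839",
    "token": "0e1eadb721c5ff09625ec67405c1916b",
    "domain": "common",
}

# Build the URL-encoded request body without sending anything.
body = urlencode(form)
print(body)

# Round-trip check: decoding the body gives back the original query text.
assert parse_qs(body)["query"] == ["你好"]
```

The printed body shows that nothing in it ties sign or token to the query text on the client side; whatever check the server performs happens on its end.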
Run the program again:
请输入要翻译的中文:你好
{"errno":997,"errmsg":"未知错误","query":"你好","from":"zh","to":"en","error":997}
Same result. Baidu's anti-scraping protection is evidently quite thorough.
4. Modifying the code again:
Add a Cookie field to the headers so that the request fully reproduces one specific search made by a real user:
import requests
# Define the URL, form data and headers
inputStr = input("请输入要翻译的中文:")  # prompt: "Enter the Chinese text to translate:"
urlStr = "https://fanyi.baidu.com/v2transapi?from=zh&to=en"
headersStr = {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1",
"Cookie": "BIDUPSID=2AE4749BC9AD0C1FEF43AA79FDA9BF7B; PSTM=1614951854; BAIDUID=2AE4749BC9AD0C1F513A557DE9E56DD8:FG=1; __yjs_duid=1_6f4f709aa8de99c222457e707689d56d1619187669735; MCITY=-%3A; BAIDUID_BFESS=2AE4749BC9AD0C1F513A557DE9E56DD8:FG=1; H_PS_PSSID=35836_35105_31253_34584_35491_35872_35246_35906_35804_35317_26350; delPer=0; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1645360319; REALTIME_TRANS_SWITCH=1; SOUND_SPD_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_PREFER_SWITCH=1; APPGUIDE_10_0_2=1; ZD_ENTRY=baidu; PSINO=3; BA_HECTOR=812g018g8h8k202kc61h14fps0r; Hm_lvt_afd111fa62852d1f37001d1f980b6800=1645363077; Hm_lpvt_afd111fa62852d1f37001d1f980b6800=1645363077; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1645363077; ab_sr=1.0.1_Y2M5YjFjN2NjYzMzMzNlMWNmMDFmOGY3ZTRlOTRlYjFhOTY3M2MzZGRjZTUxZWQ4MGVjNzE1ZTE0ZGE1MTE0NGQwODM4ODVjM2Q2NWZiYmRjYzZhMTdlYzJiN2VmMWRkMmU3NzhlYTUzODRhZjVjNGQ3NGQwZWFmZjQxMWU0MWQ3Yzc1NGM0ZDMyMTRjZmFlZjA1YWZmYTJjYjEyMjlkOQ=="}
dataStr = {
    "from": "zh",
    "to": "en",
    "query": inputStr,
    "transtype": "translang",
    "simple_means_flag": 3,
    # sign and token were copied from the browser's request for "你们在哪里"
    "sign": "667486.969839",
    "token": "0e1eadb721c5ff09625ec67405c1916b",
    "domain": "common"
}
result = requests.post(url=urlStr, headers=headersStr, data=dataStr)
print(result.content.decode())
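The Cookie header above is one long string copied from the browser. Splitting it into name/value pairs makes it easier to see what is actually being replayed (BAIDUID, ab_sr and so on) and lets you pass the cookies through the cookies= argument of requests instead of a raw header. The helper name cookie_header_to_dict is hypothetical; it simply splits on ";" between pairs and on the first "=" inside each pair, which is enough for this header.

```python
def cookie_header_to_dict(header):
    """Split a raw Cookie header value into a {name: value} dict.

    Splits on ';' between pairs and on the first '=' inside each pair,
    so values that themselves contain '=' (e.g. base64 padding or
    'BAIDUID=...:FG=1') stay intact.
    """
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty fragments such as a trailing ';'
        name, sep, value = pair.partition("=")
        if sep:  # skip fragments that have no '=' at all
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    # A shortened excerpt of the Cookie header used above.
    raw = "BAIDUID=2AE4749BC9AD0C1F513A557DE9E56DD8:FG=1; MCITY=-%3A; delPer=0"
    jar = cookie_header_to_dict(raw)
    print(jar["BAIDUID"])  # prints: 2AE4749BC9AD0C1F513A557DE9E56DD8:FG=1
    # With requests installed, the dict could be sent instead of the raw header:
    # requests.post(urlStr, headers={"User-Agent": "..."}, data=dataStr, cookies=jar)
```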
Run it again and enter 你们在哪里 ("where are you"): this time the English translation comes back, where are you?
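Once a request succeeds, the English text still has to be pulled out of the JSON. The post does not show a successful response body, so the layout used below (trans_result, then data, then a list of entries with src and dst) is an assumption based on what this web endpoint is commonly reported to return; check it against your own output before relying on it. extract_translation is a hypothetical helper.

```python
import json


def extract_translation(body):
    """Return the translated text from a v2transapi-style JSON body.

    ASSUMPTION: a successful body looks like
    {"trans_result": {"data": [{"src": "...", "dst": "..."}], ...}, ...}
    Multiple segments (e.g. one per input line) are joined with newlines.
    """
    payload = json.loads(body)
    segments = payload["trans_result"]["data"]
    return "\n".join(seg["dst"] for seg in segments)


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not a captured response.
    sample = '{"trans_result": {"data": [{"src": "你们在哪里", "dst": "where are you?"}], "from": "zh", "to": "en"}}'
    print(extract_translation(sample))  # prints: where are you?
```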
But entering 你好 still fails, this time with error 998:
请输入要翻译的中文:你好
{"errno":998,"errmsg":"未知错误","query":"你好","from":"zh","to":"en","error":998}
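This is because the fields added above were tailored to one specific search: the translation of "你们在哪里". Baidu Translate generated those values for that search, so once the query string changes, replaying the old values is bound to fail. That is Baidu Translate's anti-scraping mechanism. Strictly speaking, it appears to be sign that is computed from the query text, while token and the cookies are tied to the browser session they were copied from rather than to a single query.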
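That is as far as I can get for now. Does anyone have a complete solution? Otherwise I will come back and answer the open question in this article once I have researched it further.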
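One route this article does not try is Baidu's official translation API (the Baidu Translate Open Platform), which avoids reverse-engineering the web page's sign and token altogether. There you register for an APP ID and a secret key, and according to its documentation each request is signed with the MD5 hex digest of appid + q + salt + secret key, where q is the raw UTF-8 text (not URL-encoded) and salt is a random number. The sketch below only builds the request parameters; the function name build_official_params is hypothetical, YOUR_APP_ID and YOUR_SECRET are placeholders, and the endpoint URL and field names should be double-checked against the current documentation.

```python
import hashlib
import random

# Endpoint of the general text translation API, per Baidu's public docs (verify before use).
API_URL = "https://fanyi-api.baidu.com/api/trans/vip/translate"


def build_official_params(query, app_id, secret, salt=None, from_lang="zh", to_lang="en"):
    """Build the form fields for one request to the official API.

    sign = MD5(appid + q + salt + secret) as a lowercase hex string,
    computed over the raw UTF-8 text of q before any URL encoding.
    """
    if salt is None:
        salt = str(random.randint(32768, 65536))  # random number, sent as a string
    sign = hashlib.md5((app_id + query + salt + secret).encode("utf-8")).hexdigest()
    return {"q": query, "from": from_lang, "to": to_lang,
            "appid": app_id, "salt": salt, "sign": sign}


if __name__ == "__main__":
    params = build_official_params("你好", app_id="YOUR_APP_ID", secret="YOUR_SECRET")
    print(params["sign"])  # 32 hex characters; changes with the salt
    # With requests installed, the call itself would be roughly:
    # requests.post(API_URL, data=params).json()
```

Unlike the sign captured from the browser, this sign is recomputed for every query, so changing the input text does not invalidate it.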
If you repost this article, please credit the source: https://www.sumedu.com/faq/240551.html