Beautiful Soup (HTML parser)
Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup. It creates a parse tree for documents that can be used to extract data from HTML,[1] which is useful for web scraping.[2][3]
Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project,[4] and is additionally supported by Tidelift, a paid subscription to open-source maintenance.[5]
Code example
Beautiful Soup represents parsed data as a tree which can be searched and iterated over with ordinary Python loops.[6] The example below uses the Python standard library's urllib[7] to load Wikipedia's main page, then uses Beautiful Soup to parse the document and search for all links within.
#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Fetch the page, then parse the response with Python's built-in HTML parser
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    # Print the target of every link; fall back to '/' when href is missing
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
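The example above needs network access. As a minimal offline sketch of the same idea, the snippet below parses a small in-memory string and searches the resulting tree. The markup, its URLs, and the class name `note` are hypothetical sample data rather than anything from the article's sources; the `<li>` and `<p>` elements are deliberately left unclosed to illustrate parsing of malformed markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration):
# the <li> elements and the <p> element are never closed.
html = """
<html><body>
<ul>
  <li><a href="/wiki/Python">Python</a>
  <li><a href="/wiki/HTML">HTML</a>
</ul>
<p class="note">Unclosed paragraph
</body></html>
"""

# Build the parse tree with the standard library's parser
soup = BeautifulSoup(html, "html.parser")

# Search the tree for every anchor and collect its href attribute
hrefs = [anchor.get("href") for anchor in soup.find_all("a")]

# Find the first <p> with class "note" and read its text content
note = soup.find("p", class_="note")
note_text = note.get_text(strip=True)

print(hrefs)
print(note_text)
```

The exact nesting that results from unclosed tags depends on the underlying parser, so the sketch only relies on searches (`find_all`, `find`) that do not depend on how the parser repairs the tree.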
History
Beautiful Soup is named after both a poem in Alice's Adventures in Wonderland[8] and tag soup.[9]
Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release is Beautiful Soup 4.x. Beautiful Soup 4 can be installed with pip install beautifulsoup4.
In 2021, Python 2.7 support was retired; release 4.9.3 was the last to support Python 2.7.[10]
References
- ↑ Template:Citation
- ↑ Citation error: invalid <ref> tag; no text was provided for the reference named crummy.com
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite web