
Implementing a Lexical Analyzer in Python

2020-11-20 12:01:17 Author: Internet

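This is probably the crudest lexical analyzer on the entire internet... It was my first small lab exercise after learning a bit of Python.

The lab requirements are pasted below, although I changed a few things to suit my own ideas while implementing it.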

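Objective:

Design and implement a lexical analysis program that includes a preprocessing stage, to deepen understanding of the lexical analysis phase of compilation.

Requirements:

1. Implement preprocessing

The source program may contain characters that have no bearing on program execution; these must be removed.
First write an input routine that reads several lines of statements from the keyboard, a file, or a text box and stores them in an input buffer (character data). Then write a preprocessing subroutine that removes editing characters such as carriage returns, line feeds, and tabs from the input string, merges consecutive whitespace characters into one, and strips comments.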
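
The preprocessing step described above can be sketched roughly as follows. This is only a minimal illustration, not the assignment's reference solution: the function name `preprocess` is hypothetical, and the sketch assumes C-style `/* ... */` and `//` comments and does not try to protect comment markers that appear inside string literals.

```python
import re

def preprocess(source: str) -> str:
    """Strip comments and collapse whitespace (hypothetical helper)."""
    # Remove /* ... */ block comments (non-greedy, may span lines)
    # and // comments that run to the end of the line.
    no_comments = re.sub(r'/\*.*?\*/|//[^\n]*', ' ', source, flags=re.DOTALL)
    # Collapse every run of spaces, tabs, carriage returns and newlines
    # into a single space, then trim both ends.
    return re.sub(r'\s+', ' ', no_comments).strip()

if __name__ == '__main__':
    print(preprocess("int a;\t// counter\n\nint  b; /* two\nlines */ b = 1;"))
```

Replacing each comment with a space rather than an empty string keeps the tokens on either side of a comment from being glued together.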

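2. Implement lexical analysis

Input: the source program string for the given grammar.
Output:
A sequence of pairs (syn, token or sum), where
syn is the token class code,
token is the string of the token itself, and
sum is the integer constant.
In the implementation, the pairs may be represented with a struct.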
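
One possible way to represent such a pair in Python is a `NamedTuple`. The names `Token`, `make_token`, and `NUM_CODE` below are illustrative assumptions; the value 26 is the code the assignment's table gives for NUM.

```python
from typing import NamedTuple, Union

class Token(NamedTuple):
    syn: int                 # token class code
    value: Union[str, int]   # the token text, or the integer value for NUM

NUM_CODE = 26  # class code for NUM in the assignment's table

def make_token(syn: int, lexeme: str) -> Token:
    """Store integer constants as int ("sum"), everything else as text ("token")."""
    if syn == NUM_CODE:
        return Token(syn, int(lexeme))
    return Token(syn, lexeme)

if __name__ == '__main__':
    print(make_token(26, '42'))
    print(make_token(25, 'count'))
```

A `NamedTuple` still compares equal to a plain tuple, so the output can be treated as a sequence of ordinary `(syn, value)` pairs.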

3. Lexicon of the C language subset to be analyzed

Keywords

main  if  then  while  do  static  int  double  struct  break  else  long  switch  case  typedef  char  return  const  float  short  continue  for  void  default  sizeof  do

All keywords are lowercase.

Operators and delimiters
+ - * / : := < <> <= > >= = ; ( ) #
Other tokens: ID and NUM
The other tokens are defined by the following regular definitions:

ID→letter(letter|digit)*
NUM→digit digit*
letter→a|…|z|A|…|Z
digit→0|…|9…

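Whitespace consists of blanks, tabs, and newlines.
Whitespace generally separates IDs, NUMs, special symbols, and keywords, and is usually ignored during lexical analysis.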
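
The regular definitions for ID and NUM translate almost directly into Python regular expressions. The sketch below is an illustration only; `ID_RE`, `NUM_RE`, and `classify` are hypothetical names. It uses explicit ASCII ranges because `\d` and `\w` in Python also match non-ASCII digits and letters.

```python
import re

ID_RE = re.compile(r'[A-Za-z][A-Za-z0-9]*')   # ID  -> letter (letter | digit)*
NUM_RE = re.compile(r'[0-9]+')                # NUM -> digit digit*

def classify(lexeme: str) -> str:
    """Return 'ID', 'NUM', or 'OTHER' for a whole lexeme."""
    if ID_RE.fullmatch(lexeme):
        return 'ID'
    if NUM_RE.fullmatch(lexeme):
        return 'NUM'
    return 'OTHER'

if __name__ == '__main__':
    for lexeme in ('x1', '42', '1x', '_tmp'):
        print(lexeme, classify(lexeme))
```

Under these definitions `_tmp` is not an ID, because the assignment's `letter` does not include the underscore.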

4. Class codes for each token

Token	Code	Token	Code
main	1	;	41
if	2	(	42
then	3	)	43
while	4	int	7
do	5	double	8
static	6	struct	9
ID	25	break	10
NUM	26	else	11
+	27	long	12
-	28	switch	13
*	29	case	14
/	30	typedef	15
**	31	char	16
==	32	return	17
<	33	const	18
<>	34	float	19
<=	35	short	20
>	36	continue	21
>=	37	for	22
=	38	void	23
[	39	sizeof	24
]	40	#	0
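
The table can be transcribed into a dictionary for lookup. The sketch below is one possible encoding, with hypothetical names (`KEYWORDS`, `SYN`, `syn_of`). It copies the codes exactly as listed, so symbols that appear in the operator list but have no row in the table, such as `:` and `:=`, simply have no code here.

```python
# Keywords in code order: main = 1 ... sizeof = 24, as in the table.
KEYWORDS = ['main', 'if', 'then', 'while', 'do', 'static', 'int', 'double',
            'struct', 'break', 'else', 'long', 'switch', 'case', 'typedef',
            'char', 'return', 'const', 'float', 'short', 'continue', 'for',
            'void', 'sizeof']
ID_CODE, NUM_CODE = 25, 26

SYN = {word: code for code, word in enumerate(KEYWORDS, start=1)}
SYN.update({'+': 27, '-': 28, '*': 29, '/': 30, '**': 31, '==': 32,
            '<': 33, '<>': 34, '<=': 35, '>': 36, '>=': 37, '=': 38,
            '[': 39, ']': 40, ';': 41, '(': 42, ')': 43, '#': 0})

def syn_of(lexeme):
    """Return the class code for a lexeme, or None if the table has no entry."""
    if lexeme in SYN:
        return SYN[lexeme]
    if lexeme.isascii() and lexeme.isdigit():
        return NUM_CODE
    if lexeme.isascii() and lexeme[:1].isalpha() and lexeme.isalnum():
        return ID_CODE
    return None

if __name__ == '__main__':
    for lexeme in ('while', '<=', 'count', '42', '#', ':='):
        print(lexeme, syn_of(lexeme))
```

Checking the dictionary before the ID rule is what keeps keywords from being classified as identifiers.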

Source code:

import re
import sys

# Keywords: 63 keywords copied from Baidu Baike
key_word = ['asm','do','if','return','typedef','auto','double','inline','short','typeid','bool',
            'dynamic_cast','int','signed','typename','break','else','long','sizeof','union','case',
            'enum','mutable','static','unsigned','catch','explicit','namespace','static_cast',
            'using','char','export','new','struct','virtual','class','extern','operator','switch',
            'void','const','false','private','template','volatile','const_cast','float','protected',
            'this','wchar_t','continue','for','public','throw','while','default','friend','register',
            'true','delete','goto','reinterpret_cast','try']

# Some common function names, which would otherwise be classified as identifiers; 16 for now
function_word = ['cin','cout','scanf','printf','abs','sqrt','isalpha','isdigit','tolower','toupper',
                 'strcpy','strlen','time','rand','srand','exit']

operator = ['+','-','*','/',':',':=','<','<>','<=','>','>=','=',';','(',')','#','==','{','}',',','&','[',']',"'"]

with open('cpp.txt', 'w') as file:
    print("Enter the source program to analyze (end input with EOF):")
    txt = sys.stdin.readlines()
    file.writelines(txt)

with open('cpp.txt', 'r') as file:
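        # Preprocessing; also strips string literals, since a string is certainly not an identifier
        txt = ' '.join(file.readlines())
        deal_txt = re.sub(r'/\*(.|[\r\n])*?\*/|//.*', ' ', txt)
        # Strip string literals from the comment-free text, not from the raw input
        deal_txt = re.sub(r'\"(.|[\r\n])*?\"', ' ', deal_txt)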
        deal_txt = deal_txt.strip()
        deal_txt = deal_txt.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ')
        # Lexical analysis; the identifier rule additionally allows _
        keyword = []
        funword = []
        opeword = []
        idword = []
        numword = []
        errword = []
        pha = re.findall(r'[a-zA-Z_][a-zA-Z0-9_]*', deal_txt)
        # \b keeps digits inside identifiers such as a1 from being reported as numbers
        num = re.findall(r'\b\d+\b', deal_txt)
        # Try two-character operators first so that :=, <>, <=, >= and == are not split
        symbols = re.findall(r':=|<>|<=|>=|==|[^\w\s]', deal_txt)
        for p in pha:
            if p in key_word:
                keyword.append({p : key_word.index(p) + 1})
            elif p in function_word:
                funword.append({p : len(key_word) + function_word.index(p) + 1})
            else:
                idword.append({p : 80})
        for n in num:
            numword.append({n : 81})
        for s in symbols:
            if s in operator:
                opeword.append({s: len(key_word) + len(function_word) + operator.index(s) + 3})
            else:
                errword.append({s : 'ERROR'})
        print("Keywords:\n", keyword)
        print('Functions:\n', funword)
        print("ID:\n", idword)
        print("Numbers:\n", numword)
        print("Operators and delimiters:\n", opeword)
        if len(errword) != 0:
            print("Other:\n", errword)
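
The listing above collects tokens by category, so the order in which they appeared in the source is lost. If a single ordered sequence is wanted, as the assignment's output format suggests, one alternative is a single pass with `re.finditer` and named groups. This is a sketch of a different technique, not the author's method: the names `TOKEN_RE` and `scan` are hypothetical, and `KEYWORDS` is a small illustrative subset rather than the full list.

```python
import re

KEYWORDS = {'if', 'else', 'while', 'for', 'int', 'return'}  # illustrative subset

# Alternatives are tried left to right, so two-character operators
# are listed before the single-character class.
TOKEN_RE = re.compile(
    r'(?P<WORD>[A-Za-z_][A-Za-z0-9_]*)'
    r'|(?P<NUM>[0-9]+)'
    r'|(?P<OP>:=|<>|<=|>=|==|[-+*/:<>=;(){},&\[\]#])'
    r'|(?P<SKIP>\s+)'
    r'|(?P<ERROR>.)'
)

def scan(text):
    """Return (kind, lexeme) pairs in the order they occur in text."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup      # name of the group that matched
        lexeme = match.group()
        if kind == 'SKIP':
            continue                # whitespace only separates tokens
        if kind == 'WORD':
            kind = 'KEYWORD' if lexeme in KEYWORDS else 'ID'
        tokens.append((kind, lexeme))
    return tokens

if __name__ == '__main__':
    for token in scan('while (a1 <= 10) a1 := a1 + 1;'):
        print(token)
```

Because every character is consumed by exactly one alternative, digits inside an identifier such as `a1` are never reported a second time as a number.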

Source: https://www.cnblogs.com/chengjqyu/p/14010162.html